Mostly thanks to this reddit discussion, I have updated my pow() approximation for C / C++. I have now two different versions:

```cpp
inline double fastPow(double a, double b) {
    union {
        double d;
        int x[2];
    } u = { a };
    u.x[1] = (int)(b * (u.x[1] - 1072632447) + 1072632447);
    u.x[0] = 0;
    return u.d;
}
```

This new code uses the union trick instead of the weird casting trick I used before. This means that `-fno-strict-aliasing` is no longer required when compiling, and it is also a bit faster because one less temporary variable is needed. When you have a big endian machine, you have to exchange u.x[0] and u.x[1]. On my PC, this version is 4.2 times faster than the much more precise pow().
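As a quick sanity check, the approximation can be compared against the exact pow(). This sketch repeats the function from above (little-endian layout assumed); the 10% tolerance is just an illustrative bound I picked, not a guarantee from the measurements in this post:

```cpp
#include <cassert>
#include <cmath>
#include <cstdio>

// fastPow as given above; assumes little-endian, 32-bit int
inline double fastPow(double a, double b) {
    union { double d; int x[2]; } u = { a };
    u.x[1] = (int)(b * (u.x[1] - 1072632447) + 1072632447);
    u.x[0] = 0;
    return u.d;
}

// usage: compare the approximation against the exact result
inline double relativeError(double a, double b) {
    double exact = std::pow(a, b);
    return std::fabs(fastPow(a, b) - exact) / exact;
}
```

For example, fastPow(2.0, 2.0) comes out around 4.23 instead of 4.0 on this layout, a relative error of roughly 6%.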

Besides that, I now also have a slower approximation that has much less error when the exponent is larger than 1. It makes use of exponentiation by squaring, which is exact for the integer part of the exponent, and uses only the exponent’s fraction for the approximation:

```cpp
// should be much more precise with large b
inline double fastPrecisePow(double a, double b) {
    // calculate approximation with fraction of the exponent
    int e = (int) b;
    union {
        double d;
        int x[2];
    } u = { a };
    u.x[1] = (int)((b - e) * (u.x[1] - 1072632447) + 1072632447);
    u.x[0] = 0;

    // exponentiation by squaring with the exponent's integer part
    // double r = u.d makes everything much slower, not sure why
    double r = 1.0;
    while (e) {
        if (e & 1) {
            r *= a;
        }
        a *= a;
        e >>= 1;
    }

    return r * u.d;
}
```

This code is 3.3 times faster than pow(). Writing a microbenchmark is not easy, so I have posted mine here. Here is also a Java version of the more accurate pow approximation.
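The benchmark itself is only linked, not shown, but a minimal timing harness along these lines (function names, input range, and iteration count are my own choices, not taken from the original benchmark) gives a rough comparison; the checksum keeps the compiler from optimizing the loop away:

```cpp
#include <chrono>
#include <cmath>
#include <cstdio>

inline double fastPow(double a, double b) {
    union { double d; int x[2]; } u = { a };
    u.x[1] = (int)(b * (u.x[1] - 1072632447) + 1072632447);
    u.x[0] = 0;
    return u.d;
}

// time one pow-like function over many varying inputs;
// returns the checksum so the work cannot be elided
template <typename F>
double benchmark(F f, const char* name) {
    auto start = std::chrono::steady_clock::now();
    double checksum = 0.0;
    for (int i = 1; i <= 1000000; ++i) {
        checksum += f(1.0 + i * 1e-6, 1.5);
    }
    auto stop = std::chrono::steady_clock::now();
    double ms = std::chrono::duration<double, std::milli>(stop - start).count();
    std::printf("%-8s %8.2f ms (checksum %f)\n", name, ms, checksum);
    return checksum;
}
```

Calling `benchmark` once with `std::pow` and once with `fastPow` prints both timings; absolute numbers will of course vary by compiler flags and hardware.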

Any ideas how this could be improved? Please post them!

11 Comments on "Optimized Approximative pow() in C / C++"

[…] I was hunting for a faster one. There are quite some interesting variations around in the net, like one for doubles which is quite some impressive a bit-hackery, reminds me of Quake’s sqrt(). The […]

The new approximation does not work with negative exponents because the loop will never end. Is the first approximation valid for negative exponents?

It does not work with negative exponents, but you can use this trick:

a^(-b) = 1/(a^b)

so when b is negative, just calculate

1.0/pow(a, abs(b))

I have seen some people use the old-fashioned Taylor approximations in their calculations, but there are some better algorithms.

Don’t they use some asm and some tables for that?


You could modify the function like this so it can handle negative exponents:

```cpp
inline double CompOgdenRobust::fastPrecisePow(double a, double b)
{
    if (b < 0) {
        return 1.0 / fastPrecisePow(a, -b);
    }
    // ... same body as fastPrecisePow above ...
    return r * u.d;
}
```

Apologies, I can’t seem to post the code in the comments without it messing up.

I referenced your article in my Stackoverflow answer here: http://stackoverflow.com/a/16782797/1708801

As I noted there, type punning through a union is formally undefined behavior in C++. Although many compilers support it with well defined behavior, this is not universal as we can see from this article: http://blog.regehr.org/archives/959

[…] The code below is updated with using union, you do not need -fno-strict-aliasing any more for compiling. Also, here is a more precise version of the approximation. […]

Great implementation … I would like to use it on a microcontroller, but instead of doubles I need floats. Do you have any suggestions on how your function needs to be adapted to calculate pow with floats and 32-bit (or 16-bit) integers?

Regards …

Working with very (very) large numbers–BigRational–I was able to shave off time by adding in a “if exponent >= 19 { /*result * itself 19 times*/ }” chunk, which seems ideal (5, 11, 17, 23, and 31 are all slower). On 17^40000 the difference is 86ms (220 with the 19-stack vs. 306 without). Almost certainly due to the reduced creation of objects and garbage collection. On smaller (sane) values, it’s probably slower, but only by slim margins and considering what I’m working on, I’d rather have those significant gains on the excessively large scale.