Optimized pow() approximation for Java and C / C++
I have already written about approximations of e^x, log(x) and pow(a, b) in my post Optimized Exponential Functions for Java. Now I have more
In particular, the pow() function is now even faster, simpler, and more accurate. Without further ado, I proudly give you the brand new approximation:
Approximation of pow() in Java
public static double pow(final double a, final double b) {
final int x = (int) (Double.doubleToLongBits(a) >> 32);
final int y = (int) (b * (x - 1072632447) + 1072632447);
return Double.longBitsToDouble(((long) y) << 32);
}
This is really very compact. The calculation only requires 2 shifts, 1 mul, 2 add, and 2 register operations. That’s it! In my tests it usually within an error margin of 5% to 12%, in extreme cases sometimes up to 25%. A careful analysis is left as an exercise for the reader. This is very usable for in e.g. metaheuristics or neural nets.
I use Linux, Java 1.6.0-b105 with the server VM, and execute the benchmark with this command:
sudo nice -n -20 java -cp . -server PowTest
The approximation is 27 times faster than Math.pow() on my Pentium-M. On a Pentium 4 it is 41 times faster. Unfortunately, microbenchmarks are difficult to do in Java, so your mileage may vary. You can download the benchmark PowTest.java and have a look, I have tried to prevent overoptimization while still having a low overhead.
Approximation of pow() in C and C++
double pow(double a, double b) {
int tmp = (*(1 + (int *)&a));
int tmp2 = (int)(b * (tmp - 1072632447) + 1072632447);
double p = 0.0;
*(1 + (int * )&p) = tmp2;
return p;
}
Compiled on my Pentium-M with gcc 4.1.2:
gcc -O3 -march=pentium-m -fomit-frame-pointer -fno-strict-aliasing
This version is 7.8 times faster than pow() from the standard library.
WARNING! you HAVE to use the -fno-strict-aliasing option, or this does not work!
How the Approximation was Developed
It is quite impossible to understand what is going on in this function, it just magically works. To shine a bit more light on it, here is a detailed description how I have developed this.
Approximation of e^x
As described here, the paper “A Fast, Compact Approximation of the Exponential Function” develops a C macro that does a good job at exploiting the IEEE 754 floating-point representation to calculate e^x. This macro can be transformed into Java code straightforward, which looks like this:
public static double exp(double val) {
final long tmp = (long) (1512775 * val + (1072693248 - 60801));
return Double.longBitsToDouble(tmp << 32);
}
Use Exponential Functions for a^b
Thanks to the power of math, we know that a^b can be transformed like this:
- Take exponential
a^b = e^(ln(a^b))
- Extract b
a^b = e^(ln(a)*b)
Now we have expressed the pow calculation with e^x and ln(x). We already have the e^x approximation, but no good ln(x). The old approximation is very bad, so we need a better one. So what now?
Approximation of ln(x)
Here comes the big trick: Rember that we have the nice e^x approximation? Well, ln(x) is exactly the inverse function! That means we just need to transform the above approximation so that the output of e^x is transformed back into the original input.
That’s not too difficult. Have a look at the above code, we now take the output and move backwards to undo the calculation. First reverse the shift:
final double tmp = (Double.doubleToLongBits(val) >> 32);
Now solve the equation
tmp = (1512775 * val + (1072693248 - 60801))
for val:
- The original formula
tmp = (1512775 * val + (1072693248 - 60801))
- Perform subtraction
tmp = 1512775 * val + 1072632447
- Bring value to other side
tmp - 1072632447 = 1512775 * val
- Divide by factor
(tmp - 1072632447) / 1512775 = val
- Finally, val on the left side
val = (tmp - 1072632447) / 1512775
VoĆla, now we have a nice approximation of ln(x):
public double ln(double val) {
final double x = (Double.doubleToLongBits(val) >> 32);
return (x - 1072632447) / 1512775;
}
Combine Both Approximations
Finally we can combine the two approximations into e^(ln(a) * b):
public static double pow1(final double a, final double b) {
// calculate ln(a)
final double x = (Double.doubleToLongBits(a) >> 32);
final double ln_a = (x - 1072632447) / 1512775;
// ln(a) * b
final double tmp1 = ln_a * b;
// e^(ln(a) * b)
final long tmp2 = (long) (1512775 * tmp1 + (1072693248 - 60801));
return Double.longBitsToDouble(tmp2 << 32);
}
Between the two shifts, we can simply insert the tmp1 calculation into the tmp2 calculation to get
public static double pow2(final double a, final double b) {
final double x = (Double.doubleToLongBits(a) >> 32);
final long tmp2 = (long) (1512775 * (x - 1072632447) / 1512775 * b + (1072693248 - 60801));
return Double.longBitsToDouble(tmp2 << 32);
}
Now simplify tmp2 calculation:
- The original formula
tmp2 = (1512775 * (x - 1072632447) / 1512775 * b + (1072693248 - 60801))
- We can drop the factor 1512775
tmp2 = (x - 1072632447) * b + (1072693248 - 60801)
- And finally, calculate the substraction
tmp2 = b * (x - 1072632447) + 1072632447
The Result
That’s it! Add some casts, and the complete function is the same as above.
public static double pow(final double a, final double b) {
final int tmp = (int) (Double.doubleToLongBits(a) >> 32);
final int tmp2 = (int) (b * (tmp - 1072632447) + 1072632447);
return Double.longBitsToDouble(((long) tmp2) << 32);
}
This concludes my little tutorial on microoptimization of the pow() function. If you have come this far, I congratulate your presistence
Please post what you think about this!
Tags: C++, floating point, java, optimization, programming12 Comments
Trackbacks
- things to look at (October 4th - October 6th) | stimulant
- Martin Ankerl » Blog Archive » Optimized Exponential Functions for Java


October 5th, 2007 at 12:52 am
Would it be possible to use this with float value instead of doubles?
October 5th, 2007 at 4:39 am
I dont think so, the whole trick is based on the bit layout of the double numbers. So you need to find a very fast e^x approximation that works with floats, then you can do the rest again.
October 5th, 2007 at 7:20 am
Another benchmark: on a Pentium 4, Windows, Java 1.6.0 -server, the approximation is about 41 times faster than Math.pow()
October 5th, 2007 at 11:03 am
Not bad! But: if I want to compute a pow(), in which cases it might be ok to get an error up to 25%? Could you provide us some examples?
October 5th, 2007 at 11:19 am
In my experience the error gets bigger when the exponent gets bigger. I have written a short program that uses random numbers to try to find a combination that gets very bad errors. Base is a random double between 0 and 1000, exponent random double between 0 and 5. The biggest error after 100 million evaluations is:
a between 0 and 1000, b between 0 and 5:
Worst case:
a=512.0125338006894
b=4.914054794454942
a^b approximation: 2.5571362865152E13
a^b Math.pow(): 2.0585114388503066E13
Errors is 19.499345822682237 %
Average error is 4.021374964371438 %
—–
a between 0 and 100, b between 0 and 3:
Worst case:
a=64.00103767757574
b=2.8915318496742626
a^b approximation: 191223.125
a^b Math.pow(): 166973.39656298532
Error is 12.681378592162784 %
Average error is 2.7778168699408558 %
October 5th, 2007 at 3:23 pm
Mr Skeptic, I am using this (a modified form) in an Ant Colony Optimization algorithm. There the core algorithms needs to calculate a^b * c^d very often. I have done comparison of the optimization quality when using this optimization and when using Math.pow(), the error does not have any measurable negative effect in the examples I am using. And the optimization got a hell of a lot faster, so even if there is a slight decrease in optimization quality this is more than compensated by the higher number of iterations I am able to do now.
October 26th, 2007 at 1:46 pm
Hi,
I need have a procedure which can handle 50! and 1/50! , that is 50 factorial and 1 divided by 50 factorial….
Is there a way to handle these huge numbers
October 27th, 2007 at 8:09 am
In Java use BigInteger and BigDecimal:
http://javadoc.ankerl.com/results.html?cx=006156709672261707051%3Atrd_lbkwu2y&cof=FORID%3A9&q=bigdecimal&sa=Search#1356
February 13th, 2008 at 9:42 pm
Martin, that test program is interesting—so you ran it with random values in that range, iterated 100 million times, and chose the highest error… clever. What I also found interesting was where the base was the most off: right near the largest power of 2 in the range, but not right on it. Something to do with the bitshifting? It was almost creepy:
a between 0 and 1000, b between 0 and 5:
Worst case:
a=512.0125338006894 // why so close, but just a bit above?
b=4.914054794454942 //close to highest, but again, not quite?
a^b approximation: 2.5571362865152E13
a^b Math.pow(): 2.0585114388503066E13
Errors is 19.499345822682237 %
Just found it interesting. There’s a ton of applications for not-as-accurate, but way-faster calculations, particularly as the errors can even themselves out over many iterations. Plenty of “fuzzy” models out there that could use tricks like these, and indeed, the element of error adds a (pseudo) randomness, assuming the exact errors of all values aren’t well-known.
–CJ
February 13th, 2008 at 11:50 pm
Hi seejayjames, the behaviour of the error is really a bit strange. I think it is a bit off because of the way the constants were selected (read the paper for how they did it). This should minimize the average error.
But the 19.5% error does not look like such a big problem to me, because 512^4.9 is about 2*10^13, which is a very large number that might be out of the useful range for most usages.
I have used this a^b version as the core part in a research project at work, and it gives a huge speedup.