Optimized pow() approximation for Java, C / C++, and C#
I have already written about approximations of e^x, log(x) and pow(a, b) in my post Optimized Exponential Functions for Java. Now I have more
In particular, the pow() function is now even faster, simpler, and more accurate. Without further ado, I proudly give you the brand new approximation:
Approximation of pow() in Java
public static double pow(final double a, final double b) {
final int x = (int) (Double.doubleToLongBits(a) >> 32);
final int y = (int) (b * (x - 1072632447) + 1072632447);
return Double.longBitsToDouble(((long) y) << 32);
}
This is really very compact. The calculation only requires 2 shifts, 1 mul, 2 add, and 2 register operations. That’s it! In my tests it usually within an error margin of 5% to 12%, in extreme cases sometimes up to 25%. A careful analysis is left as an exercise for the reader. This is very usable for in e.g. metaheuristics or neural nets.
I use Linux, Java 1.6.0-b105 with the server VM, and execute the benchmark with this command:
sudo nice -n -20 java -cp . -server PowTest
The approximation is 27 times faster than Math.pow() on my Pentium-M. On a Pentium 4 it is 41 times faster. Unfortunately, microbenchmarks are difficult to do in Java, so your mileage may vary. You can download the benchmark PowTest.java and have a look, I have tried to prevent overoptimization while still having a low overhead.
Approximation of pow() in C and C++
double pow(double a, double b) {
int tmp = (*(1 + (int *)&a));
int tmp2 = (int)(b * (tmp - 1072632447) + 1072632447);
double p = 0.0;
*(1 + (int * )&p) = tmp2;
return p;
}
Compiled on my Pentium-M with gcc 4.1.2:
gcc -O3 -march=pentium-m -fomit-frame-pointer -fno-strict-aliasing
This version is 7.8 times faster than pow() from the standard library.
WARNING! you HAVE to use the -fno-strict-aliasing option, or this does not work!
Approximation of pow() in C#
Jason Jung has posted a port of the this code to C#:
public static double PowerA(double a, double b) {
int tmp = (int)(BitConverter.DoubleToInt64Bits(a) >> 32);
int tmp2 = (int)(b * (tmp - 1072632447) + 1072632447);
return BitConverter.Int64BitsToDouble(((long)tmp2) << 32);
}
How the Approximation was Developed
It is quite impossible to understand what is going on in this function, it just magically works. To shine a bit more light on it, here is a detailed description how I have developed this.
Approximation of e^x
As described here, the paper “A Fast, Compact Approximation of the Exponential Function” develops a C macro that does a good job at exploiting the IEEE 754 floating-point representation to calculate e^x. This macro can be transformed into Java code straightforward, which looks like this:
public static double exp(double val) {
final long tmp = (long) (1512775 * val + (1072693248 - 60801));
return Double.longBitsToDouble(tmp << 32);
}
Use Exponential Functions for a^b
Thanks to the power of math, we know that a^b can be transformed like this:
- Take exponential
a^b = e^(ln(a^b))
- Extract b
a^b = e^(ln(a)*b)
Now we have expressed the pow calculation with e^x and ln(x). We already have the e^x approximation, but no good ln(x). The old approximation is very bad, so we need a better one. So what now?
Approximation of ln(x)
Here comes the big trick: Rember that we have the nice e^x approximation? Well, ln(x) is exactly the inverse function! That means we just need to transform the above approximation so that the output of e^x is transformed back into the original input.
That’s not too difficult. Have a look at the above code, we now take the output and move backwards to undo the calculation. First reverse the shift:
final double tmp = (Double.doubleToLongBits(val) >> 32);
Now solve the equation
tmp = (1512775 * val + (1072693248 - 60801))
for val:
- The original formula
tmp = (1512775 * val + (1072693248 - 60801))
- Perform subtraction
tmp = 1512775 * val + 1072632447
- Bring value to other side
tmp - 1072632447 = 1512775 * val
- Divide by factor
(tmp - 1072632447) / 1512775 = val
- Finally, val on the left side
val = (tmp - 1072632447) / 1512775
Voíla, now we have a nice approximation of ln(x):
public double ln(double val) {
final double x = (Double.doubleToLongBits(val) >> 32);
return (x - 1072632447) / 1512775;
}
Combine Both Approximations
Finally we can combine the two approximations into e^(ln(a) * b):
public static double pow1(final double a, final double b) {
// calculate ln(a)
final double x = (Double.doubleToLongBits(a) >> 32);
final double ln_a = (x - 1072632447) / 1512775;
// ln(a) * b
final double tmp1 = ln_a * b;
// e^(ln(a) * b)
final long tmp2 = (long) (1512775 * tmp1 + (1072693248 - 60801));
return Double.longBitsToDouble(tmp2 << 32);
}
Between the two shifts, we can simply insert the tmp1 calculation into the tmp2 calculation to get
public static double pow2(final double a, final double b) {
final double x = (Double.doubleToLongBits(a) >> 32);
final long tmp2 = (long) (1512775 * (x - 1072632447) / 1512775 * b + (1072693248 - 60801));
return Double.longBitsToDouble(tmp2 << 32);
}
Now simplify tmp2 calculation:
- The original formula
tmp2 = (1512775 * (x - 1072632447) / 1512775 * b + (1072693248 - 60801))
- We can drop the factor 1512775
tmp2 = (x - 1072632447) * b + (1072693248 - 60801)
- And finally, calculate the substraction
tmp2 = b * (x - 1072632447) + 1072632447
The Result
That’s it! Add some casts, and the complete function is the same as above.
public static double pow(final double a, final double b) {
final int tmp = (int) (Double.doubleToLongBits(a) >> 32);
final int tmp2 = (int) (b * (tmp - 1072632447) + 1072632447);
return Double.longBitsToDouble(((long) tmp2) << 32);
}
This concludes my little tutorial on microoptimization of the pow() function. If you have come this far, I congratulate your presistence
UPDATE Recently there several other approximative pow calculation methods have been developed, here are some others that I have found through reddit:
- Fast pow() With Adjustable Accuracy — This looks quite a bit more sophisticated and precise than my approximation. Written in C and for float values. A Java port should not be too difficult.
- Fast SSE2 pow: tables or polynomials? — Uses SSE operation and seems to be a bit faster than the table approach from the link above with the potential to scale better when due to less cache usage.
Please post what you think about this!












Would it be possible to use this with float value instead of doubles?
I dont think so, the whole trick is based on the bit layout of the double numbers. So you need to find a very fast e^x approximation that works with floats, then you can do the rest again.
Another benchmark: on a Pentium 4, Windows, Java 1.6.0 -server, the approximation is about 41 times faster than Math.pow()
Not bad! But: if I want to compute a pow(), in which cases it might be ok to get an error up to 25%? Could you provide us some examples?
In my experience the error gets bigger when the exponent gets bigger. I have written a short program that uses random numbers to try to find a combination that gets very bad errors. Base is a random double between 0 and 1000, exponent random double between 0 and 5. The biggest error after 100 million evaluations is:
a between 0 and 1000, b between 0 and 5:
Worst case:
a=512.0125338006894
b=4.914054794454942
a^b approximation: 2.5571362865152E13
a^b Math.pow(): 2.0585114388503066E13
Errors is 19.499345822682237 %
Average error is 4.021374964371438 %
—–
a between 0 and 100, b between 0 and 3:
Worst case:
a=64.00103767757574
b=2.8915318496742626
a^b approximation: 191223.125
a^b Math.pow(): 166973.39656298532
Error is 12.681378592162784 %
Average error is 2.7778168699408558 %
Mr Skeptic, I am using this (a modified form) in an Ant Colony Optimization algorithm. There the core algorithms needs to calculate a^b * c^d very often. I have done comparison of the optimization quality when using this optimization and when using Math.pow(), the error does not have any measurable negative effect in the examples I am using. And the optimization got a hell of a lot faster, so even if there is a slight decrease in optimization quality this is more than compensated by the higher number of iterations I am able to do now.
Hi,
I need have a procedure which can handle 50! and 1/50! , that is 50 factorial and 1 divided by 50 factorial….
Is there a way to handle these huge numbers
In Java use BigInteger and BigDecimal:
http://javadoc.ankerl.com/results.html?cx=006156709672261707051%3Atrd_lbkwu2y&cof=FORID%3A9&q=bigdecimal&sa=Search#1356
Martin, that test program is interesting—so you ran it with random values in that range, iterated 100 million times, and chose the highest error… clever. What I also found interesting was where the base was the most off: right near the largest power of 2 in the range, but not right on it. Something to do with the bitshifting? It was almost creepy:
a between 0 and 1000, b between 0 and 5:
Worst case:
a=512.0125338006894 // why so close, but just a bit above?
b=4.914054794454942 //close to highest, but again, not quite?
a^b approximation: 2.5571362865152E13
a^b Math.pow(): 2.0585114388503066E13
Errors is 19.499345822682237 %
Just found it interesting. There’s a ton of applications for not-as-accurate, but way-faster calculations, particularly as the errors can even themselves out over many iterations. Plenty of “fuzzy” models out there that could use tricks like these, and indeed, the element of error adds a (pseudo) randomness, assuming the exact errors of all values aren’t well-known.
–CJ
Hi seejayjames, the behaviour of the error is really a bit strange. I think it is a bit off because of the way the constants were selected (read the paper for how they did it). This should minimize the average error.
But the 19.5% error does not look like such a big problem to me, because 512^4.9 is about 2*10^13, which is a very large number that might be out of the useful range for most usages.
I have used this a^b version as the core part in a research project at work, and it gives a huge speedup.
The code in .net for the approximation:
public static double PowerA(double a, double b) { int tmp = (int)(BitConverter.DoubleToInt64Bits(a) >> 32); int tmp2 = (int)(b * (tmp - 1072632447) + 1072632447); return BitConverter.Int64BitsToDouble(((long)tmp2) << 32); }Thanks Jason! I have integrated your code into the article.
I’m currently wrestling with an application that uses exp a lot, and am looking at your approximation. Will the smae magic numbers work on 64bit linux, or do they need to be redetermined?
I forgot to mention that we are using C++ and the g++ compiler.
Hi Stephen, if I am not mistaken the magic numbers should be the same, but you have to rewrite the function because the casts like
rely on 32 bit boundaries. I’m not sure how to best do this, I am not a C guy.
If you manage to get a working version of the code, please share it! May I ask what for you need the exp function?
Клёво, мне понравилось!
Great Routines for J2ME!
Hi,
Your aproximation seems to be really usefor for slow embeddede devices.Besides,I have no RAM free space enough to load math library. However,i tried to used your aproximation for pow in an embedded micro(JN5139) 32-bit RISC CPU using Big-Endian format and using gcc compiler and it doesn’t work. Any idea of what kind of changes can be done to make it work ?
Thanks.
Hi Ralf, sorry but I have no idea. big-endian and 32 bit sounds like it should work. I have never programmed anything for embedded devices. double should be 64 bit, int should be 32 bit.
Well, code looks OK, but I’ve Cut and Paste it into J2ME (on Windows XP, SP3),
run it like pow(1.026f, 0.077f) an the result is: 0.97423… !?
The result is [b]less[/b] than 1.0 !.
OK, I decided to use ‘powTaylor(double, double)’
from http://today.java.net/pub/a/today/2007/11/06/creating-java-me-math-pow-method.html
The results are quite accurate.
Here is some code I’ve written to complement what I have available in my J2ME libraries…..related to pow, log, etc.. There is some stuff commented out here, so don’t worry about it. The rest is good. Note I didn’t include the concept of bittians for fast trig function lookups to 8 bits accuracy (great for graphics). Good luck with these.
/** * Fast integer to the power of another integer. Note that if the exponent is < 1 * then the function returns 0, as you'd expect for an integer function. This * function relies on the JVM to catch overflow, so if the int operations just * wrap around, then this function will not return the correct value for very large * values. * * @param x int that is to be raised to a power. * @param e int that is the exponent. * @return x^e calculated as an int. */ public static int pow(int x, int e) { if (e >= 1; // next bit } return r; } // pow(int,int) /** * Another pow function that raises a float to the power of an int. This * is a common operation. This uses the same algorithm as the int,int function * but with a float as the value to be raised. As with the int algorithm, the * exponent must be positive. If e < 0, it will return 0. * * @param x float value to be raised to a power. * @param e int exponent. e < 0 returns 0. */ public static float pow(float x, int e) { if (e >= 1; // next bit } return r; } // pow(float,int) /* TODO: Later debug...not working right now. * Fast floating point (single precision) power function. x^e, where both * x and e are floats. This is based on the principle that log(x^e) = elog(x). * It doesn't matter what base the log is in. If we use 2 then, log2 can be very * fast as float is already stored as M*2^E, where M and E are the mantissa and * exponent respectively. So, using some math. * * a = M*2^E * => log2(a) = log2(M) + E * * Since M is stored as 1.xxxx (IEEE format), then log2(M) ~ 0 * * => log2(a) = E approx, with log2(M) as error factor * * In J2ME, the Float.floatToIntBits and Float.intBitsToFloat * functions can be used to get floats as IEEE binary format and vice-versa. * So, this allows the exponent and Mantissa parts to be extracted. * As mentioned above, a = 2^Eold * Mold * => log2(a) = Eold + log2(Mold) [Equation 2] * * Repeatedly squaring the mantissa, Mold, we get the sequence… * Xnew = Mold^N, where 1.0 <= Xnew Mold^N = 2^Enew * Mnew * => N*log2(Mold) = Enew + log2(Mnew) * => log2(Mold) = Enew/N + log2(Mnew)/N * * This last equation means that we need to follow the following steps. * 1. Extract Eold and Mold from the input value x * 2. Put Eold aside and the repeatedly square Mold say 7 times (N = 2^7 = 128) == Xnew * 3. Break Xnew into Enew and Mnew * 4. Concatenate Enew and the high 9 bits of Mnew (including the invisible first bit). * indicated as [Enew:1.Mnew] and we only use the top 2 bits of Mnew rounded up from 3. * This is explained in http://focus.ti.com/lit/an/spra218/spra218.pdf * 5. Normalize this into a float, using the value in step 4 as a new mantissa = log2(Mold) * 6. Use Equation 2 above to calculate log2(Xold) = Eold + log2(Mold) {from step 6} * 7. This is the answer to return. * * * The squaring can only be 7 times for single precision numbers due to the exponent size, * also explained in that article. Note that this is accurate to about 5 decimal places * after the decimal point. * * Note that it might be faster to add the original Eold to Mold as an estimate to a * converging algorithm. Note also that other log bases can be calculated by this equality. * logn(x) = log2(x)/log2(n) * * It might also be possible to make this algorithm faster using the float to int * techniques used in flog2. Requires experimentation at a future date. Chebychev * Polynomials are also worth looking at. * * @param f float to get log to the base 2 of (log2(f)). * @return float returned that holds the value of log2(f). * @see flog2 * */ /* public static float fplog2(float f) { int Xold = Float.floatToIntBits(f); // Convert to IEEE binary // Extract Eold and Mold from Xold int Eold = ((Xold >> 23) & 0xff) - 128; // Extract the exponent. int i = Xold & 0x807fffff; // Change the exponent to 0 (127) i |= 0x3f800000; // and calculate the mantissa, Mold float Mold = Float.intBitsToFloat(i); // Store Mold as a float // Mold = ((-1.0/3 * Mold + 2) * Mold = 2.0/3.0; // For low precision faster results, can use Eold + Mold here to return // not a bad result (3 decimal places or so). // Square Mold 7 times int N = 128; // N = 2^7 = 128 (7-bit exponent) float temp = pow(Mold, N); // Xnew = Mold^N, N = 128 int Xnew = Float.floatToIntBits(temp); // ….in binary format // // seeeeeeeemmmmmmmmmmmmmmmmmmmmmmm // 33222222222211111111110000000000 // 10987654321098765432109876543210 // 12345678EEEEEEE1MMM0100010001000 // // Extract Enew and Mnew from Xnew int Enew = ((Xnew >> 23) & 0xff) - 128; // Extract Exponent. i = Xnew & 0x807fffff; // Clear exponent part i = (i >> 7) | 0x00010000; // Place mantissa in [16:0] including hidden bit i += (Enew <>= 2; // worst case is if bit 24 got set due to rounding while((i & 0x00800000) != 0) // shift until this bit is set { i <<= 1; // shift left ++e; // increment exponent } i &= 0xff7fffff; // Clear bit 23, the hidden mantissa bit i |= e << 23; // Put the exponent in there. temp = Float.intBitsToFloat(i); // Convert to log2(Mold) value temp += (float)Eold; // Calculate Eold + log2(Mold) = Xnew = log2(f) return (temp); // return log2(f) }; // fplog2(float */ /** * Log to the base 2 of a single precision floating point number. This is a * fast algorithm exchanging speed for precision. Is accurate to about 3 digits. * * This function converts the IEEE floating point input to bits as a 32-bit * integer. IEEE FP is sign bit, 8 exponent bits and 23 mantissa bits. The * exponent is e + 127 so that negative exponents can be supported. Finally, * the mantissa is of the format 1.mmmmmmm where the 1 is assumed and the * mmmm’s are the low 23 bits. After every floating point operation, the * number is normalized by shifting and adjusting the exponent until the * desired 1.mmmmmm mantissa is achieved. * * So, the input FP is converted to an int, is divided by 2^23 to put the exp to the LHS * of the decimal and the mantissa to the RHS of the decimal. 127 is subtracted from it * to give the real exponent. Next, the mantissa is extracted and passed through an * error polynomial to converge on the result and added to the origninal ee.mmmmmmm * to give a fairly accurate estimate. * * @param f float input value. * @return float result of log2(f) * @see fpow2 */ public static float flog2(float f) { float Xnorm = (float)Float.floatToIntBits(f); // Convert to integer Xnorm = (Xnorm * 1.1920929e-7f) - 127; // eemmmmmmm *(1/2^23)-127 = ee.mmmmm float frac = modf(Xnorm); // Get RHS of decimal = mantissa frac = 0.346607f * frac * (1 - frac); // Polynomial error calculation return Xnorm + frac; // New value } // flog2 /** * This is a high performance to a few decimal places for 2^x, where x * is a float. * * It’s reasonable to assume that the input value to be raised to the * power is no greater than 128. At too high a value, the power will be * larger than the largest number representable by a single precision float * and since the exponent is from -127 to 128, then it’s reasonable to * assume the input argument is limited to this. This means that the * input argument is always in the format iii.fffffff and that due to * floating point normalization in IEEE numbers, we can represent this * closer to eee.mmmmmmm, where eee is the exponent and mmmmmmm is the * mantissa. This allows us to greatly speed up our algorithm. * * @param x float input value to calculate alog2 of (alog2(x) == pow2(x)). * @return float result = 2^x. * @see flog2 */ public static float fpow2(float x) { float frac = modf(x); // Get fractional part frac = 0.33971f * frac * (1 - frac); // Polynomial error calculation float Xnorm = x + 127 - frac; // Offset exponent and frac adjust Xnorm *= 8388608.0f; // * 2^23 (shift left) = eemmmmmmm Xnorm = Float.intBitsToFloat((int)Xnorm); // Convert to proper float return Xnorm; } // fpow2 /** * Returns the log to the base 2 of an integer. This is done by converting to an IEEE * single precision floating point number, casting this to an int to get the bits and * shifting it right 23 bits to seperate the exponent from the mantissa. Then the * exponent is normalized by subtracting 128 from it. Finally, if the MS bit after the 1. * in the mantissa is a 1, then the exponent is rounded up by 1. The idea is to return the * nearest value to the log rounded to nearest. * * @param x int to get the log to the base 2 of. * @return int log2(x) rounded to the nearest int. */ public static int log2(int x) { int y = Float.floatToIntBits((float)x); //int y = (int)Float.intBitsToFloat(x); y = (((y & 0x00400000) != 0) ? ((y >> 23) + 1) : (y >> 23)) - 127; return y; } /** * Raise input int, x, to the power of 2. Used integer pow function. * * @param x Int to raise to power of 2. * @return 2^x returned as int. * @see pow */ public static int pow2(int x) { return pow(2, x); } // inv sqrt of float has a trick. It’s not magic. // // Algorithm fast 1/sqrt(x) // f -> int // // int >> 2 divides exponent by 2 // if exponent is even mantissa 1.mmmm -> 1.0mmmm // if exponent is odd mantissa 1.mmmm -> 1.1mmmm // // Last step is very clever. sqrt of exp does result id divide by 2. // sqrt of 1.x is 1.0x or 1.1x approximately depending on odd or even exponent. APPROX!!!!!! // Thing sqrt(2) is similar to sqrt(200), sqrt(20) is similar to sqrt(2000), so this is true. // // The subtract is handling the 1/x operation. // // The eqt is 1/sqrt(x) = a - sqrt(x) This is a subtract not in fp numbers but in ints don’t // forget… // // 0x5f3759df -> + exp = BE = 190 - 128 = 62 decimal. Mantissa = 1.01101110101100111011111 // = 1.375 x 2^62 approx. // // If exp of sqrt is 0 -> 0x80…..turns exp from BE to 3E or -66 decimal. The mantissa also gets // affected. The 1.0 part remains the same (since it’s hidden). The RHS of the decimal point gets // subtracted. // // So finally, convert int back to fp number. // Run one pass of newtons sqrt. /* float InvSqrt (float x) { float xhalf = 0.5f*x; int i = *(int*)&x; i = 0x5f3759df - (i>>1); x = *(float*)&i; x = x*(1.5f - xhalf*x*x); return x; } */ // // /** * Get square root of int, x, and return the result as an int. This is based on * an algorithm for the microchip pic 8-bit micro extended to twice the width * and optimized for java rather than assembler. * * @param x Value to get square root of. Must be > 0. There is no error check * for negative x. * @result int sqrt(x). * */ public static int sqrt(int x) { int bx = 0xc000; // Used to shift or set a bit. int v = 0x8000; // sqrt value initial estimate int sq; // TODO: was unsigned int (doesn’t exist in java) while(bx > 0) // Loops 16 times through { if ((sq = v*v) == x) // check for exact match. break; v = (sq >= 1; } return v; // Return the closest estimate } // sqrt