Optimized pow() approximation for Java, C / C++, and C#

I have already written about approximations of e^x, log(x) and pow(a, b) in my post Optimized Exponential Functions for Java. Now I have more 🙂 In particular, the pow() function is now even faster, simpler, and more accurate. Without further ado, I proudly give you the brand new approximation:

Approximation of pow() in Java

This is really very compact. The calculation only requires 2 shifts, 1 mul, 2 add, and 2 register operations. That’s it! In my tests it usually within an error margin of 5% to 12%, in extreme cases sometimes up to 25%. A careful analysis is left as an exercise for the reader. This is very usable for in e.g. metaheuristics or neural nets.

UPDATE, December 10, 2011

I just managed to make the above code about 30% faster than the one above on my machine. The error is a tiny fraction different (not better or worse).

This new approximation is about 23 times as fast as Math.pow() on my machine (Intel Core2 Quad, Q9550, Java 1.7.0_01-b08, 64-Bit Server VM). Unfortunately, microbenchmarks are difficult to do in Java, so your mileage may vary. You can download the benchmark PowBench.java and have a look, I have tried to prevent overoptimization, and substract the overhead introduced due to this preventation.

Approximation of pow() in C and C++

UPDATE, January 25, 2012

The code below is updated with using union, you do not need -fno-strict-aliasing any more for compiling. Also, here is a more precise version of the approximation.

Compiled on my Pentium-M with gcc 4.1.2:

This version is 7.8 times faster than pow() from the standard library.

Approximation of pow() in C#

Jason Jung has posted a port of the this code to C#:

How the Approximation was Developed

It is quite impossible to understand what is going on in this function, it just magically works. To shine a bit more light on it, here is a detailed description how I have developed this.

Approximation of e^x

As described here, the paper “A Fast, Compact Approximation of the Exponential Function” develops a C macro that does a good job at exploiting the IEEE 754 floating-point representation to calculate e^x. This macro can be transformed into Java code straightforward, which looks like this:

Use Exponential Functions for a^b

Thanks to the power of math, we know that a^b can be transformed like this:

  1. Take exponential
  2. Extract b

Now we have expressed the pow calculation with e^x and ln(x). We already have the e^x approximation, but no good ln(x). The old approximation is very bad, so we need a better one. So what now?

Approximation of ln(x)

Here comes the big trick: Rember that we have the nice e^x approximation? Well, ln(x) is exactly the inverse function! That means we just need to transform the above approximation so that the output of e^x is transformed back into the original input.

That’s not too difficult. Have a look at the above code, we now take the output and move backwards to undo the calculation. First reverse the shift:

Now solve the equation

for val:

  1. The original formula
  2. Perform subtraction
  3. Bring value to other side
  4. Divide by factor
  5. Finally, val on the left side

Voíla, now we have a nice approximation of ln(x):

Combine Both Approximations

Finally we can combine the two approximations into e^(ln(a) * b):

Between the two shifts, we can simply insert the tmp1 calculation into the tmp2 calculation to get

Now simplify tmp2 calculation:

  1. The original formula
  2. We can drop the factor 1512775
  3. And finally, calculate the substraction

The Result

That’s it! Add some casts, and the complete function is the same as above.

This concludes my little tutorial on microoptimization of the pow() function. If you have come this far, I congratulate your presistence 🙂

UPDATE Recently there several other approximative pow calculation methods have been developed, here are some others that I have found through reddit:

  • Fast pow() With Adjustable Accuracy — This looks quite a bit more sophisticated and precise than my approximation. Written in C and for float values. A Java port should not be too difficult.
  • Fast SSE2 pow: tables or polynomials? — Uses SSE operation and seems to be a bit faster than the table approach from the link above with the potential to scale better when due to less cache usage.

Please post what you think about this!

46 Comments on "Optimized pow() approximation for Java, C / C++, and C#"

Notify of
avatar

frank j
Guest
frank j
8 years 6 months ago

Would it be possible to use this with float value instead of doubles?

Jesus Suarez
Guest
Jesus Suarez
1 year 9 months ago
I don’t know if this is helpful to you 7 years later, but maybe it will be helpful to someone else. I derived this pow method for 32-bit floats from Schraudolph’s 1999 paper. The new value to subtract and add the the int-interpreted value is 1065307417 (rather than 1072632447). Use a single int instead of a length-2 array, and replace doubles with floats. The complete method is: inline float fastPow(float a, float b) { union { float f; int i; } u = { a }; u.i = (int)(b * (u.i – 1065307417) + 1065307417); return u.f; }
Martin Ankerl
Guest
8 years 6 months ago

I dont think so, the whole trick is based on the bit layout of the double numbers. So you need to find a very fast e^x approximation that works with floats, then you can do the rest again.

Martin Ankerl
Guest
8 years 6 months ago

Another benchmark: on a Pentium 4, Windows, Java 1.6.0 -server, the approximation is about 41 times faster than Math.pow() 🙂

Mr. Sceptic
Guest
Mr. Sceptic
8 years 6 months ago

Not bad! But: if I want to compute a pow(), in which cases it might be ok to get an error up to 25%? Could you provide us some examples?

Martin Ankerl
Guest
8 years 6 months ago

In my experience the error gets bigger when the exponent gets bigger. I have written a short program that uses random numbers to try to find a combination that gets very bad errors. Base is a random double between 0 and 1000, exponent random double between 0 and 5. The biggest error after 100 million evaluations is:

a between 0 and 1000, b between 0 and 5:
Worst case:
a=512.0125338006894
b=4.914054794454942
a^b approximation: 2.5571362865152E13
a^b Math.pow(): 2.0585114388503066E13
Errors is 19.499345822682237 %

Average error is 4.021374964371438 %

—–
a between 0 and 100, b between 0 and 3:
Worst case:
a=64.00103767757574
b=2.8915318496742626
a^b approximation: 191223.125
a^b Math.pow(): 166973.39656298532
Error is 12.681378592162784 %

Average error is 2.7778168699408558 %

Martin Ankerl
Guest
8 years 6 months ago

Mr Skeptic, I am using this (a modified form) in an Ant Colony Optimization algorithm. There the core algorithms needs to calculate a^b * c^d very often. I have done comparison of the optimization quality when using this optimization and when using Math.pow(), the error does not have any measurable negative effect in the examples I am using. And the optimization got a hell of a lot faster, so even if there is a slight decrease in optimization quality this is more than compensated by the higher number of iterations I am able to do now.

trackback

[…] Optimized pow() approximation for Java and C / C++ by Martin Ankerl – […]

Karthick
Guest
Karthick
8 years 6 months ago

Hi,

I need have a procedure which can handle 50! and 1/50! , that is 50 factorial and 1 divided by 50 factorial….

Is there a way to handle these huge numbers

Ilse
Guest
Ilse
8 years 6 months ago
seejayjames
Guest
seejayjames
8 years 2 months ago
Martin, that test program is interesting—so you ran it with random values in that range, iterated 100 million times, and chose the highest error… clever. What I also found interesting was where the base was the most off: right near the largest power of 2 in the range, but not right on it. Something to do with the bitshifting? It was almost creepy: a between 0 and 1000, b between 0 and 5: Worst case: a=512.0125338006894 // why so close, but just a bit above? b=4.914054794454942 //close to highest, but again, not quite? a^b approximation: 2.5571362865152E13 a^b Math.pow(): 2.0585114388503066E13 Errors… Read more »
Martin Ankerl
Guest
8 years 2 months ago

Hi seejayjames, the behaviour of the error is really a bit strange. I think it is a bit off because of the way the constants were selected (read the paper for how they did it). This should minimize the average error.

But the 19.5% error does not look like such a big problem to me, because 512^4.9 is about 2*10^13, which is a very large number that might be out of the useful range for most usages.

I have used this a^b version as the core part in a research project at work, and it gives a huge speedup.

trackback

[…] UPDATE This pow() approximation is obsolete. I have a much faster and more accurate approximation version here. […]

Jason Jung
Guest
Jason Jung
7 years 8 months ago

The code in .net for the approximation:

Martin Ankerl
Guest
7 years 8 months ago

Thanks Jason! I have integrated your code into the article.

Stephen Early
Guest
Stephen Early
7 years 7 months ago

I’m currently wrestling with an application that uses exp a lot, and am looking at your approximation. Will the smae magic numbers work on 64bit linux, or do they need to be redetermined?

Stephen Early
Guest
Stephen Early
7 years 7 months ago

I forgot to mention that we are using C++ and the g++ compiler.

Martin Ankerl
Guest
7 years 7 months ago

Hi Stephen, if I am not mistaken the magic numbers should be the same, but you have to rewrite the function because the casts like

rely on 32 bit boundaries. I’m not sure how to best do this, I am not a C guy.
If you manage to get a working version of the code, please share it! May I ask what for you need the exp function?

Alexis
Guest
7 years 5 months ago

Great Routines for J2ME!

trackback

[…] use floating point tricks based on my pow() approximation. Basically I just took the pow() formula and for a^b I substitued b with 0.5, then simplified this […]

Ralf
Guest
Ralf
7 years 3 months ago

Hi,

Your aproximation seems to be really usefor for slow embeddede devices.Besides,I have no RAM free space enough to load math library. However,i tried to used your aproximation for pow in an embedded micro(JN5139) 32-bit RISC CPU using Big-Endian format and using gcc compiler and it doesn’t work. Any idea of what kind of changes can be done to make it work ?
Thanks.

Martin Ankerl
Guest
7 years 3 months ago

Hi Ralf, sorry but I have no idea. big-endian and 32 bit sounds like it should work. I have never programmed anything for embedded devices. double should be 64 bit, int should be 32 bit.

NickF
Guest
7 years 2 months ago

Well, code looks OK, but I’ve Cut and Paste it into J2ME (on Windows XP, SP3),
run it like pow(1.026f, 0.077f) an the result is: 0.97423… !?
The result is [b]less[/b] than 1.0 !.

NickF
Guest
7 years 2 months ago

OK, I decided to use ‘powTaylor(double, double)’
from http://today.java.net/pub/a/today/2007/11/06/creating-java-me-math-pow-method.html
The results are quite accurate.

Donald Murray
Guest
Donald Murray
6 years 11 months ago
Here is some code I’ve written to complement what I have available in my J2ME libraries…..related to pow, log, etc.. There is some stuff commented out here, so don’t worry about it. The rest is good. Note I didn’t include the concept of bittians for fast trig function lookups to 8 bits accuracy (great for graphics). Good luck with these. /** * Fast integer to the power of another integer. Note that if the exponent is < 1 * then the function returns 0, as you'd expect for an integer function. This * function relies on the JVM to catch… Read more »
Joel Buursma
Guest
Joel Buursma
5 years 11 months ago

The link to the paper referenced (“A Fast, Compact Approximation of the Exponential Function”) appears to be broken. This appears to be better:
http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.57.1569

Dick Rochester
Guest
Dick Rochester
4 years 10 months ago
I have inherited some C code from a former employee that computes power(a,b), he claims more accurately than the math.h pow(a,b) when a is close to 1. I thought he was using a Taylor series expansion about 1 but I can’t seem to get the math to match. Here is his code: double powa(double f1, double f2) { double rv,ln,am1; ln=log(f1); am1=f2-1.0; rv=f1*ln*am1; ln*=ln; am1*=am1; rv+=.5*f1*ln*am1; rv+=f1; return(rv); } 1234567891011121314151617 double powa(double f1, double f2){   double rv,ln,am1;    ln=log(f1);   am1=f2-1.0;   rv=f1*ln*am1;    ln*=ln;   am1*=am1;    rv+=.5*f1*ln*am1;    rv+=f1;    return(rv);} Any idea where this approximation came from? How was it derived? Thanks in advance.
henrikhedemann
Guest
henrikhedemann
4 years 12 days ago
Actually your code is based on the following mathematical idea ( I use x,y instead of f1,f2): (I) pow(x,y)=x* pow(x, (y-1) ) =x* exp [ (y-1) * log(x) ] Now take the first three terms of the Taylor series of exp (given y is close to 1 means y-1 is close to zero): (II) exp(t) ~ 1+t+0.5*t*t [ + O(t*t*t) ] using (II) in (I) is exactly what happens in this code. The accuracy is poor except very (!) close to 1. An error estimate is obviously the next term in the Taylor series, i.e. delta ~ 0.167*x*pow((y-1)*log(x),3)
DarthWho
Guest
DarthWho
4 years 8 months ago

I am having trouble implementing this in Microsoft Visual C++ any tips?
I have tracked the problem down to both instances of pointer operations
ex:
(*(1 + (int *)&a));
neither operation returns the value it should how do I fix this?

Nisse Bergman
Guest
Nisse Bergman
4 years 7 months ago

Hi I would like to implement this for ActionScript 3. As3 only has 32 bit floats and ints so could you give me a pointer on how to implement this for floats instead of doubles.
The Double.longBitsToDouble does that just take the bits in a long and put it in a double?

Thanks in advance

arkon
Guest
arkon
4 years 5 months ago

i love you! this babies work SOOO fast… still have some problems since the numbers are indeed quite a way off, but that should be manageable.
thanks a lot!

Sam Hocevar
Guest
4 years 4 months ago

If you replace *(1 + (int * )&p) = tmp2; with union { double d; int x[2]; } u = { p }; u.x[1] = tmp2; p = u.d; you get standards-compliant C/C++ and you can drop the ugly requirement for -fno-strict-aliasing. I think the compiler will generate similar code.

Bob
Guest
Bob
4 years 3 months ago

On my machine I found little difference in cpu time between the native exact pow and this fast pow function. I was doing x^5 with x from 10 to 500, and the x^1/5 with larger numbers

Maybe the native pow is better optimized on my 6-core?

Bob
Guest
Bob
4 years 3 months ago

Further more careful testing suggest that this is actually quite a bit slower than the native pow function for C (didn’t test C++). I did 100 million pow calculations and this “fast” pow(double,double) method took 5 times longer than the native C pow(double,double). Here was the test code section:

x=.01;
y=.1;
for(i=0;i<100000000;i++)
{
x += .00001;
y += .0000001;
pow(x,y);
}

It took 26 seconds using the above pow code, and ony 6 seconds with the native C pow function.

So…I would definitely do some testing on your machine before jumping onto this. Perhaps Java is quite different.

Bob
Guest
Bob
4 years 3 months ago

I’m using Visual Studio 2008’s compiler. It is at least somewhat faster and perhaps more so with optimization so I am going to use it in my fast max filtering. This does not depend strongly on small errors in pow so it looks indistinguishable (to my eyes). Thanks!

trackback

[…] taking a look at this: optimized pow() approximation BBCode tags | How to ask questions the smart way | Common Java Mistakes | Official Forum […]

trackback

[…] high accuracy require black magic. To see what I mean by “black magic,” take a look at this blog post by Martin Ankerl and an associated paper he links to in Neural Computation. Also see the CORDIC […]

Alberto
Guest
Alberto
1 year 19 days ago

Thanks for your post. I needed to optimize my source code seems I am carrying out an evolutionary algorithm using “pow” (a feature weighting approach in Data Mining). It works perfect!

wpDiscuz