<?xml version="1.0" encoding="UTF-8"?> <rss version="2.0" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:wfw="http://wellformedweb.org/CommentAPI/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:sy="http://purl.org/rss/1.0/modules/syndication/" xmlns:slash="http://purl.org/rss/1.0/modules/slash/" ><channel><title>Martin Ankerl &#187; C++</title> <atom:link href="http://martin.ankerl.com/tag/c/feed/" rel="self" type="application/rss+xml" /><link>http://martin.ankerl.com</link> <description>Chunky bacon!!</description> <lastBuildDate>Sat, 04 Feb 2012 09:19:03 +0000</lastBuildDate> <language>en</language> <sy:updatePeriod>hourly</sy:updatePeriod> <sy:updateFrequency>1</sy:updateFrequency> <item><title>Optimized pow() approximation for Java, C / C++, and C#</title><link>http://martin.ankerl.com/2007/10/04/optimized-pow-approximation-for-java-and-c-c/</link> <comments>http://martin.ankerl.com/2007/10/04/optimized-pow-approximation-for-java-and-c-c/#comments</comments> <pubDate>Thu, 04 Oct 2007 22:48:08 +0000</pubDate> <dc:creator>martinus</dc:creator> <category><![CDATA[benchmark]]></category> <category><![CDATA[C++]]></category> <category><![CDATA[coding]]></category> <category><![CDATA[java]]></category> <category><![CDATA[linux]]></category> <category><![CDATA[news]]></category> <category><![CDATA[programming]]></category> <category><![CDATA[science]]></category> <category><![CDATA[tricks]]></category> <category><![CDATA[floating point]]></category> <category><![CDATA[optimization]]></category><guid isPermaLink="false">http://martin.ankerl.com/?p=96</guid> <description><![CDATA[I have already written about approximations of e^x, log(x) and pow(a, b) in my post Optimized Exponential Functions for Java. Now I have more. In particular, the pow() function is now even faster, simpler, and more accurate. 
Without further ado, &#8230; <a href="http://martin.ankerl.com/2007/10/04/optimized-pow-approximation-for-java-and-c-c/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description> <content:encoded><![CDATA[<p>I have already written about approximations of <tt>e^x</tt>, <tt>log(x)</tt> and <tt>pow(a, b)</tt> in my post <a href="http://martin.ankerl.com/2007/02/11/optimized-exponential-functions-for-java/">Optimized Exponential Functions for Java</a>. Now I have more <img src="http://martin.ankerl.com/wp-includes/images/smilies/icon_smile.gif?9d7bd4" alt=':-)' class='wp-smiley' /> In particular, the <tt>pow()</tt> function is now even faster, simpler, and more accurate. Without further ado, I proudly give you the brand new approximation:</p><h1>Approximation of pow() in Java</h1><pre class="brush: java; title: ; notranslate">public static double pow(final double a, final double b) {
    final int x = (int) (Double.doubleToLongBits(a) &gt;&gt; 32);
    final int y = (int) (b * (x - 1072632447) + 1072632447);
    return Double.longBitsToDouble(((long) y) &lt;&lt; 32);
}</pre><p>This is really very compact. The calculation only requires 2 shifts, 1 multiplication, 2 additions, and 2 register operations. That&#8217;s it! In my tests the result is usually within an error margin of 5% to 12%, and in extreme cases sometimes up to 25%. A careful analysis is left as an exercise for the reader. This is quite usable in e.g. <a href="http://en.wikipedia.org/wiki/Metaheuristic">metaheuristics</a> or <a href="http://en.wikipedia.org/wiki/Artificial_neural_network">neural nets</a>.</p><h2>UPDATE, December 10, 2011</h2><p>I just managed to make the above code about 30% faster on my machine. The error differs by a tiny fraction (it is neither better nor worse).</p><pre class="brush: java; title: ; notranslate">public static double pow(final double a, final double b) {
    final long tmp = Double.doubleToLongBits(a);
    final long tmp2 = (long)(b * (tmp - 4606921280493453312L)) + 4606921280493453312L;
    return Double.longBitsToDouble(tmp2);
}</pre><p>This new approximation is about <strong>23 times</strong> as fast as <tt>Math.pow()</tt> on my machine (Intel Core2 Quad, Q9550, Java 1.7.0_01-b08, 64-Bit Server VM). Unfortunately, microbenchmarks are difficult to do in Java, so your mileage may vary. You can download the benchmark <a href="/files/PowBench.java">PowBench.java</a> and have a look. I have tried to prevent overoptimization, and to subtract the overhead that this prevention introduces.</p><h1>Approximation of pow() in C and C++</h1><h2>UPDATE, January 25, 2012</h2><p>The code below has been updated to use a union, so you no longer need <tt>-fno-strict-aliasing</tt> when compiling. Also, here is a <a href="http://martin.ankerl.com/2012/01/25/optimized-approximative-pow-in-c-and-cpp/">more precise version of the approximation</a>.</p><pre class="brush: cpp; title: ; notranslate">double fastPow(double a, double b) {
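  union {
    double d;
    int x[2];  // assumes 32-bit int and little-endian byte order
  } u = { a };
  // x[1] holds the upper 32 bits of the double (sign, exponent, top of the mantissa)
  u.x[1] = (int)(b * (u.x[1] - 1072632447) + 1072632447);
  u.x[0] = 0;  // clear the lower 32 bits of the mantissa
  return u.d;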
}</pre><p>Compiled on my Pentium-M with gcc 4.1.2:</p><pre>gcc -O3 -march=pentium-m -fomit-frame-pointer</pre><p>This version is <b>7.8 times</b> faster than <tt>pow()</tt> from the standard library.</p><h1>Approximation of pow() in C#</h1><p>Jason Jung has posted a port of this code to C#:</p><pre class="brush: csharp; title: ; notranslate">public static double PowerA(double a, double b) {
  int tmp = (int)(BitConverter.DoubleToInt64Bits(a) &gt;&gt; 32);
  int tmp2 = (int)(b * (tmp - 1072632447) + 1072632447);
  return BitConverter.Int64BitsToDouble(((long)tmp2) &lt;&lt; 32);
}</pre><h1>How the Approximation was Developed</h1><p>At first glance it is nearly impossible to understand what is going on in this function; it just magically works. To shed a bit more light on it, here is a detailed description of how I developed it.</p><h2>Approximation of e^x</h2><p>As described <a href="http://martin.ankerl.com/2007/02/11/optimized-exponential-functions-for-java/">here</a>, the paper &#8220;<a href="http://citeseer.ist.psu.edu/schraudolph98fast.html">A Fast, Compact Approximation of the Exponential Function</a>&#8221; develops a C macro that does a good job of exploiting the IEEE 754 floating-point representation to calculate <tt>e^x</tt>. This macro translates straightforwardly into Java. In the code below, the factor <tt>1512775</tt> is approximately 2^20 / ln(2), <tt>1072693248</tt> is the exponent bias 1023 shifted left by 20 bits, and <tt>60801</tt> is the paper&#8217;s correction term that reduces the approximation error:</p><pre class="brush: java; title: ; notranslate">public static double exp(double val) {
    final long tmp = (long) (1512775 * val + (1072693248 - 60801));
    return Double.longBitsToDouble(tmp &lt;&lt; 32);
}</pre><h2>Use Exponential Functions for a^b</h2><p>Thanks to the power of math, we know that <tt>a^b</tt> can be transformed like this:</p><ol><li>Rewrite as the exponential of a logarithm<pre>a^b = e^(ln(a^b))</pre><li>Pull b out of the logarithm<pre>a^b = e^(ln(a)*b)</pre></ol><p>Now we have expressed the pow calculation with <tt>e^x</tt> and <tt>ln(x)</tt>. We already have the <tt>e^x</tt> approximation, but no good <tt>ln(x)</tt>. The <a href="http://martin.ankerl.com/2007/02/11/optimized-exponential-functions-for-java/">old approximation</a> is very bad, so we need a better one. So what now?</p><h2>Approximation of ln(x)</h2><p>Here comes the big trick: Remember that we have the nice <tt>e^x</tt> approximation? Well, <tt>ln(x)</tt> is exactly the inverse function! That means we just need to invert the above approximation so that the output of <tt>e^x</tt> is transformed back into the original input.</p><p>That&#8217;s not too difficult. Have a look at the above code: we now take the output and move backwards to undo the calculation. First reverse the shift:</p><pre>final double tmp = (Double.doubleToLongBits(val) &gt;&gt; 32);</pre><p>Now solve the equation</p><pre>tmp = (1512775 * val + (1072693248 - 60801))</pre><p>for val:</p><ol><li>The original formula<pre>tmp = (1512775 * val + (1072693248 - 60801))</pre><li>Perform the subtraction<pre>tmp = 1512775 * val + 1072632447</pre><li>Bring the constant to the other side<pre>tmp - 1072632447 = 1512775 * val</pre><li>Divide by the factor<pre>(tmp - 1072632447) / 1512775 = val</pre><li>Finally, put val on the left side<pre>val = (tmp - 1072632447) / 1512775</pre></ol><p>Voilà, now we have a nice approximation of <tt>ln(x)</tt>:</p><pre class="brush: java; title: ; notranslate">public double ln(double val) {
    final double x = (Double.doubleToLongBits(val) &gt;&gt; 32);
    return (x - 1072632447) / 1512775;
}</pre><h2>Combine Both Approximations</h2><p>Finally we can combine the two approximations into <tt>e^(ln(a) * b)</tt>:</p><pre class="brush: java; title: ; notranslate">public static double pow1(final double a, final double b) {
    // calculate ln(a)
    final double x = (Double.doubleToLongBits(a) &gt;&gt; 32);
    final double ln_a = (x - 1072632447) / 1512775;

    // ln(a) * b
    final double tmp1 = ln_a * b;

    // e^(ln(a) * b)
    final long tmp2 = (long) (1512775 * tmp1 + (1072693248 - 60801));
    return Double.longBitsToDouble(tmp2 &lt;&lt; 32);
}</pre><p>Everything between the two shifts is plain arithmetic, so we can simply substitute the <tt>tmp1</tt> calculation into the <tt>tmp2</tt> calculation to get:</p><pre class="brush: java; title: ; notranslate">public static double pow2(final double a, final double b) {
    final double x = (Double.doubleToLongBits(a) &gt;&gt; 32);
    final long tmp2 = (long) (1512775 * (x - 1072632447) / 1512775 * b + (1072693248 - 60801));
    return Double.longBitsToDouble(tmp2 &lt;&lt; 32);
}</pre><p>Now simplify the <tt>tmp2</tt> calculation:</p><ol><li>The original formula<pre>tmp2 = (1512775 * (x - 1072632447) / 1512775 * b + (1072693248 - 60801))</pre><li>The factor <tt>1512775</tt> cancels out, so we can drop it<pre>tmp2 = (x - 1072632447) * b + (1072693248 - 60801)</pre><li>And finally, calculate the subtraction<pre>tmp2 = b * (x - 1072632447) + 1072632447</pre></ol><h2>The Result</h2><p>That&#8217;s it! Add some casts, and the complete function is the same as the one at the top of this post.</p><pre class="brush: java; title: ; notranslate">public static double pow(final double a, final double b) {
    final int tmp = (int) (Double.doubleToLongBits(a) &gt;&gt; 32);
    final int tmp2 = (int) (b * (tmp - 1072632447) + 1072632447);
    return Double.longBitsToDouble(((long) tmp2) &lt;&lt; 32);
}</pre><p>This concludes my little tutorial on microoptimization of the <tt>pow()</tt> function. If you have come this far, I congratulate you on your persistence <img src="http://martin.ankerl.com/wp-includes/images/smilies/icon_smile.gif?9d7bd4" alt=':-)' class='wp-smiley' /></p><p><strong>UPDATE</strong> Several other approximative <tt>pow</tt> calculation methods have been developed recently. Here are some that I have found through <a href="http://www.reddit.com/r/programming/comments/8kftl/fast_pow_approximation_in_java_and_c/">reddit</a>:</p><ul><li><a href="http://www.hxa.name/articles/content/fast-pow-adjustable_hxa7241_2007.html">Fast pow() With Adjustable Accuracy</a>: This looks quite a bit more sophisticated and precise than my approximation. It is written in C and works on float values. A Java port should not be too difficult.</li><li><a href="http://jrfonseca.blogspot.com/2008/09/fast-sse2-pow-tables-or-polynomials.html">Fast SSE2 pow: tables or polynomials?</a>: Uses <a href="http://en.wikipedia.org/wiki/Streaming_SIMD_Extensions">SSE</a> operations and seems to be a bit faster than the table approach from the link above, with the potential to scale better due to less cache usage.</li></ul><p>Please post what you think about this!</p><div style='clear:both'></div>]]></content:encoded> <wfw:commentRss>http://martin.ankerl.com/2007/10/04/optimized-pow-approximation-for-java-and-c-c/feed/</wfw:commentRss> <slash:comments>41</slash:comments> </item> </channel> </rss>
