<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Martin Ankerl &#187; C++</title>
	<atom:link href="http://martin.ankerl.com/category/c/feed/" rel="self" type="application/rss+xml" />
	<link>http://martin.ankerl.com</link>
	<description>No movement is faster than no movement</description>
	<lastBuildDate>Tue, 13 Jul 2010 05:31:29 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=abc</generator>
		<item>
		<title>Optimized pow() approximation for Java, C / C++, and C#</title>
		<link>http://martin.ankerl.com/2007/10/04/optimized-pow-approximation-for-java-and-c-c/</link>
		<comments>http://martin.ankerl.com/2007/10/04/optimized-pow-approximation-for-java-and-c-c/#comments</comments>
		<pubDate>Thu, 04 Oct 2007 22:48:08 +0000</pubDate>
		<dc:creator>Martin Ankerl</dc:creator>
				<category><![CDATA[C++]]></category>
		<category><![CDATA[benchmark]]></category>
		<category><![CDATA[coding]]></category>
		<category><![CDATA[java]]></category>
		<category><![CDATA[linux]]></category>
		<category><![CDATA[news]]></category>
		<category><![CDATA[programming]]></category>
		<category><![CDATA[science]]></category>
		<category><![CDATA[tricks]]></category>
		<category><![CDATA[floating point]]></category>
		<category><![CDATA[optimization]]></category>

		<guid isPermaLink="false">http://martin.ankerl.com/?p=96</guid>
		<description><![CDATA[I have already written about approximations of e^x, log(x) and pow(a, b) in my post Optimized Exponential Functions for Java. Now I have more. In particular, the pow() function is now even faster, simpler, and more accurate. Without further ado, I proudly give you the brand new approximation: Approximation of pow() in Java public static [...]]]></description>
			<content:encoded><![CDATA[<p>I have already written about approximations of <tt>e^x</tt>, <tt>log(x)</tt> and <tt>pow(a, b)</tt> in my post <a href="http://martin.ankerl.com/2007/02/11/optimized-exponential-functions-for-java/">Optimized Exponential Functions for Java</a>. Now I have more <img src='http://martin.ankerl.com/wp-includes/images/smilies/icon_smile.gif' alt=':-)' class='wp-smiley' />  In particular, the <tt>pow()</tt> function is now even faster, simpler, and more accurate. Without further ado, I proudly give you the brand new approximation:</p>
<h1>Approximation of pow() in Java</h1>
<pre class="brush: java;">public static double pow(final double a, final double b) {
    final int x = (int) (Double.doubleToLongBits(a) &gt;&gt; 32);
    final int y = (int) (b * (x - 1072632447) + 1072632447);
    return Double.longBitsToDouble(((long) y) &lt;&lt; 32);
}</pre>
<p>This is really very compact. The calculation requires only 2 shifts, 1 multiplication, 2 additions, and 2 register operations. That&#8217;s it! In my tests the result is usually within an error margin of 5% to 12%, and in extreme cases sometimes up to 25%. A careful analysis is left as an exercise for the reader. This makes it very usable in, e.g., <a href="http://en.wikipedia.org/wiki/Metaheuristic">metaheuristics</a> or <a href="http://en.wikipedia.org/wiki/Artificial_neural_network">neural nets</a>.</p>
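<p>If you want to check the error margin for your own inputs, a minimal sketch could look like the following. The class name <tt>PowErrorSketch</tt>, the helper <tt>relativeError()</tt>, and the sample values are only assumptions for illustration; the approximation itself is copied from the listing above and renamed to <tt>fastPow()</tt>.</p>
<pre class="brush: java;">public class PowErrorSketch {
    // The approximation from the listing above, copied unchanged.
    public static double fastPow(final double a, final double b) {
        final int x = (int) (Double.doubleToLongBits(a) &gt;&gt; 32);
        final int y = (int) (b * (x - 1072632447) + 1072632447);
        return Double.longBitsToDouble(((long) y) &lt;&lt; 32);
    }

    // Relative error of fastPow() against Math.pow() for one input pair.
    public static double relativeError(final double a, final double b) {
        final double exact = Math.pow(a, b);
        return Math.abs(fastPow(a, b) - exact) / Math.abs(exact);
    }

    public static void main(String[] args) {
        // Arbitrary sample inputs (hypothetical); a must be positive.
        final double[][] samples = { {2.0, 0.5}, {3.0, 2.0}, {10.0, 1.5}, {0.5, 3.0} };
        for (final double[] s : samples) {
            System.out.printf("pow(%.1f, %.1f): relative error %.2f%%%n",
                    s[0], s[1], 100.0 * relativeError(s[0], s[1]));
        }
    }
}</pre>
<p>Measuring the relative rather than the absolute error keeps the numbers comparable across very small and very large results.</p>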
<p>I use Linux, Java 1.6.0-b105 with the server VM, and execute the benchmark with this command:
<pre>sudo nice -n -20 java -cp . -server PowTest</pre>
<p> The approximation is <b>27 times faster</b> than Math.pow() on my Pentium-M. On a Pentium 4 it is <b>41 times faster</b>. Unfortunately, microbenchmarks are difficult to do in Java, so your mileage may vary. You can download the benchmark <a href="/files/PowTest.java">PowTest.java</a> and have a look; I have tried to prevent over-optimization while still keeping the overhead low.</p>
<h1>Approximation of pow() in C and C++</h1>
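<pre class="brush: cpp;">double pow(double a, double b) {
    // Read the upper 32 bits of a (assumes little-endian and 32-bit int).
    int tmp = (*(1 + (int *)&amp;a));
    int tmp2 = (int)(b * (tmp - 1072632447) + 1072632447);
    double p = 0.0;
    // Write the result into the upper 32 bits of p; the lower 32 bits stay 0.
    *(1 + (int *)&amp;p) = tmp2;
    return p;
}</pre>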
<p>Compiled on my Pentium-M with gcc 4.1.2:
<pre>gcc -O3 -march=pentium-m -fomit-frame-pointer -fno-strict-aliasing</pre>
<p>This version is <b>7.8 times</b> faster than pow() from the standard library.</p>
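<p><strong>WARNING</strong>! You have to compile with the <tt>-fno-strict-aliasing</tt> option, or this does not work! The code accesses a <tt>double</tt> through an <tt>int</tt> pointer, which breaks the strict aliasing rules that the optimizer otherwise relies on.</p>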
<h1>Approximation of pow() in C#</h1>
<p>Jason Jung has posted a port of this code to C#: </p>
<pre class="brush: csharp;">public static double PowerA(double a, double b) {
  int tmp = (int)(BitConverter.DoubleToInt64Bits(a) &gt;&gt; 32);
  int tmp2 = (int)(b * (tmp - 1072632447) + 1072632447);
  return BitConverter.Int64BitsToDouble(((long)tmp2) &lt;&lt; 32);
}</pre>
<h1>How the Approximation was Developed</h1>
<p>It is quite impossible to understand what is going on in this function just by looking at it; it just magically works. To shine a bit more light on it, here is a detailed description of how I developed it.</p>
<h2>Approximation of e^x</h2>
<p>As described <a href="http://martin.ankerl.com/2007/02/11/optimized-exponential-functions-for-java/">here</a>, the paper &#8220;<a href="http://citeseer.ist.psu.edu/schraudolph98fast.html">A Fast, Compact Approximation of the Exponential Function</a>&#8221; develops a C macro that does a good job at exploiting the IEEE 754 floating-point representation to calculate <tt>e^x</tt>. This macro can be translated into Java in a straightforward way, and looks like this:</p>
<pre class="brush: java;">public static double exp(double val) {
    final long tmp = (long) (1512775 * val + (1072693248 - 60801));
    return Double.longBitsToDouble(tmp &lt;&lt; 32);
}</pre>
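<p>The magic numbers are less arbitrary than they look. The upper 32 bits of a double hold the sign bit, the 11 exponent bits, and the top 20 mantissa bits, so the exponent field starts at bit 20. The following sketch recomputes two of the constants under that reading; the class name <tt>ExpConstantsSketch</tt> and the helpers <tt>scale()</tt> and <tt>bias()</tt> are assumptions for illustration. As far as I understand the paper, the remaining <tt>60801</tt> is a correction term chosen to reduce the approximation error.</p>
<pre class="brush: java;">public class ExpConstantsSketch {
    // The exp() approximation from the listing above, copied unchanged.
    public static double fastExp(final double val) {
        final long tmp = (long) (1512775 * val + (1072693248 - 60801));
        return Double.longBitsToDouble(tmp &lt;&lt; 32);
    }

    // 2^20 / ln(2), rounded: scales val so that a step of ln(2)
    // moves the exponent field by one.
    public static long scale() {
        return Math.round((1L &lt;&lt; 20) / Math.log(2.0));
    }

    // 1023 * 2^20: the IEEE 754 exponent bias, positioned in the upper 32 bits.
    public static long bias() {
        return 1023L &lt;&lt; 20;
    }

    public static void main(String[] args) {
        System.out.println("scale = " + scale() + ", bias = " + bias());
        // Arbitrary sample inputs (hypothetical).
        for (final double v : new double[] {-1.0, 0.0, 1.0, 2.0}) {
            System.out.println("exp(" + v + "): approx " + fastExp(v)
                    + ", exact " + Math.exp(v));
        }
    }
}</pre>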
<h2>Use Exponential Functions for a^b</h2>
<p>Thanks to the power of math, we know that <tt>a^b</tt> can be transformed like this:</p>
<ol>
<li>Take exponential
<pre>a^b = e^(ln(a^b))</pre>
<li>Extract b
<pre>a^b = e^(ln(a)*b)</pre>
</ol>
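<p>The two steps above can be checked numerically with the exact library functions before any approximation enters the picture. This is just a small sketch; the class name <tt>PowIdentitySketch</tt> and the method <tt>powViaExpLn()</tt> are made up for illustration. Note that the identity only holds for positive <tt>a</tt>, because <tt>ln(a)</tt> is undefined otherwise.</p>
<pre class="brush: java;">public class PowIdentitySketch {
    // a^b rewritten as e^(ln(a) * b); only valid for positive a.
    public static double powViaExpLn(final double a, final double b) {
        return Math.exp(Math.log(a) * b);
    }

    public static void main(String[] args) {
        // Arbitrary sample inputs (hypothetical).
        final double[][] samples = { {2.0, 10.0}, {9.0, 0.5}, {1.5, -2.0} };
        for (final double[] s : samples) {
            System.out.println(s[0] + "^" + s[1] + ": via exp/ln "
                    + powViaExpLn(s[0], s[1]) + ", Math.pow " + Math.pow(s[0], s[1]));
        }
    }
}</pre>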
<p>Now we have expressed the pow calculation with <tt>e^x</tt> and <tt>ln(x)</tt>. We already have the <tt>e^x</tt> approximation, but no good <tt>ln(x)</tt>. The <a href="http://martin.ankerl.com/2007/02/11/optimized-exponential-functions-for-java/">old approximation</a> is very bad, so we need a better one. So what now?</p>
<h2>Approximation of ln(x)</h2>
<p>Here comes the big trick: Remember that we have the nice <tt>e^x</tt> approximation? Well, <tt>ln(x)</tt> is exactly the inverse function! That means we just need to transform the above approximation so that the output of <tt>e^x</tt> is transformed back into the original input.</p>
<p>That&#8217;s not too difficult. Have a look at the above code: we now take the output and move backwards to undo the calculation. First reverse the shift:</p>
<pre>final double tmp = (Double.doubleToLongBits(val) &gt;&gt; 32);</pre>
<p>Now solve the equation
<pre>tmp = (1512775 * val + (1072693248 - 60801))</pre>
<p> for val:</p>
<ol>
<li>The original formula
<pre>tmp = (1512775 * val + (1072693248 - 60801))</pre>
<li>Perform subtraction
<pre>tmp = 1512775 * val + 1072632447</pre>
<li>Bring value to other side
<pre>tmp - 1072632447 = 1512775 * val</pre>
<li>Divide by factor
<pre>(tmp - 1072632447) / 1512775 = val</pre>
<li>Finally, val on the left side
<pre>val = (tmp - 1072632447) / 1512775</pre>
</ol>
<p>Voilà, now we have a nice approximation of <tt>ln(x)</tt>:</p>
<pre class="brush: java;">public static double ln(double val) {
    final double x = (Double.doubleToLongBits(val) &gt;&gt; 32);
    return (x - 1072632447) / 1512775;
}</pre>
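<p>To get a feeling for how good this inverse is, here is a minimal sketch that compares it against <tt>Math.log()</tt>. The class name <tt>LnErrorSketch</tt>, the helper <tt>absoluteError()</tt>, and the sample values are assumptions for illustration; the approximation is copied from the listing above and renamed to <tt>fastLn()</tt>. I look at the absolute error here, because the exact result is 0 for an input of 1, which makes a relative error meaningless at that point.</p>
<pre class="brush: java;">public class LnErrorSketch {
    // The ln() approximation from the listing above.
    public static double fastLn(final double val) {
        final double x = (Double.doubleToLongBits(val) &gt;&gt; 32);
        return (x - 1072632447) / 1512775;
    }

    // Absolute error of fastLn() against Math.log() for one positive input.
    public static double absoluteError(final double val) {
        return Math.abs(fastLn(val) - Math.log(val));
    }

    public static void main(String[] args) {
        // Arbitrary positive sample inputs (hypothetical).
        for (final double v : new double[] {0.1, 1.0, 2.0, Math.E, 10.0}) {
            System.out.println("ln(" + v + "): approx " + fastLn(v)
                    + ", exact " + Math.log(v)
                    + ", absolute error " + absoluteError(v));
        }
    }
}</pre>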
<h2>Combine Both Approximations</h2>
<p>Finally we can combine the two approximations into <tt>e^(ln(a) * b)</tt>:</p>
<pre class="brush: java;">public static double pow1(final double a, final double b) {
    // calculate ln(a)
    final double x = (Double.doubleToLongBits(a) &gt;&gt; 32);
    final double ln_a = (x - 1072632447) / 1512775;

    // ln(a) * b
    final double tmp1 = ln_a * b;

    // e^(ln(a) * b)
    final long tmp2 = (long) (1512775 * tmp1 + (1072693248 - 60801));
    return Double.longBitsToDouble(tmp2 &lt;&lt; 32);
}</pre>
<p>Between the two shifts, we can simply insert the <tt>tmp1</tt> calculation into the tmp2 calculation to get</p>
<pre class="brush: java;">public static double pow2(final double a, final double b) {
    final double x = (Double.doubleToLongBits(a) &gt;&gt; 32);
    final long tmp2 = (long) (1512775 * (x - 1072632447) / 1512775 * b + (1072693248 - 60801));
    return Double.longBitsToDouble(tmp2 &lt;&lt; 32);
}</pre>
<p>Now simplify <tt>tmp2</tt> calculation:</p>
<ol>
<li>The original formula
<pre>tmp2 = (1512775 * (x - 1072632447) / 1512775 * b + (1072693248 - 60801))</pre>
<li>We can drop the factor <tt>1512775</tt>
<pre>tmp2 = (x - 1072632447) * b + (1072693248 - 60801)</pre>
<li>And finally, calculate the subtraction
<pre>tmp2 = b * (x - 1072632447) + 1072632447</pre>
</ol>
<h2>The Result</h2>
<p>That&#8217;s it! Add some casts, and the complete function is the same as above.</p>
<pre class="brush: java;">public static double pow(final double a, final double b) {
    final int tmp = (int) (Double.doubleToLongBits(a) &gt;&gt; 32);
    final int tmp2 = (int) (b * (tmp - 1072632447) + 1072632447);
    return Double.longBitsToDouble(((long) tmp2) &lt;&lt; 32);
}</pre>
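<p>As a sanity check that the simplification did not change the behaviour, the following sketch puts the unsimplified <tt>pow1()</tt> next to the final <tt>pow()</tt> and compares their results. The class name <tt>PowSimplificationSketch</tt>, the helper <tt>relativeDifference()</tt>, and the sample values are assumptions for illustration. The two versions may differ very slightly, since floating-point rounding in the extra division and multiplication can change the value just before the cast truncates it.</p>
<pre class="brush: java;">public class PowSimplificationSketch {
    // Unsimplified version: ln(a) approximation, times b, then exp() approximation.
    public static double pow1(final double a, final double b) {
        final double x = (Double.doubleToLongBits(a) &gt;&gt; 32);
        final double lnA = (x - 1072632447) / 1512775;
        final long tmp2 = (long) (1512775 * (lnA * b) + (1072693248 - 60801));
        return Double.longBitsToDouble(tmp2 &lt;&lt; 32);
    }

    // Simplified version, as in the final listing.
    public static double pow(final double a, final double b) {
        final int tmp = (int) (Double.doubleToLongBits(a) &gt;&gt; 32);
        final int tmp2 = (int) (b * (tmp - 1072632447) + 1072632447);
        return Double.longBitsToDouble(((long) tmp2) &lt;&lt; 32);
    }

    // Relative difference between the two versions for one input pair.
    public static double relativeDifference(final double a, final double b) {
        final double simplified = pow(a, b);
        return Math.abs(pow1(a, b) - simplified) / Math.abs(simplified);
    }

    public static void main(String[] args) {
        // Arbitrary sample inputs (hypothetical); a must be positive.
        final double[][] samples = { {2.0, 0.5}, {3.0, 2.0}, {10.0, 1.5}, {0.5, 3.0} };
        for (final double[] s : samples) {
            System.out.println("a=" + s[0] + ", b=" + s[1] + ": pow1 " + pow1(s[0], s[1])
                    + ", pow " + pow(s[0], s[1])
                    + ", relative difference " + relativeDifference(s[0], s[1]));
        }
    }
}</pre>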
<p>This concludes my little tutorial on microoptimization of the pow() function. If you have come this far, I congratulate you on your persistence <img src='http://martin.ankerl.com/wp-includes/images/smilies/icon_smile.gif' alt=':-)' class='wp-smiley' /> </p>
<p><strong>UPDATE</strong> Recently, several other approximate <tt>pow</tt> calculation methods have been developed. Here are some that I have found through <a href="http://www.reddit.com/r/programming/comments/8kftl/fast_pow_approximation_in_java_and_c/">reddit</a>:</p>
<ul>
<li><a href="http://www.hxa.name/articles/content/fast-pow-adjustable_hxa7241_2007.html">Fast pow() With Adjustable Accuracy</a> &#8212; This looks quite a bit more sophisticated and precise than my approximation. Written in C and for float values. A Java port should not be too difficult.
</li>
<li><a href="http://jrfonseca.blogspot.com/2008/09/fast-sse2-pow-tables-or-polynomials.html">Fast SSE2 pow: tables or polynomials?</a> &#8212; Uses <a href="http://en.wikipedia.org/wiki/Streaming_SIMD_Extensions">SSE</a> operations and seems to be a bit faster than the table approach from the link above, with the potential to scale better due to lower cache usage.
</li>
</ul>
<p>Please post what you think about this!</p>
]]></content:encoded>
			<wfw:commentRss>http://martin.ankerl.com/2007/10/04/optimized-pow-approximation-for-java-and-c-c/feed/</wfw:commentRss>
		<slash:comments>26</slash:comments>
		</item>
	</channel>
</rss>
