<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
		>
<channel>
	<title>Comments on: Optimized pow() approximation for Java, C / C++, and C#</title>
	<atom:link href="http://martin.ankerl.com/2007/10/04/optimized-pow-approximation-for-java-and-c-c/feed/" rel="self" type="application/rss+xml" />
	<link>http://martin.ankerl.com/2007/10/04/optimized-pow-approximation-for-java-and-c-c/</link>
	<description>Chunky bacon!!</description>
	<lastBuildDate>Wed, 08 Feb 2012 16:00:41 +0000</lastBuildDate>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
		<item>
		<title>By: martinus</title>
		<link>http://martin.ankerl.com/2007/10/04/optimized-pow-approximation-for-java-and-c-c/#comment-3764</link>
		<dc:creator>martinus</dc:creator>
		<pubDate>Wed, 25 Jan 2012 20:15:14 +0000</pubDate>
		<guid isPermaLink="false">http://martin.ankerl.com/?p=96#comment-3764</guid>
		<description>Nice to hear that this is of use to you. I have just written a new blog post with a union version and a more precise version:

http://martin.ankerl.com/2012/01/25/optimized-approximative-pow-in-c-and-cpp/</description>
		<content:encoded><![CDATA[<p>Nice to hear that this is of use to you. I have just written a new blog post with a union version and a more precise version:</p>
<p><a href="http://martin.ankerl.com/2012/01/25/optimized-approximative-pow-in-c-and-cpp/" rel="nofollow">http://martin.ankerl.com/2012/01/25/optimized-approximative-pow-in-c-and-cpp/</a></p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Bob</title>
		<link>http://martin.ankerl.com/2007/10/04/optimized-pow-approximation-for-java-and-c-c/#comment-3761</link>
		<dc:creator>Bob</dc:creator>
		<pubDate>Wed, 25 Jan 2012 20:11:15 +0000</pubDate>
		<guid isPermaLink="false">http://martin.ankerl.com/?p=96#comment-3761</guid>
		<description>I&#039;m using Visual Studio 2008&#039;s compiler. It is at least somewhat faster, and perhaps more so with optimization, so I am going to use it in my fast max filtering. The filtering does not depend strongly on small errors in pow, so the result looks indistinguishable (to my eyes). Thanks!</description>
		<content:encoded><![CDATA[<p>I&#8217;m using Visual Studio 2008&#8217;s compiler. It is at least somewhat faster, and perhaps more so with optimization, so I am going to use it in my fast max filtering. The filtering does not depend strongly on small errors in pow, so the result looks indistinguishable (to my eyes). Thanks!</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: martinus</title>
		<link>http://martin.ankerl.com/2007/10/04/optimized-pow-approximation-for-java-and-c-c/#comment-3756</link>
		<dc:creator>martinus</dc:creator>
		<pubDate>Wed, 25 Jan 2012 19:38:29 +0000</pubDate>
		<guid isPermaLink="false">http://martin.ankerl.com/?p=96#comment-3756</guid>
		<description>thanks!</description>
		<content:encoded><![CDATA[<p>thanks!</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: martinus</title>
		<link>http://martin.ankerl.com/2007/10/04/optimized-pow-approximation-for-java-and-c-c/#comment-3753</link>
		<dc:creator>martinus</dc:creator>
		<pubDate>Wed, 25 Jan 2012 19:03:02 +0000</pubDate>
		<guid isPermaLink="false">http://martin.ankerl.com/?p=96#comment-3753</guid>
		<description>I get a speedup of a factor of about 4.2 when I enable optimization, fast floating point, and SSE2. 

At http://pastebin.com/DRvPJL2K I have also added a new method &lt;tt&gt;fastPrecisePow&lt;/tt&gt; that is much more precise for large exponents, but a bit slower. It is 3.3 times faster than pow on my PC.</description>
		<content:encoded><![CDATA[<p>I get a speedup of a factor of about 4.2 when I enable optimization, fast floating point, and SSE2. </p>
<p>At <a href="http://pastebin.com/DRvPJL2K" rel="nofollow">http://pastebin.com/DRvPJL2K</a> I have also added a new method <tt>fastPrecisePow</tt> that is much more precise for large exponents, but a bit slower. It is 3.3 times faster than pow on my PC.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Bob</title>
		<link>http://martin.ankerl.com/2007/10/04/optimized-pow-approximation-for-java-and-c-c/#comment-3751</link>
		<dc:creator>Bob</dc:creator>
		<pubDate>Wed, 25 Jan 2012 18:49:00 +0000</pubDate>
		<guid isPermaLink="false">http://martin.ankerl.com/?p=96#comment-3751</guid>
		<description>Same results as before (~ 4.5 sec). Largest error is 33% or so at the upper end of x and y. I&#039;m not using optimization so that may change things.</description>
		<content:encoded><![CDATA[<p>Same results as before (~ 4.5 sec). Largest error is 33% or so at the upper end of x and y. I&#8217;m not using optimization so that may change things.</p>
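<p>The size of that error can be checked with a small grid sweep against <tt>std::pow</tt>. The following is a minimal sketch, not code from this thread: <tt>maxRelativeError</tt> is a hypothetical helper, and <tt>fastPow</tt> is rewritten here with <tt>memcpy</tt> instead of the union so that it does not assume a byte order. It assumes <tt>a</tt> is positive and that the result stays within the range of a normal double.</p>

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>
#include <cstring>

// Same arithmetic as the union-based fastPow in this thread, but the high
// 32 bits are read and written via memcpy, so neither a byte order nor
// union type punning is assumed. Assumes a > 0.
double fastPow(double a, double b) {
    static_assert(sizeof(double) == sizeof(std::uint64_t), "64-bit double expected");
    std::uint64_t bits;
    std::memcpy(&bits, &a, sizeof bits);
    const std::int32_t hi = static_cast<std::int32_t>(bits >> 32);
    const std::int32_t newHi =
        static_cast<std::int32_t>(b * (hi - 1072632447) + 1072632447);
    // The low 32 bits are left at zero, as in the union version.
    bits = static_cast<std::uint64_t>(static_cast<std::uint32_t>(newHi)) << 32;
    double result;
    std::memcpy(&result, &bits, sizeof result);
    return result;
}

// Hypothetical helper: largest relative error of fastPow against std::pow
// on a (steps + 1) x (steps + 1) grid over [aMin, aMax] x [bMin, bMax].
double maxRelativeError(double aMin, double aMax, double bMin, double bMax, int steps) {
    double worst = 0.0;
    for (int i = 0; i <= steps; ++i) {
        const double a = aMin + (aMax - aMin) * i / steps;
        for (int j = 0; j <= steps; ++j) {
            const double b = bMin + (bMax - bMin) * j / steps;
            const double exact = std::pow(a, b);
            worst = std::max(worst, std::fabs(fastPow(a, b) - exact) / exact);
        }
    }
    return worst;
}
```

<p>Sweeping a range with larger exponents, for example <tt>maxRelativeError(0.01, 1000.0, 0.1, 10.0, 200)</tt>, should show how much the error grows compared with exponents between 0 and 1.</p>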
]]></content:encoded>
	</item>
	<item>
		<title>By: martinus</title>
		<link>http://martin.ankerl.com/2007/10/04/optimized-pow-approximation-for-java-and-c-c/#comment-3748</link>
		<dc:creator>martinus</dc:creator>
		<pubDate>Wed, 25 Jan 2012 18:29:50 +0000</pubDate>
		<guid isPermaLink="false">http://martin.ankerl.com/?p=96#comment-3748</guid>
		<description>Could you try this code:
[cpp]double fastPow(double a, double b) {
  union {
    double d;
    int x[2];
  } u = { a };
  u.x[1] = (int)(b * (u.x[1] - 1072632447) + 1072632447);
  u.x[0] = 0;
  return u.d;
}[/cpp]

Also make sure your CPU&#039;s SpeedStep does not skew the results: do a warm-up first so that the CPU runs at its highest frequency for the whole benchmark.

I have copied a benchmark here: http://pastebin.com/DRvPJL2K</description>
		<content:encoded><![CDATA[<p>Could you try this code:</p>
<pre class="brush: cpp; title: ; notranslate">double fastPow(double a, double b) {
  union {
    double d;
    int x[2];
  } u = { a };
  u.x[1] = (int)(b * (u.x[1] - 1072632447) + 1072632447);
  u.x[0] = 0;
  return u.d;
}</pre>
<p>Also make sure your CPU&#8217;s SpeedStep does not skew the results: do a warm-up first so that the CPU runs at its highest frequency for the whole benchmark.</p>
<p>I have copied a benchmark here: <a href="http://pastebin.com/DRvPJL2K" rel="nofollow">http://pastebin.com/DRvPJL2K</a></p>
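<p>A warm-up pass can be combined with a timed loop roughly as follows. This is a minimal sketch and not the pastebin benchmark: <tt>timedSum</tt> is a hypothetical helper, the warm-up length of one tenth of the iterations is an arbitrary assumption, and the sweep over <tt>x</tt> and <tt>y</tt> is the one used in the test loop earlier in this thread.</p>

```cpp
#include <chrono>
#include <cmath>
#include <utility>

// Hypothetical helper: sums f(x, y) over the sweep used in this thread and
// returns {sum, elapsed seconds}. A shorter untimed pass runs first as warm-up.
template <typename F>
std::pair<double, double> timedSum(F f, long iterations) {
    volatile double sink = 0.0;  // keeps the warm-up result observable

    double x = 0.01, y = 0.1, warm = 0.0;
    for (long i = 0; i < iterations / 10; ++i) {
        x += 0.00001;
        y += 0.0000001;
        warm += f(x, y);
    }
    sink = warm;

    x = 0.01;
    y = 0.1;
    double sum = 0.0;
    const auto start = std::chrono::steady_clock::now();
    for (long i = 0; i < iterations; ++i) {
        x += 0.00001;
        y += 0.0000001;
        sum += f(x, y);  // the result feeds the returned sum, so the call stays live
    }
    const auto stop = std::chrono::steady_clock::now();

    return {sum, std::chrono::duration<double>(stop - start).count()};
}
```

<p>Calling <tt>timedSum([](double a, double b) { return std::pow(a, b); }, 100000000L)</tt> and then again with <tt>fastPow</tt> gives two timings measured under the same conditions, and comparing the two sums gives a rough idea of the accumulated error.</p>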
]]></content:encoded>
	</item>
	<item>
		<title>By: Bob</title>
		<link>http://martin.ankerl.com/2007/10/04/optimized-pow-approximation-for-java-and-c-c/#comment-3745</link>
		<dc:creator>Bob</dc:creator>
		<pubDate>Wed, 25 Jan 2012 17:51:00 +0000</pubDate>
		<guid isPermaLink="false">http://martin.ankerl.com/?p=96#comment-3745</guid>
		<description>Martinus,

Good idea as that checked something else. Turned out the pow function as implemented above was giving a few infinities, which led to the slowdown. When I replaced it with this:
[cpp]double powFast(double a, double b)
{
    int one = 1;
    int tmp = (*(one + (int *)&amp;a));
    int tmp2 = (int)(b * (tmp - 1072632447) + 1072632447);
    double p = 0.0;
    *(one + (int *)&amp;p) = tmp2;
    return p;
}[/cpp]

it ran quite a bit faster than before and even faster than native. Still the difference between powFast and the native pow was pretty small on my machine.
pow()        -  5.9 sec, sum=2.5243x10^36
powFast() - 4.5 sec, sum=3.3048x10^36</description>
		<content:encoded><![CDATA[<p>Martinus,</p>
<p>Good idea as that checked something else. Turned out the pow function as implemented above was giving a few infinities, which led to the slowdown. When I replaced it with this:</p>
<pre class="brush: cpp; title: ; notranslate">double powFast(double a, double b)
{
    int one = 1;
    int tmp = (*(one + (int *)&amp;a));
    int tmp2 = (int)(b * (tmp - 1072632447) + 1072632447);
    double p = 0.0;
    *(one + (int *)&amp;p) = tmp2;
    return p;
}</pre>
<p>it ran quite a bit faster than before and even faster than native. Still the difference between powFast and the native pow was pretty small on my machine.<br />
pow()        &#8211;  5.9 sec, sum=2.5243&#215;10^36<br />
powFast() &#8211; 4.5 sec, sum=3.3048&#215;10^36</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: martinus</title>
		<link>http://martin.ankerl.com/2007/10/04/optimized-pow-approximation-for-java-and-c-c/#comment-3740</link>
		<dc:creator>martinus</dc:creator>
		<pubDate>Wed, 25 Jan 2012 14:09:26 +0000</pubDate>
		<guid isPermaLink="false">http://martin.ankerl.com/?p=96#comment-3740</guid>
		<description>Hi Bob, are you sure that the compiler does not just optimize the pow(x,y) call away? Could you change the benchmark into something like this and try again:

[cpp]sum=0;
x=.01;
y=.1;
for(i=0;i&lt;100000000;i++) {
    x += .00001;
    y += .0000001;
    sum += pow(x,y);
}
std::cout &lt;&lt; sum &lt;&lt; std::endl;[/cpp]</description>
		<content:encoded><![CDATA[<p>Hi Bob, are you sure that the compiler does not just optimize the pow(x,y) call away? Could you change the benchmark into something like this and try again:</p>
<pre class="brush: cpp; title: ; notranslate">sum=0;
x=.01;
y=.1;
for(i=0;i&lt;100000000;i++) {
    x += .00001;
    y += .0000001;
    sum += pow(x,y);
}
std::cout &lt;&lt; sum &lt;&lt; std::endl;</pre>
]]></content:encoded>
	</item>
	<item>
		<title>By: Bob</title>
		<link>http://martin.ankerl.com/2007/10/04/optimized-pow-approximation-for-java-and-c-c/#comment-3739</link>
		<dc:creator>Bob</dc:creator>
		<pubDate>Wed, 25 Jan 2012 14:07:09 +0000</pubDate>
		<guid isPermaLink="false">http://martin.ankerl.com/?p=96#comment-3739</guid>
		<description>Further, more careful testing suggests that this is actually quite a bit slower than the native pow function for C (I didn&#039;t test C++). I did 100 million pow calculations, and this &quot;fast&quot; pow(double,double) method took 5 times longer than the native C pow(double,double). Here was the test code section:

[cpp]x=.01;
y=.1;
for(i=0;i&lt;100000000;i++)
{
	x += .00001;
	y += .0000001;
	pow(x,y);
}[/cpp]

It took 26 seconds using the above pow code, and only 6 seconds with the native C pow function.

So...I would definitely do some testing on your machine before jumping onto this.  Perhaps Java is quite different.</description>
		<content:encoded><![CDATA[<p>Further, more careful testing suggests that this is actually quite a bit slower than the native pow function for C (I didn&#8217;t test C++). I did 100 million pow calculations, and this &#8220;fast&#8221; pow(double,double) method took 5 times longer than the native C pow(double,double). Here was the test code section:</p>
<pre class="brush: cpp; title: ; notranslate">x=.01;
y=.1;
for(i=0;i&lt;100000000;i++)
{
	x += .00001;
	y += .0000001;
	pow(x,y);
}</pre>
<p>It took 26 seconds using the above pow code, and only 6 seconds with the native C pow function.</p>
<p>So&#8230;I would definitely do some testing on your machine before jumping onto this.  Perhaps Java is quite different.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Bob</title>
		<link>http://martin.ankerl.com/2007/10/04/optimized-pow-approximation-for-java-and-c-c/#comment-3699</link>
		<dc:creator>Bob</dc:creator>
		<pubDate>Tue, 24 Jan 2012 21:59:30 +0000</pubDate>
		<guid isPermaLink="false">http://martin.ankerl.com/?p=96#comment-3699</guid>
		<description>On my machine I found little difference in CPU time between the native exact pow and this fast pow function. I was computing x^5 with x from 10 to 500, and then x^(1/5) with larger numbers.

Maybe the native pow is better optimized on my 6-core?</description>
		<content:encoded><![CDATA[<p>On my machine I found little difference in CPU time between the native exact pow and this fast pow function. I was computing x^5 with x from 10 to 500, and then x^(1/5) with larger numbers.</p>
<p>Maybe the native pow is better optimized on my 6-core?</p>
]]></content:encoded>
	</item>
</channel>
</rss>

