<?xml version="1.0" encoding="UTF-8"?> <rss version="2.0" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:wfw="http://wellformedweb.org/CommentAPI/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:sy="http://purl.org/rss/1.0/modules/syndication/" xmlns:slash="http://purl.org/rss/1.0/modules/slash/" ><channel><title>Martin Ankerl &#187; C++</title> <atom:link href="http://martin.ankerl.com/category/c/feed/" rel="self" type="application/rss+xml" /><link>http://martin.ankerl.com</link> <description>Chunky bacon!!</description> <lastBuildDate>Sat, 04 Feb 2012 10:02:31 +0000</lastBuildDate> <language>en</language> <sy:updatePeriod>hourly</sy:updatePeriod> <sy:updateFrequency>1</sy:updateFrequency> <item><title>Optimized Approximative pow() in C / C++</title><link>http://martin.ankerl.com/2012/01/25/optimized-approximative-pow-in-c-and-cpp/</link> <comments>http://martin.ankerl.com/2012/01/25/optimized-approximative-pow-in-c-and-cpp/#comments</comments> <pubDate>Wed, 25 Jan 2012 19:48:39 +0000</pubDate> <dc:creator>martinus</dc:creator> <category><![CDATA[C++]]></category> <category><![CDATA[programming]]></category> <category><![CDATA[science]]></category><guid isPermaLink="false">http://martin.ankerl.com/?p=894</guid> <description><![CDATA[Mostly thanks to this reddit discussion, I have updated my pow() approximation for C / C++. I have now two different versions: This new code uses the union trick, instead of the weird casting trick I&#8217;ve used before. 
This means &#8230; <a href="http://martin.ankerl.com/2012/01/25/optimized-approximative-pow-in-c-and-cpp/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description> <content:encoded><![CDATA[<p>Mostly thanks to <a href="http://www.reddit.com/r/gamedev/comments/n7na0/fast_approximation_to_mathpow/">this reddit discussion</a>, I have updated my <a href="http://martin.ankerl.com/2007/10/04/optimized-pow-approximation-for-java-and-c-c/">pow() approximation</a> for C / C++. I have now two different versions:</p><pre class="brush: cpp; title: ; notranslate">inline double fastPow(double a, double b) {
  union {
    double d;
    int x[2];
  } u = { a };
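  // x[1] is the high 32-bit word (sign, exponent, upper mantissa bits) on a
  // little endian machine; 1072632447 = 1072693248 - 60801, i.e. the high
  // word of 1.0 minus the 60801 correction from the e^x approximation
  u.x[1] = (int)(b * (u.x[1] - 1072632447) + 1072632447);
  // discard the lower 32 mantissa bits
  u.x[0] = 0;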
  return u.d;
}</pre><p><span id="more-894"></span><br /> <a href="http://martin.ankerl.com/wp-content/uploads/2012/01/pow.png?9d7bd4"><img src="http://martin.ankerl.com/wp-content/uploads/2012/01/pow.png?9d7bd4" alt="" title="This is how a^b looks like, in case you were wondering..." width="240" height="202" class="alignright size-full wp-image-906" /></a>This new code uses the union trick, instead of the weird casting trick I&#8217;ve used before. This means that <tt>-fno-strict-aliasing</tt> is no longer required when compiling, and it is also a bit faster because one fewer temporary variable is needed. The code as written assumes a little endian machine; on a big endian machine, you have to exchange u.x[0] and u.x[1]. On my PC, this version is 4.2 times faster than the much more precise pow().</p><p>Besides that, I now also have a slower approximation that has much less error when the exponent is larger than 1. It makes use of <a href="https://secure.wikimedia.org/wikipedia/en/wiki/Exponentiation_by_squaring">exponentiation by squaring</a>, which is exact for the integer part of the exponent, and uses only the exponent&#8217;s fraction for the approximation:</p><pre class="brush: cpp; title: ; notranslate">// should be much more precise with large b
inline double fastPrecisePow(double a, double b) {
  // calculate approximation with fraction of the exponent
  int e = (int) b;
  union {
    double d;
    int x[2];
  } u = { a };
  u.x[1] = (int)((b - e) * (u.x[1] - 1072632447) + 1072632447);
  u.x[0] = 0;

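  // exponentiation by squaring with the exponent's integer part
  // (assumes b &gt;= 0: with a negative e the loop below does not terminate
  // when &gt;&gt; on a negative int is an arithmetic shift)
  // double r = u.d makes everything much slower, not sure why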
  double r = 1.0;
  while (e) {
    if (e &amp; 1) {
      r *= a;
    }
    a *= a;
    e &gt;&gt;= 1;
  }

  return r * u.d;
}</pre><p>This code is 3.3 times faster than pow(). Writing a microbenchmark is not easy, so <a href="http://pastebin.com/DRvPJL2K">I have posted mine here</a>. <a href="http://pastebin.com/ZW95gEyr">Here is also a Java version of the more accurate pow approximation</a>.</p><p>Any ideas how this could be improved? Please post them!</p><div style='clear:both'></div>]]></content:encoded> <wfw:commentRss>http://martin.ankerl.com/2012/01/25/optimized-approximative-pow-in-c-and-cpp/feed/</wfw:commentRss> <slash:comments>0</slash:comments> </item> <item><title>Cleverness Considered Harmful</title><link>http://martin.ankerl.com/2010/12/10/cleverness-considered-harmful/</link> <comments>http://martin.ankerl.com/2010/12/10/cleverness-considered-harmful/#comments</comments> <pubDate>Fri, 10 Dec 2010 22:04:36 +0000</pubDate> <dc:creator>martinus</dc:creator> <category><![CDATA[C++]]></category> <category><![CDATA[coding]]></category> <category><![CDATA[programming]]></category> <category><![CDATA[rant]]></category> <category><![CDATA[ruby]]></category> <category><![CDATA[Uncategorized]]></category><guid isPermaLink="false">http://martin.ankerl.com/?p=501</guid> <description><![CDATA[I have just read this nice quote at the stackoverflow question &#8220;Why is cleverness considered harmful in programming by some people?&#8220;: Fools ignore complexity; pragmatists suffer it; experts avoid it; geniuses remove it. 
&#8211; Alan Perlis Which reminds me of &#8230; <a href="http://martin.ankerl.com/2010/12/10/cleverness-considered-harmful/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description> <content:encoded><![CDATA[<p>I have just read this nice quote at the stackoverflow question &#8220;<a href="http://programmers.stackexchange.com/questions/25276">Why is cleverness considered harmful in programming by some people?</a>&#8221;:</p><blockquote><p>Fools ignore complexity; pragmatists suffer it; experts avoid it; geniuses remove it.<br /> &#8211; Alan Perlis</p></blockquote><p>Which reminds me of a little piece of code I wrote recently. I tried to implement a small parser for a very simple custom data format in C++. To do this, I tried several approaches:</p><h2>1. Boost.Spirit</h2><p>Since we use Boost in our projects, I started reading about <a href="http://boost-spirit.com/home/">Boost.Spirit</a> and took some time to decipher the tutorials, which contain code <a href="http://www.boost.org/doc/libs/1_45_0/libs/spirit/doc/html/spirit/qi/tutorials/complex___our_first_complex_parser.html">like this</a>:</p><pre class="brush: cpp; title: ; notranslate">bool r = phrase_parse(first, last,
  //  Begin grammar
  (
      '(' &gt;&gt; double_[ref(rN) = _1]
          &gt;&gt; -(',' &gt;&gt; double_[ref(iN) = _1]) &gt;&gt; ')'
  |   double_[ref(rN) = _1]
  ),
  //  End grammar
 space);</pre><p>After half an hour I got annoyed because it simply is too much effort. I don&#8217;t care how well thought out the library is and how powerful it is, it is simply unusable. Maybe I am too stupid, but I am sure that even if I manage to understand it well enough to write a decent parser, half a year later I will not be able to debug my code any more: it&#8217;s simply too clever.</p><h2>2. Coco</h2><p>I ditched Boost.Spirit and tried to use <a href="http://www.ssw.uni-linz.ac.at/Coco/">Coco</a>. I am unfamiliar with it but have seen a colleague use it, so I gave it a try. I started reading the documentation, which looks nice but has 42 pages, and since I am a lazy bastard I stopped right there because I just wanted to get something working, and quickly.</p><h2>3. Hand Written Large Switch</h2><p>I ditched Coco and started to write my own, very simple code that basically looked like this:</p><pre class="brush: cpp; title: ; notranslate">while (instream &gt;&gt; sym) {
  switch (symbol_map[sym]) {
  case START:
    // do this
    break;
  case WHATEVER:
    // do that
    break;
  }
}</pre><p>After just 10 minutes I had a minimal parser that worked well enough and was extremely readable and understandable code. Everybody with basic C++ understanding can skim over this code and get it. The <a href="http://www.flickr.com/photos/smitty/2245445147/">number of WTF&#8217;s per minute</a> when reading this code is close to zero. I am very happy with this approach, and it really rocks because it is so dead simple.</p><p>You can rightfully say that a simple switch is very inflexible, and not extensible. You are right, but who cares? Almost all code I have seen that was planned ahead for flexibility gets too complicated, because what you planned ahead for might never be needed; even worse: most of the time you need flexibility that you cannot know in advance, and it only becomes apparent when you have something up and running and have used it for a while.</p><h2>Final Words</h2><p>Back to the original quote, based on my experience I would extend it a bit:</p><blockquote><p>Ignorants add complexity; fools ignore complexity; pragmatists suffer it; experts avoid it; geniuses remove it.<br /> &#8211; Martin Ankerl</p></blockquote><p>If you find this interesting you might also like reading the <a href="http://martin.ankerl.com/2007/01/05/three-laws-of-software-development/">Three Laws of Software Development</a>.</p><p>What do you think?</p><div style='clear:both'></div>]]></content:encoded> <wfw:commentRss>http://martin.ankerl.com/2010/12/10/cleverness-considered-harmful/feed/</wfw:commentRss> <slash:comments>7</slash:comments> </item> <item><title>Optimized pow() approximation for Java, C / C++, and C#</title><link>http://martin.ankerl.com/2007/10/04/optimized-pow-approximation-for-java-and-c-c/</link> <comments>http://martin.ankerl.com/2007/10/04/optimized-pow-approximation-for-java-and-c-c/#comments</comments> <pubDate>Thu, 04 Oct 2007 22:48:08 +0000</pubDate> <dc:creator>martinus</dc:creator> <category><![CDATA[benchmark]]></category> 
<category><![CDATA[C++]]></category> <category><![CDATA[coding]]></category> <category><![CDATA[java]]></category> <category><![CDATA[linux]]></category> <category><![CDATA[news]]></category> <category><![CDATA[programming]]></category> <category><![CDATA[science]]></category> <category><![CDATA[tricks]]></category> <category><![CDATA[floating point]]></category> <category><![CDATA[optimization]]></category><guid isPermaLink="false">http://martin.ankerl.com/?p=96</guid> <description><![CDATA[I have already written about approximations of e^x, log(x) and pow(a, b) in my post Optimized Exponential Functions for Java. Now I have more In particular, the pow() function is now even faster, simpler, and more accurate. Without further ado, &#8230; <a href="http://martin.ankerl.com/2007/10/04/optimized-pow-approximation-for-java-and-c-c/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description> <content:encoded><![CDATA[<p>I have already written about approximations of <tt>e^x</tt>, <tt>log(x)</tt> and <tt>pow(a, b)</tt> in my post <a href="http://martin.ankerl.com/2007/02/11/optimized-exponential-functions-for-java/">Optimized Exponential Functions for Java</a>. Now I have more <img src="http://martin.ankerl.com/wp-includes/images/smilies/icon_smile.gif?9d7bd4" alt=':-)' class='wp-smiley' /> In particular, the <tt>pow()</tt> function is now even faster, simpler, and more accurate. Without further ado, I proudly give you the brand new approximation:</p><h1>Approximation of pow() in Java</h1><pre class="brush: java; title: ; notranslate">public static double pow(final double a, final double b) {
    final int x = (int) (Double.doubleToLongBits(a) &gt;&gt; 32);
    final int y = (int) (b * (x - 1072632447) + 1072632447);
    return Double.longBitsToDouble(((long) y) &lt;&lt; 32);
}</pre><p>This is really very compact. The calculation only requires 2 shifts, 1 mul, 2 add, and 2 register operations. That&#8217;s it! In my tests it is usually within an error margin of 5% to 12%, in extreme cases sometimes up to 25%. A careful analysis is left as an exercise for the reader. This is very usable in, e.g., <a href="http://en.wikipedia.org/wiki/Metaheuristic">metaheuristics</a> or <a href="http://en.wikipedia.org/wiki/Artificial_neural_network">neural nets</a>.</p><h2>UPDATE, December 10, 2011</h2><p>I just managed to make the code above about 30% faster on my machine. The error differs by a tiny fraction (it is neither better nor worse).</p><pre class="brush: java; title: ; notranslate">public static double pow(final double a, final double b) {
    final long tmp = Double.doubleToLongBits(a);
    final long tmp2 = (long)(b * (tmp - 4606921280493453312L)) + 4606921280493453312L;
    return Double.longBitsToDouble(tmp2);
}</pre><p>This new approximation is about <strong>23 times</strong> as fast as Math.pow() on my machine (Intel Core2 Quad, Q9550, Java 1.7.0_01-b08, 64-Bit Server VM). Unfortunately, microbenchmarks are difficult to do in Java, so your mileage may vary. You can download the benchmark <a href="/files/PowBench.java">PowBench.java</a> and have a look. I have tried to prevent overoptimization, and I subtract the overhead that this prevention introduces.</p><h1>Approximation of pow() in C and C++</h1><h2>UPDATE, January 25, 2012</h2><p>The code below has been updated to use a union, so you no longer need <tt>-fno-strict-aliasing</tt> when compiling. Also, here is a <a href="http://martin.ankerl.com/2012/01/25/optimized-approximative-pow-in-c-and-cpp/">more precise version of the approximation</a>.</p><pre class="brush: cpp; title: ; notranslate">double fastPow(double a, double b) {
  union {
    double d;
    int x[2];
  } u = { a };
  u.x[1] = (int)(b * (u.x[1] - 1072632447) + 1072632447);
  u.x[0] = 0;
  return u.d;
}</pre><p>Compiled on my Pentium-M with gcc 4.1.2:</p><pre>gcc -O3 -march=pentium-m -fomit-frame-pointer</pre><p>This version is <b>7.8 times</b> faster than pow() from the standard library.</p><h1>Approximation of pow() in C#</h1><p>Jason Jung has posted a port of this code to C#:</p><pre class="brush: csharp; title: ; notranslate">public static double PowerA(double a, double b) {
  int tmp = (int)(BitConverter.DoubleToInt64Bits(a) &gt;&gt; 32);
  int tmp2 = (int)(b * (tmp - 1072632447) + 1072632447);
  return BitConverter.Int64BitsToDouble(((long)tmp2) &lt;&lt; 32);
}</pre><h1>How the Approximation was Developed</h1><p>It is quite impossible to understand what is going on in this function, it just magically works. To shine a bit more light on it, here is a detailed description of how I developed it.</p><h2>Approximation of e^x</h2><p>As described <a href="http://martin.ankerl.com/2007/02/11/optimized-exponential-functions-for-java/">here</a>, the paper &#8220;<a href="http://citeseer.ist.psu.edu/schraudolph98fast.html">A Fast, Compact Approximation of the Exponential Function</a>&#8221; develops a C macro that does a good job at exploiting the IEEE 754 floating-point representation to calculate <tt>e^x</tt>. This macro translates into Java code in a straightforward way, and the result looks like this:</p><pre class="brush: java; title: ; notranslate">public static double exp(double val) {
    final long tmp = (long) (1512775 * val + (1072693248 - 60801));
    return Double.longBitsToDouble(tmp &lt;&lt; 32);
}</pre><h2>Use Exponential Functions for a^b</h2><p>Thanks to the power of math, we know that <tt>a^b</tt> can be transformed like this:</p><ol><li>Take exponential<pre>a^b = e^(ln(a^b))</pre><li>Extract b<pre>a^b = e^(ln(a)*b)</pre></ol><p>Now we have expressed the pow calculation with <tt>e^x</tt> and <tt>ln(x)</tt>. We already have the <tt>e^x</tt> approximation, but no good <tt>ln(x)</tt>. The <a href="http://martin.ankerl.com/2007/02/11/optimized-exponential-functions-for-java/">old approximation</a> is very bad, so we need a better one. So what now?</p><h2>Approximation of ln(x)</h2><p>Here comes the big trick: Remember that we have the nice <tt>e^x</tt> approximation? Well, <tt>ln(x)</tt> is exactly the inverse function! That means we just need to transform the above approximation so that the output of <tt>e^x</tt> is transformed back into the original input.</p><p>That&#8217;s not too difficult. Have a look at the code above: we now take the output and move backwards to undo the calculation. First reverse the shift:</p><pre>final double tmp = (Double.doubleToLongBits(val) &gt;&gt; 32);</pre><p>Now solve the equation<pre>tmp = (1512775 * val + (1072693248 - 60801))</pre><p> for val:</p><ol><li>The original formula<pre>tmp = (1512775 * val + (1072693248 - 60801))</pre><li>Perform subtraction<pre>tmp = 1512775 * val + 1072632447</pre><li>Bring value to other side<pre>tmp - 1072632447 = 1512775 * val</pre><li>Divide by factor<pre>(tmp - 1072632447) / 1512775 = val</pre><li>Finally, val on the left side<pre>val = (tmp - 1072632447) / 1512775</pre></ol><p>Voil&#224;, now we have a nice approximation of <tt>ln(x)</tt>:</p><pre class="brush: java; title: ; notranslate">public double ln(double val) {
    final double x = (Double.doubleToLongBits(val) &gt;&gt; 32);
    return (x - 1072632447) / 1512775;
}</pre><h2>Combine Both Approximations</h2><p>Finally we can combine the two approximations into <tt>e^(ln(a) * b)</tt>:</p><pre class="brush: java; title: ; notranslate">public static double pow1(final double a, final double b) {
    // calculate ln(a)
    final double x = (Double.doubleToLongBits(a) &gt;&gt; 32);
    final double ln_a = (x - 1072632447) / 1512775;

    // ln(a) * b
    final double tmp1 = ln_a * b;

    // e^(ln(a) * b)
    final long tmp2 = (long) (1512775 * tmp1 + (1072693248 - 60801));
    return Double.longBitsToDouble(tmp2 &lt;&lt; 32);
}</pre><p>Between the two shifts, we can simply insert the <tt>tmp1</tt> calculation into the tmp2 calculation to get</p><pre class="brush: java; title: ; notranslate">public static double pow2(final double a, final double b) {
    final double x = (Double.doubleToLongBits(a) &gt;&gt; 32);
    final long tmp2 = (long) (1512775 * (x - 1072632447) / 1512775 * b + (1072693248 - 60801));
    return Double.longBitsToDouble(tmp2 &lt;&lt; 32);
}</pre><p>Now simplify the <tt>tmp2</tt> calculation:</p><ol><li>The original formula<pre>tmp2 = (1512775 * (x - 1072632447) / 1512775 * b + (1072693248 - 60801))</pre><li>We can drop the factor <tt>1512775</tt><pre>tmp2 = (x - 1072632447) * b + (1072693248 - 60801)</pre><li>And finally, calculate the subtraction<pre>tmp2 = b * (x - 1072632447) + 1072632447</pre></ol><h2>The Result</h2><p>That&#8217;s it! Add some casts, and the complete function is the same as above.</p><pre class="brush: java; title: ; notranslate">public static double pow(final double a, final double b) {
    final int tmp = (int) (Double.doubleToLongBits(a) &gt;&gt; 32);
    final int tmp2 = (int) (b * (tmp - 1072632447) + 1072632447);
    return Double.longBitsToDouble(((long) tmp2) &lt;&lt; 32);
}</pre><p>This concludes my little tutorial on microoptimization of the pow() function. If you have come this far, I congratulate you on your persistence <img src="http://martin.ankerl.com/wp-includes/images/smilies/icon_smile.gif?9d7bd4" alt=':-)' class='wp-smiley' /></p><p><strong>UPDATE</strong> Several other approximative <tt>pow</tt> calculation methods have been developed recently. Here are some that I have found through <a href="http://www.reddit.com/r/programming/comments/8kftl/fast_pow_approximation_in_java_and_c/">reddit</a>:</p><ul><li><a href="http://www.hxa.name/articles/content/fast-pow-adjustable_hxa7241_2007.html">Fast pow() With Adjustable Accuracy</a> &#8212; This looks quite a bit more sophisticated and precise than my approximation. Written in C and for float values. A Java port should not be too difficult.</li><li><a href="http://jrfonseca.blogspot.com/2008/09/fast-sse2-pow-tables-or-polynomials.html">Fast SSE2 pow: tables or polynomials?</a> &#8212; Uses <a href="http://en.wikipedia.org/wiki/Streaming_SIMD_Extensions">SSE</a> operations and seems to be a bit faster than the table approach from the link above, with the potential to scale better due to lower cache usage.</li></ul><p>Please post what you think about this!</p><div style='clear:both'></div>]]></content:encoded> <wfw:commentRss>http://martin.ankerl.com/2007/10/04/optimized-pow-approximation-for-java-and-c-c/feed/</wfw:commentRss> <slash:comments>41</slash:comments> </item> </channel> </rss>
