<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Martin Ankerl &#187; programming</title>
	<atom:link href="http://martin.ankerl.com/tag/programming/feed/" rel="self" type="application/rss+xml" />
	<link>http://martin.ankerl.com</link>
	<description>No movement is faster than no movement</description>
	<lastBuildDate>Tue, 13 Jul 2010 05:31:29 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=abc</generator>
		<item>
		<title>Two Word Anagram Finder Algorithm (in Ruby)</title>
		<link>http://martin.ankerl.com/2008/08/09/two-word-anagram-finder-algorithm/</link>
		<comments>http://martin.ankerl.com/2008/08/09/two-word-anagram-finder-algorithm/#comments</comments>
		<pubDate>Sat, 09 Aug 2008 19:32:30 +0000</pubDate>
		<dc:creator>Martin Ankerl</dc:creator>
				<category><![CDATA[benchmark]]></category>
		<category><![CDATA[coding]]></category>
		<category><![CDATA[programming]]></category>
		<category><![CDATA[ruby]]></category>
		<category><![CDATA[tricks]]></category>

		<guid isPermaLink="false">http://martin.ankerl.com/?p=156</guid>
		<description><![CDATA[Today I have got some sourcecode for you. There is a little programming challenge named The Self-Documenting Code Contest that is quite fun, they try to find the cleanest and easiest to read code for this task: Write a program that generates all two-word anagrams of the string &#8220;documenting&#8221;. Here&#8217;s a word list you might [...]]]></description>
			<content:encoded><![CDATA[<p>Today I have got some sourcecode for you. There is a little programming challenge named <a href="http://selfexplanatorycode.blogspot.com/">The Self-Documenting Code Contest</a> that is quite fun, they try to find the cleanest and easiest to read code for this task:</p>
<blockquote><p>
Write a program that generates all two-word anagrams of the string &#8220;documenting&#8221;. Here&#8217;s a word list you might want to use: <a href='http://martin.ankerl.com/wp-content/uploads/2008/08/wordlist.zip'>wordlist.zip</a>.</p>
<p>When you&#8217;re done, send the results to <a href="mailto:selfdocumenting@hotmail.com">selfdocumenting@hotmail.com</a>.</p>
<p>Good luck!
</p></blockquote>
<p>So this caught my interest and I wrote a little entry in Ruby that is 23 lines long including whitespace and very nice to read. But I won&#8217;t show you this code until the contest is over, and it is not the reason for this post. The reason is that the nice version takes about 2 seconds, and somebody else has coded a Python solution that takes only 1 second (I have no idea what his code looks like). This post is about a fast anagram finding algorithm, and how I developed it. The final result takes about 0.11 seconds.</p>
<h1>Algorithm</h1>
<p>The most basic algorithm has two phases:</p>
<ol>
<li>Read in the file
<li>Build all combinations of two words and compare the letter count with the query.
</ol>
<p>Building the combinations is usually done with two nested loops and takes O(n^2) runtime. This is slow, so I have added another step in between:</p>
<h2>Idea #1: Filter out Candidate Words</h2>
<p>The second step is really slow, but it would be a lot faster if it had to handle fewer words. So I wrote a little filtering step that only lets words through that consist entirely of letters occurring in the query word.</p>
<p>For example, when the query is <tt>documenting</tt>, the words <tt>men</tt>, <tt>go</tt> and even <tt>too</tt> pass the filter, even though the number of letters might not match (<tt>too</tt> needs two <tt>o</tt>s, the query has only one). But that&#8217;s not important. What is important is that the number of candidate words is reduced a lot, so the next phase is faster.</p>
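<p>The filter can be sketched in a few lines of Ruby. This is only an illustration, not the code from the listing further down: the helper name <tt>candidate_words</tt> and the sample word list are made up for this example.</p>

```ruby
require 'set'

# Hypothetical helper: keep only words whose letters all occur in the query.
# Letter counts are deliberately ignored at this stage.
def candidate_words(words, query)
  allowed = Set.new(query.chars)
  words.select { |word| word.chars.all? { |c| allowed.include?(c) } }
end

sample = %w[men go too cat]
p candidate_words(sample, "documenting")   # "cat" is dropped because of the "a"
```

<p>Words like <tt>too</tt> survive on purpose. Rejecting them here would require counting letters, and that work is left to the later exact comparison.</p>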
<h2>Idea #2: Use a Commutative Hashing Function</h2>
<p>String comparisons are slow. The common way to find out whether the strings <tt>coming</tt> and <tt>tuned</tt> together form an anagram of the word <tt>documenting</tt> is to sort the letters and compare, like this:</p>
<pre class="brush: ruby;">
irb(main):003:0&gt; &quot;documenting&quot;.unpack(&quot;c*&quot;).sort.pack(&quot;c*&quot;)
=&gt; &quot;cdegimnnotu&quot;
irb(main):004:0&gt; (&quot;coming&quot; + &quot;tuned&quot;).unpack(&quot;c*&quot;).sort.pack(&quot;c*&quot;)
=&gt; &quot;cdegimnnotu&quot;
</pre>
<p>The strings are equal, so we have a match. But this comparison is terribly slow! What&#8217;s worse, the computation has to be redone for each candidate pair. It would be much better to find a hash function that quickly tells us whether we might have a match, and to do the string comparison only when the hash check succeeds. The hash has to be good enough that we don&#8217;t get too many false positives (hashes are equal but the real comparison fails), otherwise there is no speed advantage. So why not just sum up the byte values of all the letters? </p>
<pre class="brush: ruby;">
irb(main):005:0&gt; &quot;documenting&quot;.sum
=&gt; 1181
irb(main):006:0&gt; &quot;coming&quot;.sum + &quot;tuned&quot;.sum
=&gt; 1181
</pre>
<p>Ruby&#8217;s <a href="http://www.ruby-doc.org/core/classes/String.html#M000857">String#sum</a> does exactly this. We can now precalculate the sum for each word, and to find a match we just add the two hashes and compare the result to the query&#8217;s hash:</p>
<pre class="brush: ruby;">
irb(main):007:0&gt; query=&quot;documenting&quot;; first=&quot;coming&quot;; second=&quot;tuned&quot;
=&gt; &quot;tuned&quot;
irb(main):008:0&gt; first.sum + second.sum == query.sum
=&gt; true
</pre>
<p>When this very quick check returns true, we have to do the string comparison to be absolutely sure it is a match. This considerably speeds up the whole program, but it is still O(n^2).</p>
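<p>A minimal sketch of this two-stage check, including a false positive that shows why the full comparison is still needed. The method names <tt>anagram_pair?</tt> and <tt>match?</tt> are invented for this example.</p>

```ruby
# Full check: sort the letters of both words and compare with the sorted query.
def anagram_pair?(first, second, query)
  (first + second).chars.sort == query.chars.sort
end

# Cheap pre-check with String#sum; the full check only runs when the sums agree.
def match?(first, second, query)
  first.sum + second.sum == query.sum && anagram_pair?(first, second, query)
end

puts match?("coming", "tuned", "documenting")   # true
# "ad" and "bc" have the same byte sum (97 + 100 == 98 + 99) but are no anagrams
puts "ad".sum == "bc".sum                       # true: a false positive of the hash
puts match?("ad", "", "bc")                     # false: the full check rejects it
```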
<h2>Idea #3: Reformulate Problem</h2>
<p>Now here comes the trickiest and coolest part. With Idea #2 in place, the slowest part is matching the numbers, which still has quadratic complexity. But the hard task is not anagram finding any more: we have reduced it to finding two hashes that add up to the hash of the query. We can reformulate this problem into something completely detached from the anagram problem:</p>
<blockquote><p>
Given a list of numbers, find all combinations of two numbers that add up to a given number.
</p></blockquote>
<p>When we concentrate on just this problem and ignore the rest, we might come up with a better way of doing things.</p>
<p>I came up with a fast solution, described below. Somebody posted a better solution that is both faster and simpler. If you want just this final solution, <a href="#idea4">skip ahead to Idea #4</a>, as the following description is outdated.</p>
<p>It clearly looks stupid to just try all combinations to add the numbers.<br />
So let&#8217;s sort them first. Quicksort is fast, especially with numbers, so no worries here. Now consider a list of numbers like this example:</p>
<pre>1   3   7   10   10   12   17   20   22   23   24   24   25   26   30</pre>
<p>Find all the combinations of two numbers that add up to 27. They are</p>
<ul>
<li>1 + 26 = 27
<li>3 + 24 = 27
<li>3 + 24 = 27 (a second time, because 24 occurs twice)
<li>7 + 20 = 27
<li>10 + 17 = 27
<li>10 + 17 = 27 (a second time, because 10 occurs twice)
</ul>
<p>You can detect a pattern here: the first number always increases, the second number always decreases! We can now formulate an algorithm for this:</p>
<p>We can have two pointers to the array, one starting from the left side, the other starting from the right side. When the numbers behind the pointers add up to a bigger result than the query (e.g. 1 + 30 = 31), we decrease the right pointer to find a smaller combination (1 + 26 = 27). When the sums are too small (1 + 25 = 26), we move the left pointer to the right (3 + 25 = 28).</p>
<p>This way we walk through the whole array in O(n) time, and the sum of the two numbers at the pointers is always kept as close to the desired result as possible. When the pointers meet each other, we can stop, because going on would only produce the same pairs with the words swapped. </p>
<p>This algorithm gets a bit more complicated when you consider that the list might contain many equal numbers. Whenever this happens, you have to fall back to an O(n^2) matching algorithm for just this section.</p>
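<p>The two-pointer walk, including the fallback for runs of equal numbers, could look roughly like the sketch below. It works on plain numbers only, and the name <tt>pairs_with_sum</tt> is made up for this example. It is not the code I used in the original implementation.</p>

```ruby
# Returns every pair [a, b] (by position) from a sorted list with a + b == target.
def pairs_with_sum(sorted, target)
  pairs = []
  left = 0
  right = sorted.size - 1
  while left < right
    sum = sorted[left] + sorted[right]
    if sum > target
      right -= 1                      # too big: take a smaller right number
    elsif sum < target
      left += 1                       # too small: take a bigger left number
    else
      a, b = sorted[left], sorted[right]
      if a == b
        # everything between the pointers is equal: every pair of positions matches
        count = right - left + 1
        (count * (count - 1) / 2).times { pairs << [a, b] }
        break
      end
      # count the run of equal numbers on each side and combine all of them
      run_a = 1
      run_a += 1 while sorted[left + run_a] == a
      run_b = 1
      run_b += 1 while sorted[right - run_b] == b
      (run_a * run_b).times { pairs << [a, b] }
      left += run_a
      right -= run_b
    end
  end
  pairs
end

list = [1, 3, 7, 10, 10, 12, 17, 20, 22, 23, 24, 24, 25, 26, 30]
pairs_with_sum(list, 27).each { |a, b| puts "#{a} + #{b} = 27" }
```

<p>For the example list this reports six pairs. Both 10 and 24 occur twice, so <tt>10 + 17</tt> and <tt>3 + 24</tt> are each found twice.</p>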
<h2><a name="idea4"></a>Idea #4: Use Hash directly</h2>
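<p><b>UPDATE</b> Scrap the implementation in Idea #3. A reader of this article described in a blog post how to do this in true O(n), without the sorting step, which alone costs O(n*log(n)). The idea is to use a hash map that maps from the hash of a word to its possible matches:</p>
<pre class="brush: ruby;">
numbers = [1, 3, 7, 10, 10, 12, 17, 20, 22, 23, 24, 24, 25, 26, 30]
target = 27   # S: the target sum
seen = {}     # M: the numbers encountered so far
numbers.each do |e|
  # if target - e was seen before, (target - e, e) is a pair
  puts &quot;#{target - e} + #{e}&quot; if seen[target - e]
  seen[e] = true   # add e to M
end
</pre>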
<p>Just use a hash map that maps from the cumulative hash of a word to a list of words that have the same hash. Whenever a new word is added, get the list of words that is stored under <tt>query.sum - current_word.sum</tt>. Since several words can share the same hash, each key holds a list of all words with that hash, and each entry of the list is checked sequentially with the full letter comparison. This is just normal hash collision handling through a linked list. That&#8217;s very simple and works like a charm.</p>
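<p>Stripped of all file handling, the idea looks roughly like this. The sketch is a simplified illustration and not the optimized listing below. The name <tt>two_word_anagrams</tt> and the tiny word list are assumptions made for the example.</p>

```ruby
# Hypothetical helper: finds all pairs of words that together are an anagram of query.
def two_word_anagrams(words, query)
  sorted_query = query.chars.sort
  by_sum = Hash.new { |hash, key| hash[key] = [] }   # letter sum => words with that sum
  result = []
  words.each do |word|
    # only words stored under the missing sum can complete the query
    by_sum.fetch(query.sum - word.sum, []).each do |other|
      result << [other, word] if (other + word).chars.sort == sorted_query
    end
    by_sum[word.sum] << word
  end
  result
end

p two_word_anagrams(%w[coming tuned men go], "documenting")   # [["coming", "tuned"]]
```

<p>Each word is looked at once, and the expensive sorted comparison only runs for words that already passed the sum check.</p>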
<p>I have revised the code, it got both simpler and faster. That&#8217;s a win-win situation, wohoo! </p>
<h1>The Sourcecode</h1>
<p>I hope the code is understandable now with the above explanation. If you have any questions or ideas, please share them here!</p>
<pre class="brush: ruby;">
#!/usr/bin/ruby

# created by Martin Ankerl http://martin.ankerl.com/

class String
	# creates an array of characters
	def letters
		unpack(&quot;c*&quot;)
	end
end

class Array
	# converts an array of letters back into a String
	def word
		pack(&quot;c*&quot;)
	end
end

query = &quot;documenting&quot;
query_letters_sorted = query.letters.sort
txt = File.read('wordlist.txt').downcase

# to quickly check if a letter is part of the query word
used_letters = Array.new(256, nil)
query_letters_sorted.each do |letter|
	used_letters[letter] = true
end

# Maps from cumulative hash of a word to a list of words that have this hash code.
hashToWords = Hash.new do |hash, key|
	hash[key] = Array.new
end

query_hash = query.sum

prev = 0
txt_size = txt.size
separator = &quot;\n&quot;
idx = txt.index(separator, prev)
while prev &lt; txt_size

	letter_idx = prev

	# no need to check end of word because it is \n
	# which is not part of the word anyways
	# (getbyte returns the byte value; String#[] returns a String since Ruby 1.9)
	while used_letters[txt.getbyte(letter_idx)]
		letter_idx += 1
	end

	# ignore word if the above quick check fails
	if letter_idx == idx
		word = txt[prev, idx-prev]

		# check all key matches
		key = word.sum
		hashToWords[query_hash - key].each do |other_word|
			if (word.letters + other_word.letters).sort == query_letters_sorted
				puts &quot;#{word} #{other_word}&quot;
				puts &quot;#{other_word} #{word}&quot;
			end
		end

		# insert word
		hashToWords[key] &lt;&lt; word
	end

	prev = idx + 1

	# no need to check end of file because we have to end with new line
	idx = txt.index(separator, prev)
end
</pre>
<p>When you rewrite the algorithm in C++ or Java or Python I am sure it will be faster than this one. But this is not the point of this post. The point is, &#8220;The Best Optimizer is between Your Ears&#8221; (Michael Abrash, <a href="http://www.byte.com/abrash/">Graphics Programming Black Book</a>).</p>
<p>Have fun!</p>
]]></content:encoded>
			<wfw:commentRss>http://martin.ankerl.com/2008/08/09/two-word-anagram-finder-algorithm/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>Optimized pow() approximation for Java, C / C++, and C#</title>
		<link>http://martin.ankerl.com/2007/10/04/optimized-pow-approximation-for-java-and-c-c/</link>
		<comments>http://martin.ankerl.com/2007/10/04/optimized-pow-approximation-for-java-and-c-c/#comments</comments>
		<pubDate>Thu, 04 Oct 2007 22:48:08 +0000</pubDate>
		<dc:creator>Martin Ankerl</dc:creator>
				<category><![CDATA[C++]]></category>
		<category><![CDATA[benchmark]]></category>
		<category><![CDATA[coding]]></category>
		<category><![CDATA[java]]></category>
		<category><![CDATA[linux]]></category>
		<category><![CDATA[news]]></category>
		<category><![CDATA[programming]]></category>
		<category><![CDATA[science]]></category>
		<category><![CDATA[tricks]]></category>
		<category><![CDATA[floating point]]></category>
		<category><![CDATA[optimization]]></category>

		<guid isPermaLink="false">http://martin.ankerl.com/?p=96</guid>
		<description><![CDATA[I have already written about approximations of e^x, log(x) and pow(a, b) in my post Optimized Exponential Functions for Java. Now I have more In particular, the pow() function is now even faster, simpler, and more accurate. Without further ado, I proudly give you the brand new approximation: Approximation of pow() in Java public static [...]]]></description>
			<content:encoded><![CDATA[<p>I have already written about approximations of <tt>e^x</tt>, <tt>log(x)</tt> and <tt>pow(a, b)</tt> in my post <a href="http://martin.ankerl.com/2007/02/11/optimized-exponential-functions-for-java/">Optimized Exponential Functions for Java</a>. Now I have more <img src='http://martin.ankerl.com/wp-includes/images/smilies/icon_smile.gif' alt=':-)' class='wp-smiley' />  In particular, the <tt>pow()</tt> function is now even faster, simpler, and more accurate. Without further ado, I proudly give you the brand new approximation:</p>
<h1>Approximation of pow() in Java</h1>
<pre class="brush: java;">public static double pow(final double a, final double b) {
    final int x = (int) (Double.doubleToLongBits(a) &gt;&gt; 32);
    final int y = (int) (b * (x - 1072632447) + 1072632447);
    return Double.longBitsToDouble(((long) y) &lt;&lt; 32);
}</pre>
<p>This is really very compact. The calculation only requires 2 shifts, 1 mul, 2 add, and 2 register operations. That&#8217;s it! In my tests it is usually within an error margin of 5% to 12%, in extreme cases sometimes up to 25%. A careful analysis is left as an exercise for the reader. This is very usable in e.g. <a href="http://en.wikipedia.org/wiki/Metaheuristic">metaheuristics</a> or <a href="http://en.wikipedia.org/wiki/Artificial_neural_network">neural nets</a>.</p>
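<p>If you want to get a feeling for the error yourself, a small Ruby port is handy for experiments. This is only a sketch: it uses <tt>pack</tt> and <tt>unpack1</tt> in place of <tt>Double.doubleToLongBits</tt>, assumes <tt>a &gt; 0</tt>, and the names <tt>approx_pow</tt> and <tt>relative_error</tt> are made up for the example.</p>

```ruby
MAGIC = 1072632447

# Ruby port of the approximation, for experimenting only (assumes a > 0).
def approx_pow(a, b)
  hi = [a].pack('G').unpack1('N')        # upper 32 bits of the IEEE 754 double
  y = (b * (hi - MAGIC) + MAGIC).to_i    # same arithmetic as the Java version
  [y, 0].pack('NN').unpack1('G')         # the lower 32 bits are set to zero
end

# Relative deviation from Ruby's exact ** operator.
def relative_error(a, b)
  exact = a**b
  (approx_pow(a, b) - exact).abs / exact
end

[[2.0, 2.0], [3.0, 1.5], [10.0, 0.5], [1.5, 7.0]].each do |a, b|
  printf("pow(%.1f, %.1f): exact %.4f, approx %.4f, error %.1f%%\n",
         a, b, a**b, approx_pow(a, b), 100 * relative_error(a, b))
end
```

<p>The port is of course far slower than <tt>**</tt> in Ruby. It is only meant for looking at the accuracy, not the speed.</p>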
<p>I use Linux, Java 1.6.0-b105 with the server VM, and execute the benchmark with this command:
<pre>sudo nice -n -20 java -cp . -server PowTest</pre>
<p> The approximation is <b>27 times faster</b> than Math.pow() on my Pentium-M. On a Pentium 4 it is <b>41 times faster</b>. Unfortunately, microbenchmarks are difficult to do in Java, so your mileage may vary. You can download the benchmark <a href="/files/PowTest.java">PowTest.java</a> and have a look, I have tried to prevent overoptimization while still having a low overhead.</p>
<h1>Approximation of pow() in C and C++</h1>
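<pre class="brush: cpp;">double pow(double a, double b) {
    // read the upper 32 bits of a (assumes little-endian and 32-bit int)
    int tmp = (*(1 + (int *)&amp;a));
    int tmp2 = (int)(b * (tmp - 1072632447) + 1072632447);
    double p = 0.0;
    // write the result into the upper 32 bits of p, the lower 32 bits stay 0
    *(1 + (int * )&amp;p) = tmp2;
    return p;
}</pre>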
<p>Compiled on my Pentium-M with gcc 4.1.2:
<pre>gcc -O3 -march=pentium-m -fomit-frame-pointer -fno-strict-aliasing</pre>
<p>This version is <b>7.8 times</b> faster than pow() from the standard library.</p>
<p><strong>WARNING</strong>! You HAVE to use the <tt>-fno-strict-aliasing</tt> option, or this does not work!</p>
<h1>Approximation of pow() in C#</h1>
<p>Jason Jung has posted a port of this code to C#: </p>
<pre class="brush: csharp;">public static double PowerA(double a, double b) {
  int tmp = (int)(BitConverter.DoubleToInt64Bits(a) &gt;&gt; 32);
  int tmp2 = (int)(b * (tmp - 1072632447) + 1072632447);
  return BitConverter.Int64BitsToDouble(((long)tmp2) &lt;&lt; 32);
}</pre>
<h1>How the Approximation was Developed</h1>
<p>It is quite impossible to understand what is going on in this function, it just magically works. To shed a bit more light on it, here is a detailed description of how I developed it.</p>
<h2>Approximation of e^x</h2>
<p>As described <a href="http://martin.ankerl.com/2007/02/11/optimized-exponential-functions-for-java/">here</a>, the paper &#8220;<a href="http://citeseer.ist.psu.edu/schraudolph98fast.html">A Fast, Compact Approximation of the Exponential Function</a>&#8221; develops a C macro that does a good job at exploiting the IEEE 754 floating-point representation to calculate <tt>e^x</tt>. This macro translates straightforwardly into Java code, which looks like this:</p>
<pre class="brush: java;">public static double exp(double val) {
    final long tmp = (long) (1512775 * val + (1072693248 - 60801));
    return Double.longBitsToDouble(tmp &lt;&lt; 32);
}</pre>
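<p>To see what the constants do, here is a Ruby port to play with (a sketch for illustration, the name <tt>approx_exp</tt> is made up). <tt>1072693248</tt> is <tt>1023 * 2^20</tt>, the exponent bias of a double shifted into the upper 32 bits, and <tt>1512775</tt> is roughly <tt>2^20 / ln(2)</tt>. The value <tt>60801</tt> is the correction term taken over from the macro.</p>

```ruby
# Ruby port of the exp() approximation: builds the upper 32 bits of the
# result directly and leaves the lower 32 bits at zero.
def approx_exp(val)
  hi = (1512775 * val + (1072693248 - 60801)).to_i
  [hi, 0].pack('NN').unpack1('G')
end

[0.0, 1.0, 2.5, -1.0].each do |x|
  printf("exp(%.1f): exact %.5f, approx %.5f\n", x, Math.exp(x), approx_exp(x))
end
```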
<h2>Use Exponential Functions for a^b</h2>
<p>Thanks to the power of math, we know that <tt>a^b</tt> can be transformed like this:</p>
<ol>
<li>Take exponential
<pre>a^b = e^(ln(a^b))</pre>
<li>Extract b
<pre>a^b = e^(ln(a)*b)</pre>
</ol>
<p>Now we have expressed the pow calculation with <tt>e^x</tt> and <tt>ln(x)</tt>. We already have the <tt>e^x</tt> approximation, but no good <tt>ln(x)</tt>. The <a href="http://martin.ankerl.com/2007/02/11/optimized-exponential-functions-for-java/">old approximation</a> is very bad, so we need a better one. So what now?</p>
<h2>Approximation of ln(x)</h2>
<p>Here comes the big trick: Remember that we have the nice <tt>e^x</tt> approximation? Well, <tt>ln(x)</tt> is exactly the inverse function! That means we just need to transform the above approximation so that the output of <tt>e^x</tt> is transformed back into the original input.</p>
<p>That&#8217;s not too difficult. Have a look at the above code, we now take the output and move backwards to undo the calculation. First reverse the shift:</p>
<pre>final double tmp = (Double.doubleToLongBits(val) &gt;&gt; 32);</pre>
<p>Now solve the equation
<pre>tmp = (1512775 * val + (1072693248 - 60801))</pre>
<p> for val:</p>
<ol>
<li>The original formula
<pre>tmp = (1512775 * val + (1072693248 - 60801))</pre>
<li>Perform subtraction
<pre>tmp = 1512775 * val + 1072632447</pre>
<li>Bring value to other side
<pre>tmp - 1072632447 = 1512775 * val</pre>
<li>Divide by factor
<pre>(tmp - 1072632447) / 1512775 = val</pre>
<li>Finally, val on the left side
<pre>val = (tmp - 1072632447) / 1512775</pre>
</ol>
<p>Voilà, now we have a nice approximation of <tt>ln(x)</tt>:</p>
<pre class="brush: java;">public double ln(double val) {
    final double x = (Double.doubleToLongBits(val) &gt;&gt; 32);
    return (x - 1072632447) / 1512775;
}</pre>
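<p>The same inversion as a Ruby sketch, again only for experimenting (the name <tt>approx_ln</tt> is an assumption, and the input has to be positive):</p>

```ruby
# Ruby port of the ln() approximation: reads the upper 32 bits of the
# double and undoes the linear mapping of the exp() approximation.
def approx_ln(val)
  hi = [val].pack('G').unpack1('N')
  (hi - 1072632447) / 1512775.0
end

[1.0, Math::E, 10.0, 1000.0].each do |x|
  printf("ln(%.3f): exact %.5f, approx %.5f\n", x, Math.log(x), approx_ln(x))
end
```

<p>Because the lower 32 bits of the input are ignored, two inputs that differ only there give exactly the same result.</p>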
<h2>Combine Both Approximations</h2>
<p>Finally we can combine the two approximations into <tt>e^(ln(a) * b)</tt>:</p>
<pre class="brush: java;">public static double pow1(final double a, final double b) {
    // calculate ln(a)
    final double x = (Double.doubleToLongBits(a) &gt;&gt; 32);
    final double ln_a = (x - 1072632447) / 1512775;

    // ln(a) * b
    final double tmp1 = ln_a * b;

    // e^(ln(a) * b)
    final long tmp2 = (long) (1512775 * tmp1 + (1072693248 - 60801));
    return Double.longBitsToDouble(tmp2 &lt;&lt; 32);
}</pre>
<p>Between the two shifts, we can simply insert the <tt>tmp1</tt> calculation into the tmp2 calculation to get</p>
<pre class="brush: java;">public static double pow2(final double a, final double b) {
    final double x = (Double.doubleToLongBits(a) &gt;&gt; 32);
    final long tmp2 = (long) (1512775 * (x - 1072632447) / 1512775 * b + (1072693248 - 60801));
    return Double.longBitsToDouble(tmp2 &lt;&lt; 32);
}</pre>
<p>Now simplify <tt>tmp2</tt> calculation:</p>
<ol>
<li>The original formula
<pre>tmp2 = (1512775 * (x - 1072632447) / 1512775 * b + (1072693248 - 60801))</pre>
<li>We can drop the factor <tt>1512775</tt>
<pre>tmp2 = (x - 1072632447) * b + (1072693248 - 60801)</pre>
<li>And finally, calculate the subtraction
<pre>tmp2 = b * (x - 1072632447) + 1072632447</pre>
</ol>
<h2>The Result</h2>
<p>That&#8217;s it! Add some casts, and the complete function is the same as above.</p>
<pre class="brush: java;">public static double pow(final double a, final double b) {
    final int tmp = (int) (Double.doubleToLongBits(a) &gt;&gt; 32);
    final int tmp2 = (int) (b * (tmp - 1072632447) + 1072632447);
    return Double.longBitsToDouble(((long) tmp2) &lt;&lt; 32);
}</pre>
<p>This concludes my little tutorial on microoptimization of the pow() function. If you have come this far, I congratulate you on your persistence <img src='http://martin.ankerl.com/wp-includes/images/smilies/icon_smile.gif' alt=':-)' class='wp-smiley' /> </p>
<p><strong>UPDATE</strong> Several other approximate <tt>pow</tt> calculation methods have been developed recently. Here are some that I have found through <a href="http://www.reddit.com/r/programming/comments/8kftl/fast_pow_approximation_in_java_and_c/">reddit</a>:</p>
<ul>
<li><a href="http://www.hxa.name/articles/content/fast-pow-adjustable_hxa7241_2007.html">Fast pow() With Adjustable Accuracy</a> &#8212; This looks quite a bit more sophisticated and precise than my approximation. Written in C and for float values. A Java port should not be too difficult.
</li>
<li><a href="http://jrfonseca.blogspot.com/2008/09/fast-sse2-pow-tables-or-polynomials.html">Fast SSE2 pow: tables or polynomials?</a> &#8212; Uses <a href="http://en.wikipedia.org/wiki/Streaming_SIMD_Extensions">SSE</a> operations and seems to be a bit faster than the table approach from the link above, with the potential to scale better due to less cache usage.
</li>
</ul>
<p>Please post what you think about this!</p>
]]></content:encoded>
			<wfw:commentRss>http://martin.ankerl.com/2007/10/04/optimized-pow-approximation-for-java-and-c-c/feed/</wfw:commentRss>
		<slash:comments>26</slash:comments>
		</item>
		<item>
		<title>Ajax Dojo Comet Tutorial</title>
		<link>http://martin.ankerl.com/2007/08/21/ajax-dojo-comet-tutorial/</link>
		<comments>http://martin.ankerl.com/2007/08/21/ajax-dojo-comet-tutorial/#comments</comments>
		<pubDate>Tue, 21 Aug 2007 12:41:22 +0000</pubDate>
		<dc:creator>Martin Ankerl</dc:creator>
				<category><![CDATA[ajax]]></category>
		<category><![CDATA[coding]]></category>
		<category><![CDATA[java]]></category>
		<category><![CDATA[programming]]></category>
		<category><![CDATA[tutorial]]></category>
		<category><![CDATA[videos]]></category>
		<category><![CDATA[comet]]></category>
		<category><![CDATA[dojo]]></category>
		<category><![CDATA[howto]]></category>

		<guid isPermaLink="false">http://martin.ankerl.com/?p=92</guid>
		<description><![CDATA[EDIT: This tutorial is for an old version of dojo / comet, and it will not work in a recent version! Markus Holzmann, an intern at Profactor of my fellow colleague Philipp Hartl, had the opportunity to experiment with Ajax during his job. He wrote a tutorial about how to push events from the server [...]]]></description>
			<content:encoded><![CDATA[<p><strong>EDIT</strong>: This tutorial is for an old version of dojo / comet, and it will not work in a recent version!</p>
<p>Markus Holzmann, an intern of my colleague <a href="http://leanaustria.net/">Philipp Hartl</a> at <a href="http://www.profactor.at/">Profactor</a>, had the opportunity to experiment with <a href="http://en.wikipedia.org/wiki/Ajax_(programming)">Ajax</a> during his job. He wrote a tutorial about how to push events from the server to the client. For example, display popup messages on all browsers at the same time (see screencast in <a href="/files/hello_comet.html" target="_blank">full resolution here</a>):<br />
<center><br />
<a href="/files/hello_comet.html" target="_blank">  <object width="400" height="317"><param name="wmode" value="transparent"></param>
    <embed src="/files/Hello_Comet.swf" wmode="transparent" width="401" height="317" type="application/x-shockwave-flash"></embed></object><br />
</a><br />
</center><br />
Read on how Markus did this:</p>
<p><span id="more-92"></span></p>
<h1>Cometd Hello World</h1>
<p>
I&#8217;ve read Chris Bucchere&#8217;s <a href="http://thebdgway.blogspot.com/2006/11/say-hello-world-to-comet.html">Say Hello World to Comet</a> and built an application based on this using a more current version of <a href="http://www.mortbay.org/">Jetty</a> (version <a href="http://dist.codehaus.org/jetty/jetty-6.1.5/">6.1.5</a>) which I embedded into a <a href="http://tomcat.apache.org/">Tomcat</a> v5.5 Server. For development I used <a href="http://www.eclipse.org/">Eclipse</a> 3.2.</p>
<h1>Start your Engines</h1>
<p>At first you have to get the server running. As I mentioned, I embedded Jetty into a Tomcat server. Therefore you have to configure the libraries:</p>
<ol>
<li>Add the packages <tt>org.mortbay.cometd</tt> and <tt>dojox.cometd</tt> to your source folder and delete the <tt>client</tt> package in the <tt>org.mortbay.cometd</tt> package.</li>
<li>Add <tt>jetty-util-6.1.5.jar</tt>, <tt>jetty-6.1.5.jar</tt> and <tt>servlet-api-2.5-6.1.5.jar</tt> to your build path.</li>
<li>Copy the <tt>jetty-util-6.1.5.jar</tt> file into the <tt>/lib</tt> folder in the <tt>WEB-INF</tt> directory.</li>
</ol>
<p>Replace the existing servlets in the <tt>web.xml</tt> file in the <tt>WEB-INF</tt> folder with the following servlets:</p>
<pre>&lt;servlet&gt;
  &lt;servlet-name&gt;cometd&lt;/servlet-name&gt;
  &lt;servlet-class&gt;org.mortbay.cometd.continuation.ContinuationCometdServlet&lt;/servlet-class&gt;
  &lt;load-on-startup&gt;1&lt;/load-on-startup&gt;
&lt;/servlet&gt;
&lt;servlet-mapping&gt;
  &lt;servlet-name&gt;cometd&lt;/servlet-name&gt;
  &lt;url-pattern&gt;/cometd/*&lt;/url-pattern&gt;
 &lt;/servlet-mapping&gt;</pre>
<p>For the project I used the <a href="http://dojotoolkit.org/">dojo toolkit</a> (version 0.4.3) which has an integrated <a href="http://www.cometd.com/">COMETd</a> class that makes it easy to build comet projects. <a href="http://download.dojotoolkit.org/release-0.4.3/dojo-0.4.3-ajax.tar.gz">Download it</a> and add it to your <tt>WebContent</tt> folder.</p>
<p>
When you&#8217;ve done all this, the hardest piece of work for this program is already done.</p>
<h1>Hack the Code</h1>
<p>Now you can implement the code for the client side: You need a HTML file with a button on it. The code for this looks like this (<a href="/files/hello_comet_test.html">download</a>):</p>
<pre>&lt;html&gt;
  &lt;head&gt;
    &lt;script type=&quot;text/javascript&quot; src=&quot;../dojo.js&quot;&gt;&lt;/script&gt;
    &lt;script type=&quot;text/javascript&quot;&gt;<b>
      dojo.require(&quot;dojo.io.cometd&quot;);

      cometd.init({}, &quot;cometd&quot;);

      cometd.subscribe(&quot;/hello/world&quot;, false, &quot;publishHandler&quot;);

      publishHandler = function(msg) {
        alert(msg.data.test);
      }</b>
    &lt;/script&gt;
  &lt;/head&gt;
  &lt;body&gt;
    &lt;input type=&quot;button&quot;
       onclick=&quot;<b>cometd.publish('/hello/world', { test: 'hello world' } )</b>&quot;
       value=&quot;Click Me!&quot;&gt;
  &lt;/body&gt;
&lt;/html&gt;</pre>
<p>Line by line, the above bold code works like this:</p>
<ol>
<li>In the line
<pre>&lt;script type="text/javascript" src="../dojo.js"&gt;&lt;/script&gt;</pre>
<p> you integrate the dojo toolkit into the project.</p>
<li>To activate the cometd class of dojo:
<pre>dojo.require("dojo.io.cometd");</pre>
<li>Connect the server with the client:
<pre>cometd.init({}, "cometd");</pre>
<li>Subscribe to the channel <tt>/hello/world</tt> and name the callback that handles messages published on it:
<pre>cometd.subscribe("/hello/world", false, "publishHandler");</pre>
<li>Last but not least, the <tt>publishHandler</tt> function serves as the callback function, which uses <tt>alert</tt> to show a simple message box:
<pre>publishHandler = function(msg) {
  alert(msg.data.test);
}</pre>
</ol>
<h1>Give it a Try</h1>
<p>When you load the HTML file now, you can click on the button and an alert box saying <i>hello world</i> will appear:</p>
<p>
<center><img src="/files/helloworld.png" width="324" height="124" /></center></p>
<p>
The reason for this is that when you click the button, the code
<pre>cometd.publish('/hello/world', { test: 'hello world' } )</pre>
<p> is executed which publishes a text on the channel with the id <tt>/hello/world</tt>.</p>
<p>
The funny thing is that this works with any number of browsers. Every time a client clicks the button, the alert box is shown on <i>all</i> browsers that view this page (see screencast above).</p>
<h1>Pushing Data from Server to Client</h1>
<p>You can also add serverside code to trigger an event. I wrote a JSP file with the following code:</p>
<pre>&lt;%@page import="java.util.*"%&gt;
&lt;%@page import="dojox.cometd.*" %&gt;
&lt;%
Bayeux b = (Bayeux)getServletContext().getAttribute(Bayeux.DOJOX_COMETD_BAYEUX);
Channel c = b.getChannel("/hello/world",false);

Map&lt;String,Object&gt; message = new HashMap&lt;String,Object&gt;();
message.put("test", "jsp: hello world");

c.publish(b.newClient("server_user",null),message, "new server message");
%&gt;</pre>
<p>When this page is loaded, an alert popup saying <i>jsp: hello world</i> appears on the client page.</p>
<p>That&#8217;s it. Happy hacking!</p>
]]></content:encoded>
			<wfw:commentRss>http://martin.ankerl.com/2007/08/21/ajax-dojo-comet-tutorial/feed/</wfw:commentRss>
		<slash:comments>27</slash:comments>
		</item>
		<item>
		<title>Erlang Syntax Highlighting</title>
		<link>http://martin.ankerl.com/2007/05/06/erlang-syntax-highlighting/</link>
		<comments>http://martin.ankerl.com/2007/05/06/erlang-syntax-highlighting/#comments</comments>
		<pubDate>Sun, 06 May 2007 18:55:47 +0000</pubDate>
		<dc:creator>Martin Ankerl</dc:creator>
				<category><![CDATA[coding]]></category>
		<category><![CDATA[erlang]]></category>
		<category><![CDATA[linux]]></category>
		<category><![CDATA[news]]></category>
		<category><![CDATA[open source]]></category>
		<category><![CDATA[tricks]]></category>
		<category><![CDATA[editor]]></category>
		<category><![CDATA[gedit]]></category>
		<category><![CDATA[programming]]></category>
		<category><![CDATA[syntax highlighting]]></category>

		<guid isPermaLink="false">http://martin.ankerl.com/?p=88</guid>
		<description><![CDATA[I have written a language definition file for GtkSourceView to get a nice syntax highlighting for Erlang with applications that use this component, e.g. Gnome&#8217;s standard editor gedit. The highlighting looks like this: Here is how to get this to work: Install Erlang Language File Download the file erlang.lang. Copy the file to /usr/share/gtksourceview-1.0/language-specs/erlang.lang. Start [...]]]></description>
			<content:encoded><![CDATA[<p>I have written a language definition file for <a href="http://gtksourceview.sourceforge.net/">GtkSourceView</a> to get nice syntax highlighting for <a href="http://www.erlang.org/">Erlang</a> in applications that use this component, e.g. Gnome&#8217;s standard editor <a href="http://www.gnome.org/projects/gedit/">gedit</a>.</p>
<p>The highlighting looks like this:<br />
<center><br />
<img src="/files/erlang-gedit.png" alt="Screenshot gedit with Erlang sourcecode" width="420" height="322" /><br />
</center></p>
<p>Here is how to get this to work:<br />
<span id="more-88"></span></p>
<h1>Install Erlang Language File</h1>
<ol>
<li>Download the file <a href="/files/erlang.lang">erlang.lang</a>.</li>
<li>Copy the file to <tt>/usr/share/gtksourceview-1.0/language-specs/erlang.lang</tt>.</li>
<li>Start gedit, open an Erlang file, and choose <tt>View</tt> &gt; <tt>Highlight Mode</tt> &gt; <tt>Sources</tt> &gt; <tt>Erlang</tt>.</li>
</ol>
<h1>Automatically Recognize *.erl</h1>
<p>If you want gedit to automatically recognize and highlight all <tt>.erl</tt> files, you have to define the mime type (more info is <a href="http://zerokspot.com/node/35">here</a>):</p>
<ol>
<li>Create a directory for overriding mime types. On the command line, type:
<pre>mkdir -p ~/.local/share/mime/packages</pre>
</li>
<li>Download the custom mime file <a href="/files/Override.xml">Override.xml</a> into this directory (if the file already exists, you have to copy the relevant lines by hand):
<pre>cd ~/.local/share/mime/packages
wget http://martin.ankerl.com/files/Override.xml</pre>
</li>
<li>Update the mime database by merging the file:
<pre>update-mime-database ~/.local/share/mime</pre>
</li>
<li>Restart nautilus (or log out &#038; log in again):
<pre>killall nautilus</pre>
</li>
</ol>
<p>There you go, Erlang code in all its glory. Happy hacking!</p>
]]></content:encoded>
			<wfw:commentRss>http://martin.ankerl.com/2007/05/06/erlang-syntax-highlighting/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
		<item>
		<title>TextAnalyzer &#8211; Automatically Extract Characteristic Words</title>
		<link>http://martin.ankerl.com/2007/01/09/textanalyzer-automatically-extract-characteristic-words/</link>
		<comments>http://martin.ankerl.com/2007/01/09/textanalyzer-automatically-extract-characteristic-words/#comments</comments>
		<pubDate>Tue, 09 Jan 2007 14:36:56 +0000</pubDate>
		<dc:creator>Martin Ankerl</dc:creator>
				<category><![CDATA[freeware]]></category>
		<category><![CDATA[programming]]></category>
		<category><![CDATA[ruby]]></category>
		<category><![CDATA[characteristic words]]></category>
		<category><![CDATA[textanalyzer]]></category>

		<guid isPermaLink="false">http://martin.ankerl.com/?p=80</guid>
		<description><![CDATA[TextAnalyzer is a text analyzer tool that finds words that are characteristic for a given input file. It is independent from any language, and even seems to work well with HTML files. This program is only a little prototype that shows that this technique seems to work. It&#8217;s public domain, feel free to do [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://martin.ankerl.com/files/textanalyze.rb">TextAnalyzer</a> is a text analyzer tool that finds words that are characteristic for a given input file. It is independent from any language, and even seems to work well with HTML files.</p>
<p>This program is only a little prototype that shows that this technique seems to work. It&#8217;s public domain, feel free to do whatever you like with it:<br />
<span id="more-80"></span></p>
<h1>Download</h1>
<p><a href="http://martin.ankerl.com/files/textanalyze.rb">textanalyze.rb</a>, Licence: Public Domain.</p>
<h1>Example</h1>
<ol>
<li>Build an index from a reasonably large amount of data; it should be much larger than the text you want to analyze. For example, I have indexed 76 of Grimm&#8217;s fairy tales with this command:
<p><code>cat *.txt | ruby ../textanalyze.rb c</code></p>
<p>This creates the file <tt>wordcount.dat</tt>, which contains the count of each word.</p></li>
<li>To find out which words are characteristic for a specific text, the previously generated reference data is used. To continue the example:
<p><code>cat LittleRedRidingHood.txt | ruby ../textanalyze.rb a</code></p>
<p>This produces the output:</p>
<p><code>hood, grandma, riding, hunter, red</code></p>
<p>So the above words seem to be very relevant to LittleRedRidingHood.txt when compared to all of Grimm&#8217;s tales.</p>
</li>
</ol>
<h1>Other Uses</h1>
<p>The previous example seems a bit useless, but there certainly are a lot of useful applications. Here are some ideas:</p>
<ul>
<li>Quickly find out what an unknown text is about</li>
<li>Automatically extract important words from blog entries</li>
<li>Find out what a text is about by reading just 5 words</li>
<li>Automatically create very short descriptions for a large number of documents</li>
</ul>
<p>The currently implemented algorithm even works well with HTML files (To my own surprise. Actually, I am surprised that it works at all…)</p>
<h1>Algorithm</h1>
<p>The main idea is quite simple: the algorithm assumes that important words are:</p>
<ol>
<li>Often used in the to-be-analyzed text</li>
<li>Seldom used in other texts</li>
</ol>
<p>For example, the second condition ensures that words like &#8220;the&#8221;, &#8220;and&#8221; etc. are not considered important.</p>
<p>The score of a word (higher means more important) is calculated with this formula:</p>
<pre>tanh(curVal/curWords*200) - 5*tanh((allVal-curVal)/(allWords-curWords)*200)</pre>
<p>The variables:</p>
<ul>
<li><tt>curVal</tt>: How often the word to score is present in the to-be-analyzed text.</li>
<li><tt>curWords</tt>: Total number of words in the to-be-analyzed text.</li>
<li><tt>allVal</tt>: How often the word to score is present in the indexed dataset.</li>
<li><tt>allWords</tt>: Total number of words of the indexed dataset.</li>
</ul>
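<p>As a minimal sketch, the formula above could be written in Ruby like this. The method names <tt>score</tt> and <tt>characteristic_words</tt> and the hash-based inputs are assumptions made for illustration; they are not taken from <tt>textanalyze.rb</tt>. The sketch also assumes that the indexed dataset includes the analyzed text, which is what the subtraction <tt>allVal-curVal</tt> suggests.</p>

```ruby
# Sketch only: names and data layout are hypothetical, the formula is from the post.
# cur_val:   how often the word occurs in the text to analyze
# cur_words: total number of words in the text to analyze
# all_val:   how often the word occurs in the indexed dataset
# all_words: total number of words in the indexed dataset
def score(cur_val, cur_words, all_val, all_words)
  other_words = all_words - cur_words
  # Assumption: if the index holds nothing besides the text itself,
  # treat the word as unused elsewhere instead of dividing by zero.
  other_freq = other_words > 0 ? (all_val - cur_val).to_f / other_words : 0.0
  Math.tanh(cur_val.to_f / cur_words * 200) - 5 * Math.tanh(other_freq * 200)
end

# cur_counts / all_counts map each word to its count (hypothetical layout).
# Returns the `top` highest-scoring words of the analyzed text.
def characteristic_words(cur_counts, all_counts, top = 5)
  cur_words = cur_counts.values.sum
  all_words = all_counts.values.sum
  cur_counts.keys.sort_by do |word|
    # A word missing from the index is assumed to occur only in the text.
    all_val = all_counts.fetch(word, cur_counts[word])
    -score(cur_counts[word], cur_words, all_val, all_words)
  end.first(top)
end
```

<p>For example, with the counts <tt>{"the" =&gt; 10, "hood" =&gt; 5}</tt> for the text and <tt>{"the" =&gt; 1000, "hood" =&gt; 5, "king" =&gt; 50}</tt> for the index, the word &#8220;hood&#8221; scores about 1 and &#8220;the&#8221; about -4, so &#8220;hood&#8221; is ranked first.</p>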
<p>Please don&#8217;t ask me how or why this works. I have no idea. I invented this formula in one of the rare moments when I was enlightened for approximately 10 seconds, quickly wrote it down, and immediately forgot how it worked because my mind was overwhelmed by its beauty and simplicity&#8230; Or something like that <img src='http://martin.ankerl.com/wp-includes/images/smilies/icon_wink.gif' alt=';-)' class='wp-smiley' /> </p>
]]></content:encoded>
			<wfw:commentRss>http://martin.ankerl.com/2007/01/09/textanalyzer-automatically-extract-characteristic-words/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Three Laws of Software Development</title>
		<link>http://martin.ankerl.com/2007/01/05/three-laws-of-software-development/</link>
		<comments>http://martin.ankerl.com/2007/01/05/three-laws-of-software-development/#comments</comments>
		<pubDate>Fri, 05 Jan 2007 22:45:14 +0000</pubDate>
		<dc:creator>Martin Ankerl</dc:creator>
				<category><![CDATA[agile]]></category>
		<category><![CDATA[coding]]></category>
		<category><![CDATA[programming]]></category>
		<category><![CDATA[laws]]></category>
		<category><![CDATA[rant]]></category>

		<guid isPermaLink="false">http://martin.ankerl.com/?p=79</guid>
		<description><![CDATA[secretGeek has a nice blog entry about the three laws of Software Development, inspired by Isaac Asimov&#8217;s Laws of Robotics (minor adaptations from me): A developer must write code that creates value. A developer must make their code easy to maintain, except where such expenditure will conflict with the first law. A developer must reduce [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://www.secretgeek.net/">secretGeek</a> has a nice <a href="http://www.secretgeek.net/laws_3.asp">blog entry</a> about the three laws of Software Development, inspired by Isaac Asimov&#8217;s <a href="http://en.wikipedia.org/wiki/Three_Laws_of_Robotics">Laws of Robotics</a> (minor adaptations from me):</p>
<blockquote>
<ol>
<li>A developer must write code that creates value.</li>
<li>A developer must make their code easy to maintain, except where such expenditure will conflict with the first law.</li>
<li>A developer must reduce their code to the smallest size possible, as long as such reduction does not conflict with the first two laws.</li>
</ol>
</blockquote>
<p>That&#8217;s about all there is to software development. Print this out, make copies for all your coworkers, and engrave it in stone plates so that our children&#8217;s children will still be able to live by this timeless wisdom!</p>
]]></content:encoded>
			<wfw:commentRss>http://martin.ankerl.com/2007/01/05/three-laws-of-software-development/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Statistical Unit Tests with ensure4j</title>
		<link>http://martin.ankerl.com/2007/01/04/statistical-unit-tests-with-ensure4j/</link>
		<comments>http://martin.ankerl.com/2007/01/04/statistical-unit-tests-with-ensure4j/#comments</comments>
		<pubDate>Thu, 04 Jan 2007 20:54:33 +0000</pubDate>
		<dc:creator>Martin Ankerl</dc:creator>
				<category><![CDATA[agile]]></category>
		<category><![CDATA[freeware]]></category>
		<category><![CDATA[ideas]]></category>
		<category><![CDATA[java]]></category>
		<category><![CDATA[programming]]></category>
		<category><![CDATA[tdd]]></category>
		<category><![CDATA[ensure4j]]></category>
		<category><![CDATA[junit]]></category>
		<category><![CDATA[statistics]]></category>

		<guid isPermaLink="false">http://martin.ankerl.com/?p=78</guid>
		<description><![CDATA[As part of another project I am developing ensure4j. The syntax (see the examples here) is working quite nicely, ensure4j is already very useful for internal use. Lately I was busy adding tests that are able to verify if some code (e.g. an optimizer that uses random, like genetic algorithm, simulated annealing, &#8230;) produces the [...]]]></description>
			<content:encoded><![CDATA[<p>As part of another project I am developing <a href="http://martin.ankerl.com/2006/08/02/redesigning-junit-asserts/">ensure4j</a>. The syntax (see the examples <a href="http://martin.ankerl.com/2006/08/02/redesigning-junit-asserts/">here</a>) is working quite nicely, ensure4j is already very useful for internal use.</p>
<p>Lately I was busy adding tests that can verify whether some code (e.g. an optimizer that uses randomness, like a <a href="http://en.wikipedia.org/wiki/Genetic_algorithm">genetic algorithm</a>, <a href="http://en.wikipedia.org/wiki/Simulated_annealing">simulated annealing</a>, &#8230;) produces the desired result in e.g. 95% of the cases (Wikipedia has <a href="http://en.wikipedia.org/wiki/Confidence_interval#Practical_example">a nice practical example</a> for confidence intervals).</p>
<h1>Example Usage</h1>
<p>Here is an example that tests the nonsense code <tt>Math.random() * 2</tt>.</p>
<pre class="brush: java;">ensure(new Experiment() {
    public double measure() {
        return Math.random() * 2;
    }
}).between(0.9, 1.1, 0.95).sample(10, 100);</pre>
<p>The code most likely does not make much sense out of context like this, so here is an explanation of what it does:</p>
<p><span id="more-78"></span></p>
<h1>Explanation</h1>
<p>In that example we want to verify that the code returns values whose <strong>95% confidence interval is between 0.9 and 1.1</strong>. At first, <strong>10 samples</strong> are taken to verify this. If the mean is not within the expected interval, there is still the chance that this was just bad luck. We can take further samples (<strong>up to 1000</strong>) to rule bad luck out. If we have taken 1000 samples and the condition is still not met, there is a very high probability that the code does not produce the desired result, so we have to fail.</p>
<p>I hope the above description is somewhat understandable. It&#8217;s no piece of cake, and there is no way to cheat around the complexity involved in this kind of statistical test. I have tried to make the interface as clean and simple as possible; suggestions are always welcome.</p>
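<p>The sampling procedure described above can be sketched roughly as follows. This is written in Ruby for brevity and is not ensure4j&#8217;s implementation or API: the method name <tt>mean_within?</tt>, its parameters, and the use of a normal approximation (z = 1.96 for a 95% confidence level) are all assumptions made for illustration.</p>

```ruby
# Sketch only: not ensure4j code. Names and the normal approximation are assumptions.
# 1.96 is the two-sided normal quantile for a 95% confidence level.
Z_95 = 1.96

# Calls the block to take one measurement at a time. Returns true as soon as
# the confidence interval of the sample mean lies completely inside
# [low, high], and false if that never happens within max_samples.
# min_samples should be at least 2 so that a sample variance exists.
def mean_within?(low, high, min_samples, max_samples)
  samples = []
  while samples.size < max_samples
    value = yield
    samples << value.to_f
    next if samples.size < min_samples

    n = samples.size
    mean = samples.sum / n
    variance = samples.sum { |x| (x - mean)**2 } / (n - 1)
    half_width = Z_95 * Math.sqrt(variance / n)
    return true if mean - half_width >= low && mean + half_width <= high
  end
  false
end
```

<p>A call corresponding to the example would be <tt>mean_within?(0.9, 1.1, 10, 1000) { rand * 2 }</tt>. Note that re-checking the interval after every additional sample makes a lucky pass more likely than a single test with a fixed sample size would, so a real implementation probably needs to account for that.</p>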
<h1>Open Source?</h1>
<p>I am developing ensure4j partly at work, and partly in my free time. I would like to make it open source, but I need to get the OK from my boss first.</p>
<p>ensure4j is the only implementation I know of that provides statistical tests that can be used with JUnit. If you know of something similar, please post!</p>
]]></content:encoded>
			<wfw:commentRss>http://martin.ankerl.com/2007/01/04/statistical-unit-tests-with-ensure4j/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
	</channel>
</rss>
