<?xml version="1.0" encoding="UTF-8"?> <rss version="2.0" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:wfw="http://wellformedweb.org/CommentAPI/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:sy="http://purl.org/rss/1.0/modules/syndication/" xmlns:slash="http://purl.org/rss/1.0/modules/slash/" ><channel><title>Martin Ankerl &#187; ruby</title> <atom:link href="http://martin.ankerl.com/category/ruby/feed/" rel="self" type="application/rss+xml" /><link>http://martin.ankerl.com</link> <description>Chunky bacon!!</description> <lastBuildDate>Sat, 04 Feb 2012 10:18:10 +0000</lastBuildDate> <language>en</language> <sy:updatePeriod>hourly</sy:updatePeriod> <sy:updateFrequency>1</sy:updateFrequency> <item><title>Quickly Solving the &#8220;Instagram Engineering Challenge: The Unshredder&#8221;</title><link>http://martin.ankerl.com/2011/11/15/solving-the-instagram-challenge-quickly/</link> <comments>http://martin.ankerl.com/2011/11/15/solving-the-instagram-challenge-quickly/#comments</comments> <pubDate>Tue, 15 Nov 2011 20:39:47 +0000</pubDate> <dc:creator>martinus</dc:creator> <category><![CDATA[coding]]></category> <category><![CDATA[ruby]]></category><guid isPermaLink="false">http://martin.ankerl.com/?p=676</guid> <description><![CDATA[Today I have read about the Instagram Engineering Challenge: The Unshredder, and decided to give it a try. The task is simple to explain: Create a program that can unshred this image (do not try the challenge on this image, &#8230; <a href="http://martin.ankerl.com/2011/11/15/solving-the-instagram-challenge-quickly/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description> <content:encoded><![CDATA[<p>Today I have read about the <a href="http://instagram-engineering.tumblr.com/post/12651721845/instagram-engineering-challenge-the-unshredder">Instagram Engineering Challenge: The Unshredder</a>, and decided to give it a try. 
The task is simple to explain: Create a program that can unshred this image (do not try the challenge on this image, try the <a href="http://instagram-static.s3.amazonaws.com/images/TokyoPanoramaShredded.png">original PNG source</a> instead!):<br /><center><a href="http://martin.ankerl.com/wp-content/uploads/2011/11/TokyoPanoramaShredded.jpg?9d7bd4"><img src="http://martin.ankerl.com/wp-content/uploads/2011/11/TokyoPanoramaShredded.jpg?9d7bd4" alt="" title="TokyoPanoramaShredded" width="640" height="359" class="alignnone size-full wp-image-683" /></a><br /></center></p><p>I <a href="http://derstandard.at/plink/1319182777697?sap=2&#038;_pid=23674915#pid23674915">posted here</a> that I thought I could solve it in 2 hours, and got some downvotes for that, so I decided to really give it a try. Long story short, it took me about 2 hours and 35 minutes.</p><p>Before I started developing, I made some quick assumptions to simplify things:</p><ul><li>I want to code it in Ruby, the language in which I am most productive.</li><li>I can assume the width of the stripes is known, so I have hardcoded it.</li><li>The input image can be converted to RAW with an external tool, and the result is written as RAW as well.</li></ul><div>While coding I timed myself, and created a little timeline of my trials and errors. Since I wanted to finish as quickly as possible, the code is very ugly: no tests, some hardcoded constants, etc.</p><p><span id="more-676"></span></p><h1>Code</h1><p>Without further ado, I give you the code:</p></div><pre class="brush: ruby; title: ; notranslate"># 0:08 - how to read binary files...
# 0:14 - index RGB values
# 0:21 - difference calculation seems to provide a reasonable number
# 0:25 - create all pairings of each row (too slow: need to use logic for all slices)
# 0:30 - create all pairings
# 0:31 - sorted pairings
# 0:35 - copy and number image to mspaint to verify
# 0:36 - bugfix, wrong slice row calculation
# 0:48 - getting stuck with combining pairs
# 1:03 - recombining
# 1:14 - thinking about how to ensure not to combine invalid groups (duplicates!)
# 1:33 - got first reasonable order. Hopefully the correct order!!
# 1:47 - wrote first image, wrong order
# 1:53 - got an almost correct image! one slice was aligned wrong.
# 1:58 - no luck with grayscale
# 2:28 - damn black-white skyscraper!
# 2:29 - SUCCESSSS!!!!!!
#
# time not measured: I thought about the problem for about 5 minutes before I started hacking.
# So my time is probably about 2:34.
class Img
	attr_reader :w, :h, :slices
	def initialize
		# unpack as 8bit unsigned chars
		@d = IO.binread(&quot;TokyoPanoramaShredded.raw&quot;).unpack(&quot;C*&quot;)
		@w = 640
		@h = 359
		@slices = 20
		@sw = @w / @slices
	end

	def write(sorted)
		data = @d.dup
		source_slice = 0
		sorted.each do |target_slice|
			# copy one slice
			@sw.times do |w|
				@h.times do |h|
					target_idx = idx(w + @sw * source_slice, h)
					source_idx = idx(w + @sw * target_slice, h)
					data[target_idx] = @d[source_idx]
					data[target_idx+1] = @d[source_idx+1]
					data[target_idx+2] = @d[source_idx+2]
				end
			end
			source_slice += 1
		end

		File.open(&quot;out.raw&quot;, &quot;wb&quot;) do |f|
			f.write data.pack(&quot;C*&quot;)
		end
	end

	def idx(x, y)
		(y * @w + x) * 3
	end

	# top left is 0, 0
	def rgb(x, y)
		i = idx(x, y)
		[@d[i], @d[i+1], @d[i+2]]
	end	

	# calculate sum of difference between r, g, b of two columns
	def difference(x1, x2)
		diff = 0
		@h.times do |y|
			# find best in range. This is required because otherwise the black skyscraper fucks things up.
			lower = y - 15
			lower = 0 if (lower &lt; 0)

			upper = y + 15
			upper = @h-1 if upper &gt;= @h

			best = 1e100
			lower.upto(upper) do |r|
				v = 0
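				# note: both pixels are read from the same row r, so the result is the
				# smallest same-row difference within the window around y
				rgb1 = rgb(x1, r)
				rgb2 = rgb(x2, r)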

				d = rgb1[0] - rgb2[0]
				v += d*d
				d = rgb1[1] - rgb2[1]
				v += d*d
				d = rgb1[2] - rgb2[2]
				v += d*d

				best = v if (v &lt; best)
			end
			diff += best
		end
		diff
	end

	def slice_start_idx(s)
		@sw * s
	end

	def slice_end_idx(s)
		@sw * s + @sw - 1
	end
end
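
# Sketch, not part of the timed solution and not run on the challenge image:
# the two metrics from the Algorithm section on plain arrays, so they can be
# tried without a RAW file. col1 and col2 are hypothetical inputs: arrays of
# [r, g, b] triples, one per pixel row, both of the same length.
def pixel_distance(a, b)
	a.zip(b).inject(0) { |sum, (u, v)| sum + (u - v)**2 }
end

# first version: compare each pixel with its direct neighbor only
def naive_difference(col1, col2)
	col1.zip(col2).inject(0) { |sum, (a, b)| sum + pixel_distance(a, b) }
end

# tolerant version as the text describes it: for each pixel of col1, take the
# best match in col2 within +-range rows. Img#difference above is close to
# this, but not identical.
def windowed_difference(col1, col2, range = 15)
	last = col2.size - 1
	col1.each_with_index.inject(0) do |sum, (a, y)|
		lower = [y - range, 0].max
		upper = [y + range, last].min
		sum + (lower..upper).map { |r| pixel_distance(a, col2[r]) }.min
	end
end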

# calculate
img = Img.new

# create ALL pairings.
slices = img.slices
pairs = []
slices.times do |s1|
	slices.times do |s2|
		next if s1 == s2

		pairs.push [img.difference(img.slice_end_idx(s1), img.slice_start_idx(s2)), [s1, s2] ]
	end
end

def add_to_group(combined, new_group)
	# create array of fixed numbers
	taken_numbers = {}
	taken_left = {}
	taken_right = {}
	combined.each do |group|
		group[1..-2].each do |x|
			taken_numbers[x] = true
		end
		taken_left[group.last] = true
		taken_right[group.first] = true
	end

	was_found = false
	combined.each do |group|
		next if was_found
		if (group.last == new_group.first)
			# insert at back
			was_found = true
			new_group.delete_at(0)
			if taken_numbers[new_group.last] || taken_left[new_group.last] || new_group.last == group.first
				return
			end
			new_group.each do |x|
				group.push x
			end
		elsif (group.first == new_group.last)
			# insert at front
			was_found = true
			new_group.pop
			if taken_numbers[new_group.first] || taken_right[new_group.first] || new_group.first == group.last
				return
			end
			new_group.reverse.each do |x|
				group.insert(0, x)
			end
		end
	end
	if !was_found
		cf = combined.flatten
		new_group.each do |x|
			was_found = was_found || cf.index(x)
		end
		if !was_found
			combined.push new_group
		end
	end
end
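
# Sketch, not part of the timed solution and not run on the challenge image:
# a simpler greedy way to chain slices, assuming pairs holds [cost, [left, right]]
# entries for all ordered pairs as built above. greedy_order is a hypothetical
# name. A pair is accepted only if the left slice has no right neighbor yet,
# the right slice has no left neighbor yet, and both are in different chains
# (otherwise the chain would close into a loop).
def greedy_order(pairs, n)
	right_of = {}        # slice => slice placed directly to its right
	has_left = {}        # slices that already have a left neighbor
	chain = (0...n).to_a # chain id of every slice
	pairs.sort.each do |_cost, (a, b)|
		next if right_of.key?(a) || has_left[b] || chain[a] == chain[b]
		right_of[a] = b
		has_left[b] = true
		drop = chain[b]
		keep = chain[a]
		chain.map! { |c| c == drop ? keep : c }
	end
	# the leftmost slice is the only one without a left neighbor
	order = [(0...n).find { |s| !has_left[s] }]
	order.push(right_of[order.last]) while right_of.key?(order.last)
	order
end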

# sort, lowest first
pairs.sort!

# create combinations.
# combined has an array of all correspondences.
combined = []
pairs.each do |diff, new_group|

	# insert new group
	add_to_group(combined, new_group)

	# try to recombine everything, until nothing changes any more
	was_recombined = true
	while was_recombined
		new_combined = []
		combined.each do |group|
			add_to_group(new_combined, group)
		end
		was_recombined = combined.size != new_combined.size
		combined = new_combined
	end

	if (combined.size == 1 &amp;&amp; combined[0].size == img.slices)
		# we got everything! Write output image.
		img.write(combined[0])
		exit
	end
end</pre><h1>Result</h1><p>Here is the result I got with this code:<br /><center><img src="http://martin.ankerl.com/wp-content/uploads/2011/11/ordered.jpg?9d7bd4" alt="" title="ordered" width="640" height="359" class="alignnone size-full wp-image-678" /></center></p><h1>Algorithm</h1><p>In hindsight, the algorithm combines two ideas: <a href="http://nlp.stanford.edu/IR-book/html/htmledition/hierarchical-agglomerative-clustering-1.html">Hierarchical Agglomerative Clustering</a>, and a <del datetime="2011-11-15T21:19:11+00:00">quick and dirty</del> novel distance metric that defines how &#8220;good&#8221; a pairing is.</p><p>The hierarchical agglomerative clustering is a bit difficult to code, because there are a lot of corner cases that decide when pairings may be combined and when not. I am not sure if my code is correct; it should be rewritten without time pressure.</p><p>The first version of the distance metric was very simple: calculate the sum of the squared differences between neighboring pixels when two slices are put together. Unfortunately this does not work for the slices containing the skyscraper with black-and-white stripes: many pixels are white on the left side but black on the right side, because the building is slightly tilted. Once the problem is understood, the solution is simple: for each pixel, find the best matching pixel of the other slice within a certain range. 
Through trial and error I chose &#177;15 pixels, and finally got the correct image.</p><p>Happy hacking,<br /> Martin</p><div style='clear:both'></div>]]></content:encoded> <wfw:commentRss>http://martin.ankerl.com/2011/11/15/solving-the-instagram-challenge-quickly/feed/</wfw:commentRss> <slash:comments>5</slash:comments> </item> <item><title>Cleverness Considered Harmful</title><link>http://martin.ankerl.com/2010/12/10/cleverness-considered-harmful/</link> <comments>http://martin.ankerl.com/2010/12/10/cleverness-considered-harmful/#comments</comments> <pubDate>Fri, 10 Dec 2010 22:04:36 +0000</pubDate> <dc:creator>martinus</dc:creator> <category><![CDATA[C++]]></category> <category><![CDATA[coding]]></category> <category><![CDATA[programming]]></category> <category><![CDATA[rant]]></category> <category><![CDATA[ruby]]></category> <category><![CDATA[Uncategorized]]></category><guid isPermaLink="false">http://martin.ankerl.com/?p=501</guid> <description><![CDATA[I have just read this nice quote at the stackoverflow question &#8220;Why is cleverness considered harmful in programming by some people?&#8221;: Fools ignore complexity; pragmatists suffer it; experts avoid it; geniuses remove it. &#8211; Alan Perlis Which reminds me of &#8230; <a href="http://martin.ankerl.com/2010/12/10/cleverness-considered-harmful/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description> <content:encoded><![CDATA[<p>I have just read this nice quote at the stackoverflow question &#8220;<a href="http://programmers.stackexchange.com/questions/25276">Why is cleverness considered harmful in programming by some people?</a>&#8221;:</p><blockquote><p>Fools ignore complexity; pragmatists suffer it; experts avoid it; geniuses remove it.<br /> &#8211; Alan Perlis</p></blockquote><p>Which reminds me of a little piece of code I wrote recently. I tried to implement a small parser for a very simple custom data format in C++. 
To do this, I tried several approaches:</p><h2>1. Boost.Spirit</h2><p>Since we use Boost in our projects, I started reading about <a href="http://boost-spirit.com/home/">Boost.Spirit</a>, and took some time to decipher the tutorials, which contain code <a href="http://www.boost.org/doc/libs/1_45_0/libs/spirit/doc/html/spirit/qi/tutorials/complex___our_first_complex_parser.html">like this</a>:</p><pre class="brush: cpp; title: ; notranslate">bool r = phrase_parse(first, last,
  //  Begin grammar
  (
      '(' &gt;&gt; double_[ref(rN) = _1]
          &gt;&gt; -(',' &gt;&gt; double_[ref(iN) = _1]) &gt;&gt; ')'
  |   double_[ref(rN) = _1]
  ),
  //  End grammar
 space);</pre><p>After half an hour I got annoyed because it simply was too much effort. I don&#8217;t care how well thought out and how powerful the library is; it is simply unusable. Maybe I am too stupid, but I am sure that even if I managed to understand it well enough to write a decent parser, half a year later I would never be able to debug my code again: it&#8217;s simply too clever.</p><h2>2. Coco</h2><p>I ditched Boost.Spirit, and tried to use <a href="http://www.ssw.uni-linz.ac.at/Coco/">Coco</a>. I was unfamiliar with it but had seen a colleague use it, so I gave it a try. I started reading the documentation, which looks nice but is 42 pages long, and since I am a lazy bastard I stopped right there: I just wanted to get something working, and quickly.</p><h2>3. Hand Written Large Switch</h2><p>I ditched Coco, and started to write my own, very simple code that basically looked like this:</p><pre class="brush: cpp; title: ; notranslate">while (instream &gt;&gt; sym) {
  switch (symbol_map[sym]) {
  case START:
    // do this
    break;
  case WHATEVER:
    // do that
    break;
  }
}</pre><p>After just 10 minutes I got a minimal parser that worked well enough and was extremely readable and understandable code. Everybody with basic C++ understanding can skim over this code and get it. The <a href="http://www.flickr.com/photos/smitty/2245445147/">number of WTF&#8217;s per minute</a> when reading this code is close to zero. I am very happy with this approach, and it really rocks because it is so dead simple.</p><p>You can rightfully say that a simple switch is very inflexible, and not extensible. You are right, but who cares? Almost all code I have seen that was planned ahead for flexibility you might need gets too complicated, because what you planned for might never be needed. Even worse: most of the time you need flexibility that you cannot know about in advance; it only becomes apparent when you have something up and running and have used it for a while.</p><h2>Final Words</h2><p>Back to the original quote, based on my experience I would extend it a bit:</p><blockquote><p>Ignorants add complexity; fools ignore complexity; pragmatists suffer it; experts avoid it; geniuses remove it.<br /> &#8211; Martin Ankerl</p></blockquote><p>If you find this interesting you might also like to read the <a href="http://martin.ankerl.com/2007/01/05/three-laws-of-software-development/">Three Laws of Software Development</a>.</p><p>What do you think?</p><div style='clear:both'></div>]]></content:encoded> <wfw:commentRss>http://martin.ankerl.com/2010/12/10/cleverness-considered-harmful/feed/</wfw:commentRss> <slash:comments>7</slash:comments> </item> <item><title>svn-shortlog &#8212; Compact &amp; Beautiful Subversion Changelog</title><link>http://martin.ankerl.com/2009/12/23/svn-shortlog-compact-beautiful-subversion-changelog/</link> <comments>http://martin.ankerl.com/2009/12/23/svn-shortlog-compact-beautiful-subversion-changelog/#comments</comments> <pubDate>Wed, 23 Dec 2009 16:58:17 +0000</pubDate> <dc:creator>martinus</dc:creator> 
<category><![CDATA[coding]]></category> <category><![CDATA[news]]></category> <category><![CDATA[open source]]></category> <category><![CDATA[programming]]></category> <category><![CDATA[ruby]]></category><guid isPermaLink="false">http://martin.ankerl.com/?p=303</guid> <description><![CDATA[At work we periodically have short developer meetings to discuss what has happened in the last month. To do this, we go through the bugs in our issue tracking system, and the subversion commits in our repository. Unfortunately, getting an &#8230; <a href="http://martin.ankerl.com/2009/12/23/svn-shortlog-compact-beautiful-subversion-changelog/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description> <content:encoded><![CDATA[<p>At work we periodically have short developer meetings to discuss what has happened in the last month. To do this, we go through the bugs in our issue tracking system, and the subversion commits in our repository. Unfortunately, getting an overview of the subversion commits was rather cumbersome, and we could not find any efficient tool to do this. 
Hence, <strong>svn-shortlog</strong> was born.</p><p>This is an attempt to format the Subversion log of a one-month period in the following way:</p><ul><li>Beautiful HTML output.</li><li>Compact representation of lots of information.</li><li>Usable on a projector with poor color reproduction.</li><li>Fully automatic.</li></ul><h2>Usage</h2><ol><li>Install <a href="http://www.ruby-lang.org/de/">Ruby</a> (both 1.8 and 1.9 should work).</li><li>Download <a href="http://svn-shortlog.googlecode.com/svn/trunk/svn-shortlog.rb">svn-shortlog.rb</a>.</li><li>Open <tt>svn-shortlog.rb</tt> with your favourite text editor, and adjust the config section according to your needs.</li><li>Double-click <tt>svn-shortlog.rb</tt>.</li><li>Open the generated <tt>changelog_....html</tt> file with your favourite browser.</li></ol><h2>Sample Output</h2><p>Here is a <a target="_blank" href="http://martin.ankerl.com/wp-content/uploads/2009/12/changes_2009-12-01_to_2009-12-31.html">sample output of one month of Boost commits</a> into trunk, taken from the <a href="http://www.boost.org/users/download/#repository">public repository</a>. The output is quite information-dense; a quick description is in the screenshot:<center><img src="http://martin.ankerl.com/wp-content/uploads/2009/12/documentation.png?9d7bd4" alt="" title="documentation" width="690" height="408" /></center> All commits are structured by user, then by date. Each commit is on one line. You can click each line to see the full information related to a commit.</p><h2>Issues</h2><p>Ideas, suggestions, problems? 
Please post them as a comment here, at the <a href="https://code.google.com/p/svn-shortlog/issues/list">bug tracker</a>.</p><h2>Credits</h2><p>This tool is based on the idea from my colleague <a href="http://cheind.wordpress.com/">Christoph Heindl</a> and inspired by <a href="http://groups.google.com/group/linux.kernel/msg/d43224c9ba53f0cc?">Linus&#8217; Kernel shortlog</a> and <a href="http://mail.google.com/">Gmail</a>.</p><div style='clear:both'></div>]]></content:encoded> <wfw:commentRss>http://martin.ankerl.com/2009/12/23/svn-shortlog-compact-beautiful-subversion-changelog/feed/</wfw:commentRss> <slash:comments>1</slash:comments> </item> <item><title>How to Generate Random Colors Programmatically</title><link>http://martin.ankerl.com/2009/12/09/how-to-create-random-colors-programmatically/</link> <comments>http://martin.ankerl.com/2009/12/09/how-to-create-random-colors-programmatically/#comments</comments> <pubDate>Wed, 09 Dec 2009 19:09:17 +0000</pubDate> <dc:creator>martinus</dc:creator> <category><![CDATA[coding]]></category> <category><![CDATA[howto]]></category> <category><![CDATA[ruby]]></category> <category><![CDATA[tutorial]]></category><guid isPermaLink="false">http://martin.ankerl.com/?p=254</guid> <description><![CDATA[Creating random colors is actually more difficult than it seems. The randomness itself is easy, but aesthetically pleasing randomness is more difficult. For a little project at work I needed to automatically generate multiple background colors with the following properties: &#8230; <a href="http://martin.ankerl.com/2009/12/09/how-to-create-random-colors-programmatically/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description> <content:encoded><![CDATA[<p>Creating random colors is actually more difficult than it seems. The randomness itself is easy, but aesthetically pleasing randomness is more difficult. 
For a little project at work I needed to automatically generate multiple background colors with the following properties:</p><ul><li>Text over the colored background should be easily readable</li><li>Colors should be very distinct</li><li>The number of required colors is not initially known</li></ul><h2>Naïve Approach</h2><p>The first and simplest approach is to create random colors by simply using a random number between <tt>[0, 256[</tt> for the R, G, B values. I have created a little Ruby script to generate sample HTML code:<pre class="brush: ruby; title: ; notranslate"># generates HTML code for 26 background colors given R, G, B values.
def gen_html
  ('A'..'Z').each do |c|
    r, g, b = yield
    printf &quot;&lt;span style=\&quot;background-color:#%02x%02x%02x; padding:5px; -moz-border-radius:3px; -webkit-border-radius:3px;\&quot;&gt;#{c}&lt;/span&gt; &quot;, r, g, b
  end
end

# naive approach: generate purely random colors
gen_html { [rand(256), rand(256), rand(256)] }</pre><p> The generated output looks like this:<p style="text-align:center;"><span style="background-color:#a69dd8; padding:5px; -moz-border-radius:3px; -webkit-border-radius:3px;">A</span> <span style="background-color:#0c35b0; padding:5px; -moz-border-radius:3px; -webkit-border-radius:3px;">B</span> <span style="background-color:#f82750; padding:5px; -moz-border-radius:3px; -webkit-border-radius:3px;">C</span> <span style="background-color:#0ebd31; padding:5px; -moz-border-radius:3px; -webkit-border-radius:3px;">D</span> <span style="background-color:#5fab4f; padding:5px; -moz-border-radius:3px; -webkit-border-radius:3px;">E</span> <span style="background-color:#c538cf; padding:5px; -moz-border-radius:3px; -webkit-border-radius:3px;">F</span> <span style="background-color:#014a59; padding:5px; -moz-border-radius:3px; -webkit-border-radius:3px;">G</span> <span style="background-color:#e14af8; padding:5px; -moz-border-radius:3px; -webkit-border-radius:3px;">H</span> <span style="background-color:#9fb730; padding:5px; -moz-border-radius:3px; -webkit-border-radius:3px;">I</span> <span style="background-color:#4bec60; padding:5px; -moz-border-radius:3px; -webkit-border-radius:3px;">J</span> <span style="background-color:#ef9345; padding:5px; -moz-border-radius:3px; -webkit-border-radius:3px;">K</span> <span style="background-color:#d2ece0; padding:5px; -moz-border-radius:3px; -webkit-border-radius:3px;">L</span> <span style="background-color:#9cda80; padding:5px; -moz-border-radius:3px; -webkit-border-radius:3px;">M</span> <span style="background-color:#dbc07c; padding:5px; -moz-border-radius:3px; -webkit-border-radius:3px;">N</span> <span style="background-color:#7328dd; padding:5px; -moz-border-radius:3px; -webkit-border-radius:3px;">O</span> <span style="background-color:#1e9942; padding:5px; -moz-border-radius:3px; -webkit-border-radius:3px;">P</span> <span style="background-color:#621b7b; padding:5px; 
-moz-border-radius:3px; -webkit-border-radius:3px;">Q</span> <span style="background-color:#c830b2; padding:5px; -moz-border-radius:3px; -webkit-border-radius:3px;">R</span> <span style="background-color:#362332; padding:5px; -moz-border-radius:3px; -webkit-border-radius:3px;">S</span> <span style="background-color:#e8c55d; padding:5px; -moz-border-radius:3px; -webkit-border-radius:3px;">T</span> <span style="background-color:#bd8787; padding:5px; -moz-border-radius:3px; -webkit-border-radius:3px;">U</span> <span style="background-color:#66c6a4; padding:5px; -moz-border-radius:3px; -webkit-border-radius:3px;">V</span> <span style="background-color:#21ec4b; padding:5px; -moz-border-radius:3px; -webkit-border-radius:3px;">W</span> <span style="background-color:#782364; padding:5px; -moz-border-radius:3px; -webkit-border-radius:3px;">X</span> <span style="background-color:#c3bf15; padding:5px; -moz-border-radius:3px; -webkit-border-radius:3px;">Y</span> <span style="background-color:#3db35a; padding:5px; -moz-border-radius:3px; -webkit-border-radius:3px;">Z</span></p><p>As you can see this is quite suboptimal. Some letters are hard to read because the background is too dark (B, Q, S), other colors look very similar (F, R).</p><h2>Using HSV Color Space</h2><p><a href="http://en.wikipedia.org/wiki/File:HSV_cylinder.png"><img src="http://martin.ankerl.com/wp-content/uploads/2009/12/HSV_cylinder_small.png?9d7bd4" alt="HSV_cylinder_small" title="HSV_cylinder_small" width="250" height="200" style="float:right;margin-left:10px; margin-bottom:10px;" /></a>Let's fix the too dark / too bright problem first. A convenient way to do this is to not use the RGB color space, but <a href="http://en.wikipedia.org/wiki/HSL_and_HSV">HSV</a> (Hue, Saturation, Value). 
Here you get equally bright and equally saturated colors by fixing saturation and value, and varying only the hue.</p><p>Based on the description provided by the Wikipedia article on <a href="http://en.wikipedia.org/wiki/HSL_and_HSV#Converting_to_RGB">conversion from HSV to RGB</a> I have implemented a converter:<pre class="brush: ruby; title: ; notranslate"># HSV values in [0..1[
# returns [r, g, b] values from 0 to 255
def hsv_to_rgb(h, s, v)
  h_i = (h*6).to_i
  f = h*6 - h_i
  p = v * (1 - s)
  q = v * (1 - f*s)
  t = v * (1 - (1 - f) * s)
  r, g, b = v, t, p if h_i==0
  r, g, b = q, v, p if h_i==1
  r, g, b = p, v, t if h_i==2
  r, g, b = p, q, v if h_i==3
  r, g, b = t, p, v if h_i==4
  r, g, b = v, p, q if h_i==5
  [(r*256).to_i, (g*256).to_i, (b*256).to_i]
end</pre><p>Using the generator and fixed values for saturation and value:<pre class="brush: ruby; title: ; notranslate"># using HSV with variable hue
gen_html { hsv_to_rgb(rand, 0.5, 0.95) }</pre><p>returns something like this:<center><span style="background-color:#f379ad; padding:5px; -moz-border-radius:3px; -webkit-border-radius:3px;">A</span> <span style="background-color:#7979f3; padding:5px; -moz-border-radius:3px; -webkit-border-radius:3px;">B</span> <span style="background-color:#9079f3; padding:5px; -moz-border-radius:3px; -webkit-border-radius:3px;">C</span> <span style="background-color:#79e5f3; padding:5px; -moz-border-radius:3px; -webkit-border-radius:3px;">D</span> <span style="background-color:#8979f3; padding:5px; -moz-border-radius:3px; -webkit-border-radius:3px;">E</span> <span style="background-color:#79f396; padding:5px; -moz-border-radius:3px; -webkit-border-radius:3px;">F</span> <span style="background-color:#79cff3; padding:5px; -moz-border-radius:3px; -webkit-border-radius:3px;">G</span> <span style="background-color:#79b1f3; padding:5px; -moz-border-radius:3px; -webkit-border-radius:3px;">H</span> <span style="background-color:#7979f3; padding:5px; -moz-border-radius:3px; -webkit-border-radius:3px;">I</span> <span style="background-color:#799ef3; padding:5px; -moz-border-radius:3px; -webkit-border-radius:3px;">J</span> <span style="background-color:#ecf379; padding:5px; -moz-border-radius:3px; -webkit-border-radius:3px;">K</span> <span style="background-color:#80f379; padding:5px; -moz-border-radius:3px; -webkit-border-radius:3px;">L</span> <span style="background-color:#797cf3; padding:5px; -moz-border-radius:3px; -webkit-border-radius:3px;">M</span> <span style="background-color:#79f3f0; padding:5px; -moz-border-radius:3px; -webkit-border-radius:3px;">N</span> <span style="background-color:#9af379; padding:5px; -moz-border-radius:3px; -webkit-border-radius:3px;">O</span> <span style="background-color:#79f37a; padding:5px; -moz-border-radius:3px; -webkit-border-radius:3px;">P</span> <span style="background-color:#f3ad79; padding:5px; -moz-border-radius:3px; 
-webkit-border-radius:3px;">Q</span> <span style="background-color:#f3e179; padding:5px; -moz-border-radius:3px; -webkit-border-radius:3px;">R</span> <span style="background-color:#79b9f3; padding:5px; -moz-border-radius:3px; -webkit-border-radius:3px;">S</span> <span style="background-color:#e8f379; padding:5px; -moz-border-radius:3px; -webkit-border-radius:3px;">T</span> <span style="background-color:#f3b379; padding:5px; -moz-border-radius:3px; -webkit-border-radius:3px;">U</span> <span style="background-color:#f379c9; padding:5px; -moz-border-radius:3px; -webkit-border-radius:3px;">V</span> <span style="background-color:#79b8f3; padding:5px; -moz-border-radius:3px; -webkit-border-radius:3px;">W</span> <span style="background-color:#f379dc; padding:5px; -moz-border-radius:3px; -webkit-border-radius:3px;">X</span> <span style="background-color:#79f37b; padding:5px; -moz-border-radius:3px; -webkit-border-radius:3px;">Y</span> <span style="background-color:#8e79f3; padding:5px; -moz-border-radius:3px; -webkit-border-radius:3px;">Z</span><br /></center><br /> Much better. The text is easily readable, and all colors have a similar brightness. Unfortunately, since we have limited ourselves to fewer colors, the difference between the randomly generated colors is even smaller than in the first approach.</p><h2>Golden Ratio</h2><p>Using just <tt>rand()</tt> to choose different values for hue does not make good use of the whole color spectrum; it is simply too random.<center><img src="http://martin.ankerl.com/wp-content/uploads/2009/12/distribution-random.png?9d7bd4" alt="distribution-random" title="distribution-random" width="483" height="291" class="alignright size-full wp-image-273" /></center></p><p>Here I have generated 2, 4, 8, 16, and 32 random values and printed them all on a scale. 
It's easy to see that some values are very tightly packed together, which we do not want.</p><p>Lo and behold, mathematicians discovered the <a href="http://en.wikipedia.org/wiki/Golden_ratio">Golden Ratio</a> more than 2400 years ago. It has lots of interesting properties, but for us only one is interesting:</p><blockquote><p>[...] Furthermore, it is a property of the golden ratio, <em>&Phi;</em>, that each subsequent hash value divides the interval into which it falls according to the golden ratio!<br /> -- <a href="http://brpreiss.com/books/opus4/html/page214.html">Bruno R. Preiss, P.Eng.</a></p></blockquote><p>Using the golden ratio as the spacing, the generated values look like this:<br /><center><img src="http://martin.ankerl.com/wp-content/uploads/2009/12/distribution-goldenratio.png?9d7bd4" alt="distribution-goldenratio" title="distribution-goldenratio" width="483" height="291" class="alignright size-full wp-image-274" /></center></p><p>Much better! The values are very evenly distributed, regardless of how many values are used. Also, the algorithm for this is extremely simple. Just add 1/&Phi; and take the result modulo 1 for each subsequent color.</p><pre class="brush: ruby; title: ; notranslate"># use golden ratio
golden_ratio_conjugate = 0.618033988749895
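# Sketch (an addition, not part of the original script): the constant above is
# 1/Phi = (sqrt(5) - 1) / 2. golden_hues is a hypothetical helper that returns
# the first n hues for a given start value, which makes the spacing easy to
# inspect, for example with golden_hues(8).sort
def golden_hues(n, start = 0.0)
  conjugate = (Math.sqrt(5) - 1) / 2
  h = start
  Array.new(n) { h = (h + conjugate) % 1 }
end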
h = rand # use random start value
gen_html {
  h += golden_ratio_conjugate
  h %= 1
  hsv_to_rgb(h, 0.5, 0.95)
}</pre><p>The final result:<br /><center><span style="background-color:#f37e79; padding:5px; -moz-border-radius:3px; -webkit-border-radius:3px;">A</span> <span style="background-color:#7998f3; padding:5px; -moz-border-radius:3px; -webkit-border-radius:3px;">B</span> <span style="background-color:#bbf379; padding:5px; -moz-border-radius:3px; -webkit-border-radius:3px;">C</span> <span style="background-color:#f379df; padding:5px; -moz-border-radius:3px; -webkit-border-radius:3px;">D</span> <span style="background-color:#79f3e3; padding:5px; -moz-border-radius:3px; -webkit-border-radius:3px;">E</span> <span style="background-color:#f3bf79; padding:5px; -moz-border-radius:3px; -webkit-border-radius:3px;">F</span> <span style="background-color:#9c79f3; padding:5px; -moz-border-radius:3px; -webkit-border-radius:3px;">G</span> <span style="background-color:#7af379; padding:5px; -moz-border-radius:3px; -webkit-border-radius:3px;">H</span> <span style="background-color:#f3799d; padding:5px; -moz-border-radius:3px; -webkit-border-radius:3px;">I</span> <span style="background-color:#79c1f3; padding:5px; -moz-border-radius:3px; -webkit-border-radius:3px;">J</span> <span style="background-color:#e4f379; padding:5px; -moz-border-radius:3px; -webkit-border-radius:3px;">K</span> <span style="background-color:#de79f3; padding:5px; -moz-border-radius:3px; -webkit-border-radius:3px;">L</span> <span style="background-color:#79f3ba; padding:5px; -moz-border-radius:3px; -webkit-border-radius:3px;">M</span> <span style="background-color:#f39779; padding:5px; -moz-border-radius:3px; -webkit-border-radius:3px;">N</span> <span style="background-color:#797ff3; padding:5px; -moz-border-radius:3px; -webkit-border-radius:3px;">O</span> <span style="background-color:#a2f379; padding:5px; -moz-border-radius:3px; -webkit-border-radius:3px;">P</span> <span style="background-color:#f379c6; padding:5px; -moz-border-radius:3px; -webkit-border-radius:3px;">Q</span> <span 
style="background-color:#79e9f3; padding:5px; -moz-border-radius:3px; -webkit-border-radius:3px;">R</span> <span style="background-color:#f3d979; padding:5px; -moz-border-radius:3px; -webkit-border-radius:3px;">S</span> <span style="background-color:#b579f3; padding:5px; -moz-border-radius:3px; -webkit-border-radius:3px;">T</span> <span style="background-color:#79f392; padding:5px; -moz-border-radius:3px; -webkit-border-radius:3px;">U</span> <span style="background-color:#f37984; padding:5px; -moz-border-radius:3px; -webkit-border-radius:3px;">V</span> <span style="background-color:#79a8f3; padding:5px; -moz-border-radius:3px; -webkit-border-radius:3px;">W</span> <span style="background-color:#cbf379; padding:5px; -moz-border-radius:3px; -webkit-border-radius:3px;">X</span> <span style="background-color:#f379ee; padding:5px; -moz-border-radius:3px; -webkit-border-radius:3px;">Y</span> <span style="background-color:#79f3d3; padding:5px; -moz-border-radius:3px; -webkit-border-radius:3px;">Z</span></center></p><p>You can see that the first few values are very different, and the difference decreases as more colors are added (Z and E are already quite similar). 
Anyway, this is good enough for me.</p><p>And because it is so beautiful, here are some more colors <img src="http://martin.ankerl.com/wp-includes/images/smilies/icon_wink.gif?9d7bd4" alt=';-)' class='wp-smiley' /><br /> <tt>s=0.99, v=0.99</tt>, <tt>s=0.25, v=0.8</tt>, and <tt>s=0.3, v=0.99</tt><center><span style="background-color:#024bfd; padding:5px; -moz-border-radius:3px; -webkit-border-radius:3px;">A</span> <span style="background-color:#94fd02; padding:5px; -moz-border-radius:3px; -webkit-border-radius:3px;">B</span> <span style="background-color:#fd02de; padding:5px; -moz-border-radius:3px; -webkit-border-radius:3px;">C</span> <span style="background-color:#02fdd3; padding:5px; -moz-border-radius:3px; -webkit-border-radius:3px;">D</span> <span style="background-color:#fd8a02; padding:5px; -moz-border-radius:3px; -webkit-border-radius:3px;">E</span> <span style="background-color:#4102fd; padding:5px; -moz-border-radius:3px; -webkit-border-radius:3px;">F</span> <span style="background-color:#0dfd02; padding:5px; -moz-border-radius:3px; -webkit-border-radius:3px;">G</span> <span style="background-color:#fd0256; padding:5px; -moz-border-radius:3px; -webkit-border-radius:3px;">H</span> <span style="background-color:#029ffd; padding:5px; -moz-border-radius:3px; -webkit-border-radius:3px;">I</span> <span style="background-color:#e8fd02; padding:5px; -moz-border-radius:3px; -webkit-border-radius:3px;">J</span> <span style="background-color:#c802fd; padding:5px; -moz-border-radius:3px; -webkit-border-radius:3px;">K</span> <span style="background-color:#02fd7f; padding:5px; -moz-border-radius:3px; -webkit-border-radius:3px;">L</span> <span style="background-color:#fd3602; padding:5px; -moz-border-radius:3px; -webkit-border-radius:3px;">M</span> <span style="background-color:#0217fd; padding:5px; -moz-border-radius:3px; -webkit-border-radius:3px;">N</span> <span style="background-color:#61fd02; padding:5px; -moz-border-radius:3px; -webkit-border-radius:3px;">O</span> 
<span style="background-color:#fd02aa; padding:5px; -moz-border-radius:3px; -webkit-border-radius:3px;">P</span> <span style="background-color:#02f3fd; padding:5px; -moz-border-radius:3px; -webkit-border-radius:3px;">Q</span> <span style="background-color:#fdbe02; padding:5px; -moz-border-radius:3px; -webkit-border-radius:3px;">R</span> <span style="background-color:#7402fd; padding:5px; -moz-border-radius:3px; -webkit-border-radius:3px;">S</span> <span style="background-color:#02fd2b; padding:5px; -moz-border-radius:3px; -webkit-border-radius:3px;">T</span> <span style="background-color:#fd0222; padding:5px; -moz-border-radius:3px; -webkit-border-radius:3px;">U</span> <span style="background-color:#026bfd; padding:5px; -moz-border-radius:3px; -webkit-border-radius:3px;">V</span> <span style="background-color:#b5fd02; padding:5px; -moz-border-radius:3px; -webkit-border-radius:3px;">W</span> <span style="background-color:#fc02fd; padding:5px; -moz-border-radius:3px; -webkit-border-radius:3px;">X</span> <span style="background-color:#02fdb3; padding:5px; -moz-border-radius:3px; -webkit-border-radius:3px;">Y</span> <span style="background-color:#fd6a02; padding:5px; -moz-border-radius:3px; -webkit-border-radius:3px;">Z</span><br /></center></p><p><center><span style="background-color:#99a8cc; padding:5px; -moz-border-radius:3px; -webkit-border-radius:3px;">A</span> <span style="background-color:#b7cc99; padding:5px; -moz-border-radius:3px; -webkit-border-radius:3px;">B</span> <span style="background-color:#cc99c6; padding:5px; -moz-border-radius:3px; -webkit-border-radius:3px;">C</span> <span style="background-color:#99ccc4; padding:5px; -moz-border-radius:3px; -webkit-border-radius:3px;">D</span> <span style="background-color:#ccb599; padding:5px; -moz-border-radius:3px; -webkit-border-radius:3px;">E</span> <span style="background-color:#a699cc; padding:5px; -moz-border-radius:3px; -webkit-border-radius:3px;">F</span> <span style="background-color:#9bcc99; 
padding:5px; -moz-border-radius:3px; -webkit-border-radius:3px;">G</span> <span style="background-color:#cc99aa; padding:5px; -moz-border-radius:3px; -webkit-border-radius:3px;">H</span> <span style="background-color:#99b9cc; padding:5px; -moz-border-radius:3px; -webkit-border-radius:3px;">I</span> <span style="background-color:#c8cc99; padding:5px; -moz-border-radius:3px; -webkit-border-radius:3px;">J</span> <span style="background-color:#c299cc; padding:5px; -moz-border-radius:3px; -webkit-border-radius:3px;">K</span> <span style="background-color:#99ccb3; padding:5px; -moz-border-radius:3px; -webkit-border-radius:3px;">L</span> <span style="background-color:#cca499; padding:5px; -moz-border-radius:3px; -webkit-border-radius:3px;">M</span> <span style="background-color:#999dcc; padding:5px; -moz-border-radius:3px; -webkit-border-radius:3px;">N</span> <span style="background-color:#accc99; padding:5px; -moz-border-radius:3px; -webkit-border-radius:3px;">O</span> <span style="background-color:#cc99bb; padding:5px; -moz-border-radius:3px; -webkit-border-radius:3px;">P</span> <span style="background-color:#99cacc; padding:5px; -moz-border-radius:3px; -webkit-border-radius:3px;">Q</span> <span style="background-color:#ccbf99; padding:5px; -moz-border-radius:3px; -webkit-border-radius:3px;">R</span> <span style="background-color:#b099cc; padding:5px; -moz-border-radius:3px; -webkit-border-radius:3px;">S</span> <span style="background-color:#99cca2; padding:5px; -moz-border-radius:3px; -webkit-border-radius:3px;">T</span> <span style="background-color:#cc99a0; padding:5px; -moz-border-radius:3px; -webkit-border-radius:3px;">U</span> <span style="background-color:#99afcc; padding:5px; -moz-border-radius:3px; -webkit-border-radius:3px;">V</span> <span style="background-color:#becc99; padding:5px; -moz-border-radius:3px; -webkit-border-radius:3px;">W</span> <span style="background-color:#cc99cc; padding:5px; -moz-border-radius:3px; -webkit-border-radius:3px;">X</span> 
<span style="background-color:#99ccbd; padding:5px; -moz-border-radius:3px; -webkit-border-radius:3px;">Y</span> <span style="background-color:#ccae99; padding:5px; -moz-border-radius:3px; -webkit-border-radius:3px;">Z</span></center></p><p><center><span style="background-color:#b1c7fd; padding:5px; -moz-border-radius:3px; -webkit-border-radius:3px;">A</span> <span style="background-color:#ddfdb1; padding:5px; -moz-border-radius:3px; -webkit-border-radius:3px;">B</span> <span style="background-color:#fdb1f3; padding:5px; -moz-border-radius:3px; -webkit-border-radius:3px;">C</span> <span style="background-color:#b1fdf0; padding:5px; -moz-border-radius:3px; -webkit-border-radius:3px;">D</span> <span style="background-color:#fddab1; padding:5px; -moz-border-radius:3px; -webkit-border-radius:3px;">E</span> <span style="background-color:#c4b1fd; padding:5px; -moz-border-radius:3px; -webkit-border-radius:3px;">F</span> <span style="background-color:#b4fdb1; padding:5px; -moz-border-radius:3px; -webkit-border-radius:3px;">G</span> <span style="background-color:#fdb1ca; padding:5px; -moz-border-radius:3px; -webkit-border-radius:3px;">H</span> <span style="background-color:#b1e1fd; padding:5px; -moz-border-radius:3px; -webkit-border-radius:3px;">I</span> <span style="background-color:#f7fdb1; padding:5px; -moz-border-radius:3px; -webkit-border-radius:3px;">J</span> <span style="background-color:#edb1fd; padding:5px; -moz-border-radius:3px; -webkit-border-radius:3px;">K</span> <span style="background-color:#b1fdd7; padding:5px; -moz-border-radius:3px; -webkit-border-radius:3px;">L</span> <span style="background-color:#fdc1b1; padding:5px; -moz-border-radius:3px; -webkit-border-radius:3px;">M</span> <span style="background-color:#b1b7fd; padding:5px; -moz-border-radius:3px; -webkit-border-radius:3px;">N</span> <span style="background-color:#cefdb1; padding:5px; -moz-border-radius:3px; -webkit-border-radius:3px;">O</span> <span style="background-color:#fdb1e4; padding:5px; 
-moz-border-radius:3px; -webkit-border-radius:3px;">P</span> <span style="background-color:#b1fafd; padding:5px; -moz-border-radius:3px; -webkit-border-radius:3px;">Q</span> <span style="background-color:#fdeab1; padding:5px; -moz-border-radius:3px; -webkit-border-radius:3px;">R</span> <span style="background-color:#d4b1fd; padding:5px; -moz-border-radius:3px; -webkit-border-radius:3px;">S</span> <span style="background-color:#b1fdbd; padding:5px; -moz-border-radius:3px; -webkit-border-radius:3px;">T</span> <span style="background-color:#fdb1bb; padding:5px; -moz-border-radius:3px; -webkit-border-radius:3px;">U</span> <span style="background-color:#b1d1fd; padding:5px; -moz-border-radius:3px; -webkit-border-radius:3px;">V</span> <span style="background-color:#e7fdb1; padding:5px; -moz-border-radius:3px; -webkit-border-radius:3px;">W</span> <span style="background-color:#fdb1fd; padding:5px; -moz-border-radius:3px; -webkit-border-radius:3px;">X</span> <span style="background-color:#b1fde7; padding:5px; -moz-border-radius:3px; -webkit-border-radius:3px;">Y</span> <span style="background-color:#fdd0b1; padding:5px; -moz-border-radius:3px; -webkit-border-radius:3px;">Z</span></center></p><p>Have fun!<br /> Martin</p><div style='clear:both'></div>]]></content:encoded> <wfw:commentRss>http://martin.ankerl.com/2009/12/09/how-to-create-random-colors-programmatically/feed/</wfw:commentRss> <slash:comments>8</slash:comments> </item> <item><title>Two Word Anagram Finder Algorithm (in Ruby)</title><link>http://martin.ankerl.com/2008/08/09/two-word-anagram-finder-algorithm/</link> <comments>http://martin.ankerl.com/2008/08/09/two-word-anagram-finder-algorithm/#comments</comments> <pubDate>Sat, 09 Aug 2008 19:32:30 +0000</pubDate> <dc:creator>martinus</dc:creator> <category><![CDATA[benchmark]]></category> <category><![CDATA[coding]]></category> <category><![CDATA[programming]]></category> <category><![CDATA[ruby]]></category> <category><![CDATA[tricks]]></category><guid 
isPermaLink="false">http://martin.ankerl.com/?p=156</guid> <description><![CDATA[Today I have got some source code for you. There is a little programming challenge named The Self-Documenting Code Contest that is quite fun; they try to find the cleanest and easiest to read code for this task: Write a program &#8230; <a href="http://martin.ankerl.com/2008/08/09/two-word-anagram-finder-algorithm/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description> <content:encoded><![CDATA[<p>Today I have got some source code for you. There is a little programming challenge named <a href="http://selfexplanatorycode.blogspot.com/">The Self-Documenting Code Contest</a> that is quite fun; they try to find the cleanest and easiest to read code for this task:</p><blockquote><p> Write a program that generates all two-word anagrams of the string &#8220;documenting&#8221;. Here&#8217;s a word list you might want to use: <a href="http://martin.ankerl.com/wp-content/uploads/2008/08/wordlist.zip?9d7bd4">wordlist.zip</a>.</p><p>When you&#8217;re done, send the results to <a href="mailto:selfdocumenting@hotmail.com">selfdocumenting@hotmail.com</a>.</p><p>Good luck!</p></blockquote><p>So this caught my interest, and I wrote a little entry in Ruby that is 23 lines long including whitespace and very nice to read. But I won&#8217;t show you this code until the contest is over, and it is not the reason for this post. The reason is that the nice version takes about 2 seconds, and somebody else has coded a Python solution that takes only 1 second (I have no idea what his code looks like). This post is about a fast anagram finding algorithm, and how I developed it. 
The final result takes about 0.11 seconds.</p><h1>Algorithm</h1><p>The most basic algorithm has two phases:</p><ol><li>Read in the file.<li>Build all combinations of two words and compare the letter count with the query.</ol><p>Building the combinations is usually done with two nested loops and takes O(n^2) runtime. This is slow, so I have added another step in between:</p><h2>Idea #1: Filter out Candidate Words</h2><p>The second step is really slow, but it would be a lot faster if it had to handle fewer words. So I wrote a little filtering step that only lets words through that are made entirely of letters that occur in the query word.</p><p>For example, when the query is <tt>documenting</tt>, the words <tt>men</tt>, <tt>go</tt>, and even <tt>too</tt> pass the filter, even though the number of letters might not match (<tt>too</tt> needs two o&#8217;s, the query has only one). That&#8217;s not important; what is important is that the number of candidate words is reduced a lot, so the next phase is faster.</p><h2>Idea #2: Use a Commutative Hashing Function</h2><p>String comparisons are slow. The common way to find out whether the string <tt>coming</tt> combined with <tt>tuned</tt> is an anagram of the word <tt>documenting</tt> is to sort the letters and compare, like this:</p><pre class="brush: ruby; title: ; notranslate">
irb(main):003:0&gt; &quot;documenting&quot;.unpack(&quot;c*&quot;).sort.pack(&quot;c*&quot;)
=&gt; &quot;cdegimnnotu&quot;
irb(main):004:0&gt; (&quot;coming&quot; + &quot;tuned&quot;).unpack(&quot;c*&quot;).sort.pack(&quot;c*&quot;)
=&gt; &quot;cdegimnnotu&quot;
</pre><p>The strings are equal, so we have a match. But this comparison is terribly slow! What&#8217;s worse, the computation has to be redone for every pair of words. It would be much better to find a hash function that quickly tells us whether we might have a match, compare just the hash values, and only do the string comparison when the hash check matches. To get a speed advantage, the hash has to be good enough that we don&#8217;t get too many false positives (the hashes are equal but the sorted letters are not). So why not just sum up the byte values of all the letters?</p><pre class="brush: ruby; title: ; notranslate">
irb(main):005:0&gt; &quot;documenting&quot;.sum
=&gt; 1181
irb(main):006:0&gt; &quot;coming&quot;.sum + &quot;tuned&quot;.sum
=&gt; 1181
</pre><p>Ruby&#8217;s <a href="http://www.ruby-doc.org/core/classes/String.html#M000857">String#sum</a> does exactly this. Because addition is commutative, the order of the letters does not matter. We can now precalculate the sum for each word once, and to find a match we just add the two hashes and compare the result to the query&#8217;s hash:</p><pre class="brush: ruby; title: ; notranslate">
irb(main):007:0&gt; query=&quot;documenting&quot;; first=&quot;coming&quot;; second=&quot;tuned&quot;
=&gt; &quot;tuned&quot;
irb(main):008:0&gt; first.sum + second.sum == query.sum
=&gt; true
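# Added illustration, not part of the original irb session (plain Ruby, and the
# helper name same_letters? is made up for this sketch): the byte sum ignores
# letter order, but unrelated letters can add up to the same value as well.
def same_letters?(first, second)
	first.unpack('c*').sort == second.unpack('c*').sort
end
p 'ad'.sum == 'bc'.sum       # true: 97 + 100 and 98 + 99 are both 197
p same_letters?('ad', 'bc')  # false: a false positive that the sum alone lets through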
</pre><p>When this very quick check returns true, we still have to do the string comparison to be absolutely sure it is a match. This considerably speeds up the whole program, but it is still O(n^2).</p><h2>Idea #3: Reformulate Problem</h2><p>Now here comes the trickiest and coolest part. Since Idea #2, the slowest part is matching the numbers, which still has quadratic complexity. But the hard task is not anagram finding any more; we have reduced it to finding two hashes that add up to the hash of the query. We can reformulate this problem into something completely detached from the anagram problem:</p><blockquote><p> Given a list of numbers, find all combinations of two numbers that add up to a given number.</p></blockquote><p>When we concentrate on just this problem and ignore the rest, we might come up with a better way of doing things.</p><p>I came up with a fast solution, described below. Somebody posted a better solution that is both faster and simpler; if you want just this final solution, <a href="#idea4">skip ahead to Idea #4</a>, as the following description is outdated.</p><p>It clearly looks stupid to just try all combinations to add the numbers.<br /> So let&#8217;s sort them first. Quicksort is fast, especially with numbers, so no worries here. Now consider a list of numbers like this example:</p><pre>1   3   7   10   10   12   17   20   22   23   24   24   25   26   30</pre><p>Find all the combinations of two numbers that add up to 27. They are</p><ul><li>1 + 26 = 27<li>3 + 24 = 27<li>3 + 24 = 27 (a second time, because 24 occurs twice)<li>7 + 20 = 27<li>10 + 17 = 27<li>10 + 17 = 27 (a second time, because 10 occurs twice)</ul><p>You can detect a pattern here: the first number never decreases, and the second number never increases! We can now formulate an algorithm for this:</p><p>We can have two pointers into the sorted array, one starting from the left side, the other starting from the right side. When the numbers behind the pointers add up to a bigger result than the query (e.g. 
1 + 30 = 31), we move the right pointer to the left to find a smaller combination (1 + 26 = 27). When the sum is too small (1 + 25 = 26), we move the left pointer to the right (3 + 25 = 28).</p><p>This way we walk through the whole array in O(n) time, and the sum of the two numbers behind the pointers is always kept as close to the desired result as possible. When the pointers meet, we can stop the whole process; going on would only produce the same pairs with the words reversed.</p><p>This algorithm gets a bit more complicated when you consider that the list might contain lots of equal numbers: whenever this happens, you have to fall back to an O(n^2) matching algorithm for just this section.</p><h2><a name="idea4"></a>Idea #4: Use Hash directly</h2><p><b>UPDATE</b> Scrap the implementation in Idea #3. A reader of this article described in a blog post a way to do this in real O(n), without the sorting step, which alone costs O(n*log(n)). The idea is to use a hashmap that maps from the hash key of a word to its matches:</p><pre class="brush: ruby; title: ; notranslate">
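# Ruby version of the reader's pseudocode: m is M, s is the target sum S
def pairs_with_sum(list, s)
	m = {}
	pairs = []
	list.each do |e|
		pairs.push([s - e, e]) if m[s - e] # (S-e, e) is a pair
		m[e] = true                        # add e to M
	end
	pairs
end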
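# For comparison, a sketch of the sorted two-pointer scan described in Idea #3.
# Added for illustration: pairs_by_two_pointers is a made-up name, the input is
# assumed to be a plain array of integers, and Integer#positive? needs Ruby 2.3+.
def pairs_by_two_pointers(numbers, target)
	sorted = numbers.sort
	left = 0
	right = sorted.size - 1
	pairs = []
	while (right - left).positive?
		diff = target - (sorted[left] + sorted[right])
		if diff.positive?
			left += 1   # sum too small: move the left pointer to the right
		elsif diff.negative?
			right -= 1  # sum too big: move the right pointer to the left
		else
			# match: pair the left number with every equal number at the right end
			partner = right
			while partner != left and sorted[partner] == sorted[right]
				pairs.push([sorted[left], sorted[partner]])
				partner -= 1
			end
			left += 1
		end
	end
	pairs
end

# usage with the example list from Idea #3
p pairs_by_two_pointers([1, 3, 7, 10, 10, 12, 17, 20, 22, 23, 24, 24, 25, 26, 30], 27)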
</pre><p>Just use a hashmap that maps from the cumulative hash of a word to a list of words that have the same hash. Whenever a new word is added, first get the list of words that is stored under <tt>query.sum - current_word.sum</tt>. Every word in that list has the right hash to complete the query, so we check each of them sequentially with the real letter comparison, and then store the new word under its own hash. Keeping all words with the same hash in one list is just normal hash collision handling through a linked list. That&#8217;s very simple and works like a charm.</p><p>I have revised the code; it got both simpler and faster. That&#8217;s a win-win situation, woohoo!</p><h1>The Source Code</h1><p>I hope the code is understandable now with the above explanation. If you have any questions or ideas, please share them here!</p><pre class="brush: ruby; title: ; notranslate">
#!/usr/bin/ruby

# created by Martin Ankerl http://martin.ankerl.com/

class String
	# creates an array of characters
	def letters
		unpack(&quot;c*&quot;)
	end
end

class Array
	# converts an array of letters back into a String
	def word
		pack(&quot;c*&quot;)
	end
end

query = &quot;documenting&quot;
query_letters_sorted = query.letters.sort
txt = File.read('wordlist.txt').downcase

# to quickly check if a letter is part of the query word
used_letters = Array.new(256, nil)
query_letters_sorted.each do |letter|
	used_letters[letter] = true
end

# Maps from the cumulative hash of a word to a list of words that have this hash code.
hashToWords = Hash.new do |hash, key|
	hash[key] = Array.new
end

query_hash = query.sum

prev = 0
txt_size = txt.size
separator = &quot;\n&quot;
idx = txt.index(separator, prev)
while prev &lt; txt_size

	letter_idx = prev

	# no need to check end of word because it is \n
	# which is not part of the word anyways
	# (getbyte returns the byte value on Ruby 1.9 and later, where txt[i] is a String)
	while used_letters[txt.getbyte(letter_idx)]
		letter_idx += 1
	end

	# ignore word if the above quick check fails
	if letter_idx == idx
		word = txt[prev, idx-prev]

		# check all key matches
		key = word.sum
		hashToWords[query_hash - key].each do |other_word|
			if (word.letters + other_word.letters).sort == query_letters_sorted
				puts &quot;#{word} #{other_word}&quot;
				puts &quot;#{other_word} #{word}&quot;
			end
		end

		# insert word
		hashToWords[key] &lt;&lt; word
	end

	prev = idx + 1

	# no need to check end of file because we have to end with new line
	idx = txt.index(separator, prev)
end
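
# A compact sketch of the same approach for an in-memory word list, added for
# illustration. two_word_anagrams and its arguments are made-up names that are
# not part of the original program; it uses String#chars instead of the manual
# byte scan above and returns the pairs instead of printing them.
def two_word_anagrams(query, words)
	target = query.chars.sort
	by_sum = Hash.new { |hash, key| hash[key] = [] }
	found = []
	words.each do |word|
		# Idea #1: skip words that use letters outside the query
		next unless word.chars.all? { |c| query.include?(c) }
		# Idea #4: possible partners are stored under the missing part of the sum
		by_sum[query.sum - word.sum].each do |other|
			# the sum is only a filter, so confirm with the sorted letters
			found.push([other, word]) if (other + word).chars.sort == target
		end
		by_sum[word.sum].push(word)
	end
	found
end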
</pre><p>When you rewrite the algorithm in C++ or Java or Python I am sure it will be faster than this one. But this is not the point of this post. The point is, &#8220;The Best Optimizer is between Your Ears&#8221; (Michael Abrash, <a href="http://www.byte.com/abrash/">Graphics Programming Black Book</a>).</p><p>Have fun!</p><div style='clear:both'></div>]]></content:encoded> <wfw:commentRss>http://martin.ankerl.com/2008/08/09/two-word-anagram-finder-algorithm/feed/</wfw:commentRss> <slash:comments>3</slash:comments> </item> <item><title>New Release of XDCC-Fetch</title><link>http://martin.ankerl.com/2007/11/04/new-release-of-xdcc-fetch/</link> <comments>http://martin.ankerl.com/2007/11/04/new-release-of-xdcc-fetch/#comments</comments> <pubDate>Sun, 04 Nov 2007 14:41:09 +0000</pubDate> <dc:creator>martinus</dc:creator> <category><![CDATA[coding]]></category> <category><![CDATA[freeware]]></category> <category><![CDATA[news]]></category> <category><![CDATA[open source]]></category> <category><![CDATA[programming]]></category> <category><![CDATA[ruby]]></category> <category><![CDATA[release]]></category> <category><![CDATA[xdcc]]></category> <category><![CDATA[xdcc-fetch]]></category><guid isPermaLink="false">http://martin.ankerl.com/?p=99</guid> <description><![CDATA[XDCC-Fetch is a nice little application written in Ruby that is able to download from XDCC bots on IRC. I have updated it to work with fox 1.6, so this should work with the recent Ruby version. Screenshot Unfortunately I &#8230; <a href="http://martin.ankerl.com/2007/11/04/new-release-of-xdcc-fetch/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description> <content:encoded><![CDATA[<p><a href="http://xdccfetch.sourceforge.net/">XDCC-Fetch</a> is a nice little application written in Ruby that is able to download from <a href="http://en.wikipedia.org/wiki/XDCC">XDCC</a> bots on IRC. 
I have updated it to work with fox 1.6, so this should work with the recent Ruby version.</p><h1>Screenshot</h1><p><center><br /> <img src="/files/xdcc-fetch.png?9d7bd4" width="527" height="488"><br /></center></p><p>Unfortunately I don&#8217;t really have the time nor the interest to continue development for XDCC-Fetch. Please <a href="mailto:martin.ankerl@gmail.com">contact me</a> if you are interested to continue development.</p><div style='clear:both'></div>]]></content:encoded> <wfw:commentRss>http://martin.ankerl.com/2007/11/04/new-release-of-xdcc-fetch/feed/</wfw:commentRss> <slash:comments>2</slash:comments> </item> <item><title>New Release of Dice-RPG</title><link>http://martin.ankerl.com/2007/11/04/new-release-of-dice-rpg/</link> <comments>http://martin.ankerl.com/2007/11/04/new-release-of-dice-rpg/#comments</comments> <pubDate>Sun, 04 Nov 2007 14:19:00 +0000</pubDate> <dc:creator>martinus</dc:creator> <category><![CDATA[freeware]]></category> <category><![CDATA[fun]]></category> <category><![CDATA[news]]></category> <category><![CDATA[open source]]></category> <category><![CDATA[ruby]]></category> <category><![CDATA[Dice-RPG]]></category> <category><![CDATA[release]]></category> <category><![CDATA[RPG]]></category><guid isPermaLink="false">http://martin.ankerl.com/?p=98</guid> <description><![CDATA[I was bored today so I have updated my little program Dice-RPG to work with fox 1.6. What? Dice-RPG is a free dice throwing program that can be used for role playing games. Although I have never played a RPG &#8230; <a href="http://martin.ankerl.com/2007/11/04/new-release-of-dice-rpg/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description> <content:encoded><![CDATA[<p>I was bored today so I have updated my little program Dice-RPG to work with fox 1.6.</p><h1>What?</h1><p>Dice-RPG is a free dice throwing program that can be used for role playing games. 
Although I have never played a RPG in my entire life, my <a href="http://wyrm-chris.livejournal.com/">brother</a> forced me to write this tool for the good of mankind (actually, I wrote it because I did not have anything better to do, but don&#8217;t tell him).</p><p>Here is a screenshot:<br /><center><br /> <img src="/files/dice-rpg.png?9d7bd4" width="358" height="325"><br /></center></p><h1>How?</h1><p>To use Dice-RPG,</p><ol><li>Install the Ruby one-click installer from <a href="http://www.ruby-lang.org/">here</a> if you don&#8217;t have it already. This is a runtime, like Java or C#.<li>Download <a href="/files/dice-rpg.rbw">Dice-RPG.rbw</a> (right click and save the link) and doubleclick it.</ol><p>Have fun!</p><div style='clear:both'></div>]]></content:encoded> <wfw:commentRss>http://martin.ankerl.com/2007/11/04/new-release-of-dice-rpg/feed/</wfw:commentRss> <slash:comments>4</slash:comments> </item> <item><title>RFind &#8211; Quickly Find Files</title><link>http://martin.ankerl.com/2007/04/01/rfind-quickly-find-files/</link> <comments>http://martin.ankerl.com/2007/04/01/rfind-quickly-find-files/#comments</comments> <pubDate>Sun, 01 Apr 2007 19:35:05 +0000</pubDate> <dc:creator>martinus</dc:creator> <category><![CDATA[freeware]]></category> <category><![CDATA[open source]]></category> <category><![CDATA[ruby]]></category> <category><![CDATA[release]]></category> <category><![CDATA[RFind]]></category><guid isPermaLink="false">http://martin.ankerl.com/?p=86</guid> <description><![CDATA[RFind is a little application that indexes the filenames of a given directory, and allows to quickly search this index with regular expressions. 
The motivation behind this app was that someone thought this had to be in C++ to be &#8230; <a href="http://martin.ankerl.com/2007/04/01/rfind-quickly-find-files/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description> <content:encoded><![CDATA[<p>RFind is a little application that indexes the filenames of a given directory, and lets you quickly search this index with regular expressions.</p><p>The motivation behind this app was that someone thought this had to be in C++ to be fast, so I proved him wrong: search-on-typing with more than 500,000 indexed filenames is easily possible <img src="http://martin.ankerl.com/wp-includes/images/smilies/icon_smile.gif?9d7bd4" alt=':-)' class='wp-smiley' /></p><p>I have tried to make this little tool very configurable so that it can be useful to everyone. Some of the features are:</p><ul><li>Hierarchically presented search results<li>Search-on-typing<li>Definable rules to execute on mouse click</ul><p><strong>Download:</strong></p><ul><li><a href="/files/rfind-1.0.zip?9d7bd4">rfind-1.0.zip</a></li></ul><p>This tool is written in Ruby and requires fxruby 1.0, which is a bit out of date. 
I will try to update it to a recent version of fxruby soon.</p><div style='clear:both'></div>]]></content:encoded> <wfw:commentRss>http://martin.ankerl.com/2007/04/01/rfind-quickly-find-files/feed/</wfw:commentRss> <slash:comments>0</slash:comments> </item> <item><title>TextAnalyzer in Python</title><link>http://martin.ankerl.com/2007/03/29/textanalyzer-in-python/</link> <comments>http://martin.ankerl.com/2007/03/29/textanalyzer-in-python/#comments</comments> <pubDate>Thu, 29 Mar 2007 19:05:56 +0000</pubDate> <dc:creator>martinus</dc:creator> <category><![CDATA[coding]]></category> <category><![CDATA[freeware]]></category> <category><![CDATA[news]]></category> <category><![CDATA[programming]]></category> <category><![CDATA[ruby]]></category><guid isPermaLink="false">http://martin.ankerl.com/?p=85</guid> <description><![CDATA[I have just found out that somebody has translated my textanalyzer from Ruby into Python. It also contains some improvements like stopwords. The core algorithm is still the same. Get it at kelpheavyweaponry.com.]]></description> <content:encoded><![CDATA[<p>I have just found out that somebody has translated my <a href="http://martin.ankerl.com/2007/01/09/textanalyzer-automatically-extract-characteristic-words/">textanalyzer</a> from Ruby into Python. It also contains some improvements like stopwords. The core algorithm is still the same. 
Get it at <a href="http://www.kelpheavyweaponry.com/cgi-bin/viewcvs.cgi/trunk/libraries/textanalyze.py">kelpheavyweaponry.com</a>.</p><div style='clear:both'></div>]]></content:encoded> <wfw:commentRss>http://martin.ankerl.com/2007/03/29/textanalyzer-in-python/feed/</wfw:commentRss> <slash:comments>0</slash:comments> </item> <item><title>e &#8212; New Release of Extract Any Archive</title><link>http://martin.ankerl.com/2007/02/25/e-new-release-of-extract-any-archive/</link> <comments>http://martin.ankerl.com/2007/02/25/e-new-release-of-extract-any-archive/#comments</comments> <pubDate>Sun, 25 Feb 2007 10:34:04 +0000</pubDate> <dc:creator>martinus</dc:creator> <category><![CDATA[freeware]]></category> <category><![CDATA[linux]]></category> <category><![CDATA[news]]></category> <category><![CDATA[open source]]></category> <category><![CDATA[ruby]]></category><guid isPermaLink="false">http://martin.ankerl.com/?p=84</guid> <description><![CDATA[Extract Any Archive just got better: When you extract multiple archives at once, e.g. with e *.rar and some files cannot be extracted, e continues to extract the other files and prints an error message with all the failed files &#8230; <a href="http://martin.ankerl.com/2007/02/25/e-new-release-of-extract-any-archive/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description> <content:encoded><![CDATA[<p>Extract Any Archive just got better: When you extract multiple archives at once, e.g. with</p><pre>e *.rar</pre><p>and some files cannot be extracted, <tt>e</tt> continues to extract the other files and prints an error message with all the failed files when it has finished. 
More info and download of <tt>e</tt> is <a href="http://martin.ankerl.com/2006/08/11/program-e-extract-any-archive/">here</a>.</p><div style='clear:both'></div>]]></content:encoded> <wfw:commentRss>http://martin.ankerl.com/2007/02/25/e-new-release-of-extract-any-archive/feed/</wfw:commentRss> <slash:comments>0</slash:comments> </item> </channel> </rss>
