Tuesday, November 5, 2013

The blog has moved!

I've moved the blog to a new server. It now lives on my very own personal domain. The old blog will still be available, but I won't be writing there anymore.

For those that are interested, the new blog is hosted on GitHub Pages. The posts are written in simple markdown and stored in a git repository. Each time the repository is pushed, GitHub uses Jekyll to generate a static website from the posts. All this makes posts easy to write, edit and publish, and keeps everything under version control. You can see the source code for this blog here.

The migration from BlogSpot was relatively painless. Jekyll has migration recipes for a variety of existing providers. There are several alternatives for BlogSpot - this one worked for me. It keeps the posts as HTML, as opposed to markdown. That isn't a problem for Jekyll, which handles HTML just fine; it's just a pain to edit HTML by hand. Thanks, @ngauthier! Getting the LaTeX equations to display correctly was also relatively straightforward.

I also migrated my Russian-language blog to misha.penkov.id.au. Unfortunately, there was no LiveJournal importer for Jekyll, so I wrote my own.

Monday, September 16, 2013

Automatic Speech Recognition with Google

I recently found out that Google Chrome has a speech recognition component.



(Yes, this video is already 2 years old!) This component utilizes a Web service to do the actual speech recognition. The Web service isn't available officially, but can be accessed without Chrome to perform some quick speech recognition. Here's a script that does the work for you:
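The original is a shell script built around wget; here's a rough Python equivalent of what it does. The endpoint and its parameters are unofficial (they come from the gist linked at the end of this post), so treat this as a sketch that may stop working at any time:

    import json
    import sys
    import urllib.request

    # Unofficial endpoint used by Chrome; undocumented and subject to change.
    URL = ('https://www.google.com/speech-api/v1/recognize'
           '?xjerr=1&client=chromium&lang=en-US')

    def recognize(flac_path, rate=16000):
        """POST a FLAC file to the Web service, return the hypotheses."""
        with open(flac_path, 'rb') as fin:
            audio = fin.read()
        request = urllib.request.Request(
            URL, data=audio,
            headers={'Content-Type': 'audio/x-flac; rate=%d' % rate})
        with urllib.request.urlopen(request) as response:
            result = json.loads(response.read().decode('utf-8'))
        return [h['utterance'] for h in result.get('hypotheses', [])]

    if __name__ == '__main__':
        print(recognize(sys.argv[1]))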

The script is mostly based on existing work by Sunil. I had to modify a few things to get it working:

  • Fix the fancy quotation marks
  • Fix the wget line (it wasn't outputting to the correct file)
Other than that, full credit should go to Sunil.  

It only seems to work well for smaller files (no more than a couple of seconds of audio). For larger files, the response from the Web service appears to be empty.

You can find more details about the API in this gist by alotaiba.

Monday, April 22, 2013

Coding Practice: Quicksort

I've mentioned sorting algorithms several times in the past, with a specific focus on Mergesort. Today's article introduces Quicksort, another common sorting algorithm. The article starts with an intuitive, non-technical description. Next, the article presents the C code and a hand-wavy theoretical analysis of its computational complexity, backed by a pinch of practical results. The article concludes with a comparison with the Mergesort algorithm.

Intuitive Description


The most intuitive description of the Quicksort algorithm is credited to its inventor, Tony Hoare:
"Just grab a thing and compare the other things with it."
The trick is that Quicksort "grabs" and "compares" intelligently, avoiding unnecessary comparisons, which allows it to sort a collection in $O(N \log N)$ time on average. Specifically, this is achieved by partitioning the collection around the thing we just grabbed (called the "pivot") into two smaller collections. Everything smaller than or equal to the pivot goes into the left sub-collection, and everything else goes into the right sub-collection. The two sub-collections can then be Quicksort-ed independently, recursively. The recursion terminates when the sub-collections contain fewer than two elements.

Code and Analysis of Computational Complexity


The code for Quicksort is fairly straightforward:
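(The post's listing is in C; below is a Python sketch of the same in-place approach, using the middle element as the pivot for the reasons discussed further down.)

    def quicksort(a, lo=0, hi=None):
        """Sort the list a in-place."""
        if hi is None:
            hi = len(a) - 1
        if lo >= hi:                    # fewer than two elements: done
            return
        p = partition(a, lo, hi)
        quicksort(a, lo, p - 1)         # sort the left sub-collection
        quicksort(a, p + 1, hi)         # sort the right sub-collection

    def partition(a, lo, hi):
        """Partition a[lo..hi] around its middle element, in-place.

        Returns the final index of the pivot.
        """
        mid = (lo + hi) // 2
        a[mid], a[hi] = a[hi], a[mid]   # stash the pivot at the end
        pivot, store = a[hi], lo
        for i in range(lo, hi):
            if a[i] <= pivot:           # smaller-or-equal goes to the left
                a[i], a[store] = a[store], a[i]
                store += 1
        a[store], a[hi] = a[hi], a[store]   # move the pivot into place
        return store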

Most of the work is performed in the partition method, which can be implemented in-place.

The computational complexity of Quicksort depends on the selection of the pivot element. In the best case, the selected pivot is the median of the collection and the partition step divides the collection into two smaller collections of identical size. Since the size of the sorted collection is halved at each step of the recursion, the best case complexity of Quicksort is $O(N \log N)$. In the worst case, the selected pivot is the minimum or maximum of the collection, and the partition step achieves very little. The worst case complexity is $O(N^2)$.

There are several ways to select the pivot element, the simplest being selecting the first, last or middle element of the collection. Since selecting the first or last element can lead to worst-case performance if the array is already sorted, selecting the middle element is the better option of the three.

The effect of pivot selection on the complexity of Quicksort can be observed empirically, by counting the number of comparisons for three different types of input: random, sorted and uniform; and for three different pivot selection methods: first, last and middle. Here are some results (sorting 100 input elements, showing the number of comparisons for the first, last and middle selection modes, respectively):
  • random (mean over 100 runs): 713.28, 715.17, 713.25 
  • sorted: 1001, 1001, 543 
  • uniform: 1001, 1001, 1001
The above results support what is already well-known: Quicksort performs worst when given sorted and uniform input. The former can be dealt with by selecting the middle element as the pivot (or even randomizing the pivot selection). The latter can be dealt with by checking for uniform input prior to sorting, which takes $O(N)$.

To obtain these results, I used GDB (to set breakpoints and count the number of hits), Python (to generate the input) and bash (to tie everything together). The entire code for reproducing these results is here.
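If you'd rather not fire up GDB, the same experiment can be approximated by instrumenting the partition function directly. A Python sketch (the exact counts will differ from the numbers above, which came from the C implementation):

    import random

    comparisons = 0

    def partition(a, lo, hi, pick):
        """Partition a[lo..hi] around the element chosen by pick."""
        global comparisons
        p = pick(lo, hi)
        a[p], a[hi] = a[hi], a[p]
        pivot, store = a[hi], lo
        for i in range(lo, hi):
            comparisons += 1            # count every element comparison
            if a[i] <= pivot:
                a[i], a[store] = a[store], a[i]
                store += 1
        a[store], a[hi] = a[hi], a[store]
        return store

    def quicksort(a, lo, hi, pick):
        if lo < hi:
            p = partition(a, lo, hi, pick)
            quicksort(a, lo, p - 1, pick)
            quicksort(a, p + 1, hi, pick)

    modes = {'first': lambda lo, hi: lo,
             'last': lambda lo, hi: hi,
             'middle': lambda lo, hi: (lo + hi) // 2}
    inputs = {'random': lambda: random.sample(range(100), 100),
              'sorted': lambda: list(range(100)),
              'uniform': lambda: [1] * 100}

    for mode, pick in modes.items():
        for kind, make in inputs.items():
            comparisons, data = 0, make()
            quicksort(data, 0, len(data) - 1, pick)
            print(mode, kind, comparisons)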

Comparison with Mergesort


Mergesort and Quicksort are both divide-and-conquer sorting algorithms. They work by first dividing the input data into parts and then recursively processing each part separately. However, there are significant differences between them.
  1. First, Quicksort does all of its work in the divide (partition) step. The conquer step is trivial, since after recursion is complete, the array is completely sorted. In contrast, Mergesort does very little work in the divide step, and does most of its work after the recursion is complete.
  2. Second, the algorithms have different computational complexity: Mergesort is consistently $O(N \log N)$; Quicksort is $O(N \log N)$, $O(N \log N)$ and $O(N^2)$ in the best, average and worst case, respectively.
  3. Third, the algorithms have different space complexity: unlike Mergesort, Quicksort's partition step can be implemented in-place without significant impact on complexity.
  4. Fourth, unlike Mergesort, Quicksort is not a stable sorting algorithm, since the partition step reorders elements. Stable implementations of Quicksort do exist, but are not in-place.
  5. Finally, Mergesort is easier to parallelize than Quicksort, since the divide step is simpler with the former.

Conclusion


If you're one of the chosen few that managed to soldier on through the entire article, give yourself a pat on the back. Thanks for reading the entire thing. Please reward yourself with a refreshing chuckle at this sorting-related xkcd.com comic:


Monday, April 8, 2013

An (unexpected) defense of Microsoft Store...

I know I had a lot of fun bashing Microsoft and their Online Store last week, but being a fair and level-minded individual, I feel that I do need to say some things in their defense.

While they failed to obtain my business for the University 365 offer (a blunder that they will surely regret for decades), a quick read of the MS Office license revealed that, as a proud owner of the Home and Student version, I can install it on one more PC.  Which is what I promptly did (can't beat free!).

Unfortunately, the smooth sailing ended here.  In blind defiance of the above-mentioned license, the installed program refused to authenticate, and threatened to disable itself within a month if I did not provide it with a new license key, which, as we all know, costs bags of money.  Having read The License, I was fully confident in my self-righteousness.  There was no way I was going to pay for something that was already mine.  I did the unthinkable.  I picked up my phone and called the Verification Hotline.

The Verification Hotline is the last resort for people that want to authenticate a Microsoft product, but for one reason or another can't do so over the Internet.  It was well after 7pm when I called, so I half-expected to be kindly asked to call back the next day.  Fortunately, these expectations were misplaced, and I was treated to a warming chat with... a computer.  To proceed with verification, you need to enter something like 64 digits (through the keypad!) to identify your install.  It's difficult to convey the rush of adrenaline as you power towards the last couple of digits.  I've never defused a bomb, or issued a launch code for an ICBM, but I guess those experiences would come pretty close.

After all that, I got through to an operator.  Finally, a chance to plead my case...  in Japanese.  Great.  After a long and thorough discussion about when and how I installed The Product, the operator agreed to activate my installation.  To do that, I had to enter another missile launch code into my Office install, as she was reading it out.  Another 64 digits or so, and my efforts would finally bear fruit.

My call got cut off after 10 digits.  Game over, man!

Unable to control the fury, I redialed the number, and mashed the keypad until an option to talk to an operator was presented.  I had naively expected that somehow, the person I was talking to before would be there, and we could pick up where we left off...  Alas, that was not to be.  The voice on the other side of the phone was cold and distant.  "I'm afraid you'll have to start again...", she said apologetically.

Like I mentioned earlier, I'm a fairly persistent guy when I need to be.  I persevered.  Entering the 64-digit launch code a second time was nowhere near as painful as the first.  I figured that by the time I'd gone through the process another 3 or 4 times, I'd have the whole thing memorized.  It's really no big deal -- back in the good old days of Windows 95, I reinstalled the O/S so often I had the whole product key committed to memory.

While what I've written so far doesn't really do much in defense of the Microsoft Store, there really is a happy ending to all this.  After I entered my launch code a second time, I didn't have to jump through any more hoops.  The kind soul I spoke to the first time through pre-recorded the authorization code for me, and all I had to do was punch that into my Office install.  All done!  And it only took half an hour...

Furthermore, I recently stumbled into The Store on an unrelated issue.  I was surprised to see something that I don't recall seeing before -- a live chat option.  You click on that, and get to talk to a real person.  Straightaway.  It's brilliant!  If only that had been there a week ago -- I wouldn't have had to rant.  Oh well.  Better out than in, they say.

Thursday, April 4, 2013

The Decimator, 2.0

A little while ago, I wrote about the woes of dealing with numbers in Japanese notation.  Since I never let an opportunity to procrastinate pass me by, I also posted a brief JavaScript (The Decimator™) to help deal with the confusion.  A friend of mine pointed out that it doesn't help with some use cases, such as 千5百万 (that's 15 million, but you knew that, right?).  And thus another opportunity to procrastinate presented itself, and now, I give you the Decimator™, 2.0:

http://mpenkov.github.com/decimator/

It accepts free text input, and handles both traditional (Kanji only) and mixed (Arabic numerals plus Kanji) numbers.  Feel free to give it a whirl.
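The core of the conversion can be sketched in a few lines (Python here for brevity; the Decimator itself is JavaScript). Digits, Arabic or Kanji, accumulate into a buffer; 十, 百 and 千 multiply within a section; and 万, 億 and 兆 close a section off:

    DIGITS = {'一': 1, '二': 2, '三': 3, '四': 4, '五': 5,
              '六': 6, '七': 7, '八': 8, '九': 9}
    SMALL = {'十': 10, '百': 100, '千': 1000}     # multipliers within a section
    BIG = {'万': 10 ** 4, '億': 10 ** 8, '兆': 10 ** 12}  # section units

    def decimate(text):
        total, section, current = 0, 0, ''
        for ch in text.replace(',', ''):
            if ch.isdigit() or ch == '.':
                current += ch                   # Arabic numerals as-is
            elif ch in DIGITS:
                current += str(DIGITS[ch])      # Kanji digit -> Arabic
            elif ch in SMALL:
                section += (float(current) if current else 1) * SMALL[ch]
                current = ''
            elif ch in BIG:
                section += float(current) if current else 0
                total += section * BIG[ch]
                section, current = 0, ''
        return total + section + (float(current) if current else 0)

    print(decimate('千5百万'))   # 15000000.0
    print(decimate('12.3億'))    # 1230000000.0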

Thursday, March 28, 2013

Surviving Customer Support

This article is going to be another rant.

I'm trying to get my hands on an install of MS Office 2013 (well, I really only need PowerPoint to do my presentations, and Word to occasionally read files sent by people who don't know better).  Since I'm an honest individual, I decide to do the unthinkable and actually pay for Microsoft software.  However, I'm also a student, so I'm looking for a way to pay the least amount possible and still keep a clear conscience.  It appears the best option is the University 365 deal that's exclusive to full-time students. The catch: they have to verify you're a student before you can purchase or even download anything. This is where the fun begins.

You get three options for verification: through your school, an ISIC card, or manually typing a verification code into a text box. The first option sounds like the best, except it doesn't work: you get redirected to your school (in my case, Hokkaido University), you enter a username and password, and... get redirected to the start page. One option down, two to go. No, scratch that, I don't have an ISIC card. Well, that pretty much leaves one option -- get in touch with support and ask for a verification code. Sounds easy, huh?

Actually, it isn't. Go to the MS store and search for a way to contact their support team. Bonus question: try to find a way that doesn't involve picking up a telephone. Go on, I'll wait.

Back already?  Did you find an email?  Let me know if you did, 'cause I didn't.  As a consolation prize, I found a chat option...  that I can't use right now because it's outside of US business hours.  Seriously?  Is it really that hard to foresee that not everyone is going to be in the same time zone as the US?  It's almost as if Microsoft haven't heard of this wonderful thing called "e-mail", which allows people across the globe to communicate without having to arrange a mutually convenient time.

Alright, let's try a different angle: let's go through the local Japanese MS site.  Maybe they have a workable support option.  Here's their product lineup.  I'll save you the time of having to click through to it:


Now, it's starting to get ridiculous.  Let's find the Microsoft Store in Japan through Google.  Thankfully, that site doesn't die with a server error.  But hey, remember that thing you wanted to buy half an hour ago...  yeah, Microsoft University 365?  It's not there!



I'm a patient and persistent guy.  Let's try googling around for "Microsoft University 365".  Here's the first hit from Google, and it looks really promising.  There's no link for purchasing information, but there is a "learn more" link.  Let's click on that.


Patience...  running...  out...

Microsoft, here I am, practically begging you to take my money so I can start using your software, and you still can't manage to keep my business.  Maybe I'll just have to go and pirate it like the rest of the world does.  I value my conscience, but I also value my time and sanity, and they are both taking a huge hit by having to deal with your online store.  If you insist on copying Apple and dressing up your staff to look like a bunch of clowns, then perhaps you should also look at copying Apple and making it easy for people to actually buy your products.

Sunday, March 24, 2013

Coding Practice: Binary Search

If you were given a full deck of cards and asked to search for a card (say, the Queen of Hearts), how would you do it? You could go about it several ways: start searching from the top of the deck, the bottom of the deck, pick cards randomly... In the worst case, you'd have to shuffle through the entire deck (a whopping 52 cards) before reaching your goal. If I asked you to do this many times, and you were patient enough to oblige, you'd quickly grow tired of shuffling through the entire deck and look for ways to make your ordeal less monotonous. You'd put the cards in order, using your favorite sorting algorithm (for example, merge-sort), using the value and suit of each card as the key.

Once the deck is sorted, searching through it becomes simpler.  There are several ways to perform the search.  One of the simplest to explain is the binary search, where you pick a card from the middle of the deck (dividing it into lower and upper halves), and compare it to the value you're searching for.  If you got lucky and it's the card you want, then you're all done.  If the card you want is less than the card you just picked, then you look at the lower half.  Otherwise, you look at the upper half.

What's the computational complexity of this search?  At each step, you effectively halve the number of cards you have to look at next: 52, 26, 13, 6, 3, 1.  The number of search steps is thus $\log_2 52$, or approximately 6.  The overall complexity is $O(\log N)$, where $N$ is the number of elements to search.

The coding problem this time was to implement a binary search algorithm that searches a sorted array.  It works mostly like what I described above, with the exception of how it handles elements with the same value.  When the array contains more than one element of the same value (e.g. a messed-up deck with 2 Queens of Hearts), then the algorithm searches for the first occurrence of the element.
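In sketch form (Python), the first-occurrence variant looks like this: when we hit the value we're after, we remember the index and keep searching the lower half in case there's an earlier copy.

    def binary_search(a, target):
        """Return the index of the FIRST occurrence of target in the
        sorted list a, or -1 if it is absent."""
        lo, hi, result = 0, len(a) - 1, -1
        while lo <= hi:
            mid = (lo + hi) // 2
            if a[mid] < target:
                lo = mid + 1        # target is in the upper half
            elif a[mid] > target:
                hi = mid - 1        # target is in the lower half
            else:
                result = mid        # found one; keep looking for an
                hi = mid - 1        # earlier occurrence in the lower half
        return result

    assert binary_search([1, 2, 2, 2, 3], 2) == 1
    assert binary_search([1, 2, 3], 4) == -1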

Here is a JavaScript demo (butt-ugly, but does what it's supposed to). The "<<" and ">>" arrows move to the previous and next search steps, respectively. At each step, the blue, orange and red numbers represent the start, middle and end of the search range, respectively. A green element indicates the search has reached its goal and has terminated.

Saturday, March 16, 2013

Counting Large Numbers in Japanese: a PITA

In Japan, the units people use to represent large numbers such as currency differ significantly from the "rest of the world".  Most people are used to decimal units that increase in steps of three orders of magnitude: thousands ($10^3$), millions ($10^6$), billions ($10^9$), trillions ($10^{12}$).  This system, known as the short scale, is also consistent with how numbers are written: as triplets, with a separator (such as a space, comma or otherwise).  Hundreds are a bit of an exception to this pattern.

In Japan, they also have hundreds, but beyond that, the units increase in steps of four orders of magnitude: ten thousand ($10^4$), 100 million ($10^8$), trillion ($10^{12}$).  Unsurprisingly, they have names for these units, too: man (万), oku (億), and chou (兆), respectively.  They also have thousands ($10^3$), but that's kind of an exception.  They also have the infamous hyaku-man (百万), the hundred ten-thousands.  You may know that as a million.  Despite this "power of four" rule, the Japanese still write numbers as triplets, except using their own units: for example, car prices are often written as 123,000万円 (or 12.3億円, depending on the psychological trick the car dealership believes in).  For a non-Japanese person, that's 1,230,000,000 or 1.23 billion yen.  Simple, right?  All you need to do is mentally add a couple of zeros, shift the commas across, regroup the zeros...

Wrong.  I mess this up all the time, like when I'm reading the news or shopping around for my next BMW.  It happens frequently enough to be frustrating, but not frequently enough to learn my lesson.  So, behold: I give you "The Decimator": a JavaScript that converts Japanese numbers to our familiar decimal notation.

Sunday, February 24, 2013

This Week's Coding Practice: Stacks and the Tower of Hanoi

A stack is a common data type that allows its elements to be accessed in LIFO (last in, first out) order. Many parts of the physical world around us function like a stack. A common analogy is a stack of plates on a table: it's simple to put a new plate on top or remove the top-most plate. It's impossible to insert a new plate into the middle of the stack. It's also impossible to remove any plates other than the top-most one. I ran into yet another real-world example of a stack the other day: a narrow rectangular parking lot with only one side facing a driveway. The lot has just enough space to fit around 4-5 cars end-to-end. When people want to park, they just drive straight in. Eventually, they get parked in by someone, who also gets parked in by somebody else, and so on. Lucky last doesn't get parked in by anyone else, but gets nagged by all the people he parked in whenever they want to leave.

Another popular example of a stack is the famous Tower of Hanoi puzzle.  There are three rods, and an arbitrary number of disks of unique sizes.  Initially, all the disks are placed on the first rod, ordered by size (smallest disk at the top).  The goal of the puzzle is to move all the disks to another rod, one by one, without ever putting a larger disk on top of a smaller one.  The task for this week was to write an algorithm that solves this puzzle.

It may not be obvious from the description, but each rod can be modeled as a stack, since it's physically impossible to access elements in any order other than LIFO (for example, randomly, or FIFO).  The entire puzzle is then modeled by 3 separate stacks.

How does one go about solving the puzzle?  It turns out that it has been thoroughly studied, and several well-known recursive and iterative algorithms exist.  However, since I have a tendency to do things the hard way, I didn't study those algorithms prior to solving the puzzle, and came up with a less elegant but home-brewed solution on my own.

Having spent a significant amount of time on the problems at Project Euler, I've acquired an instinctive reaction: whenever facing a problem for the first time, try brute force.  It's almost never the best (or even borderline satisfactory) solution, but is a fairly quick way of getting acquainted with a problem.  It's also a good starting point for trying out other things.  Lastly, it's better than nothing.

With that in mind, I approached the puzzle as a search problem (a more general instance of the binary search).  At each step, you represent the current state of the puzzle as 3 individual stacks.  To get to the next states, a disk is moved from one stack to another, without violating the rules of the puzzle.  Once you get to a state in which all of the disks are on the second (or third) stack, then you've solved the puzzle.  In which order should we look at the next states?  One simple way is DFS (depth-first search).

I quickly realized that a lot of the resulting states weren't helpful in solving the puzzle: moving a single disk backwards and forwards between two rods achieves nothing.  It also causes some branches of the DFS to be infinitely long, meaning the search will fail to terminate.  One way to get around this problem is to search only those states that haven't been encountered yet.  This requires keeping track of the states encountered thus far, which could be done with a hash map.

As it turns out, DFS isn't the best way to perform the search, since the solution it yields isn't necessarily the simplest.  This can be addressed by switching to breadth-first search (BFS): the solution it yields will have the minimum possible number of steps (which, as it turns out, is equal to $2^n - 1$).
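Here's a sketch of that BFS in Python. A state is a tuple of three tuples (the rods, each stored bottom-to-top), and the set of visited states keeps the search from going around in circles:

    from collections import deque

    def solve(n):
        """Return the shortest list of (src, dst) moves that shifts n
        disks from the first rod to the third."""
        start = (tuple(range(n, 0, -1)), (), ())
        queue = deque([(start, [])])
        seen = {start}
        while queue:
            state, moves = queue.popleft()
            if not state[0] and not state[1]:   # everything on the third rod
                return moves
            for src in range(3):
                if not state[src]:
                    continue
                disk = state[src][-1]
                for dst in range(3):
                    # Legal move: destination is empty, or its top disk
                    # is larger than the disk we're moving.
                    if dst == src or (state[dst] and state[dst][-1] < disk):
                        continue
                    rods = list(state)
                    rods[src] = state[src][:-1]
                    rods[dst] = state[dst] + (disk,)
                    nxt = tuple(rods)
                    if nxt not in seen:         # skip visited states
                        seen.add(nxt)
                        queue.append((nxt, moves + [(src, dst)]))

    print(len(solve(3)))    # 7, i.e. 2**3 - 1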

After writing my solution out on paper, I quickly coded it up in Python.  Since I had a bit of spare time, I decided to try out my JavaScript chops and implement something I could show off graphically.  While fiddling with JS took me way longer than coming up with the actual algorithm, I got there in the end, and came up with this:



The search problem can be taken further by implementing an A* search that uses a heuristic to determine the best action to take at each step.  If a suitable heuristic can be formulated, this kind of search can significantly reduce the running time and memory usage of the algorithm.  It was tempting to keep digging, but I'll leave that to another day.

Full source (including the Python code) is available as a gist.

Friday, February 15, 2013

Coding Practice: Tries

I was 16 when I first got my hands on a mobile phone -- it was the year of the Sydney Olympics, and since I was a volunteer, I was spending a lot of time outside and needed a way to stay in touch with home base. I'm not 100% positive, but I think it was an Ericsson SH888. Calls weren't cheap back then, so SMS was the way to go. I remember the pain of having to enter English text through a numeric keypad without using a predictive text engine like T9. Of course, these days, predictive text (also known as auto-completion) is everywhere: mobile devices, web browsers, and office software. In today's article, I'll briefly discuss a data structure that can be used to implement auto-completion effectively and efficiently -- the trie.

A trie is a tree-like data structure. The main difference between tries and BSTs is that tries have an arbitrary branching factor, whereas BST nodes can have at most two children. This allows faster lookups: a trie lookup is proportional to the length of the key, whereas a BST lookup is proportional to the logarithm of the number of items stored in the tree (with each comparison potentially scanning the whole key). Another advantage of tries is that they allow prefix-based lookups. For this reason, they are used in text processing for tasks such as word autocompletion.

Interestingly, a trie can also be used as an alternative to a hash table when implementing associative arrays. It has several advantages over a hash table: it does not require a hash function, and is thus faster for shorter keys, and it keeps the items it stores in order. This alternative is particularly useful when many of the associative array keys are prefixes of other keys (for example, when the keys are English words).

For this week's coding practice, I coded up an autocompleter in Python. The autocompleter works in two steps: in the first step, the script initializes a trie by reading a dictionary of English words (on Linux systems, one can often be found at /usr/share/dict/american-english). In the second step, the script prompts the user to input a prefix, and uses the trie to locate all words beginning with that prefix. The initialization step takes a little while (approximately 2 seconds for 100,000 words), but only needs to be done once. The lookup step is pretty much instantaneous: as long as the trie is in memory, the cost of a lookup is linear in the length of the key.

The entire american-english file consists of 100K words and is approximately 1MiB. The trie used to hold this dictionary has over 200K nodes, and occupies a whopping 85MiB in memory. That's approximately 425 bytes per node: it's likely that this can be reduced. Personally, I suspect that an implementation in C would use less memory.

Here is the code:
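(A condensed sketch of the approach: nodes are plain dicts, with a special key marking the end of a complete word.)

    WORD_END = '$'

    def build(words):
        """Build a trie: one dict per node, one child per character."""
        root = {}
        for word in words:
            node = root
            for ch in word:
                node = node.setdefault(ch, {})
            node[WORD_END] = True       # mark a complete word
        return root

    def complete(root, prefix):
        """Return all words in the trie that start with prefix."""
        node = root
        for ch in prefix:               # walk down to the prefix node
            if ch not in node:
                return []
            node = node[ch]
        words, stack = [], [(node, prefix)]
        while stack:                    # collect the subtree below it
            node, sofar = stack.pop()
            for ch, child in node.items():
                if ch == WORD_END:
                    words.append(sofar)
                else:
                    stack.append((child, sofar + ch))
        return sorted(words)

    with open('/usr/share/dict/american-english') as fin:
        trie = build(line.strip() for line in fin)
    print(complete(trie, 'aardva'))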

Finally, many a battle has been fought over how to pronounce the word "trie". One camp insists on pronouncing it the same as "tree", since the origins of the word are from "retrieval". Since a "tree" and a "trie" are slightly different things, another camp insists on pronouncing it as "try". Finally, others sidestep the issue completely and refer to tries as "prefix trees".

Wednesday, February 6, 2013

The Challenges of Character Encoding... in 2013

This article is going to be a bit of a rant.

Back when I was a lad, Unicode wasn't as popular as it is now. The world lived in the Dark Ages, where each text character was usually represented by a single byte, regardless of the language. A single byte can take up to 256 different values, meaning that on its own, it could be used to represent at most 256 different characters. People wanted to communicate with each other in different languages, but the number of characters across all modern languages was far more than 256.  This posed a problem.

A commonly accepted solution was to agree that the first 128 character codes were fixed (the ASCII character set), while the last 128 were unspecified -- their values depended on the context. A swarm of character encodings emerged, each handling these last 128 characters differently. For my native Russian, there were at least two such encodings: KOI-8 and CP1251. When you got text in Russian, you typically had to guess which encoding it was in. If you got it wrong, all you saw was gibberish (and any English text, since that was handled the same regardless of the encoding). It was 50/50, so not too bad. There were also some utilities that helped you guess the encoding.

A significant limitation, of course, was that you couldn't freely mix languages in text. For example, working with Russian and Greek text in the same file was not possible. Greek used a different encoding, which wasn't compatible with KOI-8 nor CP1251. You could see Russian or Greek at any one time, but never both.

Fast-forward to 2013. Unicode has become almost universally accepted. It solved the problems above, introduced several new ones, but overall, made the world a better place. The days of staring at gibberish trying to work out which encoding it is in were gone. Things just worked. Or so I thought.

I recently had the pleasure of playing correspondence chess with my dad using a chess server. While the game itself was quite entertaining (I lost miserably), I quickly noticed that sending Unicode messages wasn't working -- ASCII characters got through, non-ASCII characters got replaced by a question mark. This forced us to write in transliterated Russian, which I hate with a passion:


What was obvious was that while the Web site was capable of displaying Cyrillic (which I managed to enter using HTML character escapes, thanks to the site admin), it was ruthlessly clobbering the text after it was submitted. After looking at it more closely, I realized the input textarea was part of a POST form, and clicking on the "Msg" button submitted the form. The message was, therefore, part of the POST payload. Using Chrome's Developer tools, I confirmed that it was successfully URL-encoded and still readable at that point. After that, all trace of the message is lost, but one thing is clear: the non-ASCII characters in the message died a horrible death.



While encountering a fairly popular site that could not properly handle Unicode in 2013 was amusing, what was even more amusing was my dialogue with the administrator of the site. While I do believe they were genuinely trying to help, their firm belief that a server-side encoding problem could be fixed by modifying the client configuration was disconcerting, to say the least. The more I talked to them, the more I understood that they had no idea how character encoding works. Unfortunately, they also had little desire to reason about it, which led to a rather tense dialogue:
... the language that you use to handle message input is controlled by your local configuration and this is something over which we have no control. All of our pages, in common with around 75% of all online content from many diverse sites, are configured to accept input that conforms with UTF-8 standards. As I have tried to point out, there may well be a local issue relevant to your local configuration, codepage setting, default character coding and so on. These are not issues for which we, as a UK based Chess playing site, normally provide support. In this case, we have offered more support and advice than would have been provided by many of our competitors.
I edited the quote above liberally, for brevity. It can, however, be summarized by a single word in the King's English: bollocks. What's really happening is this: after the form is submitted, the URL-encoded message is retrieved from the POST payload, and encoded into one of the antique character encodings from the Dark Ages. Since these encodings only represent characters with codes up to 255, and the Unicode Cyrillic characters sit around the 1000 mark, there's simply no way to represent those characters anymore. Whatever performs the encoding replaces such characters with a placeholder character, which in this case happens to be a question mark.


I mashed up some JavaScript to demonstrate the problem.  While I couldn't be bothered doing a full form POST, the code demonstrates the essence of the problem.  It's below.
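(The demo itself is JavaScript; the essence of the bug fits into a couple of lines of Python.)

    # Cyrillic text forced through a single-byte encoding that cannot
    # represent it: every non-ASCII character becomes a question mark.
    message = 'привет, папа!'
    clobbered = message.encode('latin-1', errors='replace').decode('latin-1')
    print(clobbered)    # prints: ??????, ????!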

Here's a live version you can interact with.  Case closed.



Had I been a more patient man, I would have persevered against the onslaught of ignorance and carried on my crusade for working Unicode input to its glorious end. Instead, I just opened an account on another chess server.

Tuesday, February 5, 2013

Coding Practice: Depth-first Tree Traversal

A few weeks ago, I wrote about trees -- binary search trees, to be specific.  Working with a tree, for example inserting or retrieving elements, often involves scanning through the elements of the tree, starting at the root.  This is known as a traversal.

Unsurprisingly, there are many ways to perform a traversal.  A major distinguishing factor is the order in which nodes are visited: if all children of a node are visited before any of the node's siblings, then the traversal is known as depth-first.  If, on the other hand, the siblings are visited before the children, then the traversal is known as breadth-first.  In this article, I'll focus on describing depth-first traversals.

There are three main ways of traversing a binary tree depth-first: pre-order, in-order and post-order. They are typically defined recursively, with each step of the recursion consisting of three sub-steps: do something to the current node (this is referred to as visiting), traverse the left subtree, traverse the right subtree. By convention, the left subtree is traversed before the right subtree. Pre-order, in-order and post-order traversals perform the visit sub-step before, in-between and after the two subtree traversals, respectively.  For example, for a small tree with root A, left child B and right child C: pre-order visits A, B, C; in-order visits B, A, C; and post-order visits B, C, A.

The big three traversal algorithms can be implemented recursively or iteratively. Recursive implementations are easier to understand since they follow directly from definition, but can be less efficient than iterative implementations due to the function call overhead (for more details, see http://stackoverflow.com/questions/72209/recursion-or-iteration).

Picking the traversal algorithm to use depends on the application. For example, propagating changes from the leaf nodes to the root (e.g. calculating the sum of the tree) can only be accomplished with a post-order traversal, since the sum of each subtree needs to be known before the sum of the current node can be calculated. In contrast, propagating changes from the root to the leaves would be best done with a pre-order traversal.

The problems to solve this week were:
  1. Implement a simple binary (non-search) tree node data structure in your favorite programming language and write the following methods: (1) print nodes pre-order, (2) print nodes in-order, (3) print nodes post-order.
  2. Write a function that, given two nodes and a tree root, finds the two nodes' lowest common ancestor. That is, the function should find the ancestor that both nodes share that is furthest away from the root.
I went with a C++ implementation this time to take advantage of the STL's sets and maps. For finding the lowest common ancestor (LCA), I didn't use a parent pointer in the Node, and instead used an arbitrary traversal to calculate the parent of each node in the tree and store it in a map. This costs $O(n)$ for both space and time. Once that's done, finding all ancestors of one node and searching for the LCA both cost $O(\log n)$ given a data structure with a fast membership function (a set). The approach is thus $O(n)$, but can be reduced to $O(\log n)$ if the results of the traversal are pre-calculated and stored somewhere.  This pre-calculation would be practically identical to keeping parent pointers in each Node.

The code is below:
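(The original is C++ with STL sets and maps; the sketch below uses Python dicts and sets in the same roles.)

    class Node:
        def __init__(self, value, left=None, right=None):
            self.value, self.left, self.right = value, left, right

    def preorder(node):
        if node:
            print(node.value)           # visit, then left, then right
            preorder(node.left)
            preorder(node.right)

    def inorder(node):
        if node:
            inorder(node.left)
            print(node.value)           # left, then visit, then right
            inorder(node.right)

    def postorder(node):
        if node:
            postorder(node.left)
            postorder(node.right)
            print(node.value)           # left, then right, then visit

    def lowest_common_ancestor(root, a, b):
        parents = {root: None}
        stack = [root]                  # any traversal will do for
        while stack:                    # building the child -> parent map
            node = stack.pop()
            for child in (node.left, node.right):
                if child:
                    parents[child] = node
                    stack.append(child)
        ancestors = set()               # all ancestors of the first node
        while a:
            ancestors.add(a)
            a = parents[a]
        while b not in ancestors:       # walk up until we hit a shared one
            b = parents[b]
        return b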


Thursday, January 17, 2013

Cheating in Angry Birds

Please allow me to indulge in shameless self-marketing for the entire duration of this post.

Many moons ago, I developed a small library that processes a screenshot from Angry Birds, a popular time-waster computer game, and detects several important parameters like the scale (amount of zoom) of the level, position of the slingshot, and the kind of bird that is currently loaded.  At the time, I had just started doing serious work with OpenCV, and took this as an opportunity to put my newly-acquired OpenCV skillz to the test.  It was a very challenging and interesting side-project.  I consider the fact that I completed it without ever playing a single game of Angry Birds an important lifetime achievement.

I'm happy to report that the library I developed is now part of a real iPad app in the iTunes Store.  Here's a short instructional video to demonstrate how the app works:


Impress friends and relatives!  If you've dreamed of achieving astronomical scores in Angry Birds, then this app is definitely for you!  Once the app processes the screenshot (courtesy of yours truly), you select a target, and the app automatically calculates the required trajectories for you using some in-game physics.  The app is also clever enough to handle each type of bird individually (e.g. the bird splitting in mid-air as shown in the video).  The end result is your ability to hit the targets you picked with deadly accuracy.

In other news, at the end of 2012, an Australian university hosted an artificial intelligence (AI) competition that pitted computers against human players in a game of Angry Birds.  The AI challengers had access to a computer vision component that performed a similar task to the library that I developed (and a lot more, since they also detected other in-game sprites) and an in-game physics component to calculate trajectories.  It seems that the goal for the challengers was to identify the targets that would maximize the score for the game, and utilize the provided computer vision and physics components to hit those targets.  I wasn't able to find the results of the competition anywhere, but a reliable source in Australia with access to local TV reports that the human players tended to win.

Sunday, January 6, 2013

Why Linux rocks

... or how to share your internet connection.

I'm out of town for a conference.  I brought my laptop and tablet PC with me.  Unfortunately, the hotel room doesn't have wireless -- it only has a LAN port.  How can I access the internet through my tablet?  Well, since the laptop has a functioning wireless card, it's not that difficult.  You need to run hostapd, a daemon that implements a wireless Access Point (AP) in software, and bridge-utils to bridge the LAN and wireless interfaces.  On Ubuntu, these are available directly through apt-get.  From there, it's relatively straightforward:

  1. Configure the AP (edit /etc/hostapd/hostapd.conf to specify SSID, access keys, etc.)
  2. Configure the bridge (edit /etc/network/interfaces to specify the interfaces you want to bridge, the gateway and DNS servers)
  3. Restart networking services
  4. Connect to the created wireless AP and enjoy your bridged connection
For parts 1 and 2, I used the settings from the original post directly.  I had to change eth1 to eth0 since that's what the LAN interface is called on my system.  The IP addresses for the laptop, gateway and DNS server were also obviously different, but readily available through netstat -nr.  If you skip step 2, you won't get an IP address at step 4 unless DHCP is running on the laptop.  If step 2 is performed correctly, then whatever is performing DHCP on the remote side of the LAN connection will perform the IP address allocation and everything will work alright.
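For reference, the two files end up looking something like this (all names, addresses and keys here are illustrative; with bridge=br0 set, hostapd adds the wireless interface to the bridge by itself):

    # /etc/hostapd/hostapd.conf
    interface=wlan0
    bridge=br0
    driver=nl80211
    ssid=hotel-room-ap
    hw_mode=g
    channel=6
    wpa=2
    wpa_passphrase=changeme

    # /etc/network/interfaces
    auto br0
    iface br0 inet static
        address 192.168.1.50
        netmask 255.255.255.0
        gateway 192.168.1.1
        dns-nameservers 192.168.1.1
        bridge_ports eth0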

It's amazing what you can do just by editing a couple of configuration files.  Of course, it would be great if all this could be done through a GUI.  There's already nm-applet, which seems to allow one to create an AP, but I couldn't get that going, so I gave up and headed for the command-line option.

Friday, January 4, 2013

This Week's Coding Practice: Linked Lists

Linked lists are elementary data structures that any programmer worth their salt should know back to front.  In contrast to the contiguous array, linked lists naturally grow or contract as the data stored in them increases or decreases, respectively.  Furthermore, the memory occupied by the data does not need to be allocated contiguously.  However, this flexibility comes at a price.  Accessing random elements in constant time is no longer possible -- with a linked list, accessing the nth element takes O(n) time.

There are many ways to skin a cat, and there are also many ways to implement linked lists.  Some important decisions to make along the way include determining the degree of abstraction; picking between singly and doubly-linked lists; and picking between sentinel nodes and sentinel values.  I'll discuss each decision briefly below.

At a high-level, a linked list can be defined recursively: a linked list is a node followed by a linked list.  You have at most two entities to worry about -- the Node, which contains the data you want to store and a pointer to the rest of the linked list, and the List itself.  Ideally, you want to model the list as a "black box" and abstract away the fact that it consists of a chain of Nodes.  This requires defining and implementing the List and Node explicitly as separate data types (as separate structs, for example).  Frameworks, like the C++ STL's vector or Python's list(), do this for you.  Home-brew implementations of the linked list often avoid defining an explicit data type for the list itself, and model the list implicitly as a chain of nodes (see how I implemented Bins in a hash table a few weeks ago).  This saves the developer the effort of having to define separate List and Node data types, at the expense of reduced abstraction.  If the particular linked list implementation is only used internally (as opposed to being part of a framework), then the loss of abstraction is not a big deal.

List linking comes in two flavors: single and double.  A node of singly-linked lists contains a pointer to the next node only.  A node of a doubly-linked list contains pointers to both the previous and the next nodes.  Singly-linked lists are the simplest and most flexible -- so flexible, in fact, that they can be "abused" by chaining lists together in a tree-like structure.  Again, this flexibility comes at a price.  Updating (inserting and removing elements) singly-linked lists can be a pain in the ass, since such operations require access to the node that is before the update location.  Since singly-linked lists can only be traversed in one direction (from the head to the tail), locating this previous node requires a full traversal from the head of the list -- which can cost O(n).  Doubly-linked lists allow the previous node to be determined directly, at the expense of increased storage overhead, due to the extra pointer in each node, and reduced flexibility -- abusing the list as a tree is no longer possible.

All good things come to an end, and linked lists are no exception (OK, with the exception of circular lists...).  Primarily, there are two ways of indicating the end of a list: sentinel values and sentinel nodes.  Sentinel values are special values like NULL that indicate that the previous or next node doesn't exist.  It's a simple idea, but requires additional conditional code to handle search and retrieval operations near the head and tail of the list.  Sentinel nodes simplify the code by removing such conditionals, at the expense of adding nodes that do not hold any real data.

How one goes about making the above-mentioned three decisions depends on the application -- what will you be using the linked list for?  This week, the tasks were to implement a simple singly-linked list and:
  1. Write a function to remove a list's 3rd from last element. Challenge: can you do it in a single list traversal?
  2. Write a function to remove all duplicates from a linked list. Challenge: can you do it without storing any extra data?
  3. Write a function to detect a loop in a linked list.
For the abstraction level, I decided to roll with abstracting away the Node and exposing only the top-level List.  This looks a bit awkward at first, since the List is just a pointer to the first Node.  However, it does simplify the representation of border cases, such as empty lists.  I also decided not to use sentinels this time, as they would be more trouble than they are worth in some cases.  Specifically, removing duplicates from a list without storing any extra data requires pre-sorting the list with an algorithm such as merge-sort.  If we're using sentinels, then the divide step of merge-sort would require us to create additional sentinel nodes, and the merge step would require us to delete those additional sentinel nodes.

Finally, here's the code for this week:
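A sketch covering all three tasks (the duplicate removal below is the quadratic no-extra-storage variant; pre-sorting with merge-sort, as discussed above, is the alternative):

    class Node:
        def __init__(self, data, next=None):
            self.data, self.next = data, next

    class List:
        def __init__(self):
            self.head = None    # the List is just a pointer to the first Node

        def push(self, data):
            self.head = Node(data, self.head)

        def remove_third_from_last(self):
            # Single traversal: run a leader three nodes ahead of the
            # trailer; when the leader reaches the last node, the node
            # after the trailer is the third from last.
            leader = self.head
            for _ in range(3):
                if leader is None:
                    return              # fewer than three elements
                leader = leader.next
            if leader is None:          # the head itself is third from last
                self.head = self.head.next
                return
            trailer = self.head
            while leader.next:
                leader, trailer = leader.next, trailer.next
            trailer.next = trailer.next.next

        def remove_duplicates(self):
            # Quadratic, but stores no extra data.
            node = self.head
            while node:
                probe = node
                while probe.next:
                    if probe.next.data == node.data:
                        probe.next = probe.next.next    # unlink duplicate
                    else:
                        probe = probe.next
                node = node.next

        def has_loop(self):
            # Floyd's tortoise-and-hare: the fast pointer laps the slow
            # one if and only if the list loops back on itself.
            slow = fast = self.head
            while fast and fast.next:
                slow, fast = slow.next, fast.next.next
                if slow is fast:
                    return True
            return False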