Tuesday, February 5, 2013

Coding Practice: Depth-first Tree Traversal

A few weeks ago, I wrote about trees -- binary search trees, to be specific.  Working with a tree, for example inserting or retrieving elements, often involve scanning through the elements of the tree, starting at the root.  This is known as a traversal.

Unsurprisingly, there are many ways to perform a traversal.  A major distinguishing factor is the order in which nodes are visited: if all children of a node are visited before any of the node's siblings, then the traversal is known as depth-first.  If, on the other hand, the siblings are visited before the children, then the traversal is known as breadth-first.  In this article, I'll focus on describing depth-first traversals.

There are three main ways of traversing a binary tree depth-first: pre-order, in-order and post-order. They are typically defined recursively, with each step of the recursion consisting of three sub-steps: do something to the current node (this is referred to as visiting), traverse the left subtree, traverse the right subtree. By convention, the left subtree is traversed before the right subtree. Pre-order, in-order and post-order traversals perform the visit sub-step before, in-between and after the two subtree traversals, respectively.  Here's an example of performing each of the three traversal on a small tree.

The big three traversal algorithms can be implemented recursively or iteratively. Recursive implementations are easier to understand since they follow directly from definition, but can be less efficient than iterative implementations due to the function call overhead (for more details, see http://stackoverflow.com/questions/72209/recursion-or-iteration).

Picking the traversal algorithm to use depends on the application. For example, propagating changes from the leaf nodes to the root (e.g. calculating the sum of the tree) can only be accomplished with a post-order traversal, since the sum of each subtree needs to be known before the sum of the current node can be calculated. In contrast, propagating changes from the root to the leaves would be best done with a pre-order traversal.

The problems to solve this week were:
1. Implement a simple binary (non-search) tree node data structure in your favorite programming language and write the following methods: (1) print nodes pre-order, (2) print nodes in-order, (3) print nodes post-order.
2. Write a function that, given two nodes and a tree root, finds the two nodes' lowest common ancestor. That is, the function should find the ancestor that both nodes share that is furthest away from the root.
I went with a C++ implementation this time to take advantage of the STL's sets and maps. For finding the lowest common ancestor (LCA), I didn't use a parent pointer in the Node, and instead used an arbitrary traversal to calculate the parent of each node in the tree and store it in a map. This costs $O(n)$ for both space and time. Once that's done, finding all ancestors of one node and searching for the LCA both cost $O(log(n))$ given a data structure with a fast membership function (a set). The approach is thus $O(n)$, but can be reduced to $O(log(n))$ if the results of the traversal are pre-calculated and stored somewhere.  This pre-calculation would be practically identical to keeping parent pointers in each Node.

The code is below: