1. Introduction

In this tutorial, we’ll talk about tree isomorphism and how to check if two trees are isomorphic.

2. Tree Isomorphism

Since trees are connected acyclic graphs, tree isomorphism is a special case of graph isomorphism.

The word isomorphism means the same shape. So, intuitively, we say that two trees are isomorphic if they have the same structure.

How do we prove isomorphism? We can redraw one tree to take the same shape as the other and match their nodes. For instance:

Reshaping a tree to prove isomorphism

That, however, isn’t always easy to do by hand, so we need an efficient algorithm and a precise definition of isomorphism.

2.1. Mathematical Definition

We can define isomorphism rigorously. Let T_1 = (V_1, E_1) and T_2= (V_2, E_2) be two trees, where V_1 and V_2 denote their nodes, and E_1 and E_2 their edges.

We say that \boldsymbol{T_1} and \boldsymbol{T_2} are isomorphic if there exists a bijection \boldsymbol{f: V_1 \rightarrow V_2}  such that any two nodes \boldsymbol{u,v \in V_1} are adjacent in \boldsymbol{T_1} if and only if they’re adjacent in \boldsymbol{T_2}:

    [(u, v) \in E_1 \iff (f(u), f(v)) \in E_2]

We call such a mapping f the isomorphism of T_1 and T_2.

Although we didn’t call it that way, we formulated the following isomorphism by reshaping T_2 in the previous example:

    [\begin{pmatrix} u & 1 & 2 & 3 & 4 & 5 & 6 & 7\\ f(u)& D & F & G & A & B & C & E\\ \end{pmatrix}]

3. Encoding

The idea is to encode each tree so that only the nodes of the trees sharing the same structure have the same encoding.

3.1. AHU Encoding

We’ll use a variation of the AHU (Aho, Hopcroft, Ullman) encoding scheme. It maps nodes to strings representing the structures of their subtrees.

If the node u is a leaf, its AHU encoding AHU(u) is just the symbol 0.

In contrast, let u be a non-leaf node and v_1, v_2, \ldots, v_k its children. Its encoding is the string containing its children’s encodings in the non-descending order: '(AHU(v_{(1)}), AHU(v_{(2)}), \ldots, AHU(v_{(k)}))' where AHU(v_{(1)}) \leq AHU(v_{(2)}) \leq \ldots \leq AHU(v_{(k)}).

Finally, \boldsymbol{AHU(T) = AHU(T.root)}.

For example:

The AHU encoding of a tree

3.2. Pseudocode

Here’s the pseudocode:

algorithm AHUEncoding(T):
    // INPUT
    //    T = a tree
    // OUTPUT
    //    The AHU encoding of T

    D <- make a dictionary mapping each level to the nodes at it
    // Let h be the tree's height and root's level be 1

    for u in D[h]:
        AHU(u) <- '0'

    k <- h - 1

    while k >= 1:
        for u in D[k]:
            if u is a leaf:
                AHU(u) <- '0'
            else:
                AHU(u) <- '(' + sort the AHU codes of the children of u non-decreasingly + ')'
        k <- k - 1

    return AHU(T.root)

Before we start encoding the nodes, we first need to traverse the tree to assign each node its level so that we can iteratively compute the AHU strings starting from the bottom. Using the postorder traversal, we can get the levels in time linear in the number of nodes.

4. Checking for Isomorphism

Now, we can extend the AHU encoding algorithm to isomorphism testing.

4.1. Root

If \boldsymbol{T_1} and \boldsymbol{T_2} are isomorphic, they’ll have the same AHU encoding only if their roots are at the same place in their shared structure:

Isomorphic trees with different roots

On the right, we have the same tree as on the left but rooted at a different node. As we see, their AHU encodings differ.

To fix this, we’ll first find the centers of \boldsymbol{T_1} and \boldsymbol{T_2} and root the trees at them. If they have the same structure, their encodings, when rooted at the centers, will be the same.

If T_1 or T_2 have two centers, we’ll try all pairings.

4.2. Pseudocode

Here’s the pseudocode:

algorithm CheckIsomorphicTrees(T1, T2):
    // INPUT
    //    T1 and T2 = two trees
    // OUTPUT
    //    true if T1 and T2 are isomorphic, false otherwise
    
    for center1 in FIND-CENTERS(T1):
        T1.root <- center1
        
        for center2 in FIND-CENTERS(T2):
            T2.root <- center2
            
            if AHU(T1) = AHU(T2):
                return true

    return false

So, we first find the centers. Then, we use the centers as the roots and try to match the encodings. If any combination gives a match, we’ll prove the isomorphism of T_1 and T_2. However, if we find no match, the trees aren’t isomorphic, so we return false.

4.3. Improvement

However, we don’t use the fact that the nodes at the same levels will have the same AHU tuples in the case of isomorphism, so a mismatch at any level means we can start with the next pair of centers. In the previous algorithm, we computed the entire AHU encodings even though we can stop if we find the levels that differ.

Also, since we’ll compare the trees level by level, we can shorten the encodings as follows. After computing the strings for all the nodes at the level k, we find the distinct encodings. Then, sort them, and we map the shortest unique string to 0, the second one to 1, and so on. When computing the encodings of the nodes at the level k-1, we use the unique integers we assigned to the strings instead of the actual strings:

algorithm FasterIsomorphismCheck(T1, T2):
    // INPUT
    //    T1 and T2 = two trees
    // OUTPUT
    //    true if T1 and T2 are isomorphic, false otherwise

    for center1 in FIND-CENTERS(T1):
        T1.root <- center1

        for center2 in FIND-CENTERS(T2):
            T2.root <- center2

            D1 <- make a dictionary mapping each level of T1 to the nodes at it
            D2 <- do the same for T2

            // Let h1 and h2 be the heights of T1 and T2
            if h1 != h2:
                return false

            h <- h1

            for u in (D1[h] + D2[h]):
                AHU(u) <- 0

            k <- h - 1

            while k >= 1:
                Determine the unique strings at level k in both trees
                L1 <- sort the strings of the nodes of T1 at level k
                L2 <- sort the strings of the nodes of T2 at level k

                if L1 != L2:
                    return false
                else:
                    Set the the encodings of the nodes 
                      to the ranks of their strings in L1 and L2
                
                k <- k - 1
            
            return true

    return false

After rooting the input tree at the centers, we check their heights because only trees with the same height can be isomorphic.

4.4. Complexity

In the worst case, both trees will have two centers. So, the while loop is executed at most four times. Its worst-case complexity is O(n). Why? First, every node is processed exactly once in the entire algorithm. That amounts to O(n) time.

Second, if m is the trees’ arity, in each iteration, we sort at most m strings, each having no more than m characters. Since m is fixed and doesn’t grow with respect to n, the overhead is O(1) per iteration. As there are h-1 iterations in the worst case, where h is the height, the total overhead is O(h). Since h \leq n, we get the O(n) overhead.

Finally, the postorder traversal is also O(n). So, the algorithm has the \boldsymbol{O(n)} time complexity in the worst case.

5. Conclusion

In this article, we showed how to check if two trees are isomorphic. We do this by rooting them at their centers and trying to match their AHU encodings. If the threes are indeed isomorphic, we’ll get a match.