从后缀表达式到表达式树

1. Introduction

In this tutorial, we’ll show how to convert an expression written down in postfix notation into an expression tree.

2. Expressions

All the operators come after their arguments in the postfix representation of an expression. For example, the expression:

$[(5 + 2) \times (3 - 1)]$

has the following postfix representation:

$[5\ 2 + 3 \ 1 - \times]$

An expression tree is a graphical representation of an expression where:

leaf nodes denote constant values or variables
internal nodes contain operators

For example, here’s the above expression’s tree:

An example of an expression tree

Since the order of computation is clear in postfix notation, it doesn’t need parentheses. That makes postfix expressions easier to write. However, they aren’t human-readable. To represent an expression in a readable way, we often write them down as expression trees.

So, let’s say that $x = [x_1, x_2, \ldots, x_n]$ is the postfix representation where each x_i is a constant, variable or an operator. Our goal is to transform it into a tree.

3. Algorithm for Expressions With Binary Operators

First, we’ll cover the case when all the operators are binary.

That means that as soon as we hit an operator token in , we can be sure it acts on two preceding operands. So, we need to store them in a way that makes it easy to retrieve them in reverse order. A stack seems like a natural choice.

However, the trick is that operands can also be sub-expressions of the original. Therefore, we need a data structure capable of representing literals and variables in addition to expressions. For instance, we could define the following class hierarchy in an object-oriented language such as Java:

Class hierarchy

It follows the recursive definition of expressions:

Literals and variables are expressions.
If and are expressions and $\circ$ is a binary operator, $A_1 \circ A_2$ is also an expression.

3.1. Pseudocode

Here’s the pseudocode:

algorithm ConvertBinaryPostfixExpression(x):
    // INPUT
    //    x = [x1, x2, ..., xn], an array of symbols (constant values, variables, operators) 
    //    representing a postfix expression
    // OUTPUT
    //    The expression tree of x

    nodes <- an empty stack

    for i <- 1 to n:
        if x[i] is not an operator:
            nodes.push(x[i])
        else:
            right <- nodes.pop()
            left <- nodes.pop()
            
            new_node <- Node(left, right)
            new_node.operator <- x[i]
            nodes.push(new_node)

    root <- nodes.pop()
    
    return root

The idea is to iterate over and handle x_i depending on whether it’s an operator. If not, we store it in the stack. If yes, we pop the two most recent items. The first pop returns the right operand since it’s processed the last. Then, we glue them together to make their parent node and push the new node onto the stack.

In the end, the stack holds the tree’s root.

3.2. Example

Here is how our algorithm would handle $5 \ 2 + 3 \ 1 - \times$ :

Converting a postfix expression into a tree

As we see, it didn’t need more than seven steps, which is the number of symbols in the expression.

3.3. Complexity

Since we assumed that all the operators are binary, we pop at most two items from the stack for each $\boldsymbol{i=1,2,\ldots, n}$ . Therefore, the algorithm’s worst-case time complexity is $\boldsymbol{O(n)}$ .

3.4. Assumptions and Limitations

We assume that is an array of symbols (tokens), where each is either an operator, a variable, or a literal. If the input expression is a string, we first need to tokenize it before applying this algorithm.

Additionally, the algorithm doesn’t check if the expression is malformed. If so, building the tree will fail.

Last but not least, it assumes the operators are binary. For instance, that means it can’t handle negative numbers since the minus in -5 is a unary operator.

4. Algorithm for Expressions With Arbitrary Operators

We can adapt the algorithm to handle operators of any arity. However, we need a way to determine the number of operands for each operator symbol. To do that, we can use a hash table whose keys are the operators, and the values denote their arities.

Here’s the pseudocode:

algorithm ConvertPostfixExpression(x, info):
    // INPUT
    //    x = an array of symbols (constant values, variables, operators) 
    //    representing a postfix expression
    //    info = a hash table whose keys are operator symbols and values are their arities
    // OUTPUT
    //    The expression tree of x
    
    nodes <- initialize an empty stack

    for i <- 1 to length(x):
        if x[i] is not an operator:
            nodes.push(x[i])
        else:
            m <- info[x[i]]
            operands <- pop m items from nodes
            operands <- reverse operands
            new_node <- Node(operands)
            new_node.operator <- x[i]
            nodes.push(new_node)

    root <- nodes.pop()
    
    return root

We only changed the else branch in the loop. If $\boldsymbol{x_i}$ is an operator, we first query the hash table to get its arity $\boldsymbol{m}$ . Then, we pop items from the stack to get the operands. However, the stack returns them in reverse order, so we need to correct that. Afterward, the only thing left to do is to instantiate the parent node and push it onto the stack.

4.1. Complexity

Let be the operators’ maximal arity. When processing the symbol x_i , we pop no more than items from the stack. Since we iterate over symbols, the time complexity is O(mn) .

If is a constant, the complexity reduces to O(n) , the same as for the algorithm handling binary operators only. However, we expect this one to run slower in practice because, in general, it will pop more operands from the stack.

5. Conclusion

In this article, we presented a linear-time algorithm for converting a postfix expression into an expression tree.

Persistence

REST

Security