1. Overview
In this tutorial, we’ll explain how to convert finite automata to regular expressions using the state elimination method.
Regular expressions are used in many applications, and most programming languages offer a way to implement them.
2. Problem Description
Let be a finite automaton and the transition label of the branch going from to , where they denote any two states of . The conversion algorithm should return a regular expression defining the regular language that accepts.
For example, let’s say we have the following automaton:
The regular expression it’s equivalent to is .
A use case for this is natural language processing (NLP). NLP engineers often design string-matching patterns, and for them, it’s much easier to read a regular expression than to look at a complex finite automaton.
3. The State Elimination Method
We first need to make uniform.** A uniform finite automaton has an initial state with no incoming transitions and a final state with no outgoing transitions.
Then, we repeatedly remove a randomly selected non-endpoint state and compute the regular expressions of the paths between the remaining pairs until we reduce to only the initial and final states.
There are other methods besides the state elimination method; these include Adren’s method, Brzozowski’s algebraic method and the transitive closure method. We’ll show this one because it’s simple and easy to understand and implement.
3.1. Pseudocode
Here’s the pseudocode:
algorithm StateEliminationMethod(A):
// INPUT
// A = a uniform finite automaton
// OUTPUT
// a regular expression defining the regular language that A accepts
while A contains more than two states:
Randomly choose a state k (not the initial or the final state)
for each pair of states (p, q) connected via k:
path_regex <- alpha[p,q] + alpha[p,k] alpha[k,k]* alpha[k,q]
Simplify path_regex
Set path_regex as the transition label alpha[p,q]
Remove k from A
return the regular expression defining the regular language that A accepts
As we see, the algorithm repeats two steps in each iteration.
The first one is to randomly choose a state (not the initial or the final state) and remove it from .
The second step is to compute the regular expressions for the paths between each pair of the remaining states connected directly through . Then, we change the transition labels to the computed regular expressions. The regular expressions of pairs of states that aren’t directly connected through remain the same.
The final step is to remove from since all the paths going through it are now in the updated or newly added transition labels.
3.2. Computing Regular Expressions for Paths
Let be the randomly chosen state.
We compute the new regular expressions for all possible pairs of states and in () such that there’s a path using the following equation:
where:
- is the transition label from state to state
- is the transition label from state to state
- is the transition label from state to state
- is the transition label from state to state
Afterward, we simplify the obtained regular expressions and set the corresponding transition labels to those expressions.
4. Example
Let’s work out an example from a paper on state elimination.
Our input finite-state machine is:
First, we make it uniform:
Now, we randomly choose a state. Let it be . The next step is to compute the regular expressions for paths between the states connected directly via .
The new regular expression for the transition from to is:
Substituting the labels for their values, we get:
Following this, we simplify the regular expression to . Then, we add a new transition between the states and :
Since there’s a path , we need to update the transition label . Following the same procedure, we get that the regular expression for the transition from to itself is . Consequently, the transition label from state to state is replaced with the new regular expression:
The new regular expression for the transition from state to state remains the same because the path between them doesn’t go through .
Finally, we remove :
Now, the only choice is since it’s the only non-initial and non-ending state. Repeating the steps, we get:
As a result, the automaton now contains only the initial and final states. Hence, we can no longer choose more states to remove. The label of the transition is the regular expression equivalent to the input automaton.
5. Complexity
In the worst-case scenario, all the states are directly connected in both directions ( and ), there’s a transition from each state to itself, and each state connects all the pairs of the automaton’s states ( and for any and ).
Let be the number of states (excluding the initial and the final states).
Since all the states are connected in both directions, there are possible pairs of states in the first iteration. Hence, we compute regular expressions.
In the second iteration, we compute expressions because there are states.
We keep removing states and computing the paths between states until there are no more states except the initial and the final ones. Hence, the last regular expression computation is for the transition from the initial to the final states.
Therefore, the worst-case time complexity of the state elimination method is :
The assumption is that computing regular expressions for paths is an operation. Since the labels and expressions are strings, we can achieve that if we concatenate them in time. For example, we can treat strings as arrays that can end with a pointer to another array. That way, we can construct in five steps, so regex computation is .
6. Conclusion
In this article, we showed how to convert a finite-state automaton to a regular expression. We explained the state elimination method. It works by eliminating states and changing the transition labels to the regular expressions of the three-state paths going over the eliminated states.