使用Manacher算法在O(n)时间内找到回文子串

1. Overview

When dealing with string problems, we’ll come across a term called palindromes.

In this tutorial, we’ll show what a palindrome is. Also, we’ll explain Mahacher’s algorithm, which handles palindrome substrings inside a string .

Finally, we’ll give some applications that use Manacher’s algorithm.

2. Definitions

2.1. Definition of Palindromes

First of all, we need to define what a palindrome is. A palindrome string is the same when reading forward and backward. For example, strings { aba , , , abcacba } are palindromes, while {, cbabd } are not.

From this definition, we can define the center of a palindrome. The center of a palindrome is the middle of the string that divides it into halves. Each of these halves is a mirror image of the other.

Let’s take the following example:

Example Palindrome

As we can see, in the case of odd-length palindromes, the center is an element from the palindrome itself. The part before the center is called the left side of the palindrome. Similarly, the part after the center is called the right side.

However, in the case of even-length palindromes, we don’t have a center element. Instead, we just have a left and right side of the palindrome.

In both cases, the right side of the palindrome is a mirror image of the left side. In other words, the left and right sides are the reverses of each other.

One more note to consider is that if a string is a palindrome, then if we remove the first and last characters from , we’ll get a palindrome string as well.

Similarly, if a string is a palindrome, then adding the same character to the beginning and the end of will result in a palindrome string as well.

These properties will prove useful once we have a look at the algorithms later.

2.2. Defining the Problem

Now that we have a better understanding of palindromes, we can explain the problem that Manacher’s algorithm solves.

In the beginning, the problem gives us a string . After that, the problem asks us to calculate two arrays and . Both arrays should store an indication of the longest palindrome substrings (LPS) that are centered at each index .

As a result, ![s[i - oddP[i], i + oddP[i]]](/wp-content/ql-cache/quicklatex.com-c7373748ba0324603ac14c3cb9b6ef5e_l3.svg "Rendered by QuickLaTeX.com") will be the longest odd-length palindrome whose center is at index . Likewise, ![s[i - evenP[i] + 1, i + evenP[i]]](/wp-content/ql-cache/quicklatex.com-5c72e5f834c72f5b4cb26e6e0a32dfec_l3.svg "Rendered by QuickLaTeX.com") will be the longest even-length palindrome, such that the last character on its left side is at index .

To begin with, we’ll explain the naive approach to solve this problem. After that, we’ll explain the optimization that Manacher’s algorithm makes over the naive approach.

3. Naive Approach

The naive approach is straightforward. From each index inside the string, it tries to expand both sides as long as possible.

Let’s take a look at the naive approach implementation:

algorithm NaivePalindromeSubstrings(s, n):
    // INPUT
    //    s = The string to process
    //    n = The length of string s
    // OUTPUT
    //    Returns the oddP and evenP arrays indicating the longest palindrome substrings centered at each index

    oddP <- new array of size n initialized to 0
    evenP <- new array of size n initialized to 0

    for i <- 0 to n - 1:
        // Odd length palindromes
        oddP[i] <- 0
        L <- i - oddP[i] - 1
        R <- i + oddP[i] + 1 
        while L >= 0 and R < n and s[L] = s[R]:
            oddP[i] <- oddP[i] + 1
            L <- L - 1
            R <- R + 1

        // Even length palindromes
        evenP[i] <- 0
        L <- i - evenP[i]
        R <- i + evenP[i] + 1 
        while L >= 0 and R < n and s[L] = s[R]:
            evenP[i] <- evenP[i] + 1
            L <- L - 1
            R <- R + 1

    return (oddP, evenP)

We iterate over all possible indices inside the string . We set oddP[i] for each index to zero. Also, we set and to check the indexes of the next range.

Next, we check whether the current range forms a palindrome. The idea is that if the range [L, R] forms a palindrome, and we move to the next range, which is [L-1, R+1] , we only need to check the characters at index L-1 and R+1 .

If the current range forms a palindrome, we increase oddP[i] by one and extend our range by one to the left and right sides.

After that, we perform a similar operation to calculate evenP array. The difference is that we pay attention to the initial values of and . The reason is that in even-length palindromes, is the index of the last character on the left side of the palindrome substring. The rest is similar to calculating the oddP array.

Ultimately, we return the calculated oddP and evenP arrays.

The complexity of the naive approach is , where is the length of the string .

4. Manacher’s Algorithm

Manacher’s algorithm calculates the oddP and evenP arrays similarly to the naive approach. However, if possible, it tries to use the already calculated values rather than checking all the possible ranges from scratch.

Let’s explain the general idea first. After that, we can discuss implementation and complexity.

4.1. General Idea

Let’s take the following example to explain the idea of odd-length palindromes better:

Manacher Odd Example

Suppose we’ve calculated all values of oddP from index 0 to 7, and we need to calculate oddP[8] .

Before calculating oddP[8] , we’ll keep the palindrome substring that ends as far to the right as possible so far. The reason for choosing the right-most range will be explained in section 4.4. In our example, that is the substring s[3, 9] . Hence, L=3 and R=9 .

We can follow the naive approach to calculate oddP[8] . However, we can use the already calculated values to reduce the number of comparisons.

Since the substring s[L, R] is a palindrome, we can calculate the mirror image of index based on the center of the range [L, R] . In this example, the center is index 6. Therefore, the mirror of i=8 is j=4 .

Since the left side mirrors the right side, we might say that oddP[i] = oddP[j] .

However, there are two cases to consider. The first case is similar to the given example. We can extend over the value of . Therefore, we can extend oddP[i] from 2 and move forward without checking the value of oddP[i]=1 . We’ll get oddP[8]=2 in the example.

The second case is given in the second example:

Manacher Odd Example 2-1

In this case, we can’t say that because oddP[j] corresponds to the range [0, 8] , which is outside the range with L=3 and R=9 .

Therefore, we can only be sure of the value of because it’s the maximum range contained in the range . After that, we can try to extend the range following the naive approach. In this example, the range can’t be extended anymore.

4.2. Algorithm for Odd-Length Palindromes

Now that we understand the general idea, we can check the implementation. Take a look at the implementation of Manacher’s algorithm for odd-length palindromes:

algorithm ManachersAlgorithmForOddLengthPalindromes(s, n):
    // INPUT
    //    s = The string to process
    //    n = The length of string s
    // OUTPUT
    //    Returns the oddP array for odd-length palindromes

    L <- 0
    R <- -1
    oddP <- new array of size n initialized to 0

    for i <- 0 to n - 1:
        if i < R:
            oddP[i] <- min(oddP[L + R - i], R - i)
        left <- i - oddP[i] - 1
        right <- i + oddP[i] + 1 
        while left >= 0 and right < n and s[left] = s[right]:
            oddP[i] <- oddP[i] + 1
            left <- left - 1
            right <- right + 1
        if i + oddP[i] > R:
            L <- i - oddP[i]
            R <- i + oddP[i]

    return oddP

Initially, we initialize with zero and with , indicating an empty range. Then, we iterate over the string from left to right.

For each index , we initialize oddP[i] to the default value of the naive approach, which is zero. Now, we try to use values that have already been calculated.

If is inside the range , we update the value of to become the value of its mirror index. However, we pay attention to the case when the mirror index result becomes outside the range [L, R] .

Therefore, we take the minimum between oddP for the mirror index and the maximum length we can take from the current palindrome range [L, R] .

Since L+R-i is the mirror index of , then it’s enough to check if R-i is smaller than oddP[L+R-i] . The reason is that R - i equals j - L , where is the mirror index of . Therefore, it’s enough to check whether any of them is smaller than oddP for the mirror index.

Once we assign the initial value to oddP[i] , we simply follow the naive approach and try to expand the range centered at index as far as possible. But, in this case, we start from .

After that, we check whether we need to update the values of and . Since we need to store the palindrome range that goes as far to the right as possible, we check whether the current range is further to the right than the stored range. If so, we update the value of and .

Finally, we return the calculated array of oddP .

4.3. Algorithm for Even-Length Palindromes

The algorithm for even-length palindromes has small modifications over the one for odd-length palindromes. Take a look at its implementation:

algorithm ManachersAlgorithmForEvenLengthPalindromes(s, n):
    // INPUT
    //    s = The string to process
    //    n = The length of string s
    // OUTPUT
    //    Returns the evenP array for even-length palindromes

    L <- 0
    R <- -1
    evenP <- new array of size n initialized to 0

    for i <- 0 to n - 1:
        if i < R:
            evenP[i] <- min(evenP[L + R - i - 1], R - i)
        left <- i - evenP[i]
        right <- i + evenP[i] + 1
        while left >= 0 and right < n and s[left] = s[right]:
            evenP[i] <- evenP[i] + 1
            left <- left - 1
            right <- right + 1 if i + evenP[i] > R:
            L <- i - evenP[i] + 1
            R <- i + evenP[i]

    return evenP

This algorithm is similar to the second algorithm. However, we change the initial values of left and right to fit with the next range to check. Even-length palindromes have two center indexes. In our implementation, is the left index between the two center indexes.

After that, we’ll try to extend the palindrome range as much as possible. We pay attention to start the comparison from the value of , rather than starting it always from zero.

Once we’re finished, we update the currently stored right-most range.

Finally, we return the calculated array of evenP .

4.4. Proof of Concept

First of all, let’s prove the optimality of always keeping in hand the right-most palindrome substring. Suppose we were to keep a range whose right side is smaller. However, it’s a larger range, and its beginning is further to the left than the stored range.

In this case, we’ll minimize the value of R - i that we consider when calculating either oddP[i] or evenP[i] . Hence, we might end up with a smaller initial value for oddP[i] or . As a result, we’ll end up performing more comparisons than we’re supposed to.

From that, we can conclude that keeping the right-most range is always optimal.

Now, let’s discuss the complexity.

4.5. Complexity

At first glance, we might think that complexity is similar to the naive approach. However, a closer inspection will show us that, every time, we start oddP[i] either from oddP[L + R - i] or from R - i .

If oddP[L + R - i] is smaller than R - i , then the range won’t be extended any more. The reason is that if the range were to extend, the value would’ve been larger because both sides of the range [L, R] are mirrors of each other.

Therefore, in this case, we won’t make any successful comparisons.

Otherwise, if R - i is smaller than oddP[L + R - i] , then we might make successful comparisons. However, in this case, we’ll be exploring a palindrome range that is farther to the right than the current range. Hence, every successful comparison will result in a later forward movement of to the right.

As a result, each successful comparison operation also results in a one-step movement for forward. Also, we can notice that never gets reduced.

Therefore, the inner while loop gets executed at most times. Hence, the complexity of Manacher’s algorithm is , where is the number of characters inside the string .

5. Applications

Let’s examine a few applications where Manacher’s algorithm can be used.

5.1. Longest Palindrome Substring

Suppose the problem gives us a string and asks us to calculate the longest palindrome substring inside the string .

This is a straightforward application for Manacher’s algorithm. We simply calculate both arrays oddP and evenP as shown in algorithms 2 and 3.

Since these arrays store the longest palindrome substring for each index, we just need to check the values $oddP[i] \times 2 + 1$ and $evenP[i] \times 2$ . From all these values, the answer is the maximum possible among them.

The time complexity of the described approach is , where is the length of string .

5.2. Number of Palindrome Substrings

Another application is to calculate the number of palindrome substrings inside a given string .

Similarly, we calculate both arrays oddP and evenP as shown in algorithms 2 and 3. After that, we iterate over all possible values of these two arrays.

The array oddP stores the length of each side of the longest palindrome substring. By removing the first and last character of each of these substrings, we get a new palindrome.

Hence, we should calculate:

$[sumOdd = \sum_{i=0}^{n - 1} oddP[i] + 1]$

Note that, we added one for each index indicating that every single character is a palindrome of its own.

In addition, we need to calculate:

$[sumEven = \sum_{i=0}^{n - 1} evenP[i]]$

Finally, we should return the value of .

The described solution can be implemented in time complexity, where is the length of string .

5.3. Number of Different Palindrome Substrings

This problem is similar to the previous one. However, instead of calculating the number of palindrome substrings in general, we’re asked to calculate the number of different palindrome substrings. Therefore, if two or more equal substrings are palindromes, the answer needs to be incremented only by one.

This problem is a little tricky and needs a good understanding of Manacher’s algorithm.

First of all, when we take the LPS value of the mirror of index , this value corresponds to the exact same substring as the one at index . We don’t need to calculate this value because it was already calculated when we were at the mirror index.

However, each extending operation might result in a new different substring. Note that the expansion operation is done at most times.

We can use a hash function, similar to the one used in the Rabin-Karp algorithm. The hashing function can give us the hash value of a certain substring in constant time complexity. Therefore, we can calculate the hash value of all the substrings resulting from expansion operations in linear time.

Finally, the answer is the number of the resulting hash values without repetition. To do this, we can insert all the hash values into a hash-set. Then, the answer will be the size of the hash-set because it adds the same value only once.

This approach can be efficiently implemented in time complexity, where is the length of the string .

6. Conclusion

In this tutorial, we explained the term palindrome and discussed palindrome substrings inside a given string.

First of all, we explained the naive approach.

After that, we explained Manacher’s algorithm with both its versions that get the odd-length and even-length palindromes.

Finally, we presented some problems that can be solved using Manacher’s algorithm in linear time complexity.

Persistence

REST

Security