1. Introduction
In this tutorial, we’ll present three ways to check the periodicity of a string. In other words, we check if it’s a repetition of one of its substrings.
2. Problem Statement
We have a string () and want to check if we can obtain it by concatenating one of its proper prefixes to itself multiple times. The substring of we repeatedly concatenate to get has to be its prefix. Otherwise, it wouldn’t be possible to form the whole string.
For example, if , then we get by concatenating two times to itself:
which we write as . Since isn’t a prefix of , we can’t get from it.
So, formally, the prefix we want to find is of the form where .
Let’s see how we can determine it.
3. The Brute-Force Approach
We can iterate over the prefixes of and check if concatenating them gives us . To test if concatenates to , we iterate over and check if for all . This way, we cyclicly iterate over :
If we find an for which the test fails, we can stop the iteration. Otherwise, we return since we’ve proven that for some integer .
Here’s the pseudocode:
algorithm BruteForceAlgorithm(s):
// INPUT
// s = the input string (lenght = n, 1-based indexing)
// OUTPUT
// If there is a proper prefix z = s[1 : m] (1 ≤ m < n) of s
// such that s = z^k for an integer k, returns z.
// Otherwise, returns an indication of failure.
for m <- 1 to n - 1:
// Test a new candidate
z <- s[1 : m]
i <- 1
while i <= n and s[((i - 1) mod m) + 1] = s[i]:
i <- i + 1
if i > n:
// The other condition was satisfied for all i ≤ n, so s = z + z +...+ z
return z
// If we've come this far, then no prefix of s can be repeated to obtain s
return failure
3.1. Complexity
This algorithm’s worst-case time complexity is . Here’s why. The worst-case input is a string of the form . If given such an input, the algorithm runs symbol equality tests for each of the outer loop. So, the total number of checks is .
The complexity would stay quadratic even if we started checks from in the inner loop, skipping the unnecessary checks of , , , :
The problem with this approach is that it tries to get even from the prefixes for which we can conclude in advance that they can’t form . The key is to observe that has to divide without remainder for to be a repetition of . For instance, and . Even without trying to get the latter from the former, we can say that it’s an impossible task because .
4. The Improved Algorithm
So, we start from in the outer loop as before but don’t test whether unless is a divisor of .
The iteration can stop at because no number greater than divides except , and has to be strictly lower than :
Here’s the pseudocode:
algorithm ImprovedAlgorithm(s):
// INPUT
// s = the input string (lenght = n, 1-based indexing)
// OUTPUT
// If there is a proper prefix z = s[1 : m] (1 ≤ m < n) of s
// such that s = z^k for an integer k, returns z.
// Otherwise, returns an indication of failure.
for m <- 1, 2, ..., floor(n/2):
if n mod m = 0:
// Test a new candidate
z <- s[1 : m]
i <- 1
while i ≤ n and s[((i-1) mod m) + 1] = s[i]:
i <- i + 1
if i > n:
// The other condition was satisfied for all i, so s = z + z +...+ z
return z
// If we've come this far, then no prefix of s can be repeated to obtain s
return failure
Another way would be to identify the divisors of in advance to skip unnecessary checks whether .
4.1. The Complexity
Let’s derive the worst-case complexity of this approach. The worst-case input is of the same form as in the previous algorithm (). The iterations of the inner loop still perform checks but since we enter the inner loops only if , the number of inner loops isn’t but is equal to , where is the number of different divisors of .
Since , the total complexity is , so the algorithm is definitely sub-quadratic. We can get a tighter bound if we use the result:
From there, we conclude that , so the algorithms’ complexity is:
That means that the algorithm’s worst-case performance is better than but worse than . Is there a way to solve the problem in the time?
5. The Rotation Algorithm
We can design a linear-time algorithm by using the theory of strings.
5.1. The Rotation Theorem
We’ll use the theorem which states that a string is a repetition of one of its substrings if and only if is a non-trivial rotation of itself. Let’s prove it.
If for some of length , then by removing the first symbols and appending them to the last occurrence of in , we get the same string.
To prove the theorem in the other direction, we assume that is a non-trivial rotation of itself. That means that there’s an such that we can remove the first symbols, append them to the rest of in the same order, and obtain the starting string, :
Since the LHS and RHS are equal, it must hold that , , , . So, we’ve proven that:
However, since the rotation of is equal to , we can rotate it again by symbols to get the same string and conclude that , , , , so . Following the approach, we get that .
5.2. The Algorithm
So, the problem is now to check whether the input string is a non-trivial rotation of itself. If that’s the case, will be a proper substring of (not starting at the st and the th positions). Therefore, we reduce the original problem to a string search. We try to find the index of in different from 1 and . If successful, we’ll show that is periodic, and the building block will be .
We can use an efficient string matching algorithm that uses the substring index of (constructible in time) and then look for a non-trivial occurrence of in time. Therefore, we can solve the original problem in the linear worst-case time.
6. Conclusion
In this article, we presented three algorithms to check string periodicity. The first and the second ones are easier to understand and implement than the third one. However, the rotation algorithm is more efficient. Its worst-case complexity is , whereas the other two have super-linear time complexities.