1. Introduction
In this tutorial, we’ll present three ways to check the periodicity of a string. In other words, we check if it’s a repetition of one of its substrings.
2. Problem Statement
We have a string (
) and want to check if we can obtain it by concatenating one of its proper prefixes to itself multiple times. The substring of
we repeatedly concatenate to get
has to be its prefix. Otherwise, it wouldn’t be possible to form the whole string.
For example, if , then we get
by concatenating
two times to itself:
which we write as . Since
isn’t a prefix of
, we can’t get
from it.
So, formally, the prefix we want to find is of the form where
.
Let’s see how we can determine it.
3. The Brute-Force Approach
We can iterate over the prefixes of and check if concatenating them gives us
. To test if
concatenates to
, we iterate over
and check if
for all
. This way, we cyclicly iterate over
:
If we find an for which the test fails, we can stop the iteration. Otherwise, we return
since we’ve proven that
for some integer
.
Here’s the pseudocode:
algorithm BruteForceAlgorithm(s):
// INPUT
// s = the input string (lenght = n, 1-based indexing)
// OUTPUT
// If there is a proper prefix z = s[1 : m] (1 ≤ m < n) of s
// such that s = z^k for an integer k, returns z.
// Otherwise, returns an indication of failure.
for m <- 1 to n - 1:
// Test a new candidate
z <- s[1 : m]
i <- 1
while i <= n and s[((i - 1) mod m) + 1] = s[i]:
i <- i + 1
if i > n:
// The other condition was satisfied for all i ≤ n, so s = z + z +...+ z
return z
// If we've come this far, then no prefix of s can be repeated to obtain s
return failure
3.1. Complexity
This algorithm’s worst-case time complexity is . Here’s why. The worst-case input is a string of the form
. If given such an input, the algorithm runs
symbol equality tests for each
of the outer loop. So, the total number of checks is
.
The complexity would stay quadratic even if we started checks from in the inner loop, skipping the unnecessary checks of
,
,
,
:
The problem with this approach is that it tries to get even from the prefixes for which we can conclude in advance that they can’t form
. The key is to observe that
has to divide
without remainder for
to be a repetition of
. For instance,
and
. Even without trying to get the latter from the former, we can say that it’s an impossible task because
.
4. The Improved Algorithm
So, we start from in the outer loop as before but don’t test whether
unless
is a divisor of
.
The iteration can stop at because no number greater than
divides
except
, and
has to be strictly lower than
:
Here’s the pseudocode:
algorithm ImprovedAlgorithm(s):
// INPUT
// s = the input string (lenght = n, 1-based indexing)
// OUTPUT
// If there is a proper prefix z = s[1 : m] (1 ≤ m < n) of s
// such that s = z^k for an integer k, returns z.
// Otherwise, returns an indication of failure.
for m <- 1, 2, ..., floor(n/2):
if n mod m = 0:
// Test a new candidate
z <- s[1 : m]
i <- 1
while i ≤ n and s[((i-1) mod m) + 1] = s[i]:
i <- i + 1
if i > n:
// The other condition was satisfied for all i, so s = z + z +...+ z
return z
// If we've come this far, then no prefix of s can be repeated to obtain s
return failure
Another way would be to identify the divisors of in advance to skip unnecessary checks whether
.
4.1. The Complexity
Let’s derive the worst-case complexity of this approach. The worst-case input is of the same form as in the previous algorithm (). The iterations of the inner loop still perform
checks but since we enter the inner loops only if
, the number of inner loops isn’t
but is equal to
, where
is the number of different divisors of
.
Since , the total complexity is
, so the algorithm is definitely sub-quadratic. We can get a tighter bound if we use the result:
From there, we conclude that , so the algorithms’ complexity is:
That means that the algorithm’s worst-case performance is better than but worse than
. Is there a way to solve the problem in the
time?
5. The Rotation Algorithm
We can design a linear-time algorithm by using the theory of strings.
5.1. The Rotation Theorem
We’ll use the theorem which states that a string is a repetition of one of its substrings if and only if
is a non-trivial rotation of itself. Let’s prove it.
If for some
of length
, then by removing the first
symbols and appending them to the last occurrence of
in
, we get the same string.
To prove the theorem in the other direction, we assume that is a non-trivial rotation of itself. That means that there’s an
such that we can remove the first
symbols, append them to the rest of
in the same order, and obtain the starting string,
:
Since the LHS and RHS are equal, it must hold that ,
,
,
. So, we’ve proven that:
However, since the rotation of is equal to
, we can rotate it again by
symbols to get the same string and conclude that
,
,
,
, so
. Following the approach, we get that
.
5.2. The Algorithm
So, the problem is now to check whether the input string is a non-trivial rotation of itself. If that’s the case,
will be a proper substring of
(not starting at the
st and the
th positions). Therefore, we reduce the original problem to a string search. We try to find the index
of
in
different from 1 and
. If successful, we’ll show that
is periodic, and the building block
will be
.
We can use an efficient string matching algorithm that uses the substring index of (constructible in
time) and then look for a non-trivial occurrence of
in
time. Therefore, we can solve the original problem in the linear worst-case time.
6. Conclusion
In this article, we presented three algorithms to check string periodicity. The first and the second ones are easier to understand and implement than the third one. However, the rotation algorithm is more efficient. Its worst-case complexity is , whereas the other two have super-linear time complexities.