1. Overview
In this tutorial, we’ll learn a few advanced techniques of using the sed editor to replace the nth occurrence of a string.
2. Gaps in the Naive Approach
Let’s say that we have a file named teams.txt containing the names of players in multiple teams:
$ cat teams.txt
Team-1: Alex, Bill, Reeta, Ted, Hector
Team-2: Sally, David, Alex, Linda, Peter
We’re told that one of the players, “Alex,” is listed twice in the file due to a typo, so we need to replace the second occurrence of the word “Alex” with the right player name, “Alexa”.
At first thought, we might be inclined to use the basic substitute function (s) directly with the numeric 2 flag, together with word boundary matching (\b, a GNU sed extension):
$ sed -e 's/\bAlex\b/Alexa/2' teams.txt
Team-1: Alex, Bill, Reeta, Ted, Hector
Team-2: Sally, David, Alex, Linda, Peter
We can see that the naive word substitution with the 2 flag didn’t change the second occurrence. That’s because sed reads its input one line at a time and runs the substitution on each line separately. So, the 2 flag targets the second match within a single line, and neither line contains “Alex” twice.
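To see that the numeric flag counts matches within one line, here is a small sketch on a hypothetical one-line input in which the name appears twice on the same line. It assumes GNU sed, since \b is a GNU extension, and the same_line and result variable names are only illustrative:

```shell
# Hypothetical input: "Alex" appears twice on the SAME line.
same_line='Alex, Bill, Alex'

# The 2 flag replaces the second match within that single line.
result=$(printf '%s\n' "$same_line" | sed 's/\bAlex\b/Alexa/2')
printf '%s\n' "$result"
```

Here, only the second “Alex” on the line becomes “Alexa”, which is the behavior we’d like to get across the whole file.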
In the following sections, we’ll learn a few advanced techniques for solving this use case. Further, we must note that although we can use the -i flag to do in-place modification of a file, we’ll avoid it in favor of showing the changes directly on stdout.
3. Using tr With sed
In this approach, we use the tr command to transform the entire input into a single line of data. That way, when we use the sed command to substitute with the 2 flag, occurrences are counted across the whole file, and the flag targets exactly the occurrence we want.
First, we need to find one character not present in the original file and replace all the newline characters with this character. In our case, there is no occurrence of the “@” character in the teams.txt file, so let’s go ahead and replace all the “\n” characters with the “@” character:
$ tr '\n' '@' < teams.txt
Team-1: Alex, Bill, Reeta, Ted, Hector@Team-2: Sally, David, Alex, Linda, Peter@
Next, let’s pipe this with the basic substitution function to replace the 2nd occurrence of the pattern:
$ tr '\n' '@' < teams.txt | sed 's/\bAlex\b/Alexa/2'
Team-1: Alex, Bill, Reeta, Ted, Hector@Team-2: Sally, David, Alexa, Linda, Peter@
Finally, let’s translate all the occurrences of the “@” character back to the “\n” character and see the complete command:
$ tr '\n' '@' < teams.txt | sed 's/\bAlex\b/Alexa/2' | tr '@' '\n'
Team-1: Alex, Bill, Reeta, Ted, Hector
Team-2: Sally, David, Alexa, Linda, Peter
It looks like we’ve got the correct result this time. However, finding a character with zero occurrences in a large file can be costly, and manipulating the newlines twice is an additional overhead.
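Because the result is only correct when the placeholder is absent from the input, it can help to check for it first. The following is a minimal sketch, not a definitive implementation: the replace_second_via_placeholder helper and the temporary working directory are hypothetical, the placeholder is assumed to be a single ordinary character with no special meaning to tr, and GNU sed is assumed for \b:

```shell
# Recreate the sample file from this tutorial in a temporary directory.
workdir=$(mktemp -d)
cat > "$workdir/teams.txt" <<'EOF'
Team-1: Alex, Bill, Reeta, Ted, Hector
Team-2: Sally, David, Alex, Linda, Peter
EOF

# Hypothetical helper: $1 = file, $2 = single placeholder character.
# It refuses to run if the placeholder already occurs in the file.
replace_second_via_placeholder() {
    if grep -qF -- "$2" "$1"; then
        echo "placeholder '$2' already occurs in $1" >&2
        return 1
    fi
    # Join the lines, substitute the second match, then restore the newlines.
    tr '\n' "$2" < "$1" | sed 's/\bAlex\b/Alexa/2' | tr "$2" '\n'
}

replace_second_via_placeholder "$workdir/teams.txt" '@'
```

The grep check costs an extra pass over the file, which is the overhead mentioned above, but it prevents an existing “@” from being silently turned into a newline.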
4. Brute Force Approach Using N Function
In this approach, our goal is to get all the lines from the file in the pattern space and then execute the substitute function (s) to replace a specific occurrence of the pattern.
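First, we use the N function inside a loop to append each remaining line to the pattern space one at a time. The $!b command branches back to the append_all_lines label on every line except the last one, so the substitution runs only once, when the whole file is in the pattern space. A single N without the loop would only ever join two lines at a time, which is enough for our two-line file but not for longer ones. So let’s go ahead and define the replace-nth-occurrence.sed script:
$ cat replace-nth-occurrence.sed
:append_all_lines
N
$!b append_all_lines
s/\bAlex\b/Alexa/2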
Next, let’s execute this script with sed using the -f option:
$ sed -f replace-nth-occurrence.sed teams.txt
Team-1: Alex, Bill, Reeta, Ted, Hector
Team-2: Sally, David, Alexa, Linda, Peter
Looks good! We’ve achieved the goal of replacing the 2nd occurrence of the pattern using sed without relying on any other utility.
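As a quick check that the accumulate-then-substitute idea also holds for files with more than two lines, here is a sketch on a hypothetical three-line file. The file contents, the replace_second_whole_file helper, and the all label are assumptions made for illustration, and GNU sed is assumed:

```shell
# Hypothetical three-line input; the second "Alex" is on the last line.
workdir=$(mktemp -d)
cat > "$workdir/three-teams.txt" <<'EOF'
Team-1: Alex, Bill, Reeta, Ted, Hector
Team-2: Sally, David, Maya, Linda, Peter
Team-3: Omar, Alex, Nina, Iris, Paul
EOF

# Hypothetical helper: $1 = file.
#   :all      marks the start of the loop
#   $!N       appends the next line unless we are already on the last one
#   $!b all   loops back until the last line has been appended
# Only then does the substitution run over the whole buffered file.
replace_second_whole_file() {
    sed -e ':all' -e '$!N' -e '$!b all' -e 's/\bAlex\b/Alexa/2' "$1"
}

replace_second_whole_file "$workdir/three-teams.txt"
```

Using $!N instead of a bare N also keeps the command well-defined for a one-line file, since it never asks for a line beyond the end of the input.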
5. Using N Function With Branching
We can reduce the memory footprint of the N-based approach by keeping no more lines in the pattern space than necessary. Since we don’t need to make any change after finding the nth occurrence, we can stop appending lines after the substitution. To do this, we can leverage flow control in sed, using conditional and unconditional branching with the t and b functions, respectively.
First, let’s define the attempt_second_substitution label in the replace-nth-occurrence-using-branching.sed script:
:attempt_second_substitution
s/\bAlex\b/Alexa/2
t noop_until_end
b append_next_line
Next, let’s define the append_next_line label in the script:
:append_next_line
N
b attempt_second_substitution
We can see that after appending the next line, we do an unconditional branching to the attempt_second_substitution label, thereby creating a loop that breaks when the substitution happens. Further, once the substitution occurs, we don’t need to make any changes to the remaining lines of the file. So, let’s go ahead and define the noop_until_end label in the script:
:noop_until_end
n
b noop_until_end
We can see that we’re using the n function to slide through the remaining lines as they are, without doing any string manipulation. Moreover, we’re using the unconditional branching function (b) to stay in the loop until the end of the file.
Moving on, let’s see the entire replace-nth-occurrence-using-branching.sed script:
$ cat replace-nth-occurrence-using-branching.sed
:attempt_second_substitution
s/\bAlex\b/Alexa/2
t noop_until_end
b append_next_line
:append_next_line
N
b attempt_second_substitution
:noop_until_end
n
b noop_until_end
Finally, let’s test our script by putting it into action:
$ sed -f replace-nth-occurrence-using-branching.sed teams.txt
Team-1: Alex, Bill, Reeta, Ted, Hector
Team-2: Sally, David, Alexa, Linda, Peter
Perfect! It works as expected.
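The same branching logic can be wrapped in a shell function so that the word, the replacement, and the occurrence number become parameters. This is only a sketch under stated assumptions: replace_nth and the attempt and rest labels are hypothetical names, the word and replacement are assumed to contain no regex metacharacters or slashes, and GNU sed is assumed for \b and for printing the pattern space when N or n runs out of input:

```shell
# Recreate the sample file from this tutorial in a temporary directory.
workdir=$(mktemp -d)
cat > "$workdir/teams.txt" <<'EOF'
Team-1: Alex, Bill, Reeta, Ted, Hector
Team-2: Sally, David, Alex, Linda, Peter
EOF

# Hypothetical helper: replace_nth WORD REPLACEMENT N FILE
replace_nth() {
    # :attempt  try to replace the Nth match in the lines buffered so far
    # t rest    on success, stop buffering and pass the rest through
    # N         otherwise append the next line and try again
    # :rest     print the current pattern space, read the next line, repeat
    sed -e ':attempt' \
        -e "s/\\b$1\\b/$2/$3" \
        -e 't rest' \
        -e 'N' \
        -e 'b attempt' \
        -e ':rest' \
        -e 'n' \
        -e 'b rest' \
        "$4"
}

replace_nth Alex Alexa 2 "$workdir/teams.txt"
```

Each -e expression is treated as a separate script line, which keeps the labels on lines of their own just like in the script file.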
6. Memory-Optimized Approach
In this section, we’ll learn an even better approach by further limiting the lines in the pattern space starting from the first occurrence and ending at the second occurrence.
Let’s start by extending the replace-nth-occurrence-using-branching.sed script as replace-nth-occurrence-using-branching-optimized.sed, by adding the attempt_first_substitution label:
:attempt_first_substitution
s/\bAlex\b/Alex/1
t append_next_line
b noop_until_first_occurrence
We can see that the substitution uses the same search and replacement string, so it acts as a checkpoint without actually modifying the text. If the first occurrence hasn’t been found yet, we continue searching through the file. Otherwise, we start appending lines to the pattern space.
Next, let’s go ahead and define the noop_until_first_occurrence label in the script:
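The checkpoint idea can be tried in isolation. In this sketch, & in the replacement stands for the matched text, so the substitution leaves the line as it is but still sets the flag that t tests. The lines_with_alex helper, the found label, and the sample input are hypothetical, and GNU sed is assumed for \b:

```shell
# Hypothetical helper: print only the lines on which the checkpoint
# substitution succeeds (reads from standard input).
lines_with_alex() {
    # s/.../&/  replaces the match with itself, changing nothing
    # t found   jumps to :found only if that substitution succeeded
    # b         otherwise ends the cycle; with -n nothing is printed
    sed -n -e 's/\bAlex\b/&/' -e 't found' -e 'b' -e ':found' -e 'p'
}

printf '%s\n' 'Team-0: Mona, Raj' 'Team-1: Alex, Bill' | lines_with_alex
```

Only the line containing the whole word “Alex” is printed, and its text is unchanged, which is what makes the substitution usable as a pure test.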
:noop_until_first_occurrence
n
b attempt_first_substitution
As a result, the optimized version of our script is ready. Let’s take a look at the complete script:
:attempt_first_substitution
s/\bAlex\b/Alex/1
t append_next_line
b noop_until_first_occurrence
:noop_until_first_occurrence
n
b attempt_first_substitution
:attempt_second_substitution
s/\bAlex\b/Alexa/2
t noop_until_end
b append_next_line
:append_next_line
N
b attempt_second_substitution
:noop_until_end
n
b noop_until_end
Finally, let’s run the script and see it in action:
$ sed -f replace-nth-occurrence-using-branching-optimized.sed teams.txt
Team-1: Alex, Bill, Reeta, Ted, Hector
Team-2: Sally, David, Alexa, Linda, Peter
We can see that the script is working as expected.
7. Conclusion
In this article, we learned how to replace the nth occurrence of a pattern by using a few advanced concepts in sed. Additionally, we focused on how we can optimize our approach to minimize the memory footprint of our sed script.