How to Extract the Last n Characters of a String in the Shell

1. Overview

String manipulation is an essential part of shell scripting. The GNU Coreutils package, along with tools such as sed and awk, provides several utilities that we can call from Bash to process text and manipulate strings.

In this tutorial, we’ll explore how to extract the last n characters of a string in Bash.

2. Sample Task

Let’s suppose we have data in a file named data.txt:

$ cat data.txt
A1B2C3RST
D4E5F6UVW
G7H8I9XYZ
CL

Here, cat shows us that data.txt contains lines of capital letters, most of them interleaved with digits.

Our objective is to extract the last three characters of each line. Importantly, if a line contains fewer than three characters, all of its characters should still be printed.

Let’s explore different methods for solving the task.

3. Using rev and cut

The rev command reverses the order of the characters in every line of a given file. Therefore, we can use rev in conjunction with the cut command to extract the last three characters of each line:

$ rev data.txt | cut -c 1-3 | rev
RST
UVW
XYZ
CL

The procedure consists of three steps:

rev reverses the order of the characters in each line
cut extracts the first three characters of each reversed line by using the -c option and the 1-3 range
rev runs again to restore the original order of the extracted characters

As a result, we get the last three characters in their proper order.
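
The same pipeline can be parameterized for an arbitrary n. The following is a minimal sketch rather than a definitive implementation: last_n_rev is a hypothetical helper name introduced only for illustration, and the sketch assumes that n is a positive integer:

```shell
#!/usr/bin/env bash
# last_n_rev is a hypothetical helper name used for illustration.
# Usage: last_n_rev N < file   (N is assumed to be a positive integer)
last_n_rev() {
    local n=$1
    # Reverse each line, keep the first n characters, then restore the order
    rev | cut -c "1-${n}" | rev
}

printf '%s\n' 'A1B2C3RST' 'CL' | last_n_rev 3
```

Since cut simply prints whatever characters exist within the requested range, lines shorter than n pass through unchanged.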

4. Using sed

Alternatively, we can use sed to extract the last three characters using a regular expression:

$ sed -E 's/.*(...)/\1/' data.txt
RST
UVW
XYZ
CL

In this case, we use the -E option with sed to enable extended regular expressions. The pattern matches zero or more characters (.*) followed by exactly three characters, each represented by a dot, which are captured in a group.

Finally, the entire pattern is replaced by the captured group, represented by \1. This way, we replace each line with its last three characters.

Notably, if a line consists of fewer than three characters, the pattern doesn’t match, and sed prints that line unmodified.
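
For a larger n, typing n dots quickly becomes unwieldy. One possible variation, sketched below, uses an interval expression such as .{3} in place of the dots. The helper name last_n_sed is hypothetical, and we assume that n is a positive integer:

```shell
#!/usr/bin/env bash
# last_n_sed is a hypothetical helper name used for illustration.
# Usage: last_n_sed N < file   (N is assumed to be a positive integer)
last_n_sed() {
    local n=$1
    # .{n} matches exactly n characters; the double quotes let the shell expand n
    sed -E "s/.*(.{${n}})/\1/"
}

printf '%s\n' 'D4E5F6UVW' 'CL' | last_n_sed 3
```

As with the original command, lines shorter than n don't match the pattern and are printed as they are.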

5. Using awk

Another approach is to use the built-in substr() function in awk to extract the required substring in each line:

$ awk '{ print substr($0, length($0)-2) }' data.txt
RST
UVW
XYZ
CL

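Each line of the input file is represented by $0. Since awk indexes strings from 1, the last three characters start at position length($0)-2, which we pass to substr() as the start index. Because we omit the length argument, substr() returns everything from the start index to the end of the string. For a line shorter than three characters, the start index falls below 1, and substr() returns the whole line.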

Therefore, we obtain the last three characters in each line.
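
To generalize the index arithmetic, the last n characters start at position length($0) - n + 1. The sketch below passes n into awk with the -v option. The helper name last_n_awk is hypothetical, and we assume that n is a positive integer:

```shell
#!/usr/bin/env bash
# last_n_awk is a hypothetical helper name used for illustration.
# Usage: last_n_awk N < file   (N is assumed to be a positive integer)
last_n_awk() {
    # -v assigns the shell value to the awk variable n before the program runs
    awk -v n="$1" '{ print substr($0, length($0) - n + 1) }'
}

printf '%s\n' 'G7H8I9XYZ' 'CL' | last_n_awk 3
```

With n set to 3, the start index reduces to length($0)-2, which is the expression used in the command above.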

6. Using tail

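We can also use the tail command with the -c option to output the last three bytes of a string. For single-byte characters, such as the ASCII letters and digits in data.txt, this is the same as the last three characters.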

However, in this case, we need to iterate over each line of the file before we use the tail command:

$ cat last_three_characters.sh
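#!/usr/bin/env bash
# Print the last three characters of each line in data.txt
# IFS= keeps any leading and trailing whitespace in each line
while IFS= read -r line; do
    # -n suppresses the trailing newline so that tail sees only the line itself
    echo -n "$line" | tail -c 3
    # end each result with a newline
    echo
done < data.txt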

In the last_three_characters.sh script above, we use a while loop to iterate over the lines of the data.txt file, which is provided as input via stdin.

Within the while loop, we perform a series of steps:

read each line of the file and save it in the line variable
echo the line variable using the -n option to avoid the trailing newline character at the end of each line
pipe the result to the tail -c 3 command to extract only the last three characters
echo a newline character so that each result ends up on a separate line

Notably, it’s common practice to use the -r option with read in the first step so that any backslash characters in the input are interpreted literally.

Next, we grant the script execute permissions via chmod:

$ chmod +x last_three_characters.sh

Finally, we run the script:

$ ./last_three_characters.sh
RST
UVW
XYZ
CL

Again, we see that we obtained the expected result.
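
When the input is a single string rather than a file, the same idea fits in a small function. This is only a sketch: last_n_tail is a hypothetical helper name, and we swap echo -n for printf '%s', which also writes the string without a trailing newline and doesn't treat a string such as -n as an option. Since tail -c counts bytes, the sketch assumes single-byte characters:

```shell
#!/usr/bin/env bash
# last_n_tail is a hypothetical helper name used for illustration.
# Usage: last_n_tail STRING N   (N is assumed to be a positive integer)
last_n_tail() {
    local str=$1 n=$2
    # printf '%s' writes the string without a trailing newline,
    # so tail -c counts only the bytes of the string itself
    printf '%s' "$str" | tail -c "$n"
    # end the result with a newline
    echo
}

last_n_tail 'A1B2C3RST' 3
last_n_tail 'CL' 3
```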

7. Using Python

Another approach for solving the task is to use the Python interpreter to extract the required substring from each line:

$ cat last_three_characters.sh
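#!/usr/bin/env bash
# Print the last three characters of each line in data.txt using Python
# IFS= keeps any leading and trailing whitespace in each line
while IFS= read -r line; do
    # input() reads the line from stdin without its trailing newline
    echo "$line" | python3 -c "print(input()[-3:])"
done < data.txt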

In the modified script, we still pass the data.txt file via stdin to a while loop that iterates over each line of the file. We also read each line into a variable using the read command. However, this time, we process the lines using Python.

To do so, we echo the line variable and pipe the result to the Python interpreter. The -c option with the python3 command enables us to specify commands to execute within a quoted string.

In particular, we use the input() function in Python to read the line from stdin. Then, we print its last three characters by using a slice with a negative index. The slice [-3:] starts at the third character from the end and continues to the very end of the string. If the string is shorter than three characters, the slice returns the whole string.

Next, we run the script:

$ ./last_three_characters.sh
RST
UVW
XYZ
CL

We see that we obtain the same result as before.
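
The loop above starts a new python3 process for every line, which can become slow for large files. As a possible variation, and not the only way to do it, we can let a single Python process read all lines from stdin. In this sketch, python_last_n is a hypothetical helper name, and we assume that n is a positive integer:

```shell
#!/usr/bin/env bash
# python_last_n is a hypothetical helper name used for illustration.
# Usage: python_last_n N < file   (N is assumed to be a positive integer)
python_last_n() {
    # A single python3 process handles every line; N arrives as sys.argv[1]
    python3 -c '
import sys

n = int(sys.argv[1])
for line in sys.stdin:
    # Strip the trailing newline before slicing off the last n characters
    print(line.rstrip("\n")[-n:])
' "$1"
}

printf '%s\n' 'A1B2C3RST' 'CL' | python_last_n 3
```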

8. Conclusion

In this article, we explored different methods for extracting the last n characters of a string. In particular, one method included using rev in conjunction with cut. Other methods included the use of sed and regex, the substr() function in awk, as well as the tail command, and the Python interpreter.
