How to Find the Position of a Character in Text

1. Introduction

In this tutorial, we’ll discuss various ways to find the position of a character in a text in Linux.

We’ll do this with the grep and awk commands, reviewing several scenarios for each and illustrating them with examples.

2. Using the grep Command

The grep command primarily locates a certain text or pattern, but we can also use it to locate any character’s position.

2.1. Finding Character Position

Let’s find the position of the character d in the string baeldung:

$ echo "baeldung" | grep -ob "d"
4:d

Now, let’s understand the command:

echo “baeldung”: The echo command writes the text baeldung to standard output
|: The pipe operator sends the output of the command on its left to the input of the command on its right
grep: The grep command searches its input for a pattern, here the character d
-o: The -o option prints only the matched part (d), on a separate line for each occurrence
-b: The -b option prefixes each match with its 0-based byte offset within the input

The output shows the 0-based position of the character, 4, followed by the character itself (d). Since -b counts bytes, this equals the character index only for single-byte text such as ASCII.
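A minimal sketch, assuming GNU grep, shows what happens with a multibyte character. The é below takes two bytes in UTF-8 and is written as the octal escapes \303\251, so the result doesn't depend on how the script file itself is encoded:

```shell
#!/usr/bin/env bash
# "é d" in UTF-8: é occupies bytes 0-1, the space is byte 2, and d is byte 3.
printf '\303\251 d\n' | grep -ob 'd'   # prints 3:d, although d is character index 2
```

Here d is the third character (0-based index 2), but grep reports byte offset 3.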

Now let’s see how the grep command handles multiple occurrences of a character in the text:

$ echo "This is a sample text" | grep -ob "a"
8:a
11:a

As we can see, the grep command displays a list of 0-based indices of all the occurrences of the selected character.

If we only want the positions themselves, we can extract the numbers with a second grep:

$ echo "This is a sample text" | grep -ob "a" | grep -oE "[0-9]+"
8
11

In the last part of the pipeline, grep -oE “[0-9]+”, the -E option enables extended regular expressions, and [0-9]+ matches a run of one or more consecutive digits, so multi-digit offsets such as 11 are kept whole. The -o option again prints only the matched digits.
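Extracting every run of digits works here, but it would also pick up the matched character itself if we searched for a digit. A minimal sketch of a more robust variant, using a hypothetical helper named positions, keeps only the first colon-separated field with cut and adds -F so that characters such as . are matched literally:

```shell
#!/usr/bin/env bash
# positions STRING CHAR: print the 0-based byte offset of every occurrence of
# CHAR in STRING, one per line (hypothetical helper name).
positions() {
    # printf is safer than echo for arbitrary text (e.g. a leading -n).
    # -F matches CHAR literally, -- ends option parsing in case CHAR is "-",
    # and cut keeps only the offset field before the first colon.
    printf '%s\n' "$1" | grep -obF -- "$2" | cut -d: -f1
}

positions "This is a sample text" a   # 8 and 11
positions "a1b1" 1                     # 1 and 3
positions "a.b" .                      # 1 (without -F, . would match every character)
```

With grep -oE "[0-9]+" instead of cut, the a1b1 call would print 1, 1, 3, 1, because the matched 1s are digits too.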

2.2. Finding the First Occurrence

We can pipe the positions into the head command to display only the first occurrence:

$ echo "This is a sample text" | grep -ob "a" | grep -oE "[0-9]+" | head -n 1
8

Here, head -n 1 keeps only the first line of the previous output.

More generally, head -n k displays the positions of the first k occurrences, where k is the number of occurrences we want.
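To get only the k-th occurrence, we can select a single line from the list of offsets. In this sketch, the helper nth_position is hypothetical, and it uses sed -n "Np" rather than head -n N | tail -n 1, because the latter silently returns the last available offset when there are fewer than N matches:

```shell
#!/usr/bin/env bash
# nth_position STRING CHAR N: print the 0-based byte offset of the N-th
# occurrence of CHAR in STRING (hypothetical helper name). Prints nothing and
# returns 1 if there are fewer than N occurrences.
nth_position() {
    local pos
    # sed -n "Np" prints line N only, and nothing if that line doesn't exist.
    pos=$(printf '%s\n' "$1" | grep -obF -- "$2" | cut -d: -f1 | sed -n "${3}p")
    [ -n "$pos" ] || return 1
    printf '%s\n' "$pos"
}

nth_position "This is a sample text" a 2   # 11
```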

2.3. Finding the Last Occurrence

Similarly, we can use the tail command to display the last occurrence of a character from the string:

$ echo "This is a sample text" | grep -ob "a" | grep -oE "[0-9]+" | tail -n 1
11

Here, tail -n 1 keeps only the last line of the previous output.
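As a cross-check that avoids grep altogether, shell parameter expansion can compute the same indices: removing everything from an occurrence onward leaves a prefix whose length is that occurrence's 0-based index. This is a swapped-in technique rather than the grep approach above, and the helper names first_index and last_index are hypothetical. In a UTF-8 locale, Bash counts characters rather than bytes, so for non-ASCII text the result can differ from grep -b:

```shell
#!/usr/bin/env bash
# first_index / last_index STRING CHAR: print the 0-based index of the first
# or last occurrence of CHAR in STRING (hypothetical helper names).
first_index() {
    local s c prefix
    s=$1 c=$2
    prefix=${s%%"$c"*}                  # drop the longest suffix starting with CHAR
    [ "$prefix" != "$s" ] || return 1   # nothing removed: CHAR not found
    printf '%s\n' "${#prefix}"
}

last_index() {
    local s c prefix
    s=$1 c=$2
    prefix=${s%"$c"*}                   # drop the shortest suffix starting with CHAR
    [ "$prefix" != "$s" ] || return 1
    printf '%s\n' "${#prefix}"
}

first_index "This is a sample text" a   # 8
last_index "This is a sample text" a    # 11
```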

2.4. Finding the Positions of Multiple Characters

We can also use the grep command to locate the positions of several characters at once:

$ echo "This is a sample text" | grep -ob -e "i" -e "a"
2:i
5:i
8:a
11:a

Each -e option adds a pattern, so here grep searches for both i and a. The matches are listed in the order they appear in the text, not grouped by character.
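For single characters, a bracket expression achieves the same with one pattern. This sketch also pipes the result through sort to group the matches by character: -t: splits on the colon, -k2,2 sorts by the character, and -k1,1n then orders by offset numerically:

```shell
#!/usr/bin/env bash
# [ia] matches either i or a, so no -e options are needed.
echo "This is a sample text" | grep -ob '[ia]'
# Group by character (field 2), then order by offset (field 1, numeric).
echo "This is a sample text" | grep -ob '[ia]' | sort -t: -k2,2 -k1,1n
```

The first command prints 2:i, 5:i, 8:a, 11:a, and the second prints 8:a, 11:a, 2:i, 5:i.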

2.5. Finding a Character’s Position in a Text File

The grep command can also search files directly, regardless of their length. In that case, -b reports byte offsets from the start of the file rather than from the start of each line.

Let’s suppose we have a text file named sample.txt that contains the same text as our earlier examples. Now let’s find the position of the character a:

$ grep -ob "a" sample.txt | grep -oE '[0-9]+'
8
11

The results match the previous output, but this time we’ve passed the file name to grep instead of piping in the text.
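To see how offsets behave in a multi-line file, here is a minimal sketch assuming GNU grep and a hypothetical two-line temporary file. Adding -n prefixes each match with its line number, so each output line reads line:offset:character:

```shell
#!/usr/bin/env bash
tmp=$(mktemp)
printf 'abc\nxay\n' > "$tmp"   # hypothetical two-line sample file
grep -nob 'a' "$tmp"           # prints 1:0:a and 2:5:a
rm -f "$tmp"
```

The a on the second line is reported at offset 5: four bytes for abc and its newline, plus one for x. The offset is therefore measured from the start of the file, not from the start of the line.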

We can also search several files at once. For example, let’s find the positions of the character a in all files with the .txt extension in the current directory:

$ grep -ob "a" *.txt
file1.txt:8:a
file1.txt:11:a
file2.txt:9:a
file2.txt:12:a
file2.txt:23:a

When given more than one file, grep prefixes each match with the name of the file it was found in, followed by the byte offset and the character.
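grep shows file names automatically only when it searches more than one file. A minimal sketch with two hypothetical sample files shows how -H forces the name even for a single file and -h suppresses it:

```shell
#!/usr/bin/env bash
# Create two hypothetical sample files in a scratch directory.
dir=$(mktemp -d)
printf 'banana\n' > "$dir/one.txt"
printf 'cat\n' > "$dir/two.txt"

(
    cd "$dir" || exit 1
    grep -ob 'a' *.txt      # one.txt:1:a, one.txt:3:a, one.txt:5:a, two.txt:1:a
    grep -obH 'a' one.txt   # -H keeps the name: one.txt:1:a, one.txt:3:a, one.txt:5:a
    grep -obh 'a' *.txt     # -h drops the names: 1:a, 3:a, 5:a, 1:a
)
rm -rf "$dir"
```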

3. Using the awk Command

The awk command is primarily used for text processing, but it can also locate a character’s position in a string.

3.1. Finding Character Position

Let’s find the character position using the same text as earlier:

$ echo "This is a sample text" | awk '{print index($0, "a")}' 
9

Here, index($0, "a") returns the position of the first occurrence of a in the current line ($0), or 0 if the line doesn’t contain it, and print writes that value to the terminal. Unlike grep, awk counts positions from 1, so the result is 9 rather than 8.
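If we want awk's result to line up with grep's 0-based offsets, we can subtract 1. In this sketch, first_pos_awk is a hypothetical helper that passes the character in with -v rather than embedding it in the program. Since index() returns 0 when there is no match, a missing character comes out as -1. One caveat: -v processes backslash escapes, so a literal backslash would need to be doubled:

```shell
#!/usr/bin/env bash
# first_pos_awk CHAR: for each line on stdin, print the 0-based position of
# the first CHAR, or -1 if the line doesn't contain it (hypothetical helper name).
first_pos_awk() {
    # index() is 1-based and returns 0 on no match, so subtracting 1 gives
    # a 0-based position, or -1 when CHAR is absent.
    awk -v ch="$1" '{ print index($0, ch) - 1 }'
}

echo "This is a sample text" | first_pos_awk a   # 8
echo "xyz" | first_pos_awk a                     # -1
```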

3.2. Finding Characters Position From a Text File

We can also combine awk with cat to find a character’s position in a text file.

Here we again use the sample.txt file, which we used earlier for the grep case:

$ cat sample.txt | awk '{print index($0, "a")}'
9

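We’ve placed the same text in sample.txt, so the result is the same: cat prints the file’s contents, and awk prints the position of the first a. Because awk runs its program once per line, a multi-line file would produce one number per line, with 0 for each line that has no a.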
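awk can also open the file itself, so cat isn't strictly needed. The sketch below, using a hypothetical three-line temporary file, prints the line number (NR) alongside the position and skips lines without an a:

```shell
#!/usr/bin/env bash
tmp=$(mktemp)
printf 'abc\nxyz\nxay\n' > "$tmp"   # hypothetical sample file
# NR is the current line number; lines with no "a" (index() == 0) are skipped.
awk '{ p = index($0, "a"); if (p > 0) print NR ":" p }' "$tmp"   # 1:1 and 3:2
rm -f "$tmp"
```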

3.3. Finding All Occurrences

We can write a Bash script that displays the positions of all occurrences of a character in a text. We use Bash here since it’s the default shell on most Linux distributions.

We can create the script with any text editor, such as vi:

$ vi test.sh

The .sh extension is a common convention for shell scripts, although Linux doesn’t require it.

Next, let’s write a Bash script and then save it:

#!/bin/bash
# Print the 1-based position of every "a" in the text
echo "This is a sample text" | awk '{
    for(i=1; i<=length($0); i++){
        if(substr($0, i, 1) == "a"){
            print i;
        }
    }
}'

Now let’s focus on the awk part:

for(i=1; i<=length($0); i++): This for loop visits every position from the first character of the line to the last
if(substr($0, i, 1) == "a"): substr() extracts the single character at position i, and the condition checks whether it is a
print i: This prints the position of each occurrence of a
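Testing every character works, but index() can also jump straight from one occurrence to the next. The sketch below uses a hypothetical helper, all_positions_awk, as an alternative to the loop above rather than the article's method, and assumes a non-empty search string. After each match, it searches the rest of the line and adds back the length already skipped. The same code also accepts a multi-character substring, in which case it reports overlapping matches as well:

```shell
#!/usr/bin/env bash
# all_positions_awk SUB: for each line on stdin, print the 1-based position
# of every occurrence of SUB (hypothetical helper name; SUB must be non-empty).
all_positions_awk() {
    awk -v target="$1" '{
        offset = 0      # characters of the line already skipped
        rest = $0       # part of the line still to be searched
        while ((p = index(rest, target)) > 0) {
            print offset + p            # position within the whole line
            offset += p                 # skip up to and including the match start
            rest = substr(rest, p + 1)
        }
    }'
}

echo "This is a sample text" | all_positions_awk a   # 9 and 12
echo "aaa" | all_positions_awk aa                    # 1 and 2 (overlapping)
```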

We’ll get the following output when we execute this Bash script:

$ bash test.sh
9
12

This time, the output lists the positions of all occurrences of the character a.

4. Conclusion

In this article, we learned how to find the position of a character within a text.

Both grep and awk offer effective ways to find a character’s position, and both handle text files well. One main difference is that grep -b reports 0-based byte offsets, while awk’s index() and substr() count positions from 1.

Additionally, grep is often more convenient: its options are simple, and listing every occurrence takes a single command, whereas with awk we have to write a loop, which can be cumbersome for beginners.
