1. Overview
Sometimes, when editing a text file, we may need to display non-printable control characters. For example, this can be useful if we want to remove unexpected symbols from the file.
In this tutorial, we’ll learn multiple ways of finding control characters in a file using command-line tools in Linux. First, we’ll look at how to filter the lines that contain a specific control character. Next, we’ll explore the tools to display full file content while showing the control characters.
2. Creating an Example File With Control Characters
To begin with, let’s create a test file ~/test_file and add the following content to it:
$ echo -n $'tab \t character\nbackspace character \b\nno special character\n' > ~/test_file
Here, we added three lines of text that include non-printable control characters in them:
- \t is the HORIZONTAL TAB character
- \b is the BACKSPACE character
- \n is the LINE FEED character
In the above command, we used the -n option so that echo doesn’t append an extra newline, and the $'...' construct, which converts the ANSI C backslash notation into the actual control characters.
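Some minimal shells may not implement the $'...' quoting. In that case, printf offers a similar effect, since it interprets backslash escapes in its format string. The following is a minimal sketch under that assumption. The scratch file created by mktemp and the variable names are illustrative stand-ins, used so that ~/test_file isn't overwritten:

```shell
# Hypothetical scratch file, used here instead of ~/test_file
sample_file=$(mktemp)

# printf expands \t, \b, and \n in the format string itself
printf 'tab \t character\nbackspace character \b\nno special character\n' > "$sample_file"

# Count the lines to confirm that all three were written
line_count=$(wc -l < "$sample_file")
echo "lines written: $line_count"
```

Since printf never appends a newline on its own, no equivalent of the -n option is needed here.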
Now, we should be able to see the contents of the ~/test_file file:
$ cat -A ~/test_file
tab ^I character$
backspace character ^H$
no special character$
Here, option -A shows all non-printing characters, while the symbols ^I, $, and ^H represent the special characters \t, \n, and \b respectively.
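As a side note, in GNU cat the -A option is shorthand for -vET, so each part of the notation can be toggled separately: -v for the caret notation of control characters, -E for the $ at line ends, and -T for showing tabs as ^I. Here is a small sketch, assuming GNU coreutils and a scratch file in place of ~/test_file (the variable names are illustrative):

```shell
# Hypothetical scratch file with two of the sample lines
sample_file=$(mktemp)
printf 'tab \t character\nbackspace character \b\n' > "$sample_file"

# Show only tabs as ^I and leave everything else untouched
tabs_only=$(cat -T "$sample_file" | head -n 1)
echo "$tabs_only"

# The three separate flags combined should behave like -A
all_flags=$(cat -vET "$sample_file")
echo "$all_flags"
```

This can be handy when only one kind of marker is of interest, for example, when checking a file for stray tabs.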
3. Search for Specific Control Character
If we need to search for a specific control character in a file, we can use command-line text-searching tools, such as grep or sed.
Let’s look at them in more detail.
3.1. Using grep
grep is a powerful tool for filtering the lines of a file that contain a specific sequence. Thus, we can use it to search for lines that contain a specific control character.
As grep’s default pattern syntax doesn’t translate escapes such as \t into control characters (in GNU grep, \b even means a word boundary), we use the shell’s $'...' construct to pass the actual characters.
Let’s see the template of the grep command:
grep $'<CONTROL CHARACTER>' <FILE_NAME>
Now, we can search for the lines that contain the horizontal tab character in the ~/test_file file:
$ grep $'\t' ~/test_file
tab character
As we can see, the line with the horizontal tab character has been found.
Similarly, we can search for the lines with the backspace character:
$ grep $'\b' ~/test_file
backspace character
As expected, the output shows the line of the ~/test_file file that includes the backspace.
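The same searches can be combined with two standard grep options: -c to count the matching lines and -n to show their line numbers. In the sketch below, the control characters are produced with printf rather than $'...', which is just an alternative way of obtaining them. The variable names and the scratch file are illustrative:

```shell
# Hypothetical scratch file with the same content as ~/test_file
sample_file=$(mktemp)
printf 'tab \t character\nbackspace character \b\nno special character\n' > "$sample_file"

# Store the raw control characters in variables
tab_char=$(printf '\t')
backspace_char=$(printf '\b')

# -c reports how many lines contain a tab
tab_line_count=$(grep -c "$tab_char" "$sample_file")

# -n prefixes each match with its line number; cut keeps only that number
backspace_line=$(grep -n "$backspace_char" "$sample_file" | cut -d: -f1)

echo "lines with a tab: $tab_line_count"
echo "backspace is on line: $backspace_line"
```

Keeping the characters in variables also makes the commands easier to read when the same character is searched for several times.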
3.2. Using sed
We can also use the sed command to find lines with a specific control character.
For example, let’s find the line with the tab in it:
$ sed -n /$'\t'/p ~/test_file
tab character
As we can see, the command has printed the line that contains the horizontal tab control character.
Here, the -n option suppresses sed’s automatic printing of every line of ~/test_file, while the /$'\t'/p expression prints only the lines that match the tab character.
Moreover, we can make sed print the lines in the format that shows all control characters in it.
For that, we replace the p letter with the l letter in the sed command:
$ sed -n /$'\t'/l ~/test_file
tab \t character$
Now, the line includes all control characters, such as the \t character, and the $ character, representing the line feed.
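With GNU sed specifically, the \t escape is also understood inside the regular expression, so the whole script can sit in ordinary single quotes without the shell’s $'...' construct. A minimal sketch, assuming GNU sed (other sed implementations may treat \t differently) and a scratch file in place of ~/test_file:

```shell
# Hypothetical scratch file with the same content as ~/test_file
sample_file=$(mktemp)
printf 'tab \t character\nbackspace character \b\nno special character\n' > "$sample_file"

# GNU sed (assumed here) interprets \t inside the regex as a tab;
# the l command prints the matching line in its unambiguous form
matched=$(sed -n '/\t/l' "$sample_file")
echo "$matched"
```

Quoting the entire script this way also protects it from word splitting by the shell.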
4. Using Regular Expressions to Find All Control Characters
We can also use other text-searching tools to find all lines that contain non-printable control characters.
For that, we use the regular expression [^ -~]. The range from the space character to the tilde covers all printable ASCII characters. The leading ^ inside the brackets negates the set, so the expression matches any single character outside that printable range, which includes the control characters.
Before we apply this regex pattern, we switch to the C locale so that the range is evaluated in plain ASCII order (note that LC_ALL, if set, takes precedence over LANG):
$ export LANG=C
Now, we are ready to use the text-searching tool with the regex to find all lines that contain control characters. Let’s use the sed command for that:
$ sed -n '/[^ -~]/l' ~/test_file
tab \t character$
backspace character \b$
As expected, the lines with the horizontal tab and the backspace character have been printed.
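The same bracket expression also works with grep. As a sketch, the locale can be scoped to a single command by prefixing it with LC_ALL=C instead of exporting a variable for the whole session. The POSIX class [[:cntrl:]] is shown next to it as a possible alternative. The scratch file and variable names below are illustrative:

```shell
# Hypothetical scratch file with the same content as ~/test_file
sample_file=$(mktemp)
printf 'tab \t character\nbackspace character \b\nno special character\n' > "$sample_file"

# Apply the C locale to this one command only
range_matches=$(LC_ALL=C grep -c '[^ -~]' "$sample_file")

# POSIX character class that matches control characters
class_matches=$(LC_ALL=C grep -c '[[:cntrl:]]' "$sample_file")

echo "lines outside the printable range: $range_matches"
echo "lines with a control character: $class_matches"
```

The two patterns agree on this file, but they aren’t interchangeable in general: [^ -~] also flags any byte above the tilde, such as non-ASCII bytes, while [[:cntrl:]] is limited to control characters.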
5. Use less to Display All Control Characters
In case we want to display the whole file while showing control characters in it, we can use the less utility:
$ less --underline-special --UNDERLINE-SPECIAL ~/test_file
tab ^I character
backspace character ^H
no special character
/test_file (END)
As we can see, less now displays ~/test_file with its control characters visible: ^I stands for the tab and ^H for the backspace. According to the less manual, the -U (--UNDERLINE-SPECIAL) option makes less treat backspaces, tabs, and carriage returns as control characters, which is why they appear in caret notation, whereas -u (--underline-special) makes it treat backspaces and carriage returns as printable characters.
6. Conclusion
In this article, we learned how to find control characters in a file. Initially, we looked at multiple ways of doing so using the grep and sed tools. After that, we explored the less command to display full file content while showing the control characters.