1. Introduction
Terminals are what Linux is all about. Be it a terminal emulator, Pseudo-Terminal Secondary (PTS), or just a plain TeleTYpewriter (TTY), they are the central input and output mechanism for a Linux system. However, the character set in a terminal setting may be limited, especially when it comes to binary and Unicode.
In this tutorial, we explore how to show special characters in Linux with the less command. First, we start with a brief introduction of terminal character sets, defining special characters. Next, we create an example file containing several such characters. After that, less takes the spotlight, revealing how it handles special characters. Finally, we see the flags and notation less uses to replace characters.
We tested the code in this tutorial on Debian 11 (Bullseye) with GNU Bash 5.1.4. It should work in most POSIX-compliant environments.
2. Terminal Character Sets and Special Characters
There are multiple factors that determine which characters are available in a given terminal:
- values of the locale environment variables
- font of the terminal
- shell in use
- terminal emulator, if any
- running application
Of course, it’s hard to know all of the conditions above. However, they control whether a terminal can display special characters. Here, we define special characters as non-printable or non-ASCII symbols.
Because of this, testing is commonly the easiest way to check. For example, we can try to output a Cyrillic character via the echo command:
$ echo $'\u0436'
ж
In this case, we use a backslash escape inside the Bash $'...' quoting. Specifically, \u begins a code of up to four hexadecimal digits from the Unicode table; here, 0436 corresponds to the Cyrillic letter ж. Here, the terminal can output this character.
On the other hand, the result is different with $'\u4e2d' (中), a Chinese character:
$ echo $'\u4e2d'
�
With it, the terminal produces a replacement character in our case.
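Which of these two outcomes appears depends partly on the locale variables listed earlier. As a rough sketch, the check below looks at the variable that takes precedence and guesses whether it names a UTF-8 locale. The helper names effective_ctype and looks_utf8 are hypothetical, and a UTF-8 locale alone doesn't guarantee that the font has the glyph:

```shell
# Hypothetical helper: name of the locale that governs character handling.
# Precedence follows POSIX: LC_ALL, then LC_CTYPE, then LANG; empty counts as unset.
effective_ctype() {
    printf '%s\n' "${LC_ALL:-${LC_CTYPE:-${LANG:-C}}}"
}

# Hypothetical helper: succeed if a locale name mentions a UTF-8 codeset.
looks_utf8() {
    case "$1" in
        *[Uu][Tt][Ff]-8*|*[Uu][Tt][Ff]8*) return 0 ;;
        *) return 1 ;;
    esac
}

name=$(effective_ctype)
if looks_utf8 "$name"; then
    echo "$name: UTF-8 locale, non-ASCII output is at least possible"
else
    echo "$name: not a UTF-8 locale, non-ASCII output may be replaced"
fi
```

This only inspects names; where it's available, the locale charmap command reports the character encoding the system actually uses.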
Importantly, characters can also be hidden even when the terminal could theoretically display them. For instance, there is the non-printable $'\u0007' (BELL) character:
$ echo $'\u0007'
By default, with most applications and terminals, there is no visible output when printing non-printable characters. With this in mind, let’s check a command that can show all special characters.
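Even without visible output, the character is still sent to the terminal. One way to confirm this, sketched here with the od tool and an octal escape instead of \u, is to dump the bytes in hexadecimal. The variable name bell_bytes is only illustrative:

```shell
# Send BELL (octal 007) plus a newline, as echo would, and dump the bytes as hex.
bell_bytes=$(printf '\007\n' | od -An -tx1)
echo "$bell_bytes"   # two bytes: 07 (BELL) and 0a (LINE FEED)
```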
3. Creating an Example Special-Character File
First, let’s create a test file /file and populate it with the characters z, ж, and 中, followed by five control characters:
- the BELL character, sounding the system speaker, i.e., \u0007
- the BACKSPACE character, i.e., \b, \u0008
- the HORIZONTAL TAB character, i.e., \t, \u0009
- the LINE FEED character, i.e., \n, \u000a
- the CARRIAGE RETURN character, i.e., \r, \u000d
To create the file with these contents, we’ll again use the echo command, but without appending a newline (-n):
$ echo -n $'\u007a\u0436\u4e2d\u0007\u0008\u0009\u000a\u000d' > /file
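The echo command above depends on how Bash expands \u escapes in the current locale, and writing to / usually requires elevated privileges. As a hedged alternative, the sketch below writes the same 11 bytes with explicit octal escapes into a temporary file. The sample variable and the use of mktemp are our own choices for illustration:

```shell
# Temporary file instead of /file, so no root access is needed.
sample=$(mktemp)

# z, then the UTF-8 bytes of U+0436 and U+4E2D, then BELL, BS, TAB, LF, CR.
printf '\172\320\266\344\270\255\007\010\011\012\015' > "$sample"

# Confirm the size and content in hex.
wc -c < "$sample"
od -An -tx1 "$sample"
```

The hex dump should match the byte sequence that xxd reports for /file. Once done, rm -f "$sample" removes the temporary file.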
There are many options to find non-ASCII characters in files. Some tools, like xxd, report non-printable characters as well:
$ xxd /file
00000000: 7ad0 b6e4 b8ad 0708 090a 0d z..........
Admittedly, xxd does not really show special characters and instead replaces them with periods, but it does provide their codes. Let’s move on to a command that can actually show unique symbols for unique characters.
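As a side note, od with the -c option sits somewhere in between: instead of periods, it prints the common control characters as backslash escapes. A small sketch, feeding the ASCII part of our data through a pipe rather than reading /file (the od_view name is hypothetical):

```shell
# z followed by BELL, BACKSPACE, TAB, LINE FEED, and CARRIAGE RETURN.
od_view=$(printf 'z\007\010\011\012\015' | od -An -c)
echo "$od_view"   # shows z, then \a \b \t \n \r, separated by spaces
```

The multi-byte characters are left out here on purpose, since their representation by od -c differs between implementations and locales.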
4. Special Characters in the less Command
To get more detail, there’s the less command. Despite its name, it’s much more feature-rich than its older peers.
For instance, let’s try to simply output /file from the previous section with less:
$ less /file
zж� ^G^H
/file (END)
Here, we see two printable characters, followed by an unknown character and a space. In the end, there are ^G (BELL) and ^H (BACKSPACE).
However, we still don’t see our third character (中), which is instead replaced by two incorrect ones. That’s something that our profile and terminal settings just don’t support.
On the other hand, while we can note the presence of a newline, the tab and the carriage return remain hidden. To reveal them, we can add two flags:
$ less --underline-special --UNDERLINE-SPECIAL /file
zж� ^G^H^I
^M
/file (END)
Now, we can see four non-printable characters in caret notation: ^G for the bell, ^H for the backspace, ^I for the tab, and ^M for the carriage return, while the line feed appears as an actual line break. To be clear, -u (--underline-special) takes care of backspaces and carriage returns, while -U (--UNDERLINE-SPECIAL) also covers tabs.
So, what notation does less use to replace these normally hidden special characters?
5. Special Character Notation
In fact, the cat manual documents the same convention: the -v (--show-nonprinting) flag uses the ^ notation we saw above, plus the M- notation for bytes with the high bit set.
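As a quick illustration, we can pass a few of our bytes through cat -v directly. This is only a sketch: the variable names are illustrative, and LC_ALL=C is set so that cat works byte by byte regardless of the surrounding locale:

```shell
# Control characters: BELL, BACKSPACE, and CARRIAGE RETURN become ^G, ^H, and ^M.
ctrl_view=$(printf 'z\007\010\015\n' | LC_ALL=C cat -v)
echo "$ctrl_view"   # z^G^H^M

# The two UTF-8 bytes of ж (0xd0 0xb6) both have the high bit set.
high_view=$(printf '\320\266\n' | LC_ALL=C cat -v)
echo "$high_view"   # M-PM-6
```

Notably, cat -v leaves tabs and line feeds as they are, so ^I only shows up with additional flags.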
To see all characters and how they are replaced, we can use perl and cat:
$ perl -e 'for(my $c = 0; $c < 256; $c++) { print(sprintf("%c is %d %x\n", $c, $c, $c)); }' | cat --show-nonprinting
^@ is 0 0
^A is 1 1
^B is 2 2
^C is 3 3
[...]
M-^] is 157 9d
M-^^ is 158 9e
[...]
In short, the above perl script goes through all 256 byte values (0 to 255), the first 128 of which form the ASCII table, and prints each as a character, a decimal code, and a hexadecimal code. Passing this output through cat --show-nonprinting, the characters are replaced with their notation, and we can see the correspondence.
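The correspondence in the listing follows simple arithmetic: a control code is shown as ^ plus the character 64 positions higher, code 127 becomes ^?, and codes from 128 up get an M- prefix before the same rule is applied to the code minus 128. The function below is a minimal sketch of that rule, not the actual implementation in cat or less, and the name notation is hypothetical:

```shell
# Print the caret/meta notation for a byte value given in decimal (0-255).
# The body runs in a subshell, so its variables don't leak into the caller.
notation() (
    code=$1
    prefix=''
    if [ "$code" -ge 128 ]; then
        # High bit set: emit M- and continue with the lower seven bits.
        prefix='M-'
        code=$((code - 128))
    fi
    if [ "$code" -lt 32 ]; then
        # Control code: ^ followed by the character 64 positions higher.
        printf '%s^%b\n' "$prefix" "\\0$(printf '%o' $((code + 64)))"
    elif [ "$code" -eq 127 ]; then
        printf '%s^?\n' "$prefix"
    else
        # Printable ASCII stays as it is.
        printf '%s%b\n' "$prefix" "\\0$(printf '%o' "$code")"
    fi
)

notation 7     # ^G
notation 157   # M-^]
```

Unlike cat -v, this sketch also converts codes 9 and 10, yielding ^I and ^J, which matches the ^I that less showed for the tab.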
6. Summary
In this article, we saw how less can show special characters, as well as the notation it uses to do so.
In conclusion, there are many tools to display non-ASCII and non-printable characters, and less is among the more convenient.