1. Overview
Computers use character encodings to map characters to binary numbers so that text can be stored as bytes. Examples of character encodings include ASCII, UTF-8, UTF-16, and UTF-32.
In this tutorial, we’ll learn how to find the encoding of a file in Linux.
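To make the mapping from characters to bytes concrete, here's a minimal sketch that prints the bytes behind a character in different encodings. The helper name hex_bytes is our own, not a standard command, and the sketch assumes od and iconv are available, which is usually the case on Linux:

```shell
#!/bin/sh
# hex_bytes is a hypothetical helper: it prints the bytes read from
# standard input as space-separated hex pairs on a single line.
hex_bytes() {
    # The command substitution is unquoted on purpose, so the shell
    # collapses the padding that od puts around each byte.
    echo $(od -An -tx1)
}

# The letter A is a single byte in both ASCII and UTF-8.
printf 'A' | hex_bytes                                      # prints: 41

# The character U+00E9 (e with acute accent), written here as its two
# UTF-8 bytes using octal escapes.
printf '\303\251' | hex_bytes                               # prints: c3 a9

# The same character after converting it to UTF-16LE.
printf '\303\251' | iconv -f UTF-8 -t UTF-16LE | hex_bytes  # prints: e9 00
```

The same character ends up as different byte sequences depending on the encoding, which is why we need to know a file's encoding to read it correctly.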
2. Using file
One way to find the encoding of a file is to use the file command:
$ file -bi text1.txt
text/plain; charset=us-ascii
$ file -bi text2.txt
text/plain; charset=utf-8
In the above snippet:
- -b tells file to leave the file name out of the output, which keeps the output brief
- -i tells file to print MIME-type strings instead of its usual human-readable description; the output consists of the media type and the character encoding of the file
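The output shows that text1.txt is a plain text file encoded in US-ASCII, while text2.txt is a plain text file encoded in UTF-8. Note that file examines the file's contents and makes an educated guess, so the reported encoding isn't guaranteed to be right for every file.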
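When we want to check several files at once, we can wrap the command in a small loop. The following is a minimal sketch rather than a definitive script: the function name print_encodings and the sample file names are our own, and it uses the --mime-encoding option of file, which prints only the charset part:

```shell
#!/bin/sh
# print_encodings is a hypothetical helper: it prints "<name>: <charset>"
# for every file passed as an argument.
print_encodings() {
    for f in "$@"; do
        if [ -f "$f" ]; then
            # -b omits the file name; --mime-encoding prints only the charset.
            printf '%s: %s\n' "$f" "$(file -b --mime-encoding -- "$f")"
        else
            printf '%s: not a regular file\n' "$f" >&2
        fi
    done
}

# Create two small sample files so that the sketch is self-contained.
tmpdir=$(mktemp -d)
printf 'hello\n' > "$tmpdir/ascii.txt"
printf 'caf\303\251\n' > "$tmpdir/utf8.txt"   # "cafe" with an accented e, in UTF-8

print_encodings "$tmpdir/ascii.txt" "$tmpdir/utf8.txt"

rm -rf "$tmpdir"
```

For the two sample files, we'd expect file to report us-ascii and utf-8, respectively.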
3. Using enca
Another way to find the encoding of a file is to use enca. Since enca usually isn't installed by default, we need to install it first. On Debian-based distributions such as Ubuntu, we can use apt:
$ sudo apt update
$ sudo apt install enca
Now we can use enca:
$ enca -L none text1.txt
7bit ASCII characters
$ enca -L none text2.txt
Universal transformation format 8 bits; UTF-8
In the snippet above, -L specifies the language of the input file. enca only includes language models for a limited set of languages, and English isn't one of them, so we set -L to none for English text.
The output shows that text1.txt uses 7-bit ASCII, also known as US-ASCII, while text2.txt uses UTF-8.
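Since enca may not be present on every machine, a script can check for it and fall back to file. The sketch below is only an illustration under a few assumptions: the function name detect_encoding is our own, and we assume that enca's -i option prints the iconv-style charset name. The two tools also spell charset names differently (for example, UTF-8 versus utf-8), so callers should compare the result case-insensitively:

```shell
#!/bin/sh
# detect_encoding is a hypothetical helper: it prints a charset name for
# the file given as its first argument.
detect_encoding() {
    if command -v enca >/dev/null 2>&1; then
        # -L none: don't assume a language; -i: print the iconv-style name
        # (assumed behavior, check enca's man page on your system).
        enca -L none -i "$1"
    else
        # Fallback for systems where enca isn't installed.
        file -b --mime-encoding -- "$1"
    fi
}

# Usage: pass a file name as the first argument to the script.
if [ "$#" -ge 1 ]; then
    detect_encoding "$1"
fi
```

If we save this as a script, we can run it with a file name, such as text2.txt, as its only argument.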
4. Conclusion
In this brief article, we discussed two methods to find the character encoding of a file in Linux: the file command and enca.