1. Introduction
When opening certain files in Vim, we may see ^M characters at the ends of lines. The presence of this character can be traced back to the era of typewriters.
In this tutorial, we’ll delve into a brief history of CR (carriage return) and LF (line feed) characters, understand the different conventions followed for line endings, learn how Vim deals with them, and explore ways to manage the ^M character.
2. The Typewriter Era
On old typewriters, two operations were required when the print head reached the end of a line. The first moved the paper up to the next line, which is known as a line feed (LF). The second moved the carriage with the print head back to the leftmost position, which is known as a carriage return (CR).
In teletypewriters, the repositioning of the print head was automated. On reaching the end of a line, they needed some time to move the print head back to the leftmost position before they were ready for the next line. The applications that controlled teleprinters handled this delay in different ways.
Some sent extra characters at each line ending, such as CR followed by LF, or additional NUL characters, to give the print head enough time to reach its position. Others relied on the device drivers to take care of the head movement and to print nothing while it was in progress. This approach required only a single LF character to mark the line ending.
3. Line Endings Across Operating Systems
As computing systems evolved, operating systems took their cue from these conventions. Of those, Multics and CP/M are the ones that laid the foundation for the line endings used by modern operating systems.
Multics, developed jointly by MIT, General Electric, and Bell Labs, used only LF for line endings. Later, Unix followed this convention for its simplicity. Digital Research’s CP/M used the CR+LF combination for line endings, and so did its successors, DOS and Windows. The classic Mac OS used only CR for line endings, but with Mac OS X, this practice shifted to LF.
This created many interoperability problems when text files moved between these operating systems. Bash scripts edited on Windows machines may not run properly on Linux. Likewise, text files created on Linux can show up with their line breaks missing in some Windows editors.
4. Viewing Files in Vim
On Linux, Vim expects Unix-format line endings, while files edited on Windows typically contain CR+LF line endings. When Vim reads such a file in Unix format, for example because the file mixes LF and CR+LF endings or because the fileformats option doesn’t include dos, it treats the additional CR characters as part of the text and displays each of them as ^M. This is the reason we see this character.
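The CR bytes that Vim renders as ^M can also be inspected from the shell before opening a file. The following is a minimal sketch, assuming a POSIX shell with od and grep available; the sample file and the variable names are made up for illustration:

```shell
# Hypothetical sample file with Windows-style CR+LF line endings.
sample=$(mktemp)
printf 'echo hello\r\necho world\r\n' > "$sample"

# od -c prints every byte; CR appears as \r and LF as \n.
od -c "$sample"

# Hold a literal CR in a variable. Command substitution strips only
# trailing newlines, so the CR survives.
cr=$(printf '\r')

# Count the lines that contain at least one CR.
crlf_lines=$(grep -c "$cr" "$sample")
echo "lines containing CR: $crlf_lines"
```

A non-zero count means the file has CR characters that Vim may display as ^M.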
4.1. Hide ^M Character
When Vim hasn’t recognized the CR+LF line endings on its own, we can configure it to do so. This is done through the fileformats option.
If we see a ^M character after opening a file, we can set the fileformats option to dos and reopen the file.
Let’s see how this can be done:
:set fileformats=dos
:e
Here, the first command restricts the fileformats option to dos, so Vim reads files in DOS format. The second command reloads the current file.
After the reload, the ^M characters are no longer displayed, and Vim writes CR+LF line endings when we save. Thus, it preserves the file format. This is the method we should follow if we need to keep the file format intact.
Closing Vim clears this setting. To have Vim detect line endings in every session, we can add an entry to the ~/.vimrc file:
set fileformats=unix,dos,mac
This makes Vim detect the format from the file’s line endings, choosing among the listed formats.
4.2. Remove ^M Character
If we want to remove all of the CR characters we can simply use the search and replace option in Vim:
:%s/^M//g
Here we use the substitute command in Vim to replace the ^M character with nothing. The ^M in the command must be entered as a single control character: we first press Ctrl+V and then Ctrl+M. Typing the two separate characters ^ and M won’t work, because Vim would then search for a literal M at the start of a line.
An easier way is to use the \r escape in the pattern:
:%s/\r//g
This also removes all the CR characters in the file.
In some cases, the file appears to have no newlines, and the whole text shows up on a single line. This typically happens when the file uses a lone CR as its line ending. In this scenario, we need to replace each CR with a line break.
Let’s look at that also:
:%s/\r/\r/g
Here we search for the CR character and replace it with \r. In the pattern, \r matches a CR, but in the replacement part of the substitute command, \r doesn’t insert a CR. Instead, Vim inserts a line break there. This fixes the line endings in the file.
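Outside Vim, the same lone-CR conversion can be sketched with the tr command, which translates every CR into an LF. This is only an illustration under the assumption that the file contains no CR+LF pairs (those would turn into two line breaks); the file names are hypothetical:

```shell
# Hypothetical sample file that uses a lone CR as the line ending.
mac_file=$(mktemp)
unix_file=$(mktemp)
printf 'first\rsecond\rthird\r' > "$mac_file"

# Translate each CR into an LF and write the result to a new file.
tr '\r' '\n' < "$mac_file" > "$unix_file"

# Count the LF-terminated lines in the converted file.
line_count=$(wc -l < "$unix_file")
echo "lines after conversion: $line_count"
```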
5. Other Options
Let’s explore some alternatives for eliminating the ^M character. These methods are particularly useful when we handle a large number of files and need a script-based solution.
5.1. Using the dos2unix Command
The dos2unix tool converts DOS-style CR+LF line endings to the Unix LF format.
Let’s examine how to utilize this tool:
$ dos2unix -n win.txt lin.txt
$ dos2unix win.txt
The first command uses the -n (newfile) option to convert the win.txt file and saves the output to lin.txt. The second command directly modifies the input file. Therefore, if we need a backup of the original file we should choose the first command.
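When many files are involved, it can help to convert only the files that actually contain CR characters. The sketch below assumes a POSIX shell; the directory and file names are invented for the demonstration, and the conversion step runs only if dos2unix is installed:

```shell
# Hypothetical sample directory: one CR+LF file and one LF file.
dir=$(mktemp -d)
printf 'one\r\ntwo\r\n' > "$dir/win.txt"
printf 'one\ntwo\n' > "$dir/lin.txt"

# A literal CR character to search for.
cr=$(printf '\r')

# List only the files that contain at least one CR.
crlf_files=$(grep -l "$cr" "$dir"/*.txt)
echo "$crlf_files"

# Convert those files in place, but only when dos2unix is available.
if command -v dos2unix >/dev/null 2>&1; then
    for f in "$dir"/*.txt; do
        if grep -q "$cr" "$f"; then
            dos2unix "$f"
        fi
    done
fi
```

Checking each file first leaves the files that already use LF endings untouched.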
5.2. Using the sed Command
We can use the Stream Editor tool to remove CR characters from a file.
Let’s see an example:
$ sed -i 's/\r//g' win.txt
Let’s understand the different options used:
- -i: edits the file in place, meaning it directly modifies the specified file
- 's/\r//g': performs a global substitution that searches for carriage return characters (\r) and replaces them with nothing, which removes them; GNU sed understands the \r escape
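A slightly more cautious variant anchors the pattern to the end of the line and keeps a backup of the original file. This is a sketch that assumes GNU sed, which understands the \r escape and accepts a backup suffix attached to -i; the sample file is hypothetical:

```shell
# Hypothetical sample file with CR+LF line endings.
win_file=$(mktemp)
printf 'alpha\r\nbeta\r\n' > "$win_file"

# -i.bak edits the file in place and saves the original as <name>.bak.
# \r$ matches a CR only when it is the last character on a line.
sed -i.bak 's/\r$//' "$win_file"

# Count the lines that still contain a CR.
# "|| true" keeps the script going when grep finds no match.
cr=$(printf '\r')
remaining=$(grep -c "$cr" "$win_file" || true)
echo "lines still containing CR: $remaining"
```

Anchoring with $ leaves any CR in the middle of a line alone, which may or may not be what we want depending on the file.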
5.3. Using the tr Command
We can use the tr command, typically used for character translation or deletion, to eliminate the CR character.
We can use the delete option:
$ tr -d "\r" < win.txt > lin.txt
The command takes its input from win.txt, deletes every CR character, and redirects the output to lin.txt.
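Since tr has no in-place mode, redirecting its output to the same file it reads from would truncate that file before tr reads it. One possible workaround, sketched here with a hypothetical file, is to write to a temporary file and then move it over the original:

```shell
# Hypothetical sample file with CR+LF line endings.
target=$(mktemp)
printf 'red\r\ngreen\r\n' > "$target"

# Write the converted text to a temporary file first, then replace
# the original only if tr succeeded.
tmp=$(mktemp)
tr -d '\r' < "$target" > "$tmp" && mv "$tmp" "$target"

# Inspect the bytes to confirm that only LF line endings remain.
od -c "$target"
```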
6. Conclusion
In this tutorial, we’ve explored a concise history of the different line-ending conventions, examined how Vim displays the CR character, and learned several approaches to handling it.