1. Overview

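On Linux, text files whose lines end with a carriage return followed by a line feed (CRLF), the Windows convention, can cause problems for tools that expect a bare line feed (LF). For example, a stray ^M character may show up at the end of each line.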

In this tutorial, we’ll learn how to find those files and convert the line endings to LF.

2. Creating an Example File

First, let’s create a folder to hold the example files we’ll use to test our strategies:

$ mkdir -p /tmp/test_folder

With that, we’ve created our test folder. Let’s fill it with four files: two with CRLF line endings and two with LF line endings:

$ printf "Hi \r\n" | tee /tmp/test_folder/crlf_ending{1,2}
Hi
$ printf "Hi \n" | tee /tmp/test_folder/lf_ending{1,2}
Hi

In the first one-liner, we’ve created two files called crlf_ending1 and crlf_ending2. Each contains the text “Hi ” followed by a CRLF line ending.

In the second, we’ve done the same with LF line endings, creating lf_ending1 and lf_ending2.
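To confirm which bytes actually ended up in each file, a quick look with od can help. The following is a minimal sketch that assumes a throwaway directory from mktemp (a stand-in for /tmp/test_folder, so it can run anywhere) and creates only one file of each kind:

```shell
# Hypothetical scratch directory; the tutorial itself uses /tmp/test_folder.
demo_dir=$(mktemp -d)

printf 'Hi \r\n' > "$demo_dir/crlf_ending1"
printf 'Hi \n'   > "$demo_dir/lf_ending1"

# od -An -c prints each byte as a character or escape, without offsets.
crlf_bytes=$(od -An -c "$demo_dir/crlf_ending1")
lf_bytes=$(od -An -c "$demo_dir/lf_ending1")

printf 'crlf_ending1:%s\n' "$crlf_bytes"
printf 'lf_ending1:  %s\n' "$lf_bytes"
```

The CRLF file should show a \r just before the final \n, while the LF file should show only \n.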

3. Searching for Files With CRLF Endings

There are a few ways to find files ending with CRLF using Linux commands.

3.1. cat

Let’s start with the cat command:

$ cat -A /tmp/test_folder/{crlf_ending1,lf_ending1}
Hi ^M$
Hi $

Here, the -A option makes cat show non-printing characters: the carriage return appears as ^M, and $ marks the end of each line. The CRLF file therefore ends in ^M$, while the LF file ends in just $.

3.2. grep

Now, let’s try with the grep command:

$ grep -rIl -m 1 $'\r' /tmp/test_folder/
/tmp/test_folder/crlf_ending2
/tmp/test_folder/crlf_ending1

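Here, $'\r' is Bash’s ANSI-C quoting for a literal carriage return, which grep uses as the search pattern. Let’s take a look at the options: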

  • -r, to read the entire folder recursively
  • -I, to ignore binary files
  • -l, to print only the name of the matching file
  • -m 1, to stop reading after the first match
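The search can be tightened so that it only matches a carriage return at the end of a line. Here is a small sketch, assuming GNU grep; list_crlf_files is a hypothetical helper name, and the carriage return is produced with printf so the snippet also works in shells without $'...' quoting:

```shell
# A literal carriage return, built portably.
cr=$(printf '\r')

# List text files under a directory that have a CR right before a newline.
# The pattern is the CR character followed by the regex anchor $.
list_crlf_files() {
    grep -rIl "${cr}\$" "$1" | sort
}

# Hypothetical scratch data, standing in for /tmp/test_folder.
demo_dir=$(mktemp -d)
printf 'Hi \r\n' > "$demo_dir/crlf_ending1"
printf 'Hi \n'   > "$demo_dir/lf_ending1"

found=$(list_crlf_files "$demo_dir")
printf '%s\n' "$found"
```

Only the path of crlf_ending1 should be listed.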

3.3. file

Additionally, we can use the file command, which reports each file’s type along with its line terminators:

$ file /tmp/test_folder/*
/tmp/test_folder/crlf_ending1: ASCII text, with CRLF line terminators
/tmp/test_folder/crlf_ending2: ASCII text, with CRLF line terminators
/tmp/test_folder/lf_ending1:   ASCII text
/tmp/test_folder/lf_ending2:   ASCII text

3.4. dos2unix

The dos2unix command is a great tool for this task, but it isn’t installed by default on every Linux distribution.

To install it on Debian-based systems, we can type:

$ sudo apt-get install dos2unix

Next, to get information about the line endings, let’s use the -i (--info) option:

$ dos2unix -i /tmp/test_folder/*
       1       0       0  no_bom    text    /tmp/test_folder/crlf_ending1
       1       0       0  no_bom    text    /tmp/test_folder/crlf_ending2
       0       1       0  no_bom    text    /tmp/test_folder/lf_ending1
       0       1       0  no_bom    text    /tmp/test_folder/lf_ending2

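Here, the first and second columns show the number of DOS (CRLF) and Unix (LF) line breaks, respectively. The third column counts old Mac (CR) line breaks, and the next two report the byte order mark and whether the file is text or binary.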
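Where dos2unix isn’t available, the first two columns can be approximated with awk. This is only a rough sketch: count_line_endings is a hypothetical helper, and it doesn’t detect lone CR (old Mac) line breaks, byte order marks, or binary files.

```shell
# Print "<crlf_count> <lf_only_count>" for one file.
count_line_endings() {
    awk '
        /\r$/ { crlf++; next }   # record ends with CR before the newline
              { lf++ }           # any other record
        END   { printf "%d %d\n", crlf, lf }
    ' "$1"
}

# Hypothetical sample with mixed line endings: two CRLF lines, one LF line.
sample=$(mktemp)
printf 'one\r\ntwo\nthree\r\n' > "$sample"

counts=$(count_line_endings "$sample")
printf '%s\n' "$counts"
```

For this mixed sample, the helper should print 2 1.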

4. Convert CRLF to LF

Now that we’ve learned how to identify files with CRLF line endings, let’s use some tools to convert CRLF to LF.

4.1. sed Command

The sed command is a great tool for text processing. Let’s use it to find and replace the line endings in the crlf_ending1 file:

$ sed 's/\r//' /tmp/test_folder/crlf_ending1 | cat -A -
Hi $

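In this example, the sed script 's/\r//' replaces the first \r character on each line with nothing. Note that the \r escape is a GNU sed extension. To restrict the match to a carriage return at the very end of a line, we can anchor it as 's/\r$//'.

Piping the result through cat -A confirms that the ^M character is gone.

Additionally, if we want to edit the file in place instead of printing the result, we can add the -i option.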
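An in-place edit can be made a little safer by keeping a backup and anchoring the pattern. The sketch below assumes GNU sed (for the \r escape) and a hypothetical sample file that has a carriage return in the middle of a line as well as at the end:

```shell
# Hypothetical sample: "a<CR>b<CR><LF>" followed by a plain LF line.
sample=$(mktemp)
printf 'a\rb\r\nplain\n' > "$sample"

# Anchoring with $ removes only a CR that sits right before the newline.
# -i.bak edits in place and keeps the original as "$sample.bak".
sed -i.bak 's/\r$//' "$sample"

# Show the resulting bytes; the CR between "a" and "b" is left alone.
od -An -c "$sample"
```

If the result looks right, the .bak copy can be deleted; otherwise, it can be moved back over the edited file.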

4.2. tr Command

The tr command is a simple and powerful tool that can remove or translate characters.

Let’s use the -d option to delete every \r character from the input:

$ tr -d '\r' < /tmp/test_folder/crlf_ending1 | cat -A -
Hi $
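Since tr only reads standard input and writes standard output, changing a file with it needs a temporary file. One possible way to do that is sketched below; strip_cr_with_tr is a hypothetical helper name:

```shell
# Remove every CR from a file, using a temporary file as a staging area.
strip_cr_with_tr() {
    tmp=$(mktemp) || return 1
    # Write back with cat (not mv) so the original keeps its permissions.
    tr -d '\r' < "$1" > "$tmp" && cat "$tmp" > "$1"
    status=$?
    rm -f "$tmp"
    return "$status"
}

# Hypothetical sample file, standing in for crlf_ending1.
sample=$(mktemp)
printf 'Hi \r\n' > "$sample"

strip_cr_with_tr "$sample"
od -An -c "$sample"
```

Redirecting straight back to the same file (tr ... < file > file) wouldn’t work, because the shell truncates the file before tr gets to read it.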

4.3. awk Tool

Also, we can use the awk tool to remove the \r character:

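$ awk '{ gsub(/\r/, "") } 1' /tmp/test_folder/crlf_ending1 | cat -A -
Hi $

Here, the gsub function removes every \r from the current record. The trailing 1 is a pattern that’s always true and has no action, so awk prints every record, whether or not a substitution took place. If we used gsub itself as the pattern, awk would silently drop any line that didn’t contain a \r.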
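The behavior on a file with mixed line endings is worth checking, since a conversion should never lose lines. A minimal sketch, using a hypothetical two-line sample and sub with an anchored pattern:

```shell
# Hypothetical sample: one CRLF line followed by one LF line.
sample=$(mktemp)
printf 'dos line\r\nunix line\n' > "$sample"

# sub() strips a trailing CR if there is one; the bare pattern 1 is always
# true, so every record is printed whether or not a substitution happened.
converted=$(awk '{ sub(/\r$/, "") } 1' "$sample")
printf '%s\n' "$converted"
```

Both lines should appear in the output, with the carriage return removed from the first one.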

4.4. Perl

We can also use Perl with the same substitution we used with sed:

$ perl -pe 's/\r//' /tmp/test_folder/crlf_ending1 | cat -A -
Hi $

Let’s take a close look at the parameters:

  • -p, to loop over the input line by line and print each line after the script runs
  • -e 's/\r//', to supply the script that deletes the \r character

4.5. dos2unix

Again, we can use the dos2unix tool to keep things simple.

Now, let’s use it on an example file:

$ dos2unix /tmp/test_folder/crlf_ending1
dos2unix: converting file /tmp/test_folder/crlf_ending1 to Unix format...

Let’s take a look at the content of the file:

$ cat -A /tmp/test_folder/crlf_ending1
Hi $

We can see that the CRLF line ending has been converted to LF.

Finally, let’s restore the CRLF line endings with the companion unix2dos command:

$ unix2dos /tmp/test_folder/crlf_ending1
unix2dos: converting file /tmp/test_folder/crlf_ending1 to DOS format...

As a final note, if we only want to see the converted content without changing the file, we can feed the file to dos2unix through standard input:

$ dos2unix < /tmp/test_folder/crlf_ending1 | cat -A -
Hi $
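In scripts that may run on machines without dos2unix, a small wrapper can fall back to sed. This is a sketch under assumptions: to_unix is a hypothetical helper name, and the fallback relies on GNU sed.

```shell
# Convert a file to LF in place, preferring dos2unix when it is installed.
to_unix() {
    if command -v dos2unix >/dev/null 2>&1; then
        dos2unix -q "$1"          # -q suppresses the "converting file" message
    else
        sed -i 's/\r$//' "$1"     # GNU sed fallback
    fi
}

# Hypothetical sample file, standing in for crlf_ending1.
sample=$(mktemp)
printf 'Hi \r\n' > "$sample"

to_unix "$sample"
od -An -c "$sample"
```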

4.6. recode

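recode is a tool that converts files between character sets. It also understands line-ending conventions, which it calls surfaces, so it can change CRLF to LF as part of a conversion.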

Let’s use it on our file:

$ recode CP1252...UTF-8 /tmp/test_folder/crlf_ending1

Here, we’ve converted our file from CP1252 (or Windows-1252) encoding to UTF-8. Note that recode overwrites the file in place.

Now, let’s see the content:

$ cat -A /tmp/test_folder/crlf_ending1
Hi $

Finally, let’s convert our file back to the previous encoding:

$ recode UTF-8...CP1252 /tmp/test_folder/crlf_ending1

4.7. Using the Vim Editor

To convert the CRLF line endings to LF with vim, let’s open our file:

$ vim /tmp/test_folder/crlf_ending1

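Now, we can press ESC to make sure we’re in normal mode and then type “:” to enter command-line mode.

Then, we’ll type set ff=unix and press ENTER. Here, ff is short for fileformat, the option that controls which line endings Vim writes.

Finally, let’s type ZZ to save the file and exit.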

Let’s see the content:

$ cat -A /tmp/test_folder/crlf_ending1
Hi $

To recover our file, we can repeat the previous steps but type set ff=dos instead.

4.8. Using the Bash Builtins

Finally, let’s use some bash builtins to convert the line endings:

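$ while IFS= read -r line
do
    printf '%s\n' "${line/$'\r'/}"
done < /tmp/test_folder/crlf_ending1 | cat -A

As a result, we should see:

Hi $

In this scenario, we’ve fed the while loop with our test file. Setting IFS to an empty value and passing -r to read keeps leading and trailing whitespace and backslashes intact. Then, we’ve used parameter expansion to delete the \r character from each line before printing it.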
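A read loop has two common pitfalls: whitespace or backslashes can be altered, and a last line without a trailing newline can be skipped. The sketch below guards against both. strip_cr_lines is a hypothetical helper, and it uses the POSIX ${line%...} suffix removal rather than Bash-only syntax:

```shell
# A literal carriage return, built portably.
cr=$(printf '\r')

strip_cr_lines() {
    # IFS= keeps leading/trailing blanks, -r keeps backslashes literal;
    # the || test also handles a last line with no trailing newline.
    while IFS= read -r line || [ -n "$line" ]; do
        printf '%s\n' "${line%"$cr"}"
    done < "$1"
}

# Hypothetical sample: leading spaces, a backslash, and an unterminated last line.
sample=$(mktemp)
printf '  a\\b \r\nlast' > "$sample"

out=$(strip_cr_lines "$sample")
printf '%s\n' "$out"
```

For large files, a shell loop like this is much slower than sed or tr, so it suits small inputs or environments where no other tools are available.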

5. Find and Convert Files at the Same Time

Now that we know how to find files with CRLF line endings and convert them to LF, we can combine these operations.

First, we can skip the search step and, instead, apply commands such as dos2unix or sed directly to every file matching a pattern:

$ dos2unix /tmp/test_folder/crlf_ending*
dos2unix: converting file /tmp/test_folder/crlf_ending1 to Unix format...
dos2unix: converting file /tmp/test_folder/crlf_ending2 to Unix format...

With sed:

$ sed -i 's/\r//' /tmp/test_folder/crlf_ending*

However, if we want to convert only the files that actually contain CRLF endings, we can pipe the list of matching files to the xargs command:

$ grep -rIl -m 1 $'\r' /tmp/test_folder/ | xargs -P0 -I {} dos2unix {}

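Here, -I {} makes xargs run one dos2unix command per filename, substituting the name for {}, and -P0 lets it run as many of those commands in parallel as possible.

Let’s use another combination, this time with file, awk, and sed: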

$ file /tmp/test_folder/* \
    | awk -F : '/CRLF/ && $0=$1' \
    | xargs -P0 -I {} sed -i 's/\r//' {} 

Here, awk uses the colon as the field separator, keeps only the lines in which file mentions CRLF, and replaces each of those lines with its first field, the filename.
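Filenames that contain spaces or quotes can trip up a plain pipe into xargs. One way around this, sketched here under the assumption of GNU grep, xargs, and sed, is to pass NUL-terminated names; the scratch directory and filenames are hypothetical:

```shell
# A literal carriage return, built portably.
cr=$(printf '\r')

# Hypothetical scratch data, including a filename with spaces.
demo_dir=$(mktemp -d)
printf 'Hi \r\n' > "$demo_dir/crlf file with spaces"
printf 'Hi \n'   > "$demo_dir/lf_ending1"

# grep -Z and xargs -0 exchange NUL-terminated names, so odd filenames are safe;
# xargs -r skips running sed entirely when grep finds nothing.
grep -rIlZ "${cr}\$" "$demo_dir" | xargs -0 -r sed -i 's/\r$//'

# Count the files that still have a CR at the end of a line.
remaining=$(grep -rIl "${cr}\$" "$demo_dir" | wc -l)
printf 'files still containing CRLF: %s\n' "$remaining"
```

Because xargs passes many filenames to a single sed invocation here, this variant also starts far fewer processes than running one command per file.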

6. Conclusion

In this article, we’ve learned how to identify files with CRLF line endings.

Then, we’ve looked at how to convert the line endings from CRLF to LF.

Finally, we combined these strategies to find and convert the files in a single one-liner.

