1. Overview
As Linux users, we often perform various operations on our files. One of the more common operations is delimiter conversion. For example, we may wish to convert a tab-delimited file to Comma Separated Values (CSV) in order to use it with an application that needs that format.
In this tutorial, we’ll look at various ways to accomplish this using bash.
2. Setting up an Example
Let’s create a sample file input.txt with tabs in it:
str1	str2
str3		str4
str5				str6
str7			str8
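Since TAB characters are easy to lose when copying text, one way to create this file reproducibly is with printf. This is only a sketch: the tab counts per line (one, two, four, and three) are taken from the cat output shown in this section.

```shell
# Write four lines separated by 1, 2, 4 and 3 TAB characters respectively.
# printf interprets \t as a TAB and \n as a newline in its format string.
printf 'str1\tstr2\nstr3\t\tstr4\nstr5\t\t\t\tstr6\nstr7\t\t\tstr8\n' > input.txt
```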
And then let’s check that this file has the right content:
$ cat --show-tabs input.txt
str1^Istr2
str3^I^Istr4
str5^I^I^I^Istr6
str7^I^I^Istr8
In the above example, we've used the --show-tabs option of the cat command. This option displays the TAB character as ^I.
3. Using the tr Command
We can use tr to manipulate a file when we want to translate or delete characters. Let's use it to convert the TAB characters to commas:
$ tr -s '\t' ',' < input.txt > output.txt
In this example, -s represents the squeeze-repeats operation. After tr translates each TAB to a comma, it squeezes every run of repeated commas into a single comma, so multiple TAB characters end up as one comma.
Let’s check the result:
$ cat output.txt
str1,str2
str3,str4
str5,str6
str7,str8
We should note that although the file had runs of multiple tab delimiters, tr has converted each run to a single comma. This means that any empty columns in the input are dropped.
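To see the effect of -s in isolation, the following sketch runs tr with and without it on a hypothetical one-line sample (the tr_demo.txt, tr_each.csv and tr_squeezed.csv file names are made up for illustration):

```shell
# Hypothetical sample: two consecutive TABs between "a" and "b"
printf 'a\t\tb\n' > tr_demo.txt

# Without -s: every TAB becomes its own comma, so the empty column survives
tr '\t' ',' < tr_demo.txt > tr_each.csv

# With -s: after translation, runs of the comma are squeezed to one
tr -s '\t' ',' < tr_demo.txt > tr_squeezed.csv

cat tr_each.csv tr_squeezed.csv
# a,,b
# a,b
```

Because the squeeze applies to the output character, commas that were already repeated in the input would be collapsed as well, which is worth keeping in mind if the data itself contains commas.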
4. Using the awk Command
The awk command is an interpreter for the AWK programming language. It allows us to perform complex text processing using concise code. We can use its string manipulation functions to achieve the desired results:
$ awk '{ gsub(/[\t]/,","); print }' input.txt > output.txt
$ cat output.txt
str1,str2
str3,,str4
str5,,,,str6
str7,,,str8
In the above example, we used a regular expression with the gsub function. This converted each tab into a separate comma, which preserves empty columns. If we prefer to collapse each run of TAB characters into a single comma, we can use the expression gsub(/[\t]+/,","); instead.
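As a quick sketch of the collapsing variant, the snippet below recreates the sample data under a hypothetical name (awk_demo.txt) and applies the [\t]+ expression, so that each run of TABs yields exactly one comma:

```shell
# Recreate the sample data (hypothetical file name, same content as input.txt)
printf 'str1\tstr2\nstr3\t\tstr4\nstr5\t\t\t\tstr6\nstr7\t\t\tstr8\n' > awk_demo.txt

# [\t]+ matches one or more consecutive TABs; gsub replaces each run with ","
awk '{ gsub(/[\t]+/, ","); print }' awk_demo.txt > awk_collapsed.csv

cat awk_collapsed.csv
# str1,str2
# str3,str4
# str5,str6
# str7,str8
```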
5. Using the sed Command
sed is a stream editor for filtering and transforming text. It allows us to perform text processing in a non-interactive way. We can use its substitute command for converting TABs to commas:
$ sed 's/\t\+/,/g' input.txt > output.txt
$ cat output.txt
str1,str2
str3,str4
str5,str6
str7,str8
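In this example, we used a regular expression with the sed command. Here we've chosen to replace each run of one or more tabs with a single comma using the \t\+ regular expression. We should note that both \t and \+ are GNU sed extensions, so this command may behave differently with other sed implementations.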
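For a sed that lacks the GNU extensions, a more portable sketch is to pass a literal TAB character and spell "one or more" as the character followed by *. The tab variable and the sed_demo.txt, sed_collapsed.csv and sed_each.csv names below are assumptions made for illustration:

```shell
# Hypothetical two-line sample
printf 'str1\tstr2\nstr3\t\tstr4\n' > sed_demo.txt

# Capture a literal TAB so the regex does not rely on sed understanding \t
tab=$(printf '\t')

# One TAB followed by zero or more TABs: collapse each run into one comma
sed "s/${tab}${tab}*/,/g" sed_demo.txt > sed_collapsed.csv

# A single TAB per match: convert each TAB individually, keeping empty columns
sed "s/${tab}/,/g" sed_demo.txt > sed_each.csv

cat sed_collapsed.csv sed_each.csv
# str1,str2
# str3,str4
# str1,str2
# str3,,str4
```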
6. Conclusion
In this article, we discussed some of the common ways of converting tab-delimited files to CSV.
First, we used the tr command. Then we saw how to use awk and sed with regular expressions. Along the way, we saw how to choose between collapsing consecutive TAB characters into a single comma and converting each TAB character individually, which preserves blank columns.