1. Introduction
A Comma-Separated Value (CSV) file consists of columns, separated by a comma. Sometimes, converting such CSV files to Tab-Separated Value (TSV) files may be necessary for use by different applications.
In this tutorial, we’ll explore multiple solutions for converting a CSV file to a TSV file in Linux.
2. Example Files and Results
Let’s check the content of our example CVS file:
$ cat input_file.csv
Username,Identifier,First_name,Last_name
james23,9012,James,Booker
greyjay07,2080,Laura,Grey
johnson99,4344,Craig,Johnson
jenkins36,9315,Mary,Jenkins
smith25,2311,Jamie,Smith
With the above CSV, let’s see what our expected TSV file would be:
$ cat output_file.csv
Username Identifier First_name Last_name
james23 9012 James Booker
greyjay07 2080 Laura Grey
johnson99 4344 Craig Johnson
jenkins36 9315 Mary Jenkins
smith25 2311 Jamie Smith
Since this is a simple CSV file, the result will be the same in all the sections.
Let’s take a look at a more complex example, involving quotes, which surround values that contain the separator:
$ cat input_quote.csv
"First_name","Last_name","Address"
"James","Booker","93 NORTH 9TH STREET, BROOKLYN NY"
"Laura","Grey","80 WESTMINSTER ST, PROVIDENCE RI"
"Craig","Johnson","177 MAIN STREET, LITTLETON NH"
"Mary","Jenkins","520 5TH AVE, MCKEESPORT PA"
"Jamie","Smith","41 N 4TH ST, ALLENTOWN PA"
In this example, we need to skip the comma in the address column.
Notably, directly replacing commas with tabs without parsing the CSV can give us wrong results. Still, some of the simpler methods described here don’t support this formation. So, some of the methods would break quoted column values with a comma.
3. Using tr
The tr command is a Unix utility used to translate or delete characters. tr can also convert CSV to TSV by replacing commas with tabs.
tr is a part of the coreutils package. Let’s install the package via yum:
$ yum install -y coreutils
Then, we can use tr to convert our file:
$ tr ',' '\t' < input_file.csv
In this example, we replace commas with tabs in input_file.csv.
Moreover, we can use > redirection to print the result to another file:
$ tr ',' '\t' < input_file.csv > output_file.tsv
Here, output_file.tsv is our output file. Moreover, we can also use tr to convert any character into another character, thus replacing the separators.
Critically, tr doesn’t support quote formatting and would break or incorrectly replace quoted column values with a separator.
4. Using sed
sed is a powerful text editor that can perform various text manipulations. We can use sed to parse and transform text files.
First, let’s install sed:
$ yum install -y sed
Then, we can start our converting:
$ sed 's/,/\t/g' input_file.csv > output_file.tsv
In this example, we used several options:
- s – a substitution command
- , – the pattern for our search
- t – replacement text, Tab character
- g – global replacement
We also used > to specify our output file.
Again, sed doesn’t support quote formatting, so our more complex example would break.
5. Using awk
awk is a powerful text processing tool designed for text processing and data extraction.
First, let’s use yum to install awk:
$ yum install -y gawk
Then, we can convert our sample CSV file to a TSV file:
$ awk 'BEGIN {FS=","; OFS="\t"} {$1=$1; print}' input_file.csv > output_file.tsv
In this example, we use FS to set the field separator to a comma character. Moreover, we use OFS to set the output field separator to the Tab character. Then, we print each line using tabs as separators. Afterward, we include the {$1=$1; print} section to force awk to use the new field separator. Finally, at the end of the command, we specify the input file and an output file.
Quote formatting isn’t supported by the awk command. awk would break quoted column values with a comma.
6. Using csvtool
csvtool is a command-line tool that can be used to manipulate CSV files.
Let’s take our more complex input file with quote formatting:
$ cat input_file.csv
"First_name","Last_name","Work_Days"
"James","Booker","Mon,Wed,Fri"
"Laura","Grey","Mon,Sat"
"Craig","Johnson","Tue,Wed"
"Mary","Jenkins","Fri,Sat"
"Jamie","Smith","Mon,Sun,Wed"
First, let’s use yum to install csvtool:
$ yum install csvtool
Once installed, we can use csvtool for our converting:
$ csvtool -t COMMA -u TAB cat input_file.csv
First_name Last_name Work_Days
James Booker Mon,Wed,Fri
Laura Grey Mon,Sat
Craig Johnson Tue,Wed
Mary Jenkins Fri,Sat
Jamie Smith Mon,Sun,Wed
The csvtool supports quote formatting, so we got our desired result.
In this example, we used several options:
- t – input file uses commas as field separator
- u – output file uses tabs as field separators
- cat – reads the contents of the input file
Moreover, we can use > output_file.tsv to save the output of csvtool to an output file.
7. Using Python
Python is a programming language that can be used for various tasks, including text processing.
First, let’s install Python via yum:
$ yum install -y python3
Then, we use the Python csv module to convert the CSV file to a TSV file:
$ cat csv2tsv.py
import argparse
import csv
parser = argparse.ArgumentParser(description='Convert a CSV file to a TSV file.')
parser.add_argument('input_file', help='the input CSV file')
args = parser.parse_args()
with open(args.input_file, 'r') as csv_file:
csv_reader = csv.reader(csv_file)
with open('output_file.tsv', 'w') as tsv_file:
tsv_writer = csv.writer(tsv_file, delimiter='\t')
for line in csv_reader:
tsv_writer.writerow(line)
Firstly, this code opens the CSV file using the csv module. Secondly, we read the contents of the input CSV file and parse it into individual rows. Finally, we write those rows to a new TSV file using tabs as the delimiter.
Now, let’s check the content of our input CSV quote formatting:
$ cat input_file.csv
"First_name","Last_name","Work_Days","Address"
"James","Booker","Mon,Wed,Fri","93 NORTH 9TH STREET, BROOKLYN NY"
"Laura","Grey","Mon,Sat","80 WESTMINSTER ST, PROVIDENCE RI"
"Craig","Johnson","Tue,Wed","177 MAIN STREET, LITTLETON NH"
"Mary","Jenkins","Fri,Sat","520 5TH AVE, MCKEESPORT PA"
"Jamie","Smith","Mon,Sun,Wed","41 N 4TH ST, ALLENTOWN PA"
Here, we have comma characters in the third and fourth columns.
Now, we’ll test our script to convert our CSV file:
$ python csv2tsv.py input_file.csv
Here, we get the input file name from the argument.
Finally, we can check our output file:
$ cat -T output_file.tsv
First_name^ILast_name^IWork_Days^IAddress
James^IBooker^IMon,Wed,Fri^I93 NORTH 9TH STREET, BROOKLYN NY
Laura^IGrey^IMon,Sat^I80 WESTMINSTER ST, PROVIDENCE RI
Craig^IJohnson^ITue,Wed^I177 MAIN STREET, LITTLETON NH
Mary^IJenkins^IFri,Sat^I520 5TH AVE, MCKEESPORT PA
Jamie^ISmith^IMon,Sun,Wed^I41 N 4TH ST, ALLENTOWN PA
In this example, we used the -T option of the cat command to show the Tab character as ^I. As we can see, commas between quotes are ignored and the output is as expected.
8. Using Perl
Perl is a scripting language that is often used for text processing. For converting CSV in Perl, we use the Text::CSV module.
First, let’s install Perl:
$ yum install -y perl
Then, we use cpan to install the Text::CSV module:
$ cpan install Text::CSV
Afterward, let’s see how we can convert CSV to TSV with Perl:
$ cat csv2tsv.pl
#!/usr/bin/perl
use Text::CSV;
use Getopt::Long;
my $input_file = 'input_file.csv';
my $output_file = 'output_file.tsv';
GetOptions('input=s' => \$input_file, 'output=s' => \$output_file);
my $csv = Text::CSV->new({ sep_char => ',' });
open(my $input, '<', $input_file) or die "Can't open input file: $!"; open(my $output, '>', $output_file) or die "Can't open output file: $!";
while (my $row = $csv->getline($input)) {
print $output join("\t", @$row), "\n";
}
close($input);
close($output);
In this example, we open the input and output files and read each row of the input CSV file. Then, we concatenate the fields using a Tab character as the separator. Next, we write the result string to the output file. Finally, we close both files to ensure that all changes are saved.
Now, let’s see the content of our input CSV file which is in quote formatting:
$ cat input_file.csv
"First name","Last name","Work Days","Available Months","Address",
"James","Booker","Mon,Wed,Fri","Jan,Feb","93 NORTH 9TH STREET, BROOKLYN NY"
"Laura","Grey","Mon,Sat","Feb,Mar,Apr","80 WESTMINSTER ST, PROVIDENCE RI"
"Craig","Johnson","Tue,Wed","Jan,May,Dec","177 MAIN STREET, LITTLETON NH"
"Mary","Jenkins","Fri,Sat","Feb,Mar,Apr","520 5TH AVE, MCKEESPORT PA"
"Jamie","Smith","Mon,Sun,Wed","Apr,May,Dec","41 N 4TH ST, ALLENTOWN PA"
In this example, we have three columns that contain comma characters. So, we need to skip values inside all quoted columns.
Finally, let’s run our Perl script to start converting:
$ perl csv2tsv.pl input_file.csv
Here, we used input_file.csv as an argument of our script.
Now, we can check our output:
$ cat -T output_file.tsv
First Name^ILast Name^IWork Days^IAvailable Months^IAddress^I
James^IBooker^IMon,Wed,Fri^IJan,Feb^I93 NORTH 9TH STREET, BROOKLYN NY
Laura^IGrey^IMon,Sat^IFeb,Mar,Apr^I80 WESTMINSTER ST, PROVIDENCE RI
Craig^IJohnson^ITue,Wed^IJan,May,Dec^I177 MAIN STREET, LITTLETON NH
Mary^IJenkins^IFri,Sat^IFeb,Mar,Apr^I520 5TH AVE, MCKEESPORT PA
Jamie^ISmith^IMon,Sun,Wed^IApr,May,Dec^I41 N 4TH ST, ALLENTOWN PA
Again, we used -T to check the exact position of the Tab character. As we can see, the result is as expected.
9. Conclusion
In this article, we explored several solutions for converting CSV files to TSV files in Linux. The solutions discussed include using the tr, sed, awk, and csvtool commands, as well as Python and Perl scripts.
Some of them support quote formatting and some don’t. Of course, all these solutions can be used based on our personal preferences and the input data.