1. Overview
We often use tab-separated values (TSV) files for storing or transmitting data. TSV files are column-based.
In this tutorial, let’s explore how to add a new column to a given TSV file in the Linux command line.
2. Introduction to the Problem
As usual, let’s understand the problem through an example. Let’s say we have a TSV file called employee.tsv:
$ cat employee.tsv
Name Gender Age Department
Eric Male 31 Marketing
Kevin Male 32 Sales
Kent Male 33 Development
Amanda Female 30 HR
As the cat output shows, the employee.tsv file has four columns. There are three scenarios if we want to add a new column to this file:
- Inserting a new column before the first column
- Adding a new column after the last column
- Inserting a new column in the middle of the original ones
This tutorial will cover all three cases.
Further, as the awk command is a powerful command-line text-processing utility and good at handling column-based data, we’ll use the awk command to add columns.
Next, let’s see how to add columns using awk in action.
3. Adding a Column Before the First Column
Let’s say we want to add a new column, “ID“, before the first column, “Name“, to the employee.tsv file.
Setting the awk command’s FS and OFS variables allows us to control how to parse records and output results. Since both our input and output are TSV format, we can set the FS and OFS to have the same value: a Tab.
Moreover, as we’re planning to insert a new column before all the other ones, the following awk command will do the job:
awk 'BEGIN{ FS = OFS = "\t" } { print "NewColumn", $0 }' input.tsv
Now, let’s look at our employee.tsv file. First, we want to insert an “ID” column to it.
Further, in the file, we have a header line, so we need to check if the current record is the header line and put the header “ID” there:
$ awk 'BEGIN{ FS = OFS = "\t" } { print (NR==1? "ID" : NR-1), $0 }' employee.tsv
ID Name Gender Age Department
1 Eric Male 31 Marketing
2 Kevin Male 32 Sales
3 Kent Male 33 Development
4 Amanda Female 30 HR
As the awk command above shows, we’re following the command pattern that we’ve concluded and get a new “ID” column at the very front. For simplicity, we’ve assigned their record sequence number as their IDs.
It’s worth mentioning that even though the awk command has produced the expected output, the file content is not changed.
We can use a temp file to save changes in place with awk:
$ awk 'BEGIN{ FS = OFS = "\t" } { print (NR==1? "ID" : NR-1), $0 }' employee.tsv > tmp && mv tmp employee.tsv
$ cat employee.tsv
ID Name Gender Age Department
1 Eric Male 31 Marketing
2 Kevin Male 32 Sales
3 Kent Male 33 Development
4 Amanda Female 30 HR
Next, let’s see how to append a new column to a TSV file:
4. Adding a New Column at the End
We’ve discussed the general command pattern to add a column in the first place to a TSV file. We can exchange the “NewColumn” and “*$0” to append the “NewColumn*“:
awk 'BEGIN{ FS = OFS = "\t" } { print $0, "NewColumn" }' input.tsv
Next, let’s append the “HiringYear” column to our employee.tsv file. We assume all employees are hired in the same year, 2022, for simplicity:
$ awk 'BEGIN{ FS = OFS = "\t" } { print $0, (NR==1? "HiringYear" : "2022") }' employee.tsv > tmp && mv tmp employee.tsv
$ cat employee.tsv
ID Name Gender Age Department HiringYear
1 Eric Male 31 Marketing 2022
2 Kevin Male 32 Sales 2022
3 Kent Male 33 Development 2022
4 Amanda Female 30 HR 2022
Again, in the awk command above, we’ve done the same check to distinguish whether the current record is the header line or a regular data record.
Next, let’s explore how to add a column in the middle.
5. Adding a New Column in the Middle of Original Ones
Let’s say we would like to have a new column “Role” between the columns “Age” and “Department“. Since the new column is in the middle of the columns, we cannot simply use the $0 variable in the awk command as we did before.
An idea to solve the problem is extending the adjacent column value with the new column value separated by a tab. So, if we want to add a new column after the x-th column, the command looks like:
awk 'BEGIN{ FS = OFS = "\t" } { $x = $x FS "NewColumn" }1' input.tsv
Of course, we can extend the column after:
awk 'BEGIN{ FS = OFS = "\t" } { $(x+1) = "NewColumn" FS $(x+1) }1' input.tsv
Now, let’s look at our example. For simplicity, let’s say all employees in the file are in the “Manager” role.
The “Age” column is the fourth column in the original input file in our example. Therefore, we can follow the command pattern we’ve discussed to solve the problem:
$ awk 'BEGIN{ FS=OFS="\t" } {$4 = $4 FS (NR==1? "Role" : "Manager") }1' employee.tsv > tmp && mv tmp employee.tsv
$ cat employee.tsv
ID Name Gender Age Role Department HiringYear
1 Eric Male 31 Manager Marketing 2022
2 Kevin Male 32 Manager Sales 2022
3 Kent Male 33 Manager Development 2022
4 Amanda Female 30 Manager HR 2022
6. Conclusion
In this article, we’ve discussed three scenarios of adding a new column to a TSV file.
Then, we’ve addressed how to do it through examples.