在Linux中向制表符分隔的文件添加一列值

1. Overview

We often use tab-separated values (TSV) files for storing or transmitting data. TSV files are column-based.

In this tutorial, let’s explore how to add a new column to a given TSV file in the Linux command line.

2. Introduction to the Problem

As usual, let’s understand the problem through an example. Let’s say we have a TSV file called employee.tsv:

$ cat employee.tsv
Name    Gender  Age Department
Eric    Male    31  Marketing
Kevin   Male    32  Sales
Kent    Male    33  Development
Amanda  Female  30  HR

As the cat output shows, the employee.tsv file has four columns. There are three scenarios if we want to add a new column to this file:

Inserting a new column before the first column
Adding a new column after the last column
Inserting a new column in the middle of the original ones

This tutorial will cover all three cases.

Further, as the awk command is a powerful command-line text-processing utility and good at handling column-based data, we’ll use the awk command to add columns.

Next, let’s see how to add columns using awk in action.

3. Adding a Column Before the First Column

Let’s say we want to add a new column, “ID“, before the first column, “Name“, to the employee.tsv file.

Setting the awk command’s FS and OFS variables allows us to control how to parse records and output results. Since both our input and output are TSV format, we can set the FS and OFS to have the same value: a Tab.

Moreover, as we’re planning to insert a new column before all the other ones, the following awk command will do the job:

awk 'BEGIN{ FS = OFS = "\t" } { print "NewColumn", $0 }' input.tsv

Now, let’s look at our employee.tsv file. First, we want to insert an “ID” column to it.

Further, in the file, we have a header line, so we need to check if the current record is the header line and put the header “ID” there:

$ awk 'BEGIN{ FS = OFS = "\t" } { print (NR==1? "ID" : NR-1), $0 }' employee.tsv
ID    Name    Gender  Age Department
1    Eric    Male    31  Marketing
2    Kevin   Male    32  Sales
3    Kent    Male    33  Development
4    Amanda  Female  30  HR

As the awk command above shows, we’re following the command pattern that we’ve concluded and get a new “ID” column at the very front. For simplicity, we’ve assigned their record sequence number as their IDs.

It’s worth mentioning that even though the awk command has produced the expected output, the file content is not changed.

We can use a temp file to save changes in place with awk:

$ awk 'BEGIN{ FS = OFS = "\t" } { print (NR==1? "ID" : NR-1), $0 }' employee.tsv > tmp && mv tmp employee.tsv 

$ cat employee.tsv
ID    Name    Gender  Age Department
1    Eric    Male    31  Marketing
2    Kevin   Male    32  Sales
3    Kent    Male    33  Development
4    Amanda  Female  30  HR

Next, let’s see how to append a new column to a TSV file:

4. Adding a New Column at the End

We’ve discussed the general command pattern to add a column in the first place to a TSV file. We can exchange the “NewColumn” and “*$0” to append the “NewColumn*“:

awk 'BEGIN{ FS = OFS = "\t" } { print $0, "NewColumn" }' input.tsv

Next, let’s append the “HiringYear” column to our employee.tsv file. We assume all employees are hired in the same year, 2022, for simplicity:

$ awk 'BEGIN{ FS = OFS = "\t" } { print $0, (NR==1? "HiringYear" : "2022") }' employee.tsv > tmp && mv tmp employee.tsv

$ cat employee.tsv
ID    Name    Gender  Age Department    HiringYear
1    Eric    Male    31  Marketing    2022
2    Kevin   Male    32  Sales   2022
3    Kent    Male    33  Development    2022
4    Amanda  Female  30  HR        2022

Again, in the awk command above, we’ve done the same check to distinguish whether the current record is the header line or a regular data record.

Next, let’s explore how to add a column in the middle.

5. Adding a New Column in the Middle of Original Ones

Let’s say we would like to have a new column “Role” between the columns “Age” and “Department“. Since the new column is in the middle of the columns, we cannot simply use the $0 variable in the awk command as we did before.

An idea to solve the problem is extending the adjacent column value with the new column value separated by a tab. So, if we want to add a new column after the x-th column, the command looks like:

awk 'BEGIN{ FS = OFS = "\t" } { $x = $x FS "NewColumn" }1' input.tsv

Of course, we can extend the column after:

awk 'BEGIN{ FS = OFS = "\t" } { $(x+1) = "NewColumn" FS $(x+1) }1' input.tsv

Now, let’s look at our example. For simplicity, let’s say all employees in the file are in the “Manager” role.

The “Age” column is the fourth column in the original input file in our example. Therefore, we can follow the command pattern we’ve discussed to solve the problem:

$ awk 'BEGIN{ FS=OFS="\t" } {$4 = $4 FS (NR==1? "Role" : "Manager") }1' employee.tsv > tmp && mv tmp employee.tsv 

$ cat employee.tsv
ID    Name    Gender    Age    Role    Department    HiringYear
1    Eric    Male    31    Manager    Marketing    2022
2    Kevin    Male    32    Manager    Sales    2022
3    Kent    Male    33    Manager    Development    2022
4    Amanda    Female    30    Manager    HR    2022

6. Conclusion

In this article, we’ve discussed three scenarios of adding a new column to a TSV file.

Then, we’ve addressed how to do it through examples.

Persistence

REST

Security