How to Delete a Specific Column of a File

1. Overview

For a Linux administrator, manipulating and managing files is a common task. In this tutorial, we’ll discuss how to delete a certain column of a file. To demonstrate, we’ll make use of the awk and cut commands.

2. Understanding the File Structure

Before we begin deleting columns in a file, we first need to understand its structure. For instance, we need to know whether the file columns are separated by a space, comma, or tab delimiter. Knowing this helps to correctly remove a column.

To illustrate, we’ll use two files containing similar information but different delimiters. The first is a space-delimited file named cars.txt, which we can view with the cat command:

$ cat cars.txt 
Make Model Year Color
Toyota Camry 2022 Blue
Honda Accord 2021 Silver
Ford Mustang 2023 Red

Meanwhile, the second file is a comma-delimited file named cars.csv:

$ cat cars.csv 
Make,Model,Year,Color
Toyota,Camry,2022,Blue
Honda,Accord,2021,Red
Ford,Mustang,2023,Silver
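
When the delimiter isn’t obvious, one quick check is to count the fields on the header line under each candidate delimiter. The following is a minimal sketch, not part of the original walkthrough: the variable names are hypothetical, and the two header strings are copied from the files above. The delimiter that yields the expected column count (four here) is the one to use:

```shell
# Header lines copied from cars.txt and cars.csv.
txt_header='Make Model Year Color'
csv_header='Make,Model,Year,Color'

# NF is awk's built-in count of fields on the current line.
# Without -F, awk splits on runs of whitespace.
space_fields=$(printf '%s\n' "$txt_header" | awk '{ print NF }')
comma_fields=$(printf '%s\n' "$csv_header" | awk -F',' '{ print NF }')

# A wrong guess: splitting the comma-delimited header on whitespace
# sees the whole line as a single field.
wrong_guess=$(printf '%s\n' "$csv_header" | awk '{ print NF }')

echo "space-delimited header, split on whitespace: $space_fields"
echo "comma-delimited header, split on commas:     $comma_fields"
echo "comma-delimited header, split on whitespace: $wrong_guess"
```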

In the upcoming sections, we’ll cover how to delete a column from both files.

3. Using the awk Command

awk is a command line tool that we can use for manipulating and analyzing files. In particular, it processes input files based on the rules we provide. This makes it flexible and customizable:

$ awk 'pattern { action }' input_file

This is the general syntax of an awk invocation. It consists of three parts:

pattern – represents a condition that is checked against each line in the input file
{action} – represents a set of commands or operations to be performed when the pattern is true
input_file – represents the name of the file that awk will process
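
To make the pattern and action roles concrete, here is a small sketch that isn’t part of the original walkthrough. It recreates cars.txt so that it runs on its own, and the variable name recent is hypothetical. The pattern selects data lines whose third field is greater than 2021, and the action prints the first two fields:

```shell
# Recreate the sample file so the sketch is self-contained.
cat > cars.txt <<'EOF'
Make Model Year Color
Toyota Camry 2022 Blue
Honda Accord 2021 Silver
Ford Mustang 2023 Red
EOF

# Pattern: NR > 1 skips the header line; $3 > 2021 keeps newer cars.
# Action:  print the make and model, joined by the output field separator.
recent=$(awk 'NR > 1 && $3 > 2021 { print $1, $2 }' cars.txt)

echo "$recent"
# Toyota Camry
# Ford Mustang
```

The NR > 1 guard matters here: without it, the header’s third field (Year) is compared as a string and would also match.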

3.1. Deleting a Column From a Space-Delimited File

First, let’s delete the third column in the cars.txt file:

$ awk '{OFS=" "; $3=""; gsub(/[[:space:]]+/, " "); print $0}' cars.txt > updated_cars.txt

Let’s break down this command:

OFS=" " – sets the Output Field Separator to a space
$3="" – deletes the content of the third column by modifying it to an empty string
gsub(/[[:space:]]+/, " ") – replaces one or more whitespace characters (spaces, tabs) with a single space, thus removing extra spaces caused by deleting a column
print $0 – prints the entire line after its modification
cars.txt – represents the input file
> updated_cars.txt – redirects the modified output of the awk command to a new file named updated_cars.txt

Above, we delete the third column in the cars.txt file. Then, we save the modified information in the updated_cars.txt file to prevent overwriting the original file.

Next, let’s check the updated_cars.txt file:

$ cat updated_cars.txt 
Make Model Color
Toyota Camry Blue
Honda Accord Silver
Ford Mustang Red

The output above shows that we’ve successfully deleted the third column.
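
As an alternative to blanking the field and then collapsing the leftover whitespace, we could rebuild each line from the fields we want to keep. The following is only a sketch of that idea, not the command used above: the variable names col, out, and sep are hypothetical, and the file is recreated so the example runs on its own:

```shell
# Recreate the sample file so the sketch is self-contained.
cat > cars.txt <<'EOF'
Make Model Year Color
Toyota Camry 2022 Blue
Honda Accord 2021 Silver
Ford Mustang 2023 Red
EOF

# col holds the number of the column to drop (3 = Year).
awk -v col=3 '{
    out = ""; sep = ""
    for (i = 1; i <= NF; i++) {
        if (i == col) continue      # skip the unwanted column
        out = out sep $i            # append the field to the new line
        sep = OFS                   # separator is used from the 2nd kept field on
    }
    print out
}' cars.txt > updated_cars.txt

cat updated_cars.txt
# Make Model Color
# Toyota Camry Blue
# Honda Accord Silver
# Ford Mustang Red
```

Because the column number is passed with -v, the same script can drop any column, including the first or the last, without leaving a stray leading or trailing separator.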

3.2. Deleting a Column From a Comma-Delimited File

Here, let’s delete the first column in the cars.csv file:

$ awk -F',' '{OFS=","; $1=""; sub("^,",""); gsub(",,",","); sub(",$",""); print $0}' cars.csv > updated_cars.csv

Let’s break down this command:

-F',' – sets the field separator to a comma
OFS="," – defines the Output Field Separator as a comma
$1="" – sets the value of the first column to an empty string
sub("^,","") – removes a comma at the beginning of the line, if there is one
gsub(",,",",") – replaces each pair of adjacent commas with a single comma
sub(",$","") – removes a comma at the end of the line, if there is one
print $0 – prints the entire line after it has been modified
cars.csv – represents the input file
> updated_cars.csv – used to redirect the output of the awk command to a file named updated_cars.csv

In summary, awk reads the content of the cars.csv file, defines the Output Field Separator as a comma, deletes the first column, and handles any issue regarding commas at the start, middle, and end of each line.
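
One thing to keep in mind is that collapsing adjacent commas also removes fields that were empty to begin with. The sketch below isn’t part of the original walkthrough. It first runs the command from this section on a recreated cars.csv, and then compares it with a variant that strips the first field together with its comma. The sample line with an empty Year field and the variable names are hypothetical:

```shell
# Recreate the sample file so the sketch is self-contained.
cat > cars.csv <<'EOF'
Make,Model,Year,Color
Toyota,Camry,2022,Blue
Honda,Accord,2021,Red
Ford,Mustang,2023,Silver
EOF

# The command from this section.
awk -F',' '{OFS=","; $1=""; sub("^,",""); gsub(",,",","); sub(",$",""); print $0}' cars.csv > updated_cars.csv
cat updated_cars.csv
# Model,Year,Color
# Camry,2022,Blue
# Accord,2021,Red
# Mustang,2023,Silver

# Hypothetical line whose third field (Year) is empty.
with_gap='Honda,Accord,,Red'

# Same command: the empty Year field disappears along with the first column.
collapsed=$(printf '%s\n' "$with_gap" |
    awk -F',' '{OFS=","; $1=""; sub("^,",""); gsub(",,",","); sub(",$",""); print $0}')

# Variant: delete everything up to and including the first comma.
kept=$(printf '%s\n' "$with_gap" | awk '{ sub(/^[^,]*,/, ""); print }')

echo "$collapsed"   # Accord,Red
echo "$kept"        # Accord,,Red
```

The variant only handles the first column, so it’s a narrower tool, but it leaves the remaining fields untouched.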

4. Using the cut Command

The cut command extracts specific columns or fields from each line of a file. This makes it useful when working with files organized into columns or fields separated by a single-character delimiter such as a space, tab, or comma:

$ cut OPTION... [FILE]...

Considering the general syntax above, OPTION determines the behavior of the cut command whereas [FILE] represents the file from which data is extracted.
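
Before using cut to remove columns, it may help to see its usual role of selecting them. This is a minimal sketch, not part of the original walkthrough. It recreates cars.csv so that it runs on its own, and the variable name makes_and_models is hypothetical:

```shell
# Recreate the sample file so the sketch is self-contained.
cat > cars.csv <<'EOF'
Make,Model,Year,Color
Toyota,Camry,2022,Blue
Honda,Accord,2021,Red
Ford,Mustang,2023,Silver
EOF

# -d sets the delimiter, -f lists the fields to keep (here the first two).
makes_and_models=$(cut -d',' -f1,2 cars.csv)

echo "$makes_and_models"
# Make,Model
# Toyota,Camry
# Honda,Accord
# Ford,Mustang
```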

4.1. Deleting a Column From a Space-Delimited File

To demonstrate, let’s delete the second column in the cars.txt file:

$ cut -d' ' --complement -f2 cars.txt > updated_cars.txt

-d' ' – specifies the delimiter used in this file as space
--complement – selects all the fields in the file except the specified one
-f2 – specifies the column to remove, which in this case is the second one
cars.txt – represents the input file
> updated_cars.txt – used to redirect the output of the above command to a new file named updated_cars.txt

Afterward, we can check whether the updated_cars.txt file contains the desired content.
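
The --complement option is a GNU extension, so a cut implementation outside GNU coreutils may not support it. In that case, we can get the same result by listing the fields to keep instead. The sketch below isn’t part of the original walkthrough. It recreates cars.txt so that it runs on its own, and shows the content we’d expect to find in updated_cars.txt:

```shell
# Recreate the sample file so the sketch is self-contained.
cat > cars.txt <<'EOF'
Make Model Year Color
Toyota Camry 2022 Blue
Honda Accord 2021 Silver
Ford Mustang 2023 Red
EOF

# Keep field 1 and every field from 3 onward, which drops field 2.
cut -d' ' -f1,3- cars.txt > updated_cars.txt

cat updated_cars.txt
# Make Year Color
# Toyota 2022 Blue
# Honda 2021 Silver
# Ford 2023 Red
```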

4.2. Deleting a Column From a Comma-Delimited File

We can also use the cut command to delete a column from a comma-delimited file. Here, let’s delete the third column in the cars.csv file:

$ cut -d',' --complement -f3 cars.csv > updated_cars.csv

Now, let’s break down this command:

-d',' – specifies the delimiter used, which in this case is a comma
--complement – this option tells cut to select all the columns except the specified ones
-f3 – specifies the column to be excluded from the output, which in this case is the third column
cars.csv – represents the input file
> updated_cars.csv – used to redirect the output of the cut command to a new file named updated_cars.csv

At this point, the third column is absent in the new updated_cars.csv file.
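
Since -f accepts a list of fields, the same approach can remove several columns in one pass. The following sketch isn’t part of the original walkthrough. It assumes a cut implementation that supports --complement (such as GNU cut), recreates cars.csv so that it runs on its own, and writes to a hypothetical file named make_color.csv:

```shell
# Recreate the sample file so the sketch is self-contained.
cat > cars.csv <<'EOF'
Make,Model,Year,Color
Toyota,Camry,2022,Blue
Honda,Accord,2021,Red
Ford,Mustang,2023,Silver
EOF

# Drop the second and third columns (Model and Year) together.
# Assumes GNU cut, since --complement is not available everywhere.
cut -d',' --complement -f2,3 cars.csv > make_color.csv

cat make_color.csv
# Make,Color
# Toyota,Blue
# Honda,Red
# Ford,Silver
```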

5. Conclusion

In this article, we discussed how to delete a certain column from a file using the Linux command line. To achieve this, we utilized both the awk and cut commands.
