1. Overview
For a Linux administrator, manipulating and managing files is a common task. In this tutorial, we’ll discuss how to delete a specific column from a file. To demonstrate, we’ll use the awk and cut commands.
2. Understanding the File Structure
Before we begin deleting columns in a file, we first need to understand its structure. For instance, we need to know whether the file columns are separated by a space, comma, or tab delimiter. Knowing this helps to correctly remove a column.
To illustrate, we’ll use two files that contain similar information but use different delimiters. The first is a space-delimited file named cars.txt, which we can view with the cat command:
$ cat cars.txt
Make Model Year Color
Toyota Camry 2022 Blue
Honda Accord 2021 Silver
Ford Mustang 2023 Red
Meanwhile, the second file is a comma-delimited file named cars.csv:
$ cat cars.csv
Make,Model,Year,Color
Toyota,Camry,2022,Blue
Honda,Accord,2021,Red
Ford,Mustang,2023,Silver
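One quick way to confirm a delimiter guess is to count the fields on each line: with the right delimiter, every line should report the same number. The sketch below recreates both sample files with printf (an assumption made only so that the snippet runs on its own) and prints the distinct field counts for each file:

```shell
# Recreate the sample files so the snippet is self-contained
printf '%s\n' 'Make Model Year Color' 'Toyota Camry 2022 Blue' \
  'Honda Accord 2021 Silver' 'Ford Mustang 2023 Red' > cars.txt
printf '%s\n' 'Make,Model,Year,Color' 'Toyota,Camry,2022,Blue' \
  'Honda,Accord,2021,Red' 'Ford,Mustang,2023,Silver' > cars.csv

# Collect the distinct per-line field counts for each delimiter guess
txt_counts=$(awk '{ print NF }' cars.txt | sort -u)          # default: whitespace
csv_counts=$(awk -F',' '{ print NF }' cars.csv | sort -u)    # comma

echo "cars.txt: $txt_counts field(s) per line"
echo "cars.csv: $csv_counts field(s) per line"
```

A single value greater than one, such as 4 for both files here, suggests the delimiter guess is right. A count of 1, or several different counts, usually means the file uses another delimiter.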
In the upcoming sections, we’ll cover how to delete a column from both files.
3. Using the awk Command
awk is a command line tool that we can use for manipulating and analyzing files. In particular, it processes input files based on the rules we provide. This makes it flexible and customizable:
$ awk 'pattern { action }' input_file
The general syntax above has three parts:
- pattern – represents a condition that is checked against each line in the input file; if we omit it, the action runs on every line
- {action} – represents a set of commands or operations to be performed when the pattern is true
- input_file – represents the name of the file that awk will process
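To make the pattern and action parts concrete, here’s a minimal sketch that isn’t part of the column-deletion commands themselves. It recreates cars.txt (an assumption for self-containment) and prints the make and model of every car from 2022 or later:

```shell
# Recreate the sample file so the snippet is self-contained
printf '%s\n' 'Make Model Year Color' 'Toyota Camry 2022 Blue' \
  'Honda Accord 2021 Silver' 'Ford Mustang 2023 Red' > cars.txt

# pattern: skip the header (NR > 1) and keep rows whose third field is >= 2022
# action:  print the first and second fields
recent=$(awk 'NR > 1 && $3 >= 2022 { print $1, $2 }' cars.txt)
echo "$recent"
```

Only the Toyota and Ford rows satisfy the pattern, so the action runs for those two lines and skips the rest.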
3.1. Deleting a Column From a Space-Delimited File
First, let’s delete the third column in the cars.txt file:
$ awk '{OFS=" "; $3=""; gsub(/[[:space:]]+/, " "); print $0}' cars.txt > updated_cars.txt
Let’s break down this command:
- OFS=" " – sets the Output Field Separator to a space, which is also awk’s default
- $3="" – empties the third column; assigning to a field also makes awk rebuild the line using OFS, which leaves two adjacent spaces where the column was
- gsub(/[[:space:]]+/, " ") – replaces each run of one or more whitespace characters (spaces, tabs) with a single space, thus removing the extra space left by the emptied column
- print $0 – prints the entire line after its modification
- cars.txt – represents the input file
- > updated_cars.txt – redirects the modified output of the awk command to a new file named updated_cars.txt
Saving the result in updated_cars.txt, rather than writing it back to cars.txt, leaves the original file intact.
Next, let’s check the updated_cars.txt file:
$ cat updated_cars.txt
Make Model Color
Toyota Camry Blue
Honda Accord Silver
Ford Mustang Red
The output above shows that we’ve successfully deleted the third column.
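The command above cleans up the gap with gsub, which would leave a trailing space if the removed column were the last one. As an alternative sketch, and not the approach used above, we could rebuild each line from only the fields we want to keep. Here, col is a hypothetical variable name that holds the number of the column to drop:

```shell
# Recreate the sample file so the snippet is self-contained
printf '%s\n' 'Make Model Year Color' 'Toyota Camry 2022 Blue' \
  'Honda Accord 2021 Silver' 'Ford Mustang 2023 Red' > cars.txt

# col is a hypothetical variable: the number of the column to drop
without_year=$(awk -v col=3 '{
  out = ""; sep = ""
  for (i = 1; i <= NF; i++) {
    if (i == col) continue   # skip the unwanted column
    out = out sep $i         # append the field, preceded by the separator
    sep = OFS                # every field after the first kept one gets a separator
  }
  print out
}' cars.txt)
echo "$without_year"
```

Because the line is assembled field by field, no extra whitespace appears regardless of which column we remove, and changing col is enough to target a different one.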
3.2. Deleting a Column From a Comma-Delimited File
Here, let’s delete the first column in the cars.csv file:
$ awk -F',' '{OFS=","; $1=""; sub("^,",""); gsub(",,",","); sub(",$",""); print $0}' cars.csv > updated_cars.csv
Let’s break down this command:
- -F',' – sets the input field separator to a comma
- OFS="," – defines the Output Field Separator as a comma
- $1="" – sets the value of the first column to an empty string
- sub("^,","") – removes the comma left at the beginning of the line
- gsub(",,",",") – replaces every pair of adjacent commas with a single comma
- sub(",$","") – removes a comma at the end of the line, if there is one
- print $0 – prints the entire line after it has been modified
- cars.csv – represents the input file
- > updated_cars.csv – used to redirect the output of the awk command to a file named updated_cars.csv
In summary, awk reads the cars.csv file, sets the Output Field Separator to a comma, empties the first column, and then removes any stray comma left at the start, middle, or end of each line.
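As a quick check, the sketch below recreates cars.csv (an assumption for self-containment), runs the command from this section, and prints the result. It also shows a caveat: the comma cleanup can’t tell a leftover comma from a genuinely empty field. The row with an empty Year field is hypothetical, and the sub(/^[^,]*,/, "") variant is an alternative for removing the first column, not the method used above:

```shell
# Recreate the sample file so the snippet is self-contained
printf '%s\n' 'Make,Model,Year,Color' 'Toyota,Camry,2022,Blue' \
  'Honda,Accord,2021,Red' 'Ford,Mustang,2023,Silver' > cars.csv

# The command from this section
awk -F',' '{OFS=","; $1=""; sub("^,",""); gsub(",,",","); sub(",$",""); print $0}' cars.csv > updated_cars.csv
cat updated_cars.csv

# Hypothetical row whose Year field is empty
row='Kia,Rio,,White'

# Same cleanup as above: the empty field disappears along with the first column
collapsed=$(printf '%s\n' "$row" |
  awk -F',' '{OFS=","; $1=""; sub("^,",""); gsub(",,",","); sub(",$",""); print $0}')

# Alternative: delete everything up to and including the first comma
preserved=$(printf '%s\n' "$row" | awk '{ sub(/^[^,]*,/, ""); print }')

echo "$collapsed"
echo "$preserved"
```

For the sample file, both approaches give the same result because no field is empty. For the hypothetical row, only the second one keeps the empty Year field in place. Neither handles quoted fields that contain commas.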
4. Using the cut Command
The cut command extracts specific columns or fields from each line of a file. For this reason, it’s useful when working with files organized into columns or fields separated by a single-character delimiter such as a space, tab, or comma:
$ cut OPTION... [FILE]...
Considering the general syntax above, OPTION determines the behavior of the cut command whereas [FILE] represents the file from which data is extracted.
4.1. Deleting a Column From a Space-Delimited File
To demonstrate, let’s delete the second column in the cars.txt file:
$ cut -d' ' --complement -f2 cars.txt > updated_cars.txt
- -d' ' – specifies the delimiter used in this file as space
- --complement – selects all the fields in the file except the specified one
- -f2 – specifies the column to remove, which in this case is the second one
- cars.txt – represents the input file
- > updated_cars.txt – used to redirect the output of the above command to a new file named updated_cars.txt
Afterward, we can check whether the updated_cars.txt file contains the desired content.
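The --complement option is a GNU extension to cut. As a sketch of an equivalent that avoids it, we can list the fields to keep instead of the field to drop. The snippet recreates cars.txt (an assumption for self-containment) and then prints the result:

```shell
# Recreate the sample file so the snippet is self-contained
printf '%s\n' 'Make Model Year Color' 'Toyota Camry 2022 Blue' \
  'Honda Accord 2021 Silver' 'Ford Mustang 2023 Red' > cars.txt

# Keep field 1 and fields 3 onward, that is, everything except field 2
cut -d' ' -f1,3- cars.txt > updated_cars.txt
cat updated_cars.txt
```

With the sample data, this prints the header Make Year Color followed by the three car rows without their model names, which matches what --complement -f2 produces.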
4.2. Deleting a Column From a Comma-Delimited File
Similarly, we can use the cut command to delete the third column from the cars.csv file:
$ cut -d',' --complement -f3 cars.csv > updated_cars.csv
Now, let’s explain the above syntax:
- -d',' – specifies the delimiter used, which in this case is a comma
- --complement – this option tells cut to select all the columns except the specified ones
- -f3 – specifies the column to be excluded from the output, which in this case is the third column
- cars.csv – represents the input file
- > updated_cars.csv – used to redirect the output of the cut command to a new file named updated_cars.csv
At this point, the third column is absent in the new updated_cars.csv file.
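Since -f accepts a comma-separated list, the same approach can remove several columns in one pass. The following sketch assumes GNU cut, because it relies on --complement, and recreates cars.csv so that it runs on its own:

```shell
# Recreate the sample file so the snippet is self-contained
printf '%s\n' 'Make,Model,Year,Color' 'Toyota,Camry,2022,Blue' \
  'Honda,Accord,2021,Red' 'Ford,Mustang,2023,Silver' > cars.csv

# Drop the first and third columns in one pass (assumes GNU cut)
cut -d',' --complement -f1,3 cars.csv > updated_cars.csv
cat updated_cars.csv
```

Here, only the Model and Color columns remain in updated_cars.csv.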
5. Conclusion
In this article, we discussed how to delete a certain column from a file using the Linux command line. To achieve this, we utilized both the awk and cut commands.