1. Introduction

Columns are vertical divisions in tables or spreadsheets that organize data into categories, making it easier to read, analyze, and manipulate. Each column represents specific information, such as names or dates. Furthermore, they help structure data, allowing for sorting, filtering, and effective analysis through calculations and visualizations.

Columns maintain data consistency, enable data validation and conditional formatting, and support pivot tables for concise analysis. Overall, columns are essential for organizing and managing data efficiently.

Using Linux commands to extract specific columns from data files or outputs streamlines data extraction and analysis, ensuring focus on relevant information. Moreover, this enhances clarity, simplifies analysis, and supports automation, saving time on repetitive tasks.

In this tutorial, we’ll use Linux commands to extract rows based on specific values in columns.

First, we’ll start by exploring the grep command to extract specific lines from a file’s column. Next, we’ll delve into the sed command to achieve similar results. Lastly, we’ll investigate how awk can extract rows based on specific values in columns.

2. Dataset and Expected Results

Let’s begin by inspecting the contents of the file using the cat command before executing the main commands:

$ cat myfile.txt
ID,Type,Code,Value1,Value2,Description
3,ST,ST01,3,3,856
3,ST,ST02,4.5,9,LOKcast
6,3N1,N101,2.1,3,ST
6,N1,a02,1,60.2,Comcast
6,N1B,N103,1,2,92
2,ST,ST03,5.7,10,2024-06-08
4,N2,N201,aBc.3,5,Skylook_48
5,ST,ST04,2,2.7,20-6-20
7,N3,N301,4.5.00,8,patternMiddleEnd

After executing the Linux commands, the expected results should look like the output below. We’ll retrieve rows where the second column contains the pattern ST:

3       ST      ST01    3       3       856
3       ST      ST02    4.5     9       LOKcast
2       ST      ST03    5.7     10      2024-06-08
5       ST      ST04    2       2.7     20-6-20

Next, let’s continue with the tutorial to explore different methods to achieve similar results.

3. Using the grep Command

To begin with, let’s look at the grep command to extract rows based on specific values in columns. We’ll extract rows that contain the pattern ST in the second column.

The grep command in Linux is used to search for specific patterns or strings within files or output. It’s commonly used to filter and display lines that match a given pattern.

Now, let’s view the code to get the lines in a column:

$ grep -E '^\S+,ST,.+' myfile.txt | sed 's/,/\t/g'
3       ST      ST01    3       3       856
3       ST      ST02    4.5     9       LOKcast
2       ST      ST03    5.7     10      2024-06-08
5       ST      ST04    2       2.7     20-6-20

The above command is a concise and powerful way to process and transform text data in Linux. It begins with grep, which searches the file myfile.txt for lines where the second comma-separated field contains the string ST. The -E flag enables extended regular expressions for more complex search patterns.

Once grep finds these lines, it passes them through a pipe (|) to sed, the stream editor. In the sed command, 's/,/\t/g' performs a substitution, with the g flag applying it globally within each line. It replaces all commas (,) with tab characters (\t), effectively converting the comma-separated values (CSV) into tab-separated values (TSV).

Overall, this command scans myfile.txt to find lines with the specified criteria where the second field includes ST. It then transforms the output by substituting commas with tabs, making the data easier to handle or present as tab-separated values instead of comma-separated values.
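One subtlety worth noting: \S in the pattern above also matches commas, so ^\S+,ST, could in principle match ST appearing in a later column (for example, a hypothetical line such as 1,XX,ST,foo). A stricter alternative is [^,]+, which cannot cross a field boundary. Here’s a minimal, self-contained sketch; the temporary file is recreated inline purely for demonstration:

```shell
# Recreate the sample file in a temporary location (for demonstration only)
tmpfile=$(mktemp)
cat > "$tmpfile" <<'EOF'
ID,Type,Code,Value1,Value2,Description
3,ST,ST01,3,3,856
3,ST,ST02,4.5,9,LOKcast
6,3N1,N101,2.1,3,ST
6,N1,a02,1,60.2,Comcast
6,N1B,N103,1,2,92
2,ST,ST03,5.7,10,2024-06-08
4,N2,N201,aBc.3,5,Skylook_48
5,ST,ST04,2,2.7,20-6-20
7,N3,N301,4.5.00,8,patternMiddleEnd
EOF

# [^,]+ stays within the first field, so ST must be exactly the second field
result=$(grep -E '^[^,]+,ST,' "$tmpfile" | sed 's/,/\t/g')
printf '%s\n' "$result"
rm -f "$tmpfile"
```

With this dataset, both patterns produce the same four rows, but the anchored version is safer on data where ST might also appear in later fields.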

4. Using the sed Command

Another method to extract rows based on specific values in columns from a file is the sed command.

The sed command, short for stream editor, is used in Linux for parsing and transforming text in files or input streams. It allows us to perform basic text manipulation tasks like search and replace, insertion, deletion, and more, using regular expressions. Additionally, we often use it for batch processing of text files, automating editing tasks, and making quick in-place modifications without opening a text editor.

Next, let’s look into the code and see how it works:

$ sed -n '/,ST,/s/,/\t/gp' myfile.txt
3       ST      ST01    3       3       856
3       ST      ST02    4.5     9       LOKcast
2       ST      ST03    5.7     10      2024-06-08
5       ST      ST04    2       2.7     20-6-20

The command sed -n '/,ST,/s/,/\t/gp' myfile.txt uses sed to search for a specific pattern in a file and perform substitutions on matching lines.

Now, let’s look at how sed works as a stream editor for parsing and transforming text. The -n option suppresses the automatic printing of the pattern space, so sed only prints lines we explicitly ask for. The script /,ST,/s/,/\t/gp matches lines containing the substring ,ST, and replaces all commas with tab characters using the substitution command s; the g flag ensures that all occurrences of the comma on each matching line are replaced. Finally, the p flag prints the modified lines.
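As with the grep approach, the address /,ST,/ matches the substring anywhere on the line, so a row with ST in, say, the third field would also be selected. If we want to pin the match to the second field, we can anchor the address to the start of the line. Here’s a self-contained sketch, with the sample file recreated inline only for illustration (note that GNU sed interprets \t as a tab):

```shell
# Recreate the sample file in a temporary location (for demonstration only)
tmpfile=$(mktemp)
cat > "$tmpfile" <<'EOF'
ID,Type,Code,Value1,Value2,Description
3,ST,ST01,3,3,856
3,ST,ST02,4.5,9,LOKcast
6,3N1,N101,2.1,3,ST
6,N1,a02,1,60.2,Comcast
6,N1B,N103,1,2,92
2,ST,ST03,5.7,10,2024-06-08
4,N2,N201,aBc.3,5,Skylook_48
5,ST,ST04,2,2.7,20-6-20
7,N3,N301,4.5.00,8,patternMiddleEnd
EOF

# ^[^,]*,ST, requires ST to be exactly the second comma-separated field
result=$(sed -n '/^[^,]*,ST,/s/,/\t/gp' "$tmpfile")
printf '%s\n' "$result"
rm -f "$tmpfile"
```

The anchored address behaves identically on this dataset but avoids false positives when ST can occur in other columns.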

5. Using the awk Command

Alternatively, we can achieve the same outcome by employing the awk command to extract rows based on specific values in columns from a file.

The awk command processes and extracts text in Linux. Users commonly parse and manipulate structured data with it, making it a powerful tool for data analysis and text processing tasks.

Now, let’s dive into the code and see how it operates:

$ awk -F',' '$2 ~ /ST/ { print $1 "\t" $2 "\t" $3 "\t" $4 "\t" $5 "\t" $6 }' myfile.txt
3       ST      ST01    3       3       856
3       ST      ST02    4.5     9       LOKcast
2       ST      ST03    5.7     10      2024-06-08
5       ST      ST04    2       2.7     20-6-20

The awk command provided processes the text file myfile.txt, which contains comma-separated values (CSV). The -F',' option specifies that fields are separated by commas. The pattern $2 ~ /ST/ selects lines whose second field contains the substring ST. When a line matches, awk executes the action block within {} and prints the first six fields of that line separated by tabs (\t).

Therefore, this breakdown helps us understand how awk efficiently handles pattern matching and text processing tasks, making it a versatile tool for data manipulation in Unix-like environments.
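Listing all six fields by hand gets tedious for wider files. A commonly used shortcut is to set awk’s output field separator (OFS) and force a record rebuild with $1 = $1. Note also that $2 == "ST" tests for an exact match, whereas $2 ~ /ST/ matches any second field merely containing ST. A minimal sketch, with the sample data recreated inline for demonstration:

```shell
# Recreate the sample file in a temporary location (for demonstration only)
tmpfile=$(mktemp)
cat > "$tmpfile" <<'EOF'
ID,Type,Code,Value1,Value2,Description
3,ST,ST01,3,3,856
3,ST,ST02,4.5,9,LOKcast
6,3N1,N101,2.1,3,ST
6,N1,a02,1,60.2,Comcast
6,N1B,N103,1,2,92
2,ST,ST03,5.7,10,2024-06-08
4,N2,N201,aBc.3,5,Skylook_48
5,ST,ST04,2,2.7,20-6-20
7,N3,N301,4.5.00,8,patternMiddleEnd
EOF

# -v OFS='\t' sets the output separator; assigning $1 = $1 makes awk
# rebuild the record with OFS, so we don't have to list every field
result=$(awk -F',' -v OFS='\t' '$2 == "ST" { $1 = $1; print }' "$tmpfile")
printf '%s\n' "$result"
rm -f "$tmpfile"
```

This variant scales to files with any number of columns without changing the action block.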

6. Conclusion

In this tutorial, we explored various methods for getting rows based on specific values in columns using Linux commands.

We started by combining the grep and sed commands to extract specific lines from a file’s column and format them. Following that, we delved into the sed command to achieve similar results on its own. Finally, we employed awk to extract rows based on specific values in columns directly from the file.