1. Overview
Linux provides various utilities for processing file contents and command output. One of the most useful is the cut command.
In this tutorial, we’ll see how we can use the cut command to slice files and command output.
2. Basics
The cut command is a command-line utility for cutting sections from each line of a file. It writes the result to the standard output.
It’s worth noting that cut doesn’t modify the file. It only reads the input and prints the selected parts.
Although typically the input to a cut command is a file, we can pipe the output of other commands and use it as input.
3. Slicing by Bytes
First, let’s see how we can slice the data in a file by byte.
Let’s suppose we have a file of employee records, employee_data.txt:
Name Age Department
John Smith 36 HR
John Wayne 48 Finance
Edward King 40 Finance
Stephen Fry 50 IT
The individual fields above are separated by the tab character.
To slice by bytes, we’ll use the -b or --bytes option:
$ cut -b 2 employee_data.txt
This will print the second byte from each line in the file:
a
o
o
d
t
We’re not restricted to slicing by a single byte. We can also select multiple bytes from each line.
For example, we can slice by the 3rd, 5th, and 8th bytes simultaneously using the “,” separator:
$ cut -b 3,5,8 employee_data.txt
m e
h i
h y
wrK
eh
We can also specify a range, using the “-” separator:
$ cut -b 2-5 employee_data.txt
ame
ohn
ohn
dwar
teph
It’s worth noting that we can omit the starting position or the ending position while specifying the range. So, “-5” will select all bytes from the first position to the 5th position, and “5-” will select all bytes from the 5th position to the end of the line.
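To make the open-ended ranges concrete, here is a minimal sketch. The sample text and the variable names (text, first_five, from_fifth) are only illustrative and not part of the employee file:

```shell
# Hypothetical sample text; plain ASCII, so one byte per character.
text='slicing example'

# "-5" selects bytes 1 through 5.
first_five=$(printf '%s\n' "$text" | cut -b -5)

# "5-" selects byte 5 through the end of the line.
from_fifth=$(printf '%s\n' "$text" | cut -b 5-)

printf '%s\n' "$first_five"   # slici
printf '%s\n' "$from_fifth"   # ing example
```

Both ranges are inclusive, so the 5th byte appears in both results.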
As mentioned above, apart from files, we can also pipe output from other Linux commands as input to the cut command:
$ echo slicing example | cut -b 3-7
icing
4. Slicing by Characters
For slicing by character, we’ll use the -c or --characters option.
It’s similar to slicing by byte, except that it uses the character position rather than the byte position.
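So, if a character uses multiple bytes, the output will include the whole character instead of a single byte from it. This assumes a cut implementation that is multibyte-aware, since some versions of GNU cut treat -c exactly like -b.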
Let’s look at an example:
$ echo spéciale | cut -c 3
é
$ echo spéciale | cut -b 3
?
$ echo spéciale | cut -b 3,4
é
Note that the second command prints ? because a single byte of the two-byte character é isn’t a valid character on its own and can’t be displayed.
It’s worth noting that tabs and backspaces are each treated as a single character.
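The difference between bytes and characters can be checked with a short sketch. It assumes UTF-8 encoding, where é is the byte pair written below as the octal escapes \303\251. The variable names are illustrative:

```shell
# Build the word from explicit bytes so the sketch doesn't depend on
# how this file itself is encoded (assumption: UTF-8, where "é" = \303\251).
word=$(printf 'sp\303\251ciale')

# Eight characters, but nine bytes in UTF-8.
byte_count=$(printf '%s' "$word" | wc -c | tr -d ' ')

# Selecting both bytes of the character (3 and 4) keeps it intact.
whole_char=$(printf '%s\n' "$word" | cut -b 3-4)

printf '%s\n' "$byte_count"   # 9
printf '%s\n' "$whole_char"
```

On a UTF-8 terminal, the last line should display é, since both of its bytes were selected together.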
5. Slicing by Fields
Now, let’s see how we can slice file data by field.
Let’s say we want to list only the names of all the employees from the file. We can do this by slicing the file data by the first field in the file using the -f or --fields option:
$ cut -f 1 employee_data.txt
Here, we’ve used the -f option of the cut command and sliced the input using 1 as the field number:
Name
John Smith
John Wayne
Edward King
Stephen Fry
Above, we’re assuming that the fields in the file are separated using the tab delimiter. But, we can override this behavior by using the -d or --delimiter option to specify a different delimiter:
$ cut -d " " -f 2 employee_data.txt
Here, we’ve used the -d option to specify space as the delimiter. Also, we’re slicing the data using field number 2.
Now, let’s look at the output:
Name Age Department
Smith 36 HR
Wayne 48 Finance
King 40 Finance
Fry 50 IT
It’s worth noting that the output includes part of the earlier first field and all the rest of the fields. This is because tab is now treated like any other character, and there are no spaces in any of the other fields. Similarly, the first line is printed unchanged because it doesn’t contain any spaces, and by default cut passes through lines that lack the delimiter.
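A custom delimiter is most useful when it matches how the data is actually separated. As a minimal sketch, assume a hypothetical comma-separated version of one record (it isn't part of employee_data.txt):

```shell
# Hypothetical comma-separated record, used only for illustration.
record='John Smith,36,HR'

# With "," as the delimiter, the space inside the name is ordinary data,
# so the full name stays in field 1.
name_and_dept=$(printf '%s\n' "$record" | cut -d ',' -f 1,3)

printf '%s\n' "$name_and_dept"   # John Smith,HR
```

Here the selected fields are printed with the same delimiter that was used to split the input.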
As with the other options, we can select multiple fields using the “,” separator:
$ cut -f 1,3 employee_data.txt
Name Department
John Smith HR
John Wayne Finance
Edward King Finance
Stephen Fry IT
And, we can select a range of fields using the “-” separator:
$ cut -f 2- employee_data.txt
Age Department
36 HR
48 Finance
40 Finance
50 IT
The above command will output all fields from the second field onwards.
By default, the cut command prints all lines from the input, even if the delimiter is not present. But, we can alter this behavior using -s or --only-delimited. Using this option, we can tell the cut command not to print the lines that don’t have the delimiter.
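The effect of -s is easiest to see side by side. The sketch below uses a hypothetical three-line input in which the middle line has no “:” delimiter. The variable names are illustrative:

```shell
# Hypothetical input; the second line contains no ":" at all.
input=$(printf 'a:b\nno-delimiter\nc:d')

# Default: the line without a delimiter is passed through unchanged.
without_s=$(printf '%s\n' "$input" | cut -d ':' -f 2)

# With -s: the line without a delimiter is suppressed.
with_s=$(printf '%s\n' "$input" | cut -d ':' -s -f 2)

printf '%s\n' "$without_s"   # b, no-delimiter, d (one per line)
printf '%s\n' "$with_s"      # b, d (one per line)
```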
6. Other Options
Now, let’s look at other options that can be used with the above slicing methods.
When we use “,” to specify multiple bytes or characters, the cut command concatenates the selected parts without any delimiter, while selected fields are joined by the input delimiter. We can set a custom delimiter for the output using the --output-delimiter option:
$ echo slicing example | cut -c 2-5,9,11-13 --output-delimiter=@
This will add the delimiter character ‘@’ between each part of the output:
lici@e@amp
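The same option also works with fields. As a rough sketch, assuming GNU cut (as in the example above) and a hypothetical tab-separated line:

```shell
# Hypothetical tab-separated line with three fields.
line=$(printf 'John Smith\t36\tHR')

# Default: selected fields are joined with the input delimiter (tab).
default_join=$(printf '%s\n' "$line" | cut -f 1,3)

# --output-delimiter replaces the tab in the output only.
comma_join=$(printf '%s\n' "$line" | cut -f 1,3 --output-delimiter=',')

printf '%s\n' "$default_join"
printf '%s\n' "$comma_join"   # John Smith,HR
```

The input is still split on tabs. Only the separator written between the selected fields changes.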
Another interesting option is --complement. This will print everything except the content at the specified position.
Let’s look at an example:
$ echo slicing example | cut -c 5-10 --complement
slicample
As we can see, the output includes all characters except the ones at positions 5 through 10, inclusive.
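Complementing also applies to fields, which is handy for dropping a single column. A minimal sketch, again assuming GNU cut and a hypothetical tab-separated line:

```shell
# Hypothetical tab-separated line with three fields.
line=$(printf 'John Smith\t36\tHR')

# Print every field except the second one.
without_age=$(printf '%s\n' "$line" | cut -f 2 --complement)

printf '%s\n' "$without_age"
```

The result keeps the first and third fields, still separated by a tab.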
7. Conclusion
In this article, we saw examples of using the cut command. This command can be a useful tool for extracting data from files, or outputs of other commands.