1. Overview

Linux provides various utilities for processing file contents and command output. One of the most useful is the cut command.

In this tutorial, we’ll see how we can use the cut command to slice files and command output.

2. Basics

The cut command is a command-line utility for cutting sections from each line of a file. It writes the result to the standard output.

It’s worth noting that cut doesn’t modify the file. It only reads the input and writes the selected parts to the standard output.

Although typically the input to a cut command is a file, we can pipe the output of other commands and use it as input.
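
As a minimal sketch of both input styles, the snippet below runs the same selection on a file and on piped input, then checks that the file is unchanged. The scratch file and the variable names are hypothetical and exist only for this demonstration:

```shell
# Hypothetical scratch file, created only for this demonstration.
scratch=$(mktemp)
printf 'alpha\tbeta\n' > "$scratch"

before=$(cat "$scratch")

# File as input: select the first tab-separated field.
from_file=$(cut -f 1 "$scratch")

# Pipe as input: the same selection, reading from standard input.
from_pipe=$(cat "$scratch" | cut -f 1)

after=$(cat "$scratch")
rm -f "$scratch"

echo "from file: $from_file"
echo "from pipe: $from_pipe"
[ "$before" = "$after" ] && echo "file unchanged"
```

Both variables should hold the same text, since cut treats a file argument and standard input the same way.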

3. Slicing by Bytes

First, let’s see how we can slice the data in a file by byte.

Let’s suppose we have a file of employee records, employee_data.txt:

Name    Age    Department
John Smith    36    HR
John Wayne    48    Finance
Edward King    40    Finance
Stephen Fry    50    IT

The individual fields above are separated by the tab character.

To slice by bytes, we’ll use the -b or --bytes option:

$ cut -b 2 employee_data.txt

This will print the second byte from each line in the file:

a
o
o
d
t

We’re not restricted to a single byte here. We can also select multiple bytes from each line.

For example, we can select the 3rd, 5th, and 8th bytes together using the “,” separator:

$ cut -b 3,5,8 employee_data.txt
m    e
h i
h y
wrK
eh

We can also specify a range, using the “-” separator:

$ cut -b 2-5 employee_data.txt
ame 
ohn 
ohn 
dwar
teph

It’s worth noting that we can omit either the starting or the ending position of a range. So, “-5” selects all bytes from the first position through the 5th, and “5-” selects all bytes from the 5th position to the end of the line.
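
The two open-ended forms can be sketched as follows. The sample string and the variable names are only illustrative:

```shell
line='slicing example'

# "-5": from the first byte through the 5th.
up_to_fifth=$(printf '%s\n' "$line" | cut -b -5)

# "5-": from the 5th byte to the end of the line.
from_fifth=$(printf '%s\n' "$line" | cut -b 5-)

echo "$up_to_fifth"
echo "$from_fifth"
```

Note that both ranges are inclusive, so the 5th byte appears in both results.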

As mentioned above, apart from files, we can also pipe output from other Linux commands as input to the cut command:

$ echo slicing example | cut -b 3-7
icing

4. Slicing by Characters

For slicing by character, we’ll use the -c or --characters option.

It’s similar to slicing by byte, except that it uses the character position rather than the byte position.

So, if a character uses multiple bytes, the output will include the whole character instead of a byte from the character.

Let’s look at an example:

$ echo spéciale | cut -c 3
é
$ echo spéciale | cut -b 3
?
$ echo spéciale | cut -b 3,4
é

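Note that the second command prints ? because byte 3 is only the first byte of the two-byte UTF-8 encoding of é, which isn’t a valid character on its own. Also, some cut implementations treat -c the same as -b, in which case the first command prints ? as well.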

It’s worth noting that a tab or a backspace counts as a single character.
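
Since the result of -c can depend on the cut implementation and the locale, the following sketch sticks to -b to make the byte layout behind the example visible. It assumes UTF-8 encoding, and it writes é as explicit octal escapes so the bytes are unambiguous. The variable names are hypothetical:

```shell
# "spéciale" with é spelled as its two UTF-8 bytes (octal 303 251).
word=$(printf 'sp\303\251ciale')

# Bytes 3 and 4 together form the complete character.
both_bytes=$(printf '%s\n' "$word" | cut -b 3-4)

# Count the bytes selected (the trailing newline is removed first).
byte_count=$(printf '%s\n' "$word" | cut -b 3-4 | tr -d '\n' | wc -c | tr -d ' ')

# A tab occupies one position, so "b" is the 3rd byte of "a<TAB>b".
after_tab=$(printf 'a\tb\n' | cut -b 3)

echo "$both_bytes ($byte_count bytes), after tab: $after_tab"
```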

5. Slicing by Fields

Now, let’s see how we can slice file data by field.

Let’s say we want to list only the names of all the employees from the file. We can do this by slicing the file data by the first field in the file using the -f or --fields option:

$ cut -f 1 employee_data.txt

Here, we’ve used the -f option of the cut command and sliced the input using 1 as the field number:

Name
John Smith
John Wayne
Edward King
Stephen Fry

Above, we relied on tab being the default field delimiter. But, we can override this behavior by using the -d or --delimiter option to specify a different delimiter:

$ cut -d " " -f 2 employee_data.txt

Here, we’ve used the -d option to specify space as the delimiter. Also, we’re slicing the data using field number 2.

Now, let’s look at the output:

Name    Age    Department
Smith    36    HR
Wayne    48    Finance
King    40    Finance
Fry    50    IT

It’s worth noting that, for the employee rows, the output includes the second half of the name along with all the remaining fields. This is because tab is now treated like any other character, and there are no spaces in any of the other fields. Similarly, the header line is printed unchanged because it doesn’t contain any spaces, and cut passes through lines without the delimiter as they are.
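
One way to get a cleaner result is to chain two cut commands: the first isolates the name field using the default tab delimiter, and the second splits that field on spaces. This is only a sketch, and it recreates the sample file in a temporary directory (a hypothetical location) so that it’s self-contained:

```shell
# Recreate the sample file in a temporary directory (hypothetical location).
workdir=$(mktemp -d)
printf 'Name\tAge\tDepartment\nJohn Smith\t36\tHR\nJohn Wayne\t48\tFinance\nEdward King\t40\tFinance\nStephen Fry\t50\tIT\n' \
    > "$workdir/employee_data.txt"

# Step 1: keep only the first tab-separated field (the full name).
# Step 2: split that field on spaces and keep the first part.
first_names=$(cut -f 1 "$workdir/employee_data.txt" | cut -d ' ' -f 1)

printf '%s\n' "$first_names"
rm -rf "$workdir"
```

The header value Name contains no space, so the second cut passes it through unchanged.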

As with the other options, we can select multiple fields using the “,” separator:

$ cut -f 1,3 employee_data.txt
Name    Department
John Smith      HR
John Wayne      Finance
Edward King     Finance
Stephen Fry     IT

And, we can select a range of fields using the “-” separator:

$ cut -f 2- employee_data.txt
Age     Department
36      HR
48      Finance
40      Finance
50      IT

The above command will output all fields from the second field onwards.
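
One detail worth keeping in mind with field lists is that cut writes the selected fields in the order they appear in the input, regardless of the order in which we list them. A small sketch, using made-up sample data:

```shell
# The same selection written in two different orders.
in_order=$(printf 'a\tb\tc\n' | cut -f 1,3)
reversed=$(printf 'a\tb\tc\n' | cut -f 3,1)

printf '%s\n' "$in_order"
printf '%s\n' "$reversed"
```

Both commands print the first field followed by the third, so cut on its own can’t be used to reorder columns.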

By default, when slicing by fields, the cut command prints lines that don’t contain the delimiter unchanged. But, we can alter this behavior using -s or --only-delimited. Using this option, we can tell the cut command not to print the lines that don’t have the delimiter.
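
The difference can be sketched with two lines of made-up input, only one of which contains the “:” delimiter:

```shell
# Two sample lines; only the second contains the ":" delimiter.
input=$(printf 'no delimiter here\nkey:value')

# Default: the line without a delimiter is passed through unchanged.
default_out=$(printf '%s\n' "$input" | cut -d ':' -f 2)

# With -s: the line without a delimiter is dropped.
strict_out=$(printf '%s\n' "$input" | cut -d ':' -s -f 2)

printf 'default:\n%s\n' "$default_out"
printf 'with -s:\n%s\n' "$strict_out"
```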

6. Other Options

Now, let’s look at other options that can be used with the above slicing methods.

When we use “,” to specify multiple bytes or characters, the cut command concatenates the selected parts without a delimiter, while selected fields are joined by the input delimiter. But, we can set a custom delimiter using the --output-delimiter option:

$ echo slicing example | cut -c 2-5,9,11-13 --output-delimiter=@

This will add the delimiter character ‘@’ between each part of the output:

lici@e@amp
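
The same option also works when slicing by fields. The sketch below assumes GNU cut, since --output-delimiter isn’t available in every implementation, and uses a made-up colon-separated record:

```shell
record='a:b:c'

# Without the option, selected fields are joined by the input delimiter.
default_join=$(printf '%s\n' "$record" | cut -d ':' -f 1,3)

# With the option (assumed GNU cut), a custom string is used instead.
custom_join=$(printf '%s\n' "$record" | cut -d ':' -f 1,3 --output-delimiter=' | ')

echo "$default_join"
echo "$custom_join"
```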

Another interesting option is --complement. This will print everything except the content at the specified position.

Let’s look at an example:

$ echo slicing example | cut -c 5-10 --complement
slicample

As we can see, the output includes all characters except the ones at positions 5 through 10.
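
The option can also be combined with -f, for instance to drop a single column. The following sketch assumes GNU cut, as --complement isn’t available in every implementation, and uses one row of the sample data:

```shell
# One tab-separated row from the sample data.
row=$(printf 'John Smith\t36\tHR')

# Print every field except the second one (assumed GNU cut).
without_age=$(printf '%s\n' "$row" | cut -f 2 --complement)

printf '%s\n' "$without_age"
```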

7. Conclusion

In this article, we saw examples of using the cut command. This command can be a useful tool for extracting data from files or from the output of other commands.

