1. Overview

In this tutorial, we’ll learn how to remove the first n characters of a line using the tools provided by GNU/Linux.

2. Using cut

cut allows us to select sections of each line, either by character position or by delimiter-separated field.

Let’s use the first of these to remove the first three characters of our string. We’ll tell cut to keep everything from the 4th character onward:

$ echo '123456789' | cut -c 4-
456789
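
The position we pass to cut is always one more than the number of characters we want to drop. As a minimal sketch, assuming a hypothetical helper named drop_first_n_cut (our own name, not part of cut), we can compute that position with shell arithmetic:

```shell
# Hypothetical helper: drop the first n characters of every line on stdin.
# cut -c expects the position of the first character to KEEP, hence n + 1.
drop_first_n_cut() {
    n=$1
    cut -c "$((n + 1))-"
}

echo '123456789' | drop_first_n_cut 3   # prints 456789
```

This keeps the number of characters to remove in a single argument instead of hard-coding the start position.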

3. Using sed

Since we always delete a fixed number of characters from the start of the line, we can describe them with a pattern. sed lets us filter and transform text, in many cases with the help of patterns.

Using a regular expression, we can search for the first three characters and have sed remove them from the line:

$ echo '123456789' | sed -r 's/^.{3}//'
#                             |____||____ sed removes them
#                                |                
#                                |__ search for the first three characters

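With the -r option, sed interprets the pattern as an extended regular expression, which lets us write {3} without escaping the braces. The -E option is an equivalent, more portable spelling.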
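
The same substitution can take the count from a shell variable. The following is only a sketch: drop_first_n_sed is a hypothetical helper name, and we assume a sed that accepts -E for extended regular expressions:

```shell
# Hypothetical helper: remove the first n characters of each line with sed.
# The double quotes let the shell expand $n inside the {n} quantifier.
drop_first_n_sed() {
    n=$1
    sed -E "s/^.{$n}//"
}

echo '123456789' | drop_first_n_sed 3   # prints 456789
echo 'ab' | drop_first_n_sed 3          # prints ab (the pattern needs 3 characters, so nothing is removed)
```

Note that a line shorter than n characters doesn’t match the pattern, so sed passes it through unchanged.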

4. Using grep

Just like sed, grep also operates using text patterns. With the same regular expression we’ll look for the first three characters:

$ echo '123456789' | grep -Po '^.{3}\K.*'

The -P flag instructs grep to interpret the pattern as a Perl-compatible regular expression, and -o makes it print only the part of the line that matches.

The \K escape sequence discards what was matched so far (the first three characters) from the reported match, then .* matches everything that follows.

Further use cases and examples of grep can be found in Common Linux Text Searches.
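
To reuse the grep approach with a different count, we could wrap it as shown below. This is a sketch under assumptions: drop_first_n_grep is a hypothetical name, and the grep in use must have been built with PCRE support for -P to work:

```shell
# Hypothetical helper: print each line without its first n characters.
# \K stays literal inside double quotes; only $n is expanded by the shell.
drop_first_n_grep() {
    n=$1
    grep -Po "^.{$n}\K.*"
}

echo '123456789' | drop_first_n_grep 3   # prints 456789
```

Because -o only prints non-empty matches, lines with n or fewer characters produce no output at all with this approach.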

5. Using awk

awk enables us to apply actions to certain patterns.

Recalling our regular expression, we can use it in our awk script as an argument to the sub function to remove the desired characters:

$ echo '123456789' | awk 'sub(/^.{3}/,"")'

There are a few other ways awk can achieve this for us.

In the remaining examples, we’ll use a variable named range. We could inline the value in the expression instead, but a variable makes our command more readable, just like in any other code.

Additionally, with a variable, we can control the size of the range by passing it in as a parameter, keeping the awk script intact. So, by parametrizing, we don’t lose generality in our script.

Going back to our first approach, let’s make use of the variable:

$ echo '123456789' | awk -v range="3" 'sub(sprintf("^.{%s}",range),"")'
#                                                  |____________|
#                                                         |
#           Here we compose our regular expression _______|

Also, we can instruct awk to use the empty string as the field separator, so that every character becomes its own field (this works in gawk, although POSIX leaves the behavior unspecified). Then, we can iterate over the characters, printing only from the desired position to the end of the line:

$ echo '123456789' | awk -F '' -v range=3 '{for (i=1; i<=NF; i++) if (i > range) printf "%s", $i; print ""}'
#                        |___| |________|
#                          |       |_____ We assign the value "3" to the variable "range"
#                          |
#                          |_________ We set the input field separator to the empty string,
#                                     leaving a space between the -F option and the empty quotes.

A more convenient way to do this is with the substr function:

$ echo '123456789' | awk -v range=3 '{print substr($0,range+1)}'

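With this last form, we can exploit the fact that awk prints the entire record (stored in the variable $0) whenever a pattern evaluates to true, so it’s enough to modify the record. Keep in mind that a line that becomes empty evaluates to false and isn’t printed: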

$ echo '123456789' | awk -v range=3 '$0 = substr($0,range+1)'
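
When the input has several lines, some of them may be shorter than the range. As a sketch, assuming a hypothetical helper named drop_first_n_awk, the explicit print form of substr keeps those lines as empty lines instead of dropping them:

```shell
# Hypothetical helper: drop the first n characters of each line with awk.
# The explicit print emits every line, even one that becomes empty.
drop_first_n_awk() {
    awk -v n="$1" '{ print substr($0, n + 1) }'
}

printf '123456789\nab\n' | drop_first_n_awk 3
# prints 456789 followed by an empty line
```

Which behavior is preferable depends on whether the line count of the output has to match the input.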

6. Using perl

perl is the interpreter of the Perl language, which brings a rich set of features to text processing.

As we did for sed, grep, and the sub function of awk, we can apply the regular expression in our perl call:

$ echo '123456789' | perl -pe 's/^.{3}//'

7. Using Parameter Expansion

Available in Bash and Zsh, parameter expansion is useful for manipulating ranges of characters:

$ var="123456789"
$ echo "${var:3}"

Or, only with Zsh:

$ var="123456789"
$ echo $var[4,-1]

A disadvantage of this approach is that lines coming from a file or a stream have to be assigned to a variable before we can cut them. To do that, we’d have to use a loop, with IFS= and -r so that read preserves whitespace and backslashes:

$ while IFS= read -r var || [[ -n $var ]]; do echo "${var:3}"; done < example_file.txt

Or:

$ <command> | while IFS= read -r var || [[ -n $var ]]; do echo "${var:3}"; done
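
The loop can also be wrapped in a function so that it reads from any stream. This is only a sketch for Bash or Zsh: drop_first_n_pe is a hypothetical name, and printf is used instead of echo so that lines starting with a dash are printed as-is:

```shell
# Hypothetical helper (Bash or Zsh): drop the first n characters of each line on stdin.
drop_first_n_pe() {
    local n=$1 line
    # IFS= keeps leading and trailing whitespace; -r keeps backslashes literal.
    # The [[ -n $line ]] test handles a last line without a trailing newline.
    while IFS= read -r line || [[ -n $line ]]; do
        printf '%s\n' "${line:$n}"
    done
}

printf '123456789\n  indented' | drop_first_n_pe 3
# prints 456789, then ndented
```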

8. Conclusion

In this tutorial, we used several tools provided by GNU/Linux to remove the first n characters from a string.

