1. Overview

There are various occasions when we might want to remove the text after a specific character or set of characters. For example, one typical scenario is when we want to remove the extension of a particular filename.

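In this quick tutorial, we’re going to explore several approaches to see how we can manipulate strings to remove text after a given pattern. We’ll be using the Bash shell in our examples. Some of the techniques, such as the ‘%’ and ‘%%’ expansions, sed, and cut, also work in other POSIX shells, while substring extraction, find and replace, and here-strings are extensions that aren’t guaranteed to exist outside Bash and similar shells.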

2. Native String Manipulation

Let’s start by taking a look at how we can remove text using some of the built-in string manipulation operations offered by Bash. For this, we’re going to be using a feature of the shell called parameter expansion.

To quickly recap, parameter expansion is the process by which Bash replaces a reference to a variable with that variable’s value. To achieve this, we simply use a dollar sign followed by our variable name, enclosed in braces:

my_var="Hola Mundo"
echo ${my_var}

As expected the above example results in the output:

Hola Mundo

But as we’re going to see, during the expansion process we can also modify the variable’s value or substitute other values for it.
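As a small sketch of what modifying or substituting a value during expansion can look like, the snippet below uses two standard expansions: ${#var} for the length of a value and ${var:-fallback} for a fallback value. The variable names empty_var, length, and greeting are only illustrative:

```shell
my_var="Hola Mundo"
empty_var=""

# ${#var} expands to the number of characters in the value
length=${#my_var}
echo "$length"      # prints 10

# ${var:-fallback} expands to the fallback when var is unset or empty
greeting=${empty_var:-"Hello World"}
echo "$greeting"    # prints Hello World
```

Neither expansion changes the variable itself; only the expanded text differs.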

Now that we understand the basics of parameter expansion, in the next subsections we’ll explain several ways to delete parts of our variable’s value.

In all our examples, we’ll focus on a pretty simple use case: removing the file extension from a filename.

2.1. Extracting Characters Using a Given Position and Length

We’ll start by seeing how to extract a substring of a particular length using a given starting position:

my_filename="interesting-text-file.txt"
echo ${my_filename:0:21}

This gives us the output:

interesting-text-file

In this example, we’re extracting a substring from the my_filename variable, starting at position 0 and with a length of 21 characters. In effect, we’re saying remove all the text from position 21 onward, which in this case is the .txt extension.

Although this solution works, there are some obvious downsides:

  • Not all the filenames will have the same length
  • We’d need to calculate where the file extension starts to make this a more dynamic solution
  • To the naked eye, it isn’t very intuitive what the code is actually doing
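To illustrate the second point, here is a minimal sketch of how the cut-off position could be calculated instead of hard-coded. It assumes we already know the extension (the extension and keep variables are our own illustrative names), which is exactly the kind of extra bookkeeping that makes this approach clumsy:

```shell
#!/usr/bin/env bash
my_filename="interesting-text-file.txt"
extension=".txt"   # assumption: the extension is known in advance

# ${#var} is the length of the value, so the number of characters to keep
# is the total length minus the length of the extension (25 - 4 = 21)
keep=$(( ${#my_filename} - ${#extension} ))

# The length part of ${var:offset:length} is evaluated as arithmetic,
# so a plain variable name can be used there
base=${my_filename:0:keep}
echo "$base"       # prints interesting-text-file
```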

In the next example, we’ll see a more elegant solution.

2.2. Deleting the Shortest Match

Now we’re going to see how we can delete the shortest substring match from the back of our variable:

echo ${my_filename%.*}

Let’s explain in more detail what we’re doing in the above example:

  • We use the ‘%’ character, which has a special meaning inside the braces: it strips the shortest matching pattern from the back of the string
  • Then we use the glob pattern ‘.*’ to match a literal dot followed by any sequence of characters
  • We then execute the echo command to output the result of this substring manipulation

Again we delete the substring ‘.txt’ resulting in the output:

interesting-text-file
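To make the meaning of "shortest" more concrete, the following sketch applies the same expansion to two other inputs. The variable names are illustrative only:

```shell
name_two_ext="hello-world.tar.gz"
name_no_ext="README"

# Only the last extension is removed, because ".gz" is the shortest
# suffix that matches the pattern .*
one_removed=${name_two_ext%.*}
echo "$one_removed"    # prints hello-world.tar

# If nothing matches the pattern, the value is left unchanged
unchanged=${name_no_ext%.*}
echo "$unchanged"      # prints README
```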

2.3. Deleting the Longest Match

Likewise, we can also delete the longest substring match from our filename. Let’s now imagine we have a slightly more complicated scenario where our filename has more than one extension:

complicated_filename="hello-world.tar.gz"
echo ${complicated_filename%%.*}

In this variation, ‘%%.*’ strips the longest match for ‘.*’ from the back of our complicated_filename variable. Here, the longest match is “.tar.gz”, resulting in:

hello-world

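It’s worth mentioning that neither the shortest-match nor the longest-match expansion supports regular expressions. The pattern is a glob, in which the dot is a literal character and ‘*’ matches any sequence of characters.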
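One consequence of the longest match is worth seeing in a sketch: when the value is a path that itself contains an earlier dot, ‘%%.*’ can remove far more than the extension. The path below is a made-up example:

```shell
file_path="./docs/report.txt"

# Shortest match: only the final ".txt" is stripped
shortest=${file_path%.*}
echo "$shortest"       # prints ./docs/report

# Longest match: the value starts with a dot, so the whole string
# matches .* and nothing is left
longest=${file_path%%.*}
echo "[$longest]"      # prints []
```

So for values that may contain directory names, the shortest match is usually the safer choice.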

2.4. Using Find and Replace

In this final string manipulation example, we’ll see how to use the built-in find and replace capabilities of Bash:

echo ${my_filename/.*/}

In order to understand this example, let’s first understand the syntax of substring replacement:

${string/substring/replacement}

Now, to put this into context, we’re replacing the first match of ‘.*’ in the my_filename variable with an empty string. As before, the pattern is a glob, so it matches from the first dot to the end of the string. In this case, we again remove the extension.
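The sketch below tries the same replacement syntax on a filename with two extensions and with a non-empty replacement. The ‘//’ form, which replaces every match rather than only the first, isn’t used elsewhere in this tutorial and is shown only for comparison:

```shell
#!/usr/bin/env bash
complicated_filename="hello-world.tar.gz"

# The glob .* matches from the first dot to the end of the value
no_ext=${complicated_filename/.*/}
echo "$no_ext"         # prints hello-world

# The replacement doesn't have to be empty
renamed=${complicated_filename/.tar.gz/.zip}
echo "$renamed"        # prints hello-world.zip

# A double slash replaces every match instead of only the first one
underscored=${complicated_filename//-/_}
echo "$underscored"    # prints hello_world.tar.gz
```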

3. Using the sed Command

In this penultimate section, we’ll see how we can use the sed command. The sed command is a powerful stream editor which we can use to perform basic and complex text transformations.

Using this command, we can find a pattern and replace it with other text. When the replacement is left empty, the matched text gets deleted.

As in our other examples, we’ll simply feed the input string to the sed command, this time using a here-string:

$ sed 's/[.].*//' <<< 'hello-world.tar.gz'
hello-world

In this example, the sed command searches for the first occurrence of the ‘.’ character and removes it and all characters after it.

The pattern “[.].*” is a regular expression. A single dot in a regular expression has a special meaning: it matches any character. For example, “.*” matches any sequence of characters.

Therefore, if we would like to match a literal dot character, we need to either escape the dot “\.” or use the character class “[.]”.
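The difference between a literal and an unescaped dot can be sketched with a few variations on the same substitution. Here we pipe the input with printf instead of using a here-string, purely so the snippet also runs in shells without here-strings. The variable names are illustrative:

```shell
name='hello-world.tar.gz'

# Escaped dot: remove everything from the first literal dot onward
all_ext=$(printf '%s\n' "$name" | sed 's/\..*//')
echo "$all_ext"        # prints hello-world

# Anchoring at the end of the line removes only the last extension:
# a literal dot followed by non-dot characters up to the end
last_ext=$(printf '%s\n' "$name" | sed 's/\.[^.]*$//')
echo "$last_ext"       # prints hello-world.tar

# Unescaped dot: ".*" matches the whole line, so nothing is left
unescaped=$(printf '%s\n' "$name" | sed 's/.*//')
echo "[$unescaped]"    # prints []
```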

4. Using the cut Command

In this final example, we’ll explore the cut command. As the name suggests we can use the cut command for cutting out fields from the text.

Back to our problem: no matter how many times the delimiter (here, a dot) occurs in the text, the first field will always be the answer:

$ cut -f1 -d"." <<< 'hello-world.tar.gz'
hello-world

Let’s take a look at the command in more detail to understand it properly:

  • We use the -f option to specify the number of the field to extract, in this example the first one
  • The -d option is used to specify the field separator or delimiter, in this example a ‘.’

Fields in the input are separated by a single occurrence of the delimiter character. This means that in our example we end up with three fields split by the dot: hello-world, tar, and gz. Consequently, we select the first one and, in the process, discard the rest of the text.
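To see the three fields more clearly, the following sketch selects different fields from the same input. As a portability choice, it pipes the input with printf rather than using a here-string. The variable names are illustrative:

```shell
name='hello-world.tar.gz'

# Field 1 is everything before the first dot
first=$(printf '%s\n' "$name" | cut -d'.' -f1)
echo "$first"          # prints hello-world

# Field 2 is the text between the first and second dots
second=$(printf '%s\n' "$name" | cut -d'.' -f2)
echo "$second"         # prints tar

# A list of fields is printed joined by the same delimiter
first_two=$(printf '%s\n' "$name" | cut -d'.' -f1,2)
echo "$first_two"      # prints hello-world.tar

# A line without any delimiter is printed unchanged (unless -s is given)
no_delim=$(printf '%s\n' 'README' | cut -d'.' -f1)
echo "$no_delim"       # prints README
```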

5. Conclusion

In this quick tutorial, we’ve described a few ways that help us remove text from a string.

First, we explored native string manipulation using parameter expansion. Next, we saw an example with the powerful stream editor sed. Finally, we showed how we could achieve similar results using the cut command.

