1. Overview

When dealing with strings, formatting is crucial to the integrity of our data. Whitespace should follow a defined structure, so that every data field is interpreted correctly.

Methods for handling whitespace in files also work on variables, but bash variables are usually shorter and less complex. In particular, they normally hold a single line, which makes manipulation easier.

In this tutorial, we’ll look at some ways to remove whitespace from a bash variable. Methods can vary since we might just need to remove leading and trailing spaces or more complex formatting. This article is written with the bash shell in mind, so while it could work with other shells, it might not be optimal.

2. Using xargs

The xargs tool builds and executes commands from standard input. When no command is given, it runs echo.

Because xargs splits its input into arguments on whitespace, and echo joins those arguments with single spaces, any extra whitespace disappears.

Let’s see how it processes text:

$ echo "     welcome to   baeldung  " | xargs
welcome to baeldung

Although this isn’t its original purpose, we can use xargs to remove undesired spaces. As we can see, it removes not only the leading and trailing whitespace (the spaces around the text) but also the repeated spaces between the words.

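However, the output still ends with a newline character, because echo appends one. The approach is also relatively crude: xargs treats quotes and backslashes in its input specially, so it’s best reserved for simple strings.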
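Since our goal is to clean a variable rather than a literal string, a minimal sketch of the same idea with command substitution might look like this. The names var and trimmed are only illustrative, and we assume the value contains no quotes or backslashes:

```shell
# Illustrative value; assumed to contain no quotes or backslashes
var="     welcome to   baeldung  "

# Pipe the value through xargs; command substitution drops the final newline
trimmed=$(printf '%s\n' "$var" | xargs)

# The brackets make the boundaries of the result visible
printf '[%s]\n' "$trimmed"
```

The brackets in the last line are just a convenient way to confirm that no leading or trailing space is left.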

3. Using tr

tr is used for translating characters and can be leveraged for string manipulation. We can also apply tr when replacing or removing characters.

Let’s use it for removing spaces:

$ echo " welcome to    baeldung " | tr -d '[:blank:]'
welcometobaeldung

We use the -d option to delete all occurrences of the characters in a set. In this case, the set is the POSIX character class [:blank:], which contains the space and tab characters.

Because every blank is deleted, the spaces between the words are lost as well.

If we use [:space:] instead, we remove every kind of whitespace, including newlines. This also deletes the newline that echo appends, so the output no longer ends with a line break:

$ echo " welcome to    baeldung " | tr -d '[:space:]'
welcometobaeldung

We can also trim extra whitespace between words with the -s option:

$ echo " welcome to    baeldung " | tr -s '[:blank:]'
 welcome to baeldung 

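The -s option stands for squeeze: it replaces each run of a repeated character from the set with a single occurrence of that character. The leading and trailing spaces are reduced to one each, but not removed.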
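To apply these tr commands to a variable, we can capture the result with command substitution. The following is a small sketch with illustrative variable names; it uses printf instead of echo so that no extra newline is added to the input:

```shell
# Illustrative value with leading, trailing, and repeated spaces
var=" welcome to    baeldung "

# Squeeze each run of repeated blanks into a single one
squeezed=$(printf '%s' "$var" | tr -s '[:blank:]')

# Delete every whitespace character
stripped=$(printf '%s' "$var" | tr -d '[:space:]')

# The brackets make the boundaries of each result visible
printf '[%s]\n' "$squeezed"
printf '[%s]\n' "$stripped"
```

The first result keeps one space at each end, while the second contains no whitespace at all.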

4. Using sed

sed can also be used for removing whitespace. Unlike tr, which works on sets of characters, the sed command matches regular expressions.

Let’s start by removing all spaces and tabs from a string with a sed regular expression:

$ echo " welcome to    baeldung " | sed 's/[[:blank:]]//g'
welcometobaeldung

The s stands for substitute. Between the slashes, we give a regular expression followed by its replacement. In our case, we’re replacing any character matched by [[:blank:]] with nothing, which removes these characters.

By default, the substitution is only applied to the first match on each line. The g flag at the end of the command counteracts this behavior and applies the substitution globally, to all matches.
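To see what the trailing g changes, we can run the same substitution with and without it. This is only a small sketch with an illustrative input string:

```shell
# Without g, only the first blank on the line is removed
first_only=$(echo "a b c d" | sed 's/[[:blank:]]//')

# With g, every blank on the line is removed
all=$(echo "a b c d" | sed 's/[[:blank:]]//g')

printf '%s\n' "$first_only" "$all"
```

The first line printed still contains the later spaces, while the second has none.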

Now, let’s trim extra whitespace by squeezing multiple spaces into one:

$ echo " welcome to    baeldung " | sed -r 's/[[:blank:]]+/ /g'
 welcome to baeldung 

Using the +, we match runs of one or more blanks. Then, we replace each run with a single space.

The -r option activates extended regular expressions, which offer more functionality (e.g., + without a backslash). The -E option has the same effect and is generally the more portable spelling.

A common method of formatting strings is removing leading and trailing whitespace, while also trimming extra whitespace between words.

So, let’s use both approaches together:

$ echo " welcome to    baeldung " | sed -re 's/^[[:blank:]]+|[[:blank:]]+$//g' -e 's/[[:blank:]]+/ /g'
welcome to baeldung

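Each -e option adds one editing command, and sed applies them in order to every line. The first command removes the leading and trailing blanks, and the second squeezes what remains between the words. This way, we managed to eliminate all unwanted whitespace.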
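For repeated use on variables, we could wrap this pipeline in a small function. The sketch below is one possible arrangement, not the only one: the function name normalize_ws is hypothetical, and we use -E here, which both GNU and BSD sed accept in place of -r:

```shell
# Hypothetical helper: trims the ends and squeezes inner blanks of its first argument
normalize_ws() {
    printf '%s\n' "$1" |
        sed -E -e 's/^[[:blank:]]+|[[:blank:]]+$//g' -e 's/[[:blank:]]+/ /g'
}

# Illustrative value
var="  welcome to    baeldung   "
clean=$(normalize_ws "$var")

# The brackets make the boundaries of the result visible
printf '[%s]\n' "$clean"
```

Passing the value as an argument, rather than embedding it in the sed script, keeps the function safe for strings that contain slashes or other characters that are special to sed.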

5. Using Built-in String Manipulation

The bash shell comes with built-in string manipulation features, known as parameter expansion. They use shell glob patterns rather than regular expressions, so the expressions can look more convoluted than their sed counterparts.

This string manipulation works by identifying patterns and removing them. First, let’s clarify the format. Inside the braces, we give the variable name, the operator, and the pattern to remove. In the pattern, a * matches any string, including the empty one.

Let’s start with a simple example:

$ var="ababc"
$ echo "${var#a*b}"
abc

Here, the operator is #, which removes the shortest match of the pattern from the beginning of the string. The shortest prefix that matches a*b is ab, so abc remains.

Let’s use a double operator for a different result:

$ var="ababc"
$ echo "${var##a*b}"
c

As we saw before, a single operator removes the shortest match available. In contrast, a double operator removes the longest match, so here the whole prefix abab is removed.

There’s another operator we can use:

$ var="ababc"
$ echo "${var%b*c}"
aba
$ echo "${var%%b*c}"
a

The % operator works like #, but removes the pattern from the end of the string. We can also use it as a single or double operator.
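To compare the four operators side by side, we can apply them to one value. The following is only an illustrative sketch; the variable file and its value are made up for the example:

```shell
# Illustrative value with two dots
file="archive.tar.gz"

# '#' and '##' remove the shortest and longest prefix matching '*.'
short_prefix="${file#*.}"
long_prefix="${file##*.}"

# '%' and '%%' remove the shortest and longest suffix matching '.*'
short_suffix="${file%.*}"
long_suffix="${file%%.*}"

printf '%s\n' "$short_prefix" "$long_prefix" "$short_suffix" "$long_suffix"
```

The single operators stop at the nearest dot, while the double operators consume everything up to the farthest one.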

Now, let’s remove the leading whitespace of a variable:

$ var=" welcome to    baeldung "
$ var="${var#"${var%%[![:space:]]*}"}"
$ echo "$var"
welcome to    baeldung 

It’s easier to understand if we look at it in parts.

The inner expansion, ${var%%[![:space:]]*}, removes the longest suffix that starts with a non-whitespace character. This leaves us with only the leading whitespace. The outer expansion then uses # to remove exactly that whitespace from the beginning of the string.

Building upon the example, we can do the opposite for the trailing whitespace:

$ var="${var%"${var##*[![:space:]]}"}"
$ echo "$var"
welcome to    baeldung

In this case, the inner expansion, ${var##*[![:space:]]}, removes the longest prefix that ends with a non-whitespace character, which leaves only the trailing whitespace. The outer expansion then uses % to remove that whitespace from the end of the string.
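Both steps can be combined into a reusable function. The sketch below is one way to do it; the function name trim is our own choice, not a bash built-in:

```shell
# Hypothetical helper: prints its first argument without leading or trailing whitespace
trim() {
    local s="$1"
    # Remove the leading whitespace
    s="${s#"${s%%[![:space:]]*}"}"
    # Remove the trailing whitespace
    s="${s%"${s##*[![:space:]]}"}"
    printf '%s' "$s"
}

# Illustrative value
var="   welcome to    baeldung   "
var=$(trim "$var")

# The brackets make the boundaries of the result visible
printf '[%s]\n' "$var"
```

Unlike the xargs and sed approaches, this version leaves the whitespace between the words untouched. A string that consists only of whitespace is reduced to an empty string.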

6. Conclusion

In this article, we looked at different approaches for trimming whitespace in bash variables. This can translate into removing the leading and trailing whitespace in strings. It can also mean dealing with extra whitespace between words, by squeezing it.

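We used the command-line tools xargs, tr, and sed, as well as bash’s built-in parameter expansion, to manipulate the strings. Pattern matching, whether with regular expressions or glob patterns, has an important role in string formatting and is adaptable to other patterns as well.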

Generally, such operations are useful for sanitizing user input or using predefined formatting in unstructured data.