1. Overview
Bash, together with the GNU Coreutils package and tools such as sed and awk, provides several means for manipulating strings and variables. One common task is to remove specific characters from a variable. This can be useful when performing data processing or text formatting.
In this tutorial, we’ll explore several ways to remove characters from a variable in Bash.
2. Sample Task
Let’s suppose we have a variable named var that holds quoted alphanumeric entries separated by commas. Each entry is enclosed in single quotes, double quotes, or backticks:
$ var=\"A1B2C\",\'D3E4F\',\`G5H6I\`
$ echo "$var"
"A1B2C",'D3E4F',`G5H6I`
Our objective is to remove all three types of quotation marks from the variable. In other words, we’d like to remove every occurrence of the ", ', and ` characters from the variable value.
Whichever method we choose to accomplish the task, we can then save the result into a new variable using command substitution:
$ var_new="$(...)"
We assign the final result to the new variable named var_new. When printed, the new variable should show a quotation-free output.
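As a minimal sketch of this pattern, the snippet below wraps one possible removal command in a function and captures its output. The function name strip_quotes is hypothetical and used only for illustration; its body could be any of the methods discussed in this tutorial.

```shell
#!/usr/bin/env bash
# Sample value from the task description.
var=\"A1B2C\",\'D3E4F\',\`G5H6I\`

# strip_quotes is a hypothetical helper name; it prints its first
# argument with the three quotation characters deleted.
strip_quotes() {
  printf '%s\n' "$1" | tr -d "\"'\`"
}

# Command substitution captures the helper's standard output
# (without the trailing newline) into the new variable.
var_new="$(strip_quotes "$var")"

printf '%s\n' "$var_new"
```

Keeping the removal logic behind a single function makes it easy to swap one method for another without touching the assignment.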
Let’s explore various approaches to accomplish our stated objective.
3. Using tr
tr is a powerful command-line tool for replacing or deleting characters. For our task, we can use tr to delete the three types of quotations:
$ echo "$var" | tr -d "\"\'\`"
A1B2C,D3E4F,G5H6I
The -d option tells tr to delete every character listed in its argument, which we’ve enclosed in double quotes. Inside double quotes, the double quote and the backtick are special to the shell, so we’ve escaped both with a backslash to have them passed on literally. The single quote isn’t special there, so the shell leaves the \' sequence untouched and GNU tr itself reads it as a plain single quote. That backslash is therefore optional, and we’ve kept it only for visual consistency.
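When the escaping becomes hard to read, one possible alternative is to name the characters by their octal codes, which tr accepts in the \NNN form. The sketch below assumes an ASCII-compatible locale, where 042, 047, and 140 stand for the double quote, the single quote, and the backtick.

```shell
#!/usr/bin/env bash
# Sample value from the task description.
var=\"A1B2C\",\'D3E4F\',\`G5H6I\`

# The set is single-quoted, so the shell passes the backslashes
# to tr unchanged and tr decodes the octal escapes itself:
#   \042 = double quote, \047 = single quote, \140 = backtick
var_new="$(printf '%s\n' "$var" | tr -d '\042\047\140')"

printf '%s\n' "$var_new"
```

This form trades readability of the characters themselves for a set that needs no shell escaping at all.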
4. Using Bash Parameter Substitution
Bash has a built-in parameter substitution feature that enables pattern matching and replacement. We can use this feature to remove all occurrences of quotation characters in our variable:
$ echo "${var//[\"\'\`]/}"
A1B2C,D3E4F,G5H6I
Using the ${parameter//pattern/replacement} syntax, Bash replaces matches of a pattern in the value of a variable. Here, the pattern is a bracket expression that matches a double quote, a single quote, or a backtick, while the replacement is the empty string. The double forward slash right after var makes the substitution global, so every match is replaced rather than only the first one.
Importantly, we’ve escaped each of the characters within the bracket expression with a backslash. Since the whole expansion sits inside double quotes, the double quote and the backtick in particular would otherwise be interpreted by the shell.
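Because the expansion produces a value directly, the result can also be assigned without command substitution. The following sketch keeps the pattern in a variable so that the escaping is written only once; the name quote_pattern is an assumption made for this example.

```shell
#!/usr/bin/env bash
# Sample value from the task description.
var=\"A1B2C\",\'D3E4F\',\`G5H6I\`

# quote_pattern is a hypothetical name. Its value is the bracket
# expression that matches a double quote, a single quote, or a backtick.
quote_pattern="[\"'\`]"

# The pattern variable is left unquoted inside the expansion so that
# Bash treats its value as a pattern rather than as literal text.
var_new=${var//$quote_pattern/}

printf '%s\n' "$var_new"
```

Since no external command or subshell is involved, this variant may be preferable inside loops that process many values.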
5. Using sed
sed is a versatile tool for performing various types of regex operations. We can use sed to substitute all unwanted characters of a variable with the empty string, effectively deleting them:
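$ echo "$var" | sed "s/[\"'\`]//g"
A1B2C,D3E4F,G5H6I
sed performs the substitution with its s command, which takes the form s/pattern/replacement/flags. The g flag at the end of the expression makes the substitution global, so every match on a line is replaced. For the pattern, we specify the three quotation characters within a bracket expression. We escape only the double quote and the backtick, because the shell would otherwise interpret them inside the double-quoted script. We leave the single quote unescaped: a backslash inside a sed bracket expression is normally taken literally, so writing \' there would also delete any backslashes in the input. The replacement in this case is the empty string.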
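One detail worth checking with sed is that backslashes in the data survive the substitution. The sketch below uses a made-up value (sample) that contains a backslash, and a bracket expression with no backslash in front of the single quote, on the assumption that sed follows the POSIX rule that a backslash inside a bracket expression is a literal character.

```shell
#!/usr/bin/env bash
# sample is a hypothetical value for this sketch. It holds:
#   "A\1",'B2',`C3`
sample="\"A\\1\",'B2',\`C3\`"

# Only the double quote and the backtick are escaped, and only for
# the shell. The script that sed receives is: s/["'`]//g
result="$(printf '%s\n' "$sample" | sed "s/[\"'\`]//g")"

# The three kinds of quotation marks are gone; the backslash remains.
printf '%s\n' "$result"
```

The same check can be repeated with any of the other methods to confirm that only the intended characters are removed.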
6. Using Perl
Perl is a scripting language, the interpreter for which is preinstalled on most Linux distributions. It has advanced regex capabilities. Therefore, we can use Perl for our task in a way similar to how we used sed:
$ perl -pe "s/[\"\'\`]//g" <<< "$var"
A1B2C,D3E4F,G5H6I
The value of var is provided as input via a here-string. The substitution has the same s/pattern/replacement/g form that we used with sed. In a Perl character class, a backslash acts as an escape character, so \' simply matches a single quote. The -p option makes Perl loop over the input, run the expression on each line, and print the result, while the -e option supplies the expression to execute.
7. Using awk
GNU awk is another powerful tool for manipulating text. In particular, we can use the built-in gsub() function in awk to globally substitute any of the three quotation characters with the empty string:
$ echo "$var" | awk "{ gsub(/[\"\'\`]/, \"\"); print }"
A1B2C,D3E4F,G5H6I
Notably, since the whole awk program is enclosed in double quotes, we had to escape the two double quotes that form the empty string passed to gsub(). Additionally, we’ve escaped each of the three quotation characters within the bracket expression so that they’re interpreted literally.
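To keep the awk program itself free of shell escapes, one option is to pass the bracket expression in through the -v option and write the program in single quotes. In the sketch below, the awk variable name quotes is chosen only for illustration; gsub() accepts a string as a dynamic regular expression.

```shell
#!/usr/bin/env bash
# Sample value from the task description.
var=\"A1B2C\",\'D3E4F\',\`G5H6I\`

# -v assigns the bracket expression ["'`] to the awk variable
# "quotes" (a hypothetical name). The program is single-quoted,
# so it needs no escaping for the shell.
var_new="$(printf '%s\n' "$var" |
  awk -v quotes="[\"'\`]" '{ gsub(quotes, ""); print }')"

printf '%s\n' "$var_new"
```

The escaping is now confined to a single shell string, while the awk code reads the same as it would in a standalone script file.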
8. Conclusion
In this article, we’ve explored various approaches for removing specific characters from a variable in Bash. In particular, we’ve used tr, Bash’s built-in parameter substitution feature, sed, Perl, and awk to accomplish the task.