1. Overview
When working with text files and documents, it’s often necessary to manipulate their contents. One common task is to delete text before a delimiter to omit unnecessary or personal information. There are many ways to accomplish this in Bash.
In this tutorial, we’ll explore some of the most common methods for removing text before a delimiter on each line of a file from the command line while retaining the delimiter itself.
2. Example Data and Goal
Let’s suppose we have a log file that contains time-stamped entries:
$ cat logfile
2023.05.01 12:24:56 - INFO - Everything is running OK
2023.05.02 01:52:45 - WARNING - A potential issue was detected
2023.05.03 04:16:23 - ERROR - Something went wrong
In this case, we want to extract only the severity level and message from each log entry without the timestamp. In particular, for each line, we’ll remove everything before the first hyphen (-), while retaining the hyphen itself.
The delimiter in this case is a hyphen, and we can delete the text preceding it using one of several methods. Let’s go over them one by one.
3. Using sed
sed is a stream editor for manipulating text files from the command line.
We can remove the text before a hyphen delimiter using sed while retaining the delimiter:
$ sed 's/[^-]*-/-/' logfile
- INFO - Everything is running OK
- WARNING - A potential issue was detected
- ERROR - Something went wrong
The sed command performs a substitution on each line of the file by replacing all characters up to and including the first delimiter with the delimiter itself. To match non-hyphen characters, the character class [^-] is used, which starts with a caret symbol (^) to indicate negation. The * following the character class means that the pattern should match zero or more non-hyphen characters.
As a result, the pattern being matched is composed of zero or more non-hyphen characters followed by a hyphen. If sed finds this pattern, it replaces it with a single hyphen.
If we wish to remove text prior to the last occurrence of the delimiter, we can modify our sed expression:
$ sed 's/.*-/-/' logfile
- Everything is running OK
- A potential issue was detected
- Something went wrong
This replaces all characters up to and including the last hyphen with a single hyphen. The approach relies on the fact that regex quantifiers such as * are greedy in sed: .* matches the longest possible run of characters that still allows the rest of the pattern, the final hyphen, to match.
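To compare the two sed expressions side by side, here is a minimal sketch that applies both to a single sample line and also checks what happens when a line contains no hyphen at all. The variable names (line, first, last, none) are illustrative and not part of the original example:

```shell
# Sample line mirroring the entries in logfile.
line='2023.05.01 12:24:56 - INFO - Everything is running OK'

# [^-]* cannot cross a hyphen, so the match ends at the FIRST hyphen.
first=$(printf '%s\n' "$line" | sed 's/[^-]*-/-/')

# .* is greedy, so the match extends to the LAST hyphen.
last=$(printf '%s\n' "$line" | sed 's/.*-/-/')

# With no hyphen on the line, the substitution doesn't apply
# and sed prints the line unchanged.
none=$(printf '%s\n' 'no delimiter here' | sed 's/[^-]*-/-/')

printf '%s\n' "$first" "$last" "$none"
```

Lines that lack the delimiter pass through untouched, which is usually the desired behavior for mixed input.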
4. Using cut
cut is a command line utility that we can employ to extract columns from text files.
We can use cut to delete text before a hyphen delimiter by specifying which fields to extract:
$ cut -d '-' -f 2- logfile | sed 's/.*/-&/'
- INFO - Everything is running OK
- WARNING - A potential issue was detected
- ERROR - Something went wrong
The cut command splits each line of the file at the delimiter and prints everything from the second field onward. The -d option specifies the delimiter to use, while the -f option specifies the fields to extract.
If we use the cut command by itself, the output won’t display the delimiter before the second field. To overcome this, we pipe the output into sed, which replaces the entire line with a new one that starts with a hyphen followed by the original line.
Now, if we want to eliminate text preceding the last occurrence of the delimiter, we can use the cut command in combination with rev:
$ rev logfile | cut -d '-' -f 1 | rev | sed 's/.*/-&/'
- Everything is running OK
- A potential issue was detected
- Something went wrong
The rev command reverses the character sequence in each line of the log file. Then, we use cut to extract the first field and pipe the output into rev again. In effect, this process extracts the final field of the original line. Finally, we pipe the result into sed to prepend the hyphen delimiter, as we did previously.
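One detail worth checking with the cut approach is how it treats lines that don't contain the delimiter. By default, cut prints such lines unchanged, so the sed step would prepend a hyphen to them. The sketch below, using made-up sample data, contrasts the default behavior with the -s option, which suppresses lines that lack the delimiter:

```shell
# Illustrative input: one line with hyphens, one without.
data=$(printf '%s\n' 'a - b - c' 'no delimiter here')

# Default: the line without a hyphen is passed through by cut,
# and sed then prepends a hyphen to it.
plain=$(printf '%s\n' "$data" | cut -d '-' -f 2- | sed 's/.*/-&/')

# With -s, cut drops lines that contain no delimiter.
strict=$(printf '%s\n' "$data" | cut -s -d '-' -f 2- | sed 's/.*/-&/')

printf 'default:\n%s\n' "$plain"
printf 'with -s:\n%s\n' "$strict"
```

Whether to drop or keep such lines depends on the data; if they must be kept unmodified, the sed-only approach is likely a better fit.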
5. Using grep
We can also use grep to extract the text we need from the log file:
$ grep -o '\-.*' logfile
- INFO - Everything is running OK
- WARNING - A potential issue was detected
- ERROR - Something went wrong
The -o option prints only the matched part of each line instead of the entire line. The pattern in this case consists of a hyphen followed by zero or more characters. We escape the hyphen with a backslash so that grep doesn’t interpret the pattern as a command-line option. Since grep reports the leftmost match and .* extends it to the end of the line, we extract all the text starting from the first hyphen in each line.
On the other hand, if our objective is to remove text before the last delimiter, we can anchor the pattern to the end of the line:
$ grep -o '\-[^-]*$' logfile
- Everything is running OK
- A potential issue was detected
- Something went wrong
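In this case, the pattern we want to match consists of a hyphen, zero or more non-hyphen characters, and the end of a line, indicated by the $ symbol. Because no other hyphen may appear between the matched hyphen and the end of the line, the match can only start at the last hyphen.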
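As an alternative to escaping the leading hyphen, we can pass the pattern through the -e option, which tells grep that the next argument is a pattern rather than an option. This may be preferable because some recent GNU grep releases may print a warning about a stray backslash for \-. The following is a minimal sketch on a single sample line; the variable names are illustrative:

```shell
# Sample line mirroring the entries in logfile.
line='2023.05.02 01:52:45 - WARNING - A potential issue was detected'

# -e marks the next argument as the pattern, so the leading hyphen
# needs no backslash. This keeps everything from the first hyphen on.
first=$(printf '%s\n' "$line" | grep -o -e '-.*')

# Same idea, anchored to the end of the line: keeps everything
# from the last hyphen on.
last=$(printf '%s\n' "$line" | grep -o -e '-[^-]*$')

printf '%s\n' "$first" "$last"
```

Note that -o isn't specified by POSIX, although GNU and BSD grep both provide it.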
6. Using awk
awk is another powerful text-processing tool available in Unix-based systems. We can use awk to delete text before a hyphen delimiter:
$ awk -F '-' '{for (i = 2; i <= NF; i++) {printf "-%s", $i}; printf "\n"}' logfile
- INFO - Everything is running OK
- WARNING - A potential issue was detected
- ERROR - Something went wrong
The -F option specifies the input field separator, or delimiter. The awk program employs a for loop to iterate over the fields from the second one up to the last one and print each of them. The upper bound of the loop is the built-in awk variable NF, which holds the total number of fields in the line. Furthermore, the printf statement prints a hyphen before each field by using the “-%s” format. After processing all fields, awk prints a newline character.
Alternatively, we can use awk to substitute all the text up to and including the first hyphen with a single hyphen:
$ awk -F '-' '{sub(/[^-]*-/, "-"); print $0}' logfile
- INFO - Everything is running OK
- WARNING - A potential issue was detected
- ERROR - Something went wrong
For each line in the log file, the sub() function replaces the first match of the pattern we previously used with sed with a single hyphen, and then we print the modified line using the print $0 statement. Since sub() operates on the whole line here, the -F option doesn’t affect the result.
Now, if we wish to delete text preceding the last delimiter, we can modify the regex employed in awk in the same way as we did with sed:
$ awk -F '-' '{sub(/.*-/, "-"); print $0}' logfile
- Everything is running OK
- A potential issue was detected
- Something went wrong
Because regex matching in awk is greedy, .* extends as far as possible, so the match covers all characters up to and including the last delimiter.
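When the delimiter contains characters that are special in a regex, such as a dot or a pipe, a regex-free variant can be easier to get right. The sketch below uses awk’s index() and substr() functions to treat the delimiter as a literal string. The function name strip_before_literal is hypothetical, and we assume the delimiter contains no backslashes, since awk interprets escape sequences in values passed with -v:

```shell
# Hypothetical helper: reads lines on stdin and removes the text before
# the first occurrence of the literal delimiter given as $1.
strip_before_literal() {
    awk -v d="$1" '{
        i = index($0, d)                 # 1-based position of d, or 0 if absent
        print (i ? substr($0, i) : $0)   # keep the delimiter; pass through if absent
    }'
}

printf '%s\n' '2023.05.03 04:16:23 - ERROR - Something went wrong' | strip_before_literal '-'
printf '%s\n' 'a|b|c' | strip_before_literal '|'
printf '%s\n' 'a.b.c' | strip_before_literal '.'
```

Because index() performs a plain substring search, no escaping of the delimiter is needed.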
7. Using Shell Parameter Expansion
We can also use Bash’s built-in string manipulation features. In particular, some forms of parameter expansion enable us to remove a pattern from the beginning of a string:
$ while IFS= read -r line; do
> echo "-${line#*-}"
> done < logfile
- INFO - Everything is running OK
- WARNING - A potential issue was detected
- ERROR - Something went wrong
Here, we use a while loop to process each line of the log file. The loop echoes a single hyphen followed by ${line#*-}, which removes all characters up to and including the first hyphen of the line. This effectively replaces all text up to and including the first hyphen with a single hyphen.
We can adapt the solution to delete text before the last delimiter:
$ while IFS= read -r line; do
> echo "-${line##*-}"
> done < logfile
- Everything is running OK
- A potential issue was detected
- Something went wrong
The only modification required is the use of a double # in place of a single #, as shown in the ${line##*-} expression. While # removes the shortest prefix that matches the pattern, ## removes the longest one, so everything up to and including the last hyphen is deleted.
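The loops above prepend a hyphen to every line, including lines that contain no hyphen at all. A slightly more defensive version might look like the following sketch. The function name strip_prefix_loop is hypothetical, and the handling of lines without a delimiter (printing them unchanged) is an assumption about the desired behavior:

```shell
# Hypothetical helper: reads lines on stdin and removes the text before
# the first hyphen, leaving lines without a hyphen untouched.
strip_prefix_loop() {
    # IFS= preserves leading and trailing whitespace; the extra test
    # handles a final line that lacks a trailing newline.
    while IFS= read -r line || [ -n "$line" ]; do
        case $line in
            *-*) printf '%s\n' "-${line#*-}" ;;   # hyphen present: strip the prefix
            *)   printf '%s\n' "$line" ;;         # no hyphen: print as-is
        esac
    done
}

printf '%s\n' '2023 - INFO - ok' '  indented, no delimiter' | strip_prefix_loop
```

Using printf with a fixed format instead of echo avoids surprises when a line happens to look like an echo option. Replacing # with ## in the expansion gives the last-delimiter variant.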
8. Conclusion
In this article, we learned several ways to delete text before a delimiter from the command line in Bash. The methods included the use of sed, cut, grep, awk, and string manipulation via parameter expansion. We can use these powerful tools for a wide range of text-processing tasks.