1. Introduction
When using a Linux-based command line interface, we often need to parse strings based on separators, for example when post-processing logs. The separator can be ‘-’ in a string representation of a date or ‘/’ in a file path. Other examples of separators are ‘:’, ‘|’, and ‘@’.
In this tutorial, we’ll discuss various techniques to split a string and extract parts of it. We’ll use the date command to generate example input strings, and then use cut, awk, sed, Bash, Python, and grep to extract the interesting parts of the string.
2. Understanding the date Command
To begin with, let’s explore the date command:
$ date "+%Y-%m-%d"
2023-12-07
$ date "+%Y/%m/%d"
2023/12/07
$ date "+%Y-%m-%d_%H:%M:%S"
2023-12-07_22:07:14
Here, we use the date command with different format strings to print the date and time in a specified format. The example format strings are:
- +%Y-%m-%d: prints the date in year-month-day format
- +%Y/%m/%d: prints the date in year/month/day format
- +%Y-%m-%d_%H:%M:%S: prints the date and time in year-month-day_hh:min:sec format
We’ll use the above command to generate the string and then extract various parts of the string.
3. Extract the Suffix of the String
We’ll extract the day of the month from today’s date using the output of the date command.
3.1. cut Command
We split the string using the cut command:
$ date "+%Y-%m-%d" | cut -f3 -d '-'
07
As shown above, the cut command has a couple of options:
- -d ‘-’ specifies the ‘-’ character as the delimiter that separates the fields of the input
- -f3 specifies the third field, i.e., the day in the given date
In summary, we used the cut command to split the string and extract its last part. Note that we had to know that the day is the third field, since cut selects fields by position.
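As a minimal sketch of how the same options select other fields, the snippet below uses a fixed sample string instead of live date output so that the results are reproducible. The variable names are illustrative, and the rev utility (part of util-linux on most Linux systems) is assumed to be installed.

```shell
# Fixed sample input (an assumption for reproducibility; the article uses live date output).
sample="2023-12-07"

# Field 1 is the year.
year=$(printf '%s\n' "$sample" | cut -d '-' -f1)

# Fields 1 and 2; cut joins the selected fields with the same delimiter.
year_month=$(printf '%s\n' "$sample" | cut -d '-' -f1,2)

# Last field without knowing the field count:
# reverse the line, take the first field, and reverse it back.
day=$(printf '%s\n' "$sample" | rev | cut -d '-' -f1 | rev)

printf '%s\n' "$year" "$year_month" "$day"
```

The rev trick is one way to work around the fact that cut has no notation for "the last field"; it only makes sense for single-line input like the date strings used here.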
3.2. awk Script
We’ll extract the interesting part of the string using the awk command:
$ date "+%Y-%m-%d" | awk -F- '{print $NF}'
07
Here, the awk command has a couple of options:
- -F- specifies the ‘-’ character as the field separator for the input
- print $NF prints the last field in the given line
To emphasize, we piped the output of the date command to the awk script. The awk script prints the last field.
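Because awk treats a multi-character -F value as a regular expression, the same approach can split on several separators at once. The following is a rough sketch using a fixed sample string in the date-and-time format from Section 2; the variable names are illustrative only.

```shell
# Fixed sample input in the year-month-day_hh:min:sec format (assumed for reproducibility).
sample="2023-12-07_22:07:14"

# The bracket expression [-_:] splits on '-', '_' or ':'.
year=$(printf '%s\n' "$sample" | awk -F'[-_:]' '{print $1}')
hour=$(printf '%s\n' "$sample" | awk -F'[-_:]' '{print $4}')

# $NF is still the last field, and NF is the number of fields.
seconds=$(printf '%s\n' "$sample" | awk -F'[-_:]' '{print $NF}')
count=$(printf '%s\n' "$sample" | awk -F'[-_:]' '{print NF}')

printf '%s\n' "$year" "$hour" "$seconds" "$count"
```

Placing the hyphen first inside the brackets keeps it literal rather than letting it denote a character range.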
3.3. sed Script
We use a regular expression to extract the useful part of the string using the sed command:
$ date "+%Y_%m_%d" | sed 's/.*_//'
07
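In the above example, we format the date with ‘_’ as the separator. The sed command uses the substitution s/.*_// to delete everything up to and including the last ‘_’, which leaves only the suffix of the string. Because .* is greedy, it matches as far as the last separator rather than the first.
The key idea is the use of a regular expression to extract the appropriate part of the string.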
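To illustrate how the regular expression can target other parts of the string, here is a small sketch with a fixed, underscore-separated sample. It assumes a sed that supports the -E option for extended regular expressions (GNU and BSD sed both do); the variable names are illustrative.

```shell
# Fixed sample input using '_' as the separator (assumed for reproducibility).
sample="2023_12_07"

# Suffix: greedy .* consumes everything up to the last underscore.
day=$(printf '%s\n' "$sample" | sed 's/.*_//')

# Prefix: delete from the first underscore to the end of the line.
year=$(printf '%s\n' "$sample" | sed 's/_.*//')

# Middle field: capture the three parts and keep only the second group.
month=$(printf '%s\n' "$sample" | sed -E 's/^([^_]*)_([^_]*)_([^_]*)$/\2/')

printf '%s\n' "$year" "$month" "$day"
```

Using [^_]* instead of .* inside the groups keeps each group from spilling across a separator.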
3.4. Bash Parameter Expansion
We use native parameter expansion support in Bash:
$ DATE=$(date "+%Y-%m-%d")
$ echo $DATE
2023-12-07
$ printf "%s\n" "${DATE##*-}"
07
Here, we use the printf command with the following arguments:
- “%s\n” specifies the output format: a string followed by a newline
- ${DATE##*-} removes the longest prefix of DATE that matches the pattern *-, i.e., everything up to and including the last ‘-’
In summary, we used the parameter expansion feature in Bash to extract the part of the string.
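The ## operator has three siblings that cover the other cases. As a brief sketch with a fixed value of DATE (the other variable names are illustrative):

```shell
# Fixed sample value (assumed for reproducibility; the article uses live date output).
DATE="2023-12-07"

day="${DATE##*-}"        # longest matching prefix removed  -> 07
rest="${DATE#*-}"        # shortest matching prefix removed -> 12-07
year="${DATE%%-*}"       # longest matching suffix removed  -> 2023
year_month="${DATE%-*}"  # shortest matching suffix removed -> 2023-12

printf '%s\n' "$day" "$rest" "$year" "$year_month"
```

Since these expansions run inside the shell itself, no external process is started, which can matter when the extraction runs in a tight loop.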
3.5. Python
Let’s use a Python script to split a string:
$ DATE=$(date "+%Y-%m-%d")
$ echo $DATE
2023-12-07
$ cat split.py
#!/usr/bin/env python
import sys
print(sys.argv[1].split("-")[-1])
$ python3 split.py "$DATE"
07
As illustrated above, the Python script does the following:
- sys.argv[1] reads the command line parameter
- split(“-“) function splits the parameter using ‘-‘ as a separator
- [-1] expression reads the last item of the resulting list
Python is a general-purpose scripting language, so this approach is convenient when the extracted value needs further processing beyond a simple split.
3.6. Bash Substring
We use a native substring operator in Bash to extract the day of the month:
$ DATE=$(date "+%Y-%m-%d")
$ echo $DATE
2023-12-07
First, we find the byte offsets of the hyphen ‘-’ characters in the string:
$ date "+%Y-%m-%d" |
grep -ob "-"
4:-
7:-
Next, we extract the offset of the last hyphen from the output of the grep command:
$ date "+%Y-%m-%d" |
grep -ob "-" |
grep -oE "[0-9]+" |
tail -1
7
In the above example, the pipeline uses the following commands:
- grep -ob “-” prints the zero-based byte offset of every hyphen character, followed by the match itself
- grep -oE “[0-9]+” extracts the numeric offset from each line
- tail -1 keeps the last offset, i.e., the position of the last hyphen
Then, we calculate the index of the character after the last hyphen and extract the suffix:
$ DATE=$(date "+%Y-%m-%d")
$ idx=$(echo "$DATE" | grep -ob "-" | grep -oE "[0-9]+" | tail -1)
$ let "idx = $idx + 1"
$ printf "%s\n" "${DATE:$idx}"
07
As we see, the let command calculates the index of the character after the hyphen. Finally, we print the suffix of the string using substring expansion in Bash.
In summary, we used a sequence of commands in Bash to extract the suffix of a string.
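The index can also be computed without grep. The following sketch (a swapped-in alternative, not the pipeline shown above) derives it from the length of the prefix before the last hyphen. It requires Bash for the ${DATE:idx} substring expansion, and the variable names are illustrative.

```shell
# Fixed sample value (assumed for reproducibility).
DATE="2023-12-07"

# Everything before the last hyphen.
prefix="${DATE%-*}"

# The suffix starts one character after the prefix, i.e., right after the hyphen.
idx=$(( ${#prefix} + 1 ))

# Bash substring expansion: from offset idx to the end of the string.
suffix="${DATE:idx}"

printf '%s\n' "$idx" "$suffix"
```

If the string contains no hyphen, the prefix is the whole string and the resulting suffix is empty, so it may be worth checking for that case before relying on the result.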
4. Conclusion
In this article, we discussed multiple ways to extract the last part of a string after a separator. To begin with, we used the cut, awk, and sed commands, as well as Bash parameter expansion, to split the string and extract its parts. We also used Python, a general-purpose scripting language, and finally combined grep with Bash substring expansion.
The examples shown above can be modified to use any separator, not just a hyphen.
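As one possible way to generalize, the sketch below wraps the parameter expansion from Section 3.4 in a small function that takes the separator as an argument. The function name last_field and the sample inputs are hypothetical and chosen only for illustration.

```shell
# last_field STRING SEPARATOR  (hypothetical helper, not part of any standard tool)
# Prints the part of STRING after the last occurrence of SEPARATOR.
last_field() {
    # Quoting "$2" inside the pattern makes the separator match literally,
    # even if it is a glob character such as '*' or '?'.
    printf '%s\n' "${1##*"$2"}"
}

last_field "2023-12-07" "-"
last_field "/var/log/syslog" "/"
last_field "user@example.com" "@"
last_field "a|b|c" "|"
```

If the separator does not occur in the string, the function prints the string unchanged, so callers that need to distinguish that case would have to test for the separator first.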