1. Introduction

When working on the Linux command line, we often need to split strings on separators, for example when post-processing logs. The separator can be ‘-’ in a string representation of a date or ‘/’ in a file path. Other examples of separators are ‘:’, ‘|’, and ‘@’.

In this tutorial, we’ll discuss various techniques to split a string and extract a part of it. We’ll specifically use the date command to generate example input strings and use Bash, sed, awk, cut, grep, and Python to extract the interesting part of the string.

2. Understanding the date Command

To begin with, let’s explore the date command:

$ date "+%Y-%m-%d"
2023-12-07
$ date "+%Y/%m/%d"
2023/12/07
$ date "+%Y-%m-%d_%H:%M:%S"
2023-12-07_22:07:14

Here, we use the date command with different format strings to print the date and time in a specified format. The example format strings are:

  • +%Y-%m-%d: prints the date in year-month-day format
  • +%Y/%m/%d: prints the date in year/month/day format
  • +%Y-%m-%d_%H:%M:%S: prints the date and time in year-month-day_hh:min:sec format

We’ll use the above command to generate the string and then extract various parts of the string.

3. Extract the Suffix of the String

We’ll extract the day of the month from today’s date using the output of the date command.

3.1. cut Command

We split the string using the cut command:

$ date "+%Y-%m-%d" | cut -f3 -d '-' 
07

As shown above, the cut command has a couple of options:

  • -d ‘-’ specifies the ‘-’ character as the delimiter that separates the fields of each input line
  • -f3 selects the third field, i.e., the day in the given date

In summary, we used the cut command to parse the string and extract its last part.
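The same options work for any field position. The following is a minimal sketch that uses a fixed sample string (an assumption, chosen so that the output is reproducible) instead of the live output of date:

```shell
# Fixed sample input (assumed for reproducibility) instead of calling date
sample="2023-12-07"

# Select a single field by its position
year=$(printf '%s\n' "$sample" | cut -d '-' -f1)
day=$(printf '%s\n' "$sample" | cut -d '-' -f3)

# Select several fields at once; cut joins them with the same delimiter
year_day=$(printf '%s\n' "$sample" | cut -d '-' -f1,3)

printf '%s\n' "$year" "$day" "$year_day"
# prints:
# 2023
# 07
# 2023-07
```

Note that cut addresses fields by position, so we need to know in advance that the day is the third field.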

3.2. awk Script

We’ll extract the interesting part of the string using the awk command:

$ date "+%Y-%m-%d" | awk -F- '{print $NF}'
07

Here, the awk command has a couple of options:

  • -F- specifies the ‘-’ character as the field separator for the input line
  • print $NF prints the last field in the given line

To emphasize, we piped the output of the date command to the awk script. The awk script prints the last field.
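Because $NF always refers to the last field, the same script works without knowing how many fields a line has. As a rough sketch, with a hypothetical sample path and a fixed timestamp chosen only for illustration:

```shell
# Fixed sample inputs (assumed for illustration)
path="/var/log/syslog"
stamp="2023-12-07_22:07:14"

# Last component of a path, using '/' as the field separator
base=$(printf '%s\n' "$path" | awk -F/ '{print $NF}')

# A multi-character -F value is treated as a regular expression,
# so '[_:]' splits on either '_' or ':'
seconds=$(printf '%s\n' "$stamp" | awk -F'[_:]' '{print $NF}')

# $(NF-1) is the field just before the last one
minutes=$(printf '%s\n' "$stamp" | awk -F'[_:]' '{print $(NF-1)}')

printf '%s\n' "$base" "$seconds" "$minutes"
# prints:
# syslog
# 14
# 07
```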

3.3. sed Script

We use a regular expression to extract the useful part of the string using the sed command:

$ date "+%Y_%m_%d" | sed 's/.*_//'
07

In the above example, the sed command uses the substitution s/.*_// to delete everything up to and including the last ‘_’ separator, which leaves only the suffix of the string. Because .* is greedy, the match extends to the last underscore rather than the first.

The key idea is the use of a regular expression to remove the unwanted part of the string.
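To illustrate how the position of .* in the pattern decides which part survives, here is a small sketch on a fixed sample string (assumed for reproducibility):

```shell
# Fixed sample input (assumed for reproducibility)
sample="2023_12_07"

# .* is greedy, so '.*_' matches up to the LAST underscore
suffix=$(printf '%s\n' "$sample" | sed 's/.*_//')

# '_.*' starts at the FIRST underscore and runs to the end of the line
prefix=$(printf '%s\n' "$sample" | sed 's/_.*//')

printf '%s\n' "$suffix" "$prefix"
# prints:
# 07
# 2023
```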

3.4. Bash Parameter Expansion

We use native parameter expansion support in Bash:

$ DATE=$(date "+%Y-%m-%d")
$ echo $DATE
2023-12-07
$ printf "%s\n" "${DATE##*-}"
07

Here, we use the printf command with the following arguments:

  • "%s\n" specifies the output format as a string followed by a newline
  • ${DATE##*-} removes the longest prefix of DATE that matches the pattern *-, i.e., everything up to and including the last ‘-’

In summary, we used the parameter expansion feature in Bash to extract the part of the string.
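The ## operator has three siblings that trim from the other end or trim the shortest match instead. A brief sketch on a fixed sample value (assumed for reproducibility); the variable names other than DATE are introduced here for illustration:

```shell
# Fixed sample input (assumed for reproducibility)
DATE="2023-12-07"

after_first=${DATE#*-}    # shortest prefix matching '*-' removed -> 12-07
after_last=${DATE##*-}    # longest prefix matching '*-' removed  -> 07
before_last=${DATE%-*}    # shortest suffix matching '-*' removed -> 2023-12
before_first=${DATE%%-*}  # longest suffix matching '-*' removed  -> 2023

printf '%s\n' "$after_first" "$after_last" "$before_last" "$before_first"
```

Since these expansions run inside the shell itself, they don't start an external process.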

3.5. Python

Let’s use a Python script to split a string:

$ DATE=$(date "+%Y-%m-%d")
$ echo $DATE
2023-12-07
$ cat split.py
#!/usr/bin/env python3
import sys
print(sys.argv[1].split("-")[-1])
$ ./split.py "$DATE"
07

As illustrated above, the Python script does the following:

  • sys.argv[1] reads the first command line argument
  • split("-") splits the argument using ‘-’ as a separator and returns a list
  • [-1] selects the last item of the list

Because Python is a general-purpose scripting language, this approach extends easily to more involved string processing.

3.6. Bash Substring

We use a native substring operator in Bash to extract the day of the month:

$ DATE=$(date "+%Y-%m-%d")
$ echo $DATE
2023-12-07

First, we list the byte offset of every hyphen ‘-’ character in the string:

$ date "+%Y-%m-%d" |
grep -ob "-"
4:-
7:-

Next, we extract the offset of the last hyphen from the output of the grep command:

$ date "+%Y-%m-%d" |
grep -ob "-" |
grep -oE "[0-9]+" |
tail -1
7

In the above example, the sequence of commands uses the following options:

  • grep -ob "-" prints each hyphen it finds, prefixed with its zero-based byte offset
  • grep -oE "[0-9]+" keeps only the numeric offset from each line
  • tail -1 selects the last offset, i.e., the position of the last hyphen

Then, we calculate the index of the character after the last hyphen and extract the suffix:

$ DATE=$(date "+%Y-%m-%d")
$ idx=$(echo "$DATE" | grep -ob "-" | grep -oE "[0-9]+" | tail -1)
$ let "idx = $idx + 1"
$ printf "%s\n" "${DATE:$idx}"
07

As we see, the let command calculates the index of the character after the hyphen. Finally, we print the suffix of the string using substring expansion in Bash.

In summary, we used a sequence of commands in Bash to extract the suffix of a string.
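The index can also be computed without external commands. The sketch below is an alternative to the grep pipeline above, not a replacement for it: it derives the index from the length of the text before the last hyphen. It assumes Bash, since substring expansion isn't part of POSIX sh, and the prefix and suffix variable names are introduced here for illustration:

```shell
# Fixed sample input (assumed for reproducibility)
DATE="2023-12-07"

# Everything before the last hyphen
prefix=${DATE%-*}

# The suffix starts one character past the end of that prefix
idx=$(( ${#prefix} + 1 ))

# Substring expansion (a Bash feature, not POSIX sh)
suffix=${DATE:idx}

printf '%s\n' "$idx" "$suffix"
# prints:
# 8
# 07
```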

4. Conclusion

In this article, we discussed multiple ways to extract the last part of a string after a separator such as a hyphen. To begin with, we used the cut, awk, and sed commands, Bash parameter expansion, and a combination of grep and Bash substring expansion. We also used Python, a general-purpose scripting language.

The examples shown above can be adapted to any other separator, not just a hyphen.
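As one possible way to make the separator configurable, the parameter expansion approach can be wrapped in a small function. The name last_field and the sample inputs are hypothetical and serve only as an illustration:

```shell
# last_field is a hypothetical helper name, not a standard command.
# Usage: last_field STRING SEPARATOR
last_field() {
    # Quoting "$2" makes the separator match literally instead of as a glob
    printf '%s\n' "${1##*"$2"}"
}

last_field "/var/log/syslog" "/"     # prints syslog
last_field "user@example.com" "@"    # prints example.com
last_field "a|b|c" "|"               # prints c
```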