1. Overview

Shell scripting is a powerful way to automate tasks and manipulate data in Linux systems. One of the common operations that shell scripts perform is string splitting.

String splitting is the process of breaking a string into smaller pieces based on a specified delimiter. A delimiter, in turn, is a character or a sequence of characters that marks the boundary between segments.

In this tutorial, we’ll explore some practical methods and examples of splitting a string on multiple delimiters in the shell.

2. Sample String Splitting

Let’s say we have a string like name:abdul;age:25;gender:Male. We can split it by the delimiter ; to get three segments:

  • name:abdul
  • age:25
  • gender:Male

However, what if we have a string that contains multiple delimiters, such as first_name:abdul,second_name:arar;age:25;gender:Male? In this case, we want to split the string by both ; and , to get four segments:

  • first_name:abdul
  • second_name:arar
  • age:25
  • gender:Male

How can we achieve this in shell scripting?

In the following sections, we’ll discuss some of the tools we can use to split the string with multiple delimiters.

3. Using awk

awk is a programming language designed for text processing and manipulation. It can perform various operations on text files:

  • filtering
  • formatting
  • transforming
  • generating

One of the basic features of awk is string splitting, which can be done using the split() function. The function takes a string, an array, and a delimiter pattern as arguments. It splits the string into array elements wherever the fieldsep pattern matches and returns the number of elements:

split(string, array, fieldsep)

For instance, let’s split the string first_name:abdul,second_name:arar;age:25;gender:Male by the delimiters ; and , using awk:

$ awk '{
  n=split($0, a, /[,;]/);
  for (i=1; i<=n; i++) {
    print a[i];
  }
}' <<< "first_name:abdul,second_name:arar;age:25;gender:Male"
first_name:abdul
second_name:arar
age:25
gender:Male

Now, let’s break down the command and see how it works:

  • <<< operator (a Bash here-string): passes the string to awk on standard input
  • $0 variable: refers to the entire current input line
  • split() function: splits the string by the delimiter pattern /[,;]/, stores the resulting segments in the array a, and returns the number of segments, which we keep in the variable n
  • for loop: iterates over the array a from the first element (i=1) to the last element (i<=n) and prints each element

As we can see, awk can split the string by multiple delimiters using regular expressions and arrays.
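
As an alternative sketch, awk can also take the same regular expression as its field separator through the -F option, so the segments land in the fields $1 through $NF instead of an array. The function name split_fields below is hypothetical and chosen only for illustration:

```shell
#!/usr/bin/env bash

# split_fields is a hypothetical helper name used only for this sketch.
# -F'[,;]' tells awk to split every input line into fields on , or ;
split_fields() {
  printf '%s\n' "$1" | awk -F'[,;]' '{
    # NF holds the number of fields found on the current line
    for (i = 1; i <= NF; i++) {
      print $i
    }
  }'
}

split_fields "first_name:abdul,second_name:arar;age:25;gender:Male"
```

This should print the same four segments as the split() version. Which form to prefer is largely a matter of taste: -F is shorter when the whole line is the string to split, while split() also works on a single field or variable.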

4. Using sed

sed is a tool that can perform various text-processing tasks:

  • searching
  • replacing
  • inserting
  • deleting
  • transforming

sed operates on a stream of text, applying a series of commands to each line of input. One of these commands is s, which stands for substitute. It replaces text that matches a regular expression pattern with another string:

s/pattern/replacement/flags

For instance, we can replace all the occurrences of the word hello with hi in the input:

$ sed 's/hello/hi/g' <<< "hello world, hello everyone"
hi world, hi everyone

In this code snippet, the g flag indicates that the substitution should be done globally, i.e., for all the matches in the input.

Now, how can we use sed to split a string by multiple delimiters? The trick is to replace every delimiter with a single character that doesn’t appear in the string, such as a newline, which GNU sed lets us write as \n in the replacement. Then, each line of the output is a separate segment.

Using the same example we used for the awk command, let’s now use sed instead to split the string by multiple delimiters:

$ sed 's/[;,]/\n/g' <<< "first_name:abdul,second_name:arar;age:25;gender:Male"
first_name:abdul
second_name:arar
age:25
gender:Male

The s command substitutes the pattern [;,], which matches either a ; or a , character, with the replacement \n, which is a newline character.

As a result, the delimiters are replaced by newlines, and each segment is printed on a separate line. In effect, we normalize both delimiters to a single one, the newline, which is what enables us to split the string by multiple delimiters using sed.
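
To work with the segments rather than just print them, one option is to capture the sed output in a Bash array. The following is a minimal sketch, assuming Bash 4 or later for the mapfile builtin and GNU sed for the \n in the replacement. The variable names input and segments are illustrative:

```shell
#!/usr/bin/env bash

input="first_name:abdul,second_name:arar;age:25;gender:Male"

# Replace each ; or , with a newline, then store one output line per array element.
# The -t option strips the trailing newline from every element.
mapfile -t segments < <(sed 's/[;,]/\n/g' <<< "$input")

echo "Number of segments: ${#segments[@]}"
for segment in "${segments[@]}"; do
  echo "$segment"
done
```

With the segments in an array, we can index them individually, for example ${segments[0]} for the first one, instead of parsing the printed lines again.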

5. Customizing IFS

The IFS (Internal Field Separator) variable specifies the characters that Bash uses to split the results of unquoted expansions into words, as well as the input lines that the read builtin processes. By default, IFS contains the whitespace characters space, tab, and newline, which means Bash splits on these characters. Thus, to split a string on custom delimiters, we can temporarily modify IFS.
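
To see the effect of IFS in isolation, here’s a small sketch that counts the words produced by an unquoted expansion, first with the default IFS and then with a comma. The helper count_words and the variable csv are hypothetical names introduced for this illustration:

```shell
#!/usr/bin/env bash

# count_words is a hypothetical helper: it prints how many arguments it received
count_words() {
  echo "$#"
}

csv="a,b,c d"

# Default IFS (space, tab, newline): the unquoted expansion splits only
# on the space, giving the two words "a,b,c" and "d"
count_words $csv

# Run in a subshell so the IFS change doesn't leak into the rest of the script
(
  IFS=','
  # Now the same expansion splits on the commas instead,
  # giving the three words "a", "b", and "c d"
  count_words $csv
)
```

The script prints 2 and then 3, since the same string yields a different number of words depending on the current value of IFS.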

Let’s illustrate this with an example script:

$ cat delim_by_ifs.sh 
#!/usr/bin/env bash

input_string="first_name:abdul,second_name:arar;age:25;gender:Male"
# Save the current IFS value
old_ifs="$IFS"
# Set IFS to ';,'
IFS=';,'
# Read the string into an array
read -ra elements <<< "$input_string"
# Restore the old IFS value
IFS="$old_ifs"

# Loop through the elements
for element in "${elements[@]}"; do
    echo "$element"
done

Let’s break down the script:

  1. save the current value of IFS in the old_ifs variable to ensure we can restore it later
  2. set IFS to the custom delimiters ;, to split the string on both semicolon and comma
  3. use the read command to split input_string into an array named elements based on the modified IFS, where -r keeps backslashes literal and -a stores the words in an array
  4. restore the original IFS value to avoid affecting subsequent commands
  5. loop through the elements and print them

Then, we can run the script:

$ bash delim_by_ifs.sh 
first_name:abdul
second_name:arar
age:25
gender:Male

As we can see, Bash can split the string by multiple delimiters using the IFS variable and the array features.
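
As a variation on the script above, Bash also lets us prefix the read command with the IFS assignment. This limits the change to that single command, so there’s nothing to save and restore. Here’s a minimal sketch that reuses the same variable names:

```shell
#!/usr/bin/env bash

input_string="first_name:abdul,second_name:arar;age:25;gender:Male"

# The IFS assignment applies only to this read invocation;
# the shell's own IFS is left untouched
IFS=';,' read -ra elements <<< "$input_string"

# Loop through the elements
for element in "${elements[@]}"; do
    echo "$element"
done
```

The output should match that of delim_by_ifs.sh. Scoping the assignment this way also avoids leaving a modified IFS behind if the script exits or fails between the save and restore steps.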

6. Conclusion

String splitting is a useful skill that can help us manipulate and process data in various scenarios.

In this article, we’ve learned how to split a string on multiple delimiters in shell scripting. We’ve discussed three approaches that can help us achieve this: the awk split() function, substitution with sed, and a customized IFS.