1. Introduction
When dealing with numerical data in Bash scripts, it’s often necessary to format numbers for better readability. One common formatting requirement is to add a thousands separator to large numbers.
In this tutorial, we’ll explore how to add a thousands separator to numbers in Bash.
First, we’ll discuss the sed command to edit and format numbers. After that, we’ll use printf to achieve the same. Next, we’ll examine the perl approach for adding a thousands separator in Bash. Lastly, we’ll see how we can use the awk command to add a separator within numbers.
2. Using sed
To begin with, we can use sed to add a thousands separator.
The sed command enables us to perform text transformations using regular expressions. We can use it to insert commas as a thousands separator within a number:
$ echo "123456789" | sed ':a;s/\B[0-9]\{3\}\>/,&/;ta'
123,456,789
In the above command, we pipe the output of the echo command to sed, which employs a loop structure:
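- :a defines a label named a that the loop can jump back to
- ; separates sed commands, and s starts the substitute command
- \B matches a position that isn’t a word boundary, so the three digits can’t be at the start of the number
- [0-9]\{3\} matches exactly three digits
- \> matches the end of a word, so the three digits must be followed by a non-word character, such as a comma, or by the end of the line
- ,& replaces the match with a comma followed by the matched text, since & stands for the whole match
- ta branches back to label a if the substitution succeeded, effectively creating a loop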
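More precisely, t branches to label a if an s command has made a successful substitution since the last input line was read or the last t branch was taken. In that case, the script jumps back and tries the substitution again. Otherwise, execution continues past ta, which here ends the script for the current line.
In summary, each pass inserts a comma before the leftmost group of three digits that doesn’t start the number and is followed by the end of the number or by a comma. Since only the rightmost ungrouped digits satisfy this, the loop effectively adds commas from right to left until no such group remains.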
Once the loop finishes, the output is 123,456,789, with a comma between each group of three digits.
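Since \B and \> treat a dot as a word boundary, the same one-liner also groups the digits after a decimal point, so 1234.5678 would become 1,234.5,678. A minimal sketch that avoids this, assuming GNU sed and a plain number stored in a variable, splits off the fractional part with parameter expansion and only passes the integer part through sed. The function name add_commas is hypothetical:

```shell
#!/usr/bin/env bash
# Hypothetical helper: group only the integer part of a number.
# Assumes GNU sed, which supports \B and \> in basic regular expressions.
add_commas() {
  local num=$1 int frac=""
  int=${num%%.*}                # everything before the first dot
  if [[ $num == *.* ]]; then
    frac=.${num#*.}             # the dot and everything after it, unchanged
  fi
  int=$(printf '%s\n' "$int" | sed ':a;s/\B[0-9]\{3\}\>/,&/;ta')
  printf '%s%s\n' "$int" "$frac"
}

add_commas 123456789        # 123,456,789
add_commas 1234567.891011   # 1,234,567.891011
add_commas -9876543         # -9,876,543
```

The sign needs no special handling here, since \B never matches between the minus sign and the first digit.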
3. Using printf
Another approach for adding separators is to use the built-in printf command to manipulate numbers.
The printf command in Bash enables us to format output strings. Hence, we can use it to add a thousands separator to a number directly.
To add a thousands separator to a number, we use the ' (apostrophe) flag:
$ printf "%'.f\n" 123456789
123,456,789
In the above code, the ' flag tells printf to group the integer part of the number with the locale’s thousands separator, while %.f formats the argument as a floating-point number with zero digits after the decimal point. Notably, we use the %f conversion instead of %d, as %d may not work in some shell variations.
Furthermore, printf takes the thousands separator from the current locale, so depending on the environment it may be a comma, a dot, a space, some other character, or nothing at all.
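To force commas as the thousands separator, we can set LC_NUMERIC to a locale that uses them, such as en_US.UTF-8, provided that locale is installed and LC_ALL isn’t set, since LC_ALL overrides LC_NUMERIC:
$ export LC_NUMERIC="en_US.UTF-8"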
$ printf "%'.f\n" 123456789
123,456,789
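To keep the locale change from leaking into the rest of a script, we can confine it to a command substitution, which runs in a subshell. This sketch assumes the en_US.UTF-8 locale is installed (locale -a lists the available ones) and sets LC_ALL, which takes precedence over LC_NUMERIC. If the locale is missing, Bash prints a warning and the previous locale stays in effect:

```shell
#!/usr/bin/env bash
# Apply a specific locale only inside the command substitution (a subshell).
# Assumption: the en_US.UTF-8 locale exists on this system.
grouped=$(
  export LC_ALL=en_US.UTF-8   # LC_ALL overrides LC_NUMERIC and LANG
  printf "%'.f" 123456789
)
echo "$grouped"               # 123,456,789 when the locale is available
```

The parent shell’s locale is untouched once the subshell exits, so later commands in the script behave as before.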
Additionally, with the %d conversion, a leading zero makes printf interpret the number as an octal value, which can lead to unexpected output. For instance, if we add a 0 before the digits of 123456, we get 42,798, the decimal value of octal 123456:
$ num="0123456"; result=$(printf "%'d" "$num"); echo "$result"
42,798
Hence, we don’t get the expected output of 123,456.
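One way to sidestep the octal interpretation is Bash’s base#number arithmetic syntax, where 10#0123456 is read as decimal 123456. A minimal sketch, assuming the input is a non-empty, unsigned string of decimal digits (the function name group_decimal is hypothetical):

```shell
#!/usr/bin/env bash
# Hypothetical helper: group an unsigned decimal integer that may have
# leading zeros. The 10# prefix forces base 10, so 0123456 is 123456,
# not the octal value 42798.
group_decimal() {
  local n=$(( 10#$1 ))
  # The separator itself still comes from the current locale.
  printf "%'d\n" "$n"
}

group_decimal 0123456   # 123,456 in an en_US locale
```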
4. Using perl
An alternative way to add a thousands separator is to use the perl command.
Perl is a powerful scripting language that provides extensive text manipulation capabilities. In particular, we can use its regular expression substitution to insert commas as thousands separators.
For instance, to add separators to 123456789, we combine perl with a here-document:
$ cat <<'EOF' |
123456789
EOF
perl -wpe '1 while s/(\d+)(\d\d\d)/$1,$2/;'
123,456,789
In the above command, the here-document supplies the input, which makes it easy to feed several lines at once. Next, the perl interpreter applies the script to each input line: -p loops over the input and prints each line after processing, -e supplies the script inline, and -w enables warnings. The script itself is a loop, since 1 while s/.../.../; keeps re-running the substitution until it no longer matches.
Furthermore, the substitution s/(\d+)(\d\d\d)/$1,$2/; looks for one or more digits (\d+) followed by exactly three digits (\d\d\d). Since \d+ is greedy, the three digits are the last three of the run. When it finds a match, it replaces it with the first group of digits, a comma, and then the second group of digits, which adds a thousands separator to the number.
In summary, we accomplish our goal by splitting each run of digits into two groups: the right-hand group with exactly three digits, and the left-hand group with at least one digit. The command then replaces the entire match with these two groups separated by a comma, and repeats until the match fails.
5. Using awk
Similarly, awk provides powerful text processing capabilities. We can use it to insert a comma before each group of three digits, counting from the end of the number.
5.1. Using gawk
First, let’s implement a solution using GNU awk:
$ echo not number 12345678900 12345 |
gawk '{while (match($0, /(^|[^.0-9])[0-9]{4,}/))
$0 = substr($0, 1, RSTART+RLENGTH-4) "," substr($0, RSTART+RLENGTH-3);
print}'
not number 12,345,678,900 12,345
In the above code, we use gawk to process the numbers. In particular, a while loop keeps searching for a run of four or more consecutive digits that is either at the start of the line or preceded by a character other than a digit or a dot. Each pass inserts one comma into the leftmost such run, and the loop ends when no run of four or more digits remains.
Within the loop, the match function locates the run and sets RSTART and RLENGTH, and the substr function then splits the line around the last three digits of the match:
- $0 represents the entire input line processed by awk
- substr($0, start, length) extracts the substring that begins at position start and is length characters long, and the first call starts at 1, the beginning of the line
- the first length is RSTART + RLENGTH - 4, where RSTART holds the starting position of the most recent match and RLENGTH holds its length
- "," is the literal string concatenated between the two substrings
Since RSTART + RLENGTH is the position just after the match, the first substring runs from the start of the line up to, but not including, the last three digits of the match. For the second substring, RSTART + RLENGTH - 3 is the position of the first of those three digits, and because we omit the length argument, substr returns everything from there to the end of the line.
Concatenating the two substrings with a comma between them rebuilds the line with a comma inserted just before the last three digits of the matched run. The loop then repeats on the updated line until no run of four or more digits is left. Lastly, we print the output on the terminal.
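To see the loop at work, we can print $0 after every pass. This sketch keeps the same match and substr arithmetic but writes the {4,} interval as [0-9][0-9][0-9][0-9]+, so it also runs under awk implementations without interval support. The function name trace_commas and the sample input are illustrative:

```shell
#!/usr/bin/env bash
# Hypothetical tracing helper: same loop as above, printing each pass.
trace_commas() {
  awk '{
    while (match($0, /(^|[^.0-9])[0-9][0-9][0-9][0-9]+/)) {
      $0 = substr($0, 1, RSTART+RLENGTH-4) "," substr($0, RSTART+RLENGTH-3)
      print "pass:", $0
    }
  }'
}

echo 'x 1234567' | trace_commas
# pass: x 1234,567
# pass: x 1,234,567
```

Each pass inserts exactly one comma, working from the right end of the run of digits toward the left.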
5.2. Using mawk
Notably, some awk implementations, such as older mawk releases, lack support for interval expressions like {4,}. In that case, we can write the interval out as four bracket expressions followed by +, which matches the same runs of four or more digits:
$ echo not number 1234567890 1234 | awk '{while (match($0, /(^|[^.0-9])[0-9][0-9][0-9][0-9]+/))
$0 = substr($0, 1, RSTART+RLENGTH-4) "," substr($0, RSTART+RLENGTH-3);
print}'
not number 1,234,567,890 1,234
In summary, this approach ensures that awk uses commas as a thousands separator uniformly, irrespective of the locale’s thousands_sep setting. Additionally, it avoids adding a separator after the decimal point in numbers such as 1.12345.
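Since the loop operates on a string, we can also move it into a user-defined awk function and apply it to a single field, for example a column of amounts in whitespace-separated output. A sketch, where the function name commify, the wrapper format_column2, and the sample data are hypothetical:

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: group the digits of the second field only.
format_column2() {
  awk '
    function commify(s) {
      while (match(s, /(^|[^.0-9])[0-9][0-9][0-9][0-9]+/))
        s = substr(s, 1, RSTART+RLENGTH-4) "," substr(s, RSTART+RLENGTH-3)
      return s
    }
    { $2 = commify($2); print }
  '
}

printf '%s\n' 'alice 1234567' 'bob 89012' | format_column2
# alice 1,234,567
# bob 89,012
```

Assigning to $2 makes awk rebuild the line with the default output field separator, a single space, so runs of multiple spaces between columns would be collapsed.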
6. Conclusion
In this article, we explored several methods to add a thousands separator to numbers in Bash scripts.
Initially, we explored how to edit numbers using the sed command. Subsequently, we demonstrated how to accomplish the same task using printf. Following that, we discussed the Perl approach that adds a thousands separator in Bash. Finally, we saw two versions of the awk command for adding separators in numbers.