1. Overview
In this article, we’ll look at how to emulate arrays (lists) in a POSIX shell, how this compares with bash arrays, and the quirks involved.
2. Bash vs. POSIX Shell Arrays
Compared to a basic POSIX-compliant shell, bash arrays are much more powerful and convenient to use. Let’s illustrate this by trying to create an indexed array and access its 3rd member.
In bash, we declare arrays with the (…) syntax:
$ array=(item1 item2 item3)
$ echo "${array[2]}"
item3
In a POSIX shell, we store the items as positional parameters using set:
$ set -- item1 item2 item3
$ echo "$3" # Positional parameters start from 1
item3
We can see that bash has a much cleaner syntax for arrays. It also supports any number of named arrays, whereas a POSIX shell offers only one list of positional parameters per script or function call.
3. Creating Arrays
A POSIX shell has no first-class support for arrays. However, we can use the list of positional parameters as an array.
Positional parameters are all the parameters passed to a shell script/function. For example, in my_function 1 2 3, the numbers following my_function are the positional parameters.
We set the array with the set built-in and access it via the special parameter $@, which expands to all the positional parameters, i.e., our array:
$ set -- 1 2 3
$ echo "$@"
1 2 3
The double dash (--) marks the end of options to set, so items that start with a dash, such as -x, aren’t treated as shell options.
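A minimal sketch of two details worth knowing: quoting keeps items with spaces intact, and the special parameter $# holds the number of positional parameters, i.e., the array’s length. The variable name count is only for illustration.

```shell
# Each quoted argument becomes exactly one positional parameter
set -- "first item" "second item" third

# $# is the number of positional parameters, i.e., the array length
count=$#
echo "Length: $count"

# Quoting "$@" expands each item as a separate word, spaces included
for item in "$@"; do
  echo "[$item]"
done
```

This prints Length: 3 followed by [first item], [second item], and [third]. Without the quotes around $@, the shell would split the items at their spaces and produce five words instead.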
4. Basic Array Operations
Now, let’s look at the various array operations we can perform.
4.1. Adding Items
For adding items, we simply pass the original array to set, along with the new items:
$ set -- 1 2 3
$ echo "$@" # Original array
1 2 3
$ set -- "$@" 4
$ echo "$@" # New array
1 2 3 4
Here “$@” expands to all of our original positional parameters, i.e., items of the array.
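The same pattern works at the front of the array: we just place the new items before "$@". A short sketch that prepends and then prepends and appends in one call:

```shell
set -- 2 3
# Prepend: new items go before "$@"
set -- 1 "$@"
# Prepend and append in a single call
set -- 0 "$@" 4
echo "$@" # 0 1 2 3 4
```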
4.2. Removing Items
We can easily remove multiple items from the start of an array, but arbitrary removal is tricky.
For removing elements, we use the shift built-in, passing it the number of items to remove:
$ set -- 1 2 3
$ shift 2 # Remove first 2 items
$ echo "$@"
3
There is no direct way of removing items at a given index, so let’s write a function for it:
# Argument 1: The index to remove (starting from 1)
# Remaining arguments: The array items
# Usage: set -- $(array_remove N "$@")
array_remove() {
  index="$1"
  shift # Remove the index from the argument list
  counter=1 # Array indexing starts from 1
  # Print the items before the index, one per line; "-lt" means "less than"
  while [ "$counter" -lt "$index" ]; do
    : $((counter+=1)) # Increment the counter
    printf '%s\n' "$1" # First item of the current array
    shift # Move to the next item
  done
  # Skip the item at the removal index; we've printed everything before it
  shift
  # Print the rest of the array, one item per line
  printf '%s\n' "$@"
}
Let’s test it:
$ set -- 1 2 3 4 5
$ set -- $(array_remove 4 "$@") # Remove the 4th item
$ echo "$@"
1 2 3 5
Here, we leave $(…) unquoted on purpose: the shell splits the function’s output back into separate items. As a result, this only works for items that contain no whitespace or wildcard characters such as *. It’s also slow, since every call runs in a subshell and rebuilds the whole array.
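If the items may contain whitespace, a common POSIX idiom avoids command substitution altogether: loop over the original items, shift each one off the front, and append it back to the end unless it sits at the removal index. This works because for expands its word list once, before the loop body modifies $@. Since a function can’t change its caller’s positional parameters, this sketch runs inline; index and i are illustrative variable names.

```shell
set -- a "b c" d e
index=2 # 1-based index of the item to remove

i=0
for item in "$@"; do # The list is expanded once, before the loop starts
  shift # Drop the current item from the front
  i=$((i + 1))
  # Re-append it at the back unless it's the one to remove
  if [ "$i" -ne "$index" ]; then
    set -- "$@" "$item"
  fi
done

printf '%s\n' "$@"
```

With index=2, this removes "b c" as a single item and prints a, d, and e on separate lines, in their original order. It still visits every item, so it’s linear in the array length.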
4.3. Indexing
We can access the item at index N with the parameter expansion ${N}:
$ set -- 3 2 1
$ echo "${3}"
1
We must wrap the number in curly braces for indices with more than one digit. Without them, the shell reads $98 as the value of $9 followed by the string “8”, whereas ${98} refers to the 98th positional parameter.
However, if we store the index in a shell variable, we need to resort to eval, which re-parses its argument as shell code (so the variable must contain only digits):
$ set -- one two three
$ index=3
$ eval "echo \${${index}}"
three
To avoid eval, we can write a function that takes the index followed by the array, skips N arguments, and prints the first remaining one:
# Argument 1: The index (starting from 1)
# Remaining arguments: The array items
array_index() {
  shift "$1" # Shift N arguments; the index argument itself counts as one of them
  # Fail if the index is out of bounds ($1 will be empty); an empty item fails too.
  # A non-interactive shell, e.g., a script, exits here instead of returning.
  echo "${1:?Index out of bounds}" # Print the first item after shifting
}
Let’s run it:
$ set -- 0 1 2 3 4 5 6 7 8 9 10
$ array_index 12 "$@"
/bin/sh: 1: Index out of bounds
$ array_index 11 "$@"
10
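Because a failed ${1:?…} expansion makes a non-interactive shell exit, and because it treats an empty item as missing, here’s a hedged sketch of a variant that checks the bounds with $# up front and returns an error status instead. The name array_index_checked is hypothetical, and, like array_index, it assumes indices start from 1:

```shell
# Hypothetical variant of array_index that validates the index first
# Argument 1: The index (starting from 1)
# Remaining arguments: The array items
array_index_checked() {
  # Reject anything that isn't a non-negative decimal integer
  case "$1" in
    '' | *[!0-9]*)
      echo "Invalid index: $1" >&2
      return 2
      ;;
  esac
  # $# also counts the index argument, so valid indices are 1 .. $# - 1
  if [ "$1" -lt 1 ] || [ "$1" -ge "$#" ]; then
    echo "Index out of bounds: $1" >&2
    return 1
  fi
  shift "$1"
  printf '%s\n' "$1" # May legitimately be an empty item
}
```

For instance, array_index_checked 2 a b c prints b, while array_index_checked 4 a b c prints an error to standard error and returns 1 without ending the script.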
4.4. Iteration
We use for to iterate over an array:
$ set -- 1 2 3
$ for item in "$@"; do echo "$item"; done
1
2
3
The in “$@” is optional here: for item; do …; done also iterates over the positional parameters.
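A POSIX for loop has no built-in index, so when we need each item’s position, one simple approach is to keep our own counter. A minimal sketch, where i is an illustrative variable name:

```shell
set -- apple banana cherry

i=0 # Our own index counter
for item; do # Equivalent to: for item in "$@"; do
  i=$((i + 1))
  printf '%s: %s\n' "$i" "$item"
done
```

This prints 1: apple, 2: banana, and 3: cherry, one per line.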
4.5. Generating Arrays From Commands
We can also pass a command substitution, $(…), to set to build an array from a command’s output. Say we want an array of the integers from 1 to 100:
$ set -- $(seq 100)
$ echo "$@"
1 2 3 4 5 6 7 8 9 10 11 12 ...
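The seq command generates numbers in a given range; it isn’t part of POSIX but ships with most Linux distributions. Note that the unquoted $(…) splits the output on whitespace and expands wildcards, so each word of the output becomes a separate item.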
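When a command prints one item per line and the items may include spaces or wildcards, a common workaround is to split on newlines only and temporarily disable pathname expansion. A sketch, where old_ifs is an illustrative name and we assume pathname expansion was enabled beforehand (set +f turns it back on unconditionally):

```shell
# Save the current field separator so we can restore it afterwards
old_ifs=$IFS
# Split only on newlines: IFS is set to a literal newline character
IFS='
'
set -f # Disable pathname expansion so an item like "*" stays literal
set -- $(printf '%s\n' "first item" "second item" "*")
set +f # Re-enable pathname expansion
IFS=$old_ifs

echo "$#" # 3
```

Empty lines are still dropped, since a newline counts as IFS whitespace and consecutive newlines collapse into a single separator.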
5. Associative Arrays / Hash Maps
If we’re at the point of needing hash maps in the shell, we should consider using more powerful languages such as Python.
While it is still possible to implement them using files and interact with them via functions, more complex operations like nested keys can’t be implemented cleanly.
Additionally, every lookup or insertion is much slower than with an in-memory structure, since each one starts new processes (such as md5sum and cat) and goes through the file system.
5.1. Implementation
For the implementation, we use a directory as the hash table: each key maps to a file whose name is a hash of the key and whose content is the value.
Hashing the key, instead of using the key string itself as the file name, gives every file name the same short length, which keeps us well under the file name length limit, and keeps slashes out of the name. For example, a file named “filewith/slash” can’t exist, since the slash separates directories.
The hash table directory itself is created with mktemp:
hm_create() {
  # Create a temporary directory and print its name
  mktemp -d
}
# Lazy hash function that just generates a checksum.
# Feel free to replace this with a more secure checksum like sha256sum.
# md5sum also prints "-" (the name of standard input) after the checksum,
# which simply becomes part of the file name.
hm_hash() {
  printf '%s\n' "$1" | md5sum -
}
# Argument 1: Hash Table
# Argument 2: Key
# Argument 3: Value
hm_put() {
  printf '%s\n' "$3" > "$1/$(hm_hash "$2")"
}
# Argument 1: Hash Table
# Argument 2: Key
hm_delete() {
  rm -f "$1/$(hm_hash "$2")"
}
# Argument 1: Hash Table
# Argument 2: Key
hm_get() {
  cat "$1/$(hm_hash "$2")"
}
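Two helpers that the implementation above doesn’t include may come in handy: checking whether a key exists and deleting the whole table. Both names, hm_has and hm_destroy, are hypothetical. So the sketch runs on its own, it repeats a copy of hm_hash, and like the original it assumes md5sum (from GNU coreutils or BusyBox) is installed:

```shell
# Copy of the hash function; it must match the one hm_put uses
hm_hash() {
  printf '%s\n' "$1" | md5sum -
}

# Hypothetical helper: exit status 0 if the key exists, non-zero otherwise
# Argument 1: Hash Table
# Argument 2: Key
hm_has() {
  [ -f "$1/$(hm_hash "$2")" ]
}

# Hypothetical helper: delete the whole hash table directory
# Argument 1: Hash Table
hm_destroy() {
  rm -rf "$1"
}
```

For example, if hm_has "$hm" mykey; then hm_get "$hm" mykey; fi avoids the error that cat prints for a missing key.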
5.2. Usage
Let’s create a hash table with a few keys and print them:
$ hm="$(hm_create)"
$ echo "Created hashmap $hm"
Created hashmap /tmp/tmp.K6Kuuv
$ hm_put "$hm" mykey myvalue
$ hm_put "$hm" hash table
$ hm_get "$hm" hash
table
$ hm_get "$hm" mykey
myvalue
$ hm_delete "$hm" hash
$ hm_get "$hm" hash # The key "hash" was deleted, so cat reports an error
cat: can't open '/tmp/tmp.K6Kuuv/4e76434eea3c9d9cf9cb10bbf3f4a74b -': No such file or directory
Finally, we can delete the whole hash table with rm -rf "$hm".
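Since the table lives in a temporary directory, a script can also register that cleanup up front with the trap built-in, so the directory is removed whenever the shell exits. A minimal sketch (depending on the shell, signals such as SIGINT may need their own traps):

```shell
# Create the table directory, as hm_create does
hm="$(mktemp -d)"
# Remove the directory automatically when the shell exits
trap 'rm -rf "$hm"' EXIT
```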
6. Conclusion
In this article, we learned about the differences between bash and POSIX arrays, along with their usage. We can also conclude from the implementations that for more complex data structures like the hash map, it is better to go for powerful, higher-level scripting languages such as Python for ease of use and robustness.