Getting Unique Values From an Array in Bash

1. Overview

Bash is a versatile shell installed on almost every Linux system.

Beyond its value as an interactive shell, bash can also be used to automate all sorts of tasks. Those tasks will often deal with lists of our data: say, a list of files, or a list of IP addresses.

We can organize these lists into arrays. Bash uses both indexed arrays (where we refer to items by number) and associative arrays (where we refer to items by name). Associative arrays are often called maps or dictionaries in other programming languages.

In this tutorial, we’ll assign some lists to arrays. Then, we’ll look at ways to remove duplicates from our arrays so that they contain only unique entries.

2. Declaring and Using Arrays

Bash Arrays are covered in depth elsewhere on this site.

In brief, we need to know how to declare both kinds of arrays (indexed and associative). We also need to know how to refer to the entire array.

We can define our indexed array using parentheses:

ip_addrs=(192.168.2.101 192.168.2.105 192.168.2.201 192.168.2.110)

To work with the whole array, rather than individual indexes, we use the @ sign instead of a number.

This lets us loop over all of the elements in our array without keeping track of the index. For example, say we want to ping each address once:

for ip in "${ip_addrs[@]}"; do
    ping -D -c1 "$ip"
done
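
Quoting the array expansion matters once elements contain whitespace. The following is a minimal sketch of the difference; the files array and its contents are made up for illustration:

```shell
#!/usr/bin/env bash
# Hypothetical sample data: one element contains a space.
files=("report.txt" "my notes.txt" "data.csv")

# Quoted expansion: each element stays intact.
quoted_count=0
for f in "${files[@]}"; do
    quoted_count=$((quoted_count + 1))
done

# Unquoted expansion: elements are split again on whitespace.
unquoted_count=0
for f in ${files[@]}; do
    unquoted_count=$((unquoted_count + 1))
done

echo "elements: ${#files[@]}"                      # 3
echo "quoted loop iterations: $quoted_count"       # 3
echo "unquoted loop iterations: $unquoted_count"   # 4
```

The unquoted loop runs four times because "my notes.txt" is split into two words.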

2.1. Associative Arrays

For some data, we want more than just a numbered list. Associative arrays let us give names (“keys”) to our data (“values”). NoSQL databases also often organize data this way.

We can assign several key/value pairs at once like this:

declare -A ips_by_hostname=([mysql]=192.168.2.101 [nginx]=192.168.2.99 [smtp]=192.168.2.105)

Let’s remember to first “declare” the variable as an associative array with declare -A.

We can also produce an array of all the values or of all the keys.

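To list all of an associative array’s values, we use the same syntax as for indexed arrays. Bash doesn’t guarantee any particular order for the elements of an associative array, so the output may not match the order of assignment: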

$ echo "${ips_by_hostname[@]}"
192.168.2.101 192.168.2.105 192.168.2.99

To view the keys, add an exclamation point before the array variable name:

for host in "${!ips_by_hostname[@]}"; do
    echo "key: $host value: ${ips_by_hostname[$host]}"
done
key: mysql value: 192.168.2.101
key: smtp value: 192.168.2.105
key: nginx value: 192.168.2.99
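
Beyond listing keys and values, we often need to add an entry, count the entries, or check whether a key exists. The following is a small sketch of those operations; the redis entry and its address are hypothetical:

```shell
#!/usr/bin/env bash
declare -A ips_by_hostname=([mysql]=192.168.2.101 [nginx]=192.168.2.99 [smtp]=192.168.2.105)

# Add one more key/value pair (hypothetical host and address).
ips_by_hostname[redis]=192.168.2.120

# ${#array[@]} gives the number of key/value pairs.
echo "entries: ${#ips_by_hostname[@]}"

# ${array[key]+set} expands to "set" only when the key exists.
if [[ -n "${ips_by_hostname[nginx]+set}" ]]; then
    echo "nginx is at ${ips_by_hostname[nginx]}"
fi

if [[ -z "${ips_by_hostname[ldap]+set}" ]]; then
    echo "no entry for ldap"
fi
```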

3. Only Unique Values

So let’s get back to our original question: when our array contains duplicates, what’s an easy way to remove them?

3.1. Assign to Associative Array Keys

The keys of associative arrays must always be unique to that array. One key, one value.

We can use this to our advantage. If we loop through an array and use each item as a key of an associative array, the keys of that associative array will be the set of unique values from our original array:

declare -a ip_addrs=(192.168.1.101 192.168.1.105 192.168.1.105 192.168.1.106)

declare -A uniq_tmp

for ip in "${ip_addrs[@]}"; do
    uniq_tmp[$ip]=0 # assigning a placeholder
done

echo "unique: ${!uniq_tmp[@]}" # only the keys

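This will output the three distinct IP addresses, with 192.168.1.105 appearing only once. Because Bash doesn’t guarantee the order of associative array keys, they may not come out in their original order. This solution has the advantage of working entirely within bash, without running any other programs.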
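
If the original order matters, one possible variation keeps an indexed array alongside the associative one. The following is a minimal sketch; the names seen and uniq_addrs are made up for illustration:

```shell
#!/usr/bin/env bash
ip_addrs=(192.168.1.101 192.168.1.105 192.168.1.105 192.168.1.106)

declare -A seen   # keys record the values we have already met
uniq_addrs=()     # indexed array keeps the first-seen order

for ip in "${ip_addrs[@]}"; do
    # Append only the first time a value shows up.
    if [[ -z "${seen[$ip]+set}" ]]; then
        seen[$ip]=1
        uniq_addrs+=("$ip")
    fi
done

echo "unique: ${uniq_addrs[*]}"
# prints: unique: 192.168.1.101 192.168.1.105 192.168.1.106
```

This sketch assumes that no element is an empty string, since Bash rejects an empty associative array subscript.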

3.2. Using sort -u

We have other options when shell scripting. We can pass our array through the sort utility, which is available on practically every Unix-like system.

This will change the order of the items, so let’s keep that in mind if it matters in our particular script.

The sort command sorts input line by line, so we need to split our array into multiple lines.

The sort command’s -u option discards any duplicate lines.

So we can work with our array of IP addresses like this:

# printf prints each element on its own line; sort -u sorts them and drops duplicates
uniqs_arr=($(printf '%s\n' "${ip_addrs[@]}" | sort -u))

Now we have an array of unique, sorted values.
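
The command substitution above relies on word splitting, which would break elements that contain spaces. As a sketch of an alternative, assuming Bash 4 or later for the mapfile builtin, we can read the output of sort line by line:

```shell
#!/usr/bin/env bash
ip_addrs=(192.168.1.106 192.168.1.101 192.168.1.105 192.168.1.105)

# printf emits one element per line; mapfile -t stores each line as one element.
mapfile -t uniqs_arr < <(printf '%s\n' "${ip_addrs[@]}" | sort -u)

echo "count: ${#uniqs_arr[@]}"
printf '%s\n' "${uniqs_arr[@]}"
```

Let’s keep in mind that sort compares text rather than numbers by default, so an address such as 192.168.1.20 would sort after 192.168.1.101.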

3.3. Using tr and awk

Once we start using the rest of the collection of Unix tools, we start finding solutions everywhere. One powerful tool beyond bash is awk:

uniqs_arr=($(tr ' ' '\n' <<<"${ip_addrs[@]}" | awk '!u[$0]++' | tr '\n' ' '))

This gets into more complicated territory. But to break it down:

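We need to break the array apart into separate lines again. This time, we use tr to translate the spaces between our items into newlines (\n). This only works when the items themselves contain no spaces.
We pipe this to awk, which prints a line only the first time it sees it. The expression !u[$0]++ is true while the counter for the current line is still zero, and the ++ then increments that counter.
Then we use tr once more to turn the newlines back into spaces, and the surrounding parentheses assign the result to a new array.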

The advantage of this method is that we preserve the order of the elements of the array.
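
To see the order being preserved, here is a small sketch of the same awk filter. It assumes Bash 4 or later and swaps the two tr calls for printf and mapfile, and the sample data is reordered for illustration:

```shell
#!/usr/bin/env bash
ip_addrs=(192.168.1.106 192.168.1.101 192.168.1.106 192.168.1.105)

# awk prints a line only while its counter in "seen" is still zero.
mapfile -t uniqs_arr < <(printf '%s\n' "${ip_addrs[@]}" | awk '!seen[$0]++')

echo "unique: ${uniqs_arr[*]}"
# prints: unique: 192.168.1.106 192.168.1.101 192.168.1.105
```

Unlike sort -u, the result keeps 192.168.1.106 in first position, because that is where it first appeared.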

4. Conclusion

Bash arrays come in two types: indexed, and associative. Using them both together allows us shortcuts to some useful data structure operations.

In this tutorial, we’ve created and manipulated both kinds of bash arrays. We’ve looked at three ways to filter our arrays so they only contain unique values.

With these techniques, we can pull off some more complicated and handy programming steps, all within bash!
