1. Overview
In this tutorial, we’ll do a quick comparison of the Linux commands sort | uniq and sort -u. Both use sort to remove duplicate entries from a list, but they operate in slightly different manners.
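Note that sort, uniq, and wc are standard POSIX utilities, so the commands below should work on Linux, macOS, and other Unix-like systems, although the relative order of uppercase and lowercase entries in sorted output depends on the locale.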
2. Basic Usage
Let’s start with a list of colors in a file named color:
% cat color
Black
green
red
red
yellow
Green
red
If we want to remove duplicates, uniq would work in some cases. Checking the man page for uniq:
Repeated lines in the input will not be detected if they are not adjacent, so it may be necessary to sort the files first.
For our list, the result would not be a list of unique entries because our list has duplicated, non-adjacent entries of “red”:
% uniq color
Black
green
red
yellow
Green
red
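There are a couple of ways we might try to get around this. First, the -u argument to uniq looks promising, but it doesn’t actually deduplicate the list: it prints only the lines that aren’t repeated, so an adjacent run of duplicates is dropped entirely rather than collapsed to one copy. Here, the adjacent pair of “red” disappears and only the final, isolated “red” survives, so the output merely happens to look like a unique list: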
% uniq -u color
Black
green
yellow
Green
red
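A smaller input makes the behavior of uniq -u easier to see. The following is a minimal sketch using a hypothetical three-line list (the letters and variable names are ours, not part of the color file), comparing uniq -u with plain uniq:

```shell
# Hypothetical input: "a" appears twice in a row, "b" appears once.
# uniq -u keeps only lines that are not repeated, so both copies of "a" are dropped.
only_unrepeated=$(printf 'a\na\nb\n' | uniq -u)

# Plain uniq collapses the adjacent pair into a single "a" instead.
collapsed=$(printf 'a\na\nb\n' | uniq)

printf 'uniq -u:\n%s\n' "$only_unrepeated"
printf 'uniq:\n%s\n' "$collapsed"
```

With uniq -u, the repeated entry vanishes from the result altogether, which is usually not what we want when building a list of distinct values.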
Alternatively, taking the man page suggestion, sorting the list before calling uniq will remove all of the duplicates.
Sorting the list is easy:
% sort color
Black
Green
green
red
red
red
yellow
Piping this to uniq yields:
% sort color | uniq
Black
Green
green
red
yellow
Now, checking the man page for sort, we can see that the -u flag will provide the same output:
% sort -u color
Black
Green
green
red
yellow
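Rather than comparing the two outputs by eye, we can let the shell do it. This is a rough sketch: the colors function is a hypothetical stand-in for cat color, and we assume LC_ALL=C so that the ordering matches the output shown above regardless of the system locale:

```shell
# Hypothetical helper that prints the same seven lines as the color file.
colors() {
    printf '%s\n' Black green red red yellow Green red
}

# LC_ALL=C forces byte-order collation, so uppercase sorts before lowercase.
piped=$(colors | LC_ALL=C sort | uniq)
flagged=$(colors | LC_ALL=C sort -u)

if [ "$piped" = "$flagged" ]; then
    echo "identical output"
else
    echo "outputs differ"
fi
```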
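So, generally speaking, sort | uniq and sort -u produce the same result. But there are some differences.
For example, sort has other options, like sorting on a delimited field with -t and -k, and we can use these whether we add -u or pipe to uniq. With a key, however, sort -u treats lines with equal keys as duplicates, while uniq still compares whole lines. In the other direction, uniq has options of its own, such as -c, that sort -u can’t replace.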
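One place where the two approaches can diverge is when sort is given a key. The sketch below assumes a hypothetical colon-delimited list (the records function and its data are invented for illustration) and counts the lines each approach keeps. We only count lines here, since which of the equal-key lines sort -u retains isn't something we want to rely on:

```shell
# Hypothetical data: a numeric ID, a colon delimiter, then a name.
records() {
    printf '%s\n' '2:pear' '1:apple' '1:banana'
}

# With a key, sort -u treats lines with equal keys as duplicates,
# so only one of the two "1:" lines survives.
by_key=$(records | sort -t: -k1,1 -u | wc -l | tr -d ' ')

# uniq compares whole lines, so "1:apple" and "1:banana" are both kept.
by_line=$(records | sort -t: -k1,1 | uniq | wc -l | tr -d ' ')

echo "sort -u kept $by_key lines, sort | uniq kept $by_line lines"
```

When no key is given, the whole line is the key, which is why the two approaches agree on the color list.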
3. Counting Unique Entries
After finding a unique list of items, we’ll often also want to know how many times each item occurs. The -c option for uniq prefixes each output line with the number of times that line appeared consecutively:
% uniq -c color
1 Black
1 green
2 red
1 yellow
1 Green
1 red
Kind of useful, but it again hits the issue of ignoring non-adjacent duplicates. To avoid that, we could sort the list first, then pipe the output to uniq:
% sort color | uniq -c
1 Black
1 Green
1 green
3 red
1 yellow
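To make the effect of sorting on the counts explicit, here's a small sketch that pulls out just the counts for “red” in both cases. The colors function is again a hypothetical stand-in for cat color, and awk and paste are used only to extract and join the numbers:

```shell
# Hypothetical helper that prints the same seven lines as the color file.
colors() {
    printf '%s\n' Black green red red yellow Green red
}

# Unsorted: uniq -c reports one count per adjacent run, so "red" appears twice.
red_runs=$(colors | uniq -c | awk '$2 == "red" { print $1 }' | paste -s -d ' ' -)

# Sorted: all three "red" lines are adjacent and are counted together.
red_total=$(colors | sort | uniq -c | awk '$2 == "red" { print $1 }')

echo "unsorted runs of red: $red_runs"
echo "sorted total for red: $red_total"
```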
Now each entry appears once, with a count that covers all of its occurrences regardless of adjacency.
Taking it a step further, let’s say we want a count of unique items in the list. We can pipe to wc:
% sort color | uniq | wc -l
5
Or with sort -u instead of uniq:
% sort -u color | wc -l
5
And we get a count of our unique list items.
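If we want to use the count in a script, it helps to capture it in a variable. The following is a minimal sketch with hypothetical variable names. It strips spaces with tr because some wc implementations (BSD and macOS, for instance) pad the number with leading whitespace:

```shell
# Hypothetical helper that prints the same seven lines as the color file.
colors() {
    printf '%s\n' Black green red red yellow Green red
}

# Count distinct lines both ways; tr -d ' ' removes any padding added by wc.
count_piped=$(colors | sort | uniq | wc -l | tr -d ' ')
count_flag=$(colors | sort -u | wc -l | tr -d ' ')

echo "sort | uniq: $count_piped"
echo "sort -u:     $count_flag"
```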
4. Summary
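In this short article, we compared sort | uniq and sort -u. Both produce a sorted list with duplicates removed, but only the pipeline gives us access to uniq options such as -c for per-entry counts.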