1. Overview
In this article, we’ll look at different approaches to putting items at the end of an alphabetic list in Linux. For example, this occurs in folder sorting, which is the case we’ll deal with in the article.
When dealing with folders with multiple subdirectories and files, we may only be interested in quickly checking some of them. Their naming dictates whether we can see them directly or if we need to scroll with the mouse wheel. When working with a GUI (Graphical User Interface), it’s usually convenient to display these folders of interest at the top of the screen.
However, when working with a CLI (Command Line Interface), the terminal shows a limited number of lines and the command prompt sits at the bottom, so we want to see these folders at the bottom of the list. This is the example that we’ll use throughout the article.
2. Linux Sorting
Before moving on to the solutions to the problem, we need to discuss what it means to be alphabetically sorted. It’s clear that the letter a is ahead of z. But what about capital letters? Symbols? Are numbers listed before or after the letters?
The locale settings determine many properties such as decimal separator character, date order, day names, and sorting order. The environment variable relevant for sorting is LC_COLLATE. The environment variable LC_ALL, when set to a non-empty value, overrides all the individual LC_* settings, so LC_COLLATE only takes effect if LC_ALL is unset or empty.
Not every value of LC_COLLATE produces the same order. Only the C locale (also called POSIX) sorts strictly by byte value, whereas other locales apply language-specific rules. If LC_COLLATE=en_US.UTF-8, the set of letters [a,b,A,B] is sorted (with the command sort) as:
$ LC_COLLATE=en_US.UTF-8 sort <<< $'a\nb\nA\nB'
a
A
b
B
However, when using LC_COLLATE=C with the same set of letters as before:
$ LC_COLLATE=C sort <<< $'a\nb\nA\nB'
A
B
a
b
For the remainder of the article, we’ll assume that the sorting settings are LC_COLLATE=C, which ensures that the sorting is byte-wise. If we want this sorting to persist across sessions, we need to export the environment variable from a shell startup file such as ~/.bashrc.
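As a minimal sketch of how these variables interact, the function below reports which value governs sorting in the current shell. The name effective_collate is a hypothetical helper of our own, not a standard tool, and it assumes the usual precedence in which LC_ALL wins over LC_COLLATE, which in turn wins over LANG:

```shell
#!/bin/sh
# Hypothetical helper: print the locale that governs collation.
# Assumed precedence: LC_ALL, then LC_COLLATE, then LANG, then the C default.
effective_collate() {
    printf '%s\n' "${LC_ALL:-${LC_COLLATE:-${LANG:-C}}}"
}

effective_collate

# Force byte-wise sorting for a single command, whatever the other variables say
printf '%s\n' a b A B | LC_ALL=C sort
```

Setting LC_ALL=C on a single command, as in the last line, is a convenient way to get byte-wise order without touching the rest of the session.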
3. Solutions
We’ll discuss three different solutions that can be followed to achieve a certain file or folder ordering.
3.1. Naive Approach
If we want to have a folder at the end of a list (with not so many elements), one interesting option is to simply change the name of the folder to a synonym that will also identify its content, but whose name starts with a letter later in the alphabet.
For example, let’s say we have, in one directory, a certain folder to store Archives. With the ls command, we can see the contents:
$ ls -1
Archives
Downloads
Music
Photos
We may want to quickly check that the folder is there while moving throughout the directories. Thus, if we rename it (with the mv command) to Records, we’ll see it right before the shell prompt waiting for more input:
$ mv Archives Records
$ ls -1
Downloads
Music
Photos
Records
$
This way, we can reorganize the content within a folder without having to use strange naming conventions or non-conventional characters.
3.2. Straightforward Approach
If we have a really crowded directory with many folders and files, the previous approach won’t work. Thus, another solution is to change the initial (or initials) of the folder name so that it appears either at the beginning or at the end of the directory.
In Linux, there are just two reserved bytes that cannot be used for naming files and directories. They are 0 (the ASCII value for null) and 0x2f (the ASCII code for the forward slash). However, we should keep in mind that if we want the filesystem to also be accessible from other operating systems, such as Windows, there are some extra reserved characters such as the backward slash, the asterisk (star), or the question mark.
Uppercase letters are sorted before lowercase, so the last letter that will be listed is the z. Thus, if we prepend one or more z characters to the actual name of the folder, we can simulate a custom directory sort:
$ ls -1
Folder
ZZFolder
ZzFolder
zZFolder
zzFolder
Another option is to take not the last letter, but the last symbol from the ASCII available symbols, which is ~ (tilde). In fact, the following list shows all the characters sorted (according to the collation LC_COLLATE=C):
!"#$%&'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmnopqrstuvwxyz{|}~
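To double-check this order on our own system, we can generate the list ourselves. The following is a small sketch, assuming a POSIX shell. The function name printable_ascii is our own invention. It walks the codes 33 to 126 and prints the corresponding characters, and the last command verifies that sorting them byte-wise leaves the sequence unchanged:

```shell
#!/bin/sh
# Hypothetical helper: print the printable ASCII characters (codes 33 to 126)
# in increasing byte order, on a single line.
printable_ascii() {
    code=33
    while [ "$code" -le 126 ]; do
        # %b interprets the \0ooo octal escape built from the numeric code
        printf '%b' "\\0$(printf '%o' "$code")"
        code=$((code + 1))
    done
    printf '\n'
}

printable_ascii

# Split into one character per line, sort byte-wise, and join again;
# the result should match the line printed above
printable_ascii | fold -w 1 | LC_ALL=C sort | tr -d '\n'; echo
```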
Thus, we can have folders after the zzFolder in the previous example if we prepend the name with a character that is sorted later than z:
$ ls -1
Folder
ZZFolder
ZzFolder
zZFolder
zzFolder
'~Folder'
'~ZFolder'
'~zFolder'
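Renaming with such a prefix can be wrapped in a small function. This is only a sketch: send_to_end is a hypothetical name, and the default prefix ~ is our assumption based on the listing above. Note that the tilde must be quoted, since an unquoted ~ at the start of a word may undergo tilde expansion in the shell:

```shell
#!/bin/sh
# Hypothetical helper: prepend a prefix (default "~") to a file or folder name
# so that it is listed last under LC_COLLATE=C.
# Usage: send_to_end PATH [PREFIX]
send_to_end() {
    target=$1
    prefix=${2:-"~"}
    dir=$(dirname -- "$target")
    base=$(basename -- "$target")
    # "--" stops option parsing, and "$dir/" keeps the new name from being
    # mistaken for an option or expanded as a home directory
    mv -- "$target" "$dir/$prefix$base"
}
```

For instance, send_to_end Archives would rename the folder to ~Archives, while send_to_end Archives zz would produce zzArchives.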
3.3. Special Characters
There are other characters apart from the ones that are listed in the previous section, but they are outside of the ASCII set. They belong to the Unicode set, which is sorted after the ASCII set because, in UTF-8, every non-ASCII character is encoded with bytes greater than 0x7f. Some of the most used characters for their discreet appearance are:
- · is the middle dot (U+00B7)
- ÷ is the division sign (U+00F7)
- Ɩ is the capital iota letter (U+0196)
They’re not available on all keyboards and may be complicated to retrieve. And, although it’s common to use them on graphical user interfaces, they can cause trouble when working with command-line interfaces. However, when used, the previous directory looks like:
$ ls -1
Folder
ZZFolder
ZzFolder
zZFolder
zzFolder
'~Folder'
'~ZFolder'
'~zFolder'
·Folder
÷Folder
ƖFolder
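Since these characters may be hard to type, one option is to build them from their UTF-8 byte sequences. The sketch below assumes a UTF-8 filesystem encoding and uses octal escapes, which printf accepts in any POSIX shell. The variable names and the scratch directory are ours, for illustration only:

```shell
#!/bin/sh
# UTF-8 byte sequences written as octal escapes
middle_dot=$(printf '\302\267')      # U+00B7, bytes 0xC2 0xB7
division_sign=$(printf '\303\267')   # U+00F7, bytes 0xC3 0xB7
capital_iota=$(printf '\306\226')    # U+0196, bytes 0xC6 0x96

# Scratch directory so that the experiment doesn't touch real folders
demo_dir=$(mktemp -d)
mkdir "$demo_dir/zzFolder" "$demo_dir/~Folder" \
      "$demo_dir/${middle_dot}Folder" \
      "$demo_dir/${division_sign}Folder" \
      "$demo_dir/${capital_iota}Folder"

# Byte-wise listing: the Unicode-prefixed names come after the ASCII ones
LC_ALL=C ls -1 "$demo_dir"
```

When the output goes to a terminal, some versions of ls may quote or escape these names, so the exact display can differ from the byte content.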
To see these symbols, the terminal should be capable of handling Unicode. These special characters may not be completely safe to use: their position in a listing depends on the collation locale, and they tend to break scripts that cannot handle Unicode characters.
Finally, we also have the Unicode private-use characters. However, we strongly discourage employing them because they present more disadvantages than advantages, so we won’t discuss them here.
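Before running a script that may not cope with such names, we can look for them. The function below is a rough sketch with a hypothetical name, has_non_ascii. It deletes every printable ASCII byte from the name and succeeds if anything is left over, which also flags control characters:

```shell
#!/bin/sh
# Hypothetical helper: succeed (exit status 0) when the given name contains
# bytes outside the printable ASCII range 0x20 to 0x7e.
has_non_ascii() {
    leftover=$(printf '%s' "$1" | LC_ALL=C tr -d '\040-\176')
    [ -n "$leftover" ]
}

# Report the entries of the current directory that deserve a closer look
for name in *; do
    if has_non_ascii "$name"; then
        printf 'non-ASCII name: %s\n' "$name"
    fi
done
```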
4. Conclusion
In this article, we first saw that there are different ways to sort alphabetically, depending on the locale. Then, we discussed three solutions commonly taken to put a given folder or file at the top or bottom of a directory listing. With any solution, we should keep in mind that, under LC_COLLATE=C, uppercase letters come before lowercase ones, and that, among the printable ASCII symbols, ‘!’ is the first one while ‘~’ is the last one.