1. Overview
The Linux ls command lists the contents of a directory, such as files, subdirectories, and hidden items. It also provides several options for customizing its output, including sorting, filtering, formatting, and displaying additional information.
In this tutorial, we’ll see how to reliably and portably convert ls output to a JSON array using Bash scripts that are compatible with any recent Linux distribution.
To test the robustness of our code, we’ll use exceptionally complex file and directory names. Let’s keep in mind that Linux file systems allow any character in a file name, printable or not, except the slash (/) and the NUL character (\0).
2. Our Edge Case
In our daily use of Linux graphical environments, it’s unlikely that we save our files with bizarre names that contain newlines, tabs, or escape characters, but it’s still possible. For example, the file managers of Cinnamon, KDE, and GNOME allow us to copy any text from a text editor and paste it into a file name.
In addition, when we create a new file in the terminal, we can pass any sequence of characters to touch $'…', except / and \0:
$ touch $'"My file" @³²¹€.tmp'
$ ls *.tmp
'"My file" @³²¹€.tmp'
$ touch $'/.tmp'
touch: cannot touch '/.tmp': Permission denied
$ touch $'\0.tmp'
touch: cannot touch '': No such file or directory
However, we can easily replace the forbidden slash with a Unicode look-alike that is nearly identical on screen:
$ touch $'∕.tmp'
$ ls *.tmp
'"My file" @³²¹€.tmp' ∕.tmp
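Names like these are the reason line-oriented parsing is fragile: a name that contains a newline looks like two entries to any tool that reads a listing line by line. As a minimal sketch, we can count the visible entries of a directory with NUL-delimited output instead, since NUL can never appear in a name. Here, count_entries is a hypothetical helper introduced only for illustration, and the -mindepth, -maxdepth, and -print0 options assume GNU find:

```bash
#!/bin/bash
# Hypothetical helper: count the visible entries of a directory reliably.
# find separates names with NUL (-print0), so a newline inside a name
# can't be mistaken for a separator.
count_entries() {
    local dir=$1 count=0 entry
    while IFS= read -r -d '' entry; do
        count=$((count + 1))
    done < <(find "$dir" -mindepth 1 -maxdepth 1 ! -name '.*' -print0)
    printf '%s\n' "$count"
}

# Example usage: count the entries of the directory given as the first
# argument, or of the current directory if none is given
count_entries "${1:-.}"
```

Like ls without the -a option, this sketch skips hidden entries, so that the count matches what a plain ls would show.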
So, let’s create sample files and directories with challenging names and see how they appear in the terminals and graphical file managers of some popular distributions.
2.1. testfiles.sh
Let’s save the following script as testfiles.sh:
#!/bin/bash
cd "$1"
mkdir $'\'dir \\\\1∕∕\'\n\r àèì\n1\\!\"£$%&ç嗨 страно(私).;,-:'
mkdir $'\'dir \\\\2∕∕\'\f\t\r\n`αβγδεગઍሂਤ`\n1\\'
touch $'\'file \\\\3∕∕\'\n \"abc\"\n<1>\\∕!\"£$%&()=?^[]{}@#§....'
touch $'\'file \\\\4∕∕\'\n\n ABC\n|2|\\!\"£$%&+*** °? ?(あ) ਖ਼ਗ਼ਜ਼ੑੌ ਥੂੇ'
touch $'\'file \\\\5∕∕😜 🤪 🤨.\n 🧐 🤓 😎.צקר٥.txt\n'
# Use ls with the -b option to escape problematic characters and the --quote-name option to quote file names
ls -l -b --quote-name
Then, let’s run it:
$ mkdir test
$ ./testfiles.sh ./test
total 8
drwxrwxr-x 2 francesco francesco 4096 Jul 21 14:47 "'dir \\\\1∕∕'\n\r àèì\n1\\!\"£$%&ç嗨 страно(私).;,-:"
drwxrwxr-x 2 francesco francesco 4096 Jul 21 14:47 "'dir \\\\2∕∕'\f\t\r\n`αβγδεગઍሂਤ`\n1\\"
-rw-rw-r-- 1 francesco francesco 0 Jul 21 14:47 "'file \\\\3∕∕'\n \"abc\"\n<1>\\∕!\"£$%&()=?^[]{}@#§...."
-rw-rw-r-- 1 francesco francesco 0 Jul 21 14:47 "'file \\\\4∕∕'\n\n ABC\n|2|\\!\"£$%&+*** °? ?(あ) ਖ਼ਗ਼ਜ਼ੑੌ ਥੂੇ"
-rw-rw-r-- 1 francesco francesco 0 Jul 21 14:47 "'file \\\\5∕∕😜 🤪 🤨.\n 🧐 🤓 😎.צקר٥.txt\n"
Before going further, let’s check in the following screenshots what the expected result is.
2.2. The Expected Result
In our test files, we used a mixture of left-to-right (LTR) text, right-to-left (RTL) text, and escape characters, which terminals and graphical environments handle and display differently.
What we’re primarily interested in is the terminal output of the ls2json.sh and lsl2json.sh scripts, which we’ll see in more detail later.
The purpose of ls2json.sh is to convert the output of ls with no additional options into a JSON array, which is therefore a plain list of file and directory names. From this output alone, we can’t tell which names refer to files and which refer to directories. lsl2json.sh instead converts the output of ls -l, which also includes additional information for each name.
The following screenshots aren’t just for understanding what we’re trying to accomplish. They also serve to verify that our own Linux distribution fully supports the UTF-8 standard so that all characters in testfiles.sh are displayed correctly.
Here’s a screenshot taken on Fedora 38:
In this case, the terminal output is as expected.
3. ls to JSON
A JSON array is an ordered collection of values, separated by commas and enclosed in square brackets. A JSON object is an unordered collection of key/value pairs separated by commas and enclosed in curly brackets.
We’ve already seen in previous screenshots that our goal is to create the ls2json.sh and lsl2json.sh scripts. The former produces a JSON array with a simple list of names, while the latter produces an array of JSON objects, each of which specifies both the file or directory name and its attributes.
3.1. Simple Listing of Names
Here’s our ls2json.sh script:
#!/bin/bash
cd "$1"
printf '['; ls -b --quote-name | sed '$!s/$/,/'; printf ']\n'
Let’s take a closer look:
- cd "$1" → changes the current directory to the one given as the first argument to the script
- printf '[' → prints an opening square bracket without a newline
- ls -b --quote-name → lists subdirectories and files in the current directory, using the -b option to escape non-printable characters and the --quote-name option to enclose each file name in double quotes
- | sed '$!s/$/,/' → for every line except the last one ($!), appends a comma (,) at the end of the line ($)
- printf ']\n' → prints a closing square bracket and a newline
Let’s note that when its output goes to a pipe instead of a terminal, ls prints one name per line, as if we’d passed the -1 option. Also, the --quote-name option changes the behavior of the -b option by forcing the escaping of double quotes inside a name, but not of those enclosing it.
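To see the comma-joining step in isolation, here’s a minimal sketch that feeds sed three already quoted names instead of real ls output. The wrap_as_array function is a hypothetical helper introduced only for illustration:

```bash
#!/bin/bash
# Hypothetical helper: wrap the lines read from standard input in a JSON array.
# Each input line must already be a valid JSON value.
wrap_as_array() {
    printf '['
    sed '$!s/$/,/'   # append a comma to every line except the last
    printf ']\n'
}

# Example usage with three sample names
printf '%s\n' '"a.txt"' '"b.txt"' '"c.txt"' | wrap_as_array
```

The first two names get a trailing comma and the third doesn’t, so the closing bracket ends up on a line of its own. With empty input, sed prints nothing and the result is an empty array.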
Let’s test it with our edge case:
$ ./ls2json.sh ./test
["'dir \\\\1∕∕'\n\r àèì\n1\\!\"£$%&ç嗨 страно(私).;,-:",
"'dir \\\\2∕∕'\f\t\r\n`αβγδεગઍሂਤ`\n1\\",
"'file \\\\3∕∕'\n \"abc\"\n<1>\\∕!\"£$%&()=?^[]{}@#§....",
"'file \\\\4∕∕'\n\n ABC\n|2|\\!\"£$%&+*** °? ?(あ) ਖ਼ਗ਼ਜ਼ੑੌ ਥੂੇ",
"'file \\\\5∕∕😜 🤪 🤨.\n 🧐 🤓 😎.צקר٥.txt\n"
]
The result is as expected.
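This works for our test names, but it relies on ls to do the escaping, and the C-style escapes that ls emits aren’t all valid JSON. To the best of our knowledge, a character such as ESC is printed as an octal sequence like \033, which JSON parsers reject. As an alternative sketch, not the method used by ls2json.sh, we can skip ls entirely and escape each name ourselves. The json_escape and names_to_json functions are hypothetical helpers, and, like ls without -a, the glob skips hidden entries:

```bash
#!/bin/bash
# Hypothetical helper: print its argument as a JSON string literal
json_escape() {
    local s=$1 i ch
    s=${s//\\/\\\\}        # backslash first, so later escapes aren't doubled
    s=${s//\"/\\\"}        # double quote
    s=${s//$'\n'/\\n}      # common control characters get short escapes
    s=${s//$'\r'/\\r}
    s=${s//$'\t'/\\t}
    s=${s//$'\f'/\\f}
    s=${s//$'\b'/\\b}
    # Any remaining control character (U+0001 to U+001F) becomes \u00XX
    for i in {1..31}; do
        printf -v ch "\\$(printf '%03o' "$i")"
        s=${s//"$ch"/$(printf '\\u%04x' "$i")}
    done
    printf '"%s"' "$s"
}

# Hypothetical helper: print the visible names in a directory as a JSON array
names_to_json() {
    local dir=$1 sep='' path
    printf '['
    for path in "$dir"/*; do
        # An unmatched glob stays literal, so skip anything that doesn't exist
        [ -e "$path" ] || [ -L "$path" ] || continue
        printf '%s' "$sep"
        json_escape "${path##*/}"
        sep=','
    done
    printf ']\n'
}

# Example usage: list the directory given as the first argument,
# or the current directory if none is given
names_to_json "${1:-.}"
```

Because the names come from a glob rather than from parsed text, a newline inside a name never splits an entry in two.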
3.2. Complete Listing With Directory and File Properties
lsl2json.sh is much more complex, as it must convert the output of ls -l, which is a series of lines of the following form:
$ ls -l
total 12
drwxrwxr-x 2 francesco francesco 4096 Jul 21 14:47 ''\''dir \\1∕∕'\'''$'\n\r'' àèì'$'\n''1\!"£$%&ç嗨 страно(私).;,-:'
drwxrwxr-x 2 francesco francesco 4096 Jul 21 14:47 ''\''dir \\2∕∕'\'''$'\f\t\r\n''`αβγδεગઍሂਤ`'$'\n''1\'
[...]
By using comments, we made lsl2json.sh as self-explanatory as possible:
#!/bin/bash
cd "$1"

# A function to convert a single line of ls -l output to a JSON object
convert_line() {
    # Split the line by whitespace into an array
    read -ra fields <<< "$1"
    # Check if the array has at least 9 elements
    if [ "${#fields[@]}" -ge 9 ]; then
        # Extract the relevant fields
        permissions="${fields[0]}"
        size="${fields[4]}"
        owner="${fields[2]}"
        group="${fields[3]}"
        modification_time="${fields[5]} ${fields[6]} ${fields[7]}"
        # Check if the first character of permissions is d, indicating a directory
        if [ "${permissions:0:1}" = "d" ]; then
            is_directory="true"
        else
            is_directory="false"
        fi
        # Find the index of the first double quote in $1, corresponding to the beginning of the filename
        first_quote_index=$(expr index "$1" '"')
        # Find the length of the substring from the first double quote to the end of $1
        substring_length=$((${#1} - $first_quote_index + 1))
        # Extract the substring corresponding to the name
        name=${1:$first_quote_index - 1:$substring_length}
        # Convert the modification time to ISO 8601 format using the date command
        modification_time="$(date -d "$modification_time" --iso-8601=seconds)"
        # Print the JSON object with the fields
        printf '{\n'
        printf ' "name": %s,\n' "$name"
        printf ' "is_directory": %s,\n' "$is_directory"
        printf ' "size": %s,\n' "$size"
        printf ' "permissions": "%s",\n' "$permissions"
        printf ' "owner": "%s",\n' "$owner"
        printf ' "group": "%s",\n' "$group"
        printf ' "modification_time": "%s"\n' "$modification_time"
        printf '}'
    fi
}

# A function to convert the output of ls -l to a JSON array
convert_ls() {
    # Print the opening bracket of the array
    printf '[\n'
    # Read each line of input and convert it to a JSON object
    while IFS= read -r line; do
        # Skip the first line that shows the total number of blocks
        if [ -z "$first_line" ]; then
            first_line="skipped"
            continue
        fi
        # Convert the line and store it in a variable
        json_object="$(convert_line "$line")"
        # Check if the variable is not empty
        if [ -n "$json_object" ]; then
            # Print a comma after the previous object, unless it is the first one
            if [ -n "$previous_object" ]; then
                printf ',\n'
            fi
            # Print the object and remember it as the previous one
            printf '%s' "$json_object"
            previous_object="$json_object"
        fi
    done < /dev/stdin
    # Print the closing bracket of the array and a newline
    printf '\n]\n'
}

# Run the convert_ls function with the output of ls -l as input
ls -l -b --quote-name | convert_ls
Here’s the JSON it produces. For brevity, let’s only report the beginning:
$ ./lsl2json.sh ./test
[
{
"name": "'dir \\\\1∕∕'\n\r àèì\n1\\!\"£$%&ç嗨 страно(私).;,-:",
"is_directory": true,
"size": 4096,
"permissions": "drwxrwxr-x",
"owner": "francesco",
"group": "francesco",
"modification_time": "2023-07-21T14:47:00+02:00"
},
{
"name": "'dir \\\\2∕∕'\f\t\r\n`αβγδεગઍሂਤ`\n1\\",
[...]
This allows us to have an accurate JSON representation of a folder in a Linux file system. To represent the timestamp, we chose the ISO 8601 standard because it’s universally understandable, unlike the format used by ls, which depends on the current locale.
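The script still has to parse that locale-dependent date, though, so date -d may misread it under a non-English locale. As an alternative sketch, not the method used by lsl2json.sh, we can ask stat and date for each attribute directly. This assumes GNU coreutils (stat -c and date -r), and entry_metadata is a hypothetical helper that, for brevity, also assumes owner and group names contain no whitespace:

```bash
#!/bin/bash
# Hypothetical helper: print the attributes of one path as a JSON object
entry_metadata() {
    local path=$1 size owner group perms mtime
    # %s = size in bytes, %U = owner, %G = group, %A = permissions as in ls -l
    read -r size owner group perms < <(stat -c '%s %U %G %A' -- "$path") || return 1
    # date -r prints the file's modification time instead of the current time
    mtime=$(date -r "$path" --iso-8601=seconds) || return 1
    printf '{"size": %s, "owner": "%s", "group": "%s", ' "$size" "$owner" "$group"
    printf '"permissions": "%s", "modification_time": "%s"}\n' "$perms" "$mtime"
}

# Example usage: describe the path given as the first argument,
# or the current directory if none is given
entry_metadata "${1:-.}"
```

Since no text produced by ls is parsed here, neither the locale nor the characters in the name affect the extracted attributes.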
4. Conclusion
In this article, we looked at the complexity of converting ls output to a JSON array using three Bash scripts:
- testfiles.sh → creates directories and test files
- ls2json.sh → converts the output of ls to JSON
- lsl2json.sh → converts the output of ls -l to JSON
We tested these scripts on various Linux distributions to determine their portability. We also tested their robustness by using exceptionally complex file names.