1. Overview

The Linux ls command lists the contents of a directory, including files, subdirectories, and, with the appropriate option, hidden items. It also provides several options for customizing its output, including sorting, filtering, formatting, and displaying additional information.

In this tutorial, we’ll see how to reliably and portably convert ls output to a JSON array using Bash scripts that are compatible with any recent Linux distribution.

To test the robustness of our code, we’ll use exceptionally complex file and directory names. Let’s keep in mind that Linux file systems allow all printable and non-printable UTF-8 characters except the solidus “/” and NUL “\0” characters.

2. Our Edge Case

In our daily use of Linux graphical environments, it’s unlikely that we save our files with bizarre names that contain newlines, tabs, or escape characters, but it’s still possible. For example, the file managers of Cinnamon, KDE, and GNOME allow us to copy any text from a text editor and paste it into a file name.

In addition, to create a new file in the terminal, we can insert any sequence of characters in the command touch $'…', except “/” and “\0”:

$ touch $'"My file" @³²¹€.tmp'
$ ls *.tmp
'"My file" @³²¹€.tmp'
$ touch $'/.tmp'
touch: cannot touch '/.tmp': Permission denied
$ touch $'\0.tmp'
touch: cannot touch '': No such file or directory

However, we can easily replace the disallowed solidus with a visually similar Unicode character:

$ touch $'∕.tmp'
$ ls *.tmp
'"My file" @³²¹€.tmp'   ∕.tmp
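To confirm that a look-alike really is a different character, we can dump its bytes. The following is a minimal sketch that isn’t part of the original scripts: hex_bytes is a hypothetical helper name, and we assume that od and xargs are available, as they are on typical distributions.

```shell
#!/bin/bash

# Hypothetical helper: print the bytes of a string as space-separated hex pairs
hex_bytes() {
  # od dumps one hex pair per byte; xargs collapses od's padding and line breaks
  printf '%s' "$1" | od -An -tx1 | xargs
}

hex_bytes '/'    # ASCII solidus: a single byte, 2f
hex_bytes '∕'    # look-alike from the example above: more than one byte in UTF-8
```

If the second call prints e2 88 95, the character is the division slash U+2215. Other look-alikes give a different byte sequence, but never the single byte 2f.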

So, let’s create sample files and directories with challenging names and see how they appear in the terminals and graphical file managers of some popular distributions.

2.1. testfiles.sh

Let’s save the following script as testfiles.sh:

#!/bin/bash

cd "$1" || exit 1  # stop if we can't enter the target directory
mkdir $'\'dir  \\\\1∕∕\'\n\r àèì\n1\\!\"£$%&ç嗨 страно(私).;,-:'
mkdir $'\'dir  \\\\2∕∕\'\f\t\r\n`αβγδεગઍሂਤ`\n1\\'
touch $'\'file \\\\3∕∕\'\n \"abc\"\n<1>\\∕!\"£$%&()=?^[]{}@#§....'
touch $'\'file \\\\4∕∕\'\n\n   ABC\n|2|\\!\"£$%&+*** °? ?(あ) ਖ਼ਗ਼ਜ਼ੑੌ ਥੂੇ'
touch $'\'file \\\\5∕∕😜 🤪 🤨.\n 🧐 🤓 😎.צקר؁؂؃٥.txt\n'

# Use ls with the -b option to escape problematic characters and the --quote-name option to quote file names
ls -l -b --quote-name

Then, let’s run it:

$ mkdir test
$ ./testfiles.sh ./test
total 8
drwxrwxr-x 2 francesco francesco 4096 Jul 21 14:47 "'dir  \\\\1∕∕'\n\r àèì\n1\\!\"£$%&ç嗨 страно(私).;,-:"
drwxrwxr-x 2 francesco francesco 4096 Jul 21 14:47 "'dir  \\\\2∕∕'\f\t\r\n`αβγδεગઍሂਤ`\n1\\"
-rw-rw-r-- 1 francesco francesco    0 Jul 21 14:47 "'file \\\\3∕∕'\n \"abc\"\n<1>\\∕!\"£$%&()=?^[]{}@#§...."
-rw-rw-r-- 1 francesco francesco    0 Jul 21 14:47 "'file \\\\4∕∕'\n\n   ABC\n|2|\\!\"£$%&+*** °? ?(あ) ਖ਼ਗ਼ਜ਼ੑੌ ਥੂੇ"
-rw-rw-r-- 1 francesco francesco    0 Jul 21 14:47 "'file \\\\5∕∕😜 🤪 🤨.\n 🧐 🤓 😎.צקר؁؂؃٥.txt\n"

Before going further, let’s check in the following screenshots what the expected result is.

2.2. The Expected Result

In our test files, we used a mixture of left-to-right (LTR), right-to-left (RTL), and escape characters, which terminals and graphical environments handle and display differently.

What we’re primarily interested in is the terminal output of the ls2json.sh and lsl2json.sh scripts, which we’ll see in more detail later.

The purpose of ls2json.sh is to convert the output of ls with no additional options into a JSON array, that is, a plain list of file and directory names. Such a list can’t tell us which names refer to files and which refer to directories. In contrast, lsl2json.sh converts the output of ls -l, which also includes additional information for each name.

The following screenshots aren’t just for understanding what we’re trying to accomplish. They also serve to verify that our own Linux distribution fully supports the UTF-8 standard so that all characters in testfiles.sh are displayed correctly.

Here’s a screenshot taken on Fedora 38:

[Screenshot: ls2json Fedora 38 test files]

In this case, the terminal output is as expected.

3. ls to JSON

A JSON array is an ordered collection of values, separated by commas and enclosed in square brackets. A JSON object is an unordered collection of key/value pairs separated by commas and enclosed in curly brackets.

We’ve already seen in previous screenshots that our goal is to create the ls2json.sh and lsl2json.sh scripts. The former produces a JSON array with a simple list of names, while the latter produces an array of JSON objects, each of which specifies both the file or directory name and its attributes.

3.1. Simple Listing of Names

Here’s our ls2json.sh script:

#!/bin/bash
cd "$1" || exit 1  # stop if we can't enter the target directory
printf '['; ls -b --quote-name | sed '$!s/$/,/'; printf ']\n'

Let’s take a closer look:

  • cd "$1" → changes the current directory to the one given as the first argument to the script
  • printf '[' → prints an opening square bracket without a newline
  • ls -b --quote-name → lists subdirectories and files in the current directory, using the -b option to escape non-printable characters and the --quote-name option to enclose each file name in double quotes
  • | sed '$!s/$/,/' → for every line except the last one ($!), replaces the end of the line ($) with a comma (,)
  • printf ']\n' → prints a closing square bracket and a newline
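The effect of the sed expression is easy to check on its own with sample input. In this sketch, add_commas is a hypothetical wrapper name introduced only for illustration:

```shell
#!/bin/bash

# Hypothetical wrapper around the sed expression used in ls2json.sh
add_commas() {
  # $! restricts the substitution to every line except the last one
  sed '$!s/$/,/'
}

printf '%s\n' '"a"' '"b"' '"c"' | add_commas
# "a",
# "b",
# "c"
```

With a single input line, no comma is added at all, so a directory containing one entry still yields a well-formed array.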

Let’s note that ls behaves differently when its output goes to a pipe rather than a terminal: it prints one name per line, as if the -1 option were given. Also, the --quote-name option changes the behavior of the -b option by forcing the escaping of the double quotes inside a file name, while the double quotes enclosing the name stay unescaped.
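Both effects can be observed in a scratch directory. This is a sketch that assumes GNU ls; the file names and the list_quoted helper are made up for the demonstration:

```shell
#!/bin/bash

# Hypothetical helper: list a directory the same way ls2json.sh does
list_quoted() {
  ls -b --quote-name "$1"
}

demo_dir=$(mktemp -d)
touch "$demo_dir/plain.txt" "$demo_dir/with \"quotes\".txt"

# Through a pipe, ls prints one name per line, so wc counts two lines
ls "$demo_dir" | wc -l

# With -b alone, the double quotes inside the name are left untouched
ls -b "$demo_dir"

# With --quote-name added, inner quotes are escaped and each name is wrapped in quotes
list_quoted "$demo_dir"

rm -r "$demo_dir"
```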

Let’s test it with our edge case:

$ ./ls2json.sh ./test
["'dir  \\\\1∕∕'\n\r àèì\n1\\!\"£$%&ç嗨 страно(私).;,-:",
"'dir  \\\\2∕∕'\f\t\r\n`αβγδεગઍሂਤ`\n1\\",
"'file \\\\3∕∕'\n \"abc\"\n<1>\\∕!\"£$%&()=?^[]{}@#§....",
"'file \\\\4∕∕'\n\n   ABC\n|2|\\!\"£$%&+*** °? ?(あ) ਖ਼ਗ਼ਜ਼ੑੌ ਥੂੇ",
"'file \\\\5∕∕😜 🤪 🤨.\n 🧐 🤓 😎.צקר؁؂؃٥.txt\n"
]

The result is as expected.
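One caveat is worth checking before relying on this output for arbitrary names. As far as we understand the GNU implementation, ls -b writes C-style escapes, and some of them, such as \a, \v, or octal sequences like \033, have no counterpart in JSON, which accepts only \", \\, \/, \b, \f, \n, \r, \t, and \uXXXX. Our test names don’t contain such characters, but other names might. The following sketch, with the hypothetical helper name has_non_json_escape, flags a quoted name that a JSON parser would reject for this reason:

```shell
#!/bin/bash

# Hypothetical helper: succeed if the string contains a two-character
# backslash escape other than \" \\ \/ \b \f \n \r \t
has_non_json_escape() {
  local s=$1 i next
  for ((i = 0; i < ${#s}; i++)); do
    if [[ ${s:i:1} == '\' ]]; then
      next=${s:i+1:1}
      case $next in
        '"' | '\' | / | b | f | n | r | t)
          # Valid JSON escape: skip the escaped character as well
          i=$((i + 1))
          ;;
        *)
          return 0
          ;;
      esac
    fi
  done
  return 1
}

# Sample strings shaped like ls -b --quote-name output (assumed, not captured)
for name in '"plain\nname"' '"bell\a"' '"escape\033[0m"'; do
  if has_non_json_escape "$name"; then
    printf 'not valid JSON: %s\n' "$name"
  fi
done
```

Only the second and third sample strings are reported, because \n is a valid JSON escape while \a and \033 aren’t.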

3.2. Complete Listing With Directory and File Properties

lsl2json.sh is much more complex, as it must convert the output of ls -l, which is a series of lines of the following form:

$ ls -l
total 12
drwxrwxr-x 2 francesco francesco 4096 Jul 21 14:47 ''\''dir  \\1∕∕'\'''$'\n\r'' àèì'$'\n''1\!"£$%&ç嗨 страно(私).;,-:'
drwxrwxr-x 2 francesco francesco 4096 Jul 21 14:47 ''\''dir  \\2∕∕'\'''$'\f\t\r\n''`αβγδεગઍሂਤ`'$'\n''1\'
[...]

By using comments, we made lsl2json.sh as self-explanatory as possible:

#!/bin/bash

cd "$1" || exit 1  # stop if we can't enter the target directory

# A function to convert a single line of ls -l output to a JSON object
convert_line() {
  # Split the line by whitespace into an array
  read -ra fields <<< "$1"
  # Check if the array has at least 9 elements
  if [ "${#fields[@]}" -ge 9 ]; then
    # Extract the relevant fields
    permissions="${fields[0]}"
    size="${fields[4]}"
    owner="${fields[2]}"
    group="${fields[3]}"
    modification_time="${fields[5]} ${fields[6]} ${fields[7]}"
    # Check if the first character of permissions is d, indicating a directory
    if [ "${permissions:0:1}" = "d" ]; then
      is_directory="true"
    else
      is_directory="false"
    fi
    # Find the index of the first double quote in $1, corresponding to the beginning of the filename
    first_quote_index=$(expr index "$1" '"')
    # Find the length of the substring from the first double quote to the end of $1
    substring_length=$((${#1} - $first_quote_index + 1))
    # Extract the substring corresponding to the name
    name=${1:$first_quote_index - 1:$substring_length}
    # Convert the modification time to ISO 8601 format using date command
    modification_time="$(date -d "$modification_time" --iso-8601=seconds)"
    # Print the JSON object with the fields
    printf '{\n'
    printf '  "name": %s,\n' "$name"
    printf '  "is_directory": %s,\n' "$is_directory"
    printf '  "size": %s,\n' "$size"
    printf '  "permissions": "%s",\n' "$permissions"
    printf '  "owner": "%s",\n' "$owner"
    printf '  "group": "%s",\n' "$group"
    printf '  "modification_time": "%s"\n' "$modification_time"
    printf '}'
  fi
}

# A function to convert the output of ls -l to a JSON array
convert_ls() {
  # Print the opening bracket of the array
  printf '[\n'
  # Read each line of input and convert it to a JSON object
  while IFS= read -r line; do
    # Skip the first line that shows the total number of blocks
    if [ -z "$first_line" ]; then
      first_line="skipped"
      continue
    fi
    # Convert the line and store it in a variable
    json_object="$(convert_line "$line")"
    # Check if the variable is not empty
    if [ -n "$json_object" ]; then
      # Print a comma after the previous object, unless it is the first one
      if [ -n "$previous_object" ]; then
        printf ',\n'
      fi
      # Print the object and remember it as the previous one
      printf '%s' "$json_object"
      previous_object="$json_object"
    fi
  done < /dev/stdin
  # Print the closing bracket of the array and a newline
  printf '\n]\n'
}

# Run the convert_ls function with the output of ls -l as input
ls -l -b --quote-name | convert_ls
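To see how a single line is taken apart, we can run the two key steps on a sample line. The line below is hypothetical, and extract_name is an illustrative helper that isn’t part of lsl2json.sh. It uses Bash parameter expansion as one possible alternative to expr index, under the same assumption that the first double quote on the line is the one opening the name:

```shell
#!/bin/bash

# Hypothetical sample line, shaped like the output of ls -l -b --quote-name
line='-rw-rw-r-- 1 francesco francesco 0 Jul 21 14:47 "my \"odd\" file.txt"'

# Word-split the line; the first eight fields are the metadata columns
read -ra fields <<< "$line"
echo "permissions: ${fields[0]}"
echo "size: ${fields[4]}"

# Drop everything up to and including the first double quote, then put the quote back
extract_name() {
  printf '"%s\n' "${1#*\"}"
}

extract_name "$line"
```

Word splitting is safe for the metadata columns only. The name itself may contain spaces, which is why it’s cut out of the original line instead of being rebuilt from the array.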

Here’s the JSON that lsl2json.sh produces. For brevity, let’s show only the beginning:

$ ./lsl2json.sh ./test
[
{
  "name": "'dir  \\\\1∕∕'\n\r àèì\n1\\!\"£$%&ç嗨 страно(私).;,-:",
  "is_directory": true,
  "size": 4096,
  "permissions": "drwxrwxr-x",
  "owner": "francesco",
  "group": "francesco",
  "modification_time": "2023-07-21T14:47:00+02:00"
},
{
  "name": "'dir  \\\\2∕∕'\f\t\r\n`αβγδεગઍሂਤ`\n1\\",
[...]

This gives us an accurate JSON representation of a directory in a Linux file system. To represent the timestamp, we chose the ISO 8601 standard because it’s unambiguous and widely supported, unlike the format used by ls, which depends on the current locale.
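One limitation is that the default ls -l format omits the year for recently modified files and the time of day for older ones, so date has to guess the missing part. A possible refinement, assuming GNU ls, is to request a locale-independent timestamp with --time-style=long-iso. The sketch below only shows the resulting format. Adopting it in lsl2json.sh would also mean adjusting the field indexes, because the timestamp then takes two fields instead of three. The name list_long_iso and the sample file are illustrative:

```shell
#!/bin/bash

# Hypothetical helper: long listing with a fixed YYYY-MM-DD HH:MM timestamp
list_long_iso() {
  ls -l -b --quote-name --time-style=long-iso "$1"
}

demo_dir=$(mktemp -d)
# Give the sample file a known modification time
touch -d '2023-07-21 14:47:00' "$demo_dir/sample.txt"

list_long_iso "$demo_dir"

rm -r "$demo_dir"
```

The listing then shows 2023-07-21 14:47 in place of Jul 21 14:47, regardless of the locale or of how old the file is.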

4. Conclusion

In this article, we looked at the complexity of converting ls output to a JSON array using three Bash scripts:

  • testfiles.sh → creates directories and test files
  • ls2json.sh → converts the output of ls to JSON
  • lsl2json.sh → converts the output of ls -l to JSON

We tested these scripts on various Linux distributions to determine their portability. We also tested their robustness by using exceptionally complex file names.