1. Introduction
The JavaScript Object Notation (JSON) format was first published as an IETF Informational standard in 2006. It's widely used for storing structured data in the field of data analytics.
In this tutorial, we’ll explore various ways to convert a JSON dataset to a table using jq. First, we’ll download a public JSON dataset. Subsequently, we’ll use options of the jq command to transform that dataset into tables with different formats.
We'll use an Ubuntu 22.04 environment for running the examples.
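Note that jq may not be installed out of the box. Assuming the apt package manager that ships with Ubuntu, we can install it and verify the setup (the exact version string printed depends on the installed release):
$ sudo apt-get install -y jq
$ jq --version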
2. Sample JSON File
First, we’ll download a sample JSON file:
$ curl 'http://api.worldbank.org/v2/countries/IND/indicators/SP.POP.TOTL?format=json' -o ind.json
$ cat ind.json | jq -r '.[1][1]'
{
  "indicator": {
    "id": "SP.POP.TOTL",
    "value": "Population, total"
  },
  "country": {
    "id": "IN",
    "value": "India"
  },
  "countryiso3code": "IND",
  "date": "2021",
  "value": 1407563842,
  "unit": "",
  "obs_status": "",
  "decimal": 0
}
As illustrated above, we use the curl command to download a public JSON dataset. The dataset contains the yearly population figures for India since 1960. The curl command takes one argument and one option:
- http://api.worldbank.org/v2/countries/IND/indicators/SP.POP.TOTL?format=json is the URL to download from
- -o ind.json specifies the output file path
We inspect the downloaded file using the cat and jq commands:
- -r writes the output as raw text rather than as JSON-encoded strings
- '.[1][1]' is a filter that selects the second element of the second top-level array in the response
We’ll use this sample dataset for exploring the various options of the jq command.
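Before moving on, it's worth noting that the first element of the top-level array (.[0]) holds paging metadata rather than data records, which is why our filters start at .[1]. As a quick sanity check, we can print the metadata and count the records; we omit the output here since the exact values depend on when the dataset was downloaded:
$ jq '.[0]' ind.json
$ jq '.[1] | length' ind.json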
3. CSV Data
The jq syntax supports extracting the data and provides directives for transforming JSON into CSV output:
$ cat ind.json |
jq -r '.[1][] | [.country.id, .date, .value] | @csv' |
head
"IN","2022",1417173173
"IN","2021",1407563842
"IN","2020",1396387127
"IN","2019",1383112050
"IN","2018",1369003306
"IN","2017",1354195680
"IN","2016",1338636340
"IN","2015",1322866505
"IN","2014",1307246509
"IN","2013",1291132063
As we can see above, the command performs several actions and uses head to show only the first ten rows of the output:
- .[1][] iterates over all the elements of the second top-level array in the dataset
- [.country.id, .date, .value] extracts the individual fields of each record into an array
- the @csv directive within the jq script renders each array as a line of CSV output, which the -r option prints as raw text
In summary, we created CSV output from a list of JSON objects.
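If we want the CSV on disk rather than on the terminal, a simple redirection suffices. Here, ind.csv is a file name we chose for this sketch, and jq reads ind.json directly without cat:
$ jq -r '.[1][] | [.country.id, .date, .value] | @csv' ind.json > ind.csv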
4. Custom Table Header
We’ll explore adding a header row to our table. This can be helpful to better understand the dataset.
4.1. CSV With Custom Header
First, we’ll add a header while converting to a CSV file:
$ cat ind.json |
jq -r '["country","year","population"], (.[1][] | [.country.id, .date,.value]) | @csv' |
head
"country","year","population"
"IN","2022",1417173173
"IN","2021",1407563842
"IN","2020",1396387127
"IN","2019",1383112050
"IN","2018",1369003306
"IN","2017",1354195680
"IN","2016",1338636340
"IN","2015",1322866505
"IN","2014",1307246509
"IN","2013",1291132063
Here, the jq command prepends a custom header ["country","year","population"] to the output.
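As an aside, the -r option matters here: without it, jq prints each @csv result as a JSON string with escaped quotes. We can compare both forms on the header row alone:
$ jq -n '["country","year","population"] | @csv'
"\"country\",\"year\",\"population\""
$ jq -rn '["country","year","population"] | @csv'
"country","year","population"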
4.2. Tab Separated Table With Custom Header
Next, we’ll use the tab character as a separator of the fields in the generated table:
$ cat ind.json |
jq -r '["country","year","population"], (.[1][] | [.country.id, .date,.value]) | @tsv' |
head
country year population
IN 2022 1417173173
IN 2021 1407563842
IN 2020 1396387127
IN 2019 1383112050
IN 2018 1369003306
IN 2017 1354195680
IN 2016 1338636340
IN 2015 1322866505
IN 2014 1307246509
The jq command uses the @tsv directive instead of @csv, separating the fields with a tab character rather than a comma.
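Unlike @csv, the @tsv directive doesn't quote its fields; instead, it escapes any embedded tabs, newlines, and backslashes. A small contrived example shows this; the field separator in the output is a real tab, while the tab inside the data is rendered as a literal \t:
$ jq -rn '["a", "b\tc"] | @tsv'
a	b\tc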
Finally, let’s add dashes under the header row based on the length of each header field:
$ cat ind.json |
jq -r '(["country","year","population"] | (., map(length*"-"))), (.[1][] | [.country.id, .date,.value]) | @tsv' |
head
country year population
------- ---- ----------
IN 2022 1417173173
IN 2021 1407563842
IN 2020 1396387127
IN 2019 1383112050
IN 2018 1369003306
IN 2017 1354195680
IN 2016 1338636340
IN 2015 1322866505
In summary, the filter (., map(length*"-")) first emits the header row unchanged and then generates a row of dashes whose width matches the length of each header field. Another key point is that map(f) and [.[] | f] are equivalent, so either form works here.
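To see this equivalence in isolation, here's a minimal sketch with an arbitrary input array; both filters produce the same result:
$ jq -n '["ab","cde"] | map(length*"-")'
[
  "--",
  "---"
]
$ jq -n '["ab","cde"] | [.[] | length*"-"]'
[
  "--",
  "---"
]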
5. Table With Pipe Separators
Likewise, we can use the column command to separate fields using the pipe character:
$ cat ind.json |
jq -r '["country","year","population"], (.[1][] | [.country.id, .date,.value]) | @tsv' |
column -t -o '|' |
head
country|year|population
IN     |2022|1417173173
IN     |2021|1407563842
IN     |2020|1396387127
IN     |2019|1383112050
IN     |2018|1369003306
IN     |2017|1354195680
IN     |2016|1338636340
IN     |2015|1322866505
IN     |2014|1307246509
As shown above, the column command has a couple of options:
- -t creates a table by determining the number of columns in the input; by default, fields are delimited by whitespace, which covers the tabs that @tsv emits
- -o specifies the pipe character as the field separator for output fields
In summary, combining the jq and column commands provides a powerful way of formatting tables.
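Conversely, if we start from comma-separated input instead of tabs, the column command can take an explicit input delimiter via its -s option. Here's a sketch reusing our earlier @csv output; note that the quotes added by @csv remain part of the fields:
$ jq -r '.[1][] | [.country.id, .date, .value] | @csv' ind.json |
column -t -s ',' -o '|' |
head -3
"IN"|"2022"|1417173173
"IN"|"2021"|1407563842
"IN"|"2020"|1396387127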
6. Converting Command Output to Tables
At this point, we’ll use the output of the df command to build a table from the disk usage information:
$ df -P |
jq -Rs 'split("\n")[1:-1] | map(split(" +"; "")) | map({FileSystem: .[0], Blocks: .[1], Used: .[2], Available: .[3]})'
[
  {
    "FileSystem": "tmpfs",
    "Blocks": "1091488",
    "Used": "1856",
    "Available": "1089632"
  },
  {
    "FileSystem": "/dev/sda3",
    "Blocks": "97977620",
    "Used": "26216284",
    "Available": "66738208"
  }
]
In this case, the jq command transforms the output of the df command into an array of JSON objects, in several steps:
- -Rs reads the entire input as one raw string instead of parsing it as JSON
- split("\n") breaks the input into an array of lines, using the newline character as a separator
- [1:-1] drops the header line and the empty string left behind by the trailing newline
- map(split(" +"; "")) splits each line into fields, using the regular expression " +" to match runs of spaces
- map({FileSystem: .[0], Blocks: .[1], Used: .[2], Available: .[3]}) creates a JSON object with custom keys from the first four fields of each line; the second df -P column holds the total size in 1024-byte blocks, hence the Blocks key
Next, we’ll convert the above JSON to a table:
$ df -P |
jq -Rs 'split("\n")[1:-1] | map(split(" +"; "")) | map({FileSystem: .[0], Blocks: .[1], Used: .[2], Available: .[3]})' > disk.json
$ cat disk.json | jq -r '["FileSystem","Blocks","Used"], (.[] | [.FileSystem,.Blocks,.Used]) | @tsv' |
column -t -o '|'
FileSystem|Blocks   |Used
tmpfs     |1091488  |1864
/dev/sda3 |97977620 |26216476
tmpfs     |5457432  |0
tmpfs     |5120     |4
/dev/sda2 |524252   |6216
Scratch   |497775612|131487316
tmpfs     |1091484  |108
/dev/sr0  |51806    |51806
Similar to our earlier example with pipe separators, we create a pipeline with several steps:
- we generate an intermediate disk.json file to store the output of the df command in JSON format
- we pipe the disk.json file to a jq filter
- '["FileSystem","Blocks","Used"], (.[] | [.FileSystem,.Blocks,.Used]) | @tsv' generates a tab-separated table with a header row
- column transforms the table, using the pipe character as the output field separator
Thus, we transformed the line-oriented output of the df command into a sequence of JSON objects. Subsequently, we converted the list of JSON objects into a table.
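As a final touch, the intermediate disk.json file is optional: the two jq invocations from above compose into a single pipeline that goes straight from df to the formatted table:
$ df -P |
jq -Rs 'split("\n")[1:-1] | map(split(" +"; "")) | map({FileSystem: .[0], Blocks: .[1], Used: .[2], Available: .[3]})' |
jq -r '["FileSystem","Blocks","Used"], (.[] | [.FileSystem,.Blocks,.Used]) | @tsv' |
column -t -o '|'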
7. Conclusion
In this article, we learned a few ways to transform a sequence of JSON objects into a table format using jq.
Firstly, we used the sample dataset to create a CSV file from the separate JSON objects. Secondly, we enhanced the table with a header, underlined with dashes. Lastly, we saw an advanced example of converting the Linux df command output into a table.
In summary, jq is a powerful text-processing tool for reshaping JSON objects to fit our use cases.