loading...

Nested JSON Values to CSV with JQ

dannylee8 profile image Danny Lee ・9 min read

source: https://www.pexels.com/photo/man-standing-on-mountain-against-sky-314703/

If you’ve been following along my journey for the past three weeks I’ve been slowly working through how to use jq to convert a Shopify API order JSON file and convert values inside it to CSV. We’re at the last leg of the journey here as we move into the nested arrays, objects and the regular key:value pairs inside them.

Previous Articles in this Series

Just to review the first post was how I got into this predicament, and then, in my second post I wrote about JSON and [jq](https://medium.com/@dannylee8/using-jq-to-convert-json-to-csv-f3bbd7dd85cd) basic commands, then in the last post I covered pulling out top-level values and exporting them to CSV format.

As always let me say *jq* has a tremendous amount of depth as a tool to manipulate JSON. I highly recommend looking through the official docs.


The Sample Data

We’re still using the same sample orders (gist|raw format) from the Shopify API. This is the set of multiple orders I generated (4 identical order objects), you can also work with the single sample order (gist|raw format). They both have nested data we’ll be using in this article.

Once again, I recommend saving this to a local file so you can play with it in the terminal, as well as pasting it into Chris Nielson’s JSON Visualization page to output a more readable, block divided format to follow along.

If you’d like to see a “live update” of the data as you enter jq commands, I recommend the jq sandbox apps jqterm or jq play. They both allow you to paste JSON data and enter jq commands and see the results update immediately.


Sorting our JSON keys

A more advanced concept, which you can find in the jq manual in the Advanced Features section is creating functions. Sorting our JSON keys isn’t necessary to output our nested values into CSV format, but its an interesting tangent. We can also look briefly at how functions can be defined and called.

Let’s take a look at this jq command:

$ cat sampleOrders.json | jq 'def sortKeys: to _entries | sort | from_entries; .orders | .[0] | sortKeys'

See it here! 👀>> a sample of this at jqterm by following this link.

The section we’re going to look at is right after the single quote ( ‘ ) up to the semi-colon ( ; ). This is what a jq function declaration looks like. We define the name sortKeys and then what it does to_entries (converts and object to an array of key:value pairs), sort, which sorts strings by alphabetical value, and then from_entries, which converts the output back to an object.

The commands after the semi-colon processes the orders key and then pulls out the first element of the array which is the value. This is a single order. Then, it processes it with the sortKeys function and we get back a single order object with all its keys sorted.

Lovely! 💐


What Doesn’t Work

Next, let’s try something that will throw an error so we can see what we’re dealing with when it comes to nested objects and arrays in JSON. I was hoping that by running the following command that I could capture the .billing_address array and the .tax_lines object and mash them together into a CSV.

billing_address looks like this:

{
  "first_name": "Bob",
  "address1": "Chestnut Street 92",
  "phone": "555-625-1199",
  "city": "Louisville",
  "zip": "40202",
  "province": "Kentucky",
  "country": "United States",
  "last_name": "Norman",
  "address2": "",
  "company": null,
  "latitude": 45.41634,
  "longitude": -75.6868,
  "name": "Bob Norman",
  "country_code": "US",
  "province_code": "KY"
}

tax_lines looks like this:

[
  {
    "price": "11.94",
    "rate": 0.06,
    "title": "State Tax",
    "price_set": {
      "shop_money": {
        "amount": "11.94",
        "currency_code": "USD"
      },
      "presentment_money": {
        "amount": "11.94",
        "currency_code": "USD"
      }
    }
  }
]

And we run the command:

$ cat sampleOrders.json | jq '.orders | .[0] | .billing_address, .tax_lines | to_entries | map([.value]|join(“,”))'jq: error (at <stdin>:3791): string ("") and object ({"price":"1...) cannot be added
[
  "Bob",
  "Chestnut Street 92",
  "555-625-1199",
  "Louisville",
  "40202",
  "Kentucky",
  "United States",
  "Norman",
  "",
  "",
  "45.41634",
  "-75.6868",
  "Bob Norman",
  "US",
  "KY"
]
exit status 5

The error that jq is throwing here is one that comes up somewhat often for me. It’s saying that I can’t add together two different types, string and object.

Taking this back one step, right before we try to join them, we see jq is okay with processing them through to_entries:

$ cat sampleOrders.json | jq '.orders | .[0] | .billing_address, .tax_lines | to_entries'[   // This is the output for .billing_address value
  {
    "key": "first_name",
    "value": "Bob"
  },
  {
    "key": "address1",
    "value": "Chestnut Street 92"
  },
  {
    "key": "phone",
    "value": "555-625-1199"
  },
  {
    "key": "city",
    "value": "Louisville"
  },
  {
    "key": "zip",
    "value": "40202"
  },
  {
    "key": "province",
    "value": "Kentucky"
  },
  {
    "key": "country",
    "value": "United States"
  },
  {
    "key": "last_name",
    "value": "Norman"
  },
  {
    "key": "address2",
    "value": ""
  },
  {
    "key": "company",
    "value": null
  },
  {
    "key": "latitude",
    "value": 45.41634
  },
  {
    "key": "longitude",
    "value": -75.6868
  },
  {
    "key": "name",
    "value": "Bob Norman"
  },
  {
    "key": "country_code",
    "value": "US"
  },
  {
    "key": "province_code",
    "value": "KY"
  }
]
[   // This is the output for .tax_lines value
  {
    "key": 0,
    "value": {
      "price": "11.94",
      "rate": 0.06,
      "title": "State Tax",
      "price_set": {
        "shop_money": {
          "amount": "11.94",
          "currency_code": "USD"
        },
        "presentment_money": {
          "amount": "11.94",
          "currency_code": "USD"
        }
      }
    }
  }
]

We have to somehow get all the values we want to output into the same structure before we can add them together or bundle them in an array to be passed to the CSV parser.


Flattening the entire JSON object

One solution I came across on Stack overflow isn’t perfect, but certainly strong-arms the data into a format we could export. The filter works by flattening a single element in the array of objects to one level:

$ jq '.orders[0] | [paths(scalars) as $path | {"key": $path | join("."), "value": getpath($path)}] | from_entries'

The output is an object with a single level of all the values. Play with it here! 👀


Objects and arrays are flattened and use a dot notation (example: key.value.subvalue) to retain the nest level information. For example, our .billing_address and .tax_lines data now looks like this:

// single level, object"billing_address.first_name": "Bob",
  "billing_address.address1": "Chestnut Street 92",
  "billing_address.phone": "555-625-1199",
  "billing_address.city": "Louisville",
  "billing_address.zip": "40202",
  "billing_address.province": "Kentucky",
  "billing_address.country": "United States",
  "billing_address.last_name": "Norman",
  "billing_address.address2": "",
  "billing_address.latitude": 45.41634,
  "billing_address.longitude": -75.6868,
  "billing_address.name": "Bob Norman",
  "billing_address.country_code": "US",
  "billing_address.province_code": "KY",// an array with multiple nested levels"tax_lines.0.price": "11.94",
  "tax_lines.0.rate": 0.06,
  "tax_lines.0.title": "State Tax",
  "tax_lines.0.price_set.shop_money.amount": "11.94",
  "tax_lines.0.price_set.shop_money.currency_code": "USD",
  "tax_lines.0.price_set.presentment_money.amount": "11.94",
  "tax_lines.0.price_set.presentment_money.currency_code": "USD",

Elements of each array are indicated by a number, such as 0 in the above tax_lines data. Or like this for the array .line_items which is the list of purchased goods in Shopify order:

// line_item 1"line_items.1.id": 518995019,
  "line_items.1.variant_id": 49148385,// line_item 2"line_items.2.id": 518995019,
  "line_items.2.variant_id": 49148385,

Let’s look at that filter again:

$ cat sampleOrders.json | jq '.orders | [paths(scalars) as $path | {“key”: $path | join(“.”), “value”: getpath($path)}] | from_entries'

As described by Jeff Mercado: “The way this works is: *leaf_paths* returns a stream of arrays which represent the paths on the given JSON document at which "leaf elements" appear, that is, elements which do not have child elements, such as numbers, strings and booleans. We pipe that stream into objects with *key* and *value* properties, where *key* contains the elements of the path array as a string joined by dots and *value* contains the element at that path. Finally, we put the entire thing in an array and run *from_entries* on it, which transforms an array of *{key, value}* objects into an object containing those key-value pairs.

note: leaf_paths has been deprecated and is now called paths(scalar) , which is the syntax I used in the command above.


Outputing CSV

The flattened object can now output as a CSV with:

$ cat sampleOrders.json | jq '.orders | [paths(scalars) as $path | {"key": $path | join("."), "value": getpath($path)}] | from_entries | to_entries  | map([.key, .value] | join(",")) | join("\n")'

or alternatively:

$ cat sampleOrders.json | jq '.orders[0] | [paths(scalars) as $path |{"key":$path|join("."),"value": getpath($path)}] | from_entries | to_entries[] | [.key,.value] | [@csv](http://twitter.com/csv) '

or for a single order:

$ cat sampleOrders.json | jq '.orders[0] | [paths(scalars) as $path | {"key": $path | join("."), "value": getpath($path)}] | from_entries| to_entries  | map([.key, .value] | join(",")) | join("\n")'

If we wanted to just pull out just values from the customer object we can use the select function with the startswith method:

$ cat sampleOrders.json | jq '.orders[0\] | [paths(scalars) as $path | {"key": $path | join("."), "value": getpath($path)}] | from_entries | keys_unsorted[] | select(startswith("customer"))'customer.id
customer.email
customer.created_at
customer.updated_at
customer.first_name
customer.last_name
customer.orders_count
customer.state
customer.total_spent
customer.last_order_id
customer.verified_email
customer.tags
customer.last_order_name
customer.currency
customer.accepts_marketing_updated_at
customer.admin_graphql_api_id
customer.default_address.id
customer.default_address.customer_id
customer.default_address.address1
customer.default_address.address2
customer.default_address.city
customer.default_address.province
customer.default_address.country
customer.default_address.zip
customer.default_address.phone
customer.default_address.name
customer.default_address.province_code
customer.default_address.country_code
customer.default_address.country_name
customer.default_address.default

Getting Specific Objects and Arrays

To select a value that is made up of an array of objects in a single order we can use:

$ cat sampleOrders.json | jq '.orders[0] | .note_attributes | (map(keys) | add | unique) as $cols | map(. as $row | $cols | map($row[.])) as $rows | $cols, $rows[] |[@csv](http://twitter.com/csv)'

Along with a “-r or — raw-output” option we will receive this output:

1 "name","value"
2 "custom engraving","Happy Birthday"
3 "colour","green"

This is in CSV format and includes the keys (line 1) as column headings, and then lines 2, 3 which are the two objects that are inside the note_attributes array.

"note_attributes": [
        {
          "name": "custom engraving",
          "value": "Happy Birthday"
        },
        {
          "name": "colour",
          "value": "green"
        }
      ],

I found the solution to this on this stack overflow page, and if we break down whats going on into separate lines it becomes a little easier to understand:

// keys pulls out the keys from the input data
// [add](https://stedolan.github.io/jq/manual/#add) joins an array and [unique](https://stedolan.github.io/jq/manual/#unique,unique_by(path_exp)) removes duplicates
// and this is surrounded by (), the [grouping operator](https://stedolan.github.io/jq/manual/#Parenthesis)
// after which its assigned to the variable $cols(map(keys) | add | unique) as $cols |// map . (the [Identity operator](https://stedolan.github.io/jq/manual/#Basicfilters)) which is the note_attributes array
// as the variable $row.  Then, data stored in the variable
// $col is piped (|) to the next map function.  The identity
// operator (.) is now $cols which is an array made up of the
// unique, keys from the step above.  When we map over $row[.]
// we are finding the element at the key .
// We set this array of ordered (by the header column) values
// to $rows.map(. as $row | $cols | map($row[.])) as $rows |// And output our header row (array), and $rows (which is an array
// of arrays) to @csv to get our CSV data.$cols, $rows[] | [@csv](http://twitter.com/csv)'

Whew. That was a lot and I still have not discovered a way to easily export mixed values, other than flattening the entire JSON object. I really don’t feel that there’s much value (for me) to pursue it at this point. I feel this is one of those cases where there is a better tool for this job, or perhaps, one needs to be built 🤔. But, I’m not one to try to drive a screw with a crescent wrench…anymore.

Use the right tool for the job.

I hope you aren’t too disappointed and I do really hope you enjoyed this exploration of jq . I always welcome comments, suggestions and useful links 🙌.

Thanks for reading! And have a great week!

Discussion

pic
Editor guide