DEV Community

He3


How to Extract Elements from Thousands of Lines of JSON? The answer is... 🧐

JSON, short for JavaScript Object Notation, was discovered and defined by Douglas Crockford in 2001. At 22, the age when many college students are just graduating, JSON has become one of the standard formats for data exchange on the Internet and is familiar to practically every developer.

As a developer, I've read countless JSON files. JSON can be fat or thin, long or short. A short JSON document can be as simple as an array containing a single Boolean value:

[true]

At the other extreme, a long JSON file can astonishingly run to tens of thousands of lines.

When reading long JSON files, the most common needs are extracting the value of a particular field or iterating over the objects in an array, and both are usually buried deep in a river of JSON.

A programming novice's first instinct is to write code that pulls out the desired field. Suppose we have the following JSON structure:

{
  "human": {
    "person": {
      "man": [
        {
          "name": "Jack",
          "age": "17"
        },
        {
          "name": "Mike",
          "age": "32"
        },
        {
          "name": "John",
          "age": "23"
        },
        {
          "name": "David",
          "age": "41"
        },
        {
          "name": "Eric",
          "age": "29"
        },
        {
          "name": "Chris",
          "age": "38"
        },
        {
          "name": "Tom",
          "age": "27"
        },
        {
          "name": "Peter",
          "age": "35"
        },
        {
          "name": "Robert",
          "age": "26"
        },
        {
          "name": "Daniel",
          "age": "33"
        }
      ]
    }
  }
}

Now, to get the name of every element in the man array, we would naturally write JavaScript like this:

// `data` is the parsed JSON above, e.g. const data = JSON.parse(jsonText)
const nameSet = data.human.person.man.map(manItem => manItem.name)

Thanks to modern JavaScript's syntactic sugar and APIs, the code is concise. But simple as it is, there is a lot of overhead to consider: first, you need to save the code to a file; second, you need an environment to run it; third, you need to print or save its output; and fourth, you need to keep debugging it and handle exceptions, edge cases, and so on.
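For a sense of how that overhead adds up, here is a rough sketch of the same extraction written to cope with raw JSON text and a few edge cases. The function name `extractNames` and the choice to skip entries that lack a string `name` are my own assumptions for illustration, not requirements of the task.

```javascript
// Hypothetical helper: extract human.person.man[*].name from raw JSON text.
// Assumption: entries without a string "name" are skipped rather than reported.
function extractNames(jsonText) {
  let data;
  try {
    data = JSON.parse(jsonText);
  } catch (err) {
    // Re-throw with a clearer message; the caller decides how to report it.
    throw new Error(`Invalid JSON: ${err.message}`);
  }

  // Optional chaining yields undefined instead of throwing when a level is missing.
  const men = data?.human?.person?.man;
  if (!Array.isArray(men)) return [];

  return men
    .filter((item) => item !== null && typeof item === "object" && typeof item.name === "string")
    .map((item) => item.name);
}

const raw = '{"human":{"person":{"man":[{"name":"Jack","age":"17"},{"age":"32"},{"name":"John","age":"23"}]}}}';
console.log(extractNames(raw)); // → ["Jack", "John"]; the entry without a name is skipped
console.log(extractNames('{"human":{}}')); // → [] because the path is missing
```

Every one of those guard clauses is something the one-liner silently assumes away, and this still covers only a single query.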

The novice keeps thinking: there are countless situations that call for looking up values in JSON, and writing clumsy code for each one doesn't seem efficient. So he turns to the currently popular ChatGPT:

  • 🧑‍💻 I have the above JSON structure. Help me extract the name field of all elements under the man field.

  • 🤖 I'll help you extract the name field of all elements under man. The result is:
    Jack,
    Mike,
    John,
    David,
    Eric,
    Chris,
    Tom,
    Peter,
    Robert,
    Daniel

The novice was thrilled with the result but soon ran into another problem: after pasting tens of thousands of lines of JSON into the ChatGPT input box, he quickly received an error message.

Even the powerful ChatGPT gives up when faced with extremely long, complex text. There are workarounds, of course, such as feeding the input in segments or calling the ChatGPT API from a program. But if we resort to those, we are back to square one.

So, is there a better way? 🧐

The answer is JSON Path! Let's take the He3 JSON Path tool as an example. To get the name fields from the previous example, we just enter $..name in the input box.

JSON Path tool link: https://t.he3app.com?c9yj

Upon first seeing $..name, you might be confused. What kind of syntax is this? How does it work?

Let me introduce JSON Path:

JSON Path is an expression language for locating and extracting specific elements in JSON data. It provides a concise syntax that makes extracting data from complex JSON structures easy.

In simple terms, JSON Path is a lightweight syntax in the spirit of Markdown: a handful of symbols goes a long way. Where Markdown formats text, JSON Path extracts fragments from a JSON structure.

As a language, JSON Path requires a certain amount of learning to grasp its syntax and keywords. But rest assured, the learning curve for JSON Path is not steep. Once mastered, you can extract specific key values, traverse arrays, filter elements based on conditions, perform slicing, and more. Handling JSON data becomes a breeze.

Let's first introduce some common JSON Path syntax:

  • $: Root node, representing the outermost layer of JSON data.
  • .: Child node operator, used to access properties in an object.
  • []: Index operator, used to access elements in an array or filter elements by conditions.
  • *: Wildcard, used to match any property name or array index.
  • ..: Recursive descent operator, used to search all levels in a nested structure.
  • @: Current node, can be used to reference the current node in a filtering condition.
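To make the operators above concrete, here is a toy evaluator in JavaScript, the language of the earlier snippet. It is a hypothetical sketch written for illustration, not JSON Path Plus or any other library: it understands only `$`, `.key`, `.*`, `[n]`, `[*]`, and `..key`, and it throws on anything else, including `@` and filters. The names `tokenize`, `descend`, and `evaluate` are illustrative.

```javascript
// Toy JSON Path evaluator (hypothetical sketch, not a real library).
// Supported: $  .key  .*  [n]  [*]  ..key

const isObj = (v) => v !== null && typeof v === "object";
const hasOwn = (obj, key) => Object.prototype.hasOwnProperty.call(obj, key);

// Split "$..man[0]" into ["$", "..man", "[0]"]; reject anything unrecognized.
function tokenize(path) {
  const tokens = path.match(/^\$|\.\.[A-Za-z_]\w*|\.[A-Za-z_]\w*|\.\*|\[\d+\]|\[\*\]/g) ?? [];
  if (tokens.join("") !== path) throw new Error(`Unsupported path: ${path}`);
  return tokens;
}

// "..key": collect node[key] from the node itself and from every descendant.
function descend(node, key) {
  if (!isObj(node)) return [];
  const found = hasOwn(node, key) ? [node[key]] : [];
  for (const child of Object.values(node)) found.push(...descend(child, key));
  return found;
}

// Apply each token to the current list of matches, starting from the root.
function evaluate(root, path) {
  let nodes = [];
  for (const tok of tokenize(path)) {
    if (tok === "$") {
      nodes = [root]; // $: the root node
    } else if (tok.startsWith("..")) {
      nodes = nodes.flatMap((n) => descend(n, tok.slice(2))); // ..: recursive descent
    } else if (tok === ".*" || tok === "[*]") {
      nodes = nodes.flatMap((n) => (isObj(n) ? Object.values(n) : [])); // *: every child
    } else if (tok.startsWith("[")) {
      const i = Number(tok.slice(1, -1)); // [n]: array index
      nodes = nodes.flatMap((n) => (Array.isArray(n) && i < n.length ? [n[i]] : []));
    } else {
      const key = tok.slice(1); // .key: child property
      nodes = nodes.flatMap((n) => (isObj(n) && hasOwn(n, key) ? [n[key]] : []));
    }
  }
  return nodes;
}

// The article's sample data, rebuilt compactly.
const names = ["Jack", "Mike", "John", "David", "Eric", "Chris", "Tom", "Peter", "Robert", "Daniel"];
const ages = ["17", "32", "23", "41", "29", "38", "27", "35", "26", "33"];
const data = { human: { person: { man: names.map((name, i) => ({ name, age: ages[i] })) } } };

console.log(evaluate(data, "$..name")); // all ten names, Jack through Daniel
console.log(evaluate(data, "$.human.person.man[*].age")); // all ten ages
console.log(evaluate(data, "$..man[0]")); // → [{ name: "Jack", age: "17" }]
```

Note that the result is always a list of matches. In this sketch, `$..man[0]` means "element 0 of every `man` array at any depth"; it agrees with `$.human.person.man[0]` only because the sample has a single `man` array.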

Suppose we add a "god": "God" key-value pair at the top level of the earlier JSON example. To get the god field with JSON Path, we can use $.god.

As for $..name, we now know it means "return the values of the name key at every level of the JSON data." To get all the age fields instead, we can change it to $..age.

To get the first element of the man array, we can enter $.human.person.man[0].

Of course, the recursive descent operator makes this even more convenient: $..man[0].

The index operator [] supports even more advanced operations.

For instance, to get the first, second, and third elements, you can enter $..man[0,1,2].
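In plain JavaScript, a union like `[0,1,2]` amounts to picking each listed index in turn. The helper `pickIndices` below is a hypothetical name used only for this sketch, and it assumes an out-of-range index simply matches nothing.

```javascript
// The names from the article's man array; ages are omitted for brevity.
const man = ["Jack", "Mike", "John", "David", "Eric", "Chris", "Tom", "Peter", "Robert", "Daniel"]
  .map((name) => ({ name }));

// [i,j,k]: select the listed indices in the order given; out-of-range ones match nothing.
const pickIndices = (arr, indices) =>
  indices.filter((i) => Number.isInteger(i) && i >= 0 && i < arr.length).map((i) => arr[i]);

console.log(pickIndices(man, [0, 1, 2]).map((p) => p.name)); // → ["Jack", "Mike", "John"]
console.log(pickIndices(man, [2, 42]).map((p) => p.name)); // → ["John"]
```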

For a run of consecutive elements like these, JSON Path provides a more convenient slice syntax, [start:end]. The expression above can be rewritten as $..man[0:3].

❗️ Note that the slice [0:3] includes the start index but not the end index.
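The end-exclusive rule is the same convention JavaScript's `Array.prototype.slice` uses, so for plain non-negative bounds `$..man[0:3]` lines up with `man.slice(0, 3)`. This sketch does not try to model JSON Path's other slice forms. A quick check:

```javascript
const man = ["Jack", "Mike", "John", "David", "Eric", "Chris", "Tom", "Peter", "Robert", "Daniel"]
  .map((name) => ({ name }));

// [start:end] keeps indices start .. end-1, so [0:3] yields indices 0, 1 and 2.
const firstThree = man.slice(0, 3);
console.log(firstThree.map((p) => p.name)); // → ["Jack", "Mike", "John"]

// The number of elements is end - start, not end - start + 1.
console.log(firstThree.length); // → 3

// Index 3 (David) is the excluded end.
console.log(firstThree.some((p) => p.name === "David")); // → false
```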

JSON Path also provides two types of expressions:

  1. (): Script expression. The expression inside the parentheses is evaluated, and its result is used as the array index or property name to select. For instance, (@.length-1) evaluates to the index of the last element of an array.

  2. ?(): Filter expression. The expression inside ?() is evaluated for each element, with @ standing for that element, and the element is kept if the result is truthy. This is where comparison operators (such as >, <, ==) and logical operators (such as &&, ||) come in; in JSON Path Plus, the condition is written as a JavaScript expression. For instance, [?(@.age > 25)] filters elements on the "age" attribute, selecting those whose age is greater than 25.

These two kinds of expressions make data extraction even more precise. For instance, to get the last element of man, we can use $..man[(@.length-1)].

Here @ refers to the array being indexed (man), so @.length is its length and (@.length-1) is the index of its last element.
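Spelled out in JavaScript, the script expression is just index arithmetic on the array that `@` stands for. This is an illustrative sketch of what the expression computes, not of how JSON Path Plus evaluates it internally.

```javascript
const man = ["Jack", "Mike", "John", "David", "Eric", "Chris", "Tom", "Peter", "Robert", "Daniel"]
  .map((name) => ({ name }));

// Inside [( ... )], "@" is the array being indexed, so (@.length-1) evaluates to 9 here,
// and that result is then used as the index.
const index = man.length - 1;
console.log(index); // → 9
console.log(man[index].name); // → "Daniel"

// (@.length) alone would evaluate to 10, which is past the end of the array.
console.log(man[man.length]); // → undefined

// Modern JavaScript offers a shorthand for the same lookup.
console.log(man.at(-1) === man[index]); // → true
```

Stefan Goessner's original JSONPath examples list `$..book[-1:]` alongside `$..book[(@.length-1)]` for "the last book", so a negative slice is another way to express the same selection without a script expression.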

To select the people in man whose age is 33 or greater, we can use the filter expression $..man[?(@.age >= 33)].
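A filter behaves much like `Array.prototype.filter`, with `@` playing the role of the callback's parameter. One detail worth knowing: the ages in the sample JSON are strings such as "33", not numbers. In plain JavaScript, `"41" >= 33` converts the string to a number before comparing, but two strings compare character by character. The sketch below converts explicitly so it does not rely on that coercion.

```javascript
const names = ["Jack", "Mike", "John", "David", "Eric", "Chris", "Tom", "Peter", "Robert", "Daniel"];
const ages = ["17", "32", "23", "41", "29", "38", "27", "35", "26", "33"];
const man = names.map((name, i) => ({ name, age: ages[i] }));

// ?(@.age >= 33): keep each element (@) for which the condition is truthy.
// Number() makes the numeric comparison explicit; the sample stores ages as strings.
const atLeast33 = man.filter((p) => Number(p.age) >= 33);
console.log(atLeast33.map((p) => p.name)); // → ["David", "Chris", "Peter", "Daniel"]

// Why the conversion matters: string-to-string comparison is lexicographic.
console.log("5" >= "33"); // → true, because "5" sorts after "3"
console.log(Number("5") >= 33); // → false
```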

The above covers most of the commonly used JSON Path syntax. For less commonly used syntax, you can refer to the JSON Path Plus implementation.

At the time of writing, JSON Path had no official standard document, only some widely accepted and used implementations and documents (the IETF has since standardized JSONPath as RFC 9535, published in 2024). JSON Path Plus is one of these implementations, and the He3 JSON Path tool is built on it.

Returning to the main topic: suppose we have a JSON file tens of thousands of lines long.

If you want to check whether it contains any friends named Ellen Rowland, a single JSON Path query in the same tool will do it.
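The structure of that file isn't reproduced here, so the sketch below invents a small hypothetical `data` object with nested `friends` arrays. It walks every object at every depth, the way `..` does, and keeps those whose `name` matches. In JSON Path terms that is roughly a filter along the lines of `$..[?(@.name == 'Ellen Rowland')]`, though the exact query depends on the real file's layout; `findByName` is an illustrative name.

```javascript
// Hypothetical sample: the real file's structure is an assumption here.
const data = {
  users: [
    { name: "Ada", friends: [{ id: 0, name: "Ellen Rowland" }, { id: 1, name: "Sam Lee" }] },
    { name: "Bo", friends: [{ id: 0, name: "Kim Park" }] },
  ],
};

// Depth-first walk over every nested object and array; collect the objects
// whose "name" equals the target.
function findByName(node, target, found = []) {
  if (node !== null && typeof node === "object") {
    if (!Array.isArray(node) && node.name === target) found.push(node);
    for (const child of Object.values(node)) findByName(child, target, found);
  }
  return found;
}

const matches = findByName(data, "Ellen Rowland");
console.log(matches.length > 0 ? "Found Ellen Rowland" : "No such friend"); // → "Found Ellen Rowland"
```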
