DEV Community

Percival Villalva for Apify

Posted on • Originally published at blog.apify.com on

How to parse JSON with Python

Understand JSON structure and syntax, and learn how to parse JSON strings and files using Python's built-in json module and convert JSON files using Pandas.

What is JSON?

JSON (JavaScript Object Notation) is a lightweight data-interchange format that is easy for humans to read and write while also being easy for machines to parse and generate. It is widely used for transmitting data between a client and a server, as an alternative to XML.

JSON data is most often structured as an object: a collection of key-value pairs, where the keys are strings and the values can be any valid JSON data type (a string, number, boolean, null, array, or another object).

{
    "name": "John Doe",
    "age": 30,
    "city": "New York"
}


In this example, name, age, and city are the keys, and "John Doe", 30, and "New York" are the corresponding values.
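To make the mapping between JSON types and Python types concrete, here is a short sketch that parses a hypothetical document containing every JSON value type (the keys and values are made up for illustration) and prints the Python type that json.loads() produces for each one:

```python
import json

# Hypothetical JSON document containing every JSON value type
json_str = """
{
    "name": "John Doe",
    "age": 30,
    "height": 1.82,
    "is_member": true,
    "nickname": null,
    "hobbies": ["reading", "cooking"],
    "address": {"city": "New York"}
}
"""

data = json.loads(json_str)

# Print each key alongside the Python type json.loads() produced for it
for key, value in data.items():
    print(f"{key}: {type(value).__name__}")
```

Running it prints one line per key: name: str, age: int, height: float, is_member: bool, nickname: NoneType, hobbies: list, address: dict. In other words, JSON objects become dictionaries, arrays become lists, true/false become True/False, and null becomes None.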

How to parse JSON strings in Python

To parse a JSON string in Python, we can use the built-in json module. This module provides two functions for converting between JSON strings and Python objects:

  • json.loads() parses a JSON string and returns a Python object.

  • json.dumps() takes a Python object and returns a JSON string.

Here is an example of how to use json.loads() to parse a JSON string:

import json

# JSON string
json_str = '{"name": "John", "age": 30, "city": "New York"}'

# parse JSON string
data = json.loads(json_str)

# print Python object
print(data)


In this example, we import the json module, define a JSON string, and use json.loads() to parse it into a Python object. We then print the resulting Python object.

Note that json.loads() will raise a json.decoder.JSONDecodeError exception if the input string is not valid JSON.

Running the script above prints the following output to the console:

{'name': 'John', 'age': 30, 'city': 'New York'}
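The note above about JSONDecodeError suggests guarding json.loads() with a try/except block when the input may not be well-formed. Here is a minimal sketch; the helper name parse_json_or_none is hypothetical, and the exception is caught as json.JSONDecodeError, which is the same class as json.decoder.JSONDecodeError (and a subclass of ValueError):

```python
import json


def parse_json_or_none(text):
    """Hypothetical helper: return the parsed object, or None if text is not valid JSON."""
    try:
        return json.loads(text)
    except json.JSONDecodeError as err:
        # err.msg, err.lineno and err.colno describe where parsing failed
        print(f"Invalid JSON: {err.msg} (line {err.lineno}, column {err.colno})")
        return None


print(parse_json_or_none('{"name": "John", "age": 30}'))
print(parse_json_or_none("{'name': 'John'}"))  # single quotes are not valid JSON
```

Returning None is just one design choice; in many programs it is better to let the exception propagate or re-raise it with more context.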


How to read and parse JSON files in Python

To parse a JSON file in Python, we can use the same json module we used in the previous section. The only difference is the function: instead of passing a string to json.loads(), we pass an open file object to json.load() (without the trailing s), which reads and parses the file's contents.

For example, assume we have a file named data.json that we would like to read and parse. Here's how we would do it:

import json

# open JSON file
with open('data.json', 'r', encoding='utf-8') as f:
    # parse JSON data
    data = json.load(f)

# print Python object
print(data)


In this example, we use the open() function to open a JSON target file called data.json in read mode. We then pass the file object to json.load(), which parses the JSON data and returns a Python object. We then print the resulting Python object.

Note that if the file does not contain valid JSON, json.load() will raise a json.decoder.JSONDecodeError exception.
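The json module also provides json.dump(), the file-based counterpart of json.dumps(). As a sketch of the full round trip, the snippet below writes a small dictionary to a file and reads it back with json.load(); the sample data and the temporary directory are assumptions made only so the example runs anywhere without an existing data.json:

```python
import json
import os
import tempfile

# Hypothetical data to store; any JSON-serializable Python object works
data = {"name": "John", "age": 30, "city": "New York"}

# Use a temporary directory so the example does not touch a real data.json
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "data.json")

    # json.dump() serializes the object straight into an open file
    with open(path, "w", encoding="utf-8") as f:
        json.dump(data, f, indent=4)

    # json.load() reads the file object back into a Python object
    with open(path, "r", encoding="utf-8") as f:
        loaded = json.load(f)

print(loaded == data)
```

The comparison prints True, because the dictionary survives the write and read unchanged.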

How to pretty print JSON data in Python

When working with JSON data in Python, it can often be helpful to pretty print the data, that is, to format it in a more human-readable way. The json.dumps() function can also be used for this.

Here is an example of how to pretty print JSON data in Python:

import json

# define JSON data
data = {
    "name": "John",
    "age": 30,
    "city": "New York",
    "hobbies": ["reading", "traveling", "cooking"]
}

# pretty print JSON data
pretty_json = json.dumps(data, indent=4)

# print pretty JSON
print(pretty_json)


Output:

{
    "name": "John",
    "age": 30,
    "city": "New York",
    "hobbies": [
        "reading",
        "traveling",
        "cooking"
    ]
}


In this example, we define a Python dictionary representing JSON data, and then use json.dumps() with the indent argument set to 4 to pretty print the data. We then print the resulting pretty printed JSON string.

Note that indent is an optional argument to json.dumps() that specifies the number of spaces to use for each indentation level. If indent is omitted (its default value is None), json.dumps() returns the whole JSON string on a single line, without newlines or indentation.
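Besides indent, json.dumps() accepts a few other keyword arguments that affect formatting. The sketch below, using made-up sample data that includes a non-ASCII character, shows ensure_ascii, separators, and sort_keys side by side:

```python
import json

# Hypothetical sample data; "Zoë" contains a non-ASCII character
data = {"name": "Zoë", "age": 30, "city": "New York"}

# Default (indent=None): one line, non-ASCII characters escaped as \uXXXX
print(json.dumps(data))
# {"name": "Zo\u00eb", "age": 30, "city": "New York"}

# ensure_ascii=False keeps the character as-is
print(json.dumps(data, ensure_ascii=False))
# {"name": "Zoë", "age": 30, "city": "New York"}

# Separators without spaces give the most compact output
print(json.dumps(data, separators=(",", ":")))
# {"name":"Zo\u00eb","age":30,"city":"New York"}

# sort_keys=True orders keys alphabetically; combine it with indent for readability
print(json.dumps(data, indent=2, sort_keys=True))
```

Compact separators are handy when the JSON is sent over the network, while indent and sort_keys make output easier to read and diff.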

How to parse JSON with Python Pandas

In addition to the built-in json module, we can also use pandas, a popular third-party data analysis library (installed with pip install pandas), to parse and work with JSON data in Python. pandas provides a function called pandas.read_json() that can read JSON data into a DataFrame.

Compared to using the built-in json package, working with pandas can be easier and more convenient when we want to analyze and manipulate the data further, as it allows us to use the powerful and flexible DataFrame object.

Here is an example of how to parse JSON data with pandas:

from io import StringIO
import json

import pandas as pd

# define JSON data
data = {
    "name": ["John", "Jane", "Bob"],
    "age": [30, 25, 35],
    "city": ["New York", "London", "Paris"]
}

# convert JSON to DataFrame using pandas
# (wrapping the JSON string in StringIO avoids a FutureWarning in pandas 2.1+)
df = pd.read_json(StringIO(json.dumps(data)))

# print DataFrame
print(df)


Output:


   name  age      city
0  John   30  New York
1  Jane   25    London
2   Bob   35     Paris


In this example, we define a Python dictionary representing JSON data, and use json.dumps() to convert it to a JSON string. We then use pandas.read_json() to read the JSON string into a DataFrame. Finally, we print the resulting DataFrame.
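JSON returned by web APIs is often an array of objects (one object per record) rather than the column-oriented dictionary used above. As a hedged sketch with made-up sample data, pandas.read_json() can parse that shape with orient="records". The literal string is wrapped in io.StringIO because pandas 2.1 and later warn when a raw JSON string is passed directly:

```python
from io import StringIO

import pandas as pd

# Hypothetical API-style response: a JSON array of objects ("records")
json_str = """
[
    {"name": "John", "age": 30, "city": "New York"},
    {"name": "Jane", "age": 25, "city": "London"},
    {"name": "Bob", "age": 35, "city": "Paris"}
]
"""

# Each JSON object becomes one row; its keys become the column names
df = pd.read_json(StringIO(json_str), orient="records")

print(df)
```

The resulting DataFrame has the same three rows and columns as the previous example. If the data has already been parsed into Python objects (for example with json.loads()), passing it to pd.DataFrame() directly is a simpler alternative.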

One benefit of using pandas to parse JSON data is that we can easily manipulate the resulting DataFrame, for example by selecting columns, filtering rows, or grouping data.

from io import StringIO
import json

import pandas as pd

# define JSON data
data = {
    "name": ["John", "Jane", "Bob"],
    "age": [30, 25, 35],
    "city": ["New York", "London", "Paris"]
}

# convert JSON to DataFrame using pandas
df = pd.read_json(StringIO(json.dumps(data)))

# select columns
df = df[["name", "age"]]

# filter rows
df = df[df["age"] > 30]

# print resulting DataFrame
print(df)


Output:

   name  age
2   Bob   35


In this example, we select only the name and age columns from the DataFrame, and filter out any rows where the age is less than or equal to 30.
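Grouping, the third operation mentioned earlier, is not shown in the example above. As a minimal sketch, assuming hypothetical records in which two people share a city, DataFrame.groupby() can compute an aggregate for each group:

```python
from io import StringIO

import pandas as pd

# Hypothetical records; two people share a city so grouping has a visible effect
json_str = """
[
    {"name": "John", "age": 30, "city": "London"},
    {"name": "Jane", "age": 25, "city": "London"},
    {"name": "Bob", "age": 35, "city": "Paris"}
]
"""

df = pd.read_json(StringIO(json_str), orient="records")

# Group rows by city and compute the average age in each group
avg_age = df.groupby("city")["age"].mean()

print(avg_age)
```

The result is a Series indexed by city, with an average age of 27.5 for London and 35.0 for Paris.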

Using pandas to parse and work with JSON data in Python can be a convenient and powerful alternative to using the built-in json package. It allows us to easily manipulate and analyze the data using the DataFrame object, which offers a rich set of functionality for working with tabular data.

How to convert JSON to CSV in Python

Sometimes we might want to convert JSON data into a CSV format. Luckily, the pandas library can also help us with that.

We can use pandas.read_json() to read JSON data into a DataFrame, and then call the DataFrame.to_csv() method to write the DataFrame to a CSV file.

Here is an example of how to convert JSON data to CSV in Python using pandas:

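from io import StringIO
import json

import pandas as pd

# define JSON data
data = {
    "name": ["John", "Jane", "Bob"],
    "age": [30, 25, 35],
    "city": ["New York", "London", "Paris"]
}

# convert JSON to DataFrame
# (json.dumps() produces a JSON string; StringIO wraps it for read_json)
df = pd.read_json(StringIO(json.dumps(data)))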

# write DataFrame to CSV file
df.to_csv("data.csv", index=False)

# read CSV file
df = pd.read_csv("data.csv")

# print DataFrame
print(df)


Output:

   name  age      city
0  John   30  New York
1  Jane   25    London
2   Bob   35     Paris


In this example, we define a Python dictionary representing JSON data, and use json.dumps() to convert it to a JSON string. We then use pandas.read_json() to read the JSON string into a DataFrame, and use DataFrame.to_csv() to write the DataFrame to a CSV file. We then use pandas.read_csv() to read the CSV file back into a DataFrame, and print the resulting DataFrame.

Note that when calling to_csv(), we pass index=False to exclude the row index from the output CSV file.
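If pandas is not available, the standard library's csv module can perform the same conversion for a flat JSON array of objects. This is a swapped-in technique rather than the pandas approach above, and the sketch assumes every object has the same keys (taken here from the first record). The sample data is hypothetical, and an in-memory buffer stands in for a real file:

```python
import csv
import io
import json

# Hypothetical JSON array of flat objects (one object per CSV row)
json_str = """
[
    {"name": "John", "age": 30, "city": "New York"},
    {"name": "Jane", "age": 25, "city": "London"},
    {"name": "Bob", "age": 35, "city": "Paris"}
]
"""

records = json.loads(json_str)

# In-memory buffer; for a real file use open("data.csv", "w", newline="", encoding="utf-8")
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=list(records[0].keys()))
writer.writeheader()  # header row: name,age,city
writer.writerows(records)  # one CSV row per JSON object

print(buffer.getvalue())
```

The csv documentation recommends opening output files with newline="" so the writer controls line endings itself.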

