DEV Community

Akmal Chaudhri for SingleStore

Posted on

Quick tip: Analysing Air Quality Data with SingleStoreDB

Abstract

Air quality has become a significant concern in many cities globally due to increased pollution levels. In this short article, we'll see how to calculate the Air Quality Index (AQI), store air quality data in SingleStoreDB and quickly visualise the AQI.

Introduction

In this article, we'll obtain air quality data from the Department for Environment, Food and Rural Affairs (Defra) in the UK and store it in SingleStoreDB. SingleStoreDB integrates with many external tools and libraries. We'll utilise one of these integrations in this example.

Create a SingleStoreDB Cloud account

A previous article showed the steps required to create a free SingleStoreDB Cloud account. We'll use AQI Demo Group as our Workspace Group Name and aqi-demo as our Workspace Name. We'll make a note of our password and host name.

We'll use the SQL Editor to create a new database, as follows:

CREATE DATABASE IF NOT EXISTS aqi_db;
Enter fullscreen mode Exit fullscreen mode

Create a Deepnote account

We'll create a free account on the Deepnote website. Once logged in, we'll create a new Deepnote project to give us a new notebook. We'll also create a new directory called data.

Obtain Air Quality Data

Courtesy of Tony Hirst, we'll use the Python downloader for UK Air Quality data notebook to obtain sample data. We'll use the same site code of KC1 (London N. Kensington) and select 2022 for the year. In the final step of the notebook, we'll assign the results to a Pandas Dataframe and write it to a CSV file, as follows:

import pandas as pd

...

aqi_df = pyreadr.read_r(fn)[fn.split(".")[0]]

aqi_df.to_csv("data/KC1_2022.csv", index = False)
Enter fullscreen mode Exit fullscreen mode

At this point, we need to acknowledge Defra and uk-air.defra.gov.uk as the source of information as well as the licence:

© Crown 2023 copyright Defra via uk-air.defra.gov.uk, licenced under the Open Government Licence (OGL).

Let's take a quick look at the data:

aqi_df.head()
Enter fullscreen mode Exit fullscreen mode

The output should be similar to the following:

                  date        O3       NO      NO2  NOXasNO2      SO2        CO  PM10   PM2.5     wd   ws  temp                  site code
0  2022-01-01 00:00:00  39.11572  0.24946  9.94500  10.51875  0.26609  0.209556  14.9  12.076  212.0  3.0  10.4  London N. Kensington  KC1
1  2022-01-01 01:00:00  38.71658  0.12473  7.84125   8.22375 -0.53218  0.197914  16.3  12.642  214.8  3.7  10.5  London N. Kensington  KC1
2  2022-01-01 02:00:00  41.51056  0.12473  6.88500   7.07625  0.26609  0.221198  15.4  11.604  212.9  3.6  10.2  London N. Kensington  KC1
3  2022-01-01 03:00:00  45.70153  0.12473  4.97250   5.16375  0.26609  0.186272  13.6  10.189  208.8  3.9  10.6  London N. Kensington  KC1
4  2022-01-01 04:00:00  48.49551  0.12473  4.59000   4.59000  0.26609  0.186272  10.9   8.302  206.7  4.0  11.3  London N. Kensington  KC1
Enter fullscreen mode Exit fullscreen mode

We can see the range of values:

aqi_df.describe()
Enter fullscreen mode Exit fullscreen mode

The output should be similar to the following:

                O3           NO          NO2     NOXasNO2          SO2           CO         PM10        PM2.5           wd           ws         temp
count  8617.000000  8593.000000  8593.000000  8593.000000  7923.000000  7737.000000  8592.000000  8622.000000  8664.000000  8664.000000  8664.000000
mean     49.920524     4.817513    18.449004    25.724694     0.907117     0.165013    14.606646     8.869804   193.805794     3.286034    10.889578
std      26.435005    18.278836    16.204639    39.501914     1.111520     0.139987    10.512747     8.222229    98.460865     1.567511     6.410335
min      -0.199570    -0.124730     0.000000     0.000000    -1.330450     0.000000     0.700000     0.377000     0.100000     0.000000    -5.900000
25%      32.929050     0.374190     7.458750     8.415000     0.266090     0.093136     8.200000     4.245000   107.475000     2.200000     6.500000
50%      51.089920     0.748380    12.813750    14.152500     0.532180     0.128062    11.900000     6.226000   219.100000     2.900000    10.700000
75%      66.456810     1.995680    23.906250    26.392500     1.064360     0.186272    17.500000    10.283000   265.000000     4.100000    15.300000
max     166.441380   368.701880   108.438750   652.927500    10.377510     1.641522   110.400000    84.623000   360.000000    12.900000    31.900000
Enter fullscreen mode Exit fullscreen mode

We can see data for the polluting gases:

  • Ozone (O3)
  • Nitrogen monoxide (NO)
  • Nitrogen dioxide (NO2)
  • Sulphur dioxide (SO2)
  • Carbon monoxide (CO)
  • Particulates (PM10 and PM2.5)

We'll check for missing data:

aqi_df.isna().sum()
Enter fullscreen mode Exit fullscreen mode

The output should be similar to the following:

date           0
O3           143
NO           167
NO2          167
NOXasNO2     167
SO2          837
CO          1023
PM10         168
PM2.5        138
wd            96
ws            96
temp          96
site           0
code           0
Enter fullscreen mode Exit fullscreen mode

For our initial analysis, we'll remove the rows with missing data:

aqi_df.dropna(inplace = True)
Enter fullscreen mode Exit fullscreen mode

Calculate Air Quality Index (AQI)

Next, we'll use the python-aqi library to calculate the AQI. This library uses algorithms from:

  • The United States Environmental Protection Agency (EPA)
  • The China Ministry of Environmental Protection (MEP)

First, we need to install it, as follows:

!pip install python-aqi
Enter fullscreen mode Exit fullscreen mode

We'll initially use the particulates (PM10 and PM2.5) to calculate the AQI using the EPA algorithm and create a new column in the Pandas Dataframe to store the result:

import aqi

aqi_df["aqi"] = aqi_df.apply(
    lambda row: aqi.to_aqi([
        (aqi.POLLUTANT_PM25, row["PM2.5"]),
        (aqi.POLLUTANT_PM10, row["PM10"])
    ]),
    axis = 1
)
Enter fullscreen mode Exit fullscreen mode

Now we'll store the Pandas Dataframe in SingleStoreDB.

Store data in SingleStoreDB

We'll use Ibis to create the table and store the data in SingleStoreDB.

First, we'll install the package:

!pip install ibis-singlestoredb
Enter fullscreen mode Exit fullscreen mode

Next, we'll import:

import ibis

ibis.options.interactive = True
Enter fullscreen mode Exit fullscreen mode

We'll now create a connection to SingleStoreDB:

conn = ibis.singlestoredb.connect(
    "admin:<password>@<host>:3306/aqi_db"
)
Enter fullscreen mode Exit fullscreen mode

We'll replace the <password> and <host> with the values from our SingleStoreDB Cloud account.

Now we'll create a new table in SingleStoreDB using the Pandas Dataframe:

aqi_tbl = conn.create_table("aqi", aqi_df, force = True)
Enter fullscreen mode Exit fullscreen mode

We can check the table using several methods. Let's use the SQL Editor in SingleStoreDB:

USE aqi_db;

SELECT * FROM aqi LIMIT 5;
Enter fullscreen mode Exit fullscreen mode

The output should be similar to the following:

+---------------------+---------+---------+---------+----------+---------+----------+------+--------+-------+------+------+----------------------+------+------+
| date                | O3      | NO      | NO2     | NOXasNO2 | SO2     | CO       | PM10 | PM2.5  | wd    | ws   | temp | site                 | code | aqi  |
+---------------------+---------+---------+---------+----------+---------+----------+------+--------+-------+------+------+----------------------+------+------+
| 2022-01-01 03:00:00 | 45.7015 | 0.12473 |  4.9725 |  5.16375 | 0.26609 | 0.186272 | 13.6 | 10.189 | 208.8 |  3.9 | 10.6 | London N. Kensington | KC1  |   42 |
| 2022-01-01 04:00:00 | 48.4955 | 0.12473 |    4.59 |     4.59 | 0.26609 | 0.186272 | 10.9 |  8.302 | 206.7 |    4 | 11.3 | London N. Kensington | KC1  |   35 |
| 2022-01-01 17:00:00 | 46.3002 | 0.12473 | 8.22375 |    8.415 | 0.26609 | 0.197914 |   23 | 12.925 | 229.7 |  3.8 | 10.2 | London N. Kensington | KC1  |   53 |
| 2022-01-01 19:00:00 | 41.7101 | 0.12473 | 12.0487 |    12.24 | 0.26609 | 0.221198 | 23.4 | 14.057 | 231.8 |    5 |   11 | London N. Kensington | KC1  |   55 |
| 2022-01-01 21:00:00 | 44.7037 | 0.12473 | 9.75375 |  9.75375 | 0.26609 | 0.221198 | 24.3 | 13.679 | 231.8 |    5 | 11.3 | London N. Kensington | KC1  |   54 |
+---------------------+---------+---------+---------+----------+---------+----------+------+--------+-------+------+------+----------------------+------+------+
Enter fullscreen mode Exit fullscreen mode

A variety of tools could now be used with SingleStoreDB to perform analytics on the data. For example, we have previously discussed the ease with which we can use analytics and visualisation tools with SingleStoreDB, such as Apache Superset, Metabase and Looker.

Bonus: Create visualisation

From our Python environment, we can also create a quick visualisation using Plotly:

import pandas as pd
import plotly.graph_objs as go

# Define the ranges for each AQI category
aqi_bounds = [0, 51, 101, 151, 201, 301, 500]

# Define the names of each AQI category
aqi_categories = [
    "Good",
    "Moderate",
    "Unhealthy for Sensitive Groups",
    "Unhealthy",
    "Very Unhealthy",
    "Hazardous",
    "Very Hazardous"
]

# Define the colours for each AQI category
aqi_colors = [
    [0.0,   "green" ],
    [0.1,   "green" ],
    [0.102, "yellow"],
    [0.2,   "yellow"],
    [0.202, "orange"],
    [0.3,   "orange"],
    [0.302, "red"   ],
    [0.4,   "red"   ],
    [0.402, "purple"],
    [0.6,   "purple"],
    [0.602, "maroon"],
    [1.0,   "maroon"]
]

# Define the tick values and labels for the colorbar
tickvals = [(aqi_bounds[i] + aqi_bounds[i + 1]) / 2 for i in range(len(aqi_bounds)-1)]
ticktext = aqi_categories[:-1]

# Create a new figure object with a heatmap trace
fig = go.Figure(data = go.Heatmap(
    x = aqi_df["date"], y = aqi_df["site"], z = aqi_df["aqi"],
    colorscale = aqi_colors, zmin = aqi_bounds[0], zmax = aqi_bounds[-1],
    colorbar = dict(title = "Air Quality Index",
                    tickmode = "array",
                    ticktext = ticktext,
                    tickvals = tickvals
                )
    )
)

# Customise the layout of the figure
fig.update_layout(
    xaxis_title = "Date",
    yaxis_title = "Location",
    title = "Air Quality Index by Location and Date",
    width = 1000,
    height = 600
)

# Show the figure
fig.show()
Enter fullscreen mode Exit fullscreen mode

This produces the heatmap shown in Figure 1.

Figure 1. AQI for KC1.

Figure 1. AQI for KC1.

The overall quality of air throughout 2022 was Good. The next steps would be to drill down and further analyse the data, check the data for the other polluting gases, and so on. We could also make the heatmap more interesting by adding additional weather data.

Summary

In this short article, we have quickly analysed some air quality data from Defra in the UK, using the US EPA algorithm with data for particulates (PM10 and PM2.5). However, we note that many countries use different algorithms to calculate their AQI, although the approach described in this article could be easily adapted for other countries.

Top comments (0)