In this post, we'll create an hourly weather auto-collector using AWS Lambda.
This Lambda function collects the previous day's hourly weather data from the OpenWeatherMap API, cleans it, then uploads it to an S3 bucket.
The uploaded file is a CSV with one row per hour, containing the weather condition id, the timestamp and the feels-like temperature.
The function runs once a day, so yesterday's hourly weather data is uploaded daily.
This post assumes we already have:
- An AWS account
- An S3 bucket to upload the data to
- An API key for the free version of OpenWeatherMap
Steps we will go through:
1️⃣ Create a policy
2️⃣ Create a role
3️⃣ Set up a Lambda Function
4️⃣ Write code to fetch, clean the data from OpenWeatherMap
5️⃣ Install libraries in the same directory
6️⃣ Upload the zip file on the Lambda console
7️⃣ Set up CloudWatch
Done!
Let's get started!
1. Create a policy
To begin with, we need to create a policy that allows a role to upload files to our S3 bucket.
Open your IAM console and click Policies, then the Create policy button.
On the next page, we can write the policy in JSON format. It should look like this:
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "VisualEditor0",
            "Effect": "Allow",
            "Action": "s3:PutObject",
            "Resource": "arn:aws:s3:::yourbucketnamehere/*"
        }
    ]
}
When it's ready, click the Next: Tags button, then click the Next button to review.
Name the policy and click Create policy.
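If you'd rather generate the policy document than type it by hand, the same JSON can be built in Python. This is only a minimal sketch: build_put_object_policy is a hypothetical helper name, not part of any AWS library, and the optional Sid field is left out.

```python
import json


def build_put_object_policy(bucket_name):
    """Return an IAM policy document that only allows uploads to one bucket."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": "s3:PutObject",
                # "/*" covers every object key inside the bucket
                "Resource": "arn:aws:s3:::{}/*".format(bucket_name),
            }
        ],
    }


# Print the document so it can be pasted into the JSON editor
print(json.dumps(build_put_object_policy("yourbucketnamehere"), indent=4))
```

Building the ARN from the bucket name makes it harder to mistype the Resource line, which is the part of the policy that changes between buckets.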
2. Create a role
Now we will create a role and attach the policy we just created to it.
In the same IAM console, click Roles, then Create role.
On the next page, select AWS service as the trusted entity and Lambda as the use case. Click the Next: Permissions button to go to the next page.
On the next page, you can define which permissions to give to this role. Search for the name of the policy we just created.
Check the policy and click the Next button. Click the Next button again to name your role.
Then click Create role.
With this role attached, the Lambda function we develop in the next step will be allowed to upload files to the bucket.
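Choosing Lambda as the use case makes the console attach a trust policy that lets the Lambda service assume the role. For reference, here is a sketch of that document as a Python dict; the function name is hypothetical and only there for illustration.

```python
def build_lambda_trust_policy():
    """Return the trust policy that lets AWS Lambda assume a role."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                # The Lambda service is the principal allowed to use the role
                "Principal": {"Service": "lambda.amazonaws.com"},
                "Action": "sts:AssumeRole",
            }
        ],
    }


trust_policy = build_lambda_trust_policy()
```

The permissions policy from step 1 says what the role may do, while this trust policy says who may use the role.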
3. Set up Lambda Function
Next, we will set up our Lambda function.
Open the Lambda console and click the Create function button.
Select Author from scratch, and enter a function name of your choice. In this tutorial, we will use Python 3.8.
In the Permissions section, select Use an existing role, then choose the role we just created. When that's done, click the Create function button.
We could now write the function code directly on this page, but libraries such as Pandas and Requests are not available in the Lambda Python runtime by default, so they cannot be imported there.
Therefore we first need to create the lambda_function.py file locally and install the necessary libraries in the same directory.
4. Write code to fetch, clean the data from OpenWeatherMap
OK, now we will write the code that fetches the weather data from the OpenWeatherMap API and cleans it.
Create a folder locally with a lambda_function.py file in it.
Step1: Get the data
We start by getting the hourly weather data for a certain location.
In your lambda_function.py file, first import all the necessary libraries and define the lambda_handler function.
import os
import sys
import requests
import json
import pandas as pd
from datetime import datetime, date, timedelta, timezone
import boto3
import csv
def lambda_handler(event, context):
    # We will start writing code here
    pass
Inside the lambda_handler function, we'll fetch weather data from OpenWeatherMap. To do so, first assign the necessary information to variables.
api_key = 'your_own_api_key_here'
url = 'https://api.openweathermap.org/data/2.5/onecall/timemachine'
yesterday = datetime.now() - timedelta(days=1)
timestamp = round(datetime.timestamp(yesterday))
params = {
    'lat': '53.349805',
    'lon': '-6.26031',
    'units': 'metric',
    'dt': timestamp,
    'appid': api_key
}
The geographical coordinates are set to Dublin, Ireland. You can find the coordinates of your own location by googling longitude and latitude [your location]. South latitudes and west longitudes are negative.
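To make the request settings easier to reuse for another location, they can be wrapped in a small helper. This is a sketch under a few assumptions: build_params is a hypothetical function, OPENWEATHER_API_KEY is an assumed environment variable name (you would have to add it in the Lambda configuration yourself), and the timestamp is computed from an explicit UTC time instead of the machine's local time.

```python
import os
from datetime import datetime, timedelta, timezone


def build_params(lat, lon, api_key, now=None):
    """Build the query parameters for yesterday's hourly data."""
    # Use an explicit UTC "now" so the result does not depend on local time
    now = now or datetime.now(timezone.utc)
    yesterday = now - timedelta(days=1)
    return {
        'lat': str(lat),
        'lon': str(lon),
        'units': 'metric',
        'dt': int(yesterday.timestamp()),
        'appid': api_key,
    }


# Assumed variable name; falls back to the placeholder used in this post
api_key = os.environ.get('OPENWEATHER_API_KEY', 'your_own_api_key_here')
params = build_params(53.349805, -6.26031, api_key)
```

Reading the key from the environment keeps it out of the zip file we upload later.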
Now we will send a request to get hourly weather data for yesterday.
result = requests.get(url=url, params=params)
result_json = result.json()
In the result_json variable, we should have:
{'lat': 53.3498,
'lon': -6.2603,
'timezone': 'Europe/Dublin',
'timezone_offset': 0,
'current': {'dt': 1613668586,
'sunrise': 1613633820,
'sunset': 1613670073,
'temp': 6.92,
'feels_like': 1.1,
'pressure': 998,
'humidity': 76,
'dew_point': 2.99,
'uvi': 0.84,
'clouds': 75,
'visibility': 10000,
'wind_speed': 6.17,
'wind_deg': 240,
'wind_gust': 12.35,
'weather': [{'id': 803,
'main': 'Clouds',
'description': 'broken clouds',
'icon': '04d'}]},
'hourly': [{'dt': 1613606400,
'temp': 9.62,
'feels_like': 4.5,
'pressure': 991,
'humidity': 81,
'dew_point': 6.52,
'clouds': 75,
'visibility': 10000,
'wind_speed': 6.17,
'wind_deg': 160,
'weather': [{'id': 500,
'main': 'Rain',
'description': 'light rain',
'icon': '10n'}],
'rain': {'1h': 0.51}},
{'dt': 1613610000,
'temp': 9.75,
'feels_like': 2.49,
'pressure': 987,
'humidity': 81,
'dew_point': 6.65,
'clouds': 75,
'visibility': 10000,
'wind_speed': 9.26,
'wind_deg': 170,
'wind_gust': 16.46,
'weather': [{'id': 500,
'main': 'Rain',
'description': 'light rain',
'icon': '10n'}],
'rain': {'1h': 0.89}}, .....
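Before cleaning, it may be worth checking that the response really contains hourly data, since a bad API key or a rejected request would not return the structure above. A minimal sketch follows; extract_hourly is a hypothetical helper, and the 'message' field in error responses is my assumption about the API rather than something shown in this post.

```python
def extract_hourly(payload):
    """Return the list of hourly records, or raise if there is none."""
    hourly = payload.get('hourly')
    if not hourly:
        # Assumption: error responses carry a 'message' field instead of 'hourly'
        detail = payload.get('message', payload)
        raise ValueError('No hourly data in response: {}'.format(detail))
    return hourly


sample = {'lat': 53.3498, 'hourly': [{'dt': 1613606400, 'feels_like': 4.5}]}
hourly = extract_hourly(sample)
```

Failing early like this makes the Lambda log show the actual cause instead of a KeyError from the cleaning step.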
Step2: Clean the data
We only want the hourly data from the JSON, so we load that part into the weather_data variable using pandas.
weather_data = pd.json_normalize(data=result_json['hourly'])
However, all we want from this dataframe is dt and feels_like, plus the data that is still nested inside the 'weather' column, so we will change the above code into this:
weather_data = pd.json_normalize(data=result_json['hourly'], record_path='weather',
                                 meta=['dt', 'feels_like'])
Then we will get only the necessary data.
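If record_path and meta feel a bit abstract, here is roughly what that call does, written in plain Python. This is an illustration only, not how pandas implements it, and flatten_hourly is a hypothetical name: each entry of the nested 'weather' list becomes one row, and the meta fields of the parent hour are copied onto it.

```python
def flatten_hourly(hourly, meta=('dt', 'feels_like')):
    """Turn nested hourly records into flat rows, one per weather entry."""
    rows = []
    for hour in hourly:
        for condition in hour['weather']:
            row = dict(condition)         # id, main, description, icon
            for field in meta:
                row[field] = hour[field]  # copy the parent fields onto the row
            rows.append(row)
    return rows


sample = [{'dt': 1613606400, 'temp': 9.62, 'feels_like': 4.5,
           'weather': [{'id': 500, 'main': 'Rain',
                        'description': 'light rain', 'icon': '10n'}]}]
rows = flatten_hourly(sample)
```

Fields that are not listed in meta, such as temp here, do not end up in the rows.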
We could use this data as it is, but I chose to remove some of the columns as they are redundant.
weather_data = weather_data.drop(columns=['main', 'description', 'icon'])
Now we get the minimum required data in the data frame.
We would also like the dt column in an easy-to-read format, so we will convert it too.
weather_data['dt'] = weather_data['dt'].apply(lambda x: datetime.fromtimestamp(x))
# we will also assign date as a part of file name later on.
date = weather_data['dt'][0].strftime("%m-%d-%Y")
We can also change the format to "%m/%d/%Y %H:%M:%S" if you'd like.
weather_data['dt'] = weather_data['dt'].apply(lambda x: x.strftime("%m/%d/%Y %H:%M:%S"))
In this tutorial, we don't want the data from 0:00 to 5:00 and from 21:00 to 23:00, so we'll get rid of those rows as well.
weather_data = weather_data.drop(weather_data.index[21:])
weather_data = weather_data.drop(weather_data.index[:6])
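Dropping rows by position assumes the data always has 24 rows starting at 0:00. A slightly more defensive alternative is to filter on the hour itself. The sketch below works on plain dict rows with epoch-second dt values and interprets them in UTC, which is my assumption here; keep_daytime is a hypothetical helper.

```python
from datetime import datetime, timezone


def keep_daytime(rows, first_hour=6, last_hour=20):
    """Keep only rows whose hour (in UTC) is between first_hour and last_hour."""
    kept = []
    for row in rows:
        hour = datetime.fromtimestamp(row['dt'], tz=timezone.utc).hour
        if first_hour <= hour <= last_hour:
            kept.append(row)
    return kept


# 24 hourly rows starting at 2021-02-18 00:00 UTC (1613606400)
rows = [{'dt': 1613606400 + 3600 * h} for h in range(24)]
daytime_rows = keep_daytime(rows)
```

With a full day of data this keeps the same 6:00 to 20:00 rows as the two drop calls, but it still behaves sensibly if an hour is missing.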
Step3: Write a file on the S3 bucket
Now we will write the data to a file and upload it to the S3 bucket.
# Convert the data frame into CSV
csv_data = weather_data.to_csv(index=False)
s3 = boto3.resource('s3')
bucket = s3.Bucket('your_bucket_name_here')
key = '{}.csv'.format(date)
with open("/tmp/{}.csv".format(date), 'w') as f:
    csv_writer = csv.writer(f, delimiter=",")
    csv_reader = csv.reader(csv_data.splitlines())
    for row in csv_reader:
        # each row looks like this..
        # ['id', 'dt', 'feels_like']
        # ['801', '02/18/2021 06:00:00', '-2.49']
        # ['801', '02/18/2021 07:00:00', '-1.84']....
        # write each row on f using csv_writer
        csv_writer.writerow(row)
bucket.upload_file("/tmp/{}.csv".format(date), key)
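Since to_csv already returns the whole CSV as a string, one possible alternative is to skip the temporary file and send that string straight to S3 with the client's put_object call. This is a sketch of that idea rather than the approach used in this post; upload_csv is a hypothetical helper, and the client is passed in as an argument so the function can be tried out without real AWS access.

```python
def upload_csv(s3_client, bucket_name, key, csv_text):
    """Upload CSV text as an object without writing it to /tmp first."""
    # put_object only needs the s3:PutObject permission from step 1
    s3_client.put_object(
        Bucket=bucket_name,
        Key=key,
        Body=csv_text.encode('utf-8'),
    )
```

Inside the handler it could be called as upload_csv(boto3.client('s3'), 'your_bucket_name_here', key, csv_data).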
5. Install libraries in the same directory
Many libraries cannot be imported in AWS Lambda by default, so we need to have them in the same directory as lambda_function.py.
For our Lambda function, we need to have NumPy, Pandas and Requests installed.
I found the articles listed under Resources at the end of this post extremely helpful, so please have a look if you'd like to follow the process step by step.
After installing all the libraries, we need to compress all the files into a zip archive. I named mine archive.zip, but the name doesn't really matter.
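The compression step can also be scripted with Python's standard zipfile module. A minimal sketch, assuming the libraries and lambda_function.py sit together in one folder and the archive is written somewhere outside that folder; zip_directory is a hypothetical helper name.

```python
import os
import zipfile


def zip_directory(source_dir, archive_path):
    """Zip everything under source_dir, keeping paths relative to it."""
    with zipfile.ZipFile(archive_path, 'w', zipfile.ZIP_DEFLATED) as archive:
        for root, _dirs, files in os.walk(source_dir):
            for name in files:
                full_path = os.path.join(root, name)
                # Relative names put lambda_function.py at the root of the zip
                archive.write(full_path, os.path.relpath(full_path, source_dir))
```

Keeping lambda_function.py at the root of the archive matters because Lambda looks for the handler file there, so zip the contents of the folder and not the folder itself.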
6. Upload the zip file on the Lambda console
In your Lambda console, you'll find the Upload a zip file button inside the Actions dropdown. Upload your zip file from there.
When it's done, we can run a test with the Test button at the top right of the page. You'll need to configure a test event, but you don't have to do much here, so just name the test and hit Create. Then hit the Test button again.
Sweet! It says the test has run successfully.
Let's see if the CSV file is correctly saved in the S3 bucket.
It seems the file is uploaded correctly.
7. Set up CloudWatch
Setting up a CloudWatch rule for our Lambda function lets the function run automatically on a schedule.
Let's open the CloudWatch console and click the Create rule button.
In the Event Source section, select Schedule and set the desired interval. I'll set it to run once a day.
In the Target section, select Lambda function and choose our function name from the list. Hit the Configure details button.
Name your CloudWatch rule on the next page, and hit the Create rule button.
Perfect!
Check that your function ran as soon as you created the CloudWatch rule, and that it keeps running at the expected interval.
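A fixed rate of one day starts counting from the moment the rule is created. If you would rather run the function at a specific time, the schedule can be given as a cron expression instead. CloudWatch cron expressions have six fields (minutes, hours, day of month, month, day of week, year) and are evaluated in UTC. Here is a small sketch that builds one; daily_cron is a hypothetical helper.

```python
def daily_cron(hour_utc, minute=0):
    """Build a CloudWatch cron expression that fires once a day (UTC)."""
    if not (0 <= hour_utc <= 23 and 0 <= minute <= 59):
        raise ValueError('hour must be 0-23 and minute 0-59')
    # "?" in the day-of-week field means "no specific value"
    return 'cron({} {} * * ? *)'.format(minute, hour_utc)


expression = daily_cron(1)
```

The resulting string can be pasted into the Cron expression field when choosing Schedule as the event source.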
Complete code in the lambda_function.py
import os
import sys
import requests
import json
import pandas as pd
from datetime import datetime, date, timedelta, timezone
import boto3
import csv

def lambda_handler(event, context):
    api_key = 'your_openweathermap_api_key_here'
    url = 'https://api.openweathermap.org/data/2.5/onecall/timemachine'
    yesterday = datetime.now() - timedelta(days=1)
    timestamp = round(datetime.timestamp(yesterday))
    params = {
        'lat': '53.349805',
        'lon': '-6.26031',
        'units': 'metric',
        'dt': timestamp,
        'appid': api_key
    }
    # Fetch hourly weather data in Dublin from OpenWeatherMap API
    input_file = requests.get(url=url, params=params)
    result_json = input_file.json()
    # Flatten and clean hourly weather data
    weather_data = pd.json_normalize(data=result_json['hourly'], record_path='weather',
                                     meta=['dt', 'feels_like'])
    weather_data = weather_data.drop(columns=['main', 'description', 'icon'])
    weather_data['dt'] = weather_data['dt'].apply(lambda x: datetime.fromtimestamp(x))
    # The date of the first row becomes part of the file name
    date = weather_data['dt'][0].strftime("%m-%d-%Y")
    weather_data['dt'] = weather_data['dt'].apply(lambda x: x.strftime("%m/%d/%Y %H:%M:%S"))
    # Keep only the rows from 6:00 to 20:00
    weather_data = weather_data.drop(weather_data.index[21:])
    weather_data = weather_data.drop(weather_data.index[:6])
    csv_data = weather_data.to_csv(index=False)
    # Call your S3 bucket
    s3 = boto3.resource('s3')
    bucket = s3.Bucket('your_bucket_name_here')
    key = '{}.csv'.format(date)
    with open("/tmp/{}.csv".format(date), 'w') as f:
        csv_writer = csv.writer(f, delimiter=",")
        csv_reader = csv.reader(csv_data.splitlines())
        # Iterate over each row in the csv using reader object
        for row in csv_reader:
            # row variable is a list that represents a row in csv
            csv_writer.writerow(row)
    # Upload the data into S3
    bucket.upload_file("/tmp/{}.csv".format(date), key)
Thanks for reading!
If you have any ideas to improve the function, please leave a comment! I would truly appreciate it. In the meantime, follow me on LinkedIn @Maiko Miyazaki
Resources
AWS Lambda with Pandas and NumPy by Ruslan Korniichuk
AWS Lambda with Pandas and NumPy|Pandas & AWS Lambda|Pandas Lambda with Python3 by BidDataOnlineSchool