We love it when developers use SparkPost webhooks to build awesome responsive services. Webhooks are great when you need real-time feedback on what your customers are doing with their messages. They work on a “push” model – you create a microservice to handle the event stream.
Did you know that SparkPost also supports a “pull” model Message Events API that enables you to download your event data for up to ten days afterwards? This can be particularly useful in situations such as:
- You’re finding it difficult to create and maintain a production-ready microservice. For example, your corporate IT policy might make it difficult for you to have open ports permanently listening;
- You’re familiar with batch type operations and running periodic workloads, so you don’t need real-time message events;
- You’re a convinced webhooks fan, but you’re investigating issues with your almost-working webhooks receiver microservice, and want a reference copy of those events to compare.
If this sounds like your situation, you’re in the right place! Now let’s walk through setting up a really simple tool to get those events.
Design goals
Let’s start by setting out the requirements for this project, then translate them into design goals for the tool:
- You want it to be easy to customize, without programming.
- SparkPost events are a rich source of data, but some event-types and event properties might not be relevant to you. Being selective gives smaller output file sizes, which is a good thing, right?
- Speaking of output files, you want event data in the commonly-used csv file format. While programmers love JSON, CSV is easier for non-technical users (and results in smaller files).
- You want to set up your SparkPost account credentials and other basic information once and once only, without having to redo them each time the tool is used. Having to remember that stuff is boring.
- You need flexibility on the event date/time ranges of interest.
- You want to set up your local time-zone once, and then work in that zone, not converting values manually to UTC time. Of course, if you really want to work in UTC, because your other server logs are all UTC, then “make it so.”
- You want some meaningful comfort reporting on your screen. Extracting millions of events could take some time to run, and you want to know it’s working.
Events, dear programmer, events …
Firstly, you’ll need Python 3 and git installed and working on your system. For Linux, a simple procedure can be found in our previous blog post. It’s really this easy:
sudo su -
yum update -y
yum install -y python35
yum install -y wget
wget https://bootstrap.pypa.io/get-pip.py
python3 get-pip.py
yum install -y git
For other platforms, this is a good starting point to get the latest Python download; there are many good tutorials out there on how to install it.
Then get the sparkyEvents code from GitHub using:
$ git clone https://github.com/tuck1s/sparkyEvents.git
Initialized empty Git repository in /home/stuck/sparkyEvents/.git/
remote: Counting objects: 32, done.
remote: Compressing objects: 100% (22/22), done.
remote: Total 32 (delta 7), reused 28 (delta 5), pack-reused 0
Unpacking objects: 100% (32/32), done.
$ cd sparkyEvents
We’re the knights who say “.ini”
Set up a sparkpost.ini file, as per the example in the GitHub README file.
Replace <YOUR API KEY> with a shrubbery… sorry, with your specific, private API key.
Host is only needed for SparkPost Enterprise service usage; you can omit it for sparkpost.com.
Events is a list, as per SparkPost Event Types; omit the line, or leave it blank, to select all event types.
Properties can be any of the SparkPost Event Properties. Definitions can be split over several lines using indentation, as per the Python .ini file structure, which is handy as there are nearly sixty different properties. You can select just those properties you want, rather than everything; this keeps the output file down to just the information you need.
Timezone can be configured to suit your locale. It’s used by SparkPost to interpret the event time range from_time and to_time that you give in command-line parameters. If you leave this blank, SparkPost will default to UTC.
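Putting those settings together, a sparkpost.ini might look something like the sketch below. This is illustrative only; the project README has the authoritative example, and the [SparkPost] section name and Authorization key name are assumptions here. The property names shown are ones used later in this post.

[SparkPost]
Authorization = <YOUR API KEY>
# Host is only needed for SparkPost Enterprise; omit for sparkpost.com
# Host = <YOUR ENTERPRISE HOST>
Events = bounce,delivery,spam_complaint
Properties = timestamp,type,
    event_id,friendly_from,raw_rcpt_to,
    subject,bounce_class,raw_reason
Timezone = America/New_York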
If you run the tool without any command-line parameters, it prints usage:
$ ./sparkyEvents.py
NAME
./sparkyEvents.py
Simple command-line tool to retrieve SparkPost message events into a .CSV file.
SYNOPSIS
./sparkyEvents.py outfile.csv from_time to_time
MANDATORY PARAMETERS
outfile.csv output filename, must be writeable. Records included are specified in the .ini file.
from_time
to_time Format YYYY-MM-DDTHH:MM
from_time and to_time are inclusive, so for example if you want a full day of events, use times T00:00 to T23:59.
Here’s a typical run of the tool, extracting just over 18 million events. This run took a little over two hours to complete.
./sparkyEvents.py outfile.csv 2017-06-04T00:00 2017-06-04T23:59
SparkPost events from 2017-06-04T00:00 to 2017-06-04T23:59 America/New_York to outfile.csv
Events: <all>
Properties: ['timestamp', 'type', 'event_id', 'friendly_from', 'mailfrom', 'raw_rcpt_to', 'message_id', 'template_id', 'campaign_id', 'subaccount_id', 'subject', 'bounce_class', 'raw_reason', 'rcpt_meta', 'rcpt_tags']
Total events to fetch: 18537125
Page 1: got 10000 events in 5.958 seconds
Page 2: got 10000 events in 5.682 seconds
Page 3: got 10000 events in 5.438 seconds
Page 4: got 10000 events in 6.347 seconds
:
:
That’s it! You’re ready to use the tool now. Want to take a peek inside the code? Keep reading!
Inside the code
Getting events via the SparkPost API
The SparkPost Python library doesn’t yet have built-in support for the message-events endpoint. In practice, the Python requests library is all we need. It provides built-in abstractions for handling JSON data, response status codes, etc., and is generally a thing of beauty.
One thing we need to take care of here is that the message-events endpoint is rate-limited. If we make too many requests, SparkPost replies with a 429 response code. We play nicely using the following function, which sleeps for a set time, then retries:
import time
import requests

T = 20    # HTTP request timeout in seconds; the value here is illustrative

def getMessageEvents(uri, apiKey, params):
    try:
        path = uri + '/api/v1/message-events'
        h = {'Authorization': apiKey, 'Accept': 'application/json'}
        while True:
            response = requests.get(path, timeout=T, headers=h, params=params)
            # Handle possible 'too many requests' error inside this module
            if response.status_code == 200:
                return response.json()
            elif response.status_code == 429 and \
                    response.json()['errors'][0]['message'] == 'Too many requests':
                snooze = 120
                print('.. pausing', snooze, 'seconds for rate-limiting')
                time.sleep(snooze)
                continue    # try again
            else:
                print('Error:', response.status_code, ':', response.text)
                return None
    except requests.exceptions.ConnectionError as err:
        # requests' ConnectionError carries no HTTP status code; just report it
        print('Connection error:', err)
        exit(1)
In practice, when using event batches of 10,000 I didn’t experience any rate-limiting responses, even on a fast client; I had to deliberately set smaller batch sizes during testing to trigger them. So you may never see rate-limiting occur in practice.
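For the curious, here’s a minimal sketch of calling the function above for a single page. The query parameter names (from, to, timezone, page, per_page) and the total_count response field follow the Message Events API documentation; the API key is a placeholder:

apiKey = '<YOUR API KEY>'
params = {
    'from': '2017-06-04T00:00',
    'to': '2017-06-04T23:59',
    'timezone': 'America/New_York',
    'page': 1,
    'per_page': 10000
}
res = getMessageEvents('https://api.sparkpost.com', apiKey, params)
if res:
    print('Total events to fetch:', res['total_count'])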
Selecting the Event Properties
SparkPost’s events have nearly sixty possible properties. Users may not want all of them, so let’s select those via the sparkpost.ini file. As in other Python projects, the excellent configparser library does most of the work here. It supports a nice multi-line feature:
“Values can also span multiple lines, as long as they are indented deeper than the first line of the value.”
We can read the properties (applying a sensible default if the setting is absent), remove any newline or carriage-return characters, and convert to a Python list in just three lines:
# cfg is the relevant section of the parsed sparkpost.ini file
# If the fields are not specified, default to a basic few
properties = cfg.get('Properties', 'timestamp,type')
properties = properties.replace('\r', '').replace('\n', '')  # Strip newline and CR
fList = properties.split(',')
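If you’d like to see that behavior in isolation, here’s a self-contained snippet; the [SparkPost] section name is assumed for illustration:

import configparser

iniText = """[SparkPost]
Properties = timestamp,type,
    subject,campaign_id
"""
cfgFile = configparser.ConfigParser()
cfgFile.read_string(iniText)
cfg = cfgFile['SparkPost']
properties = cfg.get('Properties', 'timestamp,type')
properties = properties.replace('\r', '').replace('\n', '')
print(properties.split(','))    # ['timestamp', 'type', 'subject', 'campaign_id']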
Writing to file
The Python csv library enables us to create the output file, complete with the required header row field names, based on the fList we’ve just read:
import csv
outfile = open('outfile.csv', 'w', newline='')  # the tool takes this filename from the command line
fh = csv.DictWriter(outfile, fieldnames=fList, restval='', extrasaction='ignore')
fh.writeheader()
Using the DictWriter class, data is automatically matched to the field names in the output file, and written in the expected order on each line. restval='' ensures we emit blanks for absent data, since not all events have every property. extrasaction='ignore' ensures that we skip any extra data we don’t want.
for i in res['results']:
fh.writerow(i) # Write out results as CSV rows in the output file
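Putting the fetch and write halves together, the heart of the tool boils down to something like the sketch below. It assumes the uri, apiKey, params and fh variables set up as above, with the same assumed paging parameters; the real code also adds the per-page timing and progress reporting you saw earlier:

params['page'] = 1
while True:
    res = getMessageEvents(uri, apiKey, params)
    if res is None or not res['results']:
        break                    # error, or no more events to fetch
    for i in res['results']:
        fh.writerow(i)           # write this batch out as CSV rows
    params['page'] += 1          # move on to the next page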
That’s pretty much everything of note. The tool is less than 150 lines of actual code.
You’re the Master of Events!
So that’s it! You can now download squillions of events from SparkPost, and can customize the output files you’re getting. You’re now the master of events!
This post was originally published on sparkpost.com