SteveT for SparkPost

Posted on Sep 26, 2017 • Originally published at sparkpost.com on Sep 22, 2017

Calm your Suppression List Anxieties with Python


“It takes many good deeds to build a good reputation, and only one bad one to lose it.”

– Benjamin Franklin

From our handy Getting Started Guide, you know how important it is to bring your suppression list from your old provider with you. Ben Franklin was right – your email reputation will catch a nasty cold if you send to stale, unsubscribed, bounced addresses. This affects whether your messages to your real subscribers are accepted, now and in the future, so it’s best to heed the doctor’s advice.

In this article, we’ll set up an easy-to-use tool to manage your suppression lists. If you want some more background on the “what” and “why” of suppressions, this article is a good starting point.

Scrubbing up

The exported suppression lists we see coming from old providers are often dirty. Duplicate entries, invalid entries with more than one @ sign, invalid characters, telephone numbers instead of email addresses, you name it. We’ve seen files with weird characters in various obscure international alphabets. This might make you consider just amputating those lists, but we’ll explore techniques to preserve as much of them as we can.

Lists can be large, reaching nearly a million entries. Working with small blocks manually is going to take forever. We are going to need a robot surgeon!

Plan for treatment

Let’s set out our needs, and translate them into design goals.

Make it easy to get started and simple to use.
Make best efforts to understand your file format, even if it appears to contain weird characters.
Check and upload any size of a list without manual work.
Check the input files up front, with helpful warnings as we go.
Checks should be thorough and fast. If there are faults, we want to know exactly where they are in the file, and what’s wrong. Specifically, we need to:
- Ensure email addresses are well-formed (i.e. follow the RFCs)
- Check the other field values, such as transactional / non_transactional flags.
A “check everything but don’t change anything” mode, to make it easy to find and fix faulty input data.
Allow retrieval of your whole suppression list back from SparkPost, or select time-bounded portions.
- Have time-zone awareness, while accepting times in your locale. In particular, remember that start and end times could fall on either side of a daylight saving time change.
- Keep it simple. The API supports searching by domain, source, type, description, etc. However, that can be done by filtering the retrieved file afterward. If you want these features, raise an issue on the Git repository and we’ll look at it.
Work across both master account and subaccounts.
Make it easy to supply defaults for missing/optional file information.
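As the list above notes, searching by domain, type or description can be done by filtering the retrieved file afterward. Here is a minimal sketch using only the standard library’s csv module. It assumes the retrieved file has a header row with a recipient column; filter_by_domain is a hypothetical helper, and the type and description columns and the sample rows are made up for illustration.

```python
import csv
import io


def filter_by_domain(csv_text, domain):
    """Return rows whose recipient ends in @domain (case-insensitive).

    Assumes a header row containing a 'recipient' column (hypothetical layout).
    """
    wanted = "@" + domain.lower()
    reader = csv.DictReader(io.StringIO(csv_text))
    return [row for row in reader if row["recipient"].lower().endswith(wanted)]


# Hypothetical retrieved file contents, for illustration only.
retrieved = """recipient,type,description
alice@example.com,non_transactional,unsubscribed
bob@example.org,transactional,hard bounce
carol@Example.com,non_transactional,spam complaint
"""

for row in filter_by_domain(retrieved, "example.com"):
    print(row["recipient"], row["type"])
```

Matching on the "@domain" suffix deliberately skips subdomains such as mail.example.com; loosen the test if you want those too.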

That leads us on to making a tool with the following options:

Check the format of your files prior to import. It’s always a good idea to bring your suppressions with you.
Update your suppression list in SparkPost (or create it, if your list is currently empty).
Retrieve your suppression list from SparkPost, for example, if you want to get suppressions back into your upstream campaign management tool.
Delete your suppression list from SparkPost, i.e. clear it out. Maybe you uploaded some entries by mistake. We hope that’s a rare use case, but it’s there for you.

Time to operate

sparkySuppress is a tool written in Python to help you manage your suppression list. The GitHub repo includes a comprehensive README file. There’s some help with getting Python 3 installed here if you need it.

You can configure sparkySuppress with the sparkpost.ini file, which is used to set up things you change infrequently, such as your API key, timezone, batch sizes and so on. You can leave everything except the API key at its default value if you like.
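Reading an .ini file and filling in defaults is straightforward with Python’s configparser. This is a minimal sketch, not sparkySuppress’s own loader: FileCharacterEncodings comes from this article, but the [SparkPost] section name and the ApiKey, Timezone and BatchSize keys (and their default values) are hypothetical placeholders. Check the project README for the real setting names.

```python
import configparser

# Hypothetical defaults; only FileCharacterEncodings is named in the article.
DEFAULTS = {
    "FileCharacterEncodings": "utf-8",
    "Timezone": "UTC",
    "BatchSize": "5000",
}


def load_settings(ini_text):
    """Parse ini text, fill in defaults, and require an API key."""
    cfg = configparser.ConfigParser(interpolation=None)  # take values literally
    cfg.optionxform = str  # keep key names' case, e.g. 'ApiKey'
    cfg.read_string(ini_text)
    section = cfg["SparkPost"]  # hypothetical section name
    if "ApiKey" not in section:
        raise ValueError("ApiKey must be set in sparkpost.ini")
    settings = {**DEFAULTS, **section}  # file values override defaults
    settings["FileCharacterEncodings"] = [
        enc.strip() for enc in settings["FileCharacterEncodings"].split(",")
    ]
    settings["BatchSize"] = int(settings["BatchSize"])
    return settings


example = """
[SparkPost]
ApiKey = YOUR_API_KEY
FileCharacterEncodings = utf-8,utf-16,ascii,latin-1
"""
settings = load_settings(example)
print(settings["FileCharacterEncodings"])  # ['utf-8', 'utf-16', 'ascii', 'latin-1']
print(settings["BatchSize"])  # 5000 (the default)
```

Only the API key is mandatory here; everything else falls back to DEFAULTS, which matches the article’s advice that you can leave the other settings alone.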

Email addresses from input files are checked as we go, using the excellent email_validator library. We use this to give comprehensive reporting in case of faulty addresses, for example:

Line 2 ! bad@email@address.com The email address is not valid. It must have exactly one @-sign.
Line 3 ! invalid.email@~{}gmail.com The domain name ~{}gmail.com contains invalid characters (Codepoint U+007E not allowed at position 1 in '~{}gmail.com').

The ! marks the entry as having an error. Entries that have recoverable problems are marked with a w (warning) instead, like this:

Line 1 w need valid transactional & non_transactional flags: {'recipient': 'test.recip@gmail.com'}
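The sketch below shows the shape of this ! / w reporting for a single parsed row. It is hypothetical throughout: check_row is not part of sparkySuppress, the true/false flag spelling and the default flags are assumptions, and the address test is a deliberately crude stand-in for email_validator (exactly one @-sign and a plain ASCII domain), nowhere near a full RFC check.

```python
import re

# Crude stand-in for email_validator (assumption for this sketch): one @-sign
# and a dot-separated ASCII domain. Real RFC checks are far more thorough.
DOMAIN_RE = re.compile(r"^[A-Za-z0-9-]+(\.[A-Za-z0-9-]+)+$")
VALID_FLAGS = {"true", "false"}  # assumed flag spelling


def check_row(line_no, row, default_flags=None):
    """Return report lines for one parsed row: '!' = error, 'w' = warning.

    If the flags are missing or invalid and default_flags is given, the row
    is patched in place with those defaults (a recoverable problem).
    """
    addr = row.get("recipient", "")
    if addr.count("@") != 1:
        return [f"Line {line_no} ! {addr} must have exactly one @-sign."]
    domain = addr.split("@", 1)[1]
    if not DOMAIN_RE.match(domain):
        return [f"Line {line_no} ! {addr} domain {domain} contains invalid characters."]
    flags = (row.get("transactional", ""), row.get("non_transactional", ""))
    if all(str(f).lower() in VALID_FLAGS for f in flags):
        return []
    report = [f"Line {line_no} w need valid transactional & non_transactional flags: {row}"]
    if default_flags is not None:
        row.update(default_flags)
    return report


defaults = {"transactional": "false", "non_transactional": "true"}  # hypothetical
rows = [
    {"recipient": "test.recip@gmail.com"},
    {"recipient": "bad@email@address.com"},
    {"recipient": "invalid.email@~{}gmail.com"},
    {"recipient": "ok@example.com", "transactional": "true", "non_transactional": "false"},
]
for n, row in enumerate(rows, start=1):
    for message in check_row(n, row, defaults):
        print(message)
```

Errors stop further checks on that row, since there is no point checking the flags of an address that cannot be uploaded; warnings let the row continue with defaults applied.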

An excellent character

Text files are not as simple as they appear! Unusual file character encoding can be an obstacle, particularly when you don’t have control over how the suppression list export was created in the first place.

UTF-8 is the most modern and capable encoding, but some systems may not use it. For example, files exported from some older versions of Excel will be in Latin-1 rather than UTF-8.

The FileCharacterEncodings setting in sparkpost.ini provides an easy way to control how your input file will be processed. The tool reads your file using each encoding in turn; if it finds anomalies, it moves on to the next encoding, and so on. So if you have:

FileCharacterEncodings=utf-8,utf-16,ascii,latin-1

you will see the tool trying each encoding until it finds one that reads the whole file without error. You can select any of the standard encodings shown here.

$ ./sparkySuppress.py check klist-1.csv
Trying file klist-1.csv with encoding: utf-8
        Near line 1125 'utf-8' codec can't decode byte 0x9a in position 7198: invalid start byte
Trying file klist-1.csv with encoding: utf-16
        Near line 1 UTF-16 stream does not start with BOM
Trying file klist-1.csv with encoding: ascii
        Near line 1125 'ascii' codec can't decode byte 0x9a in position 7198: ordinal not in range(128)
Trying file klist-1.csv with encoding: latin-1
        File reads OK.

Lines in file: 8496
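The try-each-encoding approach can be sketched with plain bytes.decode. This is an illustration under assumptions, not the tool’s code: read_with_fallback is a hypothetical helper that decodes the whole file in memory (for example from pathlib.Path(...).read_bytes()) and estimates the failing line by counting newline bytes before the bad byte, which is only meaningful for byte-oriented encodings such as UTF-8, ASCII and Latin-1. Latin-1 maps every byte to a character, so it never raises; that makes it a safe last resort, but also means “reads OK” does not guarantee the text is what the sender intended.

```python
def read_with_fallback(raw, encodings):
    """Try each encoding in turn; return (encoding, text) for the first that works."""
    for enc in encodings:
        print(f"Trying encoding: {enc}")
        try:
            text = raw.decode(enc)
        except UnicodeDecodeError as err:
            # Estimate the failing line by counting newline bytes before the bad byte.
            line = raw[:err.start].count(b"\n") + 1
            print(f"        Near line {line}: {err.reason} (byte offset {err.start})")
            continue
        print("        File reads OK.")
        return enc, text
    raise ValueError("none of the listed encodings could decode the file")


# 'ë' encoded as Latin-1 is the single byte 0xEB, which is not valid UTF-8 here.
sample = "recipient\nalice@example.com\nzoë@example.com\n".encode("latin-1")
encoding, text = read_with_fallback(sample, ["utf-8", "ascii", "latin-1"])
print(encoding, len(text.splitlines()))  # latin-1 3
```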

The first encoding in the list is the one used when you retrieve entries from SparkPost into a file.

A good performance

Delete is a bit special: it uses multi-threading because deletes have to be done one entry per API call. Update and retrieve are fast even when single-threaded, as each call handles a whole batch. You should get good performance with the default batch size and thread settings, but you can tweak them if needed.
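The batch-versus-thread trade-off can be sketched as follows. update_in_batches and delete_all are hypothetical helpers; send_batch and delete_one stand in for the real API calls, and the batch size of 1000 and 10 threads are illustrative values, not sparkySuppress defaults.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import islice


def batched(items, size):
    """Yield lists of at most `size` items; each list would be one API call."""
    it = iter(items)
    while chunk := list(islice(it, size)):
        yield chunk


def update_in_batches(entries, send_batch, batch_size=1000):
    """Update/create path: single-threaded, one call per batch. Returns call count."""
    calls = 0
    for chunk in batched(entries, batch_size):
        send_batch(chunk)
        calls += 1
    return calls


def delete_all(recipients, delete_one, threads=10):
    """Delete path: one call per recipient, so run the calls on a thread pool."""
    with ThreadPoolExecutor(max_workers=threads) as pool:
        return list(pool.map(delete_one, recipients))


# Stand-ins for real HTTP calls (hypothetical), so the sketch runs offline.
sent = []
calls = update_in_batches([f"user{i}@example.com" for i in range(2500)], sent.append)
print(calls, [len(batch) for batch in sent])  # 3 [1000, 1000, 500]
print(delete_all(["a@example.com", "b@example.com"], lambda r: f"deleted {r}", threads=4))
# ['deleted a@example.com', 'deleted b@example.com']
```

ThreadPoolExecutor.map returns results in input order, and if any delete call raises, that exception is re-raised when list() reaches its result. Because each delete spends most of its time waiting on the network, threads help here even in CPython.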

Practicing your medicine

In case you don’t have data from your old provider yet, here’s a tool you can use to create a dummy suppression list file to practice on.
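If you would rather roll your own practice data, a few lines of Python will do. This sketch is unrelated to the tool mentioned above: make_dummy_suppressions is a hypothetical helper, the four-column layout (recipient, transactional, non_transactional, description) is an assumption based on the fields used in this article, and the optional bad_every parameter plants addresses with two @-signs so the checker has something to complain about.

```python
import csv
import io
import random


def make_dummy_suppressions(n, seed=0, bad_every=0):
    """Return CSV text holding n made-up suppression entries.

    Hypothetical column layout; bad_every > 0 makes every bad_every-th
    address invalid (two @-signs) to give a checker something to flag.
    """
    rng = random.Random(seed)  # seeded, so the same arguments give the same file
    flag_choices = [("true", "false"), ("false", "true"), ("true", "true")]
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["recipient", "transactional", "non_transactional", "description"])
    for i in range(1, n + 1):
        if bad_every and i % bad_every == 0:
            recipient = f"dummy@{i}@example.com"  # deliberately invalid
        else:
            recipient = f"dummy.{i}@example.com"  # example.com is reserved for examples
        transactional, non_transactional = rng.choice(flag_choices)
        writer.writerow([recipient, transactional, non_transactional, "practice entry"])
    return out.getvalue()


print(make_dummy_suppressions(3))
```

To get a file, write the returned text out with open(path, "w", encoding="utf-8", newline="").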

That’s about it! You are now a skilled suppression list surgeon. You’ll soon have your campaigns in excellent shape.

And finally…

If you are exploring this tool and want to give the author feedback, you’re welcome to visit our Community Slack; there’s a channel just for Python, #python. Alternatively, open a GitHub project issue or pull request.

If you don’t like Python (whut?), there are some lower-level command-line SparkPost projects that provide a thin “wrapper” over the API and can be used to manipulate suppression lists. Check out the Node.js and Go projects. If you want to know more about the API and UI for suppression lists, here’s a good place to start. There’s also a Node.js tool to retrieve your list back from SparkPost for checking.

If you prefer point-and-click, the SparkPost user interface has a built-in Lists/Suppressions upload feature. This gives you a nice example template and is ideal when you have perfectly formatted files that aren’t too large, with a maximum of 10,000 recipients per file.

The post Calm your Suppression List Anxieties with Python appeared first on SparkPost.
