How many Substack accounts do you have? Or rather, how many email addresses do you have? This is the story of how collecting a few of both surprised me, how I recovered and how you can too.
Disclaimer: Substack is not at fault here, this was entirely my own doing!
Summary
- Substack implements RFC-2919, which lets us programmatically get every list we've ever been subscribed to from the List-Id header, in any mail client. Standards are fantastic.
- With Gmail's search or API, we can retrieve emails from Substack using from:substack.com. And, to filter by payment receipts: subject:"your payment receipt" from:substack.com. Using Google Apps Script we can programmatically aggregate these messages into a handy list, like this.
- Newsletters being emails made it easy to retrieve this information, but by backing this data up sooner I could have avoided the trouble altogether.
How I messed up
Over the years I've used a few different email addresses. With newsletters growing in popularity and so many being sent via Substack, that also meant I'd collected several Substack accounts. Thankfully it's easy to merge these accounts.
While evaluating alternatives to Revue (for no reason whatsoever 😇), I misread Substack's onboarding flow and, believing that I'd created another duplicate account, hit delete.
Moments later, I realised that I'd deleted my primary account. Whoops!
The good news: the deletion process is good. My account was gone immediately. Many companies make it difficult to leave so this was genuinely a welcome surprise. Excellent compliance work folks!
The bad news: The deletion process is too good. I didn't receive a single message from the many newsletters I'd been unsubscribed from. Oh no!
Some newsletters are regular and predictable, but many are not. The irregular ones are where you get the odd valuable, unique insight or a punchy soundbite. They're the few I might not notice I'd lost, but would miss the most.
So: How do I find and recover all these newsletters that I've previously signed up for?
Email inboxes are just big log-files
The nice thing about newsletters is that they're “just” emails. And the nice thing about emails is that they're “just” files. Specifically, they are log files, or, a series of mails recorded in a standardised format.
This is a slight oversimplification, because in reality emails can be stored in nearly any way, but the conceptual model still works.
What this means is that it's possible to scan over all of these mails to reconstruct my history and to derive a list of the newsletters I read. Sweet.
Let's go about solving this problem like this:
- Find all mails sent from substack.com.
- Identify which newsletter they are from.
- Put those into a list.
- Optionally, group by the email address they were sent to.
Okay, from a busy inbox, how do we do that?
What do the standards say?
This post isn't about the history of email - there's a fascinating series of IETF RFCs which build up the format we know today.
The first standard which will help today is RFC-822, the standard for the format of ARPA Internet Text Messages.
Section 1.2 introduces headers and section 3 describes their format. In short, email headers are bits of data about the email and are sent at the start of a message before the main content. They look something like this:
Subject: This is a subject
Content-Type: text/plain
Which in this case says that the email's subject is "This is a subject" and that it's plain text. This format is relatively easy for a computer to read.
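To show just how easy this format is for a computer to read, here is a minimal sketch of a header parser in plain JavaScript. It is not a complete RFC-822 implementation: the function name parseHeaders and the sample message are made up for illustration, and it only keeps the first occurrence of each header.

```javascript
// Parse the header section of a raw RFC-822 style message into a Map.
// Header names are case-insensitive, so keys are lower-cased.
// parseHeaders is a hypothetical helper, not part of any library.
function parseHeaders(rawMessage) {
  // Headers end at the first blank line; the body follows.
  const headerSection = rawMessage.split(/\r?\n\r?\n/, 1)[0]
  // A line starting with whitespace continues the previous header ("folding").
  const unfolded = headerSection.replace(/\r?\n[ \t]+/g, ' ')
  const headers = new Map()
  for (const line of unfolded.split(/\r?\n/)) {
    const separator = line.indexOf(':')
    if (separator === -1) continue // Not a "Name: value" line, skip it.
    const name = line.slice(0, separator).trim().toLowerCase()
    const value = line.slice(separator + 1).trim()
    // Keep only the first occurrence to keep the sketch small.
    if (!headers.has(name)) headers.set(name, value)
  }
  return headers
}

// A made-up message with a folded Subject header.
const exampleMessage = [
  'Subject: This is a',
  ' subject',
  'Content-Type: text/plain',
  '',
  'Hello!',
].join('\r\n')

const exampleHeaders = parseHeaders(exampleMessage)
console.log(exampleHeaders.get('subject')) // This is a subject
console.log(exampleHeaders.get('content-type')) // text/plain
```

Real mail has more edge cases (repeated headers, encoded words), so for anything serious a proper mail library is the safer bet.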
The next two standards that can help today are extensions for mailing lists, specifically RFC-2369 and RFC-2919, which introduce a series of list-specific metadata. These look like this:
List-Owner: <mailto:example@example.org>
List-Unsubscribe: <https://example.org/unsubscribe>
List-Id: <some-id>
Cool! So maybe we can aggregate those List IDs?
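Before aggregating, it helps to normalise the values. In RFC-2919 the identifier sits inside angle brackets and may be preceded by a free-text description, so two mails from the same list can carry slightly different header values. Here is a small sketch of pulling out just the identifier. The function name parseListId and the example.substack.com value are hypothetical, and lower-casing the result is my own assumption so that IDs differing only by case are treated as the same list.

```javascript
// Extract the identifier from a List-Id header value such as
//   "Example Newsletter <example.substack.com>"
// parseListId is a hypothetical helper name used for illustration.
function parseListId(headerValue) {
  if (!headerValue) return null
  // Prefer the text inside the trailing angle brackets.
  const match = headerValue.match(/<([^<>]*)>\s*$/)
  // Fall back to the whole value if there are no brackets.
  const id = (match ? match[1] : headerValue).trim()
  // Assumption: compare IDs case-insensitively by lower-casing them.
  return id === '' ? null : id.toLowerCase()
}

console.log(parseListId('Example Newsletter <example.substack.com>'))
console.log(parseListId('<Example.Substack.com>'))
console.log(parseListId(''))
```

The first two calls both give example.substack.com, and the last gives null, which makes the values safe to drop straight into a Set.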
Enter: Gmail and Apps Script
If I were dealing with a mail spool on a Linux machine, this might be a bit easier, because these fields are very grep-able.
e.g. grep --only-matching --ignore-case '^List-Id:.*' mbox | sort --unique (where mbox is a placeholder for the mailbox file)
In practice, most of us are using webmail services and don't have direct access to the mailbox (although we could download the messages over IMAP). These services do offer APIs, though. And Gmail, which I use, comes with a handy scripting engine called Apps Script.
Apps Script is great. I've used it in Google Sheets to get Jira ticket descriptions and in Google Docs to create meeting agendas automatically. It comes with a bunch of Google API integrations, including the Gmail Service which is one-click access to most Gmail operations.
Our first task is iterating over all emails from Substack. The filter from:substack.com should do that for us, which looks something like this:
const threads = GmailApp.search("from:substack.com", 0, 50)
console.log(threads)
This call goes out to Gmail, so you might expect asynchronous code with callbacks or promises, but Apps Script executes it synchronously. Apps Script is a bit weird like that. For today's purposes it makes things much simpler.
Okay, we now have a list of Threads. Threads don't include much information about the messages, so our next step is to retrieve the Messages and read their List-ID headers.
// Find last 50 emails from Substack.
const threads = GmailApp.search("from:substack.com", 0, 50)
for (const thread of threads) {
  // Log the list ID of the first message in each thread.
  const messages = thread.getMessages()
  if (messages.length >= 1) {
    const listId = messages[0].getHeader('List-ID')
    console.log(listId)
  }
}
Excellent, now we're getting somewhere. We have list IDs for newsletters. Our next step is to iterate over all emails, storing unique list IDs only. We can use a Set for these, which only holds unique values.
const lists = new Set()
// Find last 50 emails from Substack.
const threads = GmailApp.search("from:substack.com", 0, 50)
for (const thread of threads) {
  // Retrieve list ID of the first message in each thread.
  const messages = thread.getMessages()
  if (messages.length >= 1) {
    const listId = messages[0].getHeader('List-ID')
    // Skip messages without a list ID; the Set ignores duplicates for us.
    if (listId) {
      lists.add(listId)
    }
  }
}
// Log the unique list IDs.
console.log(Array.from(lists))
We now have a list of the unique lists we have at some point been subscribed to, excellent!
With a bit of refactoring, we can group this by email. I have done this and you can see the code further down, so I won't cover it here.
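For a rough idea of what that refactoring involves, here is a sketch of the grouping step on its own. This is not the code from the linked script, just one way it could look. The names groupListsByRecipient and fakeMessage are hypothetical, and the fake messages only mimic the getHeader method of a Gmail message so the sketch runs outside Apps Script.

```javascript
// Group unique list IDs by the address each message was delivered to.
// `messages` is any iterable of objects with a getHeader(name) method.
function groupListsByRecipient(messages) {
  const listsByRecipient = new Map()
  for (const message of messages) {
    const listId = message.getHeader('List-ID')
    if (!listId) continue // Not a mailing-list message.
    // Assumption: fall back to a placeholder when the header is missing.
    const recipient = message.getHeader('Delivered-To') || '(unknown)'
    if (!listsByRecipient.has(recipient)) {
      listsByRecipient.set(recipient, new Set())
    }
    listsByRecipient.get(recipient).add(listId)
  }
  return listsByRecipient
}

// Stand-in for a Gmail message, for running the sketch locally.
function fakeMessage(headers) {
  return { getHeader: (name) => headers[name] || '' }
}

const sampleMessages = [
  fakeMessage({ 'List-ID': '<one.example.com>', 'Delivered-To': 'old@example.com' }),
  fakeMessage({ 'List-ID': '<two.example.com>', 'Delivered-To': 'new@example.com' }),
  fakeMessage({ 'List-ID': '<one.example.com>', 'Delivered-To': 'old@example.com' }),
  fakeMessage({ 'Delivered-To': 'new@example.com' }), // No List-ID, skipped.
]

const grouped = groupListsByRecipient(sampleMessages)
for (const [recipient, lists] of grouped) {
  console.log(recipient, Array.from(lists))
}
```

A Map of Sets keeps the "unique values only" property from before, just once per address instead of once overall.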
Show me the code! And how do I run this?
Excellent questions, and the answer to both is here: What are my Substack subscriptions?
In this example, I've implemented pagination using a Generator (because Apps Script runs everything synchronously, a plain generator does the job that would otherwise need an AsyncIterable!)
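If generators are new to you, the pagination idea can be sketched roughly like this. It is an illustration rather than the code from the linked script: paginate and fakeSearch are hypothetical names, and fakeSearch stands in for the Gmail search so the sketch runs anywhere.

```javascript
// Yield items one at a time, fetching a new page whenever one runs out.
// fetchPage(start, max) must return an array of up to `max` items.
function* paginate(fetchPage, pageSize = 50) {
  let start = 0
  while (true) {
    const page = fetchPage(start, pageSize)
    if (page.length === 0) return // An empty page means we're done.
    yield* page
    start += page.length
  }
}

// Stand-in data source: 120 fake thread names served in slices.
const allThreads = Array.from({ length: 120 }, (_, i) => `thread-${i}`)
const fakeSearch = (start, max) => allThreads.slice(start, start + max)

const seen = Array.from(paginate(fakeSearch, 50))
console.log(seen.length) // 120
```

In Apps Script the same generator could be fed from Gmail with something like paginate((start, max) => GmailApp.search("from:substack.com", start, max)), so the loop over threads no longer stops at the first 50.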
This example also uses a non-standard field, List-URL, which works for Substack and is more human-friendly. It also groups by Delivered-To email address, making it easier to find duplicates.
Instead of logging the result, this sends you an email using the sendEmail method.
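As a rough sketch of that last step, the grouped result can be turned into a plain-text email body before sending. The function name formatReport, the example addresses and the example URLs are all hypothetical, and the linked script may well format things differently.

```javascript
// Turn a Map of recipient -> Set of list names into a plain-text report.
function formatReport(listsByRecipient) {
  const lines = []
  for (const [recipient, lists] of listsByRecipient) {
    lines.push(`${recipient}:`)
    // Sort so the same newsletters always appear in the same order.
    for (const list of Array.from(lists).sort()) {
      lines.push(`  - ${list}`)
    }
    lines.push('') // Blank line between recipients.
  }
  return lines.join('\n').trimEnd()
}

// Made-up data in the shape produced by grouping on Delivered-To.
const report = formatReport(new Map([
  ['old@example.com', new Set(['https://b.example.com', 'https://a.example.com'])],
  ['new@example.com', new Set(['https://c.example.com'])],
]))
console.log(report)
```

In Apps Script the resulting string could then be passed as the body to GmailApp.sendEmail(recipient, subject, body), with your own address as the recipient.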
The manual bit
Unfortunately, Substack doesn't offer an API, so after getting a full list of which newsletters I'd previously been subscribed to, I couldn't automatically re-subscribe. So I did a bit of clicking of links, which seemed like a fair compromise considering that I got into this fix by doing a bit of clicking on links.
Annoyingly some subscriptions were ones I'd paid for and Substack doesn't have a way to automatically resume or refund these. They recommend speaking to support, so armed with a similarly-obtained list of the paid-for subscriptions, I've done just that.
You can get those using this query: subject:"Your payment receipt" from:substack.com
Takeaways
1. Always bet on Email
Email has fallen in popularity, even as newsletters have surged, and until today I actually wished that more of these newsletters were RSS feeds instead.
But it's actually very handy being able to find conversations, receipts, notifications, news and so much more in one place. And this kind of recovery would not be possible without that history.
Email clients are ubiquitous and work on nearly every platform. It's genuinely quite hard to beat the reliability of email.
2. Standards are dope. Why don't we use more of them?
If you're building a platform doing something like sending out newsletters today, try to follow the guidance in the relevant standards, such as the mailing-list headers from RFC-2369 and RFC-2919.
After recovering my Substack subscriptions I had a look at some other newsletter providers I'm subscribed through and the presence of mailing-list headers in their messages was spotty at best.
It's not only email. On the web, for example, we frequently see people re-inventing built-in behaviour, especially around how content is made accessible or how page navigation works, even though browsers do both better. I've definitely done this in the past. We can do better.
3. Export your data and keep fewer accounts
Most platforms offer some form of data export. Take advantage of it - because it's always nice to have a backup!
Also, by avoiding keeping many digital accounts you'll be far less likely to run into problems like these. You will also reduce your risk of data theft or leaks.
...
This journey has been a slightly unwelcome-yet-fun distraction, but I hope if not helpful that it's at least mildly interesting. Feedback is welcome - thanks for reading!