
Raul Piraces Alastuey

Originally published at piraces.dev

Take ownership of your Twitter data, set-up your own Twitter updated archive in GitHub

Following the "recent" acquisition of Twitter and all the changes that came after it, lots of users started considering alternatives such as Mastodon, Tumblr, Nostr, and many others.

But what happens to the data we have generated on this platform over the years? What do we do with it? Discard it? My thought on this is: why not just keep it safe for the future?

This is where Tweetback comes into play: an awesome project by Zach Leatherman for taking ownership of your Twitter data in a public archive that you can self-host (for example, on GitHub).

Overview

Thanks to Zach Leatherman and his awesome project Tweetback, we can self-host our own Twitter archive and keep it up to date if necessary (just like I have done with mine).

The process to make your own archive is simple. Just follow the steps below.

Getting your Twitter archive in the first place

This part is required to initialize our archive.

Twitter allows you to export a backup of all the data you have on the platform.

As they explain in their help center, log in to your Twitter account (web client) and:

  1. Click More in the main navigation menu to the left of your timeline.
  2. Select Settings and privacy.
  3. Select "Your Account" under Settings.
  4. Click on "Download an archive of your data".

Page where the data can be claimed

After clicking there, click the button to request your archive. The status will then change to "Requesting archive".

Twitter settings page after requesting your archive

After taking this action, Twitter will collect your data and send you an email with a link to a .zip file containing all your exported data. This process can take up to 48 hours, so relax while you wait.

Note: don't forget to download the .zip file as soon as you receive the email and save it somewhere (the link expires shortly).

Getting the Tweetback code and setting it up

In my case I just forked the repository, but you can clone it and upload it to another site if you want.

Note that you will need to have Node.js installed (and Git, of course).

Then follow these steps:

  1. Clone the repo (or your fork):
git clone https://github.com/tweetback/tweetback.git

Note: if you choose to create a fork, updating your repository with future changes from the base repository is easier thanks to the "Sync fork" option that GitHub provides.

  1. Enter the repository's main folder in the terminal and run npm install (this will generate a package-lock.json file that you will keep).

  2. Following the Tweetback instructions, open the zip file you downloaded from Twitter and extract the data/tweets.js file from it into the database folder of your Tweetback repository.

  3. Edit this file to change window.YTD.tweet.part0 on the first line to module.exports (a simple replace), so that the beginning of the file looks as follows:

module.exports = [
{
  "tweet" : {
...

Note: if you have lots of tweets, the import in the next step can take a while.

  1. In the root folder of the repository, run npm run import in the terminal. This performs the first initialization of our tweets database (going through the tweets.js file).

  2. At the end, you will see some output with counts and other info (there should be no errors).

  3. After importing the tweets, we will make some changes to the _data/metadata.js file, which customizes some page settings for our archive. We will modify the options as follows (leave the others unchanged):

Metadata settings and where they reflect the changes

  1. Now our archive is ready! Preview it by running npm run start in the root folder and opening the local URL that the process prints to the terminal.

  2. If everything looks good to you, commit and push your changes to the repository.

(Optional) Set the subpath where your page will reside

You can skip this step if your Twitter archive will live at the root of a domain/subdomain (such as tweets.piraces.dev). Otherwise, we have to set the subpath where it will be served (for example, when hosting on GitHub Pages at piraces.github.io/tweets).

To do this, we need to make some changes to the eleventy.config.js file as follows (at the end, before the closing }):

...
eleventyConfig.addPlugin(EleventyHtmlBasePlugin);
return { pathPrefix: "/twitter/" }; // Subpath under which the site is served
};

Note: change the pathPrefix to match the path you want.
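
To get a feel for what the prefix changes, here is a small illustration of how a path prefix is typically applied to a root-relative URL. It is a sketch under assumptions, not Eleventy's actual implementation: the helper withPathPrefix and the example path /recent/ are hypothetical.

```javascript
// Illustration only (not Eleventy's implementation): prepend a path prefix
// to root-relative URLs and leave every other URL untouched.
function withPathPrefix(pathPrefix, url) {
  // Skip relative URLs ("page/") and protocol-relative URLs ("//host/...").
  if (!url.startsWith("/") || url.startsWith("//")) return url;
  // Normalize the prefix: "/twitter/" -> "/twitter", "/" -> "/".
  const prefix = "/" + pathPrefix.split("/").filter(Boolean).join("/");
  return (prefix === "/" ? "" : prefix) + url;
}

console.log(withPathPrefix("/twitter/", "/recent/"));
```

With a prefix of "/" (the root of a domain or subdomain) the URL is returned unchanged, which matches the case where this step can be skipped.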

(Optional) Configuring a subdomain to point to our GitHub Pages deployment

In this tutorial we will use GitHub Pages to host our Twitter archive. Since in my case the domain piraces.dev is managed by Cloudflare, I will create a subdomain called tweets.piraces.dev that points to the GitHub Pages deployment.

To do this, we only have to create a CNAME DNS record whose name is the subdomain (tweets in this case) and whose target is your GitHub domain, in the format HANDLE.github.io, where HANDLE is your GitHub handle/username (piraces.github.io in my case).

CNAME DNS record pointing to GitHub pages

The picture above shows how to set it up in Cloudflare, but it will be very similar with other providers.

Deploying our archive using GitHub Pages

Once our repo has all the changes mentioned above, we have to configure GitHub Pages for our repository.

To do so, follow these steps:

  1. Create an empty branch named gh-pages (where the deployed page will live). From your terminal:
git switch --orphan gh-pages
git commit --allow-empty -m "Initial commit on orphan branch"
git push -u origin gh-pages
  1. Go to the repository page on GitHub, click "Settings", and then "Pages" in the left sidebar.

  2. In the "Pages" screen, set the options as follows:

GitHub pages settings example

  1. Set up a workflow to publish the page on every push to the main branch, or manually. To do so, go to the "Actions" tab of the GitHub repository page and select "New workflow". On this page, select "Skip this and set up a workflow yourself ->"; an editor then appears where we can set up the workflow. You can use the following code (adapting some fields) to complete yours:
name: Publish page

on:
  push:
    branches: [ "main" ] # Change this to the name of the main branch of your repository
  workflow_dispatch: # Setting this enables the possibility to run the workflow manually

jobs:
  build:
      runs-on: ubuntu-latest
      steps:
      - uses: actions/checkout@v3
      - name: Use Node.js 18.x
        uses: actions/setup-node@v3
        with:
            node-version: 18.x
      - run: npm ci # Note that you must commit package-lock.json to the repo and remove it from the .gitignore file
      - run: npm run build
      - name: Deploy to Github Pages
        uses: crazy-max/ghaction-github-pages@v3
        with:
          # Build directory to deploy
          build_dir: _site
          # Write the given domain name to the CNAME file
          fqdn: tweets.piraces.dev # Change this to your configured subdomain or completely delete the line if not using it.
          jekyll: false
          keep_history: true # If you don't want to preserve a commit history, set to false to perform a git push --force
        # env belongs to the step itself, not to the action inputs under "with"
        env:
          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} # This secret is available by default
  1. When you finish editing, click "Start commit" and commit the workflow to the main branch. The workflow will start running after the commit; you can check its status in the "Actions" tab of the GitHub repository page.

  2. If the workflow shows green (all steps are OK), go to your URL and see your Twitter archive 🎉 (note: if something goes wrong with your configured subdomain, please revisit the "Pages" settings step above).

(Optional) Setting a scheduled action to update our archive

If you plan to keep using Twitter in the future, consider modifying the workflow created in the previous section to fetch all new tweets using the Twitter API.

To do so, you will need a Twitter bearer token. You can only obtain one if you are a "Twitter developer" 😢, so you will have to fill in some forms and apply to become one.

To start the process of becoming a "Twitter developer", visit developer.twitter.com, sign in, and begin the application. It may be an annoying process with manual reviews.

Once you are a "Twitter developer", create an app and get its bearer token. Documentation on getting the bearer token can be found in Twitter's official developer documentation.

When you have everything set up correctly, copy the bearer token, which will look something like AAAA... (a string of random-looking characters). Now go to the repository in your terminal and try running the update process locally (before modifying the remote workflow) to make sure everything goes smoothly.

From the repository base path in your terminal, run the following command (where AAAAAA... is your actual complete bearer token):

TWITTER_BEARER_TOKEN=AAAAAA... npm run fetch-new-data

This will fetch any tweets that are not yet in the database because you posted them after the last import.

After fetching new tweets, you will need to rebuild your site with:

npm run build

Take a look at the updated archive by running it locally with npm run start and opening the URL that the process prints to the terminal.

If everything looks OK to you, we are ready to modify our GitHub workflow to automate this!

Then follow these steps:

  1. Go to your repository's GitHub page, select the "Actions" tab, select your workflow in the left navbar, locate the latest run of the workflow, and click "View workflow file". Then select the edit icon and start modifying your workflow.

Note: you can also modify the workflow in your local environment by opening the .github/workflows/ folder and editing the workflow file.

  1. We will add a new step after - run: npm ci to fetch new tweets. To do so, you first need to add the Twitter bearer token as a secret for your actions (since you don't want it to be publicly visible). If you don't know how, follow the steps in the official GitHub documentation.

  2. After declaring the secret, add a step after - run: npm ci like the following (assuming your secret is named TWITTER_BEARER_TOKEN; if not, change it accordingly):

...
- run: npm run fetch-new-data
  env:
    TWITTER_BEARER_TOKEN: ${{ secrets.TWITTER_BEARER_TOKEN }} # Change the secret name if different
  1. To do this periodically, I decided to make my workflow run every night. You only have to add a schedule entry under the on key with a cron rule (adjust it to your requirements):
name: Publish page

on:
  push:
    branches: [ "main" ] # Change this to the name of the main branch of your repository
  schedule:
    - cron: "30 0 * * *" # This will trigger the workflow every day at 00:30 UTC
  workflow_dispatch: # Setting this enables the possibility to run the workflow manually
  1. Your workflow will now look like this (with minimal differences):
name: Fetch new data & publish page

on:
  push:
      branches: [ "main" ]
  schedule:
      - cron: "30 0 * * *"
  workflow_dispatch:

jobs:
  build:
    runs-on: ubuntu-latest
    steps:
    - uses: actions/checkout@v3
    - name: Use Node.js 18.x
      uses: actions/setup-node@v3
      with:
          node-version: 18.x
    - run: npm ci
    - run: npm run fetch-new-data
      env:
        TWITTER_BEARER_TOKEN: ${{ secrets.TWITTER_BEARER_TOKEN }} # Here the secret TWITTER_BEARER_TOKEN
    - run: npm run build
    - name: Deploy to Github Pages
      uses: crazy-max/ghaction-github-pages@v3
      with:
        # Build directory to deploy
        build_dir: _site
        # Write the given domain name to the CNAME file
        fqdn: tweets.piraces.dev
        jekyll: false
        keep_history: true
      # env belongs to the step itself, not to the action inputs under "with"
      env:
        GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} # Here the secret GITHUB_TOKEN
  1. Save your workflow, commit and push it.

  2. Check the run in the "Actions" tab of your GitHub repository to make sure everything goes OK.

  3. You now have an "always updated" Twitter archive that runs completely unattended and will be there for you and anyone who may want to check it! 🚀

Note: if you have any doubts regarding the code or workflows, you can always check my Twitter archive implementation: GitHub - piraces/twitter_archive.
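
If the cron rule "30 0 * * *" used in the schedule above feels opaque, the following sketch computes when a daily rule of that shape would next be due. The function nextDailyRun is hypothetical and only models a fixed hour and minute in UTC, not general cron syntax; actual scheduled runs on GitHub may also start somewhat later than the nominal time.

```javascript
// Sketch: next nominal trigger time for a daily cron rule such as "30 0 * * *"
// (minute 30, hour 0, every day, evaluated in UTC).
function nextDailyRun(now, hourUtc = 0, minuteUtc = 30) {
  // Today's occurrence of HH:MM in UTC.
  const next = new Date(
    Date.UTC(now.getUTCFullYear(), now.getUTCMonth(), now.getUTCDate(), hourUtc, minuteUtc)
  );
  // If that moment has already passed (or is right now), move to tomorrow.
  if (next <= now) next.setUTCDate(next.getUTCDate() + 1);
  return next;
}

console.log(nextDailyRun(new Date()).toISOString());
```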

(Optional) Adding your archive to tweetback-canonical (list of hosted Twitter backups)

tweetback-canonical is a package that resolves Twitter URLs to new canonically hosted Twitter backups, also made by Zach Leatherman.

Simply follow the process in its README.md to add the URL of your Twitter archive to the mapping.js file.

The process is very simple:

  1. Create a fork of the tweetback-canonical repository.

  2. Modify the mapping.js file adding your Twitter archive public URL to the end of the object, in the format:

"TWITTER_HANDLE": "FULL_URL_TO_TWITTER_ARCHIVE",

Note: your Twitter handle must be written without the @ symbol, and you must end the line with a comma.

  1. Commit and push your changes with a commit message that starts with mapping:.

  2. Open a PR to the main branch of the official repository and wait for approval and merge.

  3. You are all done! Congrats! 🎉
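
To avoid formatting mistakes in the mapping line described above, you could generate it. This is a minimal sketch: the helper mappingEntry is hypothetical and not part of tweetback-canonical; it only enforces the two rules from the note (no @ symbol, trailing comma) and checks that the URL parses.

```javascript
// Sketch: build a mapping.js line in the format
//   "TWITTER_HANDLE": "FULL_URL_TO_TWITTER_ARCHIVE",
function mappingEntry(handle, archiveUrl) {
  const cleanHandle = handle.trim().replace(/^@/, ""); // the handle goes without "@"
  if (!cleanHandle) throw new Error("Twitter handle is empty");
  new URL(archiveUrl); // throws a TypeError if the URL is not a valid absolute URL
  return `${JSON.stringify(cleanHandle)}: ${JSON.stringify(archiveUrl)},`;
}

console.log(mappingEntry("@piraces", "https://tweets.piraces.dev/"));
```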

Conclusion

We have seen in detail how to use the awesome Tweetback project to build and self-host our own Twitter archive/backup. Whether or not you want to leave Twitter, I think it's a great way to take ownership of your data anyway.

By doing this, you don't risk losing the effort you invested in writing those tweets if something bad happens (hopefully not 🙏).

Don't forget to report any issues you find to the official repository and thank the author (Zach) for the awesome work!

Happy deploying! 🎉🎉
