DEV Community

Ayumi Sato

Importing Toots from Mastodon to WordPress Posts Using Google Apps Script

I have been archiving my online writing on a private WordPress site since 2003. The archive includes diaries originally written in HTML, posts from other external platforms, and content created with Movable Type, all of which have been migrated to WordPress. Although I don't post regularly and not everything is archived, the collection still spans 15 years and totals approximately 2,500 articles.
Recently, I wanted to import my past Mastodon toots into WordPress in the same way, and I managed to do it with Google Apps Script. With the toots in my own WordPress site, I keep a copy of the data under my control and can search it together with the rest of my archive.
If you are looking to do something similar with your own Mastodon toots and WordPress site, I hope this article will be a helpful resource.

Objectives

  • Import past toots from my Mastodon account into WordPress.
  • Use Google Apps Script to avoid setting up a new environment.
  • Set the toot’s post date as the article’s post date in WordPress.
  • Use the first 20 characters of the toot’s body as the WordPress article title. If the toot has a content warning, use the warning (prefixed with ⚠️) as the title instead.
  • If the toot’s body is empty, use "no title" as the WordPress article title.
  • Categorize the articles as "Toots".
  • Attach images from toots to the WordPress article body.
  • Convert hashtags in toots to WordPress tags.
  • Exclude boosts and replies.
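
The title and filtering rules above can be expressed as two small pure functions. This is a minimal sketch, not the script's exact code: it assumes status objects shaped like the Mastodon API's Status entity (content, spoiler_text, in_reply_to_id, reblog), and the names shouldImport and makeTitle are hypothetical. The ⚠️ prefix and the 20-character limit follow the script; treating <br> as a space and truncating by code point (so emoji are not cut in half) are additions in this sketch.

```javascript
// Hypothetical helpers illustrating the objectives; not part of the original script.

// Keep only original toots: skip boosts (reblog is set) and replies (in_reply_to_id is set).
function shouldImport(status) {
  return status.reblog == null && status.in_reply_to_id == null;
}

// Build a WordPress title: the content warning (prefixed with ⚠️) if present,
// otherwise the first maxChars characters of the plain text, otherwise "no title".
function makeTitle(status, maxChars = 20) {
  if (status.spoiler_text) return `⚠️${status.spoiler_text}`;
  const text = String(status.content || "")
    .replace(/<br\s*\/?>/gi, " ") // treat line breaks as spaces
    .replace(/<[^>]*>/g, "")      // strip the remaining HTML tags
    .trim();
  // Array.from splits by code point, so emoji are never cut in half
  return text ? Array.from(text).slice(0, maxChars).join("") : "no title";
}

// Example usage with three sample statuses
const statuses = [
  { content: "<p>Hello Mastodon, this is my first toot!</p>", spoiler_text: "", in_reply_to_id: null, reblog: null },
  { content: "<p>@friend thanks!</p>", spoiler_text: "", in_reply_to_id: "123", reblog: null },
  { content: "", spoiler_text: "", in_reply_to_id: null, reblog: null },
];
// The arrow function avoids map() passing the index as maxChars
const titles = statuses.filter(shouldImport).map((s) => makeTitle(s));
console.log(titles);
```

The reply is dropped by shouldImport, the first toot becomes "Hello Mastodon, this", and the empty one falls back to "no title".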

What Was Not Done

  • Video attachments.
  • Other necessary or unnecessary things I didn't think of.

Requirements

To fetch data from Mastodon, you need an access token.

Obtaining the Mastodon Access Token

  1. Log in to your Mastodon instance.
  2. Select "Development" from "User Settings" and click "New Application".
  3. Enter an application name and select the necessary permissions (scope). For this task, "read" permission is sufficient.
  4. Click "Submit".
  5. Note the access token from the application details page.
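
The access token goes into an Authorization: Bearer header on every request, and GET /api/v1/accounts/verify_credentials returns the account that owns the token, which is how the script finds your user ID. As a minimal sketch, the hypothetical buildMastodonRequest helper below only assembles the URL and the options object that UrlFetchApp.fetch() accepts; it makes no network call, so it also runs in plain JavaScript.

```javascript
// Hypothetical helper: builds the URL and fetch options for a Mastodon API call.
function buildMastodonRequest(instanceUrl, accessToken, path, params = {}) {
  const base = instanceUrl.replace(/\/+$/, ""); // drop trailing slashes from the instance URL
  const query = Object.entries(params)
    .map(([key, value]) => `${encodeURIComponent(key)}=${encodeURIComponent(value)}`)
    .join("&");
  return {
    url: `${base}${path}${query ? "?" + query : ""}`,
    options: {
      method: "get",
      headers: { Authorization: "Bearer " + accessToken }, // the token is sent as a bearer token
    },
  };
}

const req = buildMastodonRequest(
  "https://mastodon.example.com/",
  "your_token_here",
  "/api/v1/accounts/verify_credentials"
);
console.log(req.url); // https://mastodon.example.com/api/v1/accounts/verify_credentials
```

In Apps Script, the result would be used as UrlFetchApp.fetch(req.url, req.options).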

Google Apps Script Code

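Create a Google Apps Script project and paste in the following code. In main(), set instanceUrl to the URL of your Mastodon instance, accessToken to the token you obtained, and siteUrl to the URL of your WordPress site.
Run the main() function. It may take a few minutes to complete. On the first run, Apps Script asks you to authorize access to external URLs and Google Drive; if a security warning appears, review it and allow access.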

// Main function: Convert Mastodon posts to WordPress format and save as an XML file
function main() {
  const instanceUrl = "https://mastodon.example.com"; // Set the URL of the Mastodon instance
  const accessToken = "your_token_here"; // Set the obtained access token
  const siteUrl = "https://wordpress.example.com"; // Set the URL of the WordPress site

  // Fetch Mastodon posts
  const posts = fetchMastodonPosts(instanceUrl, accessToken);

  // Convert fetched posts to WordPress format XML
  const wxrContent = createWxrFile(posts, siteUrl);

  // Create and save the XML file to Google Drive
  const blob = Utilities.newBlob(wxrContent, "application/xml", "mastodon_posts.xml");
  const file = DriveApp.createFile(blob);
  Logger.log("File created: " + file.getUrl());
}

// Function to fetch Mastodon posts
function fetchMastodonPosts(instanceUrl, accessToken) {
  const headers = { Authorization: "Bearer " + accessToken };
  const options = { method: "get", headers: headers };

  // Get user ID
  const userId = getUserId(instanceUrl, options);

  // Fetch all posts
  return fetchAllPosts(instanceUrl, userId, options);
}

// Function to get user ID
function getUserId(instanceUrl, options) {
  const userResponse = UrlFetchApp.fetch(`${instanceUrl}/api/v1/accounts/verify_credentials`, options);
  return JSON.parse(userResponse.getContentText()).id;
}

// Function to fetch all posts
function fetchAllPosts(instanceUrl, userId, options) {
  let posts = [];
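  // Request 40 statuses per page (the API maximum; the default is 20) to reduce the number of requests
  let url = `${instanceUrl}/api/v1/accounts/${userId}/statuses?limit=40`;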

  while (url) {
    const response = UrlFetchApp.fetch(url, options);
    const data = JSON.parse(response.getContentText());
    posts = posts.concat(data);

    // Get the URL of the next page
    url = getNextPageUrl(response);
  }

  return posts;
}

// Function to get the URL of the next page from response headers
function getNextPageUrl(response) {
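  const headers = response.getHeaders();
  // The header name's casing can vary, so check both spellings
  const links = headers["Link"] || headers["link"];
  if (!links) return null;
  // Capture only the URL between <...> that is directly followed by rel="next",
  // so the result does not depend on the order of the next/prev links
  const match = links.match(/<([^>]+)>;\s*rel="next"/);
  return match ? match[1] : null;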
}

// Function to strip HTML tags
function stripHtmlTags(str) {
  if (!str) return "";
  return str.toString().replace(/<[^>]*>/g, "");
}

// Function to convert HTML content to WordPress blocks
function convertToWordPressBlocks(htmlContent) {
  return htmlContent
    .replace(/<p>(.*?)<\/p>/g, (match, content) => `<!-- wp:paragraph -->\n<p>${content}</p>\n<!-- /wp:paragraph -->\n`)
    .replace(/<img src="(.*?)" alt="(.*?)" \/>/g, (match, src, alt) => `<!-- wp:image -->\n<figure class="wp-block-image"><img src="${src}" alt="${alt}" /></figure>\n<!-- /wp:image -->\n`);
}

// Function to format date for RSS
function formatPubDate(date) {
  const myDate = new Date(date);
  const days = ["Sun", "Mon", "Tue", "Wed", "Thu", "Fri", "Sat"];
  const months = ["Jan", "Feb", "Mar", "Apr", "May", "Jun", "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"];
  return `${days[myDate.getUTCDay()]}, ${myDate.getUTCDate().toString().padStart(2, "0")} ${months[myDate.getUTCMonth()]} ${myDate.getUTCFullYear()} ${myDate.getUTCHours().toString().padStart(2, "0")}:${myDate.getUTCMinutes().toString().padStart(2, "0")}:${myDate.getUTCSeconds().toString().padStart(2, "0")} +0000`;
}

// Function to format date for WordPress (uses the script's time zone)
function formatDateToWordPress(date) {
  const myDate = new Date(date);
  return `${myDate.getFullYear()}-${String(myDate.getMonth() + 1).padStart(2, "0")}-${String(myDate.getDate()).padStart(2, "0")} ${String(myDate.getHours()).padStart(2, "0")}:${String(myDate.getMinutes()).padStart(2, "0")}:${String(myDate.getSeconds()).padStart(2, "0")}`;
}

// Function to create WordPress eXtended RSS (WXR) file
function createWxrFile(posts, siteUrl) {
  let xml = '<?xml version="1.0" encoding="UTF-8" ?>\n';
  xml += '<rss version="2.0" xmlns:excerpt="http://wordpress.org/export/1.2/excerpt/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:wfw="http://wellformedweb.org/CommentAPI/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:wp="http://wordpress.org/export/1.2/">\n';
  xml += "<channel>\n";
  xml += "<wp:wxr_version>1.2</wp:wxr_version>\n";
  posts.forEach((post, index) => {
    xml += createWxrItem(post, index, siteUrl);
  });
  xml += "</channel>\n";
  xml += "</rss>\n";
  return xml;
}

// Function to create WXR item for each post
function createWxrItem(post, index, siteUrl) {
  // Skip replies and reblogs
  if (post.in_reply_to_id !== null || post.reblog !== null) {
    return "";
  }

  let content = post.content;
  const strippedContent = stripHtmlTags(post.content);
  // Title: the content warning (prefixed with ⚠️) if present, otherwise the first
  // 20 characters of the plain text, otherwise "no title". Checking strippedContent
  // (not post.content) also catches toots whose HTML holds no text, and
  // Array.from() counts by code point so emoji are not cut in half.
  const title = post.spoiler_text
    ? `⚠️${post.spoiler_text}`
    : strippedContent
      ? Array.from(strippedContent).slice(0, 20).join("")
      : "no title";
  // post_date uses the script's time zone; post_date_gmt must be in UTC
  const postDate = formatDateToWordPress(post.created_at);
  const postDateGmt = new Date(post.created_at).toISOString().slice(0, 19).replace("T", " ");
  const postPubDate = formatPubDate(post.created_at);

  // Extract hashtags
  const hashtags = extractHashtags(content);

  // Convert hashtags to links
  content = convertHashtagsToLinks(content, siteUrl);

  // Add spoiler text if present
  if (post.spoiler_text) {
    content = `<p>${post.spoiler_text}</p>${content}`;
  }

  // Add media attachments if present
  if (post.media_attachments.length > 0) {
    post.media_attachments.forEach((media) => {
      // Escape & and " so the alt text cannot break the attribute
      const alt = media.description ? media.description.replace(/&/g, "&amp;").replace(/"/g, "&quot;") : "";
      if (media.type === "image") {
        content += `\n\n<img src="${media.url}" alt="${alt}" />`;
      }
    });
  }

  // Construct WXR item
  let xmlItem = `
<item>
<title><![CDATA[${title}]]></title>
<content:encoded><![CDATA[${convertToWordPressBlocks(content)}]]></content:encoded>
<excerpt:encoded><![CDATA[]]></excerpt:encoded>
<pubDate><![CDATA[${postPubDate}]]></pubDate>
<dc:creator><![CDATA[dummy]]></dc:creator>
<wp:post_id>${index + 1}</wp:post_id>
<wp:post_date><![CDATA[${postDate}]]></wp:post_date>
<wp:post_date_gmt><![CDATA[${postDateGmt}]]></wp:post_date_gmt>
<wp:post_modified><![CDATA[${postDate}]]></wp:post_modified>
<wp:post_modified_gmt><![CDATA[${postDateGmt}]]></wp:post_modified_gmt>
<wp:post_type>post</wp:post_type>
<wp:status><![CDATA[publish]]></wp:status>
<category domain="category" nicename="toots"><![CDATA[Toots]]></category>
`;

  // Add hashtags as WordPress tags
  hashtags.forEach((tag) => {
    xmlItem += ` <category domain="post_tag" nicename="${tag}"><![CDATA[${tag}]]></category>\n`;
  });

  xmlItem += " </item>\n";
  return xmlItem;
}

// Function to extract hashtags from content
function extractHashtags(content) {
  const regex = /<a href="[^"]*" class="mention hashtag" rel="tag">#<span>([^<]+)<\/span><\/a>/g;
  const hashtags = [];
  let match;
  while ((match = regex.exec(content)) !== null) {
    hashtags.push(match[1]);
  }
  return hashtags;
}

// Function to convert hashtags to WordPress links
function convertHashtagsToLinks(content, siteUrl) {
  return content.replace(/<a href="[^"]*" class="mention hashtag" rel="tag">#<span>([^<]+)<\/span><\/a>/g, function (match, tag) {
    const tagUrl = `${siteUrl}/tag/${encodeURIComponent(tag)}/`;
    return `<a href="${tagUrl}" class="hashtag">#${tag}</a>`;
  });
}
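
getNextPageUrl() relies on Mastodon's Link response header, which carries a rel="next" URL (older posts) and usually a rel="prev" URL. Apps Script also limits a single execution to six minutes, so a very large account may need more than one run. The sketch below combines both ideas; it is a hypothetical variant, not the script's code. parseLinkHeader handles only the simple <url>; rel="name" form rather than the full RFC 8288 grammar, and fetchPages takes the page fetcher as a parameter (assumed to return { items, linkHeader }) so it can be exercised with fake data instead of network calls.

```javascript
// Hypothetical pagination helpers (plain JavaScript, no Apps Script services).

// Parse a simple Link header such as '<URL>; rel="next", <URL>; rel="prev"'
// into an object like { next: URL, prev: URL }.
function parseLinkHeader(header) {
  const links = {};
  if (!header) return links;
  const pattern = /<([^>]+)>\s*;\s*rel="([^"]+)"/g;
  let match;
  while ((match = pattern.exec(header)) !== null) {
    links[match[2]] = match[1];
  }
  return links;
}

// Follow rel="next" links, stopping after maxPages so one run stays short.
// fetchPage(url) must return { items: [...], linkHeader: string or undefined }.
// Returns the collected items and the URL to resume from (null when done).
function fetchPages(fetchPage, startUrl, maxPages) {
  let items = [];
  let url = startUrl;
  for (let page = 0; url && page < maxPages; page++) {
    const response = fetchPage(url);
    items = items.concat(response.items);
    url = parseLinkHeader(response.linkHeader).next || null;
  }
  return { items, resumeUrl: url };
}

// Fake pages standing in for Mastodon responses (newest posts first)
const base = "https://mastodon.example.com/api/v1/accounts/1/statuses";
const fakeResponses = {
  [base]: { items: [5, 4], linkHeader: `<${base}?max_id=4>; rel="next", <${base}?min_id=5>; rel="prev"` },
  [`${base}?max_id=4`]: { items: [3, 2], linkHeader: `<${base}?max_id=2>; rel="next", <${base}?min_id=3>; rel="prev"` },
  [`${base}?max_id=2`]: { items: [1], linkHeader: `<${base}?min_id=1>; rel="prev"` },
};
const firstRun = fetchPages((url) => fakeResponses[url], base, 2);
const secondRun = fetchPages((url) => fakeResponses[url], firstRun.resumeUrl, 2);
console.log(firstRun.items, firstRun.resumeUrl, secondRun.items, secondRun.resumeUrl);
```

In Apps Script, fetchPage would wrap UrlFetchApp.fetch() and read the header from getHeaders(), and resumeUrl could be stored with PropertiesService.getScriptProperties().setProperty() for the next run. Each run would then produce a partial file, so the WXR output would need to be handled per run.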

Download the WXR File

After running the script, a file named mastodon_posts.xml is created in the root folder of your Google Drive, and its URL is written to the execution log. Download this file.


Import into WordPress

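From the WordPress admin panel, go to "Tools" → "Import" and select "WordPress" (if the importer is not installed yet, install it and then run it). Upload the WXR file. Because the script sets every post author to "dummy", assign the imported posts to an existing user when the importer asks you to map authors.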
Before importing into the production environment, be sure to verify that the import works correctly in a test environment.

Import External Images into WordPress

After the import, the images in the articles are still loaded directly from the Mastodon instance. To copy them into WordPress, I used the Auto Upload Images plugin. It does not appear to be actively maintained, but I couldn't find another plugin with the same functionality.
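
To double-check which posts still reference images outside WordPress, before or after running the plugin, a small scan of the post HTML is enough. This is a hypothetical helper, not part of either plugin or of the script: it assumes double-quoted src attributes (as the script writes them) and compares host names only, so any other host, including a CDN in front of WordPress, counts as external.

```javascript
// Hypothetical check: list <img> sources that are not hosted on the WordPress site.
function findExternalImages(html, siteUrl) {
  // Extract the lower-cased host ("wordpress.example.com") from an absolute URL
  const hostOf = (url) => {
    const m = /^https?:\/\/([^\/:?#]+)/i.exec(url);
    return m ? m[1].toLowerCase() : null;
  };
  const siteHost = hostOf(siteUrl);
  const external = [];
  const imgSrc = /<img\b[^>]*\bsrc="([^"]+)"/gi;
  let match;
  while ((match = imgSrc.exec(html)) !== null) {
    const host = hostOf(match[1]);
    // Relative URLs have no host and are treated as local
    if (host && host !== siteHost) external.push(match[1]);
  }
  return external;
}

const html =
  '<figure class="wp-block-image"><img src="https://files.mastodon.example.com/a.png" alt="" /></figure>' +
  '<figure class="wp-block-image"><img src="https://wordpress.example.com/wp-content/uploads/b.png" alt="" /></figure>';
console.log(findExternalImages(html, "https://wordpress.example.com"));
```

Only the first image is reported, because the second one already lives on the WordPress host.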

Using the Auto Upload Images Plugin

  1. Install and activate the plugin.
  2. From the WordPress admin panel, select "Tools" → "Replace External Images".
  3. Select the target articles from the post list.
  4. Click the "Replace" button to upload the images.

Set Featured Images

After importing images, set featured images as needed. To streamline this process, I used the XO Featured Image Tools plugin.

Using XO Featured Image Tools

  1. Install and activate the plugin.
  2. From the WordPress admin panel, select "Tools" → "Featured Image".
  3. Select the target posts and click "Create Featured Image from Image".
  4. Featured images will be generated automatically.

Impressions

When I decided to migrate my posts, I searched for tools but couldn't find any that met my requirements. Some tools were outdated or didn't fully import toots as WordPress posts.
Since no suitable tools were available, I decided to create my own script with the help of Perplexity. The most challenging part was generating a minimal WordPress eXtended RSS (WXR) file for the migration.
I couldn't find a specification for the WXR format, so I exported some articles from WordPress and wrote the script to mimic the exported file.
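
For reference, two details are worth handling when writing WXR by hand. Text containing ]]> would end a CDATA section early, so it has to be split across two sections. And wp:post_date_gmt should be UTC while wp:post_date is in the site's time zone. The builder below is a simplified, hypothetical sketch, not the script's createWxrItem(); its field names are assumptions, and the site time zone is taken as a fixed offset in minutes (utcOffsetMinutes), which ignores daylight saving time.

```javascript
// Wrap text in CDATA; any "]]>" inside is split so it cannot close the section early.
function cdata(text) {
  return "<![CDATA[" + String(text).split("]]>").join("]]]]><![CDATA[>") + "]]>";
}

// Format a Date as "YYYY-MM-DD HH:MM:SS", shifted by a fixed offset in minutes.
function wpDate(date, utcOffsetMinutes = 0) {
  const shifted = new Date(date.getTime() + utcOffsetMinutes * 60 * 1000);
  return shifted.toISOString().slice(0, 19).replace("T", " ");
}

// Hypothetical, simplified WXR <item> builder.
function buildWxrItem({ id, title, html, createdAt, utcOffsetMinutes }) {
  const created = new Date(createdAt);
  return [
    "<item>",
    `<title>${cdata(title)}</title>`,
    `<content:encoded>${cdata(html)}</content:encoded>`,
    `<wp:post_id>${id}</wp:post_id>`,
    `<wp:post_date>${cdata(wpDate(created, utcOffsetMinutes))}</wp:post_date>`, // site time
    `<wp:post_date_gmt>${cdata(wpDate(created))}</wp:post_date_gmt>`, // UTC
    "<wp:post_type>post</wp:post_type>",
    `<wp:status>${cdata("publish")}</wp:status>`,
    "</item>",
  ].join("\n");
}

const item = buildWxrItem({
  id: 1,
  title: "Hello",
  html: "<p>a ]]> b</p>",
  createdAt: "2024-01-02T03:04:05.000Z",
  utcOffsetMinutes: 9 * 60, // assumption: the site's time zone is UTC+9
});
console.log(item);
```

With these inputs, wp:post_date comes out as 2024-01-02 12:04:05 and wp:post_date_gmt as 2024-01-02 03:04:05.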
For now, the script works in my environment. Whether it will work in other environments or continue to work in the future is uncertain, but I plan to use it as needed.
Next, I plan to create a script using Google Apps Script to continuously import new posts from Mastodon.
