Konnor Rogers

Posted on Jun 20, 2023

Pulling your dev.to posts down locally

#webdev #bridgetownrb #ruby #tutorial

Alright kids! Strap in! This is kind of a meta post since I'm writing it here on dev.to, but I'm about to show you how I pulled all my writings on dev.to down locally into a new Bridgetown site I made! (Which may feature a blog...who knows...)

First step, create a file to run your script. I'll be using Ruby here, but feel free to use whatever language you fancy.

Next, let's grab all the articles I've written. Change the username variable to match your own dev.to username.

#!/usr/bin/env ruby

require "json"
require "net/http"
require "time"

username = "konnorrogers"

json = JSON.parse(
  Net::HTTP.get(URI("https://dev.to/api/articles?username=#{username}"))
)
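One caveat: the list endpoint is paginated, so the request above only returns the first page of articles (30 by default, per the dev.to API docs). If you have more posts than that, something like the following sketch walks through every page. It assumes the endpoint honors page and per_page query parameters; articles_url and fetch_all_articles are hypothetical helper names, and the fetch: argument only exists so the loop can be tried without hitting the network.

```ruby
require "json"
require "net/http"
require "uri"

# Builds the list-endpoint URL for one page of a user's articles.
# ASSUMPTION: the endpoint honors `page` and `per_page` query parameters.
def articles_url(username, page:, per_page:)
  query = URI.encode_www_form(username: username, page: page, per_page: per_page)
  URI("https://dev.to/api/articles?#{query}")
end

# Requests page after page until the API returns a short (or empty) page.
# `fetch` defaults to a real HTTP GET but can be swapped out for testing.
def fetch_all_articles(username, per_page: 100, fetch: ->(uri) { Net::HTTP.get(uri) })
  articles = []
  page = 1
  loop do
    batch = JSON.parse(fetch.call(articles_url(username, page: page, per_page: per_page)))
    articles.concat(batch)
    # A page with fewer than per_page items is the last one.
    break if batch.size < per_page

    page += 1
  end
  articles
end
```

To use it, swap the single request for json = fetch_all_articles(username).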

Done, right?!

Not quite! We still need to loop through all of our articles and gather the content we need. To help with exploring the dev.to API, you can write the returned JSON to a file like this:

filename = "./articles.json"
File.write(filename,
  JSON.pretty_generate(
    JSON.parse(
      Net::HTTP.get(URI("https://dev.to/api/articles?username=#{username}"))
    )
  )
)

However, I already did this part and know what I need for my Bridgetown site, but feel free to use the snippet above for exploring the API. We use pretty_generate on the JSON so it's easier to read.
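Once articles.json is on disk, you can explore it without spending more API calls. A minimal sketch, assuming the file holds the JSON array written by the snippet above; field_names is a hypothetical helper that collects every key seen across all articles:

```ruby
require "json"

# Hypothetical helper: lists every key that appears in any article of a saved dump.
# Assumes the file contains a JSON array of objects, like articles.json above.
def field_names(path)
  JSON.parse(File.read(path)).flat_map(&:keys).uniq.sort
end
```

For example, puts field_names("./articles.json") prints each field name once, and checking whether the result includes "body_markdown" is a quick way to see what the list endpoint leaves out.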

Anyways, when looking through the returned data, you'll notice it doesn't include body_markdown, which holds the raw markdown for each post. So we need to loop through all of our articles, request each one individually, and grab its body_markdown property.
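In isolation, that per-article request looks something like this sketch: append the article's path (a field on each list entry) to https://dev.to/api/articles, the same way the script below builds its URL, then read body_markdown from the JSON that comes back. article_api_url and body_markdown_for are hypothetical names, and fetch is injectable only so the helper can be tried offline.

```ruby
require "json"
require "net/http"
require "uri"

# Builds the single-article URL from the `path` field of a list entry,
# e.g. "/someone/a-post" -> https://dev.to/api/articles/someone/a-post
def article_api_url(path)
  URI("https://dev.to/api/articles#{path}")
end

# Fetches one article and returns its raw markdown (nil if the key is missing).
# `fetch` defaults to a real HTTP GET but can be swapped out for testing.
def body_markdown_for(article, fetch: ->(uri) { Net::HTTP.get(uri) })
  JSON.parse(fetch.call(article_api_url(article["path"])))["body_markdown"]
end
```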

Here we go:

json.each do |obj|
  title = obj["title"]

  # turn anything that's not a letter or number into a hyphen, then squash repeated hyphens into one
  # Example: 
  #   "How can I pull my data from dev.to?"
  #=> "how-can-i-pull-my-data-from-dev-to"
  file_title = title.downcase.gsub(/[^0-9a-z]/i, "-").split(/-+/).join("-")

  # Convert to local time so this produces a string like: 2023-06-20 17:24:40 -0400
  date = Time.parse(obj["published_at"]).localtime.to_s

  # Pulls only yyyy-mm-dd
  file_date = date.split(" ").first

  # produces a path like this:
  # "src/_posts/2023-06-20-pulling-your-dev-to-posts-down-locally.md"
  file_path = "src/_posts/#{file_date}-#{file_title}.md"

  # don't waste an API call!
  next if File.exist?(file_path)

  description = obj["description"]

  # Comma separated string
  categories = obj["tags"]

  article_path = obj["path"]
  article_url = URI("https://dev.to/api/articles#{article_path}")

  # One second seems to be the secret sauce to get around rate limiting.
  sleep 1

  # We can't get the info we need from the initial API call so we need to go to the article_url
  # to get the raw markdown.
  article_json = JSON.parse(Net::HTTP.get(article_url))
  body_markdown = article_json["body_markdown"]

  content = "---\n"
  # to_json escapes quotes in the title; a JSON string is also a valid double-quoted YAML string
  content << "title: #{title.to_json}\n"
  content << "categories: #{categories}\n"
  content << "date: #{date}\n"
  content << "description: |\n  #{description}\n"
  content << "---\n\n"
  content << body_markdown

  File.write(file_path, content, mode: "w")
end
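If you want to test the per-post logic without touching the network, it can be pulled into small pure functions. This is a sketch, not the script above verbatim: slugify also trims a leading hyphen (the split/join version keeps one), post_path takes an already-parsed Time instead of splitting a string, and front_matter writes the description as a quoted string instead of a | block so a multi-line description can't break the YAML. All three names are hypothetical.

```ruby
require "json"

# Lowercase, turn each run of non-alphanumerics into one hyphen, trim hyphens at both ends.
def slugify(title)
  title.downcase.gsub(/[^0-9a-z]+/, "-").delete_prefix("-").delete_suffix("-")
end

# "src/_posts/YYYY-MM-DD-slug.md", using the calendar date of the given Time.
def post_path(title, time)
  "src/_posts/#{time.strftime('%Y-%m-%d')}-#{slugify(title)}.md"
end

# Hand-built front matter like the script's, but title and description go through
# to_json: a JSON string is also a valid double-quoted YAML scalar, so quotes,
# colons, and newlines inside them stay safe.
def front_matter(title:, categories:, date:, description:)
  <<~FRONT_MATTER
    ---
    title: #{title.to_json}
    categories: #{categories}
    date: #{date}
    description: #{description.to_s.to_json}
    ---

  FRONT_MATTER
end
```

With these, the loop body reduces to building file_path with post_path, then writing front_matter(...) + body_markdown.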

Now let's run our script and watch the magic happen. This may take a while: dev.to seems to rate limit to roughly one API call per second, so if you have, say, 60 posts, it'll take about a minute to gather all your files.
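The one-second sleep in the script is a guess at dev.to's limit. If a request still gets throttled, a retry wrapper can back off instead of failing. A sketch, assuming the API signals throttling with HTTP 429 (the standard Too Many Requests status) and possibly a Retry-After header; get_with_retry is a hypothetical name, and get/pause are injectable only for offline testing.

```ruby
require "net/http"
require "uri"

# Hypothetical helper: GET a URI, backing off while the server answers 429 Too Many Requests.
# Uses a numeric Retry-After header when the response includes one; otherwise the wait
# doubles each attempt (1s, 2s, 4s, ...). Any other status is returned as-is, the same way
# Net::HTTP.get hands back the body regardless of status.
# `get` and `pause` are injectable so the logic can be exercised offline without real sleeps.
def get_with_retry(uri, attempts: 5, base_delay: 1,
                   get: ->(u) { Net::HTTP.get_response(u) },
                   pause: ->(seconds) { sleep(seconds) })
  delay = base_delay
  attempts.times do |i|
    response = get.call(uri)
    return response.body unless response.code == "429"
    break if i == attempts - 1

    retry_after = response["Retry-After"]
    pause.call(retry_after&.match?(/\A\d+\z/) ? retry_after.to_i : delay)
    delay *= 2
  end
  raise "Still rate limited after #{attempts} attempts: #{uri}"
end
```

In the loop, JSON.parse(get_with_retry(article_url)) could stand in for JSON.parse(Net::HTTP.get(article_url)), keeping sleep 1 as the baseline pace.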

ruby my-script.rb

And here's what it pulled down for me!

src/_posts/2023-06-13-inserting-a-string-on-the-first-line-of-every-file-with-vim.md
src/_posts/2023-06-07-maintain-scroll-position-in-turbo-without-data-turbo-permanent.md
src/_posts/2023-05-30-button-to-vs-link-to-and-the-pitfalls-of-data-turbo-method.md
src/_posts/2023-05-22-rails-frontend-bundling-which-one-should-i-choose.md
src/_posts/2023-05-22-revisiting-box-sizing-best-practices.md
src/_posts/2023-04-08-how-to-keep-a-persistent-class-on-a-litelement.md
src/_posts/2022-11-22-jest-vitest-and-webcomponents.md
src/_posts/2022-10-20-actiontext-all-the-ways-to-render-an-actiontext-attachment.md
src/_posts/2022-10-10-actiontext-safe-listing-attributes-and-tags.md
src/_posts/2022-10-04-actiontext-modify-the-rendering-of-activestorage-attachments.md
src/_posts/2022-07-20-why-we-still-bundle-with-http-2-in-2022.md
src/_posts/2022-04-08-testing-scopes-with-rails.md
src/_posts/2022-04-07-adding-additional-actions-to-trix.md
src/_posts/2022-03-13-converting-a-callback-to-a-promise.md
src/_posts/2022-03-10-escaping-the-traditional-rails-form.md
src/_posts/2022-02-21-adding-text-alignment-to-trix.md
src/_posts/2022-01-29-modifying-the-default-toolbar-in-trix.md
src/_posts/2022-01-29-exploring-trix.md
src/_posts/2021-11-30-cross-browser-vertical-slider-using-input-type-range.md
src/_posts/2021-11-01-rebuilding-activestorage-first-impressions.md
src/_posts/2021-10-27-why-jest-is-not-for-me.md
src/_posts/2021-10-06-frontend-bundler-braindump.md
src/_posts/2021-07-08-writing-code-block-highlighting-to-a-css-file-with-rouge.md
src/_posts/2021-07-06-creating-reusable-flashes-in-rails-using-shoelace.md
src/_posts/2021-07-03-pulling-down-somebody-s-fork-with-git.md
src/_posts/2021-07-02-fixing-fatal-error-ineffective-mark-compacts-near-heap-limit-allocation-failed-javascript-heap-out-of-memory-in-webpacker.md
src/_posts/2021-07-02-migrating-hls-videos-to-mp4-format-in-rails.md
src/_posts/2021-06-25-querying-activestorage-attachments.md
src/_posts/2021-05-25-case-switch-statement-in-ruby.md
src/_posts/2021-05-15-arel-notes.md

Best of luck, and hopefully this gives you some motivation to dust off your self-hosted blog! I've personally been writing on dev.to because my old blog is a 4-year-old Gatsby site that I have exactly 0 hope of ever getting running again. So here's to new beginnings! 🥂

Top comments (4)

Stephan Lamoureux • Jun 21 '23

Nice! I might use this. I currently just use their api to make cards on my portfolio for my posts. Also I didn’t know you were a fellow Rhode Islander ⚓

Konnor Rogers • Jun 22 '23

Hell yea! Born and raised! 27 years and counting! Small world!

Maxime • Jun 22 '23

Hey, I've got a similar project in the making. Just FYI, to speed up the process you can instead query the endpoint without specifying the ID and pass an argument to say you want pagination set to 1000 (which is the max, but who has 1000 blog posts? 😁).

I don't have the exact code in hands to share as I'm on my phone but you can probably find it easily in the doc. But yeah up to 1000 at once, it's damn fast to retrieve them all 🔥

Konnor Rogers • Jun 25 '23

Oh snap! If I had more posts, it may be worth sleuthing around the pagination API.
