Konnor Rogers

Pulling your dev.to posts down locally

Alright kids! Strap in! This is kind of a meta post since I'm writing it here on dev.to, but I'm about to show you how I pulled all my writings on dev.to down locally into a new Bridgetown site I made! (Which may feature a blog...who knows...)

First step: create a file to hold your script. I'll be using Ruby here, but feel free to use whatever language you fancy.

Next, let's grab all the articles I've written. Feel free to change username to match your dev.to username.

#!/usr/bin/env ruby

require "json"
require "net/http"
require "time"

username = "konnorrogers"

json = JSON.parse(
  Net::HTTP.get(URI("https://dev.to/api/articles?username=#{username}"))
)

Done right?!

Not quite! We still need to loop through all of our articles and gather the content we need. To help with exploring the dev.to API, you can write the returned JSON to a file like this:

filename = "./articles.json"
File.write(filename,
  JSON.pretty_generate(
    JSON.parse(
      Net::HTTP.get(URI("https://dev.to/api/articles?username=#{username}"))
    )
  ).to_s
)

I already did this part and know what I need for my Bridgetown site, but feel free to use the snippet above for exploring the API. We use pretty_generate on the JSON so it's easier to read.
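If you do explore the saved file, a quick way to see which fields came back is to list the keys of the first article. This is only a sketch: article_fields is a hypothetical helper, and the sample data below is made up to stand in for the parsed contents of ./articles.json.

```ruby
require "json"

# Hypothetical helper: returns the sorted field names of the first article,
# or an empty array when there are no articles.
def article_fields(articles)
  first = articles.first
  first ? first.keys.sort : []
end

# Made-up sample standing in for JSON.parse(File.read("./articles.json"))
sample = JSON.parse('[{"title": "Hello", "path": "/someone/hello-1a2b", "tags": "ruby, api"}]')

puts article_fields(sample)
```

Swapping the sample for the real file contents should make it easy to spot that body_markdown is missing from the list.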

Anyways, looking through the data returned, we don't get back body_markdown, which holds the raw markdown for our posts. What we need to do is loop through all of our "articles" and make a second request for each one to grab its body_markdown property.

Here we go:

json.each do |obj|
  title = obj["title"]

  # turn anything that's not a number or letter into a hyphen, then squash recurring hyphens into one
  # Example: 
  #   "How can I pull my data from dev.to?"
  #=> "how-can-i-pull-my-data-from-dev-to"
  file_title = title.downcase.gsub(/[^0-9a-z]/i, "-").split(/-+/).join("-")

  # Produces a string like this: 2023-06-20 17:24:40 -0400
  date = Time.parse(obj["published_at"]).to_s

  # Pulls only yyyy-mm-dd
  file_date = date.split(" ").first

  # produces a path like this:
  # "src/_posts/2023-06-20-pulling-your-dev-to-posts-down-locally.md"
  file_path = "src/_posts/#{file_date}-#{file_title}.md"

  # don't waste an API call!
  next if File.exist?(file_path)

  description = obj["description"]

  # Comma separated string
  categories = obj["tags"]

  article_path = obj["path"]
  article_url = URI("https://dev.to/api/articles#{article_path}")

  # One second seems to be the secret sauce to get around rate limiting.
  sleep 1

  # We can't get the info we need from the initial API call so we need to go to the article_url
  # to get the raw markdown.
  article_json = JSON.parse(Net::HTTP.get(article_url))
  body_markdown = article_json["body_markdown"]

  content = "---\n"
  content << "title: \"#{title}\"\n"
  content << "categories: #{categories}\n"
  content << "date: #{date}\n"
  content << "description: |\n  #{description}\n"
  content << "---\n\n"
  content << body_markdown

  File.write(file_path, content, mode: "w")
end
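Building the front matter by hand works until a title contains a double quote or some other character YAML cares about. One possible alternative, sketched here and not what the script above does, is to let Ruby's YAML standard library handle the quoting. build_post is a hypothetical helper name, and the keys simply mirror the ones used in the loop.

```ruby
require "yaml"

# Hypothetical helper: returns the full file contents for one post.
def build_post(title:, categories:, date:, description:, body:)
  front_matter = {
    "title" => title,
    "categories" => categories,
    "date" => date,
    "description" => description
  }

  # Hash#to_yaml starts with a "---" line and quotes values where needed,
  # so we only have to add the closing "---" ourselves.
  "#{front_matter.to_yaml}---\n\n#{body}"
end

puts build_post(
  title: 'Say "hi": a test',
  categories: "ruby, api",
  date: "2023-06-20 17:24:40 -0400",
  description: "A short description",
  body: "# Hello\n"
)
```

The YAML library may quote the date string, so it is worth checking that your site still reads the date the way you expect.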

Now let's run our script and watch the magic happen. This may take a while because dev.to seems to rate limit at about 1 API call per second, so if you have, say, 60 posts, it'll take roughly 1 minute to gather all your files.

ruby my-script.rb
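If the one second sleep turns out not to be enough, one option is to retry when a request gets rejected. The sketch below assumes dev.to answers with HTTP 429 (Too Many Requests) when you hit the limit, which I have not confirmed here; fetch_with_retry is a hypothetical helper, not part of the script above.

```ruby
require "net/http"

# Hypothetical helper: runs the block, and if the response is a 429
# it waits a little longer each time and tries again.
def fetch_with_retry(max_attempts: 3, wait: 1)
  attempts = 0

  loop do
    attempts += 1
    response = yield
    return response unless response.is_a?(Net::HTTPTooManyRequests)

    raise "Still rate limited after #{attempts} attempts" if attempts >= max_attempts

    # Back off: wait, then 2x wait, and so on.
    sleep(wait * attempts)
  end
end

# Usage (makes a real request, so it is left commented out):
# response = fetch_with_retry { Net::HTTP.get_response(article_url) }
# article_json = JSON.parse(response.body)
```

Note that this uses Net::HTTP.get_response instead of Net::HTTP.get, since the status code is needed to decide whether to retry.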

And here's what it pulled down for me!

src/_posts/2023-06-13-inserting-a-string-on-the-first-line-of-every-file-with-vim.md
src/_posts/2023-06-07-maintain-scroll-position-in-turbo-without-data-turbo-permanent.md
src/_posts/2023-05-30-button-to-vs-link-to-and-the-pitfalls-of-data-turbo-method.md
src/_posts/2023-05-22-rails-frontend-bundling-which-one-should-i-choose.md
src/_posts/2023-05-22-revisiting-box-sizing-best-practices.md
src/_posts/2023-04-08-how-to-keep-a-persistent-class-on-a-litelement.md
src/_posts/2022-11-22-jest-vitest-and-webcomponents.md
src/_posts/2022-10-20-actiontext-all-the-ways-to-render-an-actiontext-attachment.md
src/_posts/2022-10-10-actiontext-safe-listing-attributes-and-tags.md
src/_posts/2022-10-04-actiontext-modify-the-rendering-of-activestorage-attachments.md
src/_posts/2022-07-20-why-we-still-bundle-with-http-2-in-2022.md
src/_posts/2022-04-08-testing-scopes-with-rails.md
src/_posts/2022-04-07-adding-additional-actions-to-trix.md
src/_posts/2022-03-13-converting-a-callback-to-a-promise.md
src/_posts/2022-03-10-escaping-the-traditional-rails-form.md
src/_posts/2022-02-21-adding-text-alignment-to-trix.md
src/_posts/2022-01-29-modifying-the-default-toolbar-in-trix.md
src/_posts/2022-01-29-exploring-trix.md
src/_posts/2021-11-30-cross-browser-vertical-slider-using-input-type-range.md
src/_posts/2021-11-01-rebuilding-activestorage-first-impressions.md
src/_posts/2021-10-27-why-jest-is-not-for-me.md
src/_posts/2021-10-06-frontend-bundler-braindump.md
src/_posts/2021-07-08-writing-code-block-highlighting-to-a-css-file-with-rouge.md
src/_posts/2021-07-06-creating-reusable-flashes-in-rails-using-shoelace.md
src/_posts/2021-07-03-pulling-down-somebody-s-fork-with-git.md
src/_posts/2021-07-02-fixing-fatal-error-ineffective-mark-compacts-near-heap-limit-allocation-failed-javascript-heap-out-of-memory-in-webpacker.md
src/_posts/2021-07-02-migrating-hls-videos-to-mp4-format-in-rails.md
src/_posts/2021-06-25-querying-activestorage-attachments.md
src/_posts/2021-05-25-case-switch-statement-in-ruby.md
src/_posts/2021-05-15-arel-notes.md
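The file names above come from the title-to-slug step in the script. Here is that step pulled out into a standalone sketch so you can try it on your own titles. slugify is a hypothetical name, and this version differs slightly from the script: it collapses runs of characters in a single gsub and also trims a leading hyphen, which the split/join approach would leave in place.

```ruby
# Hypothetical helper: turns a post title into a file-name-friendly slug.
def slugify(title)
  title
    .downcase
    .gsub(/[^0-9a-z]+/, "-") # each run of other characters becomes one hyphen
    .gsub(/\A-|-\z/, "")     # drop a leading or trailing hyphen
end

puts slugify("How can I pull my data from dev.to?")
# prints "how-can-i-pull-my-data-from-dev-to"
```

Keep in mind that only ASCII letters and digits survive, so accented or non-Latin characters in a title become hyphens too.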

Best of luck, and hopefully this gives you some motivation to dust off your self-hosted blog! I personally have been writing on dev.to because my old blog site is a 4-year-old Gatsby site that I have exactly 0 hope of ever getting running again. So here's to new beginnings! 🥂

Top comments (4)

Stephan Lamoureux

Nice! I might use this. I currently just use their api to make cards on my portfolio for my posts. Also I didn’t know you were a fellow Rhode Islander ⚓

Konnor Rogers

Hell yea! Born and raised! 27 years and counting! Small world!

Maxime

Hey, I've got a similar project in the making, just FYI to speed up the process you can instead query the endpoint without specifying the ID and pass an argument to say you want pagination set to 1000 (which is the max but who has 1000 blog posts ? 😁).

I don't have the exact code in hands to share as I'm on my phone but you can probably find it easily in the doc. But yeah up to 1000 at once, it's damn fast to retrieve them all 🔥

Konnor Rogers

Oh snap, if I had more posts it may be worth sleuthing around the pagination API.
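For anyone who wants to try the pagination idea from this thread, here is a rough sketch of building the list URL with a page size. The per_page and page parameter names are my assumption based on the dev.to API docs, since no exact code was shared above, so double check them before relying on this. articles_url is a hypothetical helper.

```ruby
require "uri"

# Hypothetical helper: builds the article list URL with pagination parameters.
# per_page and page are assumed parameter names; verify them in the API docs.
def articles_url(username, per_page: 1000, page: 1)
  query = URI.encode_www_form(username: username, per_page: per_page, page: page)
  URI("https://dev.to/api/articles?#{query}")
end

puts articles_url("konnorrogers")
# prints "https://dev.to/api/articles?username=konnorrogers&per_page=1000&page=1"
```

The result can be passed straight to Net::HTTP.get in place of the URL used in the post.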