Alright kids! Strap in! This is kind of a meta post since I'm writing it here on dev.to, but I'm about to show you how I pulled all my writings on dev.to down locally into a new Bridgetown site I made! (Which may feature a blog...who knows...)
First step: create a file for your script. I'll be using Ruby here, but feel free to use whatever language you fancy.
Next, let's grab all the articles I've written. Feel free to change username to match your dev.to username.
#!/usr/bin/env ruby
require "json"
require "net/http"
require "time"
username = "konnorrogers"
json = JSON.parse(
  Net::HTTP.get(URI("https://dev.to/api/articles?username=#{username}"))
)
Done right?!
Not quite! We still need to loop through all of our articles and gather the content we need. To help with exploring the dev.to API, you can write the return JSON to a file like this:
filename = "./articles.json"
File.write(filename,
  JSON.pretty_generate(
    JSON.parse(
      Net::HTTP.get(URI("https://dev.to/api/articles?username=#{username}"))
    )
  ).to_s
)
However, I already did this part and know what I need for my Bridgetown site, but feel free to use the snippet above for exploring the API. We use pretty_generate on the JSON so it's easier to read.
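If scrolling through the whole file feels like too much, here's a small sketch that just lists which keys the articles carry. The sample array is made-up stand-in data (the real response has many more fields), and available_keys is a hypothetical helper name, not something from the API.

```ruby
require "json"

# Made-up stand-in for the API response, only here so the sketch runs offline.
sample = JSON.parse(
  '[{"title": "Example post", "path": "/your-username/example-post", "tags": "ruby, api"}]'
)

# Hypothetical helper: collects every key that shows up on any article.
def available_keys(articles)
  articles.flat_map(&:keys).uniq.sort
end

puts available_keys(sample)
```

With the real data, you'd pass in the json variable from earlier, and available_keys(json).include?("body_markdown") tells you quickly whether the field you're after is there.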
Anyway, when looking through the data returned, we don't get back body_markdown, which has the raw markdown for our posts. What we need to do is loop through all of our "articles" and then grab the body_markdown property for each one.
Here we go:
json.each do |obj|
  title = obj["title"]
  # turn anything that's not a number or letter into a hyphen, then squash recurring hyphens into 1
  # Example:
  # "How can I pull my data from dev.to?"
  #=> "how-can-i-pull-my-data-from-dev-to"
  file_title = title.downcase.gsub(/[^0-9a-z]/i, "-").split(/-+/).join("-")
  # Produces a string like this: 2023-06-20 17:24:40 -0400
  date = Time.parse(obj["published_at"]).to_s
  # Pulls only yyyy-mm-dd
  file_date = date.split(" ").first
  # produces a path like this:
  # "src/_posts/2023-06-20-pulling-your-devto-posts-down-locally.md"
  file_path = "src/_posts/#{file_date}-#{file_title}.md"
  # don't waste an API call!
  next if File.exist?(file_path)
  description = obj["description"]
  # Comma separated string
  categories = obj["tags"]
  article_path = obj["path"]
  article_url = URI("https://dev.to/api/articles#{article_path}")
  # One second seems to be the secret sauce to get around rate limiting.
  sleep 1
  # We can't get the info we need from the initial API call so we need to go to the article_url
  # to get the raw markdown.
  article_json = JSON.parse(Net::HTTP.get(article_url))
  body_markdown = article_json["body_markdown"]
  content = "---\n"
  content << "title: \"#{title}\"\n"
  content << "categories: #{categories}\n"
  content << "date: #{date}\n"
  content << "description: |\n  #{description}\n"
  content << "---\n\n"
  content << body_markdown
  File.write(file_path, content, mode: "w")
end
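One thing to watch with hand-built front matter: a title containing a double quote or a colon can produce YAML that won't parse. Here's a minimal sketch of an alternative that lets Ruby's YAML library do the escaping. The front_matter helper is hypothetical (it's not in the script above), and the sample values are made up for illustration.

```ruby
require "yaml"

# Hypothetical helper: builds the front matter with YAML.dump so quotes,
# colons, and newlines in the values get escaped for us.
def front_matter(title:, categories:, date:, description:)
  data = {
    "title" => title,
    "categories" => categories,
    "date" => date,
    "description" => description
  }
  # YAML.dump already starts the document with "---\n",
  # so we only need to add the closing delimiter.
  YAML.dump(data) + "---\n\n"
end

# Made-up values, chosen to include characters that trip up hand-written YAML.
puts front_matter(
  title: 'An "example": title',
  categories: "ruby, api",
  date: "2023-06-20 17:24:40 -0400",
  description: "A made-up description."
)
```

In the loop, this would stand in for the content << lines that come before body_markdown, something like content = front_matter(...) + body_markdown.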
Now let's run our script and watch the magic happen. This may take a while because dev.to seems to rate limit to about 1 API call per second, so if you have, say, 60 posts, it'll take roughly 1 minute to gather all your files.
ruby my-script.rb
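If the fixed one-second sleep ever turns out not to be enough, here's a rough sketch of retrying with a growing pause. It assumes the API answers with HTTP status 429 when you've been rate limited (that's my understanding, but check it against what you actually see). fetch_with_retry and its parameters are hypothetical names, not part of the script above.

```ruby
require "json"
require "net/http"

# Hypothetical helper: calls the block, which should return something that
# responds to #code like Net::HTTPResponse does, and retries while the
# status code is "429" (assumed to mean "rate limited").
def fetch_with_retry(max_attempts: 5, base_delay: 1)
  attempts = 0
  loop do
    attempts += 1
    response = yield
    return response unless response.code == "429"
    raise "Still rate limited after #{attempts} attempts" if attempts >= max_attempts

    # Wait base_delay, then double it each time: 1s, 2s, 4s, ...
    sleep(base_delay * (2**(attempts - 1)))
  end
end

# Usage inside the loop (this makes a real network request):
# response = fetch_with_retry { Net::HTTP.get_response(article_url) }
# article_json = JSON.parse(response.body)
```

Note that this uses Net::HTTP.get_response rather than Net::HTTP.get, since the status code is needed to decide whether to retry.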
And here's what it pulled down for me!
src/_posts/2023-06-13-inserting-a-string-on-the-first-line-of-every-file-with-vim.md
src/_posts/2023-06-07-maintain-scroll-position-in-turbo-without-data-turbo-permanent.md
src/_posts/2023-05-30-button-to-vs-link-to-and-the-pitfalls-of-data-turbo-method.md
src/_posts/2023-05-22-rails-frontend-bundling-which-one-should-i-choose.md
src/_posts/2023-05-22-revisiting-box-sizing-best-practices.md
src/_posts/2023-04-08-how-to-keep-a-persistent-class-on-a-litelement.md
src/_posts/2022-11-22-jest-vitest-and-webcomponents.md
src/_posts/2022-10-20-actiontext-all-the-ways-to-render-an-actiontext-attachment.md
src/_posts/2022-10-10-actiontext-safe-listing-attributes-and-tags.md
src/_posts/2022-10-04-actiontext-modify-the-rendering-of-activestorage-attachments.md
src/_posts/2022-07-20-why-we-still-bundle-with-http-2-in-2022.md
src/_posts/2022-04-08-testing-scopes-with-rails.md
src/_posts/2022-04-07-adding-additional-actions-to-trix.md
src/_posts/2022-03-13-converting-a-callback-to-a-promise.md
src/_posts/2022-03-10-escaping-the-traditional-rails-form.md
src/_posts/2022-02-21-adding-text-alignment-to-trix.md
src/_posts/2022-01-29-modifying-the-default-toolbar-in-trix.md
src/_posts/2022-01-29-exploring-trix.md
src/_posts/2021-11-30-cross-browser-vertical-slider-using-input-type-range.md
src/_posts/2021-11-01-rebuilding-activestorage-first-impressions.md
src/_posts/2021-10-27-why-jest-is-not-for-me.md
src/_posts/2021-10-06-frontend-bundler-braindump.md
src/_posts/2021-07-08-writing-code-block-highlighting-to-a-css-file-with-rouge.md
src/_posts/2021-07-06-creating-reusable-flashes-in-rails-using-shoelace.md
src/_posts/2021-07-03-pulling-down-somebody-s-fork-with-git.md
src/_posts/2021-07-02-fixing-fatal-error-ineffective-mark-compacts-near-heap-limit-allocation-failed-javascript-heap-out-of-memory-in-webpacker.md
src/_posts/2021-07-02-migrating-hls-videos-to-mp4-format-in-rails.md
src/_posts/2021-06-25-querying-activestorage-attachments.md
src/_posts/2021-05-25-case-switch-statement-in-ruby.md
src/_posts/2021-05-15-arel-notes.md
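One thing worth double checking: the list above is exactly 30 files, and as far as I can tell the articles endpoint is paginated with a default page size of 30. That could be a coincidence, but if you have more posts than that, you may need to ask for a bigger page. Here's a small sketch that builds the URL with page and per_page parameters. The 1000 maximum comes from a commenter's tip and my reading of the API docs, so treat it as an assumption. articles_uri is a hypothetical helper name.

```ruby
require "uri"

# Hypothetical helper: builds the article list URL with explicit paging.
# per_page: 1000 is assumed to be the largest page size the API allows.
def articles_uri(username, page: 1, per_page: 1000)
  query = URI.encode_www_form(username: username, page: page, per_page: per_page)
  URI("https://dev.to/api/articles?#{query}")
end

puts articles_uri("konnorrogers")
```

Passing the result to Net::HTTP.get works the same way as the URI in the original script. If one page still isn't enough, bumping page until an empty array comes back should cover the rest.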
Best of luck, and hopefully this gives you some motivation to dust off your self-hosted blog! I personally have been writing on dev.to because my old blog site is a 4-year-old Gatsby site that I have exactly 0 hope of ever getting running again. So here's to new beginnings! 🥂
Top comments (4)
Nice! I might use this. I currently just use their api to make cards on my portfolio for my posts. Also I didn’t know you were a fellow Rhode Islander ⚓
Hell yea! Born and raised! 27 years and counting! Small world!
Hey, I've got a similar project in the making, just FYI to speed up the process you can instead query the endpoint without specifying the ID and pass an argument to say you want pagination set to 1000 (which is the max but who has 1000 blog posts ? 😁).
I don't have the exact code in hands to share as I'm on my phone but you can probably find it easily in the doc. But yeah up to 1000 at once, it's damn fast to retrieve them all 🔥
Oh snap, if I had more posts it may be worth sleuthing around the pagination API.