Andy Haskell

Posted on Jan 18, 2022

#GopherDiggingRuby: Make a dev.to image link fetcher in Ruby

#ruby #codenewbie #webdev #tutorial

I write a lot of blog posts on dev.to and really like using Markdown for blogging. But one thing that would make blogging even more convenient is having all the image links from my previous dev.to posts in one place, so I can reuse images.

So in this tutorial, I will show how we can use Ruby packages to make a CSV file of every image link in your blog posts from the command line, using the dev.to/Forem API.

We will:

💻 Send an authenticated request to the dev.to/Forem API with the standard library net/http package
🐈 Deserialize JSON into objects of a custom class with the standard library json package
💎 Retrieve the image links using the commonmarker gem
💼 Serialize objects into CSV files with the standard library csv package

This tutorial assumes you're familiar with JSON, HTTP requests, and the basics of Ruby.

Contacting the dev.to API

The first thing you need in order to talk to a web API is an API client. An API client is some code that sends authenticated HTTP requests to the web API. We can get a client in one of two ways:

Search for a Ruby Gem of a client for the dev.to/Forem APIs; a lot of people have made code repositories for talking to websites' APIs so that other people can use them.
Make our own API client from scratch

Since we're only talking to one HTTP endpoint on dev.to's API in this tutorial, we'll go with the second option. So let's jump to dev.to's API documentation!

The endpoint we're talking to is the User's Articles endpoint, which lists all articles belonging to the user calling that endpoint as JSON. Looking at the request and response samples on the right, we will find that:

The request is a GET request to the endpoint /articles/me
The response is an array of JSON objects
- Each JSON object in the array contains many fields representing the article, including title for the title of the article, and body_markdown for the Markdown of the entire text of the article
The sample cURL request contacting that endpoint contains the header api-key, which tells dev.to who you are and gives you access to your own account's data.

So we need a client class that can:

Send an HTTP request to the dev.to API's /articles/me endpoint, with an API key as authentication
Deserialize the JSON response to Ruby objects, giving us access to the fields we are interested in.

If you're following along, make a folder for your Ruby app and write this code to app.rb in that folder:

require 'net/http'
require 'json'

class DevToClient
  def initialize(api_key)
    @api_key = api_key
  end

  def get_my_articles
    # [TODO] Send HTTP request to /articles/me and deserialize
    # the JSON response
  end
end

We have a DevToClient class with two methods:

initialize as its constructor. In the constructor, we pass in our API key, and it gets stored in the instance variable @api_key
get_my_articles, which will send an HTTP request to dev.to's /articles/me endpoint that uses the @api_key and deserialize the response

Now for getting the HTTP response. Looking at the documentation for net/http, there is the method Net::HTTP.get_response for sending a GET request to the URL passed in and getting back an HTTP response.

According to the get_response documentation, in addition to a URL, we can optionally pass in a hash of request headers (passing headers this way requires Ruby 3.0 or newer). So we can pass in an api-key header with this code:

  def get_my_articles
    Net::HTTP.get_response(
      URI('https://dev.to/api/articles/me'),
      { 'api-key': @api_key }
    )
  end

We send the request to dev.to's /api/articles/me endpoint with an api-key header containing the value of the DevToClient's @api_key instance variable.
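To see what that header looks like on a request without sending anything, here is a minimal sketch that builds the same GET request by hand with Net::HTTP::Get. The placeholder_key value is made up for illustration; it is not a real API key, and no network call happens here.

```ruby
require 'net/http'
require 'uri'

# Placeholder value for illustration only; a real key would come from
# your own dev.to account.
placeholder_key = 'not-a-real-key'

uri = URI('https://dev.to/api/articles/me')
request = Net::HTTP::Get.new(uri) # builds the request; nothing is sent
request['api-key'] = placeholder_key

puts request.method     # the HTTP verb
puts request.path       # the path portion of the URL
puts request['API-Key'] # header lookups ignore case
```

Since header names are case-insensitive, it doesn't matter whether the header is spelled api-key or API-Key.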

To run get_my_articles against the real API, you are going to need your own API key for the dev.to API. To get one, first make a dev.to account if you don't have one already. Then, you can get your API key by following the Authentication instructions on the dev.to API docs.

⚠️WARNING!⚠️ For any web API you are working with, DO NOT share your API key or other forms of authentication with anyone; don't post it online or email it to your friends, and also don't commit it in your code! If someone else gets ahold of your API key, they will be able to impersonate you on that API and access your account data! If you suspect that someone has obtained one of your authentication keys or secrets, you should have that key/secret invalidated and then a fresh API key/secret created to protect your account.

Now, save your API key to the environment variable DEVTO_API_KEY. Then, at the bottom of app.rb, add this code:

api_key = ENV['DEVTO_API_KEY']
puts DevToClient.new(api_key).get_my_articles

Run the code with ruby app.rb and you should see terminal output like this:

#<Net::HTTPOK:0x00007fc20a92f800>

We got back a response of class HTTPOK (which inherits from HTTPSuccess, which in turn inherits from HTTPResponse). Now that we have our response, let's parse it to make Ruby objects for each article.

JSON parsing in Ruby

In addition to net/http, the Ruby standard library has a json package for serializing and deserializing JSON, and inside that package there is the method JSON.parse. So if we wrote Ruby code like

sloth_json = <<EOF
{
    "sci_name":    "Bradypus",
    "common_name": "Three-toed sloth",
    "claw_count":  3
}
EOF

sloth = JSON.parse(sloth_json)
sci_name = sloth["sci_name"]
common_name = sloth["common_name"]
claw_count = sloth["claw_count"]
puts "The #{sci_name} (#{common_name}) has #{claw_count} claws"

and then ran ruby app.rb, we would get output like this:

The Bradypus (Three-toed sloth) has 3 claws

The object returned from JSON.parse is a Ruby hash, with its field names becoming the hash's keys.
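One detail worth seeing in action: by default those keys are strings, not symbols. The sketch below uses a shortened version of the sloth JSON and also shows JSON.parse's symbolize_names option, which this tutorial doesn't otherwise use, in case you prefer symbol keys.

```ruby
require 'json'

sloth_json = '{"sci_name": "Bradypus", "claw_count": 3}'

string_keyed = JSON.parse(sloth_json)
symbol_keyed = JSON.parse(sloth_json, symbolize_names: true)

p string_keyed['sci_name']  # looked up with a string key
p string_keyed[:sci_name]   # nil, since the default hash has no symbol keys
p symbol_keyed[:claw_count] # looked up with a symbol key
```

The rest of this tutorial sticks with the default string keys.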

Let's try JSON.parse to return a Ruby object from DevToClient#get_my_articles:

  def get_my_articles
    res = Net::HTTP.get_response(
      URI('https://dev.to/api/articles/me'),
      { 'api-key': @api_key }
    )

    if res.code.to_i > 299 || res.code.to_i < 200
      raise "got status code #{res.code}"
    end
    JSON.parse res.body
  end

Now, if we got a status code besides 2xx, we raise an error. Otherwise, we return the result of parsing the response body.
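Because Net::HTTPOK inherits from Net::HTTPSuccess, the same check could also be written against the response's class instead of its numeric code. Here is a sketch of that alternative; ensure_success! is a hypothetical helper that isn't part of the tutorial's DevToClient, and the responses are built by hand so the example runs without a network call.

```ruby
require 'net/http'

# Hypothetical helper, not part of the tutorial's DevToClient: raises
# unless the response class is in the 2xx (Net::HTTPSuccess) family.
def ensure_success!(res)
  raise "got status code #{res.code}" unless res.is_a?(Net::HTTPSuccess)

  res
end

# Responses constructed by hand so no request is actually sent.
ok = Net::HTTPOK.new('1.1', '200', 'OK')
not_found = Net::HTTPNotFound.new('1.1', '404', 'Not Found')

ensure_success!(ok)

begin
  ensure_success!(not_found)
rescue RuntimeError => e
  puts e.message
end
```

Both styles behave the same for ordinary responses, so which one to use is mostly a matter of taste.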

If you then ran code like

puts DevToClient.new(api_key).get_my_articles

you will see that the Ruby object that the response body deserialized to was an array of Ruby hashes. So we could do something like this:

DevToClient.new(api_key).get_my_articles.each do |article|
  md = article["body_markdown"]

  # now use a Markdown parser to find every image link in
  # the article
end

But what if we wanted a DevToArticle class that handles digging for all the image links, and we wanted to deserialize our JSON to an array of DevToArticles instead of hashes?

Let's start by making a DevToArticle class:

class DevToArticle
  attr_accessor :id, :title, :body_markdown, :url

  def initialize
    # [TODO] Add deserialization logic here
  end

  def get_article_images
    # [TODO] Add Markdown parsing for the article's
    # @body_markdown
  end
end

Since Ruby doesn't directly know that the JSON it's getting is supposed to be a DevToArticle, calling JSON.parse will return an array of Ruby hashes. So we will need just a bit of extra logic for converting those hashes to DevToArticles.

I wasn't sure how to do this at first; in Go, the main programming language I work with, I would be doing this using code like this:

type DevToArticle struct {
    ID           int    `json:"id"`
    Title        string `json:"title"`
    BodyMarkdown string `json:"body_markdown"`
    URL          string `json:"url"`
}

func (d *DevToClient) GetMyArticles() ([]DevToArticle, error) {
    // get the HTTP response for the "user's articles" API
    // endpoint here

    var articles []DevToArticle
    if err := json.NewDecoder(res.Body).Decode(&articles); err != nil {
        return nil, err
    }
    return articles, nil
}

I searched for how to deserialize to a custom class rather than a hash, and after asking about that on Twitter, Jamie Gaskins told me that there isn't really a standardized way in Ruby to deserialize to a class, but you are able to give your Ruby class an initialize method that takes in a hash. So based on that advice, in our DevToArticle#initialize method, the deserialization logic would look like this:

  def initialize(attributes)
    @id = attributes['id']
    @title = attributes['title']
    @body_markdown = attributes['body_markdown']
    @url = attributes['url']
  end

For each field we want an instance variable for, we just pull it out of the attributes hash passed in.

Note, by the way, that this also gives us control of the casing scheme for the deserialized objects. In Ruby, the standard casing for instance variables is snake_case, and that's the casing the Forem API uses, but what if Forem were a camelCase API instead? @body_markdown can still be snake_case even if bodyMarkdown in the hash is camelCase:

    @body_markdown = attributes['bodyMarkdown']

Now, to have DevToClient#get_my_articles return an array of DevToArticles instead of an array of hashes, we can do this:

    articles = JSON.parse(res.body)
    articles.map { |article| DevToArticle.new article }

By passing that block into articles.map, we get back an array of DevToArticles created from each hash in the articles array, so now get_my_articles returns the type we want: an array of DevToArticles. Now let's jump into the Markdown of those articles to find all their image links!
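Before moving on, here is a sketch of the whole deserialization step that runs without calling the API. The sample_body string is made-up data standing in for res.body; the DevToArticle class is the one from above, minus get_article_images.

```ruby
require 'json'

class DevToArticle
  attr_accessor :id, :title, :body_markdown, :url

  def initialize(attributes)
    @id = attributes['id']
    @title = attributes['title']
    @body_markdown = attributes['body_markdown']
    @url = attributes['url']
  end
end

# Made-up sample data standing in for the API's response body.
sample_body = <<~BODY
  [
    {"id": 1, "title": "First post", "body_markdown": "Hello!",
     "url": "https://dev.to/example/first-post"},
    {"id": 2, "title": "Second post", "body_markdown": "Hi again!",
     "url": "https://dev.to/example/second-post"}
  ]
BODY

# Parse to an array of hashes, then convert each hash to a DevToArticle.
articles = JSON.parse(sample_body).map do |attributes|
  DevToArticle.new(attributes)
end

articles.each { |article| puts "#{article.id}: #{article.title}" }
```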

Markdown parsing with commonmarker

Unlike HTTP and JSON, the standard library in Ruby doesn't have a Markdown package, so we can either write our own Markdown parser, or use a Markdown-parsing Ruby Gem.

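And it turns out that there's a popular Ruby Gem that lets us parse a Markdown file and then walk over its nodes (nodes as in text, links, images, etc.): CommonMarker! To get it, first run bundle init to create a Gemfile, then in the Gemfile, add the line

gem "commonmarker", "~> 0.23"

Then run bundle install. If CommonMarker installs successfully, you should be able to use it in your Ruby code.

The version constraint is there because this tutorial uses the gem's 0.x API (CommonMarker.render_doc). Releases from 1.0 onward changed the API, so the code in this section would need adapting for them.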

To start, add require 'commonmarker' to the top of app.rb, then in DevToArticle#get_article_images, add this code:

  def get_article_images
    doc = CommonMarker.render_doc(@body_markdown, :DEFAULT)
    puts doc
  end

If you call that method on an article in app.rb, you will get output like:

#<CommonMarker::Node:0x00007fca8718f228>

indicating that we were able to parse the Markdown in @body_markdown, converting it to a CommonMarker::Node object.

Following this example in the CommonMarker documentation, we can walk over all the nodes in the document with code like this:

  def get_article_images
    doc = CommonMarker.render_doc(@body_markdown, :DEFAULT)
    doc.walk do |node|
      puts node.type
    end
  end

Now in the do block, we are looking at each Node and seeing what type of node it is. So if you run this code, you might see output like this:

text
code
text
paragraph
image
text
text
paragraph

We're only interested in image nodes, so we'll add an if statement to check the node's type, which, according to the documentation for Node.new, is the Ruby symbol :image.

    doc.walk do |node|
      if node.type == :image
        # [TODO] retrieve the node's content
      end
    end

Now we're only processing image links. And a Markdown image link has two parts: descriptive alt text, which screen reader software reads when viewing images, and the URL of the image. So we need to find ways to get both of those.
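To make those two parts concrete, here is a rough sketch that pulls them out of a Markdown string with a regular expression. This is only an illustration, and it's a different technique from the parser-based approach this tutorial uses: the IMAGE_LINK pattern is a hypothetical one made up for this example, and it will miss reference-style images and wrongly match image syntax inside code blocks, which is why a real Markdown parser is the better tool for the job.

```ruby
# Made-up Markdown for illustration.
markdown = <<~MD
  Here is a sloth:

  ![A sloth hanging from a branch](https://example.com/sloth.png)

  And a [regular link](https://example.com) that should be skipped.
MD

# Hypothetical pattern: "!" then "[alt text]" then "(url ...)".
# Capture group 1 is the alt text, capture group 2 is the URL.
IMAGE_LINK = /!\[([^\]]*)\]\(([^)\s]+)[^)]*\)/

# With two capture groups, scan returns an array of [alt, url] pairs.
pairs = markdown.scan(IMAGE_LINK)

pairs.each do |alt_text, url|
  puts "alt text: #{alt_text}"
  puts "url:      #{url}"
end
```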

Looking at the CommonMarker documentation, the Node method for getting the alt text is to_plaintext, and the Node method for getting the URL of the image is url. So now, we can return the parts to the image link:

  def get_article_images
    doc = CommonMarker.render_doc(@body_markdown, :DEFAULT)
    image_links = []
    doc.walk do |node|
      if node.type == :image
        image_links.push [
          node.to_plaintext.delete_suffix("\n"), node.url
        ]
      end
    end

    image_links
  end

So now, we have all the data we'll need for serializing to a CSV file!

Serializing your image links to a CSV file

The Ruby standard library also comes with a csv package for parsing CSV files, or generating them from arrays of data. Each row will be one image link, including:

The alt text of the image link
The URL of the image
The ID of the article that the image link came from
The title of the article that the image link came from
The URL of the article that the image link came from

So we will want CSV header text like:

Alt Text,Image URL,Article ID,Article Title,Article URL

And for each row in the CSV, we will want a DevToImageLink Ruby class to represent all the data in that row:

class DevToImageLink
  def initialize(article, image_alt, image_url)
    @article = article
    @image_alt = image_alt
    @image_url = image_url
  end

  def to_csv_row
    [@image_alt, @image_url, @article.id, @article.title, @article.url]
  end
end

In the initialize method we pass in the DevToArticle for the image link, and the alt text and URL of the image, to become instance variables. And in the to_csv_row method, all of these fields are put into a Ruby array.
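Here is a small sketch of that class in use. To keep it self-contained, ArticleStandIn is a hypothetical Struct standing in for DevToArticle (it only needs to respond to id, title, and url), and all the values are made up.

```ruby
class DevToImageLink
  def initialize(article, image_alt, image_url)
    @article = article
    @image_alt = image_alt
    @image_url = image_url
  end

  def to_csv_row
    [@image_alt, @image_url, @article.id, @article.title, @article.url]
  end
end

# Hypothetical stand-in for DevToArticle, with made-up values.
ArticleStandIn = Struct.new(:id, :title, :url)
article = ArticleStandIn.new(
  101, 'Sloths of the rainforest', 'https://dev.to/example/sloths'
)

link = DevToImageLink.new(
  article, 'A sloth on a branch', 'https://example.com/sloth.png'
)

# The row's order matches the CSV header text shown earlier.
p link.to_csv_row
```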

Heading back to the DevToArticle class, now that we have the DevToImageLink class defined, let's have DevToArticle#get_article_images return an array of DevToImageLinks, rather than an array of arrays:

      if node.type == :image
-       image_links.push [
-         node.to_plaintext.delete_suffix("\n"), node.url
-       ]
+       image_alt = node.to_plaintext.delete_suffix("\n")
+       image_url = node.url
+       image_links.push DevToImageLink.new(self, image_alt, image_url)
      end

Now that that's all set, add require 'csv' to the top of app.rb, then let's add a top-level get_image_links_csv function that will convert the image links in our articles to a CSV.

def get_image_links_csv(api_key)
  CSV.generate do |csv|
    csv << [
      'Alt Text','Image URL','Article ID','Article Title','Article URL'
    ]

    DevToClient.new(api_key).get_my_articles.each do |article|
      article.get_article_images.each do |image_link|
        csv << image_link.to_csv_row
      end
    end
  end
end

The function CSV.generate takes in a block and returns the CSV string generated in that block.

In the first line inside the block, we pass in our CSV headers with the CSV's << method, so they will serve as the first line of the CSV.

Now, we loop over the image links in each of the articles returned by DevToClient#get_my_articles. For each image link, we call the DevToImageLink#to_csv_row method, and then load the returned array into the CSV.

Finally, the return value of CSV.generate is a string in CSV format. So now, we can use that code like this:

puts get_image_links_csv(api_key)
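If you'd like to try the CSV step on its own, without API access, here is a minimal sketch using made-up rows in the same column order as DevToImageLink#to_csv_row. The alt text and title deliberately contain a comma and quotation marks, to show that the csv package takes care of quoting them.

```ruby
require 'csv'

# Made-up row in the same column order as DevToImageLink#to_csv_row.
rows = [
  ['A sloth, hanging from a branch', 'https://example.com/sloth.png',
   1, 'Sloths are "slow"', 'https://dev.to/example/sloths']
]

csv_text = CSV.generate do |csv|
  csv << ['Alt Text', 'Image URL', 'Article ID', 'Article Title', 'Article URL']
  rows.each { |row| csv << row }
end

puts csv_text

# Parsing the text back recovers the original values, as strings.
parsed = CSV.parse(csv_text, headers: true)
puts parsed[0]['Alt Text']
puts parsed[0]['Article Title']
```

To keep the output around, you could also pass the string to File.write along with a file name of your choosing instead of printing it.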

Using three standard library packages and a gem, we were able to make a convenient script for getting all our dev.to image links and converting them to a CSV. In my next Ruby tutorial, I will be looking at using a gem for giving this script a better user experience so it's easier to search the CSV for the image you want.