DEV Community

NDREAN


TIL: query the paginated NPM REST API with Elixir

These are notes on how to use Elixir to find all the NPM packages published by AWS, together with their download counts. The total count is 1676 packages. We first query the npms.io search endpoint: given a search string, it returns a paginated list of matching JavaScript packages. With this list, we then query the NPM downloads API (api.npmjs.org) to get statistics for each package.
We send about 1,750 requests in total. The slower part is the paginated search API, with a response time of about 300 ms per query. We then send about 1,700 queries to get the statistics for each package, with an average response time of 10 ms. Overall, we get the result in about 25 s on average.
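A back-of-the-envelope check of these figures, as a small Elixir sketch. It assumes the page size of 25 used in the code, the latencies quoted above, and a hypothetical concurrency of 8 for the statistics requests (Task.async_stream defaults to System.schedulers_online(), which depends on the machine).

```elixir
defmodule Estimate do
  # Figures from the text; the concurrency of 8 is an assumption.
  @total 1676
  @page_size 25
  @search_ms 300
  @stats_ms 10
  @concurrency 8

  # Number of search pages: the ceiling of total / page_size.
  def pages, do: div(@total + @page_size - 1, @page_size)

  # One request per search page plus one request per package.
  def requests, do: pages() + @total

  # Pages are fetched one after the other; stats requests run @concurrency at a time.
  def estimated_ms, do: pages() * @search_ms + div(@total * @stats_ms, @concurrency)
end

IO.inspect({Estimate.pages(), Estimate.requests(), Estimate.estimated_ms()})
# prints {68, 1744, 22495}
```

That gives about 1,744 requests and roughly 22.5 s, consistent with the measured figures: the sequential pagination accounts for most of the time.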

We use Elixir with the Finch HTTP client, and take the easy path: first build the full list of matching packages, then query the details for each element of that list.
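The Npm module in this post sends every request through a Finch pool named Back.Finch, which must be running first. In a Mix project it would typically sit in the application's supervision tree as {Finch, name: Back.Finch}. In a plain script or a Livebook cell, a sketch like the following installs the two dependencies and starts the pool by hand; the version requirements are assumptions, not taken from the original project.

```elixir
# Install the HTTP client and the JSON library used by the Npm module.
Mix.install([
  {:finch, "~> 0.16"},
  {:jason, "~> 1.4"}
])

# Start a Finch connection pool registered under the name the Npm module expects.
{:ok, _pid} = Finch.start_link(name: Back.Finch)
```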

We use Stream.resource to build the list of packages lazily. Each page of the search response carries the total number of matches; we increment a page counter on each iteration and stop once the offset reaches that total.
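The pagination logic can be tried offline. In this sketch, a hypothetical fake_search/2 stands in for the npms.io endpoint and returns {:ok, names, total}. The accumulator is {page, total}, with total unknown (nil) until the first page arrives, so the stream stops without requesting an extra empty page.

```elixir
defmodule PaginationSketch do
  @page_size 25

  # Hypothetical stand-in for the search endpoint: slices a local list and
  # reports the total, like the "total" field of the real response.
  def fake_search(all, from) do
    send(self(), {:fetched, from})
    {:ok, Enum.slice(all, from, @page_size), length(all)}
  end

  # Lazily walks the pages; nothing is fetched until the stream is consumed.
  def stream(all) do
    Stream.resource(
      fn -> {0, nil} end,
      fn
        # Stop once the next offset is past the total reported by the endpoint.
        {page, total} when is_integer(total) and page * @page_size >= total ->
          {:halt, {page, total}}

        {page, _total} ->
          {:ok, names, total} = fake_search(all, page * @page_size)
          {names, {page + 1, total}}
      end,
      fn _acc -> :ok end
    )
  end
end

names = for i <- 1..60, do: "pkg-#{i}"
PaginationSketch.stream(names) |> Enum.count() |> IO.inspect()
# prints 60, after three fetches (offsets 0, 25 and 50)
```

Because the stream is lazy, Enum.take(PaginationSketch.stream(names), 10) fetches only the first page.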

We then use Task.async_stream to query the second endpoint concurrently. For a given package, it returns the number of downloads over a given period.
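A small offline sketch of this step: a hypothetical fake_downloads/1 plays the role of the downloads endpoint, and one package is deliberately too slow. With on_timeout: :kill_task, Task.async_stream emits {:exit, :timeout} for that element instead of exiting the caller, so the pipeline can drop it and keep the others. Results stay in input order by default.

```elixir
defmodule AsyncSketch do
  # Hypothetical stand-in for the downloads endpoint; names starting with "slow" time out.
  def fake_downloads("slow" <> _ = name) do
    Process.sleep(500)
    %{"package" => name, "downloads" => 0}
  end

  def fake_downloads(name), do: %{"package" => name, "downloads" => String.length(name)}

  def stats(names) do
    names
    |> Task.async_stream(&fake_downloads/1,
      max_concurrency: 4,
      timeout: 100,
      on_timeout: :kill_task
    )
    # Each element is {:ok, result} or {:exit, reason}; keep only the successes.
    |> Stream.flat_map(fn
      {:ok, stats} -> [stats]
      {:exit, _reason} -> []
    end)
    |> Enum.to_list()
  end
end

AsyncSketch.stats(["a", "slow-b", "ccc"]) |> IO.inspect()
```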

We then use Enum.sort_by to order the list of maps by the "downloads" key, in descending order.
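Enum.sort_by/3 accepts :desc as the sorter (since Elixir 1.10). A quick sketch with made-up numbers; it also shows why failed lookups should be filtered out before sorting: a map without a "downloads" key yields nil, and in Erlang term order an atom such as nil compares greater than any number, so it lands first in a descending sort.

```elixir
stats = [
  %{"package" => "pkg-a", "downloads" => 120},
  %{"package" => "pkg-b", "downloads" => 4_500},
  %{"package" => "pkg-c", "downloads" => 37}
]

sorted = Enum.sort_by(stats, &Map.get(&1, "downloads"), :desc)
Enum.map(sorted, & &1["package"]) |> IO.inspect()
# prints ["pkg-b", "pkg-a", "pkg-c"]

# An error map without "downloads" sorts first, because nil > any number.
with_error = [%{"error" => "not found"} | stats]
Enum.sort_by(with_error, &Map.get(&1, "downloads"), :desc) |> hd() |> IO.inspect()
# prints %{"error" => "not found"}
```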

Optionally, we start a Task (a concurrent process) that encodes the result as pretty-printed JSON and saves it to a file.

We call it inside tap, which runs a function on the data and then passes the data through unchanged. Since the final step needs the original data, this keeps the pipeline flowing.

Finally, we reshape the data into a list of maps of the form %{"package_name" => downloaded_count}.
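The last three steps in isolation, as a sketch: tap/2 (in Kernel since Elixir 1.12) hands the sorted list to a side-effecting function and returns the list unchanged, Task.start/1 does the writing in a separate process so the pipeline does not wait for it, and a final Enum.map reshapes each map. To stay dependency-free, this sketch writes inspect/2 output instead of JSON, and the file name is a hypothetical placeholder.

```elixir
defmodule TapSketch do
  # Fire-and-forget write in a separate process; returns {:ok, pid} immediately.
  def save_async(list, path) do
    Task.start(fn -> File.write!(path, inspect(list, pretty: true)) end)
  end

  def run(sorted, path) do
    sorted
    # tap/2 ignores the callback's return value and passes `sorted` through unchanged.
    |> tap(fn data -> save_async(data, path) end)
    # Reshape each stats map into %{package_name => downloads}.
    |> Enum.map(fn %{"package" => name, "downloads" => d} -> %{name => d} end)
  end
end

path = Path.join(System.tmp_dir!(), "tap-sketch.txt")

[%{"package" => "pkg-b", "downloads" => 4_500}, %{"package" => "pkg-a", "downloads" => 120}]
|> TapSketch.run(path)
|> IO.inspect()
# prints [%{"pkg-b" => 4500}, %{"pkg-a" => 120}]
```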

Elixir chains all these steps nicely in a single pipeline.

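```elixir
defmodule Npm do
  require Logger

  @registry "https://api.npmjs.org"
  @search_point "https://api.npms.io/v2/search"

  @starting "2022-01-01"
  @ending "2023-01-01"
  @search "@aws-sdk/client"
  @aws_npm_packages "aws-npm-packages.json"
  @page_size 25

  # Requests go through a Finch pool named Back.Finch, which must already be started,
  # e.g. with {Finch, name: Back.Finch} in the application's supervision tree.
  def find(save? \\ false, string \\ @search, starting \\ @starting, ending \\ @ending) do
    # Task.async_stream emits {:ok, result} or {:exit, reason}; keep only valid stats maps
    check_response = fn
      {:ok, %{"downloads" => _, "package" => _} = stats} ->
        [stats]

      {:ok, other} ->
        Logger.warning("unexpected response: #{inspect(other)}")
        []

      {:exit, reason} ->
        Logger.warning("task exited: #{inspect(reason)}")
        []
    end

    # the optional "save to file", run in a separate process
    save_to_file = fn list ->
      Logger.info(%{length: length(list)})

      Task.start(fn ->
        case Jason.encode(list, pretty: true) do
          {:ok, json} -> File.write!(@aws_npm_packages, json)
          {:error, reason} -> Logger.warning("encoding failed: #{inspect(reason)}")
        end
      end)
    end

    # the iterating function in Stream.resource; the accumulator is {page, total},
    # where total stays nil until the first page has been fetched
    next = fn
      {page, total} when is_integer(total) and page * @page_size >= total ->
        {:halt, {page, total}}

      {page, _total} ->
        case search(string, page * @page_size) do
          {:ok, names, total} ->
            {names, {page + 1, total}}

          {:error, reason} ->
            Logger.warning("search failed: #{inspect(reason)}")
            {:halt, {page, nil}}
        end
    end

    try do
      Stream.resource(fn -> {0, nil} end, next, fn _acc -> :ok end)
      # a timed-out request becomes {:exit, :timeout} instead of crashing the caller
      |> Task.async_stream(&downloaded(&1, starting, ending), on_timeout: :kill_task)
      |> Stream.flat_map(check_response)
      |> Enum.sort_by(&Map.get(&1, "downloads"), :desc)
      |> tap(fn data -> if save?, do: save_to_file.(data) end)
      |> Enum.map(fn %{"downloads" => d, "package" => name} -> %{name => d} end)
    rescue
      e ->
        Logger.warning(Exception.message(e))
        {:error, e}
    end
  end

  # returns {:ok, stream_of_package_names, total} or {:error, reason}
  def search(string, from \\ 0) do
    # URI.encode_query escapes characters such as "@" and "/" in the search string
    query = URI.encode_query(%{"q" => string, "size" => @page_size, "from" => from})

    with {:ok, %{body: body}} <-
           Finch.build(:get, @search_point <> "?" <> query)
           |> Finch.request(Back.Finch),
         {:ok, %{"results" => results, "total" => total}} <- Jason.decode(body) do
      names =
        results
        # skip flagged packages (e.g. deprecated ones) and names not containing the search string
        |> Stream.reject(&Map.has_key?(&1, "flags"))
        |> Stream.map(&get_in(&1, ["package", "name"]))
        |> Stream.filter(&String.contains?(&1, string))

      {:ok, names, total}
    else
      {:error, reason} -> {:error, reason}
      {:ok, unexpected} -> {:error, {:unexpected_body, unexpected}}
    end
  end

  # the second endpoint: download count of a package between two dates;
  # returns the decoded map, or {:error, reason} unchanged on failure
  def downloaded(package_name, start, ending) do
    path = "#{@registry}/downloads/point/#{start}:#{ending}/#{package_name}"

    with {:ok, %{body: body}} <- Finch.build(:get, path) |> Finch.request(Back.Finch),
         {:ok, response} <- Jason.decode(body) do
      response
    end
  end
end
```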

The usage is as follows (the first argument is the save? flag; pass true to also save the result to a JSON file):

```elixir
iex> Npm.find(false, "@aws-sdk/client", "2022-01-01", "2022-03-01")
```

(Screenshot of the result for @aws-sdk/client)

The same query with @google-cloud/:

(Screenshot of the result for @google-cloud/)

If you have Livebook installed, you can run the notebook directly with the "Run in Livebook" button.

If not, you can run Livebook from Docker. Start the image:

```shell
docker run -p 8080:8080 -p 8081:8081 --pull always -e LIVEBOOK_PASSWORD="securesecret" livebook/livebook
```

Then, from another terminal, import the notebook (open is macOS-specific; on Linux, xdg-open does the same):

```shell
open http://localhost:8080/import?url=https://github.com/ndrean/gruland/blob/main/livebook.livemd
```
