These are notes on how to use Elixir to find all the NPM packages published by AWS, along with their download counts.
The total count is 1676. We start by querying the npms-api search endpoint: given a search string, it returns a list of matching JavaScript packages. The endpoint is paginated. With this list, we query the NPM registry to get statistics for each package.
We send approximately 1750 requests in total. The slower endpoint is the paginated search API, with a response time of approximately 200-250ms per query. We then send about 1700 queries to get the stats for each package, with an average response time of 10-15ms. On average we get the result in around 20s, compared to roughly 30s for a fully sequential run: (1700/25) * 0.200s ≈ 13s for the search pages, plus 1700 * 0.010s = 17s for the stats.
We used Elixir to achieve this, streaming the requests through the HTTP client Finch and parsing the responses with the Rust-based JSON parser jsonrs.
We took the easy path: build the full list of matching packages, and then query the details for each element of the list.
We used Stream.resource to build the list of packages. We used the total count returned by the endpoint to handle the pagination and increment a counter on each iteration.
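As a minimal sketch of this pagination pattern, the snippet below swaps the HTTP call for a hypothetical in-memory fetch_page function (an assumption for illustration, not part of the original code) that returns one page of names together with the total count. The accumulator is simply the page counter.

```elixir
# Hypothetical stand-in for the search endpoint: 60 fake package names.
all_items = Enum.map(1..60, &"pkg-#{&1}")
page_size = 25

# Returns {items_of_the_page, total}, like the search function is assumed to do.
fetch_page = fn from ->
  {Enum.slice(all_items, from, page_size), length(all_items)}
end

names =
  Stream.resource(
    # start function: begin at page 0
    fn -> 0 end,
    # next function: emit one page, halt once the offset reaches the total
    fn page ->
      {items, total} = fetch_page.(page * page_size)

      if page * page_size >= total do
        {:halt, page}
      else
        {items, page + 1}
      end
    end,
    # after function: nothing to clean up
    fn _page -> :ok end
  )
  |> Enum.to_list()

IO.inspect(length(names))
```

As in the full code, the halting check runs after the fetch, so one extra (empty) page is requested before the stream stops.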
We then use Task.async_stream to query the second endpoint, which returns statistics on a given package. We retrieve the download count for a given period.
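A small sketch of the concurrent step, assuming a hypothetical fake_downloads function in place of the real HTTP call. Task.async_stream wraps each successful result in an {:ok, value} tuple and, by default, keeps the input order.

```elixir
# Hypothetical stand-in for the per-package request: sleeps briefly and
# returns a map shaped like the stats used later in the pipeline.
fake_downloads = fn name ->
  Process.sleep(10)
  %{"package" => name, "downloads" => String.length(name)}
end

results =
  ["a", "bbb", "cc"]
  # max_concurrency and timeout are spelled out here for illustration;
  # the original code relies on the defaults.
  |> Task.async_stream(fake_downloads, max_concurrency: 10, timeout: 5_000)
  # unwrap the {:ok, value} tuples
  |> Enum.map(fn {:ok, stats} -> stats end)

IO.inspect(results)
```

When no options are given, max_concurrency defaults to System.schedulers_online() and the per-task timeout to 5000ms, so raising max_concurrency may be worth trying for I/O-bound work like this.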
We then use Enum.sort_by to order the list of maps on a given key.
With the result, we run a side effect as a Task: it saves the data to a file. Since we need the data unchanged for the final step, we use the function tap, which lets the data pass through. This allows us to keep the flow while running this side effect.
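The following sketch illustrates the tap pattern in isolation. The supervisor name DemoTaskSup and the temporary file path are assumptions made for the example; the point is only that tap returns its input untouched, whatever the function inside it does.

```elixir
# Hypothetical supervisor, started only for this demo.
{:ok, _sup} = Task.Supervisor.start_link(name: DemoTaskSup)

# Hypothetical output file in the system temp directory.
path = Path.join(System.tmp_dir!(), "tap-demo.txt")

result =
  [3, 1, 2]
  |> Enum.sort()
  |> tap(fn data ->
    # side effect: write the sorted data in a background task;
    # the return value of this function is ignored by tap
    Task.Supervisor.async_nolink(DemoTaskSup, fn ->
      File.write!(path, inspect(data))
    end)
  end)
  # the pipeline continues with the sorted list, not with the task
  |> Enum.map(&(&1 * 10))

IO.inspect(result)
```

Note that async_nolink sends the task's reply back to the caller as a message. For pure fire-and-forget work, Task.Supervisor.start_child is a possible alternative.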
We eventually prettify the data into a list of maps %{"package_name" => downloaded_count}.
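To make the last two steps concrete, here is a sketch on made-up data (the package names and counts are invented for the example): sort the stats maps by download count in descending order, then reshape each one into a single-entry map.

```elixir
# Made-up stats, shaped like the maps used in the pipeline.
stats = [
  %{"package" => "a", "downloads" => 10},
  %{"package" => "b", "downloads" => 30},
  %{"package" => "c", "downloads" => 20}
]

pretty =
  stats
  # highest download count first
  |> Enum.sort_by(&Map.get(&1, "downloads"), :desc)
  # keep only name => count
  |> Enum.map(fn %{"package" => name, "downloads" => count} -> %{name => count} end)

IO.inspect(pretty)
# → [%{"b" => 30}, %{"c" => 20}, %{"a" => 10}]
```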
Elixir can nicely chain all these streams. In a Livebook, the following code will default to the AWS packages.
Mix.install([
  {:finch, "~> 0.16.0"},
  {:jsonrs, "~> 0.3.1"}
])

Supervisor.start_link(
  [
    {Finch, name: MyFinch},
    {Task.Supervisor, name: MyTaskSup}
  ],
  strategy: :one_for_one,
  name: MySup
)

defmodule Npm do
  require Logger

  @registry "https://api.npmjs.org"
  @search_point "https://api.npms.io/v2/search"
  @starting "2022-01-01"
  @ending "2023-01-01"
  @search "@aws-sdk/client"
  @aws_npm_packages "aws-npm-packages.json"

  def find(save? \\ false, string \\ @search, starting \\ @starting, ending \\ @ending) do
    # unwrap the {:ok, result} tuples emitted by Task.async_stream
    check_response = fn response ->
      case response do
        {:ok, result} ->
          result

        {:error, reason} ->
          {:error, reason}
      end
    end

    # the optional "save to file"
    save_to_file = fn list ->
      Logger.info(%{length: length(list)})

      Task.Supervisor.async_nolink(MyTaskSup, fn ->
        case Jsonrs.encode(list, lean: true, pretty: true) do
          {:ok, result} -> File.write!(@aws_npm_packages, result)
          {:error, reason} -> reason
        end
      end)
    end

    # the iterating function in Stream.resource
    next = fn {data, page} ->
      {response, total} = search(string, 25 * page)

      case page * 25 >= total do
        true ->
          {:halt, data}

        false ->
          {response, {data, page + 1}}
      end
    end

    try do
      Stream.resource(
        fn -> {[], 0} end,
        &next.(&1),
        fn _ -> nil end
      )
      |> Task.async_stream(&downloaded(&1, starting, ending))
      |> Stream.map(&check_response.(&1))
      |> Enum.sort_by(&Map.get(&1, "downloads"), :desc)
      |> tap(fn data -> if save?, do: save_to_file.(data) end)
      |> Enum.map(fn %{"downloads" => d, "package" => name} ->
        %{name => d}
      end)
    rescue
      e ->
        # log the failure message instead of crashing
        Logger.warning(Exception.message(e))
    end
  end

  # we send a tuple {stream, total}
  def search(string, from \\ 0) do
    url =
      URI.new!(@search_point)
      |> URI.append_query(URI.encode_query(%{q: string, size: 25, from: from}))
      |> URI.to_string()

    with {:ok, %{body: body}} <-
           Finch.build(:get, url)
           |> Finch.request(MyFinch),
         {:ok, %{"results" => results, "total" => total}} <- Jsonrs.decode(body) do
      {
        # keep unflagged packages whose name contains the search string
        Stream.filter(results, fn package ->
          not Map.has_key?(package, "flags") and
            String.contains?(get_in(package, ["package", "name"]), string)
        end)
        |> Stream.map(&get_in(&1, ["package", "name"])),
        total
      }
    else
      {:error, reason} ->
        reason
    end
  end

  # the second endpoint
  def downloaded(package_name, start, ending) do
    path = "#{@registry}/downloads/point/#{start}:#{ending}/#{package_name}"

    with {:ok, %{body: result}} <-
           Finch.build(:get, path) |> Finch.request(MyFinch),
         {:ok, response} <- Jsonrs.decode(result) do
      response
    else
      {:error, reason} -> reason
    end
  end
end

The usage is:
iex> Npm.find(false, "@aws-sdk/client", "2022-01-01", "2022-03-01")
The same query also works with the search string @google-cloud/.
If you have Livebook installed, you can run a session with the button:
If not, you can easily run a Livebook from Docker. Run the image:
docker run -p 8080:8080 -p 8081:8081 --pull always -e LIVEBOOK_PASSWORD="securesecret" livebook/livebook
and then from another terminal, execute:
open http://localhost:8080/import?url=https://github.com/ndrean/gruland/blob/main/livebook.livemd