These are notes on how to use Elixir to find all the NPM packages published by AWS, together with their download counts. There are 1676 of them. We first query the npms-api search endpoint: given a search string, it returns a list of matching JavaScript packages. The endpoint is paginated. With this list, we then query the NPM registry to get statistics for each package.
We send approximately 1750 requests in total. The slower part is the paginated search API, with a response time of about 300ms per query. We then send about 1700 queries to get the stats for each package, with an average response time of 10ms. On average, the whole result arrives in around 25s.
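As a rough sanity check of these figures, here is a back-of-the-envelope sketch. It assumes the 1676 figure is the total reported by the search endpoint, takes the page size of 25 from the code below, and uses a hypothetical concurrency of 8 for the detail queries (Task.async_stream defaults to the number of online schedulers). It ignores any overlap between the two stages, so it is only an estimate.

```elixir
defmodule TimingSketch do
  # Number of pages needed to cover `total` items (ceiling division).
  def pages(total, page_size), do: div(total + page_size - 1, page_size)

  # Sequential page requests plus detail requests spread over `concurrency` tasks.
  def estimate_ms(total, page_size, page_ms, detail_ms, concurrency) do
    pages(total, page_size) * page_ms + div(total * detail_ms, concurrency)
  end
end

TimingSketch.pages(1676, 25)
# => 68 page requests, so about 68 + 1676 = 1744 requests overall

TimingSketch.estimate_ms(1676, 25, 300, 10, 8)
# => 22495 ms, the same order of magnitude as the observed 25s
```

Under these assumptions, the sequential pagination accounts for most of the total time.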
We used Elixir with the HTTP client Finch to achieve this. We took the easy path: build the full list of matching packages, then query the details for each element of the list.
We used Stream.resource to build the list of packages. The total count returned by the endpoint drives the pagination: a page counter is incremented on each iteration, and the stream halts once the offset reaches the total.
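The pagination idea can be sketched without any network access. In the minimal example below, fake_search is a hypothetical stand-in for the search endpoint (it is not part of the original code); it returns a page of items together with the total count, just as the real search function does.

```elixir
defmodule PaginationSketch do
  @page_size 25

  # Hypothetical stand-in for the paginated endpoint:
  # returns {items_of_the_page, total_count}.
  def fake_search(from, total) do
    last = min(from + @page_size, total)
    items = if from < total, do: Enum.to_list(from..(last - 1)), else: []
    {items, total}
  end

  # Lazily emits every item, one page at a time.
  def all_items(total \\ 60) do
    Stream.resource(
      # start at page 0
      fn -> 0 end,
      # fetch one page; halt once the offset reaches the total
      fn page ->
        from = page * @page_size
        {items, reported_total} = fake_search(from, total)

        if from >= reported_total do
          {:halt, page}
        else
          {items, page + 1}
        end
      end,
      # nothing to clean up
      fn _page -> :ok end
    )
  end
end

PaginationSketch.all_items() |> Enum.count()
# => 60
```

Because the stream is lazy, a page is only requested when a downstream step asks for more elements.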
We then use Task.async_stream to query the second endpoint concurrently. It returns statistics for a given package, from which we retrieve the download count over a given period.
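Here is a minimal sketch of that concurrent step, with a hypothetical fake_downloads function standing in for the HTTP call. The options shown (max_concurrency, timeout, on_timeout) are illustrative values, not the ones used in the original code, which relies on the defaults.

```elixir
defmodule AsyncSketch do
  # Hypothetical stand-in for the per-package statistics request.
  def fake_downloads(name) do
    Process.sleep(10)
    %{"package" => name, "downloads" => String.length(name)}
  end

  def stats(names) do
    names
    |> Task.async_stream(&fake_downloads/1,
      max_concurrency: 10,
      timeout: 5_000,
      # emit {:exit, :timeout} instead of crashing the caller
      on_timeout: :kill_task
    )
    |> Enum.map(fn
      {:ok, result} -> result
      {:exit, reason} -> %{"error" => reason}
    end)
  end
end

AsyncSketch.stats(["a", "bb"])
# => [%{"downloads" => 1, "package" => "a"}, %{"downloads" => 2, "package" => "bb"}]
```

Results come back in the input order by default, each wrapped in an {:ok, value} tuple, which is why the pipeline needs an unwrapping step.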
We then use Enum.sort_by to order the list of maps on a given key.
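A small sketch of the sorting step on made-up data (the package names and counts below are invented for illustration):

```elixir
defmodule SortSketch do
  # Orders the maps by their "downloads" value, largest first.
  def by_downloads(stats) do
    Enum.sort_by(stats, &Map.get(&1, "downloads"), :desc)
  end
end

SortSketch.by_downloads([
  %{"package" => "a", "downloads" => 10},
  %{"package" => "b", "downloads" => 300},
  %{"package" => "c", "downloads" => 42}
])
# => the maps for "b", "c", "a", in that order
```

One caveat: if a map has no "downloads" key, Map.get returns nil, and since atoms sort after numbers in Erlang term ordering, such entries would come first in descending order.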
With the result, we run a Task (a concurrent process) to prettify the result and save it to a file.
We use the function tap, which runs a side effect and passes the data through unchanged, since we need the data intact for the final step. This lets us keep the pipeline flowing.
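The combination of tap and a Task can be sketched as follows. The side_effect argument is a hypothetical placeholder for the "encode and write to file" step; here it is any one-argument function.

```elixir
defmodule TapSketch do
  def run(list, side_effect) do
    list
    |> Enum.sort(:desc)
    # fire-and-forget: the Task runs concurrently, tap returns `data` unchanged
    |> tap(fn data -> Task.start(fn -> side_effect.(data) end) end)
    |> Enum.map(&(&1 * 2))
  end
end

TapSketch.run([1, 3, 2], fn data -> IO.inspect(data, label: "saved") end)
# => [6, 4, 2]
```

Since Task.start does not link or await the process, a failure inside the side effect does not affect the value returned by the pipeline.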
Finally, we reshape the data into a list of maps %{"package_name" => downloaded_count}.
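As an illustration of this last reshaping step, here is a minimal sketch on invented data. It assumes each statistics map carries both a "package" and a "downloads" key, as the responses used in the code below do.

```elixir
defmodule ShapeSketch do
  # Turns each statistics map into a single-entry map: name => count.
  def to_pairs(stats) do
    Enum.map(stats, fn %{"package" => name, "downloads" => count} ->
      %{name => count}
    end)
  end
end

ShapeSketch.to_pairs([
  %{"package" => "a", "downloads" => 10, "start" => "2022-01-01"},
  %{"package" => "b", "downloads" => 300, "start" => "2022-01-01"}
])
# => [%{"a" => 10}, %{"b" => 300}]
```

A map missing one of the two keys would raise a FunctionClauseError here, which is why the full pipeline is wrapped in try/rescue.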
Elixir can nicely chain all these streams.
defmodule Npm do
  require Logger

  @registry "https://api.npmjs.org"
  @search_point "https://api.npms.io/v2/search"
  @starting "2022-01-01"
  @ending "2023-01-01"
  @search "@aws-sdk/client"
  @aws_npm_packages "aws-npm-packages.json"
  def find(save? \\ false, string \\ @search, starting \\ @starting, ending \\ @ending) do
    # Task.async_stream wraps each result as {:ok, result} (or {:exit, reason})
    check_response = fn response ->
      case response do
        {:ok, result} ->
          result

        {:exit, reason} ->
          {:error, reason}
      end
    end

    # the optional "save to file", run in a separate process
    save_to_file = fn list ->
      Logger.info(%{length: length(list)})

      Task.start(fn ->
        case Poison.encode(list, %{pretty: true, indent: 2}) do
          {:ok, result} -> File.write!(Path.join("..", @aws_npm_packages), result)
          {:error, reason} -> reason
        end
      end)
    end

    # the iterating function in Stream.resource: one page of 25 results per call
    next = fn {data, page} ->
      {response, total} = search(string, 25 * page)

      if page * 25 >= total do
        {:halt, data}
      else
        {response, {data, page + 1}}
      end
    end

    try do
      Stream.resource(fn -> {[], 0} end, next, fn _ -> nil end)
      |> Task.async_stream(&downloaded(&1, starting, ending))
      |> Stream.map(check_response)
      |> Enum.sort_by(&Map.get(&1, "downloads"), :desc)
      |> tap(fn data -> if save?, do: save_to_file.(data) end)
      |> Enum.map(fn %{"downloads" => d, "package" => name} -> %{name => d} end)
    rescue
      e ->
        Logger.warning(Exception.message(e))
    end
  end

  # returns a tuple {stream_of_package_names, total}
  # Back.Finch is the name under which the Finch pool must have been started
  def search(string, from \\ 0) do
    with {:ok, %{body: body}} <-
           Finch.build(:get, @search_point <> "?q=#{string}&size=25&from=#{from}")
           |> Finch.request(Back.Finch),
         {:ok, %{"results" => results, "total" => total}} <- Jason.decode(body) do
      {
        results
        # keep unflagged packages whose name contains the search string
        |> Stream.filter(fn package ->
          name = get_in(package, ["package", "name"])
          not Map.has_key?(package, "flags") and String.contains?(name, string)
        end)
        |> Stream.map(&get_in(&1, ["package", "name"])),
        total
      }
    else
      {:error, reason} ->
        reason
    end
  end

  # the second endpoint: download statistics of a package over a period
  def downloaded(package_name, start, ending) do
    path = "#{@registry}/downloads/point/#{start}:#{ending}/#{package_name}"

    with {:ok, %{body: result}} <-
           Finch.build(:get, path) |> Finch.request(Back.Finch),
         {:ok, response} <- Jason.decode(result) do
      response
    else
      {:error, reason} -> reason
    end
  end
end
The usage is:
iex> Npm.find(false, "@aws-sdk/client", "2022-01-01", "2022-03-01")
The same search can be run with @google-cloud/.
If you have Livebook installed, you can open the notebook linked below in a session.
If not, you can easily run a Livebook from Docker. Run the image:
docker run -p 8080:8080 -p 8081:8081 --pull always -e LIVEBOOK_PASSWORD="securesecret" livebook/livebook
and then from another terminal, execute:
open "http://localhost:8080/import?url=https://github.com/ndrean/gruland/blob/main/livebook.livemd"