Collecting dependency information for Telescope 2.6

#osd700 #opensource #telescope

A few days ago, I started working on a new feature for Telescope developers: a dependency graph database for Telescope.

The main question is: why? Why collect dependency information about Telescope at all? There are three related reasons:

To visualize the dependencies that Telescope uses, along with related information such as their GitHub repositories.
To aggregate issues from those GitHub repositories, so that Telescope maintainers can contribute to other projects' communities, which also helps Telescope in the long term.
To promote a healthy open source community, where we not only use other people's projects but also give back by contributing to them. Only that way can we actually foster a healthier open source community.

Current Progress

Although I have started working on it, I haven't finished writing the MVP yet.

My plan is to write a microservice that provides the dependency information to any client. An interesting property of this project is that most of the data the microservice will provide is static. When the microservice starts, it will collect the dependency information from the lockfile that pnpm generates, pnpm-lock.yaml. The lockfile contains all of the dependencies that pnpm found across the package.json files in the workspaces.

The pnpm-lock file records which dependencies are being used, but it does not provide much metadata beyond the version used. So, for example, the GitHub URL has to be extracted from somewhere else.
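To make that concrete, here is a minimal TypeScript sketch of how a package key from the lockfile could be split into a name and a version, which is about all the key gives us. The key formats it accepts (`/name/version` in older lockfiles, `name@version` with an optional leading slash and peer suffix in newer ones) are my understanding of pnpm's lockfile versions, not something I have verified against every release, and `LockedDependency` and `parseLockKey` are hypothetical names.

```typescript
// Hypothetical shape for one dependency read from the lockfile.
interface LockedDependency {
  name: string;
  version: string;
}

// Assumed newer key style: "name@version", optionally with a leading slash
// and a parenthesized peer suffix, e.g. "/foo@1.0.0(react@18.2.0)".
const NEWER_KEY = /^\/?((?:@[^\/@]+\/)?[^\/@]+)@([^(@\/]+)(?:\(.*\))?$/;

// Assumed older key style: "/name/version", optionally with an underscore
// peer suffix, e.g. "/foo/1.0.0_react@18.2.0".
const OLDER_KEY = /^\/((?:@[^\/@]+\/)?[^\/@]+)\/([^_\/@]+)(?:_.*)?$/;

// Returns the name and version encoded in a lockfile key, or null when the
// key matches neither assumed style (for example a "link:" entry).
function parseLockKey(key: string): LockedDependency | null {
  const match = NEWER_KEY.exec(key) ?? OLDER_KEY.exec(key);
  if (match === null) return null;
  const [, name, version] = match;
  if (name === undefined || version === undefined) return null;
  return { name, version };
}
```

Reading the YAML itself would still need a YAML parser, which this sketch leaves out; it only covers the keys once they have been read.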

Another limitation is that this method only includes npm packages listed in the local package.json files in Telescope. That covers most of Telescope's dependencies, since Telescope is mostly a JavaScript project, but it leaves out others such as pnpm, Docker and Docker images, Node.js, and git. Scraping this information automatically may be a more difficult task, so we would have to support manually written files that provide it.
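As a rough idea of what such a manually written file could look like, the sketch below assumes a JSON array of entries with a `name` and an optional `repository`. Nothing about this format is decided yet; `ManualDependency` and `parseManualDependencies` are hypothetical names used only for illustration.

```typescript
// Hypothetical record for a dependency that no package.json lists.
interface ManualDependency {
  name: string;
  repository?: string;
}

// Parses and validates the text of a manually written JSON file.
// Throws when the file is not an array or when an entry is malformed.
function parseManualDependencies(jsonText: string): ManualDependency[] {
  const parsed: unknown = JSON.parse(jsonText);
  if (!Array.isArray(parsed)) {
    throw new Error("expected a JSON array of dependencies");
  }
  return parsed.map((entry: unknown, index: number): ManualDependency => {
    if (typeof entry !== "object" || entry === null) {
      throw new Error(`entry ${index} is not an object`);
    }
    const record = entry as Record<string, unknown>;
    const name = record["name"];
    if (typeof name !== "string" || name.length === 0) {
      throw new Error(`entry ${index} is missing a name`);
    }
    const repository = record["repository"];
    if (repository !== undefined && typeof repository !== "string") {
      throw new Error(`entry ${index} has a non-string repository`);
    }
    return repository === undefined ? { name } : { name, repository };
  });
}

// Example file content covering some of the non-npm dependencies above.
const exampleFile = `[
  { "name": "pnpm", "repository": "https://github.com/pnpm/pnpm" },
  { "name": "docker" },
  { "name": "git" }
]`;
const manualDependencies = parseManualDependencies(exampleFile);
```

Validating at load time means a typo in the hand-written file fails when the service starts rather than showing up later as a broken entry in the graph.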

After collecting the dependencies from the pnpm-lock file, I would extract more information by querying the npm registry. The information I am interested in collecting includes the GitHub repository link and the description of the package.
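Here is a minimal sketch of that lookup, assuming the package document served at `https://registry.npmjs.org/<name>` carries `description` and `repository` fields, which is how I understand the registry to work. The names `RegistryDocument`, `PackageMetadata`, `toGitHubUrl`, `fetchMetadata` and the injected `FetchJson` function are all hypothetical; the HTTP call is passed in so the sketch does not depend on a particular client.

```typescript
// Subset of the registry's package document that this sketch relies on.
interface RegistryDocument {
  description?: string;
  repository?: string | { type?: string; url?: string };
}

// Hypothetical shape of what the service would keep for each package.
interface PackageMetadata {
  name: string;
  description: string | null;
  gitHubUrl: string | null;
}

// Hypothetical injected function that performs the HTTP request.
type FetchJson = (url: string) => Promise<unknown>;

// Turns repository values such as "git+https://github.com/user/repo.git"
// or "git@github.com:user/repo.git" into "https://github.com/user/repo".
// Returns null for anything that does not mention github.com.
function toGitHubUrl(repository: RegistryDocument["repository"]): string | null {
  const raw = typeof repository === "string" ? repository : repository?.url;
  if (!raw) return null;
  const match = /github\.com[\/:]([^\/\s]+)\/([^\/\s#]+?)(?:\.git)?(?:[#\/].*)?$/.exec(raw);
  if (match === null) return null;
  const [, owner, repo] = match;
  if (owner === undefined || repo === undefined) return null;
  return `https://github.com/${owner}/${repo}`;
}

// Fetches one package document and keeps only the fields of interest.
async function fetchMetadata(name: string, fetchJson: FetchJson): Promise<PackageMetadata> {
  // Scoped names such as "@babel/core" need the slash escaped in the path.
  const url = `https://registry.npmjs.org/${name.replace("/", "%2F")}`;
  const doc = (await fetchJson(url)) as RegistryDocument;
  return {
    name,
    description: doc.description ?? null,
    gitHubUrl: toGitHubUrl(doc.repository),
  };
}
```

On a recent Node.js version, the injected function could simply be `(url) => fetch(url).then((res) => res.json())`. Packages hosted somewhere other than GitHub end up with a `gitHubUrl` of `null`, so they would need separate handling.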

When all of this data collection is done, it is time to transform it into an object and store it in an in-memory database, which will then be served as the response. The reason I don't use a persistent data store is that I want this information to be generated whenever the service initializes. The idea is that we only want to show this dependency graph when Telescope is released, and on every release some dependencies might change, which makes a persistent store somewhat useless. If the microservice shuts down due to an error, it can easily collect the information again.
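The in-memory store could be as simple as a `Map` that is rebuilt every time the service starts. The sketch below is one possible shape, not the final design: `DependencyInfo`, `buildStore` and `respond` are hypothetical names, and the response format (a status code plus a JSON body) is an assumption since the API has not been discussed yet.

```typescript
// Hypothetical record kept in memory for each dependency.
interface DependencyInfo {
  name: string;
  versions: string[];
}

// Builds the store from scratch; meant to run once at service start-up.
// The same package can appear with several versions, so they are grouped.
function buildStore(
  entries: Array<{ name: string; version: string }>
): Map<string, DependencyInfo> {
  const store = new Map<string, DependencyInfo>();
  for (const { name, version } of entries) {
    const existing = store.get(name);
    if (existing === undefined) {
      store.set(name, { name, versions: [version] });
    } else if (existing.versions.indexOf(version) === -1) {
      existing.versions.push(version);
    }
  }
  return store;
}

// Produces a response for either the full list of names or one dependency.
function respond(
  store: Map<string, DependencyInfo>,
  name?: string
): { status: number; body: string } {
  if (name === undefined) {
    const names = Array.from(store.keys()).sort();
    return { status: 200, body: JSON.stringify(names) };
  }
  const info = store.get(name);
  if (info === undefined) {
    return { status: 404, body: JSON.stringify({ error: "unknown dependency" }) };
  }
  return { status: 200, body: JSON.stringify(info) };
}
```

Because `buildStore` takes plain data and returns a fresh `Map`, recovering from a crash is just a matter of running the collection steps and calling it again.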

Why not collect the information and cache it? Since this is an MVP, I am not interested in making it extremely efficient. When we discuss the API of the microservice, for example, we will want to think about improving the data collection.

However, that's going to be an issue for the future! For now, we want to focus on the feature itself and leave the nice-to-haves for later.