
OSS Insight

Love, Code, and Robot — Explore robots in the world of code

On GitHub, we often come across "users" that are not people at all. They are always enthusiastic and active, give timely feedback to project maintainers and contributors, and help developers with tasks that can be automated. Yes, what I want to discuss here is GitHub bots.


In the OSSInsight project, we have developed a number of metrics that provide insight into open source projects. When developing these metrics, we usually exclude bot-generated actions and events from the calculation.
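As a rough illustration of the bot-exclusion step, here is a minimal Python sketch. It assumes events shaped like GH Archive records (with an `actor.login` field) and uses a simple login-name heuristic of our own, not OSSInsight's actual rule: GitHub App accounts end in `[bot]`, and many older bots are ordinary accounts whose login ends in `bot`. The names `is_bot` and `human_events` are hypothetical.

```python
import re

# Hypothetical heuristic: GitHub App accounts end in "[bot]"; many older bots
# are regular user accounts whose login simply ends in "bot" (e.g. ti-chi-bot).
BOT_LOGIN = re.compile(r"(\[bot\]|bot)$", re.IGNORECASE)


def is_bot(login: str) -> bool:
    """Return True if the login looks like a bot account."""
    return BOT_LOGIN.search(login) is not None


def human_events(events: list[dict]) -> list[dict]:
    """Keep only events whose actor login does not look like a bot."""
    return [e for e in events if not is_bot(e["actor"]["login"])]


# Made-up sample events for illustration.
events = [
    {"type": "PushEvent", "actor": {"login": "dependabot[bot]"}},
    {"type": "IssuesEvent", "actor": {"login": "k8s-ci-bot"}},
    {"type": "PullRequestEvent", "actor": {"login": "octocat"}},
]
print([e["actor"]["login"] for e in human_events(events)])  # ['octocat']
```

A name-only rule like this misclassifies people whose login happens to end in "bot" and misses bots with ordinary names (such as mhutchinson-witness, mentioned later), so in practice it would need a curated allow/deny list on top.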

However, we can't ignore the contribution of bots to open source, and it's worth shifting our perspective to look at what bots are actually doing on GitHub.

GitHub bots help developers with a lot of work:

  • Issue triage and management (for example, stale[bot] and todo[bot]).
  • Code review, security auditing, and quality inspection (for example, snyk-bot).
  • Compliance checks, such as ensuring contributors have signed a license agreement or that commit messages follow semantic conventions (for example, CLAassistant).
  • Integration with third-party systems, including Jira, Slack, Jenkins, and so on.
  • Acting as agents that perform operations requiring repository permissions on behalf of contributors (for example, k8s-ci-bot and ti-chi-bot).

Historical trends

Looking at the historical data, we see that the number of GitHub bots has grown significantly faster since 2019 (on average, 20,000 new bots are created each year).

Looking into what happened that year, I found that GitHub invested heavily in its software development infrastructure, including bots:

  • On May 23, 2019, GitHub announced its acquisition of Dependabot (a.k.a. dependabot[bot]).
  • On June 17, 2019, GitHub announced its acquisition of Pull Panda.
  • On September 18, 2019, GitHub announced its acquisition of Semmle (the team that built lgtm-com[bot]).

That year, we human beings were amazed to discover that bots could find problems, submit PRs, wait for CI to test the code, merge changes, and comment on PRs on their own, without any human involvement.

By a rough count, there are currently more than 95,620 bots on GitHub. That number may not seem large, but wait...

These roughly 95,000 bot accounts have generated 603 million events, accounting for 12.82% of all public events on GitHub.
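Working backwards from these reported figures gives a sense of scale. Both inputs are rounded, so the results are only approximate: the implied total number of public GitHub events, and the average number of events per bot account.

```latex
\[
N_{\text{all}} \approx \frac{603 \times 10^{6}}{0.1282} \approx 4.70 \times 10^{9}
\qquad
\frac{603 \times 10^{6}}{95{,}620} \approx 6{,}306 \text{ events per bot}
\]
```

In other words, the average bot account has generated roughly 6,300 events.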

And these GitHub robots have served over 18 million open source repositories.

Case studies


dependabot[bot] is a hard-working bot responsible for helping open source projects keep their dependencies up to date.

By analyzing the times of dependabot's push commits, we found that it likes to start its busy week at 8:00 on Mondays (GMT).
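A minimal Python sketch of this kind of analysis: bucket push timestamps by UTC weekday and hour, then take the busiest bucket. It assumes ISO-8601 `created_at` strings like those in GH Archive records; the function name `busiest_slot` is hypothetical, and the sample timestamps are made up for illustration.

```python
from collections import Counter
from datetime import datetime, timezone

WEEKDAYS = ("Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday")


def busiest_slot(timestamps: list[str]) -> tuple[tuple[str, int], int]:
    """Return ((weekday, hour), count) for the most frequent UTC time slot."""
    slots: Counter = Counter()
    for ts in timestamps:
        # datetime.fromisoformat() accepts a trailing "Z" only from Python 3.11,
        # so rewrite it as an explicit UTC offset first.
        dt = datetime.fromisoformat(ts.replace("Z", "+00:00")).astimezone(timezone.utc)
        slots[(WEEKDAYS[dt.weekday()], dt.hour)] += 1
    return slots.most_common(1)[0]


# Made-up sample timestamps, not real dependabot data.
sample = [
    "2022-01-03T08:00:12Z",  # Monday
    "2022-01-10T08:41:05Z",  # Monday
    "2022-01-05T14:20:00Z",  # Wednesday
]
print(busiest_slot(sample))  # (('Monday', 8), 2)
```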

Commendably, after a series of Log4j security vulnerabilities came to light, it promptly helped many Java repositories update the dependency to a secure version.

Stale Bots

Stale bots are a controversial class of bots. They remind maintainers to follow up on issues that have been inactive for a long time and, depending on how they are configured, may also close those issues.

Bad practice
A Gatsby user:

I used to open GitHub issues to Gatsby to report bugs. Almost nothing was ever fixed and every few weeks I had to manually clickety-click to keep the issues alive because of the stale bot. Guess what I do now? I don't report bugs to Gatsby, and I recommend against using Gatsby in newer projects.

Best practices
A NixOS user:

IMO NixOS has the right stalebot settings. It was discussed thoroughly in the RFC, as to choose the right information text and other actions by the bot. For example, the bot will only mark the issue/PR as stale and will never close the issue or lock it. Issues are only ever closed by humans. The information text they came up with is quite a bit longer than the ansible one. I think this is a very important point when adding such a bot, otherwise the user will be left helpless.

To verify these claims, we queried the GitHub event data with SQL.

The query results show that many issues in the gatsbyjs/gatsby repository were closed by stale bots.
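As a rough equivalent of such a query, here is a Python sketch over GH Archive-style event records (`type`, `repo.name`, `actor.login`, `payload.action`): among `IssuesEvent`s with action `closed` in one repository, count who did the closing and what share came from `[bot]` accounts. The function names are hypothetical and the sample events are made up, not real Gatsby data.

```python
from collections import Counter


def issue_closers(events: list[dict], repo: str) -> Counter:
    """Count closed-issue events in `repo`, keyed by the actor who closed them."""
    counts: Counter = Counter()
    for e in events:
        if (e["type"] == "IssuesEvent"
                and e["repo"]["name"] == repo
                and e["payload"].get("action") == "closed"):
            counts[e["actor"]["login"]] += 1
    return counts


def bot_share(counts: Counter) -> float:
    """Fraction of closures made by GitHub App bots (logins ending in "[bot]")."""
    total = sum(counts.values())
    bots = sum(n for login, n in counts.items() if login.endswith("[bot]"))
    return bots / total if total else 0.0


def issue_event(repo: str, login: str, action: str) -> dict:
    """Build a minimal GH Archive-style IssuesEvent record (hypothetical helper)."""
    return {"type": "IssuesEvent", "repo": {"name": repo},
            "actor": {"login": login}, "payload": {"action": action}}


# Made-up sample events, not real data.
events = [
    issue_event("gatsbyjs/gatsby", "stale[bot]", "closed"),
    issue_event("gatsbyjs/gatsby", "stale[bot]", "closed"),
    issue_event("gatsbyjs/gatsby", "alice", "closed"),
    issue_event("gatsbyjs/gatsby", "bob", "opened"),
]
closers = issue_closers(events, "gatsbyjs/gatsby")
print(closers.most_common())         # [('stale[bot]', 2), ('alice', 1)]
print(round(bot_share(closers), 2))  # 0.67
```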

I think it is necessary to distinguish between what should be done by robots and what must be done with human involvement.
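One way to encode that split is to let the bot only mark issues and leave closing to humans, as in the NixOS setup quoted above. The Python sketch below is a simplified, hypothetical policy model: the `StalePolicy` class and `stale_action` function are made up for illustration, and real stale bots also apply labels, post comments, and track when an issue was marked. To our knowledge, the analogous switch in the actions/stale GitHub Action is setting `days-before-close` to `-1`.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class StalePolicy:
    days_until_stale: int = 60
    # None = never close automatically; only humans close issues.
    days_until_close: Optional[int] = None


def stale_action(policy: StalePolicy, last_activity: datetime, now: datetime) -> str:
    """Decide what a stale bot should do with one issue: 'none', 'mark_stale' or 'close'."""
    idle = now - last_activity
    stale_after = timedelta(days=policy.days_until_stale)
    if idle < stale_after:
        return "none"
    if policy.days_until_close is not None:
        if idle >= stale_after + timedelta(days=policy.days_until_close):
            return "close"
    return "mark_stale"


now = datetime(2022, 6, 1)
old_issue = now - timedelta(days=400)
print(stale_action(StalePolicy(), old_issue, now))                    # mark_stale
print(stale_action(StalePolicy(days_until_close=7), old_issue, now))  # close
```

With the default policy, even a long-abandoned issue is only ever marked, so the final decision stays with a human.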

Weird bots

Some weird bots on GitHub don't help people work or learn; instead, they act as data movers.

  • Some use GitHub as a free place to archive their data, for example, speedtracker-bot and newstools.
  • Some periodically commit a timestamp to a repository, for example, keihin00174.
  • Some are even crazier: you can't even open their profile pages, because they have generated so many events that GitHub's database can't process them quickly, for example, mhutchinson-witness and direwolf-github.
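Accounts like these can be surfaced by volume rather than by name. A minimal sketch, assuming GH Archive-style records with `actor.login` and an ISO-8601 UTC `created_at`: count events per account per day and flag accounts whose busiest day exceeds a threshold. The 1,000-events-per-day default, the function name, and the sample account are arbitrary, hypothetical choices.

```python
from collections import Counter


def high_volume_actors(events: list[dict], per_day_threshold: int = 1000) -> list[tuple[str, int]]:
    """Return (login, peak events in one UTC day) for accounts above the threshold, busiest first."""
    # created_at[:10] is the "YYYY-MM-DD" date part of an ISO-8601 UTC timestamp.
    per_day = Counter((e["actor"]["login"], e["created_at"][:10]) for e in events)
    peaks: dict[str, int] = {}
    for (login, _day), n in per_day.items():
        peaks[login] = max(peaks.get(login, 0), n)
    flagged = [(login, n) for login, n in peaks.items() if n > per_day_threshold]
    return sorted(flagged, key=lambda item: item[1], reverse=True)


# Made-up sample events for a hypothetical account.
events = [{"actor": {"login": "example-witness"},
           "created_at": f"2022-05-01T00:00:{s:02d}Z"} for s in range(5)]
events.append({"actor": {"login": "alice"}, "created_at": "2022-05-01T10:00:00Z"})
print(high_volume_actors(events, per_day_threshold=3))  # [('example-witness', 5)]
```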

_For more content and the specific SQL queries, visit the official OSS Insight website._
