On GitHub, we often come across accounts that are not real people yet are always enthusiastic and active: they give timely feedback to project maintainers and contributors, and they help developers with tasks that can be automated. Yes, what I want to discuss next is GitHub bots.
In the OSSInsight project, we have developed a number of metrics to provide insight into open source projects. When developing these metrics, we always consider excluding bot-generated actions or events from the calculation.
However, we can't ignore the contribution of bots to open source, and it's important to shift our thinking and look at what bots are actually doing on GitHub.
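As a rough illustration of the exclusion step mentioned above, here is a minimal sketch that drops bot actors before counting events per repository. The event fields (`actor_login`, `repo_name`), the helper names, and the allowlist are assumptions made for this example, not OSSInsight's actual implementation. Checking for the `[bot]` suffix alone would miss bots such as snyk-bot, so the sketch also consults a small hand-maintained list.

```python
from collections import Counter

# Hypothetical allowlist: bots whose logins do not carry the "[bot]" suffix.
KNOWN_BOT_LOGINS = {"snyk-bot", "k8s-ci-bot", "ti-chi-bot", "claassistant"}


def is_bot(login: str) -> bool:
    """Heuristic: GitHub App bots end in "[bot]"; others come from the allowlist."""
    normalized = login.lower()
    return normalized.endswith("[bot]") or normalized in KNOWN_BOT_LOGINS


def count_human_events(events):
    """Count events per repository, skipping events whose actor looks like a bot."""
    counts = Counter()
    for event in events:
        if not is_bot(event["actor_login"]):
            counts[event["repo_name"]] += 1
    return counts


# Made-up sample data, for illustration only.
sample_events = [
    {"actor_login": "dependabot[bot]", "repo_name": "example/repo"},
    {"actor_login": "alice", "repo_name": "example/repo"},
    {"actor_login": "snyk-bot", "repo_name": "example/repo"},
    {"actor_login": "bob", "repo_name": "example/other"},
]

human_counts = count_human_events(sample_events)
print(dict(human_counts))
```

A heuristic like this will always miss some bots and may misclassify some humans, so the allowlist needs regular review.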
GitHub bots help developers do a lot of work:
- Issue triage and management (for example, stale[bot] and todo[bot]).
- Code review, security audits, and quality inspection (for example, snyk-bot).
- Format and compliance checks, such as ensuring that license agreements are signed or that commit messages follow semantic conventions (for example, CLAassistant).
- Integration with third-party systems, including Jira, Slack, Jenkins, and so on.
- Acting as an agent that helps contributors perform operations requiring permissions on the repository (for example, k8s-ci-bot and ti-chi-bot).
I looked into what happened in 2019 and found that GitHub invested heavily in its software development infrastructure (including bots) that year.
- On May 23, 2019, GitHub announced that it had acquired Dependabot (a.k.a. dependabot[bot]).
- On June 17, 2019, GitHub announced that it had acquired Pull Panda.
- On September 18, 2019, GitHub announced that it had acquired Semmle (the team that built lgtm-com[bot]).
A rough count shows that there are currently more than 95,620 bots on GitHub. That number may not seem like much, but wait...
These 95 thousand bot accounts generated 603 million events, which account for 12.82% of all public events on GitHub.
And these bots have served over 18 million open source repositories.
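To put these figures in perspective, the short calculation below derives two numbers from them: the average number of events per bot, and the total number of public events implied by the 12.82% share. The inputs are the rounded figures quoted above, so the results are only approximate: roughly 6,300 events per bot and about 4.7 billion public events in total.

```python
# Figures quoted in the text (rounded, so results are approximate).
BOT_ACCOUNTS = 95_620
BOT_EVENTS = 603_000_000
BOT_SHARE = 0.1282  # 12.82% of all public events

# Average number of events generated by a single bot account.
events_per_bot = BOT_EVENTS / BOT_ACCOUNTS

# Total public events implied by the bot share: bot_events / share.
implied_total_events = BOT_EVENTS / BOT_SHARE

print(f"Average events per bot: {events_per_bot:,.0f}")
print(f"Implied total public events: {implied_total_events / 1e9:.2f} billion")
```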
dependabot[bot] is a hard-working bot responsible for helping open source projects keep their dependencies up to date.
Commendably, after a series of log4j security vulnerabilities came to light, it helped many Java repositories update the dependency to a secure version in a timely manner.
Stale bots are a controversial class of bots. They are responsible for reminding maintainers to follow up on issues that have been inactive for a long time.
A Gatsby user:
I used to open GitHub issues to Gatsby to report bugs. Almost nothing was ever fixed and every few weeks I had to manually clickety-click to keep the issues alive because of the stale bot. Guess what I do now? I don't report bugs to Gatsby, and I recommend against using Gatsby in newer projects.
A NixOS user:
IMO NixOS has the right stalebot settings [0]. It was discussed thoroughly in the RFC, as to choose the right information text and other actions by the bot. For example, the bot will only mark the issue/PR as stale and will never close the issue or lock it. Issues are only ever closed by humans. The information text they came up with is quite a bit longer than the ansible one [1]. I think this is a very important point when adding such a bot, otherwise the user will be left helpless.
I think it is necessary to distinguish between what should be done by robots and what must be done with human involvement.
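The "mark only, never close" policy described in the NixOS comment can be sketched as a small decision function. This is a hypothetical illustration of the idea, not the configuration of any real stale bot: the 180-day threshold, the function name, and the action string are all assumptions.

```python
from datetime import datetime, timedelta
from typing import Optional

# Assumed inactivity threshold; real projects choose their own value.
STALE_AFTER = timedelta(days=180)


def stale_action(last_activity: datetime, now: datetime, already_stale: bool) -> Optional[str]:
    """Return the action the bot may take, or None if it should do nothing.

    The bot only ever labels and comments. It never closes or locks an
    issue, so that decision stays with human maintainers.
    """
    if already_stale:
        return None  # already marked: the next step belongs to a human
    if now - last_activity >= STALE_AFTER:
        return "label_stale_and_comment"
    return None


now = datetime(2022, 6, 1)
print(stale_action(datetime(2021, 1, 1), now, already_stale=False))
print(stale_action(datetime(2022, 5, 1), now, already_stale=False))
```

The point of the design is that the function has no "close" branch at all, so the boundary between bot work and human judgment is enforced by the code itself.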
- Some bots use GitHub as a free place to archive their data, for example, speedtracker-bot and newstools.
- Some periodically upload a timestamp to a code repository as a commit, for example, keihin00174.
- Some are even crazier: you can't even open their profile pages, because they have generated so many events that GitHub's database can't process them quickly. Examples include mhutchinson-witness and direwolf-github.
_For more details and the specific SQL queries, visit the official website._