DEV Community

Rajko Radovanovic

Top Github repo trends in 2021

Top six Github Repo trends in 2021

Motivation, methodology, and top trending areas

Motivation: Github is the largest source code host in the world with more than 254M repositories (at least 52M public repos in 2020, per x-lab report). Beyond source code, Github has become a broader community hub for over 73 million developers worldwide (16M new in 2021).

Incredible growth of event logs, active repositories, and active developer accounts on GitHub from 2015 to 2020 ([Github Insight Report 2020](http://oss.x-lab.info/github-insight-report-2020-en.pdf) — Xlab)

Methodology: I downloaded data on 247 of the top trending repositories on Github through December 27th by net star growth and methodically categorized each repository: first by whether it was a technology repo or educational repo, then by subcategories.
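The ranking and categorization steps described above can be sketched roughly as follows. This is only an illustration, not the actual script used for this post: the `Repo` fields, the sample repository names, and the star counts are all made up for the example.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Repo:
    name: str
    stars_start: int   # stars at the start of the period (hypothetical field)
    stars_end: int     # stars at the end of the period (hypothetical field)
    kind: str          # "technology" or "educational"
    subcategory: str   # assigned by hand after reading the repo

    @property
    def net_star_growth(self) -> int:
        return self.stars_end - self.stars_start


def top_trending(repos, n):
    """Return the n repos with the largest net star growth."""
    return sorted(repos, key=lambda r: r.net_star_growth, reverse=True)[:n]


def tally(repos):
    """Count repos per (kind, subcategory) pair."""
    return Counter((r.kind, r.subcategory) for r in repos)


# Made-up sample data, for illustration only.
SAMPLE = [
    Repo("example/learn-to-code", 1000, 9000, "educational", "courses"),
    Repo("example/ui-kit", 5000, 9500, "technology", "web development"),
    Repo("example/shell", 200, 7200, "technology", "developer tools"),
    Repo("example/interview-prep", 3000, 4000, "educational", "interview prep"),
]

top = top_trending(SAMPLE, 3)
print([r.name for r in top])
print(tally(top))
```

Ranking by net growth rather than total stars is what lets a young repository outrank a much larger, older one.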

Six clear trending areas emerged, each explored in depth in the sections below:

  1. Educational resources (~70/250): patterns, interview prep, build lists…

  2. Web development technologies (~43/250): frameworks on frameworks

  3. Developer tools (~35/250): IDEs, terminals, and really random stuff

  4. Data, AI, ML (~21/250): mostly deep learning

  5. New languages on the block: TypeScript, Go and Rust

  6. MISC: Microsoft, Dev Counterculture, and Surveillance

The full list of repositories can be downloaded here (including descriptions and total contributor, fork, issue, and pull request counts for each repo), and I include a summary table below.

Area #1: Educational Resources

While Github was designed to store code, it has become a major hub for crowdsourcing knowledge and educational resources. In fact, only 3/10 of the top Github repositories today are ‘technologies’ (Vue, React, & Tensorflow), and by far the most popular repository on Github is a free, non-profit, coding camp.

Notable subcategories & examples:

What can we infer about developer culture & demographics?

There is a massive workforce reskilling underway. There are millions of people globally learning to program for the first time, often motivated by economic emancipation. This is powerful, but also driven by profound change in the labor market. The World Economic Forum has estimated technology will eliminate ~85M jobs, while creating ~97M new jobs by 2025 (link). Not everyone will successfully make this transition. We need to remember this, and we need mechanisms to support those who do not.

**Developers ❤️ to learn & build.** One of my favorite things about working in tech is the people that partake in it: nerds who love continuous learning & makers who love to build!

What can founders and startup leaders take away?

Content marketing is king: write good content by making it educational for continuous learners and/or applied for makers that like to build 😃

*Marking “Good First Issues” can be effective:* a couple of the ‘build project’ lists are actually mostly compilations of issues labelled “Good First Issue” in open source repos… Newer developers can hone their skills while becoming contributors to your projects if you guide them toward the easier opportunities. There is actually an entire section in Github’s 2021 Octoverse report on the effects of using this label!

**Give to get:** education can be used to create moats and unlock new TAMs. For example, the dbt Labs community has become the de facto source for an entire generation of data analysts looking to upskill and is helping bring thousands of companies to the Modern Data Stack. They have their own set of online courses. There are even independent programs, such as the Analytics Engineers Club, offering a 10-week, cohort-based program.

Another interesting example is Conduktor, a company that leveraged a massively popular online Udemy course and built a community around an enterprise Kafka platform that simplifies the management of pub/sub systems.

Highlight: The Developer Roadmap

As a fan of great visualizations, I want to highlight one of my favorite educational repositories: the developer roadmap by Kamran Ahmed. Note, there are interactive versions for Frontend, Backend, DevOps and specific languages. They are super cool and can be found at https://roadmap.sh/. They are also useful for folks trying to understand various technology landscapes at a higher level.

Frontend Developer Roadmap by Kamran Ahmed ([link](https://roadmap.sh/frontend))

Area #2: Web Dev

I include the top repos below. *Ant-design* leads the list, though I suspect there is a data quirk at hand.

**Cross platform frameworks are trending:** *Flutter*, Google’s multi-platform framework, continues to grow in popularity. *Tauri*, a framework for the development of desktop applications leveraging front-end technologies, was one of the fastest growing in relative terms on the list. It is still TBD how the evolution of WebAssembly will further enable this space.

There are **two open-source alternatives to Firebase** on the list: Supabase and Appwrite. I will be writing about open-source alternatives in an upcoming post.

Top WebDev frameworks by absolute star growth in 2021

A note on the ephemeral nature of Front-End Web Dev technologies

Web development technologies evolve incredibly quickly, and web developer preferences are extremely ephemeral. See for example the two charts below:

Contributors to top Web Development repositories over time

WebDev — Active qualified contributors to various repositories over time

Front-end frameworks are rather fickle:

  • Gatsby, the later purple peak, skyrockets from June 2017 to June 2020, then declines equally quickly

  • Flutter, yellow, peaks in March 2021, then begins declining abruptly

  • Even Next.js begins slightly declining around March 2021

  • React-native, pink, peaks quickly around 2017, then begins declining

Backend frameworks are more enduring: Rails, Django, and Node in particular

*Contributors to top SRE¹ (think Cloud Native) repositories over time*

SRE — Active qualified contributors to various repositories over time

Notice how much slower the rise and more enduring the core technologies of the SRE¹ stack appear to be, particularly Kubernetes in purple with the largest number of active and qualified contributors of any project.

Note: I will write much more about this methodology, as well as more detailed benchmarks from this perspective, in a future post...

What can founders and startup leaders take away?

**Don’t be a one trick pony, especially not in front-end:** a single trending web dev technology is not likely to sustain a large and enduring company given the speed of change in the space. Latching on to a single core tech has been the downfall of companies even in spaces where dev preferences are much more sustained (think early data infra companies that latched on to Hadoop). Recognizing this, expanding beyond Next.js while it is still wildly popular, and hiring the creator of Svelte, is part of the brilliance of Guillermo Rauch. We may joke about everyone going to work for Vercel, but it is likely a rare viable enduring strategy within the space!

Know where your community lives: **at least on Github, the web developer population is still much larger than the data or other developer populations.** Out of the top ~250 repos in 2021, web development repositories gathered ~600K stars, while Data, AI and ML related repos gathered only ~162K, roughly 3.7x fewer. Note, per the Q4 2018 Slashdata estimates below, Github likely reflects accurately that there are far more web-centric than data-centric developers in the world today.

Q4 2018 Slashdata Developer Population Size Estimates ([link](https://sdata.me/GlobalDevPop19))
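As a quick sanity check on the star comparison above, the ratio can be worked out directly. The two totals are the approximate figures quoted in this post; the variable names are just illustrative.

```python
# Approximate star totals quoted in the text for the top ~250 repos of 2021.
web_dev_stars = 600_000
data_ai_ml_stars = 162_000

ratio = web_dev_stars / data_ai_ml_stars
print(f"Web dev repos gathered about {ratio:.1f}x the stars of Data/AI/ML repos")
```

Since both inputs are rounded estimates, the result is best read as "a bit under 4x" rather than a precise figure.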

I guess there are two forces at play that will change this dynamic: 1) more and more ‘software engineers’ will begin primarily leveraging and interfacing with data infrastructure vs. web frameworks over time, 2) more and more non-technical data practitioners (any business analyst) will become increasingly technical, whether via learning Python or SQL/dbt… Not quite sure either will become hardcore Github users though!

Area #3: Dev Tools

Honestly, I don’t have much to say here… Github is full of quirky and cool developer tools. A bunch of these top repos were shells, terminals, IDEs or Code Editors, perhaps the most important components of developer experience:

  • Microsoft’s VS Code topped the list with 20K stars and is probably one of the best code editors out there today, a difficult truth for many to accept 😅 Ofc, PowerShell is also on the list with ~9K stars in 2021

  • Coder, VS Code in the browser, is one of the fastest growing repos by other activity metrics as well

  • There are also some new shells in the list, including Tabby and Nushell, which both had around 7K stars, though Nushell is a much newer project

There are also some interesting plug-ins and add-ons for the same set of tools:

  • Fig, started in late 2020, gained ~8K stars almost from scratch this year. It adds advanced autocomplete to your terminal, regardless of which one you choose to use

  • Somewhat comically, one of the older and more popular utilities and overall more popular repos on Github (65K stars) is ‘thefuck’, which autocorrects your previous console command

  • Starship, on the other hand, helps customize the prompt of any shell that you may be using

Fig adds VSCode-style autocomplete to your existing terminal ([link](https://fig.io/))

Area #4: Data, AI & ML

No surprises here: deep learning is the most popular subcategory, with Hugging Face’s Transformers repo, YOLOv5, Tensorflow and Deepmind’s Alphafold all in the mix. The only proper infrastructure-ey repos on the list are Meilisearch and Clickhouse, which is a tad surprising given all the hype data infrastructure receives in VC-world, but again, probably just a question of the size of end-user populations + whether data scientists spend tons of time on Github vs. Web Developers…

I have tons more to write on Data and plan on publishing future posts… For now, I will just highlight two more resources that are cool:

  1. The AI Expert Roadmap (interactive web page) seems to have taken inspiration from the developer roadmap linked above and is awesome. I LOVE how they separate out different personas, from data scientist, to machine learning, to deep learning, to data engineering, etc. It’s really well done and fun to browse through! It is also kind of fun to juxtapose this with the aforementioned Developer Roadmap, as well as the Analytics Engineers Club, as they collectively cover so much of modern tech in slightly MECE² (#BCG) ways 😃

Data scientist excerpt from the AI Expert Roadmap by **AMAI GmbH ([link](https://i.am.ai/roadmap/#note))**

  2. The second repo I LOVE is Eugene Yan’s Applied ML repository. This is a brilliant idea and actually something I was planning on sort of casually doing in my non-existent free time… Anyhow, it is a curated list of technical posts from top engineering teams (Netflix, Amazon, Pinterest, LinkedIn, etc.) detailing how they built out different types of AI/ML systems (e.g. forecasting, recommenders, search and ranking, etc.). Ofc, it focuses on AI/ML, but something similar could be made for the traditional or BI-oriented analytics stack, as well as the streaming world, super high value for practitioners! Btw, one of my favorite things at BCG used to be looking at our IT architecture team’s reference architecture diagrams… the best way to understand technologies is to look at how a ton of stuff is architected… and it’s fun!

Area #5: New languages on the block

TypeScript is by far the dominant programming language in the top trending technology repositories in 2021, followed by JavaScript, Python, C++, Go and Rust. Somewhat interestingly, top educational repositories have a very different composition, with JavaScript #1, Python #2, and Java still coming in at #3.

The technology repositories probably reflect what hackers are excited about and the future, while the educational repos reflect which languages are still most popular today. A friend has suggested to me that it also just has to do with which languages have the lowest barrier to entry, which makes sense!

To this end, I compare three more data sources for programming language popularity below. It would seem that looking at top trending Github repos is actually the strongest forward-looking indicator for the future!

  1. Github actually publishes its own ranking of programming languages used across the entire service: you can see how strongly TypeScript trends below, though Rust and Go don’t even make it on that list yet!

Top languages over the years on Github ([link](https://octoverse.github.com/#top-languages-over-the-years))

  2. Perhaps the most general source of all is the TIOBE index, which aggregates searches for different programming languages across search engines. Surprisingly, Java and C are still super dominant on this list, with Python only overtaking them in the last month!

Top programming languages via the TIOBE index ([link](https://www.tiobe.com/tiobe-index/))

  3. Another interesting source is looking at tags on Stack Overflow. Python is really dominant here, but I do think there may be some selection bias with tons of data-oriented folks leveraging the website… However, C# was the most popular on the platform in 2009, and JavaScript still performs pretty well. You can see a nice uptick for TypeScript as well.

Top programming languages via Stack Overflow trends ([link](https://insights.stackoverflow.com/trends?tags=java%2Cc%2Cc%2B%2B%2Cpython%2Cc%23%2Cvb.net%2Cjavascript%2Cassembly%2Cphp%2Cperl%2Cr%2Crust%2Ctypescript))

Area #6: A couple MISC points

a. Microsoft has come a really really long way in terms of open-source, developer friendliness, and its broader company strategy. Senior Microsoft strategy executives once compared open-source companies to those that caused the dot-com bubble crisis — irresponsible and recklessly giving away services for free. Bill Gates even once claimed open-source is bad for jobs 😂

The significant change came with the appointment of Satya Nadella as CEO in 2014 and the open sourcing of the .NET stack (Wikipedia has a good recap of it all). Today, Microsoft finds itself with 8 of the most popular repos in 2021 on Github:

  • Three educational courses: Web Dev, ML, and IoT for beginners. Note, regarding the use of educational resources as a marketing strategy, at least the ML course links to various Azure services. Google does this a bunch as well, with Colab notebooks often being used to demo educational materials.

  • Dev tools and utilities including VS Code, Microsoft Terminal, and PowerToys

  • Web Dev technologies including the dominant TypeScript language, as well as an increasingly popular alternative to Selenium for browser automated testing: Playwright

b. Hackers embody a healthy dose of counter-culture **(in American terms at least haha)**. Anti-paying for things, anti-advertisements, and strongly pro-labor 💪

  • Bypass paywalls chrome is a plug-in that does exactly what it says: it lets users bypass website paywalls to access content. Note, if you can, please pay for quality journalism.

  • Block the spot helps block advertisements in Spotify. I am not a fan of hidden, advertisement-driven business models. That’s actually a big part of why I like enterprise vs. consumer tech more broadly: much cleaner and more ethical business models

  • 996.ICU is an amazing repository, basically a list of bad tech employers in China (perhaps broader now). It received significant media attention when it started trending in 2019. Their own description below:

    The name 996.ICU refers to "Work by '996', sick in ICU", an ironic saying among Chinese developers, which means that by following the "996" work schedule, you are risking yourself getting into the ICU (Intensive Care Unit)

c. There are some sketchy things trending, and not all tech does good (i.e. a less fun way to end things)

AI/ML is awesome and will bring a ton of good to the world, but there are also serious risks and safety considerations. Enhanced surveillance and State control is certainly one of them, and perhaps one of the ripest use cases for abuse is around facial recognition. One of the top trending repos in 2021 was Tencent’s GFPGAN, which ‘aims at developing Practical Algorithms for Real-world Face Restoration’. Another trending library was DeepFaceLab, for creating deep fakes. Note, famously in 2020, Huawei published about testing software for facial recognition of Uighurs. Earlier that year, IBM announced it would no longer develop facial recognition software. I come from a country where state surveillance is fairly normalized, albeit discreet. I’m talking about the kind of surveillance where journalists have their homes broken into and their messenger texts intercepted, and the secret police tap your cell phone. So when our government bought 1000+ Huawei smart cameras a couple of years back with facial recognition embedded, human rights activists were not thrilled.

btw: there is an excellent compilation of awful use cases of AI in this repository aptly named Awful AI

Another two trending repositories dealt with locating people across social media accounts: project Sherlock and social analyzer. It is kind of sketchy seeing tech like this floating around in the public and easily downloadable domain, and a good reminder of how public our lives are on the internet.

[1] SRE — Site Reliability Engineering, e.g. CNCF

[2] MECE — Mutually Exclusive, Collectively Exhaustive, uber consulting speak (link)

Thank you Nick Joseph and Myric Lehner for reading an advance copy and providing helpful comments!
