1. Copilot Who? Open-Source Autocoders Take Over
These new LLMs built for coding might be the next recipient of the open-source community’s generosity
The LLaMA leak ignited the open-source LLM surge, inspiring others to follow suit. Recently, two coding LLMs emerged, indicating a growing trend. With open-source LLMs rivaling GitHub Copilot's capabilities, developers express mixed emotions.
Just as LLaMA's release spurred a wave of open-source LLMs, the new coding models will likely do the same for auto-coders. BigCode's StarCoder and Replit's Code V1 provide open-source alternatives to the proprietary OpenAI models behind Copilot, encouraging community-driven experimentation and integration.
2. Is open source the future of AI?
Not long ago, a Google researcher declared in a leaked document that neither OpenAI nor Google has a significant competitive advantage in large language models (LLMs), and that the open-source community stands to win as a result. The researcher emphasized the following:
“We shouldn't anticipate catching up with open source. The modern internet thrives on open source due to numerous valid reasons. Open source offers certain undeniable benefits that we are unable to replicate.”
This statement holds true, particularly considering the influence LLaMA has had on the open-source ecosystem. The model has served as the base for numerous innovations, with volunteer AI enthusiasts contributing one improvement after another in pursuit of a superior product.
Although Meta appeared to have the least to gain from the LLaMA leak, the enhancements contributed by community members essentially provided the company with an enormous amount of unpaid labor. This is arguably the most significant value proposition of open source: it enables volunteers to contribute meaningfully to even the grandest projects.
Previously, the primary obstacle for the open-source community was the exorbitant cost of training LLMs, which often ran into the millions of dollars for larger models. Nonetheless, developers managed to bring the cost of fine-tuning a LLaMA-based model down to roughly $300, while also optimizing LLaMA enough to run on a Raspberry Pi.
Prior to LLaMA's introduction, developers who wanted to use auto-coding had limited options among existing LLMs. They could either resort to proprietary, closed-source solutions such as OpenAI's GPT-4, GitHub Copilot, or Tabnine, or invest significantly in fine-tuning an available open-source LLM.
Both of these strategies were ill-suited to realizing the full potential of open source, since they were hindered by stringent licenses and usage conditions. However, models like StarCoder and Code V1 now offer an opportunity for auto-coding-driven projects to thrive in the market.
3. Innovation waiting to happen
Although both Code V1 and StarCoder were introduced under open-source licenses (Creative Commons BY-SA for Code V1 and OpenRAIL-M for StarCoder), they appear to serve distinct purposes within the open-source community. Of the two, StarCoder seems to be explicitly designed for the open-source audience, as both the model and a massive 6.4TB dataset of source code were released simultaneously as open-source resources.
Moreover, StarCoder has demonstrated better overall quality than Replit's Code V1, which appears to have concentrated primarily on being economical to train and run. On the HumanEval benchmark, StarCoder achieved a score of 40.8%, while Code V1 managed only 30.5%. Additionally, StarCoder can do more than merely predict code; it can also assist programmers in reviewing code and resolving issues by utilizing code metadata.
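To make the HumanEval percentages concrete, here is a minimal sketch of how such a score is typically computed. It assumes the figures quoted above are pass@1 scores (the article does not say which k was used) and uses the unbiased pass@k estimator from the paper that introduced HumanEval. The function names and the sample results are hypothetical, chosen only for illustration.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate pass@k for a single problem.

    n: number of code samples generated for the problem
    c: number of those samples that passed the unit tests
    k: number of samples the user is assumed to draw
    """
    if not (0 <= c <= n) or not (1 <= k <= n):
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    # If fewer than k samples fail, any draw of k must contain a passing one.
    if n - c < k:
        return 1.0
    # 1 minus the probability that all k drawn samples are failures.
    return 1.0 - comb(n - c, k) / comb(n, k)


def benchmark_score(results: list[tuple[int, int]], k: int = 1) -> float:
    """Average pass@k over all problems; results holds (n, c) pairs."""
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)


# Hypothetical results for four problems, 10 samples each (not real data).
sample_results = [(10, 5), (10, 0), (10, 10), (10, 3)]
print(f"pass@1 = {benchmark_score(sample_results, k=1):.1%}")
```

Under this reading, a score of 40.8% would mean that a single generated solution passes the benchmark's hidden unit tests for roughly four problems in ten.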
A notable drawback of StarCoder is its substantial hardware requirements: it needs at least 32GB of GPU memory in 16-bit mode. Nevertheless, if LLaMA is any indication, the open-source community may need only a few more weeks to optimize this model so that it runs on mobile devices and laptops.
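The 32GB figure follows from simple arithmetic on the model's size, which the sketch below illustrates. It assumes a parameter count of 15.5 billion, taken from StarCoder's published model card rather than from this article, and it counts only the weights; activations and other runtime buffers add to the total. The helper name is hypothetical.

```python
def weight_memory_gb(num_params: float, bits_per_param: int) -> float:
    """Memory needed to hold the model weights alone, in gigabytes (1e9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9


# Assumption: 15.5B parameters, per the StarCoder model card (not stated in the article).
STARCODER_PARAMS = 15.5e9

for bits in (16, 8, 4):
    gb = weight_memory_gb(STARCODER_PARAMS, bits)
    print(f"{bits:>2}-bit weights: {gb:.2f} GB")
```

At 16 bits the weights alone come to about 31GB, which is consistent with the 32GB requirement quoted above. The same arithmetic suggests why lower-precision quantization, the approach the community applied to LLaMA, could bring the model within reach of consumer hardware.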
In contrast, Replit's Code V1 looks like a valuable addition to the company's existing software ecosystem. Code V1 seems to be a further step in its strategy to democratize access to AI-driven software platforms. In doing so, it targets a broader pool of developers who want autonomy in how they employ AI instead of being confined to specific ecosystems such as GitHub Copilot.
Hacker News user runnerup offered this insight on Code V1: “A solution like GitHub Copilot does not allow me to utilize their AI against my codebase as per my preferences. Therefore, if Replit can explore more creative and groundbreaking methods of incorporating large language models (LLMs), they may not necessarily require the finest quality LLMs to deliver an exceptional user experience.”
It is worth noting that it matters little if these models are currently difficult to run or lacking in accuracy, because the open-source community can address these shortcomings. Through projects such as crowd-sourced RLHF, volunteers working on LLaMA have demonstrated that they can make models both more accessible and more accurate. One thing remains clear: this innovation marks only the beginning for such open-source models.
4. Collaboration and Community Contribution
Open-source autocoders foster collaboration and community contribution. Developers can not only benefit from the existing capabilities of these autocoders but also take an active part in improving them. This collective effort can lead to a faster pace of innovation and to robust coding models that cater to diverse programming needs.
5. Democratizing AI-Powered Coding
The availability of open-source autocoders represents a step towards democratizing AI-powered coding. While proprietary solutions like GitHub Copilot offer powerful capabilities, they may come with limitations such as access restrictions or cost. Open-source alternatives provide a more accessible avenue for developers to leverage AI in their coding workflows, empowering a broader community of programmers to benefit from this technology.
Overall, the rise of open-source autocoders signals an exciting shift in the coding landscape, offering developers more choices, fostering innovation, and promoting collaboration within the developer community.