Lars Gyrup Brink Nielsen for The Transient Thoughts of a Restless Mind

Avoiding accidental open-source laundering with GitHub Copilot

Cover photo by Tina Bosse on Unsplash.

GitHub Copilot is now available to individuals as a paid product. Personally, I will never code without GitHub Copilot again. It's that good.

Training an AI on open-source software

Codex is the AI engine used by GitHub Copilot and it's based on OpenAI GPT-3. Codex is trained on publicly available texts and source code, including but not limited to public GitHub repositories.

The issue lies in training Codex on software projects published under an open-source license. Even permissive open-source licenses require including the project's license in derivative works. Yet according to GitHub, the code generated by GitHub Copilot matches 150 or more characters of publicly available code in 1% of suggestions.
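To make the 150-character figure concrete, here is a minimal Python sketch of what a verbatim-match check could look like. It is an illustration only and not how GitHub detects matches: the names `MATCH_THRESHOLD`, `longest_shared_run`, and `matches_public_code` are hypothetical, and the list of public snippets is assumed to be supplied by the caller.

```python
from difflib import SequenceMatcher
from typing import Iterable

# Assumption: threshold taken from the 150-character figure quoted above.
MATCH_THRESHOLD = 150


def longest_shared_run(suggestion: str, public_code: str) -> int:
    """Return the length of the longest verbatim run found in both strings."""
    # autojunk=False disables difflib's heuristic that ignores frequent characters.
    matcher = SequenceMatcher(None, suggestion, public_code, autojunk=False)
    match = matcher.find_longest_match(0, len(suggestion), 0, len(public_code))
    return match.size


def matches_public_code(suggestion: str, public_snippets: Iterable[str]) -> bool:
    """Flag a suggestion sharing MATCH_THRESHOLD or more characters with any snippet."""
    return any(
        longest_shared_run(suggestion, snippet) >= MATCH_THRESHOLD
        for snippet in public_snippets
    )
```

A check like this only catches character-for-character copies. Renamed variables or reformatted code would slip through, which is one reason a match count alone says little about licensing obligations.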

Legal and ethical issues

The question is: When is it legally or ethically acceptable to include partial or full copies of licensed source code without including its open-source license or, in the case of restrictive (copyleft) licenses, without publishing our own code under the same license?

According to Kate Downing, an IP lawyer specializing in open-source compliance, code generated by GitHub Copilot might legally be considered fair use of open-source code used for training its AI model. It's similar to how Google Books is not infringing on the copyright of the books it cites.

However, suggestions matching publicly available code are not original and are therefore questionable. GitHub leaves it up to us as developers to take full responsibility for code generated by GitHub Copilot, including adhering to licenses as well as verifying its security and other quality aspects.

Avoiding accidental open-source laundering

GitHub Copilot setting to opt out of public code matches

To avoid licensing issues, we can opt out of suggestions matching public code. In the GitHub Copilot settings, select Block for Suggestions matching public code, then press Save.

Top comments (2)

geraldew

Unsurprisingly, GitHub considers itself a good judge of where the boundaries of software freedom infringement lie. Others disagree, so I'd recommend seeing what some of those other parties say.

The Software Freedom Conservancy has recently updated its view, which you can see here

I would merely point out the obvious, that while pre-purchase GitHub was putting out some Open Source software, GitHub was and remains a closed source proprietary product and service.

Lars Gyrup Brink Nielsen

It would be great if GitHub released an open-source core.

GitHub Actions is a rapidly growing ecosystem of open-source automation. I expect we will see more like this from GitHub.