There are many popular license types for open-source software, such as the MIT, BSD, or Apache licenses. When building software privately, the license type rarely matters much. However, things get more complicated when building a commercial product, even if it's open-source. For us as a company, that meant a lot of uncertainty about how to handle these licenses.
In a nutshell, when using a dependency, you'll need to ensure that the dependency allows for commercial use. That's not a problem with the majority of the licenses, but there are some lesser-known ones that could cause some trouble.
For our tool, the Kern AI refinery, we use dozens of different libraries. Checking all the dependencies manually for all the repositories would be an extremely tedious task, to say the least. So, our machine learning engineer Felix thought to himself "why don't I automate this then?". And that's exactly what he did!
Checking Python licenses with LicenseCheck
We have a lot of Python dependencies, so checking these licenses was our biggest priority. When it comes to checking licenses of Python dependencies, we've found a really cool tool called LicenseCheck, which can check the requirements.txt file of a GitHub repository and find the licenses for all the dependencies listed inside the file. LicenseCheck can simply be installed via pip and can then be used to print out all the licenses. This already helps a lot, but when you have 50+ repositories, it's still a lot of manual work.
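To give a rough idea of where this kind of license information lives, here's a small sketch that reads the license of an already installed package from its metadata using only the standard library. This is not how LicenseCheck is implemented; the function names `license_from_metadata` and `installed_license` are our own, made up for illustration.

```python
from email.message import Message
from importlib import metadata


def license_from_metadata(meta: Message) -> str:
    """Prefer 'License ::' Trove classifiers, fall back to the free-text License field."""
    classifiers = meta.get_all("Classifier") or []
    licenses = [c.split(" :: ")[-1] for c in classifiers if c.startswith("License ::")]
    if licenses:
        return "; ".join(licenses)
    return meta.get("License") or "UNKNOWN"


def installed_license(package_name: str) -> str:
    """Look up the license of a package installed in the current environment."""
    try:
        return license_from_metadata(metadata.metadata(package_name))
    except metadata.PackageNotFoundError:
        return "NOT INSTALLED"
```

Calling `installed_license("some-package")` only works for packages that are installed in the environment you run it in, which is one reason a dedicated tool is more convenient.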
Building a Python script
To check all of our repositories, our ML engineer Felix built an amazing Python script that completely automates the license checking of our Python dependencies. You can find the whole script here if you are interested in using it!
How does the script work? In a nutshell, you simply paste in the repositories you want to check by putting the name and the URL of the repo inside of a dictionary. Feel free to select as many repos as you like. The script then loops over all the repositories and checks the requirements.txt file from each repo.
Using the script, you can simply check all the licenses in your repository
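If you just want a feel for the loop described above, here's a minimal sketch. It is not Felix's actual script: the repository in `REPOS` is a placeholder, the helper names are hypothetical, and we assume each repo is on GitHub with a `requirements.txt` at its root on the `main` branch.

```python
import re
from typing import Callable, Dict, List
from urllib.request import urlopen

# Placeholder repository (name -> URL); replace with your own repos.
REPOS: Dict[str, str] = {
    "example-repo": "https://github.com/example-org/example-repo",
}


def raw_requirements_url(repo_url: str, branch: str = "main") -> str:
    """Build the raw file URL, assuming requirements.txt sits at the repo root."""
    owner, repo = repo_url.rstrip("/").split("/")[-2:]
    return f"https://raw.githubusercontent.com/{owner}/{repo}/{branch}/requirements.txt"


def parse_requirements(text: str) -> List[str]:
    """Extract package names, skipping comments, blank lines and pip options."""
    names = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()
        if not line or line.startswith("-"):
            continue
        match = re.match(r"[A-Za-z0-9][A-Za-z0-9._-]*", line)
        if match:
            names.append(match.group(0).lower())
    return names


def fetch_text(url: str) -> str:
    """Download a text file over HTTP."""
    with urlopen(url) as response:
        return response.read().decode("utf-8")


def collect_dependencies(
    repos: Dict[str, str], fetch: Callable[[str], str] = fetch_text
) -> Dict[str, List[str]]:
    """Loop over all repos and return the package names found in each one."""
    return {
        name: parse_requirements(fetch(raw_requirements_url(url)))
        for name, url in repos.items()
    }
```

The `fetch` parameter is only there so the loop can be tried out without network access; the license lookup itself would then run on each collected package name.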
Checking the results
Finally, the script then saves all the results into a handy Excel spreadsheet, in which you'll get a list with all your dependencies and the corresponding license.
By running this script, we got the licenses of 114 dependencies in a single list. We might have even more dependencies in the future, but with this tool we can check them again with very little effort.
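Collecting the results into one table can be sketched like this. Note that we swap the Excel file for a plain CSV file here, which Excel opens directly, so the example needs nothing beyond the standard library. The function name and the column names are our own assumptions, not taken from the script.

```python
import csv
from typing import Dict, TextIO


def write_license_report(results: Dict[str, Dict[str, str]], out: TextIO) -> int:
    """Write one row per (repository, dependency, license); return the row count."""
    writer = csv.writer(out)
    writer.writerow(["repository", "dependency", "license"])
    rows = 0
    for repo in sorted(results):
        for dependency, license_name in sorted(results[repo].items()):
            writer.writerow([repo, dependency, license_name])
            rows += 1
    return rows


# Usage (assumes `results` maps repo name -> {dependency: license}):
# with open("licenses.csv", "w", newline="") as f:
#     write_license_report(results, f)
```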
Finding licenses for JavaScript dependencies
Python is not the only programming language that we use. Our application is also built with JavaScript, mainly for the UI and the dashboards for our admins. Sadly, the LicenseCheck tool only works for Python.
As an alternative, we've found LicenseFinder, which is an awesome open-source tool to check dependencies for JavaScript. The tool checks the package.json file of a repository and tells you which licenses are used. You can also create a list of permitted licenses, and LicenseFinder will check if the licenses of your dependencies are in that list. It basically works very similarly to LicenseCheck for Python.
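The idea of a permitted-licenses list is simple enough to sketch in a few lines of Python. This is only an illustration of the concept, not LicenseFinder's implementation or configuration format, and the package names in the usage example are made up.

```python
from typing import Dict, Set


def find_unapproved(dependencies: Dict[str, str], permitted: Set[str]) -> Dict[str, str]:
    """Return the dependencies whose license is not in the permitted list.

    The comparison ignores case and surrounding whitespace.
    """
    allowed = {name.strip().lower() for name in permitted}
    return {
        dep: license_name
        for dep, license_name in dependencies.items()
        if license_name.strip().lower() not in allowed
    }


# Usage with made-up package names:
# find_unapproved({"package-a": "MIT", "package-b": "GPL-3.0"}, {"MIT", "Apache-2.0"})
```

Anything the function returns is a dependency you'd want to look at more closely before shipping.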
We hope that you'll find this helpful. Let us know in the comments if you found similar tools for other programming languages as well so that other people can also see them.
Make sure to check out our GitHub page to find out more about our open-source, data-centric IDE which we are building!