If you want to get into cheminformatics or AI/ML for drug discovery, you may have encountered RDKit. It is a great tool to help you analyze chemical information and is used in many labs, making it a great open-source tool to contribute to if you're looking to work in the biotech industry!
I needed help figuring out how to contribute to RDKit. One tutorial explained it well, but I still needed a bit more help working out exactly how to do it. I wrote this blog post as an informal way to fill those gaps, in case anyone else is missing a piece when building RDKit. I was working on macOS Ventura 13.1 when I built RDKit.
The steps below explain how to run, test, and ultimately contribute to RDKit.
1. Fork and clone to your local machine
You can do this a few different ways; choose whichever you like.
Visit RDKit's GitHub repository and fork the project.
Once forked, go to the green button on the repo that says "<> Code" and copy the HTTPS URL from the dropdown.
Open the folder you want to use and enter the command git clone https://github.com/<YOURUSERNAME>/rdkit
where <YOURUSERNAME> is replaced with your own GitHub username.
2. Compile and Run RDKit on Your Machine
Open the project folder in whichever source code editor you prefer.
Make a conda environment using the commands below in your terminal...
conda create -c conda-forge -n my-rdkit-env rdkit
conda activate my-rdkit-env
Then run the commands below from the root of the cloned repository (RDBASE is set to the current directory) so that RDKit can locate what it needs...
export RDBASE=`pwd`
export DYLD_LIBRARY_PATH="$RDBASE/lib"
export PYTHONPATH="$RDBASE"
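Before moving on, it can help to confirm that these variables point where you expect. The following is a minimal sketch, not part of RDKit: check_rdkit_env is a hypothetical helper name, and it assumes the repository root contains CMakeLists.txt and that conda has set CONDA_PREFIX for the active environment.

```shell
# Sanity-check the environment before configuring the build.
# check_rdkit_env is a hypothetical helper, not something RDKit ships.
check_rdkit_env() {
    rc=0
    # RDBASE should be the repository root, which contains CMakeLists.txt
    if [ -z "${RDBASE:-}" ] || [ ! -f "$RDBASE/CMakeLists.txt" ]; then
        echo "RDBASE does not look like the repository root: '${RDBASE:-}'" >&2
        rc=1
    fi
    # CONDA_PREFIX is set by 'conda activate'
    if [ -z "${CONDA_PREFIX:-}" ]; then
        echo "CONDA_PREFIX is empty: run 'conda activate my-rdkit-env' first" >&2
        rc=1
    fi
    # PYTHONPATH should include RDBASE so Python finds the in-tree package
    case ":${PYTHONPATH:-}:" in
        *":${RDBASE:-}:"*) ;;
        *) echo "PYTHONPATH does not contain RDBASE" >&2; rc=1 ;;
    esac
    return "$rc"
}
```

Calling check_rdkit_env prints nothing and returns 0 when all three checks pass; otherwise it names the variable that needs attention.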
You need to create your build folder and change the directory to it...
mkdir cmake-build
cd cmake-build
Call cmake with the following flags. The .. after all the flags is essential: it tells CMake that the source lives one directory up from the build folder, so do not leave it out.
cmake -DPy_ENABLE_SHARED=1 -DBOOST_ROOT="$CONDA_PREFIX" -DBoost_NO_SYSTEM_PATHS=ON -DBoost_NO_BOOST_CMAKE=TRUE -DRDK_INSTALL_INTREE=ON -DRDK_INSTALL_STATIC_LIBS=OFF -DRDK_BUILD_CPP_TESTS=ON -DPYTHON_NUMPY_INCLUDE_PATH="$(python -c 'import numpy ; print(numpy.get_include())')" -DRDK_BUILD_CAIRO_SUPPORT=ON -DRDK_BUILD_FREETYPE_SUPPORT=ON ..
Now we will build RDKit using the command below. Don't be surprised if this takes an hour or longer. Luckily, you only have to do the full build once, as later builds recompile only the files you have changed.
(make -j 6 install ; find $RDBASE/rdkit -name \*.so -exec install_name_tool -add_rpath $RDBASE/lib {} \; -print )
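The -j 6 in the command above runs six build jobs in parallel. If your machine has more or fewer cores, you may want a different number. Here is a rough sketch for picking one; build_jobs is a hypothetical helper name, and the fallback value of 2 is an arbitrary assumption.

```shell
# Print the number of logical CPUs, for use as the make job count.
# build_jobs is a hypothetical helper; the fallback of 2 is arbitrary.
build_jobs() {
    if command -v sysctl >/dev/null 2>&1 && sysctl -n hw.ncpu >/dev/null 2>&1; then
        sysctl -n hw.ncpu    # macOS
    elif command -v nproc >/dev/null 2>&1; then
        nproc                # Linux (GNU coreutils)
    else
        echo 2
    fi
}

# Example usage in place of the fixed job count:
# make -j "$(build_jobs)" install
```

Using every core makes the build faster but can leave the machine sluggish while it runs, so a slightly lower number is a reasonable choice too.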
Now you have a fully compiled and built RDKit that you can use from here on out. From the build folder, run ctest to check that everything works and all test cases pass. Typically you will want to run only the tests related to your change; to do that, run something like ctest -R <test_name>, where -R selects the tests whose names match the given regular expression. This way you can focus on your own changes.
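If you find yourself running tests often, a small wrapper can save some typing. This is only a sketch: run_rdkit_tests is a hypothetical name, and it assumes RDBASE is set and the build folder is called cmake-build as in the steps above. The --output-on-failure flag asks ctest to print the output of any test that fails.

```shell
# Run RDKit's tests from the build folder, optionally filtered by name.
# run_rdkit_tests is a hypothetical wrapper; it assumes RDBASE is set and
# that the build folder is named cmake-build.
run_rdkit_tests() {
    pattern="${1:-}"
    (
        # Work in a subshell so the caller's directory is left unchanged
        cd "$RDBASE/cmake-build" || exit 1
        if [ -n "$pattern" ]; then
            # -R selects tests whose names match a regular expression
            ctest -R "$pattern" --output-on-failure
        else
            ctest --output-on-failure
        fi
    )
}
```

Running run_rdkit_tests with no argument runs the full suite, while passing a pattern restricts the run to the matching tests.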
3. Finding an Issue and Making a Branch
Now that everything is built, go to GitHub, pick an issue that interests you, and create a branch to work on it like so...
git branch fixes-#<issue number>
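Insert the issue number where <issue number> is. Keep in mind that git branch only creates the branch, so you still need to switch to it before you start working. Also, the branch name by itself does not link anything on GitHub: to have the issue linked and closed automatically when your PR is merged, include a keyword such as "Fixes #<issue number>" in the PR description.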
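As a small convenience, the branch name can be generated from the issue number. This is a sketch only: issue_branch_name is a hypothetical helper, and the 1234 in the usage comment is a placeholder, not a real RDKit issue.

```shell
# Compose the branch name used above from an issue number.
# issue_branch_name is a hypothetical helper, not a git command.
issue_branch_name() {
    issue="${1:-}"
    # Reject empty or non-numeric input
    case "$issue" in
        ''|*[!0-9]*)
            echo "usage: issue_branch_name <issue number>" >&2
            return 1
            ;;
    esac
    printf 'fixes-#%s\n' "$issue"
}

# Example usage (1234 is a placeholder issue number).
# checkout -b creates the branch and switches to it in one step:
# git checkout -b "$(issue_branch_name 1234)"
```

Quoting the generated name keeps the shell from treating the # specially in any context.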
I hope this helps clarify building and contributing to RDKit. I would love feedback about potential changes I can add to this post, so please do not hesitate to share.
Thanks for reading!