DEV Community

Desmond Gilmour


Contributing to RDKit

If you want to get into cheminformatics or AI/ML for drug discovery, you have probably encountered RDKit. It is an open-source toolkit for analyzing chemical information that is used in many labs, which makes it a great project to contribute to if you're looking to work in the biotech industry!

When I started, I needed help figuring out how to contribute to RDKit. One tutorial explained the process well, but I still needed a bit more help with the exact steps. I wrote this blog as an informal way to fill those gaps in case anyone else is missing a piece of the puzzle when building RDKit. I built RDKit on macOS Ventura 13.1.

The steps below explain how to build, run, test, and ultimately contribute to RDKit.

1. Fork and clone to your local machine

You can do this a few different ways; choose whichever you like.

Visit RDKit's GitHub repository (https://github.com/rdkit/rdkit) and fork the project.

Once it's forked, click the green "<> Code" button on your fork and copy the HTTPS URL from the dropdown.

Open a terminal in the folder you want to use and enter the command git clone https://github.com/<YOURUSERNAME>/rdkit, where <YOURUSERNAME> is replaced with your GitHub username.
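If you want to keep your fork in sync with the main repository later, it helps to register the original RDKit repo as a second remote right after cloning. Here is a minimal sketch; the function names fork_url and clone_fork are hypothetical, and the remote name upstream is just a common convention, not something RDKit requires.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the HTTPS clone URL of your fork of RDKit.
fork_url() {
  printf 'https://github.com/%s/rdkit.git\n' "$1"
}

# Hypothetical helper: clone your fork into ./rdkit, then add the main
# RDKit repository as a second remote named "upstream".
clone_fork() {
  local user="$1"
  git clone "$(fork_url "$user")" rdkit &&
    git -C rdkit remote add upstream https://github.com/rdkit/rdkit.git
}
```

Usage: clone_fork <YOURUSERNAME>, then cd rdkit. Later, git fetch upstream pulls in new commits from the main repository.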

2. Compile and Run RDKit on Your Machine

Open the project folder in whichever source code editor you prefer.

Create a conda environment by running the commands below in your terminal...

conda create -c conda-forge -n my-rdkit-env rdkit
conda activate my-rdkit-env
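Every step from here on assumes this environment is active, and the cmake step reads $CONDA_PREFIX, which conda sets when you activate an environment. A small guard like the hypothetical require_conda_env below (my own helper, not part of conda or RDKit) can catch a forgotten conda activate before a long build starts.

```shell
#!/usr/bin/env bash
# Hypothetical guard: succeed only if the conda env named in $1 is active.
# conda sets CONDA_PREFIX to the active environment's directory.
require_conda_env() {
  local want="$1"
  if [ -z "${CONDA_PREFIX:-}" ]; then
    echo "No conda environment is active; run: conda activate $want" >&2
    return 1
  fi
  # A named environment lives in a directory with the same name.
  if [ "$(basename "$CONDA_PREFIX")" != "$want" ]; then
    echo "Active env is $(basename "$CONDA_PREFIX"), expected $want" >&2
    return 1
  fi
}
```

Running require_conda_env my-rdkit-env prints nothing and succeeds when the right environment is active, and prints a hint otherwise.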

Then, from the root of your cloned rdkit folder, run the commands below so RDKit can locate its libraries and Python modules...

export RDBASE="$(pwd)"                   # root of your RDKit clone
export DYLD_LIBRARY_PATH="$RDBASE/lib"   # macOS; on Linux use LD_LIBRARY_PATH
export PYTHONPATH="$RDBASE"
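Because RDBASE is taken from the current directory, running these lines from the wrong folder quietly points everything at the wrong place, and the variables only last for the current terminal session. The hypothetical set_rdkit_env function below wraps the three exports with a quick check that the folder looks like the root of an RDKit clone; testing for a top-level CMakeLists.txt and a Code folder is my own heuristic.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper around the three exports above.
# $1: root of your RDKit clone (defaults to the current directory).
set_rdkit_env() {
  local root="${1:-$PWD}"
  # Heuristic check: the repository root has a CMakeLists.txt and a Code/ folder.
  if [ ! -f "$root/CMakeLists.txt" ] || [ ! -d "$root/Code" ]; then
    echo "$root does not look like the root of an RDKit clone" >&2
    return 1
  fi
  RDBASE="$(cd "$root" && pwd)"
  export RDBASE
  export DYLD_LIBRARY_PATH="$RDBASE/lib"   # macOS; Linux uses LD_LIBRARY_PATH instead
  export PYTHONPATH="$RDBASE"
}
```

In each new terminal you could run set_rdkit_env /path/to/your/rdkit, or add the function and that call to your shell startup file.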

Next, still in the root folder, create a build folder and change into it...

mkdir cmake-build
cd cmake-build

Run cmake with the following flags. The .. after all the flags is essential: it tells cmake that the source code lives in the parent directory, so do not leave it out.

cmake -DPy_ENABLE_SHARED=1 -DBOOST_ROOT="$CONDA_PREFIX" -DBoost_NO_SYSTEM_PATHS=ON -DBoost_NO_BOOST_CMAKE=TRUE -DRDK_INSTALL_INTREE=ON -DRDK_INSTALL_STATIC_LIBS=OFF -DRDK_BUILD_CPP_TESTS=ON -DPYTHON_NUMPY_INCLUDE_PATH="$(python -c 'import numpy ; print(numpy.get_include())')" -DRDK_BUILD_CAIRO_SUPPORT=ON -DRDK_BUILD_FREETYPE_SUPPORT=ON ..
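That one-line command is hard to read, so here is the same cmake call sketched as a bash function with the flags collected in an array, which lets each flag carry a comment. The function name configure_rdkit is hypothetical, and the comments are my reading of the flag names; the RDKit build documentation is the authoritative description of each option.

```shell
#!/usr/bin/env bash
# Sketch of the cmake call above. Run it from inside cmake-build
# with the conda environment active.
configure_rdkit() {
  # NumPy headers: computed from the active Python unless passed as $1.
  local numpy_inc="${1:-$(python -c 'import numpy; print(numpy.get_include())')}"
  if [ -z "$numpy_inc" ]; then
    echo "Could not find the NumPy include path" >&2
    return 1
  fi
  local -a flags
  flags=(
    -DPy_ENABLE_SHARED=1
    -DBOOST_ROOT="$CONDA_PREFIX"        # look for Boost in the conda env...
    -DBoost_NO_SYSTEM_PATHS=ON          # ...and not in system locations
    -DBoost_NO_BOOST_CMAKE=TRUE
    -DRDK_INSTALL_INTREE=ON             # install into the source tree ($RDBASE)
    -DRDK_INSTALL_STATIC_LIBS=OFF
    -DRDK_BUILD_CPP_TESTS=ON            # also build the C++ tests so ctest can run them
    -DPYTHON_NUMPY_INCLUDE_PATH="$numpy_inc"
    -DRDK_BUILD_CAIRO_SUPPORT=ON
    -DRDK_BUILD_FREETYPE_SUPPORT=ON
  )
  # ".." is the source directory: the repository root, one level above cmake-build.
  cmake "${flags[@]}" ..
}
```

Calling configure_rdkit with no argument should behave the same as the original one-liner.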

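Now build RDKit using the command below. Don't be surprised if this takes an hour or longer. The -j 6 runs six compile jobs in parallel, and the find step afterwards uses macOS's install_name_tool to add $RDBASE/lib as an rpath to each compiled Python module (.so file). Luckily, the full build only has to happen once: after that, the build system CMake generated only rebuilds the files you have changed and whatever depends on them.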

(make -j 6 install ; find $RDBASE/rdkit -name \*.so -exec install_name_tool -add_rpath $RDBASE/lib {} \; -print )
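The 6 in make -j 6 is simply a number of parallel jobs; a common choice is the number of CPU cores. The sketch below (hypothetical helper names build_jobs and build_rdkit) detects that number, chains the steps so the rpath fix only runs after a successful install, and skips the install_name_tool step when not on macOS, since that tool is macOS-specific.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the number of CPU cores to use for make -j.
build_jobs() {
  local n=""
  if command -v nproc >/dev/null 2>&1; then
    n=$(nproc)                          # Linux (GNU coreutils)
  elif command -v sysctl >/dev/null 2>&1; then
    n=$(sysctl -n hw.ncpu 2>/dev/null)  # macOS
  fi
  case "$n" in
    ''|*[!0-9]*) n=2 ;;                 # fall back if detection failed
  esac
  echo "$n"
}

# Hypothetical wrapper for the build command above; run it from cmake-build.
build_rdkit() {
  make -j "$(build_jobs)" install || return 1
  # macOS only: add $RDBASE/lib as an rpath to every compiled Python module.
  if [ "$(uname)" = "Darwin" ]; then
    find "$RDBASE/rdkit" -name '*.so' \
      -exec install_name_tool -add_rpath "$RDBASE/lib" {} \; -print
  fi
}
```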

Now you have a fully compiled RDKit that you can use from here on out. Run ctest from the cmake-build folder to make sure everything works and all the tests pass. While working on a change, you will usually want to run only the relevant tests, which you can do with something like ctest -R <test_name> (-R selects the tests whose names match a regular expression). This way you can focus on your own changes.
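Two more ctest options are worth knowing: ctest -N lists the test names without running anything (handy for finding the right name for -R), and --output-on-failure prints the output of any test that fails. The hypothetical run_rdkit_tests wrapper below combines them; the JOBS variable and its default of 6 are my own choices.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper around ctest; run it from inside cmake-build.
# With an argument, only tests whose names match that regular expression run;
# without one, the whole suite runs in parallel ($JOBS jobs, default 6).
run_rdkit_tests() {
  if [ $# -gt 0 ]; then
    ctest --output-on-failure -R "$1"
  else
    ctest --output-on-failure -j "${JOBS:-6}"
  fi
}
```

For example, run ctest -N to find a test name, then run_rdkit_tests <test_name> to run just that test.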

3. Finding an Issue and Making a Branch

Now that everything is built, go to RDKit's GitHub page, pick an issue that interests you, and create a branch to start working on it...

git checkout -b fixes-#<issue number>

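Replace <issue number> with the number of the issue you picked. The branch name helps you keep track of your work, but GitHub links an issue through the pull request itself, not the branch name: if you write "Fixes #<issue number>" in your PR description, the issue is linked to the PR and closed automatically when the PR is merged.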
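It is also a good idea to start the branch from the latest upstream code rather than from whatever your fork happened to contain. A minimal sketch, assuming you have added the main repository as a remote named upstream (for example with git remote add upstream https://github.com/rdkit/rdkit.git) and that its default branch is master; start_issue_branch is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: create and switch to a branch for GitHub issue $1,
# starting from the latest commit on upstream/master.
start_issue_branch() {
  local issue="$1"
  # Only accept a plain issue number such as 1234.
  case "$issue" in
    ''|*[!0-9]*)
      echo "usage: start_issue_branch <issue-number>" >&2
      return 1
      ;;
  esac
  git fetch upstream &&
    git checkout -b "fixes-#$issue" upstream/master
}
```

When your change is ready, push the branch to your fork with git push -u origin fixes-#<issue number> and open the pull request from GitHub.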

I hope this helps clarify building and contributing to RDKit. I would love feedback about potential changes I can add to this post, so please do not hesitate to share.

Thanks for reading!
