
Desmond Gilmour

Lasso Highlighting In Datamol

TL;DR

In this blog post, we'll explore the lasso_highlight_image() function in Datamol, a Python library for scientists working with molecular data. We'll see how it can be used to highlight specific parts or features of a molecule quickly. We'll also provide some examples of how to use the function and discuss its limitations and areas for improvement for future contributors.

Intro

If you work with chemical data and need to visualize it, you'll want to check out Datamol. This Python library has a new addition called lasso highlight, which was initially produced by Christian W. Feldmann. The lasso highlighting function allows you to quickly identify and visualize specific parts or features of a molecule. It's useful for identifying functional groups, comparing and contrasting different molecules, and analyzing molecular structure and properties.

Examples

To use the lasso_highlight_image() function, you supply a target molecule and the substructures you want to highlight. The function returns an image of the molecule with those substructures highlighted. You can provide the target and search molecules either as SMILES strings or as rdkit.Chem.Mol objects. The function also takes parameters for the image type (PNG or SVG), the image size, and other drawing characteristics. Each parameter is described in the Datamol documentation.

Here are two examples to help you get started:

1. Lasso highlight with multiple substructures as a PNG image

import datamol as dm

# Target molecule as a SMILES string
target_molecule = "CO[C@@H](O)C1=C(O[C@H](F)Cl)C(C#N)=C1ONNC[NH3+]"
# Substructure patterns to highlight
substructures = ["CONN", "N#CC~CO"]

# use_svg=False requests a PNG image
dm.lasso_highlight_image(target_molecule, substructures, (400, 400), use_svg=False)

Multi substructure highlighting

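2. Lasso highlight with a single substructure as an SVG image

import datamol as dm

target_molecule = "CO[C@@H](O)C1=C(O[C@H](F)Cl)C(C#N)=C1ONNC[NH3+]"
# dm.to_smarts() expects a molecule object, so parse the SMILES string first
substructure = dm.to_smarts(dm.to_mol("CONN"))

# use_svg is left unset here, so the image type falls back to the default
dm.lasso_highlight_image(target_molecule, substructure, (300, 300))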

Single substructure highlighting

Limitations and What's Next

The lasso_highlight_image() function is useful as it stands, but it has a few limitations. The following additions would address them:

  1. Add functionality to write to a file, similar to the to_image() function in Datamol/viz/viz.py.
  2. Allow the analysis of multiple target molecules at once.
  3. Canonicalize search molecules to prevent duplicate highlighting.
  4. Update the documentation in Visualization.ipynb.
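
Until features like these land in the library, the first three items can be approximated in user code. The sketch below is only an illustration and not part of Datamol: the helper names (dedupe_patterns, save_image, highlight_many, datamol_example) are hypothetical, and the return type of lasso_highlight_image() is an assumption, so save_image() accepts SVG text, an object carrying its content in a .data attribute, or an object with a .save() method. The renderer and the canonicalizer are passed in as plain callables so the helpers can be tried without Datamol installed.

```python
from pathlib import Path
from typing import Callable, Iterable, List


def dedupe_patterns(patterns: Iterable[str], canonicalize: Callable[[str], str]) -> List[str]:
    """Keep the first pattern seen for each canonical form, preserving order."""
    seen = set()
    unique = []
    for pattern in patterns:
        key = canonicalize(pattern)
        if key not in seen:
            seen.add(key)
            unique.append(pattern)
    return unique


def save_image(image, path) -> Path:
    """Write a rendered image to disk.

    Assumption: the image is SVG text, an object exposing str/bytes content
    as `.data`, or an object with a `.save(path)` method. Adjust this to
    whatever your Datamol version actually returns.
    """
    path = Path(path)
    data = getattr(image, "data", image)
    if isinstance(data, str):
        path.write_text(data)
    elif isinstance(data, bytes):
        path.write_bytes(data)
    elif hasattr(image, "save"):
        image.save(path)
    else:
        raise TypeError(f"Unsupported image type: {type(image).__name__}")
    return path


def highlight_many(targets, patterns, out_dir, render, canonicalize, suffix=".svg") -> List[Path]:
    """Render one highlighted image per target molecule and save each one."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    unique = dedupe_patterns(patterns, canonicalize)
    return [
        save_image(render(target, unique), out_dir / f"mol_{i}{suffix}")
        for i, target in enumerate(targets)
    ]


def datamol_example() -> List[Path]:
    """Hypothetical wiring to Datamol; defined here but not called."""
    import datamol as dm  # imported lazily so the helpers above work without it

    return highlight_many(
        targets=["CO[C@@H](O)C1=C(O[C@H](F)Cl)C(C#N)=C1ONNC[NH3+]"],
        patterns=["CONN", "NNOC"],  # two spellings of the same fragment
        out_dir="highlights",
        render=lambda t, p: dm.lasso_highlight_image(t, p, (400, 400), use_svg=True),
        # Assumption: every pattern is a valid SMILES string; SMARTS-only
        # patterns such as "N#CC~CO" would need a different canonicalizer.
        canonicalize=lambda s: dm.to_smiles(dm.to_mol(s)),
    )
```

Looping over targets in user code is slower than a native batch mode would be, but it keeps each output file tied to one molecule and makes the deduplication step easy to inspect.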

If you're interested in contributing to the project, check out the Datamol website and the contribution guidelines. For questions or feedback, feel free to contact me through the social media links on my personal website.