DEV Community

Tom Nijhof


Medical Subject Headings (MeSH) Into Neo4j

To make a knowledge graph, it is useful to have a vocabulary in place, which is called an ontology.

Medical Subject Headings (MeSH) is one such ontology, covering many of the medical terms currently in use.
It can be downloaded as an RDF file (N-Triples), which makes it easy to import into Neo4j with neosemantics (n10s).

Installing n10s in Neo4j Desktop


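The next three commands import the 2021 MeSH graph directly into Neo4j. The first creates the uniqueness constraint on resource URIs that n10s requires, the second initializes the n10s graph configuration, and the third fetches and loads the N-Triples file. It will take a moment before all 2 million nodes and 4 million relationships are loaded.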

    CREATE CONSTRAINT n10s_unique_uri ON (r:Resource) ASSERT r.uri IS UNIQUE;
    CALL n10s.graphconfig.init();
    CALL n10s.rdf.import.fetch("https://nlmpubs.nlm.nih.gov/projects/mesh/rdf/2021/mesh2021.nt","N-Triples");
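
Once the import has finished, a quick sanity check can confirm that data actually arrived. The query below is a minimal sketch that uses only built-in Cypher functions; the exact labels and counts depend on the MeSH release and on the n10s configuration, so no expected output is given here.

```cypher
// Count nodes per label combination to confirm the import produced data.
MATCH (n)
RETURN labels(n) AS labels, count(*) AS nodes
ORDER BY nodes DESC
LIMIT 10;
```

If the result is empty, the fetch most likely failed (for example, because the URL was unreachable) and is worth rerunning.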

Exploring the Data

Before I start, I will set the caption to rdfs__label for Resource nodes, so the nodes have a name. For ns0__Term, I will use ns0__prefLabel.

Naming nodes within Neo4j Desktop


Let's start with the sexiest thing to do: reading the documentation of the RDF data structure for the medical terms used to sort medical papers.
Did I say “sexy”? I meant nerdiest.

I will not go over the full structure; instead, I will select just two elements I think are interesting to start with. Feel free to disagree.

The code snippets in this blog are Cypher queries you can run in Neo4j. You do not need them to follow along, but they might be useful if you want to know how I got the results, or they can serve as an example that my Cypher is not optimized, up to standard, etc.

Terms, Descriptors, and Concepts

Descriptors, concepts, and terms are very closely related. Descriptors are the broadest: each descriptor contains one or more concepts, one of which is the preferred concept. Each concept in turn has terms, which hold the synonyms for that concept. Every concept has one preferred term, and the descriptor also has one preferred term out of all of them (see picture below).

    MATCH (q:ns0__Term)<-[]-(n:ns0__TopicalDescriptor)-[]->(p:ns0__Concept)-[]->(z:ns0__Term)
    WHERE n.rdfs__label = "Calcimycin"
    RETURN n, p, q, z

Relation between descriptor (pink), concepts (green), and terms (blue)


Terms are very useful for labeling text. Concepts can denote a meaning that is narrower than the descriptor as a whole. The descriptor holds the connection to the rest of the graph (tree, other descriptors, supplementary concept records (SCR), qualifiers, etc.). I will mainly focus on the descriptors for graph algorithms.
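
The same structure can also be read as a table instead of a picture. The query below is a minimal sketch that reuses the labels and properties already mentioned in this post. It does not assume any relationship names; it reads them back from the data with type(). That concepts carry an rdfs__label property is an assumption on my side.

```cypher
// List every concept of one descriptor together with its terms, and show
// which relationship types connect them (read from the data, not assumed).
MATCH (d:ns0__TopicalDescriptor)-[dc]->(c:ns0__Concept)-[ct]->(t:ns0__Term)
WHERE d.rdfs__label = "Calcimycin"
RETURN c.rdfs__label    AS concept,   // assumption: concepts have rdfs__label
       type(dc)         AS descriptor_to_concept,
       type(ct)         AS concept_to_term,
       t.ns0__prefLabel AS term
ORDER BY concept, concept_to_term, term;
```

If preferred concepts and preferred terms are stored under their own relationship types, they will stand out in the two type columns.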

Tree Structure

Each TopicalDescriptor has a link to a tree number (ns0__treeNumber) and to another TopicalDescriptor (ns0__broaderDescriptor).

These two hold very similar information but have one use case where they differ: multiple tree locations.

A descriptor can be in more than one tree at the same time (like the descriptor “Eye”). Eye has tree number **A01.456.505.420** as a subcategory of Face, and **A09.371** as a subcategory of Sense Organs. This can give us problems because these two tree numbers do NOT have the same subcategories!

Eyebrows fall under Eye in its Face location, but NOT under Eye in its Sense Organs location.
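
To see the multiple tree locations in the graph itself, a query along these lines can list them. It is a minimal sketch: it relies on every imported node having a uri property (the constraint created during the import is defined on it), and it assumes the tree number can be read from the last part of that URI.

```cypher
// Show every tree location of the descriptor "Eye".
MATCH (d:ns0__TopicalDescriptor)-[:ns0__treeNumber]->(t:ns0__TreeNumber)
WHERE d.rdfs__label = "Eye"
RETURN d.rdfs__label AS descriptor,
       t.uri         AS tree_number_uri  // assumption: URI ends in the tree number
ORDER BY tree_number_uri;
```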

Tree overview of Eye in the [online MeSH Browser](https://meshb.nlm.nih.gov/record/ui?ui=D005123)


If we follow ns0__broaderDescriptor from Eyebrows up to the broadest descriptor, we run into a problem. The broader descriptor of Eyebrows is Eye, which itself has two broader descriptors (namely, Sense Organs and Face). Eyebrows therefore ends up under Sense Organs, which is wrong because Eyebrows is not a sense organ.

    MATCH (n:ns0__TopicalDescriptor)-[:ns0__broaderDescriptor*]->(p:ns0__TopicalDescriptor)
    WHERE n.rdfs__label = "Eyebrows"
    RETURN n, p

Sense Organs is found as a broader descriptor of Eyebrows


The other way is to go via the tree numbers. Eyebrows is then connected to only one of the two tree numbers of Eye and does NOT get Sense Organs as a broader descriptor.

    MATCH (n:ns0__TopicalDescriptor)-[:ns0__treeNumber]->(t:ns0__TreeNumber)-[:ns0__parentTreeNumber*]->(p:ns0__TreeNumber)<-[:ns0__treeNumber]-(d:ns0__TopicalDescriptor)
    WHERE n.rdfs__label = "Eyebrows"
    RETURN n, t, p, d

Going via the tree number gives only “Body Regions” and “Integumentary System” as the broadest descriptors


For this reason, I will use ns0__treeNumber to find hierarchical relationships rather than ns0__broaderDescriptor.
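
The difference between the two routes can also be checked directly. The query below is a minimal sketch that combines the two patterns used above: it keeps the ancestors reachable via ns0__broaderDescriptor and filters out those that are also reachable via the tree numbers. What remains are the ancestors that only the broader-descriptor route produces. I would expect Sense Organs to be among them, but the exact result depends on the MeSH release.

```cypher
// Ancestors of "Eyebrows" found via broaderDescriptor
// that are NOT confirmed by the tree-number route.
MATCH (n:ns0__TopicalDescriptor)-[:ns0__broaderDescriptor*]->(b:ns0__TopicalDescriptor)
WHERE n.rdfs__label = "Eyebrows"
  AND NOT (n)-[:ns0__treeNumber]->(:ns0__TreeNumber)
          -[:ns0__parentTreeNumber*]->(:ns0__TreeNumber)
          <-[:ns0__treeNumber]-(b)
RETURN DISTINCT b.rdfs__label AS only_via_broader_descriptor
ORDER BY only_via_broader_descriptor;
```

Swapping "Eyebrows" for another label gives a quick test of whether a descriptor is affected by the same problem.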

Conclusion

In conclusion, using the Medical Subject Headings (MeSH) ontology to create a knowledge graph is highly beneficial. By importing MeSH as an RDF file into Neo4j with neosemantics (n10s), we can easily explore the extensive collection of medical terms and their relationships.

Descriptors, concepts, and terms are essential components of MeSH. Descriptors encompass broad categories, concepts provide specific definitions within descriptors, and terms offer synonyms for concepts. Understanding the hierarchical structure is crucial for effective graph analysis: when a descriptor sits in more than one tree, tree numbers are a more reliable way to establish hierarchical relationships than broader descriptors.

In summary, MeSH is a valuable resource for constructing medical knowledge graphs. Leveraging its rich information and employing appropriate graph analysis techniques, researchers can gain meaningful insights from medical literature and data.
