In an effort to publish this quickly, I am posting without proofreading. Errata is welcome.
In the last post I discussed using rules to generate RDF statements that are entailed by RDFS. This is useful stuff, but is very limited due to the lack of expressivity of RDFS.
This is to be expected, since RDFS was a limited vocabulary that was released well before the full Web Ontology Language (OWL) was released. But if we adopt OWL, then what entailments will be valid and useful?
Description Logics
As shown in my first post in this series, OWL provides a vocabulary for a Description Logic. In particular, OWL2 conforms closely to a family of logics known as ๐ฎโ๐ชโ๐ฌ. This name indicates some of the elements of the vocabulary:
 ๐ฎ: An Abbreviation for ๐โ๐
 ๐โ๐: Contains concepts including:
 Classes
C
, and rolesr
 โค (top, or everything)
 โฅ (bottom, or nothing)
 โ (conjunctions, or intersections)
 โ (disjunctions, or unions)
 ยฌ (negation, or inverse)
 โr.C (existential role restrictions)
 โr.C (universal role restrictions)
 Classes
 โ: Class hierarchies (classes, with subclasses)
 โ: Complex role inclusion. This is both a role hierarchy (indicated by โ) and role composition.
 โ: Inverse roles.
 ๐ช: Nominals.
 ๐ฉ: Cardinality restrictions on roles.
 ๐ฌ: Qualified cardinality. This includes cardinality restrictions (indicated by ๐ฉ), and can also qualify them to classes.
To explain each of the above:

Classes are a way to classify things. Entities can be classified multiple ways, in which case we say the entity has a type of the class, or that the entity is an instance of the class. e.g.
Person
can be a class and the entity we name "Alice" may be an instance ofPerson
. 
Roles describe relationships between entities. e.g. an entity named "Alice" may have a
hasChild
relationship to an entity named "Susan".  Top is a universal class that every entity is an instance of.
 Bottom is an empty class that no entity is an instance of.

Conjunctions are the combination of multiple classes where every class must apply. e.g. A wooden chair is an instance of the conjunction formed from the classes
Wooden
andFurniture
. 
Disjunctions are the combination of multiple classes where one or more classes must apply. e.g.
OfficeEquipment
could be a disjunction ofOfficeFurniture
,ComputerEquipment
,Stationery
, andKitchenSupplies
. 
Negation is used to create a class of everything that is not the negated class. e.g. The negation of the class of
Visible
things is everything that cannot be seen. That includes both physical objects, like air, but also arbitrary concepts like "imagination" or "price". It is the entire universe of things that are not in theVisible
class. 
Existential role restrictions means that a given relationship must exist in order to be a member of a class. e.g. To be a
Parent
an entity must have ahasChild
relationship to another entity. 
Universal role restrictions means that all use of a role has to be with a specific class. e.g.
hasChild
can be defined to always refer to instances of the classChild
. 
Role hierarchy indicates more general or specific relationships between roles. This creates subproperty and superproperty relationships and is roughly analogous to subclasses and superclasses. e.g.
hasDaughter
is a more specific role thanhasChild
, whilehasDescendent
is a more general role. SohasDaughter
is a subproperty forhasChild
, andhasChild
is a subproperty forhasDescendent
. 
Inverse roles refers to the relationship that goes in the opposite direction between entities. e.g.
hasParent
is the inverse role tohasChild
. 
Nominals describes a class of items that is defined by its membership in the class. e.g.
PresidentOfTheUnitedStates
can be defined as a class of the 46 people who have had that position (as of this writing). 
Number restrictions (or cardinality restrictions) refers to a minimum or maximum number of relationships. e.g. To be a member of
FullTimeStudent
a university might require that a student have a minimum of 4enrolledIn
relationships. 
Qualified cardinality is a more specific type of nominal, where the class of the relationship must apply. e.g. For a student to be a
MathMajor
they might require a minimum of 10passed
relationships to instances ofMathSubject
.
OWL
These constructs are all supported by OWL, and each of those constructs has a mapping to RDF. This means that for each Description Logic expression there is a way to express that expression precisely in RDF. It is data like this that was read and processed by Pellet in the Oedipus example in my initial post.
The problem with systems like Pellet is that they rely on memory to explore all possibilities for the data. Consequently, they can struggle with large bodies of data, and are unable to handle large ontologies such as SNOMED CT. We have a much better chance of scaling OWL processing if we can use operations that databases are designed to provide.
The principal database operation is the Query
, and so the various approaches to scaling try to build on this operation.
Query Rewriting
One approach to identifying entailments in OWL is by using it to rewrite queries.
The first to consider is a general approach to rewriting queries, where the query can be expanded to ask for parts of the ontology. For instance, consider asking for the classes a resource is an instance of:
select ?class
where { my:resource a ?class }
This can be rewritten to consider superclasses as well:
select ?class
where { my:resource a/rdfs:subClassOf* ?class }
Where the *
modifier describes the transitive closure of the rdfs:subClassOf
relationship, starting with the 0 step, meaning that it includes the step where the class is the immediate type of the resource.
A more specific case is using the ontology to rewrite the query. For instance, if the property my:prop
is transitive, then querying for it can always be expanded to use the +
modifier. So the query:
select ?related
where { my:resource my:prop ?related }
Would be modified to:
select ?related
where { my:resource my:prop+ ?related }
These are some trivial examples, but some great work was done on this by โชHรฉctor PรฉrezUrbina in his PhD dissertation.
Rules
Another approach to scalable entailment is using rules. The mechanism for this is using certain existing data structures to generate new data structures that get inserted alongside the original data. I described this in the last post, where I used construct
queries to obtain the data generated by each rule. The other option is to send this generated statements back into the graph by changing construct
to insert
.
For instance, the transitive closure of the property my:prop
above could be created with an update operation:
insert { ?a my:prop ?c }
where { ?a my:prop ?b . ?b my:prop ?c }
Note that this is a single update, and not actually a full "rule". Running this will only extend all of the my:prop
relations by 1 step, doubling the length to 2. But running this iteratively will extend the maximum length of the closure by doubling, so it will rapidly cover the entire closure.
What we can learn from this is that rule systems can be built from query/update operations like this, but they need a mechanism for scheduling the rules to be run over and over when needed, and to stop when nothing new is being generated. The basic algorithm for doing this is called RETE, and I discussed this and an implementation at Clojure/conj 2016.
Because rules are built on a querying mechanism that is foundational to the database, they are often very fast. They also rely on the main database storage, so they can scale with the database. They allow computational complexity to be precalculated, with the results stored. This allows subsequent operations to rely on space complexity instead. These are very important characteristics for working with large quantities of data, and for this reason I will be focusing on this approach to entailment.
Rule Justification
Many OWL operations describe entailments that can be expressed as a rule. For instance, the OWL 2 Structural Specification and FunctionalStyle Syntax document describes many constructs with examples of what they entail. There are many examples of this, but a simple one is the Symmetric Object Property which describes that a:friend
is symmetric. Consequently, if "Brian is a friend of Peter" then this entails "Peter is a friend of Brian".
The general rule for symmetry can then be given as:
insert {?b ?prop ?a}
where {?a ?prop ?b . ?prop a owl:SymmetricProperty}
This can seem to be a straightforward operation, but interesting insights come about when we consider exactly why these new statements are allowed to be asserted.
Validity and Consistency
Logic systems can be described using a pair of properties: validity and consistency.
 A system is valid if every interpretation of the systems leads to conclusions that are true.
 A system is invalid if there exists an interpretation where the conclusions are not true.
 A system is consistent if there exists an interpretation where the conclusions are true.
 A system is inconsistent if there are no interpretations where the conclusions are true.
The interpretation of a system is a selection of values that conform to the system.
To explain some of this, let's use the mathematical domain. In this case, the interpretation will usually be a selection of numbers to associate with values.
Valid
Since every possible interpretation of a valid system is true, these are also referred to as a tautology. At first glance, this does not seem that useful, however it is a very important concept.
An example of a valid math expression is:
 x  โฅ x
The various interpretations of this system are the values that x can take in the domain of real numbers โ. In this case, it doesn't matter what value x takes, since the equation will always be true.
This is still logic, so we can introduce an or
operation, with another valid equation:
x > 1 โจ x < 2
Again, this is a tautology, as it will be true for every interpretation of x in the domain.
Invalid
This applies to any system which is not valid. That simply means that there exists an interpretation where truth does not hold. For instance:
x > 2
This is true when x is 3 or 4, but it is not true when x is 1 or 2. Lots of systems are Invalid, since tautologies (i.e. valid systems) don't have a lot to say.
Consistent
A system is consistent when there exists an interpretation that leads to truth. The example invalid statement also happens to be consistent:
x > 2
As already mentioned, when x is 3 then the statement is true, so this system is consistent.
Inconsistent
This indicates that a system is not consistent, meaning that there are no possible interpretations which can be true. For instance:
x < 3 โง x > 4
There are no numbers that meet both of these conditions, and therefore there are no interpretations where this is true.
Relations to Each Other
When considered in relation to one another we see the following states for systems:
 Always true: Valid and Consistent
 Sometimes true, Sometimes false: Invalid and Consistent
 Always false: Invalid and Inconsistent
Entailment
Entailment is the operation of finding new statements that are true in every possible interpretation of a system, given its semantics. This is possible when using the Open World Assumption (OWA), since new statements can always be added, which is a concept that is sometimes expressed as, "Anyone can say Anything about Anything" (this phrase appears in an early draft of the RDF Concepts and Abstract Data Model, and also in the book "Semantic Web for the Working Ontologist", by Allemang, Hendler, and now in the second edition, Gandon. I will refer to this book as SWWO).
This cuts both ways though: not only does it mean that new statement may be created, but it also limits which statements may be created. The OWA says that there are possibly a lot of other statements that the system does not describe which could lead to a statement not being possible in every possible interpretation. When it comes to identifying new statements that can be entailed, it is a useful exercise to consider all the possible constructs that could lead to the statement leading to a falsehood, even if it requires a convoluted set of statements to get there.
Tableaux
One approach in using validity and consistency is to determine if a given system entails a statement using the Tableaux Algorithm. In this case, for a given system ๐ฎ an entailment ๐จ is described as:
๐ฎ โจ ๐จ
This entailment can only be true if:
๐ฎ โ {ยฌ๐จ} is inconsistent
This is a useful test, because the algorithm need only discover a single false case to prove inconsistency. Pellet is an implementation of this algorithm, and while it does not scale to very large ontologies, it is nevertheless very powerful.
Rules
Another approach to entailment is to use rules. As mentioned above, this can be done when we know that a statement is legal in every possible interpretation of the system.
There is a defined subset of possible reasoning in OWL2 which can lead to entailments via rules. This subset is called the OWL 2 RL Profile, and the rules can be found in tables 4 through to table 9 in the rules section of the OWL 2 Profiles document. It is some of these rules that I want to explore here and in other posts.
Intersections
A practical application of all of this can be seen in Intersections.
As described in SWWO, an intersection of classes :A
and :B
can be described using a subclass relationship:
This is described in Turtle as:
:IntersectionAB rdfs:subClassOf :A, :B .
Let's consider what can be inferred from this.
If we have an instance of :IntersectionAB
called :x
, then this is represented as:
:x a :IntersectionAB .
The rule rdfs9
is:
insert { ?zzz a ?yyy } where { ?xxx rdfs:subClassOf ?yyy . ?zzz a ?xxx }
Applying this will result in :x
being an instance of both :A
and :B
:
:x a :IntersectionAB .
:x a :A .
:x a :B .
These inferences are valid, because the definition of the rdfs:subClassOf
relationship states and instances of a subclass will also be instances of the superclass. There are no interpretations where this does not hold.
Class Membership
Another possibility is when :y
is a member of both :A
and :B
.
:y a :A .
:y a :B .
It would seem reasonable to infer that :y
is therefore an instance of :IntersectionAB
. However, inferences are only valid if they apply in every possible system, and the Open World Assumption (OWA) must allow for any new consistent statement. There are statements that can be introduced that are both consistent with the existing statements, and inconsistent with inferring :y
as a member of :IntersectionAB
.
To see an example of this, we can introduce a two new classes, called :C
and :D
, are the complements of each other. This means that anything that is a member of :C
is not a member of :D
, and vice versa. We can also make :IntersectionAB
a subclass of :C
:
:IntersectionAB rdfs:subClassOf :A, :B, :C .
:C owl:complementOf :D .
If :y
becomes an instance of :D
then if cannot be a part of the intersection, since it cannot be an instance of :C
.
:IntersectionAB rdfs:subClassOf :A, :B, :C .
:C owl:complementOf :D .
:y a :A, :B, :D .
Regardless of how contrived the example may be, the fact that any such example exists indicates that the inference may not be made.
OWL Intersections
The problem with inferring membership in an intersection above is that the intersection is open, meaning that new classes can be added to the intersection, and those classes can preclude an instance of the other classes from becoming a member.
OWL addresses this by defining an closed intersection using an RDF Collection. This uses a linked list structure that does not allow for extra members.
Redefining :IntersectionAB
we can express this in TTL as:
:IntersectionAB owl:intersectionOf (:A :B) .
This looks short and simple, but expands into a longer set of triples:
:IntersectionAB owl:intersectionOf _:b1 .
_:b1 rdf:first :A .
_:b1 rdf:rest _:b2 .
_:b2 rdf:first :B .
_:b2 rdf:rest rdf:nil .
RDF defines collections to have a specific structure with each node in the list having a single rdf:first
and rdf:rest
connection, terminating in the rdf:nil
node. This means that it is not a valid construct to include any more elements in the collection.
As a consequence, if there is an element :y
which is an instance of both :A
and :B
, then it is not possible to add in triples that make :y
a member of something that is excluded from the intersection. Therefore, it is valid to infer that :y
is in the intersection:
:y a :IntersectionAB .
This is described in the OWL semantics, and demonstrated in the OWL 2 RL profile in the rule clsint1
found in table 6. This rule states:
IF
T(?c, owl:intersectionOf, ?x)
LIST[?x, ?c1, ..., ?cn]
T(?y, rdf:type, ?c1)
T(?y, rdf:type, ?c2)
...
T(?y, rdf:type, ?cn)
THEN
T(?y, rdf:type, ?c)
In English it says:
If ?c
is an intersection described by ?x
and ?x
is a list containing ?c1
through to ?cn
and ?y
is an instance of every element in that list
Then ?y
is an instance of the intersection ?c
In Practice
Unfortunately, this is tricky to describe in SPARQL. It is easy to check if a value is an instance of one or more members of a list, but how can you check if it is a member of every member of the list?
Let's start with some example data, and try to perform the entailment on it.
First of all, we can define the intersection of 3 classes: :A
, :B
and :C
. Then we'll describe 3 objects: :m
, :n
, and :o
.
 The
:m
object will be an instance of all 3 classes  The
:n
object will be an instance of 2 of the 3 classes  The
:o
object will not be an instance of any of the classes
We should be able to find that :m
is a member of the intersection, while :n
and :o
are not.
:IntABC owl:intersectionOf (:A :B :C).
:m a :A, :B, :C.
:n a :A, :C.
:o a :D.
Let's start with finding a value ?y
which is a member of the intersection class ?c
:
select distinct ?y
where {
?c owl:intersectionOf ?x .
?x rdf:rest*/rdf:first ?cn .
?y a ?cn }
This returns :m
and :n
, since they are both instances of classes in the intersection.
Now we need to remove values of ?x
which don't match every class in the intersection. To do that, start with finding those that don't match everything in the intersection. This can be found by considering each element in the collection (call it ?d
) and pairing it with every other element in the collection (call these ?d2
). We can then look for the instances of ?d
:
select distinct ?y ?d ?2
where {
?c owl:intersectionOf ?x .
?x rdf:rest*/rdf:first ?d .
?x rdf:rest*/rdf:first ?d2 .
FILTER (?d != ?d2)
?y a ?d }
This returns every class paired with every other class, but only for instances of the first class:
?y 
?d 
?d2 

:m 
:C 
:A 
:m 
:C 
:B 
:m 
:A 
:C 
:m 
:A 
:B 
:m 
:B 
:A 
:m 
:B 
:C 
:n 
:C 
:A 
:n 
:C 
:B 
:n 
:A 
:C 
:n 
:A 
:B 
Note how :n
does not include a ?d
equal to :B
, but it does have ?d2
set to each value. If we remove cases where ?y
is set to the second value, then everything will be removed for ?y
=:m
, since:
 when
?y
is:m
, and:m
is an instance of:A
, then?y
is also an instance of:B
and:C
 when
?y
is:m
, and:m
is an instance of:B
, then?y
is also an instance of:A
and:C
.  when
?y
is:m
, and:m
is an instance of:C
, then?y
is also an instance of:A
and:B
However, when ?y
is :n
not everything is cancelled:
 when
?y
is:n
, and:n
is an instance of:A
, then?y
is and instance of:C
, and that gets removed, but:n
is not an instance of:B
.  when
?y
is:n
, and:n
is an instance of:C
, then?y
is an instance of:A
, and that gets removed, but:n
is not an instance of:B
.
The query to express this is:
select distinct ?y ?d ?d2
where {
?c owl:intersectionOf ?x .
?x rdf:rest*/rdf:first ?d .
?x rdf:rest*/rdf:first ?d2
FILTER (?d != ?d2)
?y a ?d MINUS {?y a ?d2}}
?y 
?d 
?d2 

:n 
:C 
:B 
:n 
:A 
:B 
Now that we've found the values of ?y
that we don't want, they can be removed from the original query:
select distinct ?y
where {
?c owl:intersectionOf ?x .
?x rdf:rest*/rdf:first ?cn .
?y a ?cn
MINUS {
?x rdf:rest*/rdf:first ?d .
?x rdf:rest*/rdf:first ?d2 .
FILTER ( ?d != ?d2 )
?y a ?d MINUS { ?y a ?d2 }
} }
This gives the single solution of :m
?y 

:m 
So the final rule is:
insert { ?y a ?c }
where {
?c owl:intersectionOf ?x .
?x rdf:rest*/rdf:first ?cn .
?y a ?cn
MINUS {
?x rdf:rest*/rdf:first ?d .
?x rdf:rest*/rdf:first ?d2 .
FILTER ( ?d != ?d2 )
?y a ?d MINUS { ?y a ?d2 }
} }
NOTE: This query is for demonstration only. These operations are implemented doubly nested loops, which will not scale at all. It works for small ontologies, but if you try it on something like SNOMED then you will discover that the database will process for over a week. Nonstandard SPARQL operations can do this much more efficiently.
Final Comment
This post was to introduce people to some of the more detailed elements of OWL's representation of Description Logic, explained Valid models are ones in which all interpretations will be true, and how entailment can be made for Consistent statements that lead to correct models for every possible interpretation.
The examples at the end demonstrated how entailment can be limited in the open structures of RDFS, but is more capable for the closed structures described in OWL, always remembering that the model itself always follows the Open World Assumption.
The SPARQL rule for owl:intersectionOf
was me being clever, even if it's useless in the real world due to the scalability issues. ๐ I've been doing this in the real world with code that is outside of SPARQL, but I ought to be able to do it with SPARQL extensions like stardog:list:member
(this could remove one level of loop in the above query, but I think it's possible to do even better).
All of this is to provide background for my next blog post, which I ought to be able to start now that I've finished writing this!
Top comments (0)