## DEV Community Paula Gearon

Posted on • Updated on

# Classification

My previous post showed how to use a reasoner like Pellet to make some interesting inferences about stated data. In that case, it demonstrated how a compound class could be determined with insufficient data to identify which of its components was satisfied.

Interesting though it was, this was a contrived question made against data that was specifically designed for it, and required a complex reasoner that is limited in its ability to scale. Less contrived scenarios are more interesting, precisely because they can be applied to real-world data at scale.

This leads to the question of what sorts of inferences are valid to make in OWL?

## Specification

The OWL 2 specification describes the vocabulary and how this maps into Description Logic. Most of it is rather obtuse due to the need to provide a precise mathematical definition.

Fortunately, a number of the concepts are relatively straightforward to follow. Indeed, many of the concepts can be codified into rules to be applied mechanically.

## Entailment

The basic structure of most rules is a description of conditions that must be met by existing data, and new data that may be asserted. Mathematical logic describes this using the Horn Clause, and this has become the standard tool for logic programming systems as well.

As an example, we can make a statement saying that all people are mortal, so if an individual is a person, they must be mortal:

``````Mortal(X) :- Person(X).
``````

Here, I have described `Person` and `Mortal` as classes, and `X` as an individual.

To read a Horn clause, the outcome is described first, given the conditions that come after the `:-` symbol. So the above says, "X is Mortal, if X is a Person".

The outcome of the clause is referred to as the Head of the clause, while the conditions are referred to as the Body.

Multiple necessary conditions may be described in the body by separating them with commas. For instance, if a person has a child, then they are a parent:

``````Parent(X) :- Person(X), hasChild(X,Y).
``````

Note that the body now includes an entity `Y` who is not referred to in the head.

A more mathematical approach for this statement might be:

``````∀x x∊Person: ∃y (x,y)∊hasChild
``````

i.e. For all `x` where `x` is a person, there exists a `y` where relationship of `x,y` is a member of the role-relationship `hasChild`. Or, less strictly, for `x` where `x` is a person, there exists a `y` where `x.hasChild.y`

To be honest, a lot of these notations are merely very precise and tedious way to express concepts that don't seem too difficult. But the precision allow mathematicians to eventually getting around to saying more interesting things, while proving that they got it right.

## RDF and SPARQL

### Data

The Resource Description Framework (RDF) and RDF Schema (RDFS) were designed to describe classes and data in a form that can be managed by a computer.

To declare the above classes we can say that the they are each instances of the schema type called `Class`, using Turtle syntax:

``````@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix : <http://quoll.gearon.org/data/demo.ttl#> .

:Person a rdfs:Class .
:Mortal a rdfs:Class .
:Parent a rdfs:Class .
``````

The prefixes here will be used throughout this post. Strictly speaking, they should appear with each block of Turtle and each SPARQL query, but I will elide them for brevity.

We can then use these classes to declare instances of data, say `:mary` and `:jane`:

``````:mary a :Person .
:jane a :Person .
``````

We can also declare the role `:hasChild`, and connect Mary and Jane with it:

``````:hasChild a rfds:Property .
:mary :hasChild :jane .
``````

There is a whole technology stack that can start working with this already, but for now, I'm going to stick to the Horn Clauses we've already discussed.

### Querying

#### Mortality

Let's consider the first clause:

``````Mortal(X) :- Person(X).
``````

This means we are looking for all values of `X` that have a type of `Person`. Once we have those values for `X`, we want to declare that each `X` has a type of `Mortal`. We can create a graph based on this query:

``````construct { ?x a :Mortal }
where { ?x a :Person }
``````

This would result in:

``````:mary a :Mortal .
``````

Alternatively, we could assert all this back in with the original data by saying `insert` rather than `construct`:

``````insert { ?x a :Mortal }
where { ?x a :Person }
``````

With either expression, the `where` clause describes the conditions for the body of the Horn Clause, and the `insert` or `construct` describes the head of the clause.

#### Parenthood

Similarly, we can look at the second clause again in order to convert it:

``````Parent(X) :- Person(X), hasChild(X,Y).
``````

The two expressions in body will result in a pair of patterns in the `where` clause of the query:

``````construct { ?x a :Parent }
where { ?x a :Person . ?x :hasChild ?y }
``````

This would result in:

``````:mary a :Parent .
``````

## RDFS Entailment

The above rules are fine for a specific application, but they aren't generally useful. Instead, we'd like to be able to describe data with greater expressivity, and identify rules that can be applied more generally.

In the early days of the Semantic Web, the W3C had planned on releasing RDF as a data model, along with an expressive language to describe that data. There was a lot of progress with prior systems like DAML, OIL and DAML+OIL, but getting the semantics of these systems fully defined and correct is a very difficult task, leading to the creation of each of these systems, which all attempted to correct the issues of their predecessors. These efforts culminated in the Web Ontology Language (OWL) which was formally released along with the first formal release of RDF in 2004.

Prior to the formal releases of these standards, there was already a lot of RDF in use, which meant that there was a need to describe that data. While the complicated details of ontologies were being worked out, the working groups published a minimal vocabulary that could describe a taxonomy of classes and roles. This was first published as a draft for RDF Schema, or RDFS.

Alongside RDFS came a set of "semantics" (a word that means "meaning") which described what the vocabulary meant when it said things. A blog post isn't the place to get into what I mean when I say, "Class" or "Role" (also called a "property"), but I expect that if you're reading this then you may already have an idea. Instead, I'm going to look at a few other elements of RDFS.

### Subclass Descriptions

The first concept I'm going to consider is the "subclass". If individuals can be classified into a class, then a subclass is a refinement on that classification, resulting in a smaller group.

Considering the classes we had earlier, we can see that all instances of `:Person` must be in the class `:Mortal`, but there is nothing that says that all `:Mortal`s must in the the class of `:Person`. So `:Person` is at most equal to `:Mortal`, and most likely a smaller group. One way to write this mathematically is:

``````Person ⊑ Mortal
``````

This is essentially the same notation as "subset" ⊆, and it means the same sort of thing, just in relation to classes.

RDFS expresses the same thing with the `rdfs:subClassOf` relationship:

``````:Person rdfs:subClassOf :Mortal .
``````

In a similar way, all Parents are Persons, but not all people are parents:

``````Parent ⊑ Person
``````
``````:Parent rdfs:subClassOf :Person .
``````

### Domain and Range Descriptions

Whether we use the term "role", "property", "predicate" or "attribute", we are referring to some kind of relationship between things. Once we introduce relationships it starts being useful to declare what a type of things can be related to each other. For instance, `hasChild` is an interesting relationship to have between people, but is not typically associated with something like an item of furniture.

RDFS uses the term "domain" to describe the class of things that a property can be applied to. For instance, the `hasChild` property can be applied to the class of `Parent`. This is declared with:

``````:hasChild rdfs:domain :Parent .
``````

In the same way, the things that a property can refer to are referred to as the "range":

``````:hasChild rdfs:range :Person .
``````

Note that only parents have children, but any person can be someone's child.

### Entailment

We now have enough to start looking at some of the entailments defined by RDFS.

Given that RDF uses an Open World Assumption (OWA), RDFS never considers applying a property to something whose type does not match to be an error. One reason for this is because the data model can always have new data added, and it is presumed that this data will be consistent with whatever exists.

One effect of this is that we can sometimes derive what that new data must be. For instance, say we introduce a new person:

``````:jane :hasChild :kathy .
``````

Until now, `:jane` has only been a `:Person`, but because `:hasChild` has a domain of `:Parent` we now know that Jane is also a parent. Similarly, we haven't seen Kathy before, but the appearance in this statement tells use that she is a person.

We can codify these with rules:

``````Parent(X) :- hasChild(X,Y).
Person(Y) :- hasChild(X,Y).
``````

But this is where having a vocabulary for describing the system becomes useful. Instead of describing rules for each property, we can base the rules on the vocabulary and create a more general form:

``````C(X) :- rdfs:domain(P,C), P(X,Y).
D(Y) :- rdfs:range(P,D), P(X,Y).
``````

Having variable predicates like this is often considered 2nd order logic, but in this case it's not really. It just looks like it is because of the syntax we use. Instead, I'll use the term that my supervisor called it: 1.5th order logic.

To explain why this is not 2nd order logic, consider it this way: We may represent `(subject,predicate,object)` as `predicate(subject,object)` but a true Predicate-Logic representation would be with ternary predicates like: `statement(subject,predicate,object)`. This is not as useful for our purposes, and is aesthetically less inviting, but nevertheless more correct.

#### Domain and Range Entailment

The above rules can also be converted into SPARQL:

``````construct { ?x a ?c }
where { ?p rdfs:domain ?c . ?x ?p ?y }
``````
``````construct { ?y a ?d }
where { ?p rdfs:range ?d . ?x ?p ?y }
``````

The statements created from these queries are considered valid entailments from RDFS:

``````:mary a :Parent .
:jane a :Parent .
:jane a :Person .
:kathy a :Person .
``````

Note that some of the entailed statements already exist. This is fine, as graphs are considered to be a set of edges, so duplicates are ignored.

#### Subclass Entailment

Subclass entailment can also be described using queries. In this case, there are two entailments to consider.

We know that:

• `:Person` is a subclass of `:Mortal`, therefore all instances of `:Person` will be an instance of `:Mortal`.
• `:Parent` is a subclass of `:Person`, therefore all instances of `:Parent` will be an instance of `:Person`.

Putting this together:

• Instances of `:Parent` must be instances of `:Person`.
• Instances of `:Person` must be instances of `:Mortal`.
• Therefore: instances of `:Parent` must be instances of `:Mortal`.

The formal term for this is that `rdfs:subClassOf` is a transitive property. This is something that OWL describes explicitly, but I'll leave that for later. More succinctly:

``````A ⊑ B ⊑ C  ⊢  A ⊑ C
``````

Or as a Horn Clause:

``````rdfs:subClassOf(A,C) :- rdfs:subClassOf(A,B), rdfs:subClassOf(B,C).
``````

This can be evaluated in SPARQL with:

``````construct { ?a rdfs:subClassOf ?c }
where { ?a rdfs:subClassOf ?b . ?b rdfs:subClassOf ?c }
``````

The next part is considering the instances. We've already stated that if an entity is an instance of a class, then it will also be an instance of whatever that class is a subclass of (also called the superclass, just like in software development):

``````construct { ?x a ?d }
where { ?x a ?c . ?c rdfs:subClassOf ?d }
``````

With all of these rules in place, suddenly we have:

``````:Parent rdfs:subClassOf :Mortal .
:mary a :Parent .
:mary a :Mortal .
:jane a :Parent .
:jane a :Mortal .
:kathy a :Person .
:kathy a :Parent .
:kathy a :Mortal .
``````

Incidentally, both `rdfs:domain` and `rdfs:range` have a domain of `rdf:Property` and a range of `rdfs:Class`. So a full entailment regime means that we don't need to declare that `:hasChild` is a property, and we don't need to explicitly declare that any of our classes a `rdfs:Class`. I will sometimes include these declaration for the sake of clear documentation, but they're not needed.

### More Entailment

The patterns of RDFS entailment are all described in the RDF Semantics specification, and collected in the section 9.2 of that document.

The rules described above correspond to several of the RDFS Entailment patterns:

``````# rdfs2
construct { ?x a ?c } where { ?p rdfs:domain ?c . ?x ?p ?y }
# rdfs3
construct { ?y a ?d } where { ?p rdfs:range ?d . ?x ?p ?y }
# rdfs11
construct { ?a rdfs:subClassOf ?c } where { ?a rdfs:subClassOf ?b . ?b rdfs:subClassOf ?c }
# rdfs9
construct { ?x rdfs:subClassOf ?d } where { ?x a ?c . ?c rdfs:subClassOf ?d }
``````

Using the patterns above, the process for converting all of the entailment patterns to executable rules takes very few steps.

``````# rdfs1
construct { ?aaa a rdfs:Datatype } where {?xxx ?yyy ?ddd FILTER isLiteral(?ddd) BIND (datatype(?ddd) AS ?aaa)}
# rdfs2
construct { ?yyy a ?xxx } where { ?aaa rdfs:domain ?xxx . ?yyy ?aaa ?zzz }
# rdfs3
construct { ?zzz a ?xxx } where { ?aaa rdfs:range ?xxx . ?yyy ?aaa ?zzz }
# rdfs4a
construct { ?xxx a rdfs:Resource } where { ?xxx ?aaa ?yyy }
# rdfs4b
construct { ?yyy a rdfs:Resource } where { ?xxx ?aaa ?yyy FILTER !isLiteral(?yyy)}
# rdfs5
construct { ?xxx rdfs:subPropertyOf ?zzz } where { ?xxx rdfs:subPropertyOf ?yyy . ?yyy rdfs:subPropertyOf ?zzz }
# rdfs6
construct { ?xxx rdfs:subPropertyOf ?xxx } where { ?xxx a rdf:Property }
# rdfs7
construct { ?xxx ?bbb ?yyy } where { ?aaa rdfs:subPropertyOf ?bbb . ?xxx ?aaa ?yyy }
# rdfs8
construct { ?xxx rdfs:subClassOf rdfs:Resource } where { ?xxx a rdfs:Class }
# rdfs9
construct { ?zzz a ?yyy } where { ?xxx rdfs:subClassOf ?yyy . ?zzz a ?xxx }
#rdfs10
construct { ?xxx rdfs:subClassOf ?xxx } where { ?xxx a rdfs:Class }
# rdfs11
construct { ?xxx rdfs:subClassOf ?zzz } where { ?xxx rdfs:subClassOf ?yyy . ?yyy rdfs:subClassOf ?zzz }
# rdfs12
construct { ?xxx rdfs:subPropertyOf rdfs:member } where { ?xxx a rdfs:ContainerMembershipProperty }
# rdfs13
construct { ?xxx rdfs:subClassOf rdfs:Literal } where { ?xxx a rdfs:Datatype }
``````

Executing them all together is a task for system that can coordinate the rules, and is typically tied to a specific database implementation though this is not necessary (e.g. Naga, though I have yet to port this to SPARQL).