The post Adventure Game Sentence Parsing with Compromise appeared first on Kill All Defects.
In this article I’ll show you how to use the Compromise JavaScript library to interpret user input and translate it to a hierarchical sentence graph.
I’ll be using Compromise to interpret player input in an Angular interactive fiction game, but you can use Compromise for many different things including:
- Analyzing text for places, names, and companies
- Building a context-sensitive help system
- Transforming sentences based on tenses and other language rules
Learning Objectives
In this article we’ll cover:
- What Compromise is
- How you can use Compromise to analyze sentences
- Making inferences about sentence structure based on Compromise’s output
Note: this article is an updated and more narrowly scoped version of an older article I wrote on Compromise. This information works with modern versions of Angular as well as modern versions of Compromise.
What is Compromise?
Compromise is a JavaScript library that aims to strike a compromise between speed and accuracy. The goal is a client-side parsing library fast enough to run as you type while still providing relevant results.
In this article I’ll be using Compromise to analyze the command the player typed into a text-based game and build out a Sentence object representing the overall structure of the sentence they entered. This sentence can then be used in other parts of my code to handle various verbs and make the application behave like a game.
Installing and Importing Compromise
To start with Compromise, you first need to install it as a dependency. In my project I run npm i --save compromise to save it as a run-time dependency.
Next, in a relevant Angular service I import Compromise with this line:
import nlp from 'compromise';
Thankfully, Compromise includes TypeScript type definitions, so we have strong typing information available, should we choose to use it.
String Parsing with Compromise
Next let’s look at how Compromise can be used to parse text and manipulate it.
Take a look at my parse method defined below:
Here I use nlp(text) to have Compromise load and parse the input text. From there I could use any of the many methods Compromise offers, but the most useful one for my specific scenario is to call .termList() on the result and see what Compromise has inferred about each word in my input.
Note: the input text doesn’t have to be a single sentence; it could be several paragraphs, and Compromise is designed to work at larger scales should you need to analyze a large quantity of text.
When I log the results of Compromise’s parse operation, I see something like the following:
Note here that each entry in the Term array contains information on a few different things, including:
- text – the raw text that the user typed
- clean – a normalized, lower-case version of the term’s text. This is useful for string comparison
- tags – an object containing various attributes that may be present on the term, based on Compromise’s internal parsing rules.
This tags collection is the main benefit of Compromise that I’ll be exploring in this article (aside from its ability to break a sentence down into individual terms, as we’ve just seen).
Here we see that the tags property of the Open term contains {Adjective: true, Verb: true}. This is because English is a complex language: open can refer to the act of opening something or to an object’s state, such as an open door.
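To make the tags object concrete, here is a minimal TypeScript sketch that checks a term for the coarse parts of speech discussed here. It assumes the term shape described above (text, clean, and tags as a map of booleans). The `TermLike` interface and the `partsOfSpeech` and `isAmbiguous` helpers are hypothetical, not part of Compromise’s API, and the term is written by hand rather than produced by a real parse.

```typescript
// Hypothetical shape of one entry from termList(), limited to the fields
// described above. This is not Compromise's own exported type.
interface TermLike {
  text: string; // raw text as typed
  clean: string; // normalized, lower-case text
  tags: Record<string, boolean>; // e.g. { Adjective: true, Verb: true }
}

// Coarse parts of speech a simple command parser cares about. Compromise
// also attaches finer-grained tags, so we filter down to these.
const PARTS_OF_SPEECH = ["Noun", "Verb", "Adjective", "Adverb"] as const;

function partsOfSpeech(term: TermLike): string[] {
  return PARTS_OF_SPEECH.filter((pos) => term.tags[pos] === true);
}

// A term needs disambiguation if it could be more than one part of speech.
function isAmbiguous(term: TermLike): boolean {
  return partsOfSpeech(term).length > 1;
}

// Hand-written term mirroring the "Open" example above.
const openTerm: TermLike = {
  text: "Open",
  clean: "open",
  tags: { Adjective: true, Verb: true },
};

console.log(partsOfSpeech(openTerm)); // [ 'Verb', 'Adjective' ]
console.log(isAmbiguous(openTerm)); // true
```

Filtering to a fixed list matters because a term can carry several tags that are not competing parts of speech, so simply counting all tags would overstate ambiguity.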
We’ll talk a bit more about this disambiguation later on, but for now focus on Compromise’s ability to recognize English words it knows and to make inferences about words it doesn’t know based on their spelling patterns and adjacent terms.
Compromise’s intelligence in this regard is its main selling point for me in this type of application. Compromise gets me most of the way toward figuring out how the user was trying to structure a sentence. This lets me filter out words I don’t care about and avoid trying to codify the entire English language in a simple game project.
Adding an Abstraction Layer
If you scroll back up to my parse method, you’ll note it has a : Sentence return type specified.
This is because I believe in adding abstraction layers around third party code whenever possible. This has a number of benefits:
- If third party behavior or signatures change significantly, you only need to adapt signatures in a few places since everything else relies on your own object’s signature
- If you need to change out an external dependency with another, you just need to re-implement the bits that lead up to the abstraction layer
- Wrapping other objects in my own makes it easier for me to define new methods and properties that make working with that code easier
For Compromise, I chose to implement two main classes, a Word class and a Sentence class:
I won’t dwell on the details of either implementation except to say that they wrap Compromise’s Term class while allowing me to do integrated validation and structural analysis of the entire sentence.
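The classes themselves aren’t reproduced here, so the following is only a sketch of what such a wrapper could look like. The `RawTerm` interface, the `TermParser` type, and the `Word`/`Sentence` members are assumptions. The idea is that only `parse` ever touches the third-party output, while everything else works with the game’s own types. The parser is passed in as a parameter so the sketch runs without Compromise installed.

```typescript
// Hypothetical minimal shape of one entry from termList(): only the
// fields discussed in this article, declared locally for the sketch.
interface RawTerm {
  text: string;
  clean: string;
  tags: Record<string, boolean>;
}

// Our wrapper around a single term; the rest of the game only sees Word.
class Word {
  private readonly term: RawTerm;

  constructor(term: RawTerm) {
    this.term = term;
  }

  get text(): string { return this.term.text; }
  get clean(): string { return this.term.clean; }
  get isVerb(): boolean { return this.hasTag("Verb"); }
  get isNoun(): boolean { return this.hasTag("Noun"); }

  hasTag(tag: string): boolean {
    return this.term.tags[tag] === true;
  }
}

// Our wrapper around the whole input, treated as one sentence.
class Sentence {
  readonly text: string;
  readonly words: Word[];

  constructor(text: string, words: Word[]) {
    this.text = text;
    this.words = words;
  }

  get verbs(): Word[] { return this.words.filter((word) => word.isVerb); }
  get nouns(): Word[] { return this.words.filter((word) => word.isNoun); }
}

// Anything that turns text into an object exposing termList().
// Assumption: Compromise's nlp function can fill this role.
type TermParser = (text: string) => { termList(): RawTerm[] };

// The only place that touches the third-party parser's output.
function parse(text: string, nlp: TermParser): Sentence {
  const words = nlp(text).termList().map((term) => new Word(term));
  return new Sentence(text, words);
}

// Usage with a hand-written stand-in for Compromise (tags are illustrative).
const fakeNlp: TermParser = () => ({
  termList: () => [
    { text: "Open", clean: "open", tags: { Verb: true, Adjective: true } },
    { text: "the", clean: "the", tags: { Determiner: true } },
    { text: "door", clean: "door", tags: { Noun: true } },
  ],
});

const sentence = parse("Open the door", fakeNlp);
console.log(sentence.verbs.map((word) => word.text)); // [ 'Open' ]
console.log(sentence.nouns.map((word) => word.clean)); // [ 'door' ]
```

In a real service you would pass Compromise’s `nlp` function instead of `fakeNlp`, adjusting `RawTerm` if Compromise’s own typings differ.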
Validating Sentences
Once I have a Sentence composed of a series of Word objects, I can make some inferences about word relationships based on how imperative (command-based) sentences are structured in English.
Note that for the purposes of my application I treat all input as a single sentence regardless of punctuation. My validation rules catch multi-sentence input fairly easily, so I don’t see a need to detect sentence boundaries.
Specifically, I validate that the first word in the sentence is a verb. This only makes sense for imperative sentences such as Eat the Fish or Walk North, but those are the kinds of sentences we expect in a game like this.
Next I validate that the sentence contains only a single verb (a Term with a Verb tag). Anything with two or more is too complex for the parser to handle.
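Both checks fit in a few lines. This sketch assumes a hypothetical `TaggedWord` shape carrying Compromise-style tags and a made-up `validateCommand` function that returns a message for the player, or `null` when the input passes; the messages are placeholders.

```typescript
// Hypothetical minimal word shape for this sketch.
interface TaggedWord {
  text: string;
  tags: Record<string, boolean>;
}

const isVerb = (word: TaggedWord): boolean => word.tags["Verb"] === true;

// Returns a message for the player, or null when the input passes both
// structural checks: it starts with a verb and contains exactly one verb.
function validateCommand(words: TaggedWord[]): string | null {
  const first = words[0];
  if (first === undefined) {
    return "Please enter a command.";
  }
  if (!isVerb(first)) {
    return `I expected a command starting with a verb, not "${first.text}".`;
  }
  if (words.filter(isVerb).length > 1) {
    return "Please try one action at a time.";
  }
  return null;
}

// Hand-written examples (tags are illustrative, not real parser output).
console.log(validateCommand([
  { text: "Walk", tags: { Verb: true } },
  { text: "North", tags: { Noun: true } },
])); // null
console.log(validateCommand([{ text: "North", tags: { Noun: true } }]));
// I expected a command starting with a verb, not "North".
```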
Once these checks are done, I can start to analyze words in relation to each other.
Making Inferences about Sentences
I operate under the assumption that the sentence is mainly oriented around one verb and zero or more nouns.
I then loop over each word in the sentence from right to left and apply the following rules:
- If the word is an adverb, I associate it with the verb
- If the word is not a noun, verb, or adverb, I associate it with the last encountered noun, if any.
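Here is a minimal sketch of those two rules, assuming each word exposes Compromise-style tags. The `CommandModel` and `NounPhrase` shapes and the `inferStructure` name are hypothetical. In this sketch nouns are kept in reading order, and words with no noun to their right are simply dropped, which is one possible choice rather than necessarily the game’s.

```typescript
// Minimal hypothetical word shape for this sketch.
interface Word {
  text: string;
  tags: Record<string, boolean>;
}

// A noun together with the words that modify it ("the", "red", ...).
interface NounPhrase {
  noun: Word;
  modifiers: Word[];
}

// The hierarchical result: one verb, its adverbs, and zero or more nouns.
interface CommandModel {
  verb: Word | undefined;
  adverbs: Word[];
  nouns: NounPhrase[];
}

const has = (word: Word, tag: string): boolean => word.tags[tag] === true;

function inferStructure(words: Word[]): CommandModel {
  const model: CommandModel = { verb: undefined, adverbs: [], nouns: [] };
  let lastNoun: NounPhrase | undefined = undefined;

  // Walk right to left, so each modifier is seen after the noun it precedes.
  for (const word of words.slice().reverse()) {
    if (has(word, "Adverb")) {
      model.adverbs.unshift(word); // rule 1: adverbs belong to the verb
    } else if (has(word, "Verb")) {
      model.verb = word;
    } else if (has(word, "Noun")) {
      lastNoun = { noun: word, modifiers: [] };
      model.nouns.unshift(lastNoun); // keep nouns in reading order
    } else if (lastNoun !== undefined) {
      lastNoun.modifiers.unshift(word); // rule 2: nearest noun to the right
    }
  }
  return model;
}

// Hand-written sample for "Open the red door quickly" (tags are illustrative).
const model = inferStructure([
  { text: "Open", tags: { Verb: true, Adjective: true } },
  { text: "the", tags: { Determiner: true } },
  { text: "red", tags: { Adjective: true } },
  { text: "door", tags: { Noun: true } },
  { text: "quickly", tags: { Adverb: true } },
]);
console.log(model.verb?.text); // Open
console.log(model.adverbs.map((w) => w.text)); // [ 'quickly' ]
console.log(model.nouns.map((p) => p.modifiers.map((w) => w.text))); // [ [ 'the', 'red' ] ]
```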
The full method can be seen here:
Once that’s done, I have a hierarchical model of a sentence. For ease of illustration, here is a debug view of a sample sentence:
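A plain-text dump is one simple way to produce a similar debug view. The sketch below is hypothetical: it assumes the hierarchy has already been reduced to strings (one verb, its adverbs, and noun phrases with their modifiers), and the `describe` function and its output format are illustrations only.

```typescript
// Hypothetical, already-simplified view of the sentence hierarchy.
interface NounPhraseView {
  noun: string;
  modifiers: string[];
}

interface SentenceView {
  verb: string;
  adverbs: string[];
  nouns: NounPhraseView[];
}

// Render the hierarchy as indented lines: verb at the root, then its
// adverbs and nouns, then each noun's modifiers one level deeper.
function describe(view: SentenceView): string {
  const lines: string[] = [`Verb: ${view.verb}`];
  for (const adverb of view.adverbs) {
    lines.push(`  Adverb: ${adverb}`);
  }
  for (const phrase of view.nouns) {
    lines.push(`  Noun: ${phrase.noun}`);
    for (const modifier of phrase.modifiers) {
      lines.push(`    Modifier: ${modifier}`);
    }
  }
  return lines.join("\n");
}

// Hand-written sample for "Open the red door quickly".
console.log(describe({
  verb: "Open",
  adverbs: ["quickly"],
  nouns: [{ noun: "door", modifiers: ["the", "red"] }],
}));
// Verb: Open
//   Adverb: quickly
//   Noun: door
//     Modifier: the
//     Modifier: red
```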
Next Steps
With parsing in place, the sentence holds a fairly rich picture of its own structure. This doesn’t mean that the player’s sentence makes logical or even grammatical sense, or that it refers to anything present in the game world.
The sentence can, however, be passed to a verb handler for the command entered, which can then try to make sense of it and come up with an appropriate reply. That is out of the scope of this article, so stay tuned for a future article on game state management.
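As a rough illustration of that hand-off, a lookup table keyed on the verb’s clean text is one common way to route commands. Everything here (`ParsedCommand`, `VerbHandler`, `dispatch`, and the reply strings) is hypothetical and not the game’s actual design.

```typescript
// Hypothetical, simplified view of a validated sentence: the verb's clean
// text plus the clean text of each noun it refers to.
interface ParsedCommand {
  verb: string;
  nouns: string[];
}

type VerbHandler = (command: ParsedCommand) => string;

// Hypothetical handlers. A real game would consult its world state here.
const handlers = new Map<string, VerbHandler>([
  ["open", (command) => {
    const target = command.nouns[0];
    return target === undefined ? "Open what?" : `You try to open the ${target}.`;
  }],
  ["look", () => "You look around."],
]);

// Route the command to the handler registered for its verb.
function dispatch(command: ParsedCommand): string {
  const handler = handlers.get(command.verb);
  return handler === undefined
    ? `I don't know how to "${command.verb}".`
    : handler(command);
}

console.log(dispatch({ verb: "open", nouns: ["door"] })); // You try to open the door.
console.log(dispatch({ verb: "dance", nouns: [] })); // I don't know how to "dance".
```

Keeping each verb in its own handler means adding a new command is a matter of registering one more entry rather than growing a single conditional.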