Brian Kirkpatrick

Posted on • Originally published at github.com

[12/52] Fundamentals of Knowledge Engineering: Introductory Graphs

Today represents the first in a long-promised transition of the series away from purely software development problems towards more generalist engineering topics. And I'm excited to finally get going! It's already most of the way through July, so I have a lot of catching up to do if I want to hit 52 "episodes" by the end of the year.

Latin Quizzer

When I was a senior in high school, I took Latin 1 & 2 to see if it would help me boost the verbal section of my SAT score. Most of my classmates had started Latin in 6th grade, though, so I was way behind and needed a way to catch up. I wound up coding my own flashcard-like quiz app in GW-BASIC (I was just starting to learn C/C++ at the time). Even though it was just a quick "this term translates into what other term?" string evaluation--long before Duolingo was a thing!--it was incredibly useful.

Fundamentals of Engineering

I'd like to take a similar approach as we launch on our journey through the fundamentals of engineering. Let's think about writing the logic for a "quiz generator" application of some kind. How do we need to break down a topic area (whether it's electronics, or structures, or anything else)? We need some representation that lets us "generate" problems. There are a couple of useful ways to think about this.

Problem Generation

We'll consider a few different "types" of problems. Each problem will have a combination of "concepts" and test the student's knowledge of these concepts and their "relationships". The easiest way to think about this is with a specific equation:

P V = n R T

In this example, we have five "concepts" (pressure, volume, amount of gas in moles, the gas constant, and temperature). These concepts are related to one another by a specific "relationship". In this case, this relationship constrains those concepts (much like we saw in our earlier "Spicy Math" video). Five variables (or "degrees of freedom") in a problem mean we would automatically supply values for four of them (randomly generating the measurements and looking up the constant), then ask the student to use knowledge of this relationship to find the unique solution for the final degree of freedom.
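As a rough sketch of that idea, the snippet below picks one variable of the ideal gas law as the unknown, draws values for the others, and computes the expected solution. The function name `generateProblem`, the `ranges` table, and the value ranges themselves are assumptions made for illustration. Since R is a constant, only three values are actually drawn at random.

```javascript
// Hypothetical sketch: names and value ranges are illustrative assumptions.
const R = 8.314; // gas constant, J/(mol*K)

// Closed-form rearrangements of P V = n R T, one per solvable variable.
const solveFor = {
  P: ({ V, n, T }) => (n * R * T) / V,
  V: ({ P, n, T }) => (n * R * T) / P,
  n: ({ P, V, T }) => (P * V) / (R * T),
  T: ({ P, V, n }) => (P * V) / (n * R),
};

// Assumed "reasonable" ranges for each variable (SI units).
const ranges = {
  P: [5e4, 2e5],   // Pa
  V: [0.001, 0.1], // m^3
  n: [0.1, 10],    // mol
  T: [250, 400],   // K
};

// Pick the unknown, draw every other variable uniformly from its range,
// and compute the value the student is expected to find.
function generateProblem(rng = Math.random) {
  const names = Object.keys(solveFor);
  const unknown = names[Math.floor(rng() * names.length)];
  const givens = {};
  for (const name of names) {
    if (name === unknown) continue;
    const [lo, hi] = ranges[name];
    givens[name] = lo + rng() * (hi - lo);
  }
  return { unknown, givens, solution: solveFor[unknown](givens) };
}

const problem = generateProblem();
console.log(problem);
```

Passing the random source in as a parameter is just a convenience here: it makes it easy to swap in a seeded generator if you want reproducible quizzes.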

Knowledge Graphs

When we talk about concepts and relationships, you've likely formed a picture of a graph in your head. Keep in mind we will have different "types" of relationships we want to explore (not just equations). Nonetheless, this idea of a "knowledge graph" is a powerful way to model content within a specific topic area.

But we can't "connect" specific nodes to other nodes because those connections (or relationships) themselves are uniquely modeled areas of our knowledge. Instead, we have multiple "sets" of nodes--at least two, even if we just focus on equations. This type of graph is commonly called a "bipartite graph", in which connections exist between sets of nodes (but not within the sets themselves).

Measurements as Nodes

One "set" of nodes in our graph will be the "concepts" themselves. Let's focus on our ideal gas equation. The "concepts" in that case were mostly specific measurements, like temperature and volume. (The exception, of course, was the ideal gas constant.) So, this defines the first "set" of nodes in our graph. We might model some of these measurements like so:

{
  "id": "P",
  "name": "pressure",
  "unit": "Pa",
  "type": "variable"
}

In the case of the ideal gas constant, we might want a slightly different model:

{
  "id": "R",
  "name": "Gas constant",
  "unit": "J/(mol*K)",
  "type": "constant",
  "value": 8.314
}

For easier lookup and parsing, let's assume these models are captured in a CSV table our app will consume. Empty columns will be unused for nodes that don't have relevant properties (pressure, for example, has no constant value).
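To make the CSV idea a little more concrete, here is a minimal sketch of loading such a table into a lookup keyed by `id`. The column layout and the `parseNodes` helper are assumptions for illustration, and the parser deliberately ignores quoted fields and embedded commas.

```javascript
// Hypothetical node table; the column layout is an assumption for illustration.
const csv = `id,name,unit,type,value
P,pressure,Pa,variable,
R,Gas constant,J/(mol*K),constant,8.314`;

// Minimal parser: assumes no quoted fields or embedded commas.
function parseNodes(text) {
  const [header, ...rows] = text.trim().split("\n");
  const columns = header.split(",");
  const nodes = new Map();
  for (const row of rows) {
    const cells = row.split(",");
    const node = {};
    columns.forEach((col, i) => {
      const cell = (cells[i] ?? "").trim();
      // Empty cells are simply left off the node object.
      if (cell !== "") node[col] = col === "value" ? Number(cell) : cell;
    });
    nodes.set(node.id, node);
  }
  return nodes;
}

const nodes = parseNodes(csv);
console.log(nodes.get("R").value);      // 8.314
console.log("value" in nodes.get("P")); // false
```

A real table with commas inside fields would call for a proper CSV library rather than a bare `split(",")`.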

There may be other "concepts" we want to model as nodes. For example, we may want to assert that a specific measurement "has a" specific type of unit. Or, a unit may "belong to" a specific set, like the set of all base SI units. In each relationship, both "ends" will be concepts, and their relationships will be specific kinds of connections linking those concepts.

[Figure: basic sketch of a knowledge graph]

Relationships as Equations

The next "set" of nodes we will want to consider are the relationships themselves. In our ideal gas equation example, the equation is a relationship that constrains (connects) specific concepts (or variables and constants). How would we model a relationship like this? Maybe like this:

{
  "id": "ideal_gas_law",
  "type": "equation",
  "equation": "PV=nRT",
  "involves": "P,V,n,R,T"
}

We are ignoring, for the moment, the semantic issue of evaluating these equations symbolically. Instead, we'll assume the relationship can be "evaluated" at runtime to determine if the user's answer was correct. (This is one reason we include a specific expansion of the variables this equation involves.) Some degree of variation or symbolic manipulation could be introduced, though, in order to mix up which specific values must be "solved for".
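One way to picture the bipartite structure is to build the edges straight from the `involves` field, so that every edge links a relationship node to a concept node and never two nodes from the same set. The `buildGraph` helper and the two lookup maps below are hypothetical names, sketched under the assumption that `involves` is a comma-separated list of concept ids.

```javascript
// Concept nodes (one set) and relationship nodes (the other set).
const concepts = ["P", "V", "n", "R", "T"];
const relationships = [
  { id: "ideal_gas_law", type: "equation", equation: "PV=nRT", involves: "P,V,n,R,T" },
];

// Build edges that only ever link a relationship to a concept.
function buildGraph(concepts, relationships) {
  const byConcept = new Map(concepts.map((id) => [id, []]));
  const byRelationship = new Map();
  for (const rel of relationships) {
    const ids = rel.involves.split(",").map((s) => s.trim());
    for (const id of ids) {
      if (!byConcept.has(id)) throw new Error(`Unknown concept: ${id}`);
      byConcept.get(id).push(rel.id);
    }
    byRelationship.set(rel.id, ids);
  }
  return { byConcept, byRelationship };
}

const graph = buildGraph(concepts, relationships);
console.log(graph.byConcept.get("P"));                    // [ 'ideal_gas_law' ]
console.log(graph.byRelationship.get("ideal_gas_law"));   // [ 'P', 'V', 'n', 'R', 'T' ]
```

Keeping both directions of the lookup makes it cheap to ask "which relationships involve this concept?" later on, when choosing how to link problems together.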

Memberships

Just like there are other concepts we may want to model, there are other kinds of relationships we want to consider. For example, we may want to assert the membership of R in the set of all physical constants. We may also want to specify that the concept of "milliliters" is related to the concept of its corresponding unprefixed unit, liters, by a specific conversion factor.

Memberships are a key kind of relationship, but they expand the "sets" of relationships we model between concepts beyond just two. Let's assume for the moment that this adds only one degree of abstraction.

We can also recursively expand upon memberships, though, and all other relationships for that matter. In fact, the "difficulty" of a problem could be determined by how many recursive expansions (additional relationships) the student must consider when determining the correct solution. Maybe pressure and volume are not defined, for example, but must be derived from some other relationship (like an adiabatic expansion).
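Here is one hedged sketch of what "expansion" could look like as a planning step. It assumes a second, hypothetical relationship (P = F/A) alongside the ideal gas law, and the `planProblem` and `difficulty` names are invented for illustration. Each variable other than the unknown is either handed to the student as a given or expanded through another relationship that involves it.

```javascript
// Hypothetical relationship table: each entry lists the variables it links.
const relationships = [
  { id: "pressure_definition", involves: ["P", "F", "A"] },
  { id: "ideal_gas_law", involves: ["P", "V", "n", "T"] }, // R is a constant
];

// Plan a problem: every variable other than the unknown is either given a
// drawn value or "expanded" through another unused relationship that involves it.
function planProblem(relId, unknown, depth, used = new Set()) {
  const rel = relationships.find((r) => r.id === relId);
  used.add(relId);
  const plan = { relationship: relId, solveFor: unknown, givens: [], expansions: [] };
  for (const v of rel.involves) {
    if (v === unknown) continue;
    const next = relationships.find((r) => !used.has(r.id) && r.involves.includes(v));
    if (depth > 0 && next) {
      plan.expansions.push(planProblem(next.id, v, depth - 1, used));
    } else {
      plan.givens.push(v);
    }
  }
  return plan;
}

// Difficulty here is simply the number of relationships the student must chain.
function difficulty(plan) {
  return 1 + plan.expansions.reduce((sum, p) => sum + difficulty(p), 0);
}

const plan = planProblem("pressure_definition", "A", 1);
console.log(JSON.stringify(plan, null, 2));
console.log(difficulty(plan)); // 2
```

With a depth of 0 the same call would simply list P and F as givens, which is the one-relationship version of the problem.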

Problem Sets

Let's think about specific problems we might want to generate. Depending on the problem, different relationships will be drawn and presented in different ways.

Solve an Equation

We've mostly focused on a "solve this equation" problem so far. This would "draw" (at least one) equation from our table of relationships. It would then randomly generate values for all but one variable. (This is also implicitly true for "units" type problems, even though the relationship in that case is not an equation but a factor conversion--unless you're talking about Fahrenheit and Celsius conversions, of course!) The "givens" would be presented to the user, who would be asked to determine the "missing" value. This value would then be plugged into an evaluation (literally an eval() call within JavaScript, perhaps) of both sides of the equation relationship. These sides would then be compared to each other to determine, within some tolerance, whether the user was correct.
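A minimal sketch of that evaluation step might look like the following. It assumes the equation is stored with explicit JavaScript operators ("P * V = n * R * T" rather than "PV=nRT"), and the `evaluateSide` and `checkAnswer` helpers are hypothetical. It uses the `Function` constructor, which behaves much like eval() and should only ever see trusted equation strings.

```javascript
// Assumes the equation string uses explicit JavaScript operators.
const relationship = {
  id: "ideal_gas_law",
  equation: "P * V = n * R * T",
  involves: "P,V,n,R,T",
};

// Evaluate one side of the equation against a table of variable values.
// new Function behaves like eval(), so only use it on trusted equation strings.
function evaluateSide(expression, names, values) {
  const fn = new Function(...names, `return (${expression});`);
  return fn(...names.map((name) => values[name]));
}

// Compare both sides using a relative tolerance.
function checkAnswer(relationship, values, tolerance = 1e-2) {
  const names = relationship.involves.split(",");
  const [lhs, rhs] = relationship.equation.split("=").map((side) => side.trim());
  const left = evaluateSide(lhs, names, values);
  const right = evaluateSide(rhs, names, values);
  return Math.abs(left - right) / Math.abs(right) < tolerance;
}

const givens = { V: 0.0224, n: 1, R: 8.314, T: 273.15 };
console.log(checkAnswer(relationship, { ...givens, P: 101383 })); // true
console.log(checkAnswer(relationship, { ...givens, P: 50000 }));  // false
```

Listing the variables in `involves` is what lets the evaluator know which names to bind before evaluating each side.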

Determine Set Membership

We could also use set memberships to generate "multiple choice"-style questions. Given a particular assertion, several values would be presented, some subset of which would be true for the given assertion. For example:

The following are base SI units:
- [ ] Milliliters
- [ ] Pascal
- [ ] Yards
- [ ] Metric Tons

Given a set relationship, we would draw upon multiple "concepts" (some of which would be directly related by membership, others of which would not be). If only one is directly related, we would have a "check one"-style question instead, which is considerably easier.
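As a sketch of how such a question could be assembled, the snippet below draws some concepts that belong to a set and some that do not, then shuffles them together. The membership table, the set names, and the `generateMembershipQuestion` helper are all illustrative assumptions.

```javascript
// Hypothetical membership table; the set names and members are illustrative.
const memberships = [
  { set: "base SI units", member: "meter" },
  { set: "base SI units", member: "kilogram" },
  { set: "base SI units", member: "kelvin" },
  { set: "derived SI units", member: "pascal" },
  { set: "derived SI units", member: "newton" },
  { set: "imperial units", member: "yard" },
];

// Fisher-Yates shuffle on a copy, with an injectable random source.
function shuffle(items, rng = Math.random) {
  const out = [...items];
  for (let i = out.length - 1; i > 0; i--) {
    const j = Math.floor(rng() * (i + 1));
    [out[i], out[j]] = [out[j], out[i]];
  }
  return out;
}

// Draw nCorrect members of the set and nWrong non-members, then mix them.
function generateMembershipQuestion(setName, nCorrect, nWrong, rng = Math.random) {
  const members = memberships.filter((m) => m.set === setName).map((m) => m.member);
  const others = memberships.filter((m) => m.set !== setName).map((m) => m.member);
  const correct = shuffle(members, rng).slice(0, nCorrect);
  const wrong = shuffle(others, rng).slice(0, nWrong);
  return {
    prompt: `Which of the following are ${setName}?`,
    choices: shuffle([...correct, ...wrong], rng),
    answers: new Set(correct),
  };
}

const question = generateMembershipQuestion("base SI units", 2, 2);
console.log(question.prompt);
question.choices.forEach((choice) => console.log(`- [ ] ${choice}`));
```

Setting `nCorrect` to 1 gives the easier "check one" variant described above.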

[Image: obligatory break for memes]

Statement is True?

The last kind of question type would be a combination of both. "Given" a certain set of values, an assertion is made, and the user must determine whether or not that assertion is true. For example, "will this projectile land within X yards of a target Y yards away given certain kinematic parameters?" could be considered. In this case, the algorithm generates a problem for which it knows the answer, but presents a value (which may or may not be correct), including true/false assertions, that the user must consider.
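A rough sketch of a true/false generator for the projectile example follows. It assumes level ground, no drag, and metric units rather than yards, and the `generateAssertion` helper and its value ranges are invented for illustration. The generator knows the real range, then places the target either comfortably inside or comfortably outside the stated tolerance.

```javascript
const g = 9.81; // gravitational acceleration, m/s^2

// Range of a projectile over level ground, neglecting drag.
function range(speed, angleDeg) {
  const angle = (angleDeg * Math.PI) / 180;
  return (speed * speed * Math.sin(2 * angle)) / g;
}

function generateAssertion(rng = Math.random) {
  const speed = 10 + rng() * 40;    // m/s
  const angleDeg = 15 + rng() * 60; // degrees
  const actual = range(speed, angleDeg);
  const tolerance = 5;              // m
  // Half the time, move the target far enough away that the assertion is false.
  // The margins keep rounding in the displayed values from flipping the answer.
  const shouldBeTrue = rng() < 0.5;
  const offset = shouldBeTrue ? rng() * tolerance * 0.5 : tolerance * (2 + rng() * 3);
  const target = actual + offset;
  return {
    statement:
      `A projectile launched at ${speed.toFixed(1)} m/s and ${angleDeg.toFixed(1)} degrees ` +
      `lands within ${tolerance} m of a target ${target.toFixed(1)} m away.`,
    answer: Math.abs(actual - target) <= tolerance,
  };
}

const assertion = generateAssertion();
console.log(assertion.statement);
console.log(`True or false? (answer: ${assertion.answer})`);
```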

Units Examples

Let's consider the "Units" topic area. Let us assume we have "drawn" a relationship from our table. In this case, it is a "hierarchy" relationship that expresses a conversion from a prefixed unit to the corresponding unprefixed unit (here, milliliters to liters).

// 'draw' relationship
let relationship = {
    "id": "milliliters_to_liters",
    "type": "hierarchy",
    "to": "liters",
    "from": "milliliters",
    "factor": 1e-3,
    "prefix": "milli",
};

Given the "from" and "to" references to specific topics, we would then query the specifics of those "nodes".

let to = {
    "id": "L",
    "name": "liters",
    "unit": "L",
    "type": "unit"
};
let from = {
    "id": "mL",
    "name": "milliliters",
    "unit": "mL",
    "type": "unit"
};

We seek to generate a problem statement, but must first determine the value for the other degree of freedom. Let's assume it is somewhere between 0.1 and 10.0, which is a reasonable range of values for most measurements. (We ignore negative values here, but there are concepts--like velocity--that we could model to consider negative values.)

// given measurement x in units X, convert to measurement y given units Y
let x = Math.pow(10, Math.random() * 2 - 1); // log-uniform random draw between 0.1 and 10.0
let X = from.id;
let Y = to.id;

Now we could construct the problem statement and the appropriate solution. This would likely live in a more complicated set of switch-case statements that would consider what kind of relationship is being used to generate the problem.

// construct problem & solution
let problem_statement = `Given measurement ${x} [${X}], what is the equivalent measurement in [${Y}]?`;
let solution_value = x * relationship.factor;

Lastly, we would "present" the problem statement and evaluate a result "given by the user" (use your imagination!). Note in this case we specifically look at a relative error tolerance to determine if the user's answer was correct.

// 'render' problem
console.log(problem_statement);

// 'evaluate' result
let result = 0.1; // pretend this came from user
let error = Math.abs(solution_value - result) / solution_value;
if (error < 1e-2) {
    console.log("RIGHT!");
} else {
    console.log(`WRONG! The correct answer is ${solution_value}.`);
}

Conclusion

I won't re-generate this kind of video for each topic area of engineering we explore. I will, however, try to keep the "database" of knowledge graphs somewhat up to date with the content we cover. At the end of this series, this will hopefully provide you with some degree of training and verification of the engineering knowledge you are consuming!

Top comments (4)

Rui Repiji

I'm not sure what recursion or recursive expansion is supposed to mean here?

Brian Kirkpatrick

So, think about the equations P=F/A and PV=nRT. If you initially "draw" the equation P=F/A, perhaps you determine the user should solve for A and you draw a random numerical value for F. You could also draw a random numerical value for P, or--given that it exists within other relationships as well--you could "recurse" to "expand" that variable by defining it through the other relationship too. In that case, you would provide values for V, n, R (well, it's a constant), and T, expecting the user to link the two relationships. To find the right answer, then, the user would need to solve for P from the ideal gas equation, then substitute into the first equation to finally determine A.

Basically, it's a choice when determining how you want to narrow down the degrees of freedom for a problem. Either you draw a specific numerical value, or you "expand" that degree of freedom using another relationship. You could potentially "expand" in this manner through several steps, in order to teach the user to associate key concepts (and their physical phenomena) between different relationships.

X. Ephun

What about something like github.com/badass-courses/course-b... ? The theory here is interesting but you practically started this "quizzer" thing from scratch and there's a lot of great open source ed-tech resources out there.

E. B. Cefeti

Huh. Interesting. Will you be following up with more specific details for topic areas?