Fuzzy Logic, Fuzzy Matching, and Why I Haven’t Used Either (Yet!)

#fuzzylogic #fuzzystringmatching #beginners #ruby

1: Measuring the Distance Between Words

Imagine you could measure the distance between two words by changing one letter at a time and finding the fewest changes needed to transform one into the other. Imagine each change pushes the words one foot farther apart, and you could measure that distance with a tape measure: it takes 3 changes minimum to transform Honda into Hyundai, so imagine those words are three feet apart. Now, imagine you can do that not just words, but any string of characters—imagine every string you can think of, all at once, all arranged in front of you, each one the perfect distance from all the others.

That distance between two strings is called Edit Distance, and it’s a real thing.

Okay, Edit Distance is really cool and I’m really really excited to dive into it—but we need to put it in context. Edit Distance is a real mathematical measurement (and it conforms to some geometric laws as well), and it’s used in many common technologies like search engines and spellcheckers. But the real power in Edit Distance is that it enables computers to have flexibility when dealing with user input. In short, Edit Distance gives computers a way to guess and make assumptions about what users meant to input. More broadly, it’s an illustration of something called “fuzzy logic”—a way of evaluating TRUE and FALSE not just in terms of 0 and 1, but also with numbers between 0.0 and 1.0.

2. Boolean Logic vs. Fuzzy Logic

Think back to when you were just starting to learn coding (like me!)—how did regular ol’ computer logic affect your perspective on the world? I’d guess that you probably tried to see things in terms of TRUE/FALSE and IF/THEN in order to practice translating the real world into code. How handy that every single question we could ask about anything in the world can be answered with either TRUE or FALSE, right?

Of course not! Reality is messy, and even simple questions can quickly fall into gray areas. Consider two statements: “Isa is 5’10” versus “Isa is tall”. Both of these can be answered as TRUE or FALSE. The first question can also be verified by an independent measurement, like measuring tape. (The 5’10 question is TRUE, by the way.) But let’s say we also evaluate “Isa is tall” as TRUE—how do we know, since being tall is qualitative?

The core of “fuzzy logic” is that it can describe closeness to truth or likelihood of truth. As coding students, we are used to seeing Boolean logic—TRUE/FALSE, YES/NO, IF/ELSE, 0 or 1. The statement “Isa is 5’10” is TRUE, but that’s not the only way to describe me. Fuzzy logic lets us decide if “Isa is tall” is TRUE or FALSE without knowing for sure. Instead, additional context helps define how close to TRUE that statement is—for instance, it helps to know how tall the people around me are!

Instead of 0 or 1, “Isa is tall” could return another value—like 0.7. Translated: “If you called Isa tall, it is 70% likely that would be TRUE.”

Fuzzy logic is a broad concept that has had far-reaching implications for classical logic and its related fields beyond coding. Formally coined by Dr. Lofti Zadeh in the 1960’s in relation to computer translations of natural language, fuzzy logic has allowed humans to interact with computers with more flexibility, imprecision, and colloquialism.

In other words, fuzzy logic enables humans to interact with computers in a more human way.

3. Fuzzy (String) Matching

I collided with fuzzy logic because I wanted to build flexibility into a command line interface game that I was making for fun. My Ruby Tic-Tac-Toe program asked users to input coordinates to place their marker on this board:

A screenshot of my first Tic-Tac-Toe program in Ruby!

I wanted to allow users to input any combination of coordinates—a1, A1, 1a, 1A—and have all of them be valid. So, I wrote some code that checked string length, presence for integers 1/2/3 and letters a/b/c, and made all letters lowercase. Mission accomplished!

Thanks, Gob!

However, I wasn’t using real fuzzy logic. I was essentially rewriting strings if they met certain parameters—every step required Boolean logic. Instead, this code illustrated a simplified version of a subset of fuzzy logic called “fuzzy matching”, and more specifically “fuzzy string matching.” My Tic-Tac-Toe game wasn’t looking for an exact string match, but for something close enough.

This concept came up again in my first Flatiron School coding project—I was working on another command line interface that involved lots of menus, and I wanted users to be able to enter something besides a number to choose a menu option:

My command line interface Ruby project for Flatiron School.

So, I built a Ruby class “Thesaurus” with a bunch of hard-coded synonyms contained in arrays, and linked those to the menu options via conditionals such as if user_input == “1” || synonyms1_array.include?(user_input):

Okay, yeah, this wasn't my proudest moment of writing flexible code...

It worked, but once again I was not using real fuzzy logic—I had just built more complex conditionals, which still ultimately returned TRUE or FALSE. Instead, fuzzy matching is a measurement of how close two strings are—I wouldn’t need to hard-code every permutation of the word 'search' if I had a way to measure how close my user input was to the string “search”. If only…

4. “Measurements of Closeness” and Edit Distance

…oh wait, there’s TOTALLY a way to mathematically measure how close two strings are to each other! Remember that Edit Distance we imagined at the start? It’s a weird concept, but “measurements of closeness” like Edit Distance arise from fuzzy logic because it provides a way to answer qualitative questions using quantitative methods. “How similar are two strings?” is a qualitative question, but we can approach it with hard numbers if we establish mathematical rules—for instance, focusing on changing only one individual character at a time, and counting each change as a measure of “distance.” The most foundational “measurement of closeness” between two strings is Levenshtein Distance, or Edit Distance.

Like we "imagined" at the beginning of this article, Edit Distance examines two strings, and describes the minimum number of changes needed to transform one string into another. There are some other rules too: each “change” affects one character at a time, and characters can only be added, substituted, or deleted:

A screenshot from Nikhil Babar's excellent introduction to Levenshtein Distance on www.cuelogic.com.

Conveniently, there’s an algorithm to determine this minimum number of changes called the Wagner-Fischer Algorithm—check out Nikhil Babar's excellent introduction on running the algorithm yourself to find the Edit Distance between any two strings here.

But here’s the gist: the “distance” between the words Honda and Hyundai is 3, because that’s the fewest number of single-character changes needed to transform one into the other.

In application, Edit Distance is an essential part of tools we use everyday. Search engines use Edit Distance to make suggested searches and correct typos. Your phone’s auto-correct uses Edit Distance to predict the words you’re trying to type. Spellcheckers use Edit Distance to determine if spellings are wrong and suggest correct ones. When Ruby errors are asking you “Did you mean…”, they’re using Edit Distance to guess if you made a typo in a method or object name. All of these are examples of fuzzy string matching—in other words, real fuzzy logic.

5. Why Haven’t I Used Real Fuzzy Logic Yet? (Next Steps and Conclusion)

So, why am I writing about fuzzy logic when I haven’t even used it hands-on in my own code? Short answer: it’s really complicated, but also really really cool! I was lucky to have people around me point out that my code was approximating fuzzy matching because that enabled me to research it further—and once I did, I realized that my string manipulations and thesaurus code were only crude approximations of fuzzy logic’s depth and flexibility.

In addition to the challenge of modeling newfound concepts like Levenshtein Distance in code (which others have already done in lots of language here), parameters and thresholds need to be set to sort fuzzy data into “close enough to be TRUE” or “not close enough so FALSE”—I played around with the complexity of this sorting through this fuse.js tool to that illustrates fuzzy string matches in javascript.

Thus, my next steps are to dive into some source code that uses fuzzy logic and fuzzy matching so I can learn from it. Several places that people have pointed me to are:

Source code for Atom - auto-complete suggestions for code/syntax in many languages
Source code for Ruby - providing “Did you mean…” suggestions for typos in method/object names during errors
Ruby gems Searchkick and Amatch, and Ruby library “fuzzy-string-match” - all three are examples of Ruby code that implements fuzzy matching using different metrics and algorithms

I look forward to returning to this topic with more hands-on experience, and a greater understanding of the broader implications of fuzzy logic!