I recently came across a requirement in one of our projects: we needed to implement fuzzy string matching on the user's name field. If you don't know what fuzzy matching is, you can Google it for the details, but in short, it is a technique for finding strings that match a pattern approximately rather than exactly.
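To make "approximately" concrete, here is a minimal JavaScript sketch of Levenshtein (edit) distance, one common way to measure how close two strings are. It is not the phonetic approach the rest of this post is about, and the isFuzzyMatch helper with its one-edit threshold is a hypothetical choice made only for illustration.

```javascript
// Levenshtein (edit) distance: the minimum number of single-character
// insertions, deletions and substitutions needed to turn `a` into `b`.
function levenshtein(a, b) {
  // `prev` is the previous row of the DP table: prev[j] is the distance
  // between the current prefix of `a` and the first j characters of `b`.
  let prev = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const curr = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr[j] = Math.min(
        prev[j] + 1,       // delete a[i - 1]
        curr[j - 1] + 1,   // insert b[j - 1]
        prev[j - 1] + cost // substitute, or keep if the characters are equal
      );
    }
    prev = curr;
  }
  return prev[b.length];
}

// Hypothetical helper: treat two strings as a fuzzy match when they are
// at most `maxEdits` edits apart, ignoring case.
function isFuzzyMatch(a, b, maxEdits = 1) {
  return levenshtein(a.toLowerCase(), b.toLowerCase()) <= maxEdits;
}

console.log(levenshtein('kitten', 'sitting')); // 3
console.log(isFuzzyMatch('Gaurav', 'Gourav')); // true
```

Edit distance only looks at spelling, so two names that sound alike but are spelled very differently can still end up far apart; phonetic algorithms address that case.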
For example, take the names Cathryn and Kathryn, or Gaurav and Gourav (Indian names): the spellings differ, but each pair sounds similar (or, in some cases, identical). Matching names like these is an example of phonetic fuzzy string matching, and there are many different algorithms for it.
In practice, you may come across a dataset of people's records in which you have to find everyone whose name matches a given sound. In our case, I was integrating India's health stack (ABDM) APIs into our application at Medblocks, and we needed to treat phonetically similar names as the same, so a search for Gourav would also return records with names such as Gaurav, Gauruv, or anything else that sounds like Gourav. This is particularly useful when someone spells (or types) a person's name differently from how it appears on that person's ID documents, which happens easily because names are not dictionary words with a single correct spelling.
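As a rough illustration of phonetic matching and of the Gourav search described above, here is a sketch using classic American Soundex, one of the oldest phonetic algorithms. It is not Talisman's fuzzy-soundex, and the record shape and the searchBySound helper are hypothetical.

```javascript
// Classic American Soundex: keep the first letter, map the remaining
// consonants to digits, drop vowels, collapse repeated digits, pad to 4.
const SOUNDEX_DIGITS = {
  b: '1', f: '1', p: '1', v: '1',
  c: '2', g: '2', j: '2', k: '2', q: '2', s: '2', x: '2', z: '2',
  d: '3', t: '3',
  l: '4',
  m: '5', n: '5',
  r: '6',
};

function soundex(name) {
  const letters = name.toLowerCase().replace(/[^a-z]/g, '');
  if (letters.length === 0) return '';
  let code = letters[0].toUpperCase();
  // The first letter's digit still counts for collapsing (e.g. "Pf" -> "P").
  let lastDigit = SOUNDEX_DIGITS[letters[0]] || '';
  for (let i = 1; i < letters.length && code.length < 4; i++) {
    const ch = letters[i];
    if (ch === 'h' || ch === 'w') continue; // h and w do not separate equal digits
    const digit = SOUNDEX_DIGITS[ch] || ''; // vowels and y have no digit
    if (digit && digit !== lastDigit) code += digit;
    lastDigit = digit; // a vowel resets this, so a repeated digit after it is kept
  }
  return code.padEnd(4, '0');
}

// Hypothetical record shape { id, name }: return every record whose name
// has the same Soundex code as the query.
function searchBySound(records, query) {
  const target = soundex(query);
  return records.filter((record) => soundex(record.name) === target);
}

const records = [
  { id: 1, name: 'Gaurav' },
  { id: 2, name: 'Gauruv' },
  { id: 3, name: 'Kathryn' },
  { id: 4, name: 'Gourav' },
];

console.log(soundex('Gourav')); // G610
console.log(searchBySound(records, 'Gourav').map((r) => r.name)); // [ 'Gaurav', 'Gauruv', 'Gourav' ]
```

Because classic Soundex keeps the first letter verbatim, it codes Cathryn as C365 and Kathryn as K365, so this sketch would not match them; gaps like that are one reason to look at other algorithms. For a large dataset, you would also compute each name's code once and store or index it, rather than re-encoding every record on each search.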
After some research, I found a couple of open-source libraries in Python and JavaScript for this task and finally decided to go with Talisman, which is available as an NPM package.
Here is a quick demo of how to use this library for your own fuzzy string matching requirements. I've used only two phonetic algorithms in the demonstration, fuzzy-soundex and daitch-mokotoff, but in a real use case you might use different or additional algorithms, depending on why you need fuzzy string matching in the first place.
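The idea of combining several algorithms can be sketched as a small combinator. This assumes each algorithm is exposed as a function from a name to a code or an array of codes; check Talisman's documentation for its actual function names and return types. The two encoders at the bottom are hypothetical toys, not fuzzy-soundex or daitch-mokotoff, included only so the sketch runs without the package.

```javascript
// Some phonetic algorithms (Daitch-Mokotoff, for example) can produce more
// than one code for a name, so every encoder's output is normalised to a Set.
function toCodeSet(output) {
  return new Set(Array.isArray(output) ? output : [output]);
}

// True when names `a` and `b` share at least one code under `encode`.
function sharesCode(encode, a, b) {
  const codesA = toCodeSet(encode(a));
  return [...toCodeSet(encode(b))].some((code) => codesA.has(code));
}

// 'any': lenient, one agreeing algorithm is enough (more matches, more noise).
// 'all': strict, every algorithm must agree (fewer, more reliable matches).
function namesMatch(a, b, encoders, mode = 'any') {
  const agreements = encoders.map((encode) => sharesCode(encode, a, b));
  return mode === 'all' ? agreements.every(Boolean) : agreements.some(Boolean);
}

// Hypothetical toy encoder: first letter plus the remaining consonants
// (vowels, y, h and w dropped).
function consonantKey(name) {
  const s = name.toLowerCase().replace(/[^a-z]/g, '');
  return s ? s[0] + s.slice(1).replace(/[aeiouyhw]/g, '') : '';
}

// Hypothetical toy encoder: like consonantKey, but a leading "c" also yields
// a "k" variant, imitating an algorithm that returns several candidate codes.
function consonantKeyCK(name) {
  const key = consonantKey(name);
  return key.startsWith('c') ? [key, 'k' + key.slice(1)] : [key];
}

const encoders = [consonantKey, consonantKeyCK];
console.log(namesMatch('Gaurav', 'Gourav', encoders, 'all')); // true
console.log(namesMatch('Cathryn', 'Kathryn', encoders, 'any')); // true
console.log(namesMatch('Cathryn', 'Kathryn', encoders, 'all')); // false
```

The 'any' versus 'all' choice is the "why do you need a fuzzy match" question in code form: a search box that should not miss anyone leans towards 'any', while automatically treating two records as the same person leans towards 'all'.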