What's fuzzywuzzy?
It's a string matching module for Python. A string is a value that stores text. fuzzywuzzy uses Levenshtein distance to calculate the differences between sequences, in a simple-to-use package.
Fuzzy string matching like a boss.
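To make the Levenshtein distance mentioned above concrete, here is a minimal pure-Python sketch of it. This is not fuzzywuzzy's own code, and the function name levenshtein is just an illustrative choice. It counts the fewest single-character insertions, deletions, and substitutions needed to turn one string into another.

```python
def levenshtein(a: str, b: str) -> int:
    """Return the minimum number of single-character edits turning a into b."""
    # previous[j] is the distance between the prefix of a processed so far and b[:j].
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]  # turning a[:i] into the empty string takes i deletions
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute, or keep on a match
            ))
        previous = current
    return previous[-1]


print(levenshtein("this is a test", "this is a test!"))  # 1 (one insertion)
print(levenshtein("kitten", "sitting"))                  # 3
```

A small distance means the strings are nearly identical. A library like fuzzywuzzy turns this kind of measurement into a score that is easier to read.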
How do you get started? You should be comfortable with basic Python first. Then install the package with pip:
pip install fuzzywuzzy
Then you can use it like this:
#!/usr/bin/python3
from fuzzywuzzy import fuzz
from fuzzywuzzy import process
r = fuzz.ratio("this is a test", "this is a test!")
print(r)
r = fuzz.ratio("fuzzy wuzzy was a bear", "wuzzy fuzzy was a bear")
print(r)
This prints the similarity ratio for each pair, on a scale from 0 to 100:
97
91
You can also run this from the interactive interpreter:
>>> from fuzzywuzzy import fuzz
>>> fuzz.ratio("this is a test", "this is a test!")
97
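Another example, this time with the same words in a different order. The score drops sharply because fuzz.ratio compares the strings as plain character sequences: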
>>> from fuzzywuzzy import fuzz
>>> fuzz.ratio("this is a test","a test this is")
50
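One common way to make the comparison less sensitive to word order is to sort the words before comparing. fuzzywuzzy has its own token-based scorers for this, such as fuzz.token_sort_ratio. The sketch below is not that implementation. It only illustrates the idea using the standard library's difflib, and the helper names simple_ratio and sorted_tokens_ratio are made up for this example.

```python
from difflib import SequenceMatcher


def simple_ratio(a: str, b: str) -> int:
    """Similarity of two strings as an integer from 0 to 100."""
    return round(100 * SequenceMatcher(None, a, b).ratio())


def sort_tokens(text: str) -> str:
    """Lowercase the text, split it on whitespace, and rejoin the words sorted."""
    return " ".join(sorted(text.lower().split()))


def sorted_tokens_ratio(a: str, b: str) -> int:
    """Compare two strings after sorting their words, so word order is ignored."""
    return simple_ratio(sort_tokens(a), sort_tokens(b))


print(simple_ratio("this is a test", "a test this is"))         # 50
print(sorted_tokens_ratio("this is a test", "a test this is"))  # 100
```

Sorting helps when the same words appear in a different order. It does not help with misspellings inside a word, which is where the character-level ratio is still useful.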
Top comments (1)
I would suggest also using word embeddings when doing name matching. For example, fuzzywuzzy fails when comparing Bill and William, or Robert and Bob. With word embeddings, your vectors will be closer together. Welcome to NLP.