loading...

Daily Challenge #33 - Did you mean...?

thepracticaldev profile image dev.to staff ・1 min read

In this challenge, you'll mimic Google's "Did you mean...?" feature. When using Google's search engine, the user will receive spelling corrections and suggestions to aid their search. From a given (or self-made) dictionary, write a function that will accept a string and return the most similar string from the dictionary.

To complete this challenge, you'll have an entered term (lowercase string) and an array of known words (also lowercase strings). Your task is to find out which word from the dictionary is most similar to the input term. Their similarity is defined by the minimum number of letters that must be added, replaced, or removed to get from the entered word to one of the dictionary words. The lower the number of required changes is, the more similar the words are.

Words that are the same are obviously the most similar. A word that needs one letter to be changed is more similar to another word that needs 2 (or more) letters to be changed.

Use the arrays below, combine them, or make your own dictionaries.

Array 1 + Example
['cherry', 'pineapple', 'melon', 'strawberry', 'raspberry']
fruits.findMostSimilar('strawbery'); // must return "strawberry"
fruits.findMostSimilar('berry'); // must return "cherry"

Array 2
['stars', 'mars', 'wars', 'codec', 'codewars']

Array 3
['javascript', 'java', 'ruby', 'php', 'python', 'coffeescript']

Good luck!


This challenge comes from user BattleRattle. Thank you to CodeWars, who has licensed redistribution of this challenge under the 2-Clause BSD License!

Want to propose a challenge for a future post? Email yo+challenge@dev.to with your suggestions!

Posted on by:

thepracticaldev profile

dev.to staff

@thepracticaldev

The hardworking team behind dev.to ❤️

Discussion

pic
Editor guide
 

Implemented the algorithm in the Wikipedia article for Levenshtein distance in JavaScript:

/**
 * Find the Levenshtein distance between two strings
 * (see https://en.wikipedia.org/wiki/Levenshtein_distance)
 * @param {String} a  First string to compare
 * @param {String} b  Second string to compare
 */
function lev(a, b) {
  const _step = (i, j) => (
    Math.min(i, j) === 0 ? Math.max(i, j) :
    Math.min(
      _step(i-1, j) + 1,
      _step(i, j-1) + 1,
      _step(i-1, j-1) + (a[i-1] === b[j-1] ? 0 : 1)
    )
  )
  return _step(a.length, b.length)
}

/**
 * Find the most similar word in the dictionary to the provided word
 * @param {String[]} dict   Array of strings to use as dictionary
 * @param {String} word     Word to correct
 */
function findMostSimilar (dict, word) {
  const [, closest] = dict.reduce(([min, best], next) => {
    const dist = lev(word, next)
    return dist < min ? [dist, next] : [min, best]
  }, [Infinity, ''])
  return closest
}

EDIT

Forgot to add testing code, so here's some test cases. (The test runner assumes that it's being run in the web console, by the way.)

// Tests
const a1 = ['cherry', 'pineapple', 'melon', 'strawberry', 'raspberry']
const a2 = ['stars', 'mars', 'wars', 'codec', 'codewars']
const a3 = ['javascript', 'java', 'ruby', 'php', 'python', 'coffeescript']

const testCases = [
  [a1, 'strawbery', 'strawberry'],
  [a1, 'berry', 'cherry'],
  [a2, 'code', 'codec'],
  [a2, 'starwars', 'stars'],
  [a3, 'script', 'javascript'],
  [a3, 'cofcript', 'coffeescript']
]

let caseNum = 1
for (const [dict, input, expected] of testCases) {
  const output = findMostSimilar(dict, input)
  const [status, style] = (
    output === expected
      ? ['', 'color: green']
      : ['', 'color: red']
  )
  console.log(`${caseNum++}: ${input} should match ${expected} %c${status}`, style)
  if (output !== expected) {
    console.log(`    returned ${output} instead`)
  }
}

And here's the output when I run it in Chrome:

Screenshot of Chrome console output from above testing code showing that all tests pass

 

Here it goes a python one :

I compare by the distances of each char and give less importance to length

fruits = ['cherry', 'pineapple', 'melon', 'strawberry', 'raspberry']

def find_most_similar(word):
    counts = []
    for fruit in fruits:
        count = 0
        if word == fruit:
            return fruit
        else:
            for i in range(min(len(word), len(fruit))):
                count += abs(ord(word[i]) - ord(fruit[i])) + max(0, len(fruit)-len(word))
        counts.append(count)
    min_count = min(counts)
    return fruits[counts.index(min_count)]
 

plzz explain ur code to me ...

 

It's not working 100% but for most basic cases it does, first of all I check if the word passed is inside the fruits list, if not i iterate through each character of the shortest word and I subtract with abs (so that the result is always positive) the value of the characters with the Ord function, then I add the difference of word lengths.

It is kind of a own metric that is not very thinked but it worked to me for these simple tests.

Finally I select as a result the word containing the less value of the metrics I calculated.

After completing the challenge this way I read about levenstein function, it may be a better way to do this.

 
let findClosest = (str, arr) => {
    let maxSimilarity = 0;
    let tempWord = '';
    for (let p of arr) {
        let count = 0;
        p.split('').forEach(x => {
            if (str.indexOf(x) > -1)
                count++;
        });
        if (count > maxSimilarity) {
            maxSimilarity = count;
            tempWord = p;
        }
    }
    return tempWord;
}

let result = findClosest('hp', ['javascript', 'java', 'ruby', 'php', 'python', 'coffeescript']);
console.log(result);
 
(define (lev a b)
  (define (lev_ i j)
    (cond
      ((zero? (min i j))
       (max i j))
      (else
       (min (+ 1 (lev_ (- i 1) j))
            (+ 1 (lev_ i (- j 1)))
            (+ (if (char=? (string-ref a i) (string-ref b j))
                   0
                   1)
               (lev_ (- i 1) (- j 1)))))))
  (lev_ (- (string-length a) 1) (- (string-length b) 1)))

(define (mostSimilar arr word)
  (define (mostSimilar_ arr acc_word acc_lev)
    (cond
      ((null? arr) acc_word)
      (else
       (let ((new_lev (lev (car arr) word)))
         (if (> acc_lev new_lev)
           (mostSimilar_ (cdr arr) (car arr) new_lev)
           (mostSimilar_ (cdr arr) acc_word acc_lev))))))
  (mostSimilar_ (cdr arr) (car arr) (lev (car arr) word)))

(format #t "Result: ~a~%"
  (mostSimilar
    (list "strawberry" "bean" "banana" "cherry")
    "berry"))

Sorry I typed on my phone.

Will optimise The levenstein function asap. Right now Tail call optimization doesn’t apply.

 

Elixir:

defmodule Closest do
  def find(candidates, word) do
    Enum.min_by(
      candidates,
      &(&1
        |> String.myers_difference(word)
        |> Enum.count())
    )
  end
end

And yes, String.myers_difference/2 is built-in :)

 

Haskell:

lev :: (Eq a) => [a] -> [a] -> Int
lev a b = let memoed = [ [ lev' levmemo i' j' | j' <- [0..] ] | i' <- [0..] ]
              levmemo i j = memoed !! i !! j
          in levmemo (length a) (length b)
          where lev' f i j
                  | min i j == 0 = max i j
                  | otherwise = let n = f (i-1) j + 1
                                    m = f i (j-1) + 1
                                    q = f (i-1) (j-1) + if (a!!(i-1)) == (b!!(j-1)) then 0 else 1
                                in min n $ min m q

mostSimilar :: (Eq a) => [[a]] -> [a] -> [a]
mostSimilar list word = fst $ foldl1 smallestDistance $ zip list $ map (lev word) $ list
              where smallestDistance a@(minel, mindst) b@(el, dst) =
                      if dst < mindst then b
                      else a

This one was interesting for me: I'm pretty new to haskell and this is my first time playing around with memoization (without memoization, the lev function is absurdly slow).

The lev function is a recursive implementation of the Levenshtein distance from the Wikipedia article