dev.to staff

Posted on Aug 5, 2019

Daily Challenge #33 - Did you mean...?

#challenge

In this challenge, you'll mimic Google's "Did you mean...?" feature. When using Google's search engine, the user will receive spelling corrections and suggestions to aid their search. From a given (or self-made) dictionary, write a function that will accept a string and return the most similar string from the dictionary.

To complete this challenge, you'll have an entered term (lowercase string) and an array of known words (also lowercase strings). Your task is to find out which word from the dictionary is most similar to the input term. Their similarity is defined by the minimum number of letters that must be added, replaced, or removed to get from the entered word to one of the dictionary words. The lower the number of required changes is, the more similar the words are.

Words that are the same are obviously the most similar. A word that needs one letter to be changed is more similar to another word that needs 2 (or more) letters to be changed.

Use the arrays below, combine them, or make your own dictionaries.

Array 1 + Example
['cherry', 'pineapple', 'melon', 'strawberry', 'raspberry']
fruits.findMostSimilar('strawbery'); // must return "strawberry"
fruits.findMostSimilar('berry'); // must return "cherry"

Array 2
['stars', 'mars', 'wars', 'codec', 'codewars']

Array 3
['javascript', 'java', 'ruby', 'php', 'python', 'coffeescript']

Good luck!

This challenge comes from user BattleRattle. Thank you to CodeWars, who has licensed redistribution of this challenge under the 2-Clause BSD License!

Want to propose a challenge for a future post? Email yo+challenge@dev.to with your suggestions!

Top comments (9)

Ken Bellows • Aug 5 '19 • Edited

Implemented the algorithm in the Wikipedia article for Levenshtein distance in JavaScript:

/**
 * Find the Levenshtein distance between two strings
 * (see https://en.wikipedia.org/wiki/Levenshtein_distance)
 * @param {String} a  First string to compare
 * @param {String} b  Second string to compare
 */
function lev(a, b) {
  const _step = (i, j) => (
    Math.min(i, j) === 0 ? Math.max(i, j) :
    Math.min(
      _step(i-1, j) + 1,
      _step(i, j-1) + 1,
      _step(i-1, j-1) + (a[i-1] === b[j-1] ? 0 : 1)
    )
  )
  return _step(a.length, b.length)
}

/**
 * Find the most similar word in the dictionary to the provided word
 * @param {String[]} dict   Array of strings to use as dictionary
 * @param {String} word     Word to correct
 */
function findMostSimilar (dict, word) {
  const [, closest] = dict.reduce(([min, best], next) => {
    const dist = lev(word, next)
    return dist < min ? [dist, next] : [min, best]
  }, [Infinity, ''])
  return closest
}

EDIT

Forgot to add testing code, so here's some test cases. (The test runner assumes that it's being run in the web console, by the way.)

// Tests
const a1 = ['cherry', 'pineapple', 'melon', 'strawberry', 'raspberry']
const a2 = ['stars', 'mars', 'wars', 'codec', 'codewars']
const a3 = ['javascript', 'java', 'ruby', 'php', 'python', 'coffeescript']

const testCases = [
  [a1, 'strawbery', 'strawberry'],
  [a1, 'berry', 'cherry'],
  [a2, 'code', 'codec'],
  [a2, 'starwars', 'stars'],
  [a3, 'script', 'javascript'],
  [a3, 'cofcript', 'coffeescript']
]

let caseNum = 1
for (const [dict, input, expected] of testCases) {
  const output = findMostSimilar(dict, input)
  const [status, style] = (
    output === expected
      ? ['✔', 'color: green']
      : ['❌', 'color: red']
  )
  console.log(`${caseNum++}: ${input} should match ${expected} %c${status}`, style)
  if (output !== expected) {
    console.log(`    returned ${output} instead`)
  }
}

And here's the output when I run it in Chrome:

Héctor Pascual • Aug 6 '19

Here it goes a python one :

I compare by the distances of each char and give less importance to length

fruits = ['cherry', 'pineapple', 'melon', 'strawberry', 'raspberry']

def find_most_similar(word):
    counts = []
    for fruit in fruits:
        count = 0
        if word == fruit:
            return fruit
        else:
            for i in range(min(len(word), len(fruit))):
                count += abs(ord(word[i]) - ord(fruit[i])) + max(0, len(fruit)-len(word))
        counts.append(count)
    min_count = min(counts)
    return fruits[counts.index(min_count)]

Dev Prakash • Aug 12 '19

plzz explain ur code to me ...

Héctor Pascual • Aug 12 '19 • Edited

It's not working 100% but for most basic cases it does, first of all I check if the word passed is inside the fruits list, if not i iterate through each character of the shortest word and I subtract with abs (so that the result is always positive) the value of the characters with the Ord function, then I add the difference of word lengths.

It is kind of a own metric that is not very thinked but it worked to me for these simple tests.

Finally I select as a result the word containing the less value of the metrics I calculated.

After completing the challenge this way I read about levenstein function, it may be a better way to do this.

Meysam Faghfouri • Aug 6 '19 • Edited

let findClosest = (str, arr) => {
    let maxSimilarity = 0;
    let tempWord = '';
    for (let p of arr) {
        let count = 0;
        p.split('').forEach(x => {
            if (str.indexOf(x) > -1)
                count++;
        });
        if (count > maxSimilarity) {
            maxSimilarity = count;
            tempWord = p;
        }
    }
    return tempWord;
}

let result = findClosest('hp', ['javascript', 'java', 'ruby', 'php', 'python', 'coffeescript']);
console.log(result);

Tanguy Andreani • Aug 5 '19 • Edited

(define (lev a b)
  (define (lev_ i j)
    (cond
      ((zero? (min i j))
       (max i j))
      (else
       (min (+ 1 (lev_ (- i 1) j))
            (+ 1 (lev_ i (- j 1)))
            (+ (if (char=? (string-ref a i) (string-ref b j))
                   0
                   1)
               (lev_ (- i 1) (- j 1)))))))
  (lev_ (- (string-length a) 1) (- (string-length b) 1)))

(define (mostSimilar arr word)
  (define (mostSimilar_ arr acc_word acc_lev)
    (cond
      ((null? arr) acc_word)
      (else
       (let ((new_lev (lev (car arr) word)))
         (if (> acc_lev new_lev)
           (mostSimilar_ (cdr arr) (car arr) new_lev)
           (mostSimilar_ (cdr arr) acc_word acc_lev))))))
  (mostSimilar_ (cdr arr) (car arr) (lev (car arr) word)))

(format #t "Result: ~a~%"
  (mostSimilar
    (list "strawberry" "bean" "banana" "cherry")
    "berry"))

Sorry I typed on my phone.

Will optimise The levenstein function asap. Right now Tail call optimization doesn’t apply.

Craig McIlwrath • Aug 21 '19

Haskell:

lev :: (Eq a) => [a] -> [a] -> Int
lev a b = let memoed = [ [ lev' levmemo i' j' | j' <- [0..] ] | i' <- [0..] ]
              levmemo i j = memoed !! i !! j
          in levmemo (length a) (length b)
          where lev' f i j
                  | min i j == 0 = max i j
                  | otherwise = let n = f (i-1) j + 1
                                    m = f i (j-1) + 1
                                    q = f (i-1) (j-1) + if (a!!(i-1)) == (b!!(j-1)) then 0 else 1
                                in min n $ min m q

mostSimilar :: (Eq a) => [[a]] -> [a] -> [a]
mostSimilar list word = fst $ foldl1 smallestDistance $ zip list $ map (lev word) $ list
              where smallestDistance a@(minel, mindst) b@(el, dst) =
                      if dst < mindst then b
                      else a

This one was interesting for me: I'm pretty new to haskell and this is my first time playing around with memoization (without memoization, the lev function is absurdly slow).

The lev function is a recursive implementation of the Levenshtein distance from the Wikipedia article

Oleksii Filonenko • Aug 7 '19 • Edited

Elixir:

defmodule Closest do
  def find(candidates, word) do
    Enum.min_by(
      candidates,
      &(&1
        |> String.myers_difference(word)
        |> Enum.count())
    )
  end
end

And yes, String.myers_difference/2 is built-in :)

hazelowen • Mar 18 '21

There's no validation of the input to decode. Also, I'm not sure if the second test provided for decode is correct. I'm not getting any English words out of it, but my decode function hasn't failed besides that test.
The lev function is a recursive implementation of the Levenshtein distance from the Wikipedia article happy wheels

DEV Community

Daily Challenge #33 - Did you mean...?

Top comments (9)

Read next

Using Data to Drive Mental Health Outcomes

Rust: Controlling the Flow

Mastering Git Stash: A Handy Tool for Managing Uncommitted Changes

3. How to setup Jest in a Next 15 project (+ eslint for testing)