DEV Community

loading...

Daily Challenge #33 - Did you mean...?

dev.to staff
The hardworking team behind dev.to ❤️
・1 min read

In this challenge, you'll mimic Google's "Did you mean...?" feature. When using Google's search engine, the user will receive spelling corrections and suggestions to aid their search. From a given (or self-made) dictionary, write a function that will accept a string and return the most similar string from the dictionary.

To complete this challenge, you'll have an entered term (lowercase string) and an array of known words (also lowercase strings). Your task is to find out which word from the dictionary is most similar to the input term. Their similarity is defined by the minimum number of letters that must be added, replaced, or removed to get from the entered word to one of the dictionary words. The lower the number of required changes is, the more similar the words are.

Words that are the same are obviously the most similar. A word that needs one letter to be changed is more similar to another word that needs 2 (or more) letters to be changed.

Use the arrays below, combine them, or make your own dictionaries.

Array 1 + Example
['cherry', 'pineapple', 'melon', 'strawberry', 'raspberry']
fruits.findMostSimilar('strawbery'); // must return "strawberry"
fruits.findMostSimilar('berry'); // must return "cherry"

Array 2
['stars', 'mars', 'wars', 'codec', 'codewars']

Array 3
['javascript', 'java', 'ruby', 'php', 'python', 'coffeescript']

Good luck!


This challenge comes from user BattleRattle. Thank you to CodeWars, who has licensed redistribution of this challenge under the 2-Clause BSD License!

Want to propose a challenge for a future post? Email yo+challenge@dev.to with your suggestions!

Discussion (9)

Collapse
kenbellows profile image
Ken Bellows • Edited

Implemented the algorithm in the Wikipedia article for Levenshtein distance in JavaScript:

/**
 * Find the Levenshtein distance between two strings
 * (see https://en.wikipedia.org/wiki/Levenshtein_distance)
 * @param {String} a  First string to compare
 * @param {String} b  Second string to compare
 */
function lev(a, b) {
  const _step = (i, j) => (
    Math.min(i, j) === 0 ? Math.max(i, j) :
    Math.min(
      _step(i-1, j) + 1,
      _step(i, j-1) + 1,
      _step(i-1, j-1) + (a[i-1] === b[j-1] ? 0 : 1)
    )
  )
  return _step(a.length, b.length)
}

/**
 * Find the most similar word in the dictionary to the provided word
 * @param {String[]} dict   Array of strings to use as dictionary
 * @param {String} word     Word to correct
 */
function findMostSimilar (dict, word) {
  const [, closest] = dict.reduce(([min, best], next) => {
    const dist = lev(word, next)
    return dist < min ? [dist, next] : [min, best]
  }, [Infinity, ''])
  return closest
}

EDIT

Forgot to add testing code, so here's some test cases. (The test runner assumes that it's being run in the web console, by the way.)

// Tests
const a1 = ['cherry', 'pineapple', 'melon', 'strawberry', 'raspberry']
const a2 = ['stars', 'mars', 'wars', 'codec', 'codewars']
const a3 = ['javascript', 'java', 'ruby', 'php', 'python', 'coffeescript']

const testCases = [
  [a1, 'strawbery', 'strawberry'],
  [a1, 'berry', 'cherry'],
  [a2, 'code', 'codec'],
  [a2, 'starwars', 'stars'],
  [a3, 'script', 'javascript'],
  [a3, 'cofcript', 'coffeescript']
]

let caseNum = 1
for (const [dict, input, expected] of testCases) {
  const output = findMostSimilar(dict, input)
  const [status, style] = (
    output === expected
      ? ['', 'color: green']
      : ['', 'color: red']
  )
  console.log(`${caseNum++}: ${input} should match ${expected} %c${status}`, style)
  if (output !== expected) {
    console.log(`    returned ${output} instead`)
  }
}

And here's the output when I run it in Chrome:

Screenshot of Chrome console output from above testing code showing that all tests pass

Collapse
hectorpascual profile image
Héctor Pascual

Here it goes a python one :

I compare by the distances of each char and give less importance to length

fruits = ['cherry', 'pineapple', 'melon', 'strawberry', 'raspberry']

def find_most_similar(word):
    counts = []
    for fruit in fruits:
        count = 0
        if word == fruit:
            return fruit
        else:
            for i in range(min(len(word), len(fruit))):
                count += abs(ord(word[i]) - ord(fruit[i])) + max(0, len(fruit)-len(word))
        counts.append(count)
    min_count = min(counts)
    return fruits[counts.index(min_count)]
Collapse
devparkk profile image
Dev Prakash

plzz explain ur code to me ...

Collapse
hectorpascual profile image

It's not working 100% but for most basic cases it does, first of all I check if the word passed is inside the fruits list, if not i iterate through each character of the shortest word and I subtract with abs (so that the result is always positive) the value of the characters with the Ord function, then I add the difference of word lengths.

It is kind of a own metric that is not very thinked but it worked to me for these simple tests.

Finally I select as a result the word containing the less value of the metrics I calculated.

After completing the challenge this way I read about levenstein function, it may be a better way to do this.

Collapse
mfaghfoory profile image
Meysam Faghfouri • Edited
let findClosest = (str, arr) => {
    let maxSimilarity = 0;
    let tempWord = '';
    for (let p of arr) {
        let count = 0;
        p.split('').forEach(x => {
            if (str.indexOf(x) > -1)
                count++;
        });
        if (count > maxSimilarity) {
            maxSimilarity = count;
            tempWord = p;
        }
    }
    return tempWord;
}

let result = findClosest('hp', ['javascript', 'java', 'ruby', 'php', 'python', 'coffeescript']);
console.log(result);
Collapse
tanguyandreani profile image
Tanguy Andreani • Edited
(define (lev a b)
  (define (lev_ i j)
    (cond
      ((zero? (min i j))
       (max i j))
      (else
       (min (+ 1 (lev_ (- i 1) j))
            (+ 1 (lev_ i (- j 1)))
            (+ (if (char=? (string-ref a i) (string-ref b j))
                   0
                   1)
               (lev_ (- i 1) (- j 1)))))))
  (lev_ (- (string-length a) 1) (- (string-length b) 1)))

(define (mostSimilar arr word)
  (define (mostSimilar_ arr acc_word acc_lev)
    (cond
      ((null? arr) acc_word)
      (else
       (let ((new_lev (lev (car arr) word)))
         (if (> acc_lev new_lev)
           (mostSimilar_ (cdr arr) (car arr) new_lev)
           (mostSimilar_ (cdr arr) acc_word acc_lev))))))
  (mostSimilar_ (cdr arr) (car arr) (lev (car arr) word)))

(format #t "Result: ~a~%"
  (mostSimilar
    (list "strawberry" "bean" "banana" "cherry")
    "berry"))

Sorry I typed on my phone.

Will optimise The levenstein function asap. Right now Tail call optimization doesn’t apply.

Collapse
craigmc08 profile image
Craig McIlwrath

Haskell:

lev :: (Eq a) => [a] -> [a] -> Int
lev a b = let memoed = [ [ lev' levmemo i' j' | j' <- [0..] ] | i' <- [0..] ]
              levmemo i j = memoed !! i !! j
          in levmemo (length a) (length b)
          where lev' f i j
                  | min i j == 0 = max i j
                  | otherwise = let n = f (i-1) j + 1
                                    m = f i (j-1) + 1
                                    q = f (i-1) (j-1) + if (a!!(i-1)) == (b!!(j-1)) then 0 else 1
                                in min n $ min m q

mostSimilar :: (Eq a) => [[a]] -> [a] -> [a]
mostSimilar list word = fst $ foldl1 smallestDistance $ zip list $ map (lev word) $ list
              where smallestDistance a@(minel, mindst) b@(el, dst) =
                      if dst < mindst then b
                      else a

This one was interesting for me: I'm pretty new to haskell and this is my first time playing around with memoization (without memoization, the lev function is absurdly slow).

The lev function is a recursive implementation of the Levenshtein distance from the Wikipedia article

Collapse
brightone profile image
Oleksii Filonenko • Edited

Elixir:

defmodule Closest do
  def find(candidates, word) do
    Enum.min_by(
      candidates,
      &(&1
        |> String.myers_difference(word)
        |> Enum.count())
    )
  end
end

And yes, String.myers_difference/2 is built-in :)

Collapse
hazelowen profile image
hazelowen

There's no validation of the input to decode. Also, I'm not sure if the second test provided for decode is correct. I'm not getting any English words out of it, but my decode function hasn't failed besides that test.
The lev function is a recursive implementation of the Levenshtein distance from the Wikipedia article happy wheels