DEV Community

Discussion on: Write a script to identify similarity from two string.

Collapse
 
jay profile image
Jay

Rust solution for similarity of strings. It uses iterative approach with dynamic programming fashion. It creates a matrix of values and the bottom last cell has the number of changes needed to make the 2 strings similar.

fn similarity(s1: &str, s2: &str) -> i32 {
    let s: Vec<char> = String::from(s1).chars().collect();
    let t: Vec<char> = s2.to_string().chars().collect();

    let m = s1.len();
    let n = s2.len();

    // If any of the strings is empty the Levenshtein distance equal to the length of the other string.
    if m == 0 {
        return n as i32;
    }

    if n == 0 {
        return m as i32;
    }

    // Create mutable matrix of size (m+1, n+1);
    let mut d = vec![vec![0i32; n + 1]; m + 1];

    for i in 1..=m {
        d[i][0] = i as i32
    }

    for j in 1..=n {
        d[0][j] = j as i32
    }

    for j in 1..=n {
        for i in 1..=m {
            let mut cost = 0;
            // Check substitutuion cost.
            if s[i - 1] != t[j - 1] {
                cost = 1;
            }
            d[i][j] = min(
                d[i - 1][j] + 1,        // Deletion
                d[i][j - 1] + 1,        // Insertion
                d[i - 1][j - 1] + cost, // Substituition
            );
        }
    }

    d[m][n]
}

// Custom min() to get minimum of 3 numbers
fn min(a: i32, b: i32, c: i32) -> i32 {
    std::cmp::min(a, std::cmp::min(b, c))
}

// Usage
fn main() {
    println!("{}", similarity(&"kitten", &"sitting"));    // 3
}

I used the explanation from Wikipedia here.
The dynamic part work by first starting with prefix of both strings (Single characters at first) and them working it's way onwards to the complete string. It checks the similarity and computer the distance between the sub-strings, by checking if insertion, deletion, or substitution help to make the sub-strings similar. The last computer value will be the value of the differences of both the strings.

If the name of the algo was not given in the challenge, I would have gone with LCS (due to familiarity with the algo) and it would be good enough, as it works on the same principle but does not take substitutions into account.
I loved working on this, and it introduced me to a new algo; but please add some examples (test cases). A little more explanation using the examples would help a lot. The question just feels like try and implement this algo.
Some test cases I used:

#[test]
fn test_similarity() {
    assert_eq!(1, similarity(&"son", &"song"));
    assert_eq!(3, similarity(&"kitten", &"sitting"));
    assert_eq!(5, similarity(&"", &"abcde"));
    assert_eq!(2, similarity(&"flaw", &"lawn"));
    assert_eq!(0, similarity(&"rust", &"rust"));
}