DEV Community

Tim McNamara
Tim McNamara

Posted on

Two trie implementations in Rust (one's super fast)

A trie is a data structure that enables very fast lookups, while taking up quite a small amount of space. Here's an implementation in Rust:

use std::collections::HashMap;

#[derive(Default, Debug)]
struct TrieNode {
    is_end_of_word: bool,
    children: HashMap<char, TrieNode>,
}

#[derive(Default, Debug)]
pub struct Trie {
    root: TrieNode,
}

impl Trie {
    pub fn new() -> Self {
        Trie {
            root: TrieNode::default(),
        }
    }

    pub fn insert(&mut self, word: &str) {
        let mut current_node = &mut self.root;

        for c in word.chars() {
            current_node = current_node.children.entry(c).or_default();
        }
        current_node.is_end_of_word = true;
    }

    pub fn contains(&self, word: &str) -> bool {
        let mut current_node = &self.root;

        for c in word.chars() {
            match current_node.children.get(&c) {
                Some(node) => current_node = node,
                None => return false,
            }
        }

        current_node.is_end_of_word
    }
}

fn main() {
    let mut trie = Trie::new();
    trie.insert("hello");
    trie.insert("hi");
    trie.insert("hey");
    trie.insert("world");

    println!("{trie:#?}");

    println!("hiiii? {}", trie.contains("hiiii"));
}
Enter fullscreen mode Exit fullscreen mode

You can gain some extra speed by replacing the standard library's hash map implementation with fxhash. To do so, replace the first few lines with this:

use std::collections::HashMap;
use fxhash::FxBuildHasher;

type FxHashMap<K, V> = HashMap<K, V, FxBuildHasher>;

#[derive(Default)]
struct TrieNode {
    is_end_of_word: bool,
    children: FxHashMap<char, TrieNode>,
}
Enter fullscreen mode Exit fullscreen mode

What's happening?

The standard library's std::collections::HashMap uses a hash function -- a function which takes an input and morphs it into an output of a fixed size -- that's designed to be cryptographically secure and general purpose at the expense of speed. Because we know that our keys will be the char type, which takes up 4 bytes, we can use a hash function that's more suited for short keys. (By the way, in case you were wondering, it probably still makes sense to use a hash function for char values, even though we could just use the bits' values directly as they're all a fixed sized. Because human language tends to use a very narrow spectrum of bit patterns within a char, and hash tables like keys to be well spread out, the hash function can spread the bits out for what is hopefully a net performance win.)

Top comments (1)

Collapse
 
noracodes profile image
Leonora Tindall

Nice example, Tim! I personally love tries, possibly because a trie was the first "cool" datastructure I got to apply to a real job.

I tried this myself, with a different main() function that includes the words and cracklib-small word files from my Ubuntu install. These are ~1MB and ~500KB respectively, and while words contains only English words, cracklib-small contains many non-English tokens like zlzfrh and omits many real words.

fn main() {
    let words = include_str!("/usr/share/dict/words").split_whitespace();
    let cracklib = include_str!("/usr/share/dict/cracklib-small").split_whitespace();
    let mut trie = Trie::new();
    for word in words {
        trie.insert(word);
    }
    let (mut hits, mut misses) = (0,0);
    for word in cracklib {
        if trie.contains(word) { hits += 1 } else { misses += 1 };
    }
    println!("Hits {}, Misses {}", hits, misses);
}
Enter fullscreen mode Exit fullscreen mode

This program uses your Trie implementation to check how many of the cracklib-small tokens are in the English words file. (It's about 75%.)

I ran this through hyperfine, a really excellent Rust benchmarking program, and on my T480, the default HashMap implementation takes about 92.7 ms to insert and check all 1.5MB of data, while the FxHashMap implementation only takes about 70.5 ms.

Neat stuff, thank you for sharing!