Using data science to solve a real life auto complete issue
Problem background
In one of the applications I maintain, we had to create an auto complete with 20,000 options.
There is no typo, 20,000 is the real number.
So I looked up how to do an auto complete and I implemented it with a datalist
, something that looked roughly
like this.
<input list="datalist" name="datalist">
<datalist id="datalist">
<option>1</option>
<option>2</option>
...
...
<option>19,999</option>
<option>20,000</option>
</datalist>
The problem was, that this is VERY SLOW, each time a user inserted a character, it will trigger an iteration on
the entire 20,000 options and check for each one of them which is slow, especially since I actually only need to
compare the prefix and datalist
compares if the input string is contained in the options and not if they start with it.
For example:
<option>dog food</option>
<option>cat food</option>
<option>dogs and cats</option>
<option>cats and dogs</option>
In this case, inserting dog
will match all but cat food
while the user is interested only in options starting with dog
.
This is an O(n)
time complexity and it wasn't good enough.
Trie Tree
Trying to find a solution, I was thinking to myself, auto complete need doesn't care about any string that isn't
starting with the current input. Hence if I'll restructure my data in the form of a tree I'll be able to start
my check from the relevant place and not iterate over all options each time.
As it turns out, I didn't need to reinvent the wheel, people already thought about it, created it long before me.
This type of structure is called a trie tree.
A nice visualisation cant be found at trie tree visualization.
(Keep in mind that the find method implemented in the visualization only check for exact match which isn't our case)
Fortunately there are plenty of simple trie tree implementation.
And I saved myself lots of time writing it from scratch.
End Result
The end result can be seen in sandbox example.
Inserting any string into the original datalist
input will take significantly more time
and will show too many unrelated results. Using the trie
based auto complete is considerably faster and doesn't show
unneeded data, two wins with one change! :)
<label>Data List</label>
<input list="dropdown_menu" name="example">
<datalist id="dropdown_menu"></datalist>
<br>
<label>Trie Tree</label>
<input id="trie_tree" onkeyup="updateTrie()">
<ul id="trie_menu"></ul>
// Here comes code from https://gist.github.com/tpae/72e1c54471e88b689f85ad2b3940a8f0
var trie = new Trie();
function makeid(length) {
var result = '';
var characters = 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789';
var charactersLength = characters.length;
for ( var i = 0; i < length; i++ ) {
result += characters.charAt(Math.floor(Math.random() * charactersLength));
}
return result;
}
var datalist = document.getElementById('dropdown_menu');
for (var i = 0; i < 10000 ; i++) {
var value = makeid(30);
var option = document.createElement('option');
option.value = value;
option.setAttribute("value", value);
var t = document.createTextNode(value);
option.appendChild(t);
datalist.appendChild(option);
trie.insert(value.toString());
}
function removeAllChildNodes(el) {
while (el.firstChild) {
el.removeChild(el.firstChild);
}
}
function updateTrie() {
var trie_el = document.getElementById('trie_tree');
var trie_menu = document.getElementById('trie_menu');
removeAllChildNodes(trie_menu);
var text = trie_el.value;
var options = trie.find(text);
for (var i = 0 ; i < options.length ; i++) {
var tmp_text = options[i];
var li = document.createElement('li');
li.innerHTML = tmp_text;
trie_menu.appendChild(li);
}
}
Top comments (0)