So far in this series we have discussed tries and TSTs as possible alternatives to binary search trees when it comes to (in-memory) strings containers.
But is there a real advantage in using them? And do TSTs really offer further savings over tries?
To answer these questions, in this article I'll be using YourKit Java profiler to check and compare both the memory usage and runtime of simple operations for these data structures.
YourKit is pretty easy to use and it's nicely integrated with IntelliJ IDEA, but obviously there are also many alternatives that would serve my purpose, like for instance JProfiler.
Profiling Memory
For the memory footprint, my test is simple: I'll load the full list of words in a typical English dictionary, and see how much RAM each of these data structures takes.
The code for profiling these structures might look something like this:
public class StringsTreeProfiler {
    public static final String WORDS_JSON = "words_dictionary.json";

    @Test
    public void profileTrieMemory() throws FileNotFoundException {
        Trie trie = new Trie();
        Tst tst = new Tst();
        BST<String> bst = new BST<>();
        InputStream inputStream = new FileInputStream(WORDS_JSON);
        try {
            JSONObject json = new JSONObject(new JSONTokener(inputStream));
            for (String key : json.keySet()) {
                trie.add(key);
                tst.add(key);
                bst.add(key);
            }
        } catch (ValidationException e) {
            Assert.fail();
        }
    }
}
For practical purposes, to easily spot the total memory used by each structure, I split the test into three different methods, one per data structure.
You can check the actual code on GitHub.
Obviously, when we run such a practical test, we need to be aware that many variables impact the results: the implementations (for instance, the degree of optimization of the code, or choices like using java.util.Optional, as we'll see), the compiler options, the libraries used, the operating system, and the architecture of the underlying hardware (for instance, whether it's a 32-bit or 64-bit processor).
All of these details make a difference in the final result; do they matter? That depends: when the compared results are close to each other, these factors become more important, because they can tip the balance. In terms of asymptotic analysis, however, these choices mostly influence the constant factor that multiplies the running time (unless, of course, there are implementation errors that cause severe inefficiencies, e.g. implementing a reverse method that takes quadratic time instead of the optimal linear-time version).
Long story short: if the difference is large enough, the system details and compiler options won't matter, and the implementation details will be irrelevant as far as fine-grained optimization is concerned, and only relevant in case of serious design mistakes affecting the asymptotic running time.
Binary Search Tree
Let's start with our baseline, binary search trees.
First, let's see how much RAM is overall used:
Then, even more interestingly, we can break this information down by class, and see that we needed 1.8M BST nodes; a lot of space was also taken up by Optional objects (which, however, were temporary objects used only as return values by some of the methods).
Trie
Now that we have a baseline, let's see how tries compare: apparently the total memory used is slightly larger, 206MB versus the 172MB used by binary search trees.
This might be a bit surprising, but breaking it down to the class level helps us understand why...
While, in fact, only about half as many nodes are created - barely 1M Trie::Node versus 1.8M BST::Node - each trie node has to hold a HashMap, and the total space required by these maps amounts to more than 150MB - almost 75% of the total RAM used.
The overall values, however, are very close, so the two data structures can be considered comparable in their memory footprint - the actual relative performance will of course depend on the length and overlap of the strings stored: the more they overlap, the better tries will perform compared to BSTs.
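To see where those HashMaps come from, here is a minimal trie sketch (hypothetical names, not the implementation from the repository): every node owns its own map of children, and each map's backing array and entry objects add up across a million nodes.

```java
import java.util.HashMap;
import java.util.Map;

// Minimal trie sketch: each node carries its own HashMap of children,
// whose backing arrays and entry objects dominate the memory footprint.
class TrieNodeSketch {
    boolean storesKey;  // true if a word ends at this node
    final Map<Character, TrieNodeSketch> children = new HashMap<>();

    void add(String word) {
        TrieNodeSketch node = this;
        for (char c : word.toCharArray()) {
            // create the child node (and its own HashMap) on demand
            node = node.children.computeIfAbsent(c, k -> new TrieNodeSketch());
        }
        node.storesKey = true;
    }

    boolean contains(String word) {
        TrieNodeSketch node = this;
        for (char c : word.toCharArray()) {
            node = node.children.get(c);
            if (node == null) return false;
        }
        return node.storesKey;
    }
}
```

Overlapping prefixes share nodes ("car" and "cat" share "ca"), which is why tries need roughly half the nodes of a BST here; the per-node HashMap is what eats the saving back.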
Ternary Search Tree
Now, removing hash tables from nodes is exactly the reason why we introduced TSTs in this series. So, let's see how they do on the same task.
The result is amazing: they need just 61MB of RAM to store the same dictionary that took roughly three times as much space with BSTs and tries.
Looking at the breakdown by class, we can see that approximately the same number of nodes is created as for tries, but this time we don't have the burden of the HashMaps to store.
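A minimal TST sketch makes the difference concrete (hypothetical names, not the repository's actual code): each node holds a single character and just three plain references, instead of a whole map.

```java
// Minimal TST sketch: three plain references per node replace the per-node
// HashMap of a trie, which is where the memory saving comes from.
class TstSketch {
    private Node root;

    private static class Node {
        final char c;
        boolean storesKey;
        Node left, middle, right;  // three links instead of a HashMap
        Node(char c) { this.c = c; }
    }

    void add(String key) { root = add(root, key, 0); }

    private Node add(Node node, String key, int i) {
        char c = key.charAt(i);
        if (node == null) node = new Node(c);
        if (c < node.c)                node.left = add(node.left, key, i);
        else if (c > node.c)           node.right = add(node.right, key, i);
        else if (i < key.length() - 1) node.middle = add(node.middle, key, i + 1);
        else                           node.storesKey = true;
        return node;
    }

    boolean contains(String key) {
        Node node = root;
        int i = 0;
        while (node != null) {
            char c = key.charAt(i);
            if (c < node.c) node = node.left;
            else if (c > node.c) node = node.right;
            else if (i < key.length() - 1) { node = node.middle; i++; }
            else return node.storesKey;
        }
        return false;
    }
}
```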
TST: Random Insertion Order
Well, that would seem conclusive, wouldn't it?
There is one more test, though, that we could, and should, run.
In the listing at the beginning of this post, we are adding words in a for-in loop, directly from the JSON object created from the file.
What's the order in which keys are inserted? Are they sorted, or randomly ordered?
A method like keySet doesn't offer any guarantee on the order in which items are returned, so I arranged two tests: one with all keys sorted in ascending order, and one with keys explicitly shuffled, using a List to hold them and Collections.shuffle.
List<String> keys = new ArrayList<>((new JSONObject(new JSONTokener(inputStream))).keySet());
Collections.shuffle(keys);
for (String key : keys) {
    trie.add(key);
}
Remember that while for tries the order of insertion doesn't matter, it is relevant in TSTs (and BSTs), and an adverse sequence of insertions (like adding keys in a sorted sequence) can result in a skewed, list-like tree.
Despite that, the number of nodes needed stays the same, and the total RAM needed remains consistently below 70MB.
A note on using Optional
For BSTs, you can see how a lot of space is taken up by Optional - even without it, the total amount of RAM used would still be way more than for TSTs, but it would get closer.
Optional is a great way to make your code cleaner, although the java.util version has some limitations restricting its applicability: it's not meant to be used to store values, like in a class field (it's not even serializable), but only as an intermediate product (especially, and originally, for Stream's methods).
In other words, projecting this onto our tree-like data structures, replacing null with Optional.empty for a missing link to a child is not advisable.
But, just for the sake of curiosity, what would be the impact of wrapping links to children in Optional?
Even if it weren't poor design, it would be as heavy as for BSTs, more than doubling the space needed to store the same set of strings.
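As a sketch of the recommended pattern (all names hypothetical, not taken from the repository): keep plain nullable links in the node, and create Optional only transiently, as a return value.

```java
import java.util.Optional;

// Sketch: plain nullable links in the stored structure, Optional only as a
// transient return value - never wrapped around the fields themselves.
class NodeSketch<T> {
    T value;
    NodeSketch<T> left, right;  // plain nullable links: no per-link wrapper object

    NodeSketch(T value) { this.value = value; }

    // Optional is created on demand for the caller and immediately discarded;
    // it never lives in the tree, so it costs nothing per stored node.
    Optional<NodeSketch<T>> left()  { return Optional.ofNullable(left); }
    Optional<NodeSketch<T>> right() { return Optional.ofNullable(right); }
}
```

Callers still get the null-safety of Optional at the API boundary, while the tree itself stays as compact as with bare references.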
Profiling Running Time
We now have a decent idea of the advantage given by TSTs in terms of RAM, but that's just one side of the coin!
The other important factor to consider is the running time of the most important operations on these containers: core operations, like insertion and deletion, but also search.
For this test as well, we are going to separately evaluate two situations: when the inputs are shuffled, and when they are sorted in ascending order. For the sorted sequences, however, it wasn't possible to test the binary search tree implementation: by inserting the words in the worst possible sequence, a non-balanced binary search tree becomes a long (really long!) linked list, and the recursive implementation of method BST::add would cause a stack overflow!
You might wonder: why doesn't that happen with Trie and Tst? If you remember what we discussed in part 2 of this series, the important difference is that with binary search trees the height of the tree depends on the number of elements stored - logarithmically if the tree is balanced, or linearly in the worst case (which happens to be exactly when inputs are added in sorted order). For tries and TSTs, instead, the height of the tree only depends on the length of the words added and, for TSTs only, also on the size of the alphabet: in the worst-case scenario, we are therefore talking about a difference of orders of magnitude: n is in the order of 370K elements, while the sum of the length of the longest word and the size of the alphabet doesn't reach 50.
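A back-of-the-envelope check of those heights (the ~370K figure comes from this post; the longest-word length of 23 is an assumed value, chosen to be consistent with the "doesn't reach 50" bound above):

```java
// Rough comparison of worst-case tree heights for n ≈ 370K keys.
public class HeightSketch {
    public static void main(String[] args) {
        int n = 370_000;
        long skewedBstHeight = n;                                        // worst case: one level per key
        long balancedBstHeight = Math.round(Math.log(n) / Math.log(2)); // ≈ 18 levels if balanced
        int tstHeightBound = 23 + 26;                                   // assumed longest word + alphabet size
        System.out.println(skewedBstHeight + " vs " + balancedBstHeight + " vs " + tstHeightBound);
    }
}
```

So even the worst case for a TST stays within ~50 levels, while a skewed BST walks through hundreds of thousands.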
Insertion
For the core methods, we are going to focus on testing insertion. We will reuse the code that inserts all the keys of the English dictionary into a container, and check the performance in two cases: when the list of keys is shuffled, and when it is sorted lexicographically.
@Test
public void profileRunningTime() throws FileNotFoundException {
    Trie trie = new Trie();
    Tst tst = new Tst();
    BST<String> bst = new BST<>();
    InputStream inputStream = new FileInputStream(WORDS_JSON);
    try {
        List<String> keys = new ArrayList<>((new JSONObject(new JSONTokener(inputStream))).keySet());
        // Collections.shuffle(keys);
        Collections.sort(keys);
        for (String key : keys) {
            trie.add(key);
            tst.add(key);
            // bst.add(key);
        }
    } catch (ValidationException e) {
        Assert.fail();
    }
}
Shuffled List
What happens when we measure the total time needed to populate the containers with all the words in the dictionary?
Not surprisingly, as we had suggested in the first part of this series, BSTs are the slowest: it takes 1.28 seconds to add the whole dictionary to a BST.
Tries are much better, the whole operation took only 0.51 seconds, less than half the time needed for a BST.
What about TSTs? That's the best part: it only took 0.25 seconds, less than half the time needed by a trie, and about 20% of the time used by a BST!
I obviously ran this profiling several times and, besides small variations, the results were always consistent with the figures above.
Sorted List
But, as we said, this is the best-case scenario for a TST: with a random input sequence, the resulting tree will, on average, be balanced (with 300K+ words added, the chances of hitting a worst-case input sequence are pretty slim...).
In this case we could only test tries against TSTs, because BSTs, which would have been terribly slow anyway, ran into a stack overflow.
That's one reason why I usually suggest avoiding recursive code, unless you know that your language/compiler has tail-call optimization, or you are sure that your input won't be larger than a safety threshold.
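For completeness, here is a sketch of how BST insertion can be written iteratively (hypothetical code, not the implementation used in the tests above): a loop deepens the walk on a skewed tree but never grows the call stack, so sorted input stays slow yet no longer overflows.

```java
// Iterative BST insertion sketch: constant stack depth regardless of the
// insertion order, unlike a recursive add on a skewed tree.
class IterativeBst {
    private Node root;

    private static class Node {
        final String key;
        Node left, right;
        Node(String key) { this.key = key; }
    }

    void add(String key) {
        if (root == null) { root = new Node(key); return; }
        Node node = root;
        while (true) {  // walk down instead of recursing
            int cmp = key.compareTo(node.key);
            if (cmp == 0) return;  // key already stored
            if (cmp < 0) {
                if (node.left == null) { node.left = new Node(key); return; }
                node = node.left;
            } else {
                if (node.right == null) { node.right = new Node(key); return; }
                node = node.right;
            }
        }
    }

    boolean contains(String key) {
        Node node = root;
        while (node != null) {
            int cmp = key.compareTo(node.key);
            if (cmp == 0) return true;
            node = cmp < 0 ? node.left : node.right;
        }
        return false;
    }
}
```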
When the input is sorted, tries (which aren't sensitive to the order of insertion) performed better than TSTs.
Longest Shared Prefix
The final step for this post is checking how these three structures are performing on search.
To probe that, I decided to test the method longestPrefixOf that, given a string s, returns the longest prefix of s stored in the container.
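To illustrate what longestPrefixOf computes, here is a minimal trie-based sketch (hypothetical code, not the implementation under test): walk down the tree following the query's characters, remembering the last depth at which a stored word ended.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of longestPrefixOf on a minimal trie: track the deepest node on the
// query's path that marks the end of a stored word.
class PrefixTrieSketch {
    private boolean storesKey;
    private final Map<Character, PrefixTrieSketch> children = new HashMap<>();

    void add(String word) {
        PrefixTrieSketch node = this;
        for (char c : word.toCharArray()) {
            node = node.children.computeIfAbsent(c, k -> new PrefixTrieSketch());
        }
        node.storesKey = true;
    }

    String longestPrefixOf(String query) {
        PrefixTrieSketch node = this;
        int longest = 0;  // length of the longest stored prefix seen so far
        for (int i = 0; i < query.length(); i++) {
            node = node.children.get(query.charAt(i));
            if (node == null) break;          // path leaves the trie: stop
            if (node.storesKey) longest = i + 1;
        }
        return query.substring(0, longest);
    }
}
```

For instance, with "she" and "shell" stored, querying "shellfish" should return "shell".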
The testing method will create the containers, and then call longestPrefixOf a million times, using randomly chosen real words actually stored in the containers, to which a variable number of random characters is appended.
private static final Random random = new Random();

@Test
public void profileRunningTime() throws FileNotFoundException {
    Trie trie = new Trie();
    Tst tst = new Tst();
    InputStream inputStream = new FileInputStream(WORDS_JSON);
    try {
        List<String> keys = new ArrayList<>((new JSONObject(new JSONTokener(inputStream))).keySet());
        // Collections.shuffle(keys);
        Collections.sort(keys);
        for (String key : keys) {
            trie.add(key);
            tst.add(key);
        }
        for (int i = 0; i < 1000000; i++) {
            String key = keys.get(random.nextInt(keys.size())) + randomKey(0, 10);
            trie.longestPrefixOf(key);
            tst.longestPrefixOf(key);
        }
    } catch (ValidationException e) {
        Assert.fail();
    }
}
private String randomKey(int minLength, int maxLength) {
    int length = random.nextInt(maxLength - minLength) + minLength;
    char[] chars = new char[length];
    for (int i = 0; i < length; i++) {
        chars[i] = (char) (random.nextInt(26) + 'a');
    }
    return new String(chars);
}
The results tell us that when a TST is balanced, it performs slightly better than an equivalent trie.
When a TST is skewed, instead, a trie performs much better (approximately twice as fast).
Conclusions
Profiling helped us understand the advantages and pitfalls of using various string containers in a real scenario.
We showed that the effort of implementing and maintaining a dedicated data structure (like a trie or TST) is repaid by large savings in both space (down to about 20% of the RAM used by a BST) and running time.
This concludes this series; if you'd like more insight, or would like to see more, please feel free to leave a comment!