DEV Community

Discussion on: Java interview prep: 15 Java interview questions

Collapse
 
aminmansuri profile image
hidden_dude

Q13: HashSet vs TreeSet

You claim HashSet is thread safe. But the Javadocs say otherwise:

"Note that this implementation is not synchronized. If multiple threads access a hash set concurrently, and at least one of the threads modifies the set, it must be synchronized externally. This is typically accomplished by synchronizing on some object that naturally encapsulates the set. If no such object exists, the set should be "wrapped" using the Collections.synchronizedSet method. This is best done at creation time, to prevent accidental unsynchronized access to the set:

Set s = Collections.synchronizedSet(new HashSet(...));
"

For multi-threading I highly recommend java.util.concurrent rather than synchronizedSet(). But no, its very dangerous to use HashSet in multithreaded situations. In fact, in the past there were cases where HashSet's cousin, HashMap hung even when there were only reads across threads.

The main difference is that HashSets use a hash table implementation whereas the TreeSet uses some binary tree implementation. There are many differences such as the ones you mentioned. Other differences include:

  • Hash tables use an O(1) algorithm except when hashes collide when it can degrade to O(N) if we're not careful. Of course, they aren't immune to memory paging
  • Balanced binary tree implementations (such as red black trees used in Java) are O(logN) so they have very good performance as well but less than Hash tables
  • Hash tables are contiguous in memory so they can take advantage of spatial locality in the CPU
  • Binary trees are not contiguous so the memory may be all over RAM meaning that its unlikely that reading one node would lead to reading the next one. So you'll probably get a lot more cache misses (even though you may get some advantages in terms of temporal locality in the cache)
Collapse
 
aminmansuri profile image
hidden_dude

In practice I've used TreeSet as an alias of "ordered list with no repetitions". HashSet I've used to check what has been processed vs what still needs to be processed.

But often I just end up using a Map instead. They're often more useful.