Investigating tree height of a red black tree

#python #computerscience

I'm implementing a red black tree in python, with the aim of learning about data structures and algorithms.

In the process, I've noticed something interesting. I looked at the relationship between number of elements in the red black tree, and the depth (or height) of that tree. For this test, I fed the trees with varying amounts of elements and tracked the tree height.

For example, if the number elements is 20, I found that the height fluctuated between 4 and 5. Here's an example output:

Number of elements in balanced tree 20
Height of balanced tree: 5
Number of elements in balanced tree 20
Height of balanced tree: 5
Number of elements in balanced tree 20
Height of balanced tree: 4
Number of elements in balanced tree 20
Height of balanced tree: 5

This matched my expectation that the height would be mostly balanced, but fluctuate slightly.

However, I was surprised by the result at 85 elements. Here's a representative output:

Heights of 100 balanced trees with 85 elements:
[7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7,
 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7,
 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7,
 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7,
 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7,
 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7,
 7, 7, 7, 7]

I was so surprised by this that I ran the trial quite a few more times. Very occasionally, the resulting height would be 6. I estimate this happens 2 or 3 times out of a thousand trials.

A good follow up would be the following: Measure the likelyhood of getting a given height h as a function of number of elements, n. As in P(h, n).

I hope to do that and write about it soon. It might also be useful to explain some of the terminology and concepts surrounding a red black tree, since I wrote this post assuming rather in depth knowledge of the subject. Apparently I've graduated beyond writing for the beginner tag. (Not that I'm done writing for the beginner tag, nor that there's anything wrong with writing beginner posts - it can actually be a challenge to present things at a beginner's level, and it's very important - but it's kind of exciting to see how I've progressed in terms of the complexity of topics I can take a bite of.

P.S. The code for this project is available here:
https://github.com/erik-kristofer-anderson/Red-Black-Tree

Also, you can play with it in repl.it here. The code for what I talked about in this post is in the file test_height.py.

Note that the content at these links may change as I work on this project.

Top comments (2)

Jaakko Kangasharju • Dec 13 '19

I'm guessing here, but the stability of the height might happen because 85 is a bit far from powers of 2, which are where the height of an optimal binary tree changes. If I'm right about that, you should see more variation if you test with, say, 125 or 130 elements.

Using random numbers as elements probably contributes something too. Usually, to get deviating behavior out of data structures, especially when the size increases, you need to craft the input specially a bit. Random inputs tend to show the "standard" behavior. To check this, it would be interesting to save the insertion sequences for when you got height 6 and study how the insertion order there differs from a more typical case.

Random insertion order will also make a regular binary search tree, without any balancing, perform well enough. It will have a larger height than a red-black tree, but not by much. It could be interesting to repeat your height test by inserting without balancing and comparing how the heights differ.

I don't know how deep you want to go in studying this specific phenomenon, though, so maybe I'm overthinking this. :-)

Erik Anderson • Dec 13 '19

You aren’t overthinking this. :-)
I’ve actually looked into some of what you mentioned and I’ll be writing some more about it shortly.