The shape property of heaps is essential because a complete binary tree provides some important mathematical relationships between the number of nodes in the tree and its height.
We refer to the height of a tree as the number of edges on the longest path from the root to a leaf. A tree with just a single node has a height of 0, since there are no edges. A tree with a single layer of children has a height of 1. If there are children of those children, the height is 2, and so on.
Note that in a complete binary tree, every level has twice as many nodes as the level above it. The first level (the root) has 1 node. The second level has 2 nodes. The third level has 4 nodes. The fourth level has 8 nodes. In general, level \(h\) has \(2^h\) nodes.
If we sum up all the nodes from level 0 to level \(h\text{,}\) we can calculate the total number of nodes in a complete binary tree of height \(h\text{.}\) This sum is a geometric series:
We can confirm this using a height 3 tree. A height 3 three would have levels of 1, 2, 4, and 8 nodes respectively, summing to 15 nodes in total. The formula gives us \(2^{3+1} - 1 = 2^4 - 1
= 15\text{,}\) a perfect match.
The more useful formula for heaps is the inverse of this. Given \(n\) nodes, what is the height of the complete binary tree that contains them? We can solve the formula for \(h\text{:}\)
The precise formula for height is not as important as the fact that height grows logarithmically with the number of nodes. Given a number of nodes \(n\text{,}\) the height \(h\) is given by \(h\) and\(h \approx \log_2(n)\text{.}\) The height of a tree also tells us the maximum number of steps it takes to get from the root to any node. In the case of our binary heap, this means that any operation that involves traversing from the root to any node (or vice versa) will take at most \(O(\log n)\) steps.
We can see this fact intuitively if we focus on the number of nodes in the current row as we move from the bottom of the tree to the top of the tree. Imagine we start on a level with 32 nodes. When we move up to the next level, there will only be 16 nodes. The level above that will have 8 nodes. Then 4, then 2, then 1. Each time we move up a level, the number of nodes is halved. This pattern:
Is a classic \(log_2\) pattern. The starting value we used is the number of nodes at the bottom level. That is different than the number of nodes in the entire tree, but the two are directly related. In a complete tree, the bottom level will always have about half the nodes in the entire tree. (Confirm it yourself by drawing a few complete binary trees and counting the nodes.)
So, if we know there are 100 nodes in a tree, we can expect the bottom level to have about 50 of them. And, from there, it is approximately \(log_2(50)\) steps to the top. Those are approximations, but even so, they give an answer of ~5.64, which is essentially what we calculated with the precise formula.