
Section 32.5 AVL Tree Guarantees

Subsection 32.5.1 Do Rotations Work?

First, we should convince ourselves that a rotation is sufficient to restore balance when we find a node with a balance factor of 2 or -2.
The rotation updates the heights of the nodes involved in the rotation (the pivot and its taller child), but not the subtrees below them. When we start the process, one subtree is taller than the other by 2. That taller subtree is the one that rotates up, and as it does so, its height decreases by one (because the root of that subtree moves up a level). Meanwhile, the shorter subtree gains one level of height because the pivot node rotates down into it. The net effect is that the heights of the two subtrees become equal, restoring balance.
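The height bookkeeping described above can be sketched in Python. This is a minimal illustration, not the book's implementation: the `Node` class, the `height` helper, and the name `rotate_right` are assumptions made for this example.

```python
class Node:
    """Minimal binary tree node (fields assumed for this sketch)."""
    def __init__(self, key, left=None, right=None):
        self.key = key
        self.left = left
        self.right = right

def height(node):
    """Height of a subtree; an empty subtree has height -1."""
    if node is None:
        return -1
    return 1 + max(height(node.left), height(node.right))

def rotate_right(pivot):
    """Rotate the taller left child up; return the new subtree root.

    Only the pivot and its left child change position, so only their
    heights change -- the subtrees hanging below them are untouched.
    """
    new_root = pivot.left
    pivot.left = new_root.right   # new_root's right subtree moves under the pivot
    new_root.right = pivot        # the pivot rotates down into the shorter side
    return new_root
```

For example, the left-heavy chain C-B-A has a height difference of 2 between the pivot's subtrees; after `rotate_right` the two subtrees of the new root B have equal height.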

Subsection 32.5.2 Structure of a Worst-Case AVL Tree

Even if we use rotations to fix trees with a balance factor of 2 or -2, we do allow a balance factor of -1, 1 at each node. So what kind of performance guarantees can we make given that we won’t always have perfectly balanced trees?
To answer this, we depict the trees with the minimum number of nodes for heights 0, 1, 2, and 3 in Figure 32.5.1. In the diagram, the most left-heavy tree possible is constructed for each height. Take a minute to convince yourself that each tree is a legal AVL tree and that removing any node on the right side of a subtree would make it invalid.
[Image: the most left-heavy AVL trees of heights 0, 1, 2, and 3, with each node labeled with its balance factor. For height 0, a single node "A" (balance factor 0). For height 1, root "B" (1) with left child "A" (0). For height 2, root "C" (1) with left child "B" (1), "B"'s left child "A" (0), and right child "D" (0). For height 3, root "E" (1) with the height-2 tree rooted at "C" as its left subtree and right child "G" (1), whose left child is "F" (0). Each tree is the maximum permissible left-heavy configuration with balance factors of -1, 0, or 1 at every node.]
Figure 32.5.1. Worst-Case Left-Heavy AVL Trees.
The number of nodes in each tree is summarized in the table below.
Table 32.5.2. Number of Nodes in Worst-Case AVL Trees
Height            0   1   2   3   4
Number of Nodes   1   2   4   7  12
If you look carefully at the trees, you can see that each tree is constructed by making a root node and then attaching the worst-case tree of height h-1 as the left child and the worst-case tree of height h-2 as the right child. (i.e. The height 3 tree has a height 2 tree on its left and a height 1 tree on its right. The height 4 tree has a height 3 tree on its left and a height 2 tree on its right.)
This means that the number of nodes in a worst-case AVL tree of height \(h\) (\(N_h\)) can be expressed using the recurrence:
\begin{equation*} N_h = 1 + N_{h-1} + N_{h-2}\text{.} \end{equation*}
We can see that this is true of the values in the table. For instance, for height 3 we have \(N_3 = 1 + N_2 + N_1 = 1 + 4 + 2 = 7\text{.}\) For height 4 we have \(N_4 = 1 + N_3 + N_2 = 1 + 7 + 4 = 12\text{.}\)
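The recurrence can be checked directly in code. This short sketch (the function name is an assumption for this example) reproduces the values in Table 32.5.2:

```python
def worst_case_nodes(h):
    """Minimum number of nodes in an AVL tree of height h,
    via the recurrence N_h = 1 + N_{h-1} + N_{h-2}."""
    if h == 0:
        return 1
    if h == 1:
        return 2
    return 1 + worst_case_nodes(h - 1) + worst_case_nodes(h - 2)

print([worst_case_nodes(h) for h in range(5)])  # [1, 2, 4, 7, 12]
```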

Subsection 32.5.3 Height of an AVL Tree - Informal

The number of nodes in a worst-case AVL tree of height \(h\) can be expressed using the recurrence
\begin{equation*} N_h = 1 + N_{h-1} + N_{h-2} \end{equation*}
as established in the previous section. Because \(N_{h - 1}\) is larger than \(N_{h - 2}\text{,}\) we can say that
\begin{equation*} N_h > 2N_{h-2} + 1\text{.} \end{equation*}
This says that the number of nodes in a worst-case AVL tree of height \(h\) is always greater than twice the number of nodes in a worst-case AVL tree of height \(h-2\) plus one.
The recurrence
\begin{equation*} T(n) = 2T(n-2) + 1 \end{equation*}
solves to \(O(2^{\frac{n}{2}})\text{.}\) By analogy, we can see that the number of nodes in a worst-case AVL tree of height \(h\) is greater than \(2^{\frac{h}{2}}\text{.}\) So, at this point we have
\begin{equation*} N_h > 2^{\frac{h}{2}}\text{.} \end{equation*}
What we really care about is the height of an AVL tree given the number of nodes in the tree. If we take the base 2 log of both sides of the inequality
\begin{equation*} N_h > 2^{\frac{h}{2}} \end{equation*}
we get
\begin{equation*} \log_2{N_h} > \frac{h}{2}\text{.} \end{equation*}
If we rearrange this to solve for \(h\) we get
\begin{equation*} h < 2\log_2{N_h}\text{.} \end{equation*}
This final inequality tells us that the height of an AVL tree is less than twice the base 2 log of the number of nodes in the tree. This is great news for searching our AVL tree because it limits the height of the tree to \(O(\log{N})\text{.}\) This ensures that our search, insertion, and deletion operations will all run in \(O(\log{N})\) time.

Insight 32.5.1.

The height of an AVL tree is \(O(\log{N})\text{,}\) guaranteeing logarithmic time complexity for search, insertion, and deletion operations.

Subsection 32.5.4 Height of an AVL Tree - Formal

A stricter analysis of the AVL tree relies on recognizing that the recurrence
\begin{equation*} N_h = 1 + N_{h-1} + N_{h-2} \end{equation*}
is very similar to the Fibonacci sequence. We can use this fact to derive a formula for the height of an AVL tree given the number of nodes in the tree.
For the Fibonacci sequence the \(i^{th}\) Fibonacci number is given by:
\begin{equation*} \begin{split} F_0 &= 0 \\ F_1 &= 1 \\ &\vdots \\ F_i &= F_{i-1} + F_{i-2} \text{ for all } i \ge 2 \end{split} \text{.} \end{equation*}
We can express the value of the number of nodes in a tree of height \(h\) in terms of its position in the Fibonacci sequence:
\begin{equation*} N_h = F_{h+3} - 1, h \ge 0\text{.} \end{equation*}
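This relationship can be verified numerically for small heights. The sketch below (the helper names are assumptions for this example) compares the two sequences:

```python
def fib(i):
    """i-th Fibonacci number with F_0 = 0 and F_1 = 1."""
    a, b = 0, 1
    for _ in range(i):
        a, b = b, a + b
    return a

def worst_case_nodes(h):
    """Minimum nodes in an AVL tree of height h: N_h = 1 + N_{h-1} + N_{h-2}."""
    a, b = 1, 2          # N_0 and N_1
    for _ in range(h):
        a, b = b, 1 + a + b
    return a

# Verify N_h = F_{h+3} - 1 for the first several heights
for h in range(10):
    assert worst_case_nodes(h) == fib(h + 3) - 1
```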
An important mathematical result is that the \(i^{th}\) Fibonacci number can be found using Binet’s formula:
\begin{equation*} F_i = \frac{\Phi^i - (1 - \Phi)^i}{\sqrt{5}}\text{.} \end{equation*}
You can consult a math text if you want to see a derivation of the previous formula. \(\Phi\) is the golden ratio and is defined as:
\begin{equation*} \Phi = \frac{1 + \sqrt{5}}{2} \approx 1.618\text{.} \end{equation*}
In Binet’s formula, as \(i\) grows large, the term \((1 - \Phi)^i\) approaches zero, so we can use the following approximation to obtain a value close to the \(i^{th}\) Fibonacci number:
\begin{equation*} F_i \approx \frac{\Phi^i}{\sqrt{5}}\text{.} \end{equation*}
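The quality of this approximation is easy to check: rounding \(\Phi^i / \sqrt{5}\) to the nearest integer recovers \(F_i\) exactly, because the discarded term is always smaller than \(\frac{1}{2}\) in magnitude. A quick sketch (the helper names are assumptions for this example):

```python
import math

PHI = (1 + math.sqrt(5)) / 2   # the golden ratio, about 1.618

def fib(i):
    """i-th Fibonacci number with F_0 = 0 and F_1 = 1."""
    a, b = 0, 1
    for _ in range(i):
        a, b = b, a + b
    return a

def fib_approx(i):
    """Binet-based approximation F_i ~ PHI**i / sqrt(5)."""
    return PHI ** i / math.sqrt(5)

# Rounding the approximation gives the exact Fibonacci number
for i in range(1, 30):
    assert round(fib_approx(i)) == fib(i)
```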
By replacing the Fibonacci reference with its golden ratio approximation we get:
\begin{equation*} N_h \approx \frac{\Phi^{h+3}}{\sqrt{5}} - 1\text{.} \end{equation*}
If we rearrange the terms, and take the base 2 log of both sides and then solve for \(h\) we get the following derivation:
\begin{equation*} \begin{split} \log_2{(N_h + 1)} &= (h+3)\log_2{\Phi} - \frac{1}{2} \log_2{5} \\ h &= \frac{\log_2{(N_h + 1)} - 3 \log_2{\Phi} + \frac{1}{2} \log_2{5}}{\log_2{\Phi}} \\ h &\approx 1.44 \log_2{(N_h + 1)} - 1.33 \end{split} \text{.} \end{equation*}
This derivation shows us that the maximum height of an AVL tree is approximately equal to a constant (1.44) times the log of the number of nodes in the tree.
We already knew that the height of an AVL tree is \(O(\log{N})\text{,}\) but this formal analysis gives us a more precise understanding of the key constant factor that is hidden in the big-O notation.
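As a numerical sanity check, the derived expression tracks the true worst-case heights very closely. This sketch (the `worst_case_nodes` helper is an assumption for this example) evaluates the exact form of the bound from the derivation and compares it with \(h\):

```python
import math

PHI = (1 + math.sqrt(5)) / 2   # the golden ratio

def worst_case_nodes(h):
    """Minimum nodes in an AVL tree of height h: N_h = 1 + N_{h-1} + N_{h-2}."""
    a, b = 1, 2          # N_0 and N_1
    for _ in range(h):
        a, b = b, 1 + a + b
    return a

# h should be within a small tolerance of
# (log2(N_h + 1) + 0.5 * log2(5)) / log2(PHI) - 3,
# i.e. roughly 1.44 * log2(N_h + 1) plus a constant.
for h in range(2, 25):
    n = worst_case_nodes(h)
    bound = (math.log2(n + 1) + 0.5 * math.log2(5)) / math.log2(PHI) - 3
    assert abs(h - bound) < 0.05
```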

Checkpoint 32.5.1.

Which is the best estimate for the maximum height of an AVL tree with \(N\) nodes?
  • \(1.44 \log_2{N}\)
  • \(2 \log_2{N}\)
  • \(\log_2{N}\)