The rules of a B-tree ensure that it never has empty nodes and that every leaf is the same distance from the root (the tree grows up, not down). Each internal node has some minimum number of children. Let us call that minimum number of children \(k\text{.}\) At each level of the tree, the number of nodes is at least \(k\) times the number of nodes at the previous level, so the number of nodes at level \(h\) is at least \(k^h\text{.}\) If we have \(n\) total nodes in the tree, then we know that \(n \geq k^h\text{,}\) which means that \(h \leq \log_k n\text{.}\) The height of the tree is bounded by a logarithm of the number of nodes, so we can say that the height is \(O(\log n)\text{.}\)
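The bound \(h \leq \log_k n\) can be checked with a small sketch. The function below (a hypothetical helper, not part of any B-tree library) computes the largest height \(h\) with \(k^h \leq n\), i.e. \(\lfloor \log_k n \rfloor\), using integer arithmetic to avoid floating-point rounding:

```python
def max_height(n: int, k: int) -> int:
    """Largest h with k**h <= n, i.e. floor(log_k n):
    an upper bound on the height of a tree with n nodes
    where every internal node has at least k children."""
    h = 0
    while k ** (h + 1) <= n:
        h += 1
    return h

# With a million nodes and at least 10 children per node,
# the height is at most 6 (since 10**6 == 1_000_000).
print(max_height(1_000_000, 10))  # -> 6
```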
That minimum number of children is determined by the degree of the tree. (Recall that the minimum number of children is half the degree of the B-tree.) As we increase the degree of the B-tree, we decrease its height. A B-tree of, say, degree 20 (and thus a minimum of 10 children per node) will be shorter than a binary search tree with the same number of values. Given 100 values, the B-tree would have a height of at most \(\log_{10} 100 = 2\text{.}\) A balanced binary search tree with that many values would have a height of at most \(\log_2 100 \approx 7\text{.}\)
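The two height bounds above can be verified directly, assuming the same numbers from the text (100 values, a minimum of 10 children per B-tree node):

```python
import math

# Height bound for a B-tree with at least 10 children per node:
btree_height = math.log(100, 10)         # log base 10 of 100 -> 2

# Height bound for a balanced binary search tree:
bst_height = math.ceil(math.log2(100))   # ceil(log base 2 of 100) -> 7

print(btree_height, bst_height)
```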
Normally we don't care about the base of logarithms when analyzing time complexity because they only differ by a constant factor. (You can convert \(\log_2\) to \(\log_{10}\) by multiplying by approximately \(0.301\text{,}\) which is a constant.) Thus, normally we would not distinguish between \(O(\log_2{N})\) and \(O(\log_{10}{N})\text{.}\)
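A quick arithmetic check of that conversion, for an arbitrarily chosen \(n\):

```python
import math

# log10(n) equals log2(n) multiplied by the constant log10(2) ~ 0.301,
# so the two logarithms differ only by a constant factor.
n = 1024
print(math.log2(n) * math.log10(2))  # same value as math.log10(n)
print(math.log10(n))
```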
However, if the constant factors associated with each node visit are significant, then the difference in height can have a noticeable effect on performance. One example of where this would be the case is for a data structure where each node visit may involve reading from a disk. A traditional spinning hard drive has a seek time of around 10 milliseconds. That is approximately 100,000 times slower than accessing data in RAM, which typically takes around 100 nanoseconds. (See Latency Numbers Every Programmer Should Know for more details on typical access times for various storage technologies.)
If every step from one node to the next involves a constant factor of 100,000 to go retrieve data from disk and bring it to main memory, then minimizing the number of node visits becomes very important. In this case, the height reduction provided by a B-tree can significantly reduce the clock time required for operations like search, insertion, and deletion.
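A back-of-the-envelope sketch of that effect, assuming each node visit costs one 10 ms disk seek and illustrative sizes (10 million records; a B-tree with at least 100 children per node; these numbers are assumptions, not from the text):

```python
import math

SEEK_MS = 10        # assumed seek time per disk read (spinning disk)
N = 10_000_000      # assumed number of records in the index

# Each node visit costs one disk read, so total time ~ height * seek time.
bst_reads = math.ceil(math.log2(N))        # balanced BST: one value per node
btree_reads = math.ceil(math.log(N, 100))  # B-tree: >= 100 children per node

print(f"BST:    ~{bst_reads * SEEK_MS} ms per search")
print(f"B-tree: ~{btree_reads * SEEK_MS} ms per search")
```

With these assumptions, a search costs roughly 240 ms in the binary search tree but only about 40 ms in the B-tree, purely because of the reduced height.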
Of course, there is a tradeoff. If a B-tree node has a list of values, then when trying to search for a given key, we will need to do more work to determine which child pointer to follow. In a binary search tree, we only need to compare the key we are looking for with the one value in the current node (do we go left or right?). In a B-tree, we need to compare the search key with multiple values (is it before the first value? between the first and second values? and so on).
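That within-node search can be sketched with a small hypothetical helper. Since the values in a node are kept sorted, a binary search (here via the standard library's `bisect`) finds which child pointer to follow:

```python
from bisect import bisect_right

def child_index(keys: list, search_key) -> int:
    """Index of the child pointer to follow in a B-tree node whose
    sorted value list is `keys`. Child i holds keys between
    keys[i-1] and keys[i]. (Hypothetical helper, not from the text.)"""
    return bisect_right(keys, search_key)

node_keys = [17, 42, 77]
print(child_index(node_keys, 30))  # -> 1: 30 falls between 17 and 42
print(child_index(node_keys, 90))  # -> 3: 90 is after the last value
```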
This adds some constant overhead to each node visit (there is a fixed, constant number of values in each node). However, that extra work is done on the node that has already been read into memory, so it is comparatively cheap. Although we spend more time in each node, the constant factor for that search within a node is much smaller than the constant factor for reading a node from disk.
Thus, B-trees are often used in database systems and file systems where data is stored on disk. By keeping the tree height low, B-trees help ensure that operations can be performed efficiently even when dealing with large datasets that do not fit entirely in main memory.
When working fully in main memory, the extra complexity of managing the multi-value nodes of a B-tree is often not worth the effort compared to a simpler balanced binary search tree like an AVL tree or a Red-Black tree.