When inserting into a binary search tree, we will always add the new value as a leaf node. For any value we might want to add, there will always be a single location where that leaf can go that maintains the search tree property. Consider the following tree:
If we want to insert 12, it needs to be to the right of 10 and the left of 15. If we want to insert 6, it would need to be to the left of 10, the right of 5, and the left of 7. Each of those series of comparisons results in a unique locationβthe same location where we would expect to find it using the find operation. So, the insertion algorithm is very similar to the find algorithm, only instead of stopping when we find the value, we wonβt stop until we reach a null pointer where the new node can be added as a leaf.
The binary search tree for the values \(10, 5, 15, 3, 7, 20, 22\) is shown below. We are prepared to insert \(4\text{.}\) Use the > button to step through the insertion process.
You can click and drag on the animation area to pan around. Use the Ctrl + mouse wheel (or touchpad scroll gesture) while over the animation area to zoom in and out.
Because nodes always go in as leaves, the order we insert values can have a big impact on the shape of the tree. For instance, if we insert the values \(1, 2, 3, 4, 5, 6, 7\) in that order, we end up with a tree that is essentially a linked list. If we instead insert them in the order \(4, 2, 6, 1, 3, 5, 7\text{,}\) we end up with a much more balanced tree.
Because many algorithms on a binary search tree depend on the height of the tree, this balance can have a big impact on performance. For now we will just note this fact and return to it later when we discuss tree balancing algorithms.
If you clear the BST interactive and try inserting a value, you will see one special case that needs to be handled. When the tree is empty, there is no root node to start the search from. In that case, we simply create a new node and set it as the root of the tree.
In every other case, we need to find the right location to insert the new node. However, once we find the right location, we need to update a pointer in the parent node to point to the new node. This means we have a few choices:
Peek ahead to the child pointer before moving down the tree. If the child pointer is null, insert the new node there. Otherwise, move down to the child and continue.
Use the logic we first saw while implementing recursive functions for linked lists. We can return the new node back to the function that was responsible for its parent.
Here is an implementation of the third option. It is a recursive function that returns the (possibly new) subtree that results from inserting the value at the current location. If the current node is null, we create a new node and return it. Otherwise, we recursively call the insertion function on either the left or right child, depending on the value being inserted. Finally, we return the current node.
Node* insertRecursive(Node* node, int value) {
if (node == nullptr) {
// found location
Node* newNode = new Node(value);
return newNode; // return the new node
}
else if (value < node->data) {
// belongs in left subtree, hand it off
// set left child to result of insertion
node->left = insertRecursive(node->left, value);
}
else {
// belongs in right subtree, hand it off
// set right child to result of insertion
node->right = insertRecursive(node->right, value);
}
return node; // return the node we started with
}
There are a couple of key things to note. First, we always return a Node*. It is either the new node we created (if we found the location, line 5) or it is the address of the node that we started with (βIβm still here!β, line 17). Second, when we make the recursive call, we know that we will get back the memory address that should be assigned to the appropriate child pointer. So we assign the result of the recursive call to either node->left or node->right (lines 10, 15). Most of the time, that assignment will not make a change. But, if the recursive call creates a new node, this is how we will update the parent.
This recursive helper will be called by a public BST::insert method that starts the process at the root of the tree. It will call the recursive function on the root. If the root is null, the new node will be created and returned, and we will set the treeβs root pointer to that new node. Otherwise, the existing root pointer will be returned and reassigned to itself (no change).
class IntBST {
private:
Node* root = nullptr;
public:
void insert(int value);
};
// Non-recursive member function that starts the recursive search
void IntBST::insert(int value) {
// Start the recursive insertion at the root (possibly nullptr)
// The address of the old or new root node will come back
root = insertRecursive(root, value);
}
Checkpoint31.5.1.
Construct the logic for an insert function that βpeeks aheadβ to determine where to insert the new value. (We would need to update the logic of IntBST::insert to match it.)
We can also build an iterative version. To do so, we first find the correct parent node by walking the tree until we reach a null pointer. As we do so, a parent pointer will keep track of the last non-null node we visited by remaining one step behind the current pointer. Once we find the correct parent node, we can insert the new node as its left or right child.