Removal of a value from a BST is a bit more complicated than insertion or finding a value. Like insertion, we start by recursively searching for the node with that value. As we recurse, we will update the pointer on the side of the recursion that leads to the node we are removing (lines 7, 10 below). Each recursive call is responsible for returning the node that will be in that position after removal.
Node* removeRecursive(Node* node, int value) {
if (node == nullptr) {
return nullptr; // value not found, nothing is there
}
if (value < node->data) {
// Value is in left subtree
node->left = removeRecursive(node->left, value);
} else if (value > node->data) {
// Value is in right subtree
node->right = removeRecursive(node->right, value);
} else {
// Node with the value found
}
return node; // node remains unchanged
}
That logic is missing the key part at line 12: what to do once we find the node to be removed. That logic not only needs to delete the node, it needs to figure out what to do with any children that are hanging off that node. There are a few cases to consider:
The node to be removed is a leaf (has no children).
The easiest case to handle is when the node to be removed is a leaf node, meaning it has no children. In this case, we can simply delete the node and return nullptr to indicate that this position in the tree is now empty.
If the node to be removed has only one child, we can remove the node and connect its parent directly to its child. This effectively bypasses the node being removed. Whether the child is on the left or right, that node will be a valid replacement for the removed node.
You can click and drag on the animation area to pan around. Use the Ctrl + mouse wheel (or touchpad scroll gesture) while over the animation area to zoom in and out.
Below is the code to handle the one-child case. We need to use a temporary pointer to hold onto the child node before we delete the node being removed. Then we can return the child pointer to the caller, which will update its child pointer to bypass the removed node.
The most complex case is when the node to be removed has two children. In this case, we canβt just have the parent adopt the children of the removed node, as it only has one pointer to connect to a child.
Instead, in this case, we will find a replacement value to put in the node of the value being removed. The node itself will remain in place, we will just swap out its value. Then, we will go remove the node that originally held the replacement value (which will now be a duplicate).
That replacement value will be the next largest value in the tree, which is the smallest value in the right subtree of the node being removed. It is guaranteed that this value will have at most one child (it canβt have a left child, or it wouldnβt be the smallest value). So when we remove that node, we will be in either the no-children or one-child case.
You can click and drag on the animation area to pan around. Use the Ctrl + mouse wheel (or touchpad scroll gesture) while over the animation area to zoom in and out.
Here is the code to handle the two-children case. We first find the minimum value in the right subtree, replace the value in the current node with that minimum value, and then recursively call removeRecursive on the right child to remove the duplicate value.
...
else {
// Case 3: Two children
// Find the minimum value in the right subtree
int minValue = findMin(node->right);
node->data = minValue; // Replace value with min value
// Remove the duplicate value from the right subtree
node->right = removeRecursive(node->right, minValue);
}
...
Note that in this case, we do not have a delete. We are just changing the current node. We rely on the recursive call to find and remove a node and thus end up with one fewer node in the tree.
Even when we continue the removal process note that we end up following a single path down the tree, just like we do when inserting or finding a value. Thus it should take \(O(\log n)\) time.
If we end up in the two child case, we may make two recursive trips down the tree: one to find the value to replace the current nodeβs value with and the other to remove the duplicate minimum value. However, two trips that are each \(O(\log n)\) add up to just \(O(\log n)\text{.}\) We could develop a more complex algorithm that finds and removes the minimum value in a single pass, but it would not improve the overall time complexity.
delete the current node #tag:a; depends:;
---
return nullptr #tag:b; depends:a;
---
set left child to nullptr #distractor
---
get value (X) that is
the smallest value in the right subtree #distractor
---
set right child to result of
removing X from right subtree #distractor
---
set current node's value to X #distractor
---
set right child to nullptr #distractor
---
return the right child's address #distractor
---
return the current node's address #distractor
---
return the left child's address #distractor
delete the current node #tag:a; depends:;
---
return the left child's address #tag:b; depends:a;
---
set left child to nullptr #distractor
---
get value (X) that is
the smallest value in the right subtree #distractor
---
delete the smallest value in the right subtree #distractor
---
set right child to result of
removing X from right subtree #distractor
---
set current node's value to X #distractor
---
set right child to nullptr #distractor
---
return nullptr #distractor
---
return the current node's address #distractor
---
return the right child's address #distractor
get value (X) that is
the smallest value in the right subtree #tag:a; depends:;
---
set current node's value to X #tag:b; depends:a;
---
set right child to result of
removing X from right subtree #tag:c; depends:a;
---
return the current node's address #tag:d; depends:b,c;
---
set left child to nullptr #distractor
---
set right child to nullptr #distractor
---
return nullptr #distractor
---
return the left child's address #distractor
---
return the right child's address #distractor
---
delete the current node #distractor