When we remove an item from a hash table using closed hashing, we cannot simply mark the cell as empty because that would break the probing sequence for any items that were inserted after it. Consider the situation where we have inserted a record for Jessica Ng but we were forced to probe to find an open cell for it.
Currently, if we go to find Jessica Ng, we will hash her name to cell 4, see that it is occupied by "John Smith", and then probe to cell 5, see that it is occupied by "Doug Jones", and then probe to cell 6, where we will find "Jessica Ng".
Now imagine we remove the record for "John Smith". If we simply mark cell 4 as empty, then when we try to find "Jessica Ng", we will hash her name to cell 4, see that it is empty, and conclude that "Jessica Ng" is not in the table, even though it is!
Because we stopped at the empty location, we never got to cell 6 where "Jessica Ng" is. But we have to stop at empty locations. To not do so would mean we would probe the entire table every time we looked for something that was not in the table. That would require \(O(n)\) time, which would defeat the purpose of using a hash table in the first place. So we need a way to mark a cell as having been occupied but now being empty.
One approach to mark cells as βdeletedβ is to use βtombstoneβ, a special value used to indicate that something used to be there. When we are looking for an item in the table, we will treat tombstone cells as occupied and continue probing. When we are inserting an item into the table, we can treat tombstone cells as empty and insert the new item there. This way, we can maintain the integrity of the probing sequence while still allowing us to reuse cells that have been removed.
You can click and drag on the animation area to pan around. Use the Ctrl + mouse wheel (or touchpad scroll gesture) while over the animation area to zoom in and out.
The animation shows a tombstone as <deleted>. But in a real implementation, we would need to pick a value of the type being stored to represent the tombstone. If we are storing integers, we need a tombstone that is an integer. If we were storing Person records, we would need a tombstone that is a Person record.
This can be tricky because we need to make sure that the value we choose cannot be confused with a valid item that we might want to store in the table. For example, if we are implementing a hash table for strings, we might choose a special string like ###deleted### to represent the tombstone. But we then would not be able to store the string ###deleted### in the table because it would be indistinguishable from a tombstone.
We have a similar problem for identifying empty cells. The animation we are using shows unused cells as empty. But if we are storing an array of integers, there is no such thing as an empty integer. So we also need to pick another special value to represent an empty cell. Whatever value we choose, it also canβt be stored in the table as a valid item.
This, if our table is storing integers, we would likely not want 0 to mean βempty cellβ because we then would not be able to store the value 0 in the table. Instead, we might decide that INT_MAX (the largest possible int) will represent a tombstone, and INT_MIN (the smallest possible int) will represent an empty cell. This way, we can store any valid integer other than those two values in the table.
Rather than deal with special values, we can instead use extra storage to keep track of whether a cell is empty, occupied, or a tombstone. One approach would be to store a parallel array of enums that indicate the state of each cell. (If our table is storing 100 cells, we would have an array of 100 enums indicating the status of each cell.) Before checking the value of a cell at a given index, we first check its state by checking that index in the state array. If the state is marked as empty, or as a tombstone, we would then ignore the corresponding value in the main array.