Removal and Special Values

Section 33.7 Removal and Special Values

Subsection 33.7.1 Removal in Closed Hashing

When we remove an item from a hash table using closed hashing, we cannot simply mark the cell as empty because that would break the probing sequence for any items that were inserted after it. Consider the situation where we have inserted a record for Jessica Ng but we were forced to probe to find an open cell for it.

🔗

"John Smith" is in cell 4, "Doug Jones" is in cell 5, and "Jessica Ng" is in cell 6.🔗 — Figure 33.7.1. The `Jessica Ng` record should have gone in cell 4. But it and 5 were occupied, so it ended up in cell 6.
🔗

Currently, if we go to find Jessica Ng, we will hash her name to cell 4, see that it is occupied by "John Smith", and then probe to cell 5, see that it is occupied by "Doug Jones", and then probe to cell 6, where we will find "Jessica Ng".

🔗

Now imagine we remove the record for "John Smith". If we simply mark cell 4 as empty, then when we try to find "Jessica Ng", we will hash her name to cell 4, see that it is empty, and conclude that "Jessica Ng" is not in the table, even though it is!

🔗

Nothing is in cell 4, "Doug Jones" is in cell 5, and "Jessica Ng" is in cell 6.🔗 — Figure 33.7.2. The `Jessica Ng` record maps to cell 4. Because it is empty we assume she is not in the table.
🔗

Because we stopped at the empty location, we never got to cell 6 where "Jessica Ng" is. But we have to stop at empty locations. To not do so would mean we would probe the entire table every time we looked for something that was not in the table. That would require \(O(n)\) time, which would defeat the purpose of using a hash table in the first place. So we need a way to mark a cell as having been occupied but now being empty.

🔗

One approach to mark cells as “deleted” is to use “tombstone”, a special value used to indicate that something used to be there. When we are looking for an item in the table, we will treat tombstone cells as occupied and continue probing. When we are inserting an item into the table, we can treat tombstone cells as empty and insert the new item there. This way, we can maintain the integrity of the probing sequence while still allowing us to reuse cells that have been removed.

🔗

Figure 33.7.3. A tombstone (#) marking cell 4 tells us to continue searching if we encounter it.

🔗

Insight 33.7.1.

A tombstone counts as occupied when we are looking for an item in the table, but counts as empty when we are inserting an item into the table.

🔗

This animation demonstrates the role of tombstones in retrieval and insertion:

🔗

Activity 33.7.1. Hash Table Tombstones.

The hash table is set to Remove 9. Use > to step through the removal process.

🔗

Then try finding 33 which also belongs in cell 1.

🔗

Finally, try inserting 17. It also belongs in cell 1 and can replace the tombstone.

🔗

Instructions.

When the > button is highlighted, an animation is prepared to run. Use the Step button to step through the animation one step at a time.

🔗

When no animation is in progress, you can use the data controls to start a new animation.

🔗

You can click and drag on the animation area to pan around. Use the Ctrl + mouse wheel (or touchpad scroll gesture) while over the animation area to zoom in and out.

🔗

Core Controls.
- << : Skip to the beginning of the animation.
  
  🔗
- < : Step backwards one step in the animation.
  
  🔗
- > : Step forward one step in the animation.
  
  🔗
- >> : Skip to the end of the animation.
  
  🔗
- Auto Step Speed Set to anything other than off to automatically step through the animation at the selected speed.
  
  🔗
- Zoom Set the zoom level. You can also use Ctrl + Mouse Wheel to zoom in and out.
  
  🔗
🔗
🔗
Data Controls.
- Value Use this area to enter a value when inserting, finding, deleting, etc...
  
  🔗
🔗
🔗

🔗

Subsection 33.7.2 Special Values

The animation shows a tombstone as <deleted>. But in a real implementation, we would need to pick a value of the type being stored to represent the tombstone. If we are storing integers, we need a tombstone that is an integer. If we were storing Person records, we would need a tombstone that is a Person record.

🔗

This can be tricky because we need to make sure that the value we choose cannot be confused with a valid item that we might want to store in the table. For example, if we are implementing a hash table for strings, we might choose a special string like ###deleted### to represent the tombstone. But we then would not be able to store the string ###deleted### in the table because it would be indistinguishable from a tombstone.

🔗

We have a similar problem for identifying empty cells. The animation we are using shows unused cells as empty. But if we are storing an array of integers, there is no such thing as an empty integer. So we also need to pick another special value to represent an empty cell. Whatever value we choose, it also can’t be stored in the table as a valid item.

🔗

This, if our table is storing integers, we would likely not want 0 to mean “empty cell” because we then would not be able to store the value 0 in the table. Instead, we might decide that INT_MAX (the largest possible int) will represent a tombstone, and INT_MIN (the smallest possible int) will represent an empty cell. This way, we can store any valid integer other than those two values in the table.

🔗

Rather than deal with special values, we can instead use extra storage to keep track of whether a cell is empty, occupied, or a tombstone. One approach would be to store a parallel array of enums that indicate the state of each cell. (If our table is storing 100 cells, we would have an array of 100 enums indicating the status of each cell.) Before checking the value of a cell at a given index, we first check its state by checking that index in the state array. If the state is marked as empty, or as a tombstone, we would then ignore the corresponding value in the main array.

🔗

Checkpoint 33.7.1.

Which statements are true about tombstones in a hash table?

🔗

Whatever value is used as the tombstone cannot be stored as a valid item in the table.
When we see a tombstone while searching, we must continue as if it were an occupied cell.
When we see a tombstone while inserting, we must continue as if it were an occupied cell.
When we see a tombstone while inserting, we can treat it as an empty cell and insert the new item there.
They mark cells that had values which were deleted.
They mark all cells that are empty.

🔗

You have attempted of activities on this page.

🔗

Prev Top Next