Skip to main content

Section 33.6 Hash Tables Insertion and Collisions

Now that we have explored hashing, we can focus on using it to build a hash table.
A hash table is a data structure built on top of an array. Given a key, we will compute the hash value of the key and then modulo that hash value by the size of the array to determine where the item belongs. If we are inserting the value, we will store it in that cell. If we are looking for a value, we will look in that cell to see if the value is there.

Activity 33.6.1. Hash Table.

The hash table contains the values 20, 33, and 7. Try inserting 18. Because the hash table size is 8, it should be placed into location 18 % 8 == 2 .
Then try finding 20.

Instructions.

When the > button is highlighted, an animation is prepared to run. Use the Step button to step through the animation one step at a time.
When no animation is in progress, you can use the data controls to start a new animation.
You can click and drag on the animation area to pan around. Use the Ctrl + mouse wheel (or touchpad scroll gesture) while over the animation area to zoom in and out.
  • Core Controls.
    • << : Skip to the beginning of the animation.
    • < : Step backwards one step in the animation.
    • > : Step forward one step in the animation.
    • >> : Skip to the end of the animation.
    • Auto Step Speed Set to anything other than off to automatically step through the animation at the selected speed.
    • Zoom Set the zoom level. You can also use Ctrl + Mouse Wheel to zoom in and out.
  • Data Controls.
    • Value Use this area to enter a value when inserting, finding, deleting, etc...
Our table only has 8 cells. If you try inserting other values, it will not take long to find a collision, which is when two different keys hash to the same location. For example, if you insert 12, it will hash to the same location as 20 because 12 % 8 == 4 and 20 % 8 == 4. We need a way to store both of these values even though they want to be in the same cell.
There are two main strategies for handling collisionsβ€”closed and open hashing. The first general strategy we will examine is closed hashing (also confusingly known as open addressing). In closed hashing, if we are inserting a value and there is a collision, we move to a different cell in the array to store the value. If that next location is also occupied, we will continue visiting different cells until we find an open cell.
The most straightforward way to implement this is to use linear probing. In linear probing, we simply add 1 to the index each time we need to move to another cell because the current one is full. If we reach the end of the array, we will wrap around to the beginning and continue looking for an open cell. (We are moving in a straight line through the array, probing for an opening.)
For example, imagine we are inserting the value "Jessica Ng" into a hash table of size 7. The hash function and mapping tell us that the record should go in cell 4. If we see that cell is already occupied, we will move to cell 5 and check if it is available. If not, we would move to cell 6, and so on.
"John Smith" is in cell 4, "Doug Jones" is in cell 5, and "Jessica Ng" is in cell 6.πŸ”—
Figure 33.6.1. The Jessica Ng record should have gone in cell 4. But cells 4 and 5 were occupied, so it ended up in cell 6.

Activity 33.6.2. Linear Probing.

The hash table is set to insert 22. That should go in bucket 22 % 8 == 6. But we will have to probe to find a free cell. Use > to step through the probing process.

Instructions.

When the > button is highlighted, an animation is prepared to run. Use the Step button to step through the animation one step at a time.
When no animation is in progress, you can use the data controls to start a new animation.
You can click and drag on the animation area to pan around. Use the Ctrl + mouse wheel (or touchpad scroll gesture) while over the animation area to zoom in and out.
  • Core Controls.
    • << : Skip to the beginning of the animation.
    • < : Step backwards one step in the animation.
    • > : Step forward one step in the animation.
    • >> : Skip to the end of the animation.
    • Auto Step Speed Set to anything other than off to automatically step through the animation at the selected speed.
    • Zoom Set the zoom level. You can also use Ctrl + Mouse Wheel to zoom in and out.
  • Data Controls.
    • Value Use this area to enter a value when inserting, finding, deleting, etc...
This also means that when we go to check if we have a value in the table, and find a different value in the location the hash function produced, we need to probe to see if the value was inserted in an alternate location. A value that was supposed to be placed at index 6 may have had to probe through several cells before finding an open location. Thus, to find a value, we need to continue probing until we either find it, or we reach an open location before deciding that the key is not present.
Try using Find to search for 16 in the animation above (after the insertion is done). The value is not present, but we can’t be sure until we have probed through a few cells to reach an empty cell.
You may also have noted that our hash table has built up a contiguous group of occupied cells. This is known as a cluster. Clusters are problematic because they can lead to long probe sequences. Even worse, any value that hashes to a cell in the cluster will end up being added to the end of the cluster, which makes that cluster even bigger.
To avoid clustering, we can modify the probing sequence or use other collision resolution techniques. Instead of linear probing where our distance from the original location increases like 1, 2, 3, 4, 5, ..., we can do quadratic probing, where we square the probe number to get the distance: 1, 4, 9, 16, 25, .... Alternatively, we can use double hashing, where we use a second hash function to determine the step size for probing. This way different items that hash to the same original location will have different probe sequences, reducing clustering.
The animation below demonstrates quadratic probing in action.

Activity 33.6.3. Quadratic Probing.

This hash table has 16 cells. 3, 4, and 5 are occupied. The animation will insert 19 using quadratic probing. Use > to step through the process.
Then insert 35. It also belongs in the cell at index 3 and will have to probe one step further.

Instructions.

When the > button is highlighted, an animation is prepared to run. Use the Step button to step through the animation one step at a time.
When no animation is in progress, you can use the data controls to start a new animation.
You can click and drag on the animation area to pan around. Use the Ctrl + mouse wheel (or touchpad scroll gesture) while over the animation area to zoom in and out.
  • Core Controls.
    • << : Skip to the beginning of the animation.
    • < : Step backwards one step in the animation.
    • > : Step forward one step in the animation.
    • >> : Skip to the end of the animation.
    • Auto Step Speed Set to anything other than off to automatically step through the animation at the selected speed.
    • Zoom Set the zoom level. You can also use Ctrl + Mouse Wheel to zoom in and out.
  • Data Controls.
    • Value Use this area to enter a value when inserting, finding, deleting, etc...
Notice that even when we have a sequence of collisions, quadratic probing helps to reduce clustering by spreading out the probe sequence. If you like, you can also experiment with double hashing.

Note 33.6.1.

The size of the table can also have an impact on clustering. If the size of the table is a power of 2, then certain patterns in the keys can lead to more collisions and clustering. To mitigate this, it is often recommended to use a prime number for the size of the table. The animation used in this book uses a size of 8. In a real implementation we would likely want to make a small table using a size of 7 or 11.

Checkpoint 33.6.1.

Build the pseudocode for inserting a value into a hash table using linear probing. You will not use all the blocks.
You have attempted of activities on this page.