If we keep inserting items into a hash table, eventually we will run out of space in our array. Even before that point, we will start to see more and more collisions, which will degrade the performance of our hash table. As the table gets more full, the average time for insertions and lookups will increase.
To measure how full a hash table is, we use the concept of load factor, which is defined as the number of items in the table (not including tombstones) divided by the size of the array. A load factor of 0 means the table is empty, while a load factor of 1 means the table is full.
\begin{equation*}
\text{load factor} = \frac{\text{number of items}}{\text{size of array}}
\end{equation*}
We can estimate the number of probes required for insertions and lookups based on the load factor. For example, with linear probing, the average number of cells that will be checked while doing an insertion or unsuccessful lookup is approximately \(\frac{1}{2}\left(1 + \frac{1}{(1 - \alpha)^2}\right)\text{,}\) where \(\alpha\) is the load factor. The graph of this function is shown below.
Note that the graph is relatively flat until the load factor reaches something around 0.7. After that point, the number of probes starts increasing rapidly. So, in order to maintain good performance, we want to keep the load factor below that amount. This means that instead of waiting until the table is completely full, we will want to grow the table when the load factor reaches a certain threshold.
In some respects, the growth process is similar to the growth process for an array list. When we grow the table, we will create a new array that is larger than the old one and copy items over from the old array to the new one. However, there is an important difference. When we grow an array list, we can simply copy items from the old array to the same index in the new array. When we grow a hash table, we cannot do that because the size of the array has changed, which means that the value of \(key \bmod size\) has changed for every key. This means that we will need to rehash every item in the old array and insert it into the new array.
The basic strategy involves temporarily tracking the old array and then resetting the hash table itself to an empty state with a new, larger array. We then iterate over the old array as if it were a list of items we were inserting into the hash table for the first time. The animation below demonstrates the growth process.
(Note that the load factor is currently only 4/8 = 0.5, so we did not really need to grow the table yet. But this way there are fewer items to rehash and insert.)
You can click and drag on the animation area to pan around. Use the Ctrl + mouse wheel (or touchpad scroll gesture) while over the animation area to zoom in and out.
// assumes member variables
// Type* m_table - the array storing items
// int m_size - the number of items currently in the table
// int m_capacity - the size of the array
void grow() {
    oldCapacity = m_capacity          // Keep track of the old capacity
    oldTable = m_table                // Keep a pointer to the old array
    m_capacity *= 2                   // Double the capacity of the table
    m_table = new Type[m_capacity]    // Create a new array with the new size
    m_size = 0                        // Reset size to 0 before re-inserting items
    for (i = 0 to oldCapacity - 1) {  // Iterate over the old array
        curItem = oldTable[i]         // Get the item at index i
        // If the cell is not empty or a tombstone, re-insert the item into the hash table
        if (curItem != EMPTY && curItem != TOMBSTONE) {
            // insert handles placing the item, incrementing size, and resolving collisions
            insert(curItem)
        }
    }
    delete[] oldTable                 // Clean up the old array
}
Note 33.8.1.
As mentioned earlier, prime sizes for the array can help to reduce collisions. When growing, we can either do extra work (possibly by calculating a list of sizes to use ahead of time) to ensure that the new size is prime, or we can use a simple formula like \(2 \cdot oldSize + 1\), which at least guarantees that the new size is odd.