Section 30.1 Priority Queues
In Section 29.3, we learned about queues. Their key feature of course is that they are first in, first out. This makes them useful for representing real-world queues (line ups), such as lines at a supermarket or storing things like a collection of incoming print jobs.
However, given a collection of “things to process”, we often want to be able to specify that some things are more important than others. For example, in a hospital emergency room, patients with life-threatening injuries should be treated before patients with minor injuries, even if the minor injury patients arrived first. In an operating system, responding to user input is generally more important than background tasks like file indexing.
The term for a queue that lets us specify the importance of items and process them in that order is a priority queue.
A priority queue acts like a queue in that we can add items to it and remove items from it. However, instead of removing items in the order they were added, we remove them in order of their priority. The item with the highest priority is removed first, then the item with the next highest priority, and so on. If two items have the same priority, they are removed in the order they were added (first in, first out).
We can imagine implementing a priority queue by using a sorted vector, where items are always kept in order of their priority. Given jobs of priorities
{4, 7, 11, 15, 19}, they would be stored in the list in that order. In this implementation, finding and removing the highest priority item is efficient because it is always at the end of the vector. We can just remove the last item in the vector in \(O(1)\) time. However, adding new items is less efficient. To add a new item, we have to find the correct position to insert it to maintain the sorted order. The worst case would be when we insert an item with very low priority, which would require shifting all the other items over to make room for the new item. This takes \(O(n)\) time.
Alternatively, we could implement a priority queue using an unsorted vector. Our list of items might look like
{19, 7, 4, 15, 11} In this case, adding new items is efficient because we can just add them to the end of the vector in \(O(1)\) time. However, removing the highest priority item is less efficient because we have to search through the entire vector to find it. This takes \(O(n)\) time.
Neither of these implementations is ideal because one of the two main operations (adding or removing items) is inefficient. However, there is a way to store the data using a structure known as a heap and achieve both operations in \(O(\log n)\) time.
| Implementation | Add Item | Remove Highest Priority Item |
|---|---|---|
| Sorted Vector | \(O(n)\) | \(O(1)\) |
| Unsorted Vector | \(O(1)\) | \(O(n)\) |
| Heap | \(O(\log n)\) | \(O(\log n)\) |
How big a win is this? Is it really worth devising a new representation to get \(O(\log n)\) time for both operations instead of \(O(n)\) for one task and \(O(1)\) for the other? (Does having an \(O(1)\) task cancel out the expense of a more expensive \(O(n)\) task?)
We can analyze this in the abstract using Big-O. Would we rather do \(O(1) + O(n)\) or \(O(\log
n) + O(\log n)\text{?}\) In the first expression, the \(O(n)\) term dominates as n gets large, so we can simplify it to just \(O(n)\text{.}\) In the second expression, the two \(O(\log n)\) terms combine to give us \(O(2 \cdot \log n)\text{,}\) which simplifies to just \(O(\log n)\text{.}\)
In concrete numbers, imagine a priority queue with 1,000,000 items. To add and then remove something, would we rather do 1 unit of work and then 1,000,000 units of work? Or do ~20 units of work twice? (\(\log_{2}{1,000,000}\) is approximately 20.)
You have attempted of activities on this page.
