Section 28.6 Adapting Algorithms for Linked Lists
In Chapter 26, we examined a number of algorithms for working with data in a list. At the time, we focused on their implementation in array-based lists. Most of those algorithms can be adapted to work with linked lists, but doing so sometimes requires a different approach and may yield different performance characteristics.
An important feature of array-based lists is efficient random access to their elements: we can jump to any index in the list in constant time (
\(O(1)\)). Linked lists lack this ability. Thus any algorithm that relies on jumping to arbitrary indexes in a list (instead of advancing one index at a time) will not be efficient when implemented with a linked list.
A good example of such an algorithm is binary search. Binary search calculates a middle index and then jumps to that location to compare the value there with the key being searched for. In an array-based implementation, we can jump to any index in constant time. We do
\(O(\log n)\) of these constant time checks for an overall Big O of
\(O(\log n)\text{.}\)
In a linked list, we can't instantly jump to the middle. Instead, we have to traverse the nodes one by one to get there. So each time we try to "check the middle value", we would need to walk through half of the remaining elements.
Imagine doing a binary search on a linked list of 100 elements. First, we would need to traverse 50 elements to get to the middle. If the target value is not there, we would then need to traverse another 25 elements to get to the middle of the remaining half. Then 12 steps to get to the middle of that half. And so on. We would end up traversing a total of 50 + 25 + 12 + 6 + 3 + 1 + 1 = 98 elements in the worst case. That is almost the entire list!
More generally, for a list of
\(n\) elements, we would need to traverse approximately
\(n/2 + n/4 + n/8 + ... + 1\) elements in the worst case. This series sums to
\(O(n)\text{,}\) which is no better than a simple linear search. Thus, although we could implement binary search for a linked list, there is no performance benefit to doing so. It is better to just use a linear search algorithm, which is simpler and has the same Big-O performance.
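To make the cost concrete, here is a minimal Python sketch (the class and function names are mine, not from the text) that counts node traversals while simulating binary search on a sorted singly linked list. Since we cannot walk backwards, we keep a pointer at the low end of the current range and only ever walk forward from it:

```python
class Node:
    def __init__(self, value, next=None):
        self.value = value
        self.next = next

def build(values):
    """Build a singly linked list from a Python list of values."""
    head = None
    for v in reversed(values):
        head = Node(v, head)
    return head

def binary_search_count(head, n, key):
    """Return (found, steps), where steps is the number of node
    traversals performed -- the walks that array indexing gets for free."""
    steps = 0
    lo, hi = 0, n - 1
    lo_node = head                      # node at index lo
    while lo <= hi:
        mid = (lo + hi) // 2
        node = lo_node
        for _ in range(mid - lo):       # walk from lo to the middle
            node = node.next
            steps += 1
        if node.value == key:
            return True, steps
        if node.value < key:
            lo, lo_node = mid + 1, node.next   # search the right half
        else:
            hi = mid - 1                       # left half: lo_node stays put
    return False, steps
```

For a 100-element list and a key larger than every value, this performs over 90 traversals, matching the estimate above that the worst case visits nearly the whole list.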
Other algorithms that might rely on random access in an array-based list can be modified to do their work differently in a linked list. For example, consider the sorting algorithm quicksort. Recall that the core function for quicksort looks like this:
QuickSort(list, startIndex, endIndex):
    if startIndex >= endIndex:
        return // range of size 0/1 is already sorted
    else:
        pivotLocation = Partition(list, startIndex, endIndex)
        QuickSort(list, startIndex, pivotLocation - 1) // Sort the first half
        QuickSort(list, pivotLocation + 1, endIndex)   // Sort the second half
All of the hard work has to happen in the
Partition function, which rearranges the elements in the list so that all elements less than a chosen pivot are before it and all elements greater than the pivot are after it. In our array-based implementation of quicksort, this partitioning used indexed access to keep track of our location in the front and back of the array and to swap elements as needed. When it was done, it returned the final index of the pivot element.
The natural adaptation of quicksort for a linked list would be to implement the
Partition function using iterators (or pointers) to keep track of our position in the list like
i and
j do in the array-based implementation. In a doubly linked list, we could have an iterator starting at the beginning of the list and another at the end, moving them towards each other as we partition the list. When the two iterators meet, we have completed the partitioning and can swap the pivot element to its final location. After the partitioning is done, we would need to return an iterator pointing to the pivot element so that the recursive calls can be made on the two halves of the list.
In a singly linked list, we would have to get even more creative, since we can't move a pointer backwards. But it is still possible to implement partitioning in a way that requires only a single pass through the list.
To do so, we would reimplement the partitioning logic to produce two sublists: one holding the elements less than the pivot and one holding the rest.
One by one, for each of the \(n\) elements, we would remove the first element from the original list (\(O(1)\)) and append it to the appropriate sublist (\(O(1)\) if we have a tail pointer, or maintain a temporary pointer to the tail of each sublist while we build them). Thus, the entire partitioning process takes \(O(n)\) time, just like in the array-based implementation.
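A single-pass partition along these lines might look like the following Python sketch. The `LinkedList` class here is a minimal stand-in (an assumption, not the book's class), providing only \(O(1)\) removal from the front and \(O(1)\) append via a tail pointer:

```python
class Node:
    def __init__(self, value):
        self.value = value
        self.next = None

class LinkedList:
    """Minimal singly linked list with a tail pointer for O(1) append."""
    def __init__(self):
        self.head = self.tail = None

    def append(self, value):          # O(1) thanks to the tail pointer
        node = Node(value)
        if self.tail is None:
            self.head = self.tail = node
        else:
            self.tail.next = node
            self.tail = node

    def remove_first(self):           # O(1)
        node = self.head
        self.head = node.next
        if self.head is None:
            self.tail = None
        return node.value

    def is_empty(self):
        return self.head is None

def partition(lst, pivot_value, first_half, second_half):
    """Move every element of lst into first_half (< pivot) or
    second_half (>= pivot). lst is empty afterwards."""
    while not lst.is_empty():
        value = lst.remove_first()
        if value < pivot_value:
            first_half.append(value)
        else:
            second_half.append(value)
```

Each element is removed and appended exactly once, so the whole pass is \(O(n)\).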
Exploration 28.6.1. Partitioning a Linked List.
(a)
We start with a linked list:
(b)
We break off the first value to be the pivot.
Stealing the first node to be the pivot only requires updating three pointers, no matter how long the list is.
(c)
We compare the first remaining item (2) to the pivot. It is smaller, so it goes into the
firstHalf.
(d)
We compare the first remaining item (8) to the pivot. It is larger, so it goes into the
secondHalf.
(e)
We work our way through the remaining items in the original list, moving each one to either
firstHalf or
secondHalf.
Once we have partitioned the list into two sublists, we can recursively sort each sublist and then concatenate them together with the pivot in between. Concatenating lists together can be done in
\(O(1)\) time if we have tail pointers. If we don't have tail pointers, we would need to traverse the first sublist to find its end before we can concatenate, which would take
\(O(n)\) time. Or we could simply move items one at a time from the sublists to a new combined list, which would also take
\(O(n)\) time.
Assuming we have a size function, the top-level function might be adjusted to look like this:
QuickSort(list):
    if list.size() <= 1:
        return list // size 0/1 already sorted
    else:
        List firstHalf()  // empty list to hold elements < pivot
        List secondHalf() // empty list to hold elements >= pivot
        pivot = remove first value from list
        // split list into two sublists based on pivot value. list will be empty after this.
        Partition(list, pivot, firstHalf, secondHalf)
        firstHalf = QuickSort(firstHalf)   // Sort the first half
        secondHalf = QuickSort(secondHalf) // Sort the second half
        // merge the lists back together with the pivot in between.
        // firstHalf and secondHalf will be empty after this.
        list = Concatenate(firstHalf, pivot, secondHalf)
        return list
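As a check on the pseudocode, here is a runnable Python sketch of the whole algorithm (the `LinkedList` class is a minimal stand-in with a tail pointer and a size counter, not the book's class). The concatenation step splices the sorted halves together with the pivot in \(O(1)\):

```python
class Node:
    def __init__(self, value):
        self.value = value
        self.next = None

class LinkedList:
    """Minimal singly linked list: head, tail, and an element count."""
    def __init__(self):
        self.head = self.tail = None
        self._size = 0

    def size(self):
        return self._size

    def is_empty(self):
        return self._size == 0

    def append(self, value):                 # O(1) via the tail pointer
        node = Node(value)
        if self.tail is None:
            self.head = self.tail = node
        else:
            self.tail.next = node
            self.tail = node
        self._size += 1

    def remove_first(self):                  # O(1)
        node = self.head
        self.head = node.next
        if self.head is None:
            self.tail = None
        self._size -= 1
        return node.value

    def extend(self, other):
        """Splice another list onto the end in O(1); other becomes empty."""
        if other.is_empty():
            return
        if self.tail is None:
            self.head, self.tail = other.head, other.tail
        else:
            self.tail.next = other.head
            self.tail = other.tail
        self._size += other._size
        other.head = other.tail = None
        other._size = 0

def quicksort(lst):
    if lst.size() <= 1:
        return lst                           # size 0/1 already sorted
    first_half, second_half = LinkedList(), LinkedList()
    pivot = lst.remove_first()               # first value as pivot
    while not lst.is_empty():                # single-pass partition
        value = lst.remove_first()
        if value < pivot:
            first_half.append(value)
        else:
            second_half.append(value)
    first_half = quicksort(first_half)       # sort the first half
    second_half = quicksort(second_half)     # sort the second half
    first_half.append(pivot)                 # firstHalf + pivot + secondHalf
    first_half.extend(second_half)           # O(1) splice
    return first_half
```

Because every element is moved with \(O(1)\) pointer operations and the splices avoid copying, the work per level of recursion is \(O(n)\), matching the analysis below.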
In terms of Big-O, we would expect the following work:
- Partitioning the list: \(O(n)\)
- Recursively sorting the two sublists: \(2T(n/2)\)
- Concatenating the lists together: \(O(1)\) if done with pointers; \(O(n)\) if we have to traverse the first list or build a new combined list element by element.
Whether we do the concatenation in \(O(1)\) or \(O(n)\text{,}\) the overall recurrence relation remains \(T(n) = 2T(n/2) + O(n)\text{,}\) which solves to \(O(n \cdot \log n)\text{.}\)
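Unrolling the recurrence makes this concrete (with \(c\) standing for the constant hidden in the \(O(n)\) term):

```latex
\begin{align*}
T(n) &= 2T(n/2) + cn \\
     &= 4T(n/4) + 2cn \\
     &= 8T(n/8) + 3cn \\
     &\;\;\vdots \\
     &= 2^k\,T(n/2^k) + k \cdot cn
\end{align*}
```

The unrolling stops after \(k = \log_2 n\) levels, when the subproblems reach size 1, giving \(n \cdot T(1) + cn \log_2 n = O(n \cdot \log n)\text{.}\)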
Although we have a proposed implementation in
\(O(n \cdot \log n)\) time, we are relying on using the first element as our pivot value. Recall that this can lead to poor performance on already sorted lists. In that case, the pivot does not separate the list into two halves, but instead produces one empty sublist and one sublist with all the remaining elements. This leads to a recurrence relation of
\(T(n) = T(n-1) + O(n)\text{,}\) which solves to
\(O(n^2)\text{.}\)
In array-based implementations, rather than using the first element of the range as the pivot, it is common to look at the first, middle, and last elements of the range and choose the median of those three as the pivot (swapping it with the first element before starting the partition). This helps avoid ending up with
\(O(n^2)\) performance on already sorted lists.
However, in a linked list, there is no efficient way to access the middle element of the list and thus no way to efficiently implement this median-of-three pivot selection strategy. We could walk through the list to find the middle element, but that would add an extra
\(O(n)\) pass through at the start of the partitioning process, increasing the constant factors. Thus, while it is possible to implement quicksort for a linked list in
\(O(n \cdot \log n)\) time, we are probably better off using a different sorting algorithm that is more naturally suited to linked lists.
Checkpoint 28.6.1.
Design the pseudocode for the Partition function:
Partition(list, value, firstHalf, secondHalf):
---
while list is not empty:
---
element = remove first element from list
---
if element < value:
---
append element to firstHalf
---
else:
---
append element to secondHalf