Split At

Section 23.14 Split At

One of the advantages of having built a data structure is that we can add exactly whatever features are important to us. One operation we might want to support is splitting an ArrayList into two separate ArrayLists. For example, if we have an ArrayList that contains {10, 20, 30, 40, 50} and we want to split it at index 2, we would end up with two ArrayLists: one containing {10, 20} and the other containing {30, 40, 50}.

🔗

Let’s start by designing the interface for this function. We will call it splitAt and will plan on it taking an index as a parameter. The original ArrayList we call it on will be modified to only contain the items up to that point. To store the items after the split point, we will return a new ArrayList that contains those items. So the function signature might look like this:

🔗

ArrayList<T> ArrayList<T>::splitAt(int index);

To implement this function, we can either focus on using existing functions or construct logic from scratch. Using existing functions is often easier and less error prone. However, their abstraction have the potential for hiding possible inefficiencies. So if we use existing functions, we need to be careful to choose ones that don’t do unnecessary work.

🔗

To make sure we understand what work really is required, let’s start with a fresh implementation that doesn’t use any existing functions. Once we have that designed, we can decide if existing functions can simplify the implementation without adding too much overhead.

🔗

Thinking about what the state of memory should look like before and after a call to split can help us decide what work to do. Imagine we have an ArrayList that contains {10, 20, 30, 40, 50} and we call splitAt(2) on it. Memory might look like the left hand at the start of the call and the right hand at the end of the call (Click to enlarge):

🔗

"this" object has a m_size of 5, m_capacity of 6, and a pointer to an array named m_arr. That array contains [10, 20, 30, 40, 50, ???]🔗 — Figure 23.14.1. Memory at the start of `splitAt(2)`.
🔗

"this" object has a m_size of 2, m_capacity of 6, and a pointer to an array named m_arr. That array contains [10, 20, ???, ???, ???, ???]🔗 "rear" object has a m_size of 3, m_capacity of 3, and a pointer to an array named m_arr. That array contains [30, 40, 50, ???, ???, ???]🔗 — Figure 23.14.2. Memory at the end of `splitAt(2)`. Changes are in blue.
🔗

There is no need to change the actual array that this is pointing to. Nor do we need to actually delete anything. We can just change the logical size of this to reflect that it now only contains the first two items. The new ArrayList we will call rear. It needs to have storage space for the number of items being split off (size - index) in a new array. It might be OK to have more space, but certainly not less. And we will need to copy the items being split off into that new array.

🔗

To make the new ArrayList for the rear half, we can call the default constructor. But it might not allocate enough space for the items we need to store. So we will need to check if the default capacity is sufficient and if not, we will allocate a new array with enough capacity. Let’s focus on setting up the new ArrayList first and then we can worry about copying items into it.

🔗

Aside

Listing 23.14.3.

🔗

template<typename T>
ArrayList<T> ArrayList<T>::splitAt(int index) {
    ArrayList<T> rear;
    int numItemsToSplit = m_size - index;
    if (rear.m_capacity < numItemsToSplit) {
        delete[] rear.m_arr; // free the default array
        // change capacity and allocate new array
        rear.m_capacity = numItemsToSplit;
        rear.m_arr = new T[numItemsToSplit]; 
    }

    // copy items from index to the end into rear's array

    // "remove" them from the original ArrayList
    m_size = index;

    return rear;
}

Lines 5-10 handle checking to see if the newly created rear has enough capacity and if not, allocates a new array with sufficient capacity. Line 15 handles “removing” the items from the original ArrayList by changing the logical size of this.

🔗

In between, we just need to copy items over. We have already calculated how many items we need to copy in the numItemsToSplit variable. We can use that to loop through the items being split off and copy them into the new array. The items will go into index 0 through numItemsToSplit - 1 of the new array. In the original array, the items we want to copy start at index index. So we need to offset the loop counter by index to get the correct item from the original array.

🔗

Listing 23.14.4.

🔗

template<typename T>
ArrayList<T> ArrayList<T>::splitAt(int index) {
   ... same as before  ...

    // copy items from index to the end into rear's array
    for (int i = 0; i < numItemsToSplit; ++i) {
        int oldArrayIndex = index + i;
        rear.m_arr[i] = m_arr[oldArrayIndex];
    }

    ... same as before ...
}

That implementation does all the required work. The only obvious “extra” work is possibly having to allocate a new array if the default constructor didn’t allocate enough space. But, unless we go implement a constructor with a size parameter, we will need to do that work in this function.

🔗

Aside from implementing new functions, can we use any existing functions to simplify this implementation? Maybe we could use removeAt and insertAt to move items. Doing that could look as simple as:

🔗

Listing 23.14.5.

🔗

template<typename T>
ArrayList<T> ArrayList<T>::splitAt(int index) {
    ArrayList<T> rear;
    int numItemsToSplit = m_size - index;
    for(int i = 0; i < numItemsToSplit; ++i) {
        int value = m_arr[index]; // get the value at the split point
        rear.insertEnd(value);
        removeAt(index); // remove the item at the split point from "this"
    }
    return rear;
}

This is certainly simpler to read. And it also is doing less new work, which means fewer opportunities for new bugs. We don’t even have to worry about the capacity as the insertEnd function will take care of growing if we need to!

🔗

However, it is doing a lot of extra work. It keeps removing the item at the split point, which means that every time we remove an item, all the items after it have to shift down. Given our example list of {10, 20, 30, 40, 50}, the first time we remove at index 2, we take the have to shift 40 and 50 down so that {10, 20, 40, 50} are compact in memory. The second time we remove at index 2, we have to shift 50 down. It would be even worse if we were splitting an ArrayList with 1,000 items at index 2 (997 items would need to shift, then 996, then 995, etc...)

🔗

It seems like we are doing a lot of extra work shifting items around that we don’t need to do if we just copy the items directly into the new array. (We will develop a language to quantify this extra work in Chapter 25.) So, let’s avoid using removeAt. We can just not remove during the loop and go back to changing the logical size after the loop is complete.

🔗

insertEnd is not as problematic. Because it always adds to the end of the ArrayList, we don’t have to do any shifting of existing items. Yes, it might need to grow a few times, but unless we build a way to create the new ArrayList with the correct capacity from the start, we will need to do that work somewhere.

🔗

Listing 23.14.6.

🔗

template<typename T>
ArrayList<T> ArrayList<T>::splitAt(int index) {
    ArrayList<T> rear;
    int numItemsToSplit = m_size - index;
    for(int i = 0; i < numItemsToSplit; ++i) {
        int oldArrayIndex = index + i;
        int value = this->m_arr[oldArrayIndex];
        rear.insertEnd(value);
    }
    m_size = index; // "remove" the items from the original ArrayList
    return rear;
}

Insight 23.14.1.

Great code is both efficient and easy to understand. But sometimes those two goals are at odds with each other. In that case, you have to strike a balance or decide which goal is more important for a given situation.

🔗

Checkpoint 23.14.1.

Design the code for a ArrayList<T> removeSlice(int index, int length) function for ArrayList. (ArrayList<T> is abbreviated as AL<T> in the code.)

🔗

It should remove a contiguous slice of items from the ArrayList starting at index. For example, if we have an ArrayList that contains {10, 20, 30, 40, 50, 60} and we call removeSlice(1, 3), we would remove and return a new list with {20, 30, 40}, leaving behind {10, 50, 60}.

🔗

We will build it without using removeAt by first copying items to the slicedItems array, and then moving each item after the slice down to fill the gap.

🔗

template<typename T> #tag:a; depends:;
---
AL<T> AL<T>::removeSlice(int index, int length) { #tag:b; depends:a;
---
  AL<T> slicedItems; #tag:c; depends:b;
---
  int endOfSliceIndex = index + length - 1; #tag:d; depends:b;
---
  for (int i = index; i <= endOfSliceIndex; ++i) { #tag:e; depends:b,d;
---
      slicedItems.insertEnd(m_arr[i]);
  } #tag:f; depends:e;
---
  for (int i = endOfSliceIndex + 1; i < m_size; ++i) { #tag:h; depends:f;
---
      m_arr[i - length] = m_arr[i];
  }  #tag:i; depends:h;
---
  m_size -= length; #tag:k; depends:i;
---
  return slicedItems; #tag:l; depends:;
---
} // end of function #tag:m; depends:;

🔗

You have attempted of activities on this page.

🔗

Prev Top Next