5.5. Numbers as Indices

Enough about movie budgets; it’s time to budget my time instead. Because I schedule my day to the minute, I like to look up movies by their runtime, so that when I have a spare two hours and 34 minutes, I can find all the movies that fit precisely in that time slot. (Popcorn-making time is budgeted separately.)

Before you start, here is a refresher on the index operator in Pandas.

Selecting Columns of a DataFrame

Selecting Rows of a DataFrame

If you use an integer in any of the last four examples, it works just like a string, except that the index values are numeric. What is important (and confusing) is that these lookups use the index label, not the position. If you create a data frame with 4 rows of data, pandas creates a default index in which the first row is 0, the next row is 1, and so on. However, if you sort the data frame so that the last row becomes the first and the first row becomes the last, df.loc[0] on the sorted data frame returns the last row.
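To make the label-versus-position distinction concrete, here is a minimal sketch. The frame `toy` and its `score` column are made up for illustration and are not part of the movie data.

```python
import pandas as pd

# Hypothetical four-row frame; pandas gives it the default index 0, 1, 2, 3.
toy = pd.DataFrame({"score": [10, 20, 30, 40]})

# Sort so the largest score comes first. The index labels travel with their rows.
by_score = toy.sort_values("score", ascending=False)

print(list(by_score.index))       # [3, 2, 1, 0]
print(by_score.loc[0, "score"])   # label 0 now sits in the last row: 10
print(by_score.iloc[0]["score"])  # position 0 is the first row: 40
```

After sorting, the label 0 still identifies the same row of data; only its position in the frame has changed.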

If you want to be strictly positional, you should use df.iloc[0], which will return the first row regardless of the index value. df.iloc[0:5] is the same as df.head(), which returns the first five rows by default, and df.iloc[[1, 3, 5, 7]] will return four rows: the 2nd, 4th, 6th and 8th.
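The positional lookups described above can be checked on a small frame. This is only a sketch: the frame `letters` and its column `n` are invented for the example, and the string labels are chosen so that positions and labels cannot be confused.

```python
import pandas as pd

# Hypothetical ten-row frame labelled 'a' through 'j'.
letters = pd.DataFrame({"n": list(range(10))}, index=list("abcdefghij"))

first_five = letters.iloc[0:5]            # positions 0 to 4; the end of an iloc slice is excluded
every_other = letters.iloc[[1, 3, 5, 7]]  # the 2nd, 4th, 6th and 8th rows

print(first_five.equals(letters.head()))  # True
print(list(every_other.index))            # ['b', 'd', 'f', 'h']
```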

import pandas as pd
df = pd.DataFrame({'a':list("pythonrocks"), 'b':[1,2,3,4,5,6,7,8,9,10,11]})
df = df.set_index('a')
df.loc['p':'n']
   b
a
p  1
y  2
t  3
h  4
o  5
n  6

OK, but what if we do this:

df.loc['p':'o']
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
Cell In[2], line 1
----> 1 df.loc['p':'o']

File ~/.local/lib/python3.11/site-packages/pandas/core/indexing.py:1153, in _LocationIndexer.__getitem__(self, key)
   1150 axis = self.axis or 0
   1152 maybe_callable = com.apply_if_callable(key, self.obj)
-> 1153 return self._getitem_axis(maybe_callable, axis=axis)

File ~/.local/lib/python3.11/site-packages/pandas/core/indexing.py:1373, in _LocIndexer._getitem_axis(self, key, axis)
   1371 if isinstance(key, slice):
   1372     self._validate_key(key, axis)
-> 1373     return self._get_slice_axis(key, axis=axis)
   1374 elif com.is_bool_indexer(key):
   1375     return self._getbool_axis(key, axis=axis)

File ~/.local/lib/python3.11/site-packages/pandas/core/indexing.py:1405, in _LocIndexer._get_slice_axis(self, slice_obj, axis)
   1402     return obj.copy(deep=False)
   1404 labels = obj._get_axis(axis)
-> 1405 indexer = labels.slice_indexer(slice_obj.start, slice_obj.stop, slice_obj.step)
   1407 if isinstance(indexer, slice):
   1408     return self.obj._slice(indexer, axis=axis)

File ~/.local/lib/python3.11/site-packages/pandas/core/indexes/base.py:6602, in Index.slice_indexer(self, start, end, step)
   6558 def slice_indexer(
   6559     self,
   6560     start: Hashable | None = None,
   6561     end: Hashable | None = None,
   6562     step: int | None = None,
   6563 ) -> slice:
   6564     """
   6565     Compute the slice indexer for input labels and step.
   6566 
   (...)
   6600     slice(1, 3, None)
   6601     """
-> 6602     start_slice, end_slice = self.slice_locs(start, end, step=step)
   6604     # return a slice
   6605     if not is_scalar(start_slice):

File ~/.local/lib/python3.11/site-packages/pandas/core/indexes/base.py:6825, in Index.slice_locs(self, start, end, step)
   6823 end_slice = None
   6824 if end is not None:
-> 6825     end_slice = self.get_slice_bound(end, "right")
   6826 if end_slice is None:
   6827     end_slice = len(self)

File ~/.local/lib/python3.11/site-packages/pandas/core/indexes/base.py:6752, in Index.get_slice_bound(self, label, side)
   6750     slc = lib.maybe_booleans_to_slice(slc.view("u1"))
   6751     if isinstance(slc, np.ndarray):
-> 6752         raise KeyError(
   6753             f"Cannot get {side} slice bound for non-unique "
   6754             f"label: {repr(original_label)}"
   6755         )
   6757 if isinstance(slc, slice):
   6758     if side == "left":

KeyError: "Cannot get right slice bound for non-unique label: 'o'"

Pandas raises an error because there are two ‘o’s in the index, and it doesn’t know which one you mean: the first or the last? If you argue that it should use the last, consider the performance implications for a really large index. In that case it would be very time-consuming to search the whole index for the last occurrence.

On the other hand, if we sort the index, then the last instance can be found quite quickly, and with a sorted index loc will work for a slice like this one.

df = df.sort_index()
df.loc['c':'o']
    b
a
c   9
h   4
k  10
n   6
o   5
o   8
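Before slicing by label, you can ask the index whether it is unique and whether it is sorted. The sketch below rebuilds the same "pythonrocks" frame under the assumed names `unsorted` and `ordered`, so that it does not overwrite df.

```python
import pandas as pd

# Same data as above, rebuilt under a different name.
unsorted = pd.DataFrame(
    {"a": list("pythonrocks"), "b": list(range(1, 12))}
).set_index("a")

print(unsorted.index.is_unique)                # False: 'o' appears twice
print(unsorted.index.is_monotonic_increasing)  # False: labels are in spelling order

try:
    unsorted.loc["p":"o"]
    slice_failed = False
except KeyError:
    # The duplicated end label cannot be located in an unsorted index.
    slice_failed = True
print(slice_failed)                            # True

ordered = unsorted.sort_index()
print(ordered.index.is_monotonic_increasing)   # True

# A label slice includes both endpoints, so both 'o' rows are returned.
print(sorted(ordered.loc["c":"o", "b"].tolist()))  # [4, 5, 6, 8, 9, 10]
```

Checking is_monotonic_increasing first is a cheap way to find out whether a sort_index() call is still needed.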

5.5.1. Practice Questions

Create a Series called time_scheduler that is indexed by runtime and has the movie’s title as its values. Note that you will need to call sort_index() to be able to look up movies by their duration. Base your Series on df rather than budget_df.

While you’re at it, remove any movie that is shorter than 10 minutes (you can’t get into it if it’s too short) or longer than 3 hours (who’s got time for that?).

Hint: You may have to use pd.to_numeric to force the runtimes to be numbers (instead of numbers in a string).
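As a small illustration of the hint, here is how pd.to_numeric behaves on a made-up Series of runtimes stored as text. The values and the names `raw`, `minutes` and `in_range` are invented for this sketch and are not taken from the movie data.

```python
import pandas as pd

# Made-up runtimes stored as strings, with one entry that is not a number.
raw = pd.Series(["7", "154", "n/a", "200"])

# errors="coerce" turns text that cannot be parsed into NaN instead of raising.
minutes = pd.to_numeric(raw, errors="coerce")
print(minutes.tolist())   # [7.0, 154.0, nan, 200.0]

# Comparisons with NaN are False, so the unparseable entry drops out here too.
in_range = minutes[(minutes >= 10) & (minutes <= 180)]
print(in_range.tolist())  # [154.0]
```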

Here is a simpler example that shows the first few movies that are 7 minutes long:

import pandas as pd
df = pd.read_csv("https://runestone.academy/ns/books/published/httlads/_static/movies_metadata.csv").dropna(axis=1, how='all')
time_scheduler = df.set_index('runtime')
time_scheduler = time_scheduler[['title', 'release_date']]
time_scheduler.loc[7].head()
                            title  release_date
runtime
7.0                       Balance    1989-01-01
7.0      Killer Bean 2: The Party    2000-08-08
7.0                The Employment    2008-01-01
7.0           Moscow Clad in Snow    1909-04-09
7.0                      Paperman    2012-11-02

Now let’s find all those two-hour-and-34-minute movies.

But what is the 155th shortest movie in this collection?
