17.7. 👩‍💻 Extracting from Nested Data
A common problem, especially when dealing with data returned from a web site, is to extract certain elements from deep inside a nested data structure. In principle, there's nothing more difficult about pulling something out from deep inside a nested data structure: with lists, you use [] to index or a for loop to get them all; with dictionaries, you get the value associated with a particular key using [] or iterate through all the keys, accessing the value for each. But it's easy to get lost in the process and think you've extracted something different than you really have. Because of this, we have created a usable technique to help you during the debugging process.
Follow the system described below and you will have success with extracting nested data. The process involves the following steps:
Understand the nested data object.
Extract one object at the next level down.
Repeat the process with the extracted object.
Understand. Extract. Repeat.
To illustrate this, we will walk through extracting information from data formatted the way the Twitter API returns it. This nested dictionary results from querying Twitter, asking for three tweets matching "University of Michigan". As you'll see, it's quite a daunting data structure, even when printed with nice indentation.
17.7.1. Understand
At any level of the extraction process, the first task is to make sure you understand the current object you have extracted. There are a few options here.
Print the entire object. If it's small enough, you may be able to make sense of the printout directly. If it's a little bit larger, you may find it helpful to "pretty-print" it, with indentation showing the level of nesting of the data. We don't have a way to pretty-print in our online browser-based environment, but if you're running code with a full Python interpreter, you can use the dumps function in the json module. For example:
import json
print(json.dumps(res, indent=2))
If printing the entire object gives you something that's too unwieldy, you have other options for making sense of it.
Copy and paste it to a site like https://jsoneditoronline.org/, which will let you explore and collapse levels.
Print the type of the object.
- If it's a dictionary:
  - print the keys
- If it's a list:
  - print its length
  - print the type of the first item
  - print the first item if it's of manageable size
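The checklist above can be wrapped in a small helper. The following is a minimal sketch: the function name summarize, its max_chars parameter, and the res dictionary are all made up for illustration (res is a heavily trimmed stand-in, not real Twitter output).

```python
import json

def summarize(obj, max_chars=200):
    """Return a few lines describing one level of a nested structure."""
    lines = [f"type: {type(obj).__name__}"]
    if isinstance(obj, dict):
        lines.append(f"keys: {list(obj.keys())}")
    elif isinstance(obj, list):
        lines.append(f"length: {len(obj)}")
        if obj:
            lines.append(f"type of first item: {type(obj[0]).__name__}")
            # Show the first item only if its pretty-printed form is small.
            preview = json.dumps(obj[0], indent=2, default=str)
            if len(preview) <= max_chars:
                lines.append(preview)
    return lines

# Made-up stand-in data (hypothetical values, not a real API response).
res = {
    "search_metadata": {"count": 3},
    "statuses": [
        {"text": "placeholder tweet 1",
         "user": {"screen_name": "example_user_1", "created_at": "2015-03-01"}},
        {"text": "placeholder tweet 2",
         "user": {"screen_name": "example_user_2", "created_at": "2016-07-15"}},
        {"text": "placeholder tweet 3",
         "user": {"screen_name": "example_user_3", "created_at": "2017-11-30"}},
    ],
}

print("\n".join(summarize(res)))
print("\n".join(summarize(res["statuses"])))
```

Calling the helper at each level gives a quick answer to "what am I holding right now?" before you decide what to extract next.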
17.7.2. Extract
In the extraction phase, you will be diving one level deeper into the nested data.
If it's a dictionary, figure out which key has the value you're looking for, and get its value. For example:
res2 = res['statuses']
If it's a list, you will typically want to do something with each of the items (e.g., extracting something from each, and accumulating them in a list). For that you'll want a for loop, such as for res2 in res. During your exploration phase, however, it will be easier to debug things if you work with just one item. One trick for doing that is to iterate over a slice of the list containing just one item. For example, for res2 in res[:1].
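Here is a minimal sketch of the one-item slice trick. The list res and its "name" and "size" keys are hypothetical, chosen only to keep the example small.

```python
# Hypothetical list of small dictionaries standing in for real data.
res = [
    {"name": "first", "size": 10},
    {"name": "second", "size": 20},
    {"name": "third", "size": 30},
]

# Exploration: res[:1] is a list holding only the first item,
# so the loop body runs once while you debug it.
for res2 in res[:1]:
    print(type(res2), list(res2.keys()))

# Once the loop body works, drop the slice and accumulate from every item.
names = []
for res2 in res:
    names.append(res2["name"])
print(names)

# Unlike indexing with [0], slicing an empty list does not raise an error;
# the loop simply runs zero times.
empty = []
for res2 in empty[:1]:
    print("never reached")
```

One reason to prefer the slice over res[0] is that the loop structure stays in place, so switching to all the items later only requires deleting [:1].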
17.7.3. Repeat
Now you'll repeat the Understand and Extract processes at the next level.
17.7.3.1. Level 2
First understand.
It's a list with three items, so it's a good guess that each item represents one tweet.
Now extract. Since it's a list, we'll want to work with each item, but to keep things manageable for now, let's use the trick for just looking at the first item. Later we'll switch to processing all the items.
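The Level 2 steps might look like the following sketch. The res dictionary is a made-up, heavily trimmed stand-in for the real response. Only the "statuses" and "user" keys are named in this section; the other keys and all the values are assumptions for illustration.

```python
import json

# Made-up stand-in data (hypothetical values, not a real API response).
res = {
    "search_metadata": {"count": 3},
    "statuses": [
        {"text": "placeholder tweet 1",
         "user": {"screen_name": "example_user_1", "created_at": "2015-03-01"}},
        {"text": "placeholder tweet 2",
         "user": {"screen_name": "example_user_2", "created_at": "2016-07-15"}},
        {"text": "placeholder tweet 3",
         "user": {"screen_name": "example_user_3", "created_at": "2017-11-30"}},
    ],
}

res2 = res["statuses"]

# Understand: what kind of object is res2, and how big is it?
print("----Level 2: a list of tweets----")
print(type(res2))
print(len(res2))

# Extract: look at just the first item for now.
for res3 in res2[:1]:
    print("----Level 3: a tweet----")
    # Print only the first 100 characters so a large item stays readable.
    print(json.dumps(res3, indent=2)[:100])
```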
17.7.3.2. Level 3
First understand.
Then extract. Let's pull out the information about who sent each of the tweets. That is probably the value associated with the "user" key.
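A sketch of the Level 3 steps, again on made-up stand-in data (the values, and every key other than "statuses" and "user", are assumptions for illustration):

```python
# Made-up stand-in data (hypothetical values, not a real API response).
res = {
    "search_metadata": {"count": 3},
    "statuses": [
        {"text": "placeholder tweet 1",
         "user": {"screen_name": "example_user_1", "created_at": "2015-03-01"}},
        {"text": "placeholder tweet 2",
         "user": {"screen_name": "example_user_2", "created_at": "2016-07-15"}},
        {"text": "placeholder tweet 3",
         "user": {"screen_name": "example_user_3", "created_at": "2017-11-30"}},
    ],
}

res2 = res["statuses"]
for res3 in res2[:1]:
    # Understand: a single tweet is a dictionary, so look at its keys.
    print(type(res3))
    print(list(res3.keys()))

    # Extract: the sender's information sits under the "user" key.
    res4 = res3["user"]
    print(type(res4))
```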
Now repeat.
17.7.3.3. Level 4
Understand.
Extract. Let's print out the user's screen name and when their account was created.
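A sketch of the Level 4 steps on made-up stand-in data. We assume here that the account creation time is stored under a "created_at" key; the values are placeholders.

```python
# Made-up stand-in data (hypothetical values, not a real API response).
res = {
    "search_metadata": {"count": 3},
    "statuses": [
        {"text": "placeholder tweet 1",
         "user": {"screen_name": "example_user_1", "created_at": "2015-03-01"}},
        {"text": "placeholder tweet 2",
         "user": {"screen_name": "example_user_2", "created_at": "2016-07-15"}},
        {"text": "placeholder tweet 3",
         "user": {"screen_name": "example_user_3", "created_at": "2017-11-30"}},
    ],
}

res2 = res["statuses"]
for res3 in res2[:1]:
    res4 = res3["user"]

    # Understand: the user is another dictionary, so look at its keys.
    print("----Level 4: the user who wrote the tweet----")
    print(type(res4))
    print(list(res4.keys()))

    # Extract: the two values we are after ("created_at" is an assumed key name).
    print(res4["screen_name"], res4["created_at"])
```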
Now we may want to go back and extract this for all the items, rather than only the first item in res2.
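Processing every item only requires removing the [:1] slice from the loop. The sketch below does that on made-up stand-in data and also accumulates the screen names in a list. The senders list is our own addition for illustration.

```python
# Made-up stand-in data (hypothetical values, not a real API response).
res = {
    "search_metadata": {"count": 3},
    "statuses": [
        {"text": "placeholder tweet 1",
         "user": {"screen_name": "example_user_1", "created_at": "2015-03-01"}},
        {"text": "placeholder tweet 2",
         "user": {"screen_name": "example_user_2", "created_at": "2016-07-15"}},
        {"text": "placeholder tweet 3",
         "user": {"screen_name": "example_user_3", "created_at": "2017-11-30"}},
    ],
}

res2 = res["statuses"]
senders = []
for res3 in res2:                      # no [:1] any more: visit every tweet
    res4 = res3["user"]
    print(res4["screen_name"], res4["created_at"])
    senders.append(res4["screen_name"])

print(senders)
```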
17.7.3.4. Reflections
Notice that each time we descend a level in a dictionary, we have a [] picking out a key. Each time we look inside a list, we will have a for loop. If there are lists at multiple levels, we will have nested for loops.
Once you've figured out how to extract everything you want, you may choose to collapse things with multiple extractions in a single expression. For example, we could have this shorter version.
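A collapsed version might look like the sketch below. It runs on made-up stand-in data, and the pairs list is our own addition, shown as one way to keep the extracted values instead of only printing them.

```python
# Made-up stand-in data (hypothetical values, not a real API response).
res = {
    "search_metadata": {"count": 3},
    "statuses": [
        {"text": "placeholder tweet 1",
         "user": {"screen_name": "example_user_1", "created_at": "2015-03-01"}},
        {"text": "placeholder tweet 2",
         "user": {"screen_name": "example_user_2", "created_at": "2016-07-15"}},
        {"text": "placeholder tweet 3",
         "user": {"screen_name": "example_user_3", "created_at": "2017-11-30"}},
    ],
}

# One dictionary lookup, one loop, then two chained lookups per item.
for res3 in res["statuses"]:
    print(res3["user"]["screen_name"], res3["user"]["created_at"])

# The same extraction, accumulated into a list of (screen name, date) pairs.
pairs = [(res3["user"]["screen_name"], res3["user"]["created_at"])
         for res3 in res["statuses"]]
```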
Even with this compact code, we can still count off how many levels of nesting we have extracted from, in this case four. res['statuses'] says we have descended one level (in a dictionary). for res3 in … says we have descended another level (in a list). ['user'] is descending one more level, and ['screen_name'] is descending one more level.