15.2. Parsing XML
Look at the code below and predict what will be printed.
Here is a simple application that parses some XML and extracts some data elements from the XML:
Run this to see what it prints.
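A minimal sketch of such an application is shown here, assuming a small hypothetical person record. The element names, attribute names, and values are illustrative only:

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document; names and values are illustrative only.
data = '''
<person>
  <name>Chuck</name>
  <phone type="intl">+1 734 303 4456</phone>
  <email hide="yes"/>
</person>'''

tree = ET.fromstring(data)              # parse the string into a tree of elements
name = tree.find('name').text           # text inside the first <name> child
hide = tree.find('email').get('hide')   # value of the hide attribute on <email>

print('Name:', name)
print('Attr:', hide)
```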
The triple single quote (''') and the triple double quote (""") both allow for the creation of strings in Python that span multiple lines.
Calling fromstring converts the string representation of the XML into a “tree” of XML elements. Once the XML is in a tree, we can call a series of methods to extract portions of data from it. The find method searches through the XML tree and retrieves the first element that matches the specified tag. The get method returns the value of the named attribute on that element.
Although the XML in this example is quite simple, there are many rules regarding valid XML. Using an XML parser such as ElementTree lets us extract data from XML without worrying about the rules of XML syntax.
15.2.1. Using get to get the value of an attribute
Why do we use get to get the value of an attribute?
Look at the code below and predict what will be printed.
Run this to see what it prints.
Note
Just like with dictionaries, we can use get to get the value of an attribute; if the attribute isn’t there, the default is to return None.
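The behavior described in the note can be sketched as follows, assuming a hypothetical email element that has a hide attribute but no type attribute. The comparison with attrib relies on ElementTree exposing an element's attributes as a dictionary:

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document; names and values are illustrative only.
data = '''
<person>
  <name>Chuck</name>
  <email hide="yes"/>
</person>'''

tree = ET.fromstring(data)
email = tree.find('email')

present = email.get('hide')            # attribute exists, so its value is returned
missing = email.get('type')            # attribute absent: get returns None
fallback = email.get('type', 'n/a')    # optional second argument replaces None

# Indexing the attribute dictionary directly raises KeyError instead.
try:
    email.attrib['type']
    raised = False
except KeyError:
    raised = True

print(present, missing, fallback, raised)
```

Using get lets the program check for None rather than handle an exception when an attribute is optional.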
15.2.2. Getting Data from the First Element of a Type in XML
You can use find to get the first element of a specified type in the XML. You can then use find on that element to get its child elements.
Run the code to see what this prints.
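A minimal sketch of this pattern, assuming a hypothetical bookstore document with two book elements (the titles and authors are placeholders), could look like this:

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document; names and values are placeholders.
data = '''
<bookstore>
  <book>
    <title>First Title</title>
    <author>First Author</author>
  </book>
  <book>
    <title>Second Title</title>
    <author>Second Author</author>
  </book>
</bookstore>'''

tree = ET.fromstring(data)
book = tree.find('book')              # first <book> element under the root
title = book.find('title').text       # child elements of that first book
author = book.find('author').text

print('Title:', title)
print('Author:', author)
```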
What do you think would happen if we looked for the first ‘author’ in tree rather than in book? Modify the code to see what happens.
15.2.3. Fixing Errors in XML
If your XML has errors, what do you think will happen?
The following XML has errors. Try to run the code first to see what happens and then fix the XML so that the code runs correctly.
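As a rough illustration of what to expect, the sketch below parses a hypothetical document in which an email tag is never closed. It assumes the standard ElementTree behavior of raising ParseError for XML that is not well formed; the document itself is made up for this example:

```python
import xml.etree.ElementTree as ET

# Hypothetical broken document: the <email> tag is opened but never closed.
broken = '''
<person>
  <name>Chuck</name>
  <email hide="yes">
</person>'''

try:
    ET.fromstring(broken)
    result = 'parsed'
except ET.ParseError as err:
    # The error message reports the kind of problem and where it was found.
    result = 'ParseError'
    print('Could not parse XML:', err)
```

Reading the line and column in the error message is usually the quickest way to locate the tag that needs to be fixed.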