13.13. Group Work on BeautifulSoup with Requests
The following activities work best with a POGIL approach. In POGIL (Process Oriented Guided Inquiry Learning), students work in groups on activities, and each member has an assigned role. For more information see https://cspogil.org/Home.
Note
If you work in a group, have only one member of the group fill in the answers on this page. You will be able to share your answers with the group at the bottom of the page.
Learning Objectives
Students will know and be able to do the following.
Content Objectives:
Import the necessary libraries
Use requests to get the HTML from a URL
Create a soup object from the HTML
Use find and find_all to get data from a soup object
Use class_ to find data with a particular CSS class
Get the text of a tag using tag.text
Get the value for an attribute from a tag using tag.get(attribute)
Process Objectives:
Put code in order.
Modify code to produce the correct output.
13.13.1. Getting a tag from a soup object
BeautifulSoup makes it easy to extract the data you need from an HTML or XML page. It creates a soup object that contains all the tags in the page. Use find to get the first tag of a given type, or find_all to get a list of every tag of that type.
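As a quick sketch of the difference, the example below parses a small HTML string invented for illustration (so it runs without a network connection) and compares what find and find_all return.

```python
from bs4 import BeautifulSoup

# A small, made-up page so the example runs offline.
html = """
<html><body>
  <p>First paragraph</p>
  <p>Second paragraph</p>
  <p>Third paragraph</p>
</body></html>
"""

soup = BeautifulSoup(html, "html.parser")

first = soup.find("p")               # the first <p> tag only
all_paragraphs = soup.find_all("p")  # a list of every <p> tag
missing = soup.find("table")         # find returns None when nothing matches

print(first)                # <p>First paragraph</p>
print(len(all_paragraphs))  # 3
print(missing)              # None
```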
We will use the requests library to get a response object from a URL, create a BeautifulSoup object from the HTML content in the response, use find to get the first paragraph tag, and then print that tag.
This will find and print the first paragraph tag from the Michigan Daily site. Because the output is displayed as HTML, you see just the text of the tag rather than its markup.
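A minimal sketch of those steps, split into two small functions. The names get_soup and first_paragraph are hypothetical helpers, not part of either library, and the Michigan Daily URL in the comment is an assumption, so check the address in your browser first. The sketch parses response.text (the decoded HTML string); passing response.content (the raw bytes) to BeautifulSoup also works.

```python
import requests
from bs4 import BeautifulSoup


def get_soup(url):
    """Download a page with requests and parse its HTML into a soup object."""
    response = requests.get(url, timeout=10)  # the response object
    response.raise_for_status()               # raise an error on 404, 500, etc.
    return BeautifulSoup(response.text, "html.parser")


def first_paragraph(soup):
    """Return the first <p> tag in the soup, or None if there is none."""
    return soup.find("p")


# Example use (assumes this URL is reachable from your machine):
# print(first_paragraph(get_soup("https://www.michigandaily.com/")))
```

Keeping the network call in its own function separates downloading from parsing, so first_paragraph can be tried on any soup, including one built from a hard-coded string.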
html.parser is the HTML parser included in the Python 3 standard library. BeautifulSoup uses it to parse the HTML into a tree of tags.
Information on other HTML parsers is available at:
http://www.crummy.com/software/BeautifulSoup/bs4/doc/#installing-a-parser
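To see the tree that html.parser builds, this sketch uses an invented HTML snippet and moves from a tag up to its parent and from a tag down to its children, using BeautifulSoup's parent and children attributes.

```python
from bs4 import BeautifulSoup

# An invented snippet: a <div> holding a heading and a paragraph.
html = "<div id='story'><h2>Headline</h2><p>Body text</p></div>"

# The second argument names the parser; "html.parser" is Python's built-in one.
soup = BeautifulSoup(html, "html.parser")

paragraph = soup.find("p")
print(paragraph.parent.name)  # div (the <p> is nested inside the <div>)

story = soup.find("div")
print([child.name for child in story.children])  # ['h2', 'p']
```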
Put the following blocks in order to print the second paragraph from the Michigan Daily website. The code uses the find_all method on the soup object to get a list of all of the paragraphs.
13.13.2. Getting text from a tag
Some tags, such as a paragraph (p) tag or a span tag, contain text. You can use tag.text to get that text.
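A short sketch with made-up HTML: .text returns only the text, without any tags, and for a tag that contains other tags it includes their text as well.

```python
from bs4 import BeautifulSoup

html = "<p>Classes start on <span>Tuesday</span> this week.</p>"
soup = BeautifulSoup(html, "html.parser")

span = soup.find("span")
paragraph = soup.find("p")

print(span.text)       # Tuesday
print(paragraph.text)  # Classes start on Tuesday this week.
```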
You can also find a tag that has a particular CSS class.
This will print the text of the site description paragraph.
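A sketch of searching by class, assuming invented markup: the class name site-description is hypothetical, so inspect the real page (for example with your browser's developer tools) to find the class it actually uses. Notice that a plain find("p") returns the first paragraph, which is not necessarily the one you want.

```python
from bs4 import BeautifulSoup

# Hypothetical markup; "tagline" and "site-description" are invented class names.
html = """
<header>
  <p class="tagline">Welcome!</p>
  <p class="site-description">Student news and opinion</p>
</header>
"""
soup = BeautifulSoup(html, "html.parser")

print(soup.find("p").text)  # Welcome! (just the first <p>)

# class_ limits the search to tags whose class list includes this class.
description = soup.find("p", class_="site-description")
print(description.text)  # Student news and opinion
```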
Note
When you specify a CSS class, you must use class_ as the keyword argument. This is because class is already a Python keyword, used to define a new class.
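The sketch below, again with invented HTML, shows class_ next to an equivalent form that passes the attribute in an attrs dictionary, where "class" is just a string key and so causes no syntax error. It also shows that a tag with several classes matches a search for any one of them.

```python
from bs4 import BeautifulSoup

html = '<p class="intro highlight">Hello</p><p class="outro">Bye</p>'
soup = BeautifulSoup(html, "html.parser")

# soup.find("p", class="intro") would be a SyntaxError: class is reserved in Python.
by_keyword = soup.find("p", class_="intro")

# Equivalent: put the attribute name in a dictionary as a plain string.
by_attrs = soup.find("p", attrs={"class": "intro"})

print(by_keyword.text, by_attrs.text)  # Hello Hello

# The first <p> has two classes; searching for either one finds it.
print(soup.find("p", class_="highlight").text)  # Hello
```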
Put the following blocks in order to print the text of the span tag that is a child of an h3 tag with the class css-1pjbq1w.
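A general sketch of the two-step pattern, using invented markup rather than the Michigan Daily page: find the outer tag by its class, then call find on that tag to search only inside it. It also uses tag.get(attribute) from the learning objectives to read a link's href. The tag names, classes, and link paths are hypothetical.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the class names and link paths are invented.
html = """
<h2 class="other"><a href="/news/another">Another story</a></h2>
<h2 class="headline"><a href="/news/example-story">Example story</a></h2>
"""
soup = BeautifulSoup(html, "html.parser")

# Step 1: find the outer tag by tag name and CSS class.
headline = soup.find("h2", class_="headline")

# Step 2: search only inside that tag for the nested <a> tag.
link = headline.find("a")

print(link.text)          # Example story
print(link.get("href"))   # /news/example-story
print(link.get("title"))  # None, because the tag has no title attribute
```

Searching from headline instead of soup matters here: soup.find("a") would return the first link on the page, "Another story".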