14.2. Get news links from faculty webpages
Let’s say that you want to get the link to the first news article on each of your favorite UMSI faculty members’ webpages.
But clicking through to gather all those links would be a pain. Fortunately, we can do that task with BeautifulSoup!
Run the code below to see what it collects.
This code is made up of three plans. Click on each of the plans below to learn more about it.
Plan 3: Get a soup from multiple URLs

    # Load libraries for web scraping
    from bs4 import BeautifulSoup
    import requests

    # Get a soup from multiple URLs
    base_url = 'https://web.archive.org/web/20230128074139/https://www.si.umich.edu/people/'
    endings = ['barbara-ericson', 'steve-oney', 'paul-resnick']
    for ending in endings:
        url = base_url + ending
        r = requests.get(url)
        soup = BeautifulSoup(r.content, 'html.parser')
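The URL-building step in this plan can be tried on its own before any page is downloaded. The sketch below is one possible way to organize it, not the page's own code: the helper names `build_urls` and `get_soup`, the 10-second timeout, and the call to `raise_for_status()` are assumptions added for illustration.

```python
from bs4 import BeautifulSoup
import requests

BASE_URL = 'https://web.archive.org/web/20230128074139/https://www.si.umich.edu/people/'


def build_urls(base_url, endings):
    """Join the shared base URL with each page-specific ending (hypothetical helper)."""
    return [base_url + ending for ending in endings]


def get_soup(url, timeout=10):
    """Download one page and parse it into a soup (hypothetical helper).

    The timeout value is an assumption; raise_for_status() raises an error
    on a 4xx or 5xx response instead of parsing an error page.
    """
    r = requests.get(url, timeout=timeout)
    r.raise_for_status()
    return BeautifulSoup(r.content, 'html.parser')


# Building the URLs needs no network access, so it is safe to run anywhere.
urls = build_urls(BASE_URL, ['barbara-ericson', 'steve-oney', 'paul-resnick'])
for url in urls:
    print(url)
```

Calling `get_soup(url)` on each entry of `urls` would then give one soup per faculty page, which is what the loop in Plan 3 produces.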
Plan 4: Get info from a single tag

    # Get first tag of a certain type from the soup
    tag = soup.find('a', class_='item-teaser--heading-link')
    # Get info from tag
    info = tag.get('href')
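One thing to watch for in this plan: `find` returns `None` when no tag matches, and calling `.get` on `None` raises an error. The sketch below shows one way to guard against that using a small made-up HTML snippet, so it runs without any network access. The helper name `first_link` and the sample HTML are assumptions for illustration; only the class name comes from the plan above.

```python
from bs4 import BeautifulSoup


def first_link(soup, css_class='item-teaser--heading-link'):
    """Return the href of the first matching <a> tag, or None if there is none.

    first_link is a hypothetical helper, not part of BeautifulSoup.
    """
    tag = soup.find('a', class_=css_class)
    if tag is None:
        # No matching tag on this page, so there is nothing to read.
        return None
    return tag.get('href')


# Made-up HTML standing in for a faculty page (an assumption, not the real page).
sample_html = '''
<div>
  <a class="other" href="/skip-me">Not news</a>
  <a class="item-teaser--heading-link" href="/news/first">First</a>
  <a class="item-teaser--heading-link" href="/news/second">Second</a>
</div>
'''

link = first_link(BeautifulSoup(sample_html, 'html.parser'))
empty = first_link(BeautifulSoup('<p>No links here</p>', 'html.parser'))
print(link)
print(empty)
```

Because `find` stops at the first match in document order, the second news link in the sample is ignored.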
Plan 9: Print info

    # Print the info
    print(info)