14.2. Get news links from faculty webpages
Let’s say that you want to get the link to the first news article on each of your favorite UMSI faculty members’ webpages.

But clicking through to gather all those links would be a pain. Fortunately, we can do that task with BeautifulSoup!
Run the code below to see what it collects.
This code is made up of three plans. Each plan is shown below with the code that belongs to it.
Plan 3: Get a soup from multiple URLs

    # Load libraries for web scraping
    from bs4 import BeautifulSoup
    import requests

    # Get a soup from multiple URLs
    base_url = 'https://web.archive.org/web/20230128074139/https://www.si.umich.edu/people/'
    endings = ['barbara-ericson', 'steve-oney', 'paul-resnick']
    for ending in endings:
        url = base_url + ending
        r = requests.get(url)
        soup = BeautifulSoup(r.content, 'html.parser')
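To see what the loop in Plan 3 requests before any page is downloaded, the URL-building step can be run on its own. This is a small sketch that needs no network access; the list name `urls` is introduced here only for illustration.

```python
# Same base URL and endings as in Plan 3
base_url = 'https://web.archive.org/web/20230128074139/https://www.si.umich.edu/people/'
endings = ['barbara-ericson', 'steve-oney', 'paul-resnick']

# Build one full URL per faculty member (urls is a name added for this sketch)
urls = [base_url + ending for ending in endings]

# Show each URL that Plan 3 would pass to requests.get
for url in urls:
    print(url)
```

Each printed line is the base URL followed by one ending, so the loop makes three requests in total.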
Plan 4: Get info from a single tag

    # Get first tag of a certain type from the soup
    tag = soup.find('a', class_='item-teaser--heading-link')
    # Get info from tag
    info = tag.get('href')
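One thing to watch for in Plan 4: `soup.find` returns `None` when no tag matches, and calling `.get` on `None` raises an `AttributeError`. The sketch below shows one way to guard against that. The HTML string is made up for this example and is not the real markup of a faculty page, and `first_news_link` and `sample_html` are hypothetical names.

```python
from bs4 import BeautifulSoup

# Made-up HTML standing in for a faculty page (an assumption, not the real page)
sample_html = """
<div>
  <a class="item-teaser--heading-link" href="/news/first-article">First article</a>
  <a class="item-teaser--heading-link" href="/news/second-article">Second article</a>
</div>
"""

def first_news_link(html):
    """Return the href of the first matching link, or None if there is none."""
    soup = BeautifulSoup(html, 'html.parser')
    # find returns only the first matching tag, or None when nothing matches
    tag = soup.find('a', class_='item-teaser--heading-link')
    if tag is None:
        return None
    return tag.get('href')

print(first_news_link(sample_html))
print(first_news_link('<p>No news here</p>'))
```

With the sample above, the first call returns the `href` of the first link only, and the second call returns `None` instead of raising an error.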
Plan 9: Print info

    # Print the info
    print(info)
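Because Plan 3 creates a new soup on every pass through the loop, Plans 4 and 9 have to run inside that loop to produce one link per faculty member. The sketch below is one possible way to assemble the three plans, not necessarily the exact program the book runs. The function names `fetch_html` and `collect_links` and the `fetch` parameter are assumptions added for illustration so the logic can be tried without a network connection.

```python
from bs4 import BeautifulSoup
import requests

base_url = 'https://web.archive.org/web/20230128074139/https://www.si.umich.edu/people/'
endings = ['barbara-ericson', 'steve-oney', 'paul-resnick']

def fetch_html(url):
    """Download a page and return its raw content (needs network access)."""
    r = requests.get(url)
    return r.content

def collect_links(endings, fetch=fetch_html):
    """Return (ending, href) pairs; href is None when no link is found."""
    results = []
    for ending in endings:
        # Plan 3: get a soup for this faculty member's page
        url = base_url + ending
        soup = BeautifulSoup(fetch(url), 'html.parser')
        # Plan 4: get the first matching tag and its href (guarding against None)
        tag = soup.find('a', class_='item-teaser--heading-link')
        info = tag.get('href') if tag is not None else None
        results.append((ending, info))
    return results
```

Running `for ending, info in collect_links(endings): print(info)` performs the three requests and carries out Plan 9 for each page. Passing a different `fetch` function lets you try the parsing step on saved or made-up HTML instead.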