14.15. Multiple Choice Questions
Q-1: Given the HTML below, how would this tag be described in web scraping code?
<h1 class='sports'>Sports News</h1>
- h1
- Try again! The tag name must be in quotes, and this answer does not mention the class attribute.
- h1, class='sports'
- Try again! The tag name must be in quotes, and class must be followed by an underscore (class_).
- h1, class_='sports'
- Try again! The tag name must be in quotes.
- 'h1', class_='sports'
- Correct! Both the tag and the attribute matter: the h1 tag name needs to be in quotes, and class must be followed by an underscore (class_).
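A minimal sketch of how this tag and class could be passed to Beautiful Soup; the variable names here are made up for illustration:
from bs4 import BeautifulSoup

soup = BeautifulSoup("<h1 class='sports'>Sports News</h1>", 'html.parser')

# The tag name is a quoted string, and class is spelled class_
# because class is a reserved word in Python.
heading = soup.find('h1', class_='sports')
print(heading.text)   # Sports News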
Q-2: Which line of code correctly gets the first item in items and makes the most sense following the code snippet below?
soup = BeautifulSoup(response.content, 'html.parser')
items = soup.find_all(class_='items')
- first_item = items[0]
- Correct! Since soup.find_all(class_='items') returns a list, all you need to do to get the first item is index it.
- first_item = items.find(0)
- Try again! Since soup.find_all(class_='items') returns a list, we cannot use find(); find() is a string method that returns the index of the first occurrence of a specified value in a string.
- first_item = items.get(0)
- Try again! Since soup.find_all(class_='items') returns a list, we cannot use get(); get() is a dictionary method that returns the value for a specified key.
- first_item = items.find[0]
- Try again! Since soup.find_all(class_='items') returns a list, we cannot use find(); find() is a string method that returns the index of the first occurrence of a specified value in a string.
- first_item = soup.items[0]
- Try again! We already used the soup object to get items, so all we need to do is index into items to get the first one.
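A minimal sketch of this pattern, using a made-up snippet of HTML with a class named 'items':
from bs4 import BeautifulSoup

html = "<ul><li class='items'>apples</li><li class='items'>bananas</li></ul>"
soup = BeautifulSoup(html, 'html.parser')

items = soup.find_all(class_='items')   # list-like result of matching tags
first_item = items[0]                   # plain indexing gets the first tag
print(first_item.text)                  # apples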
Q-3: How does one parse the HTML into a BeautifulSoup object, given a response object?
- soup = BeautifulSoup(response.text, 'html.parser')
- Correct! This is the correct way to parse the content as Unicode text.
- soup = BeautifulSoup(response.content, 'html.parser')
- Correct! This is the correct way to parse the content as bytes.
- soup = BeautifulSoup(response.string, 'html.parser')
- Try again! A response object has no .string attribute; use .text for Unicode text or .content for bytes.
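A minimal sketch, assuming the response object comes from the requests library (the URL is just a placeholder):
import requests
from bs4 import BeautifulSoup

response = requests.get('https://example.com')

# Either form works: .text is decoded Unicode, .content is raw bytes.
soup = BeautifulSoup(response.text, 'html.parser')
# soup = BeautifulSoup(response.content, 'html.parser')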
Q-4: Which of the following is the best way to get the value for the id in the first p tag?
- soup.p.get('id')
- Try again! 'id' is an attribute, not a tag; providing an explicit default, as in get('id', None), is the safer form.
- soup.p.get('id', None)
- Correct! This is the correct way to get the first p tag and then the value of its id attribute.
- soup.p[id]
- Try again! The attribute name is missing quotation marks, and the safer way to get an attribute's value is the get() method.
- soup.p['id']
- Try again! This raises a KeyError if the first p tag has no id attribute; the safer way to get an attribute's value is the get() method.
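A minimal sketch of the difference, using a made-up p tag:
from bs4 import BeautifulSoup

soup = BeautifulSoup("<p id='intro'>Hello</p>", 'html.parser')

print(soup.p.get('id', None))   # 'intro'; returns None instead of erroring if id is absent
print(soup.p['id'])             # also 'intro', but raises KeyError when id is missing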
Q-5: How does one get the first header 1 tag after creating a soup object?
- soup.h1
- Correct! The header 1 tag is h1, and this is the correct way to get the first h1 tag after creating a soup object.
- soup.header1
- Try again! There is no tag called header1.
- soup.h1[0]
- Try again! soup.h1 is already a single tag, not a list, so indexing it with 0 will not give the correct output.
- soup.h1[1]
- Try again! soup.h1 is already a single tag, not a list, so indexing it with 1 will not give the correct output.
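A minimal sketch, reusing the h1 snippet from Q-1:
from bs4 import BeautifulSoup

soup = BeautifulSoup("<h1 class='sports'>Sports News</h1>", 'html.parser')

first_h1 = soup.h1        # attribute access returns the first matching tag
print(first_h1.text)      # Sports News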
Q-6: Which of the following gets the first link tag and returns a dictionary of all attributes and values for that link tag?
- soup.a.attributes
- Try again! .attributes is not the correct way to get a dictionary of all attributes and values for a tag; the correct property is .attrs.
- soup.link.attrs
- Try again! There is no 'link' tag for hyperlinks; instead we use the 'a' tag to find links.
- soup.a.attrs
- Correct! This is the correct way to get the first link tag (soup.a) and a dictionary of all attributes and values for that link tag (.attrs).
- soup.link.attributes
- Try again! There is no 'link' tag for hyperlinks; instead we use the 'a' tag to find links. Also, .attributes is not the correct way to get a dictionary of all attributes and values for a tag.
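A minimal sketch, using a made-up anchor tag:
from bs4 import BeautifulSoup

soup = BeautifulSoup('<a href="https://example.com" id="home">Home</a>', 'html.parser')

# .attrs returns a dictionary of all attributes on the first 'a' tag
print(soup.a.attrs)   # {'href': 'https://example.com', 'id': 'home'}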
Q-7: Which of the following finds all link tags?
- all_links = soup.find('a')
- Try again! This will only find the first link.
- all_links = soup.findall('a')
- Try again! For Beautiful Soup, find_all requires an underscore.
- all_links = soup.findall('link')
- Try again! For Beautiful Soup, find_all requires an underscore. Also, hyperlinks use the 'a' tag, not 'link'.
- all_links = soup.find_all('a')
- Correct! This is the correct way to find all link tags. In HTML, link tags are 'a' tags, and for Beautiful Soup, find_all requires an underscore.
- all_links = soup.find_all('link')
- Try again! Hyperlinks use the 'a' tag, not 'link'.
Q-8: Which of the following finds all paragraph tags with class b-soup?
- all_links = soup.find_all('p', class='b-soup')
- Try again! To search by class in Beautiful Soup, class requires an underscore (class_).
- all_links = soup.find_all('paragraph', class='b-soup')
- Try again! There is no tag called 'paragraph'; instead we use the 'p' tag to find paragraphs. Also, to search by class in Beautiful Soup, class requires an underscore (class_).
- all_links = soup.find_all('p', class_='b-soup')
- Correct! This is the correct way to find all paragraph tags with that class. In HTML, paragraph tags are 'p' tags, and to search by class in Beautiful Soup, class requires an underscore (class_).
- all_links = soup.find_all('paragraph', class_='b-soup')
- Try again! There is no tag called 'paragraph'; instead we use the 'p' tag to find paragraphs.
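A minimal sketch, with a couple of made-up paragraphs in class b-soup:
from bs4 import BeautifulSoup

html = "<p class='b-soup'>First</p><p>Other</p><p class='b-soup'>Second</p>"
soup = BeautifulSoup(html, 'html.parser')

# class_ (with the underscore) avoids clashing with Python's class keyword
b_soup_paragraphs = soup.find_all('p', class_='b-soup')
print([p.text for p in b_soup_paragraphs])   # ['First', 'Second']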
Q-9: After creating an empty dictionary and getting a list of all link tags, how does one put the link_tag text as keys and the link_tag href attribute as values in the dictionary?
- loop through the elements of the list and do dictionary[link_tag.text] = a.get('href', None)
- Try again! Although the 'a' tag is the link tag, the variable that holds each tag in the loop is link_tag, not a.
- loop through the elements of the list and do dictionary[link_tag.text] = a['href']
- Try again! Although the 'a' tag is the link tag, the variable that holds each tag in the loop is link_tag, not a. Also, the tag['attribute_name'] form raises an error if the attribute is not there.
- loop through the elements of the list and do dictionary[link_tag.text] = link_tag.get('href', None)
- Correct! This is the correct way to build a dictionary with link_tag text as keys and href as values. Using .get('attribute_name', None) will not raise an error: it returns the attribute's value if there is one and None otherwise.
- loop through the elements of the list and do dictionary[link_tag.text] = link_tag[href]
- Try again! The attribute name is missing quotation marks, and the tag['attribute_name'] form raises an error if the attribute is not there.
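A minimal sketch of the whole pattern, with made-up links:
from bs4 import BeautifulSoup

html = '<a href="https://example.com/a">First</a><a href="https://example.com/b">Second</a>'
soup = BeautifulSoup(html, 'html.parser')

dictionary = {}
all_links = soup.find_all('a')
for link_tag in all_links:
    # the text becomes the key, and href (or None if missing) becomes the value
    dictionary[link_tag.text] = link_tag.get('href', None)

print(dictionary)   # {'First': 'https://example.com/a', 'Second': 'https://example.com/b'}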
Q-10: Given the HTML below, after importing re, what will be printed when for tag in soup.find_all(re.compile("t")): print(tag.name) is run?
<html>
<head>
<title>Site</title>
</head>
<body>
<p>There is lots of content.</p>
</body>
</html>
- html
- Correct! html is printed because it is the name of a tag that contains the letter 't'.
- title
- Correct! title is printed because it is the name of a tag that contains the letter 't'.
- Site
- Try again! 'Site' is not a tag; it is the content inside the 'title' tag.
- There is lots of content.
- Try again! This isn't a tag; it is the content inside a 'p' tag.
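A minimal runnable version of this question's code, with the HTML supplied as a string:
import re
from bs4 import BeautifulSoup

html = "<html><head><title>Site</title></head><body><p>There is lots of content.</p></body></html>"
soup = BeautifulSoup(html, 'html.parser')

# find_all with a compiled pattern matches against tag names, not tag contents
for tag in soup.find_all(re.compile("t")):
    print(tag.name)   # prints html, then title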
Q-11: What does the following block of code do?
url = "https://www.nytimes.com"
html = urllib.request.urlopen(url, context=ctx).read()
soup = BeautifulSoup(html, 'html.parser')
- retrieves and displays the webpage
- Try again! This does not display the webpage. BeautifulSoup parses the webpage retrieved by urllib.request.
- parses the html content of the "https://www.nytimes.com" webpage.
- Correct! This parses all html tags and contents of the webpage.
- downloads the webpage
- Try again! This does not save any files to the computer.
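The snippet above uses a ctx variable that is not defined here; a common way to create such an SSL context is sketched below, but this setup is an assumption since the original code does not show it:
import ssl
import urllib.request
from bs4 import BeautifulSoup

# Assumed setup: an SSL context that skips certificate verification.
ctx = ssl.create_default_context()
ctx.check_hostname = False
ctx.verify_mode = ssl.CERT_NONE

url = "https://www.nytimes.com"
html = urllib.request.urlopen(url, context=ctx).read()
soup = BeautifulSoup(html, 'html.parser')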
Q-12: What does the following block of code print?
url = "https://www.nytimes.com/"
html = urllib.request.urlopen(url).read()
soup = BeautifulSoup(html, 'html.parser')
tags = soup('img')
for tag in tags:
    print(tag.get('src', None))
- retrieves and displays the webpage
- Try again! urllib retrieves the webpage but does not display it.
- downloads the webpage
- Try again! This does not save any files to the computer.
- prints the images from 'www.nytimes.com'
- Try again! BeautifulSoup and html.parser cannot display images.
- prints all the 'img' sources under 'src' from 'www.nytimes.com'
- Correct! It prints out the image sources listed under 'src' of the 'img' tags.