14.15. Multiple Choice Questions
Q-1: Given the HTML below, how would this tag be described in web scraping code?
<h1 class='sports'>Sports News</h1>
- h1
- Try again! The tag name must be in quotes, and this answer does not mention the class attribute.
- h1, class='sports'
- Try again! The tag name must be in quotes, and class must be followed by an underscore (class_).
- h1, class_='sports'
- Try again! The tag name must be in quotes.
- 'h1', class_='sports'
- Correct! Both the tag and the attribute matter: the h1 tag name needs to be in quotes, and class must be followed by an underscore (class_).
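A minimal sketch of how this tag and class could be passed to Beautiful Soup; the variable names here are made up for illustration:
from bs4 import BeautifulSoup

soup = BeautifulSoup("<h1 class='sports'>Sports News</h1>", 'html.parser')

# The tag name is a quoted string, and class is spelled class_
# because class is a reserved word in Python.
heading = soup.find('h1', class_='sports')
print(heading.text)   # Sports News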
Q-2: Which line of code correctly gets the first item in items and makes the most sense following the code snippet below?
soup = BeautifulSoup(response.content, 'html.parser')
items = soup.find_all(class_='items')
- first_item = items[0]
- Correct! Since soup.find_all(class_='items') returns a list, all you need to do to get the first item is index it.
- first_item = items.find(0)
- Try again! Since soup.find_all(class_='items') returns a list, we cannot use find(); find() is a string method that returns the index of the first occurrence of a specified value in a string.
- first_item = items.get(0)
- Try again! Since soup.find_all(class_='items') returns a list, we cannot use get(); get() is a dictionary method that returns the value for a specified key.
- first_item = items.find[0]
- Try again! Since soup.find_all(class_='items') returns a list, we cannot use find(); find() is a string method that returns the index of the first occurrence of a specified value in a string.
- first_item = soup.items[0]
- Try again! We already used the soup object to get items, so all we need to do is index into items to get the first one.
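A minimal sketch of this pattern, using a made-up snippet of HTML with a class named 'items':
from bs4 import BeautifulSoup

html = "<ul><li class='items'>apples</li><li class='items'>bananas</li></ul>"
soup = BeautifulSoup(html, 'html.parser')

items = soup.find_all(class_='items')   # list-like result of matching tags
first_item = items[0]                   # plain indexing gets the first tag
print(first_item.text)                  # apples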
Q-3: How does one parse the HTML into a BeautifulSoup object, given a response object?
- soup = BeautifulSoup(response.text, 'html.parser')
- Correct! This is the correct way to parse the content as Unicode text.
- soup = BeautifulSoup(response.content, 'html.parser')
- Correct! This is the correct way to parse the content as bytes.
- soup = BeautifulSoup(response.string, 'html.parser')
- Try again! A response object has no .string attribute; use .text for Unicode text or .content for bytes.
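A minimal sketch, assuming the response object comes from the requests library (the URL is just a placeholder):
import requests
from bs4 import BeautifulSoup

response = requests.get('https://example.com')

# Either form works: .text is decoded Unicode, .content is raw bytes.
soup = BeautifulSoup(response.text, 'html.parser')
# soup = BeautifulSoup(response.content, 'html.parser')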
Q-4: Which of the following is the best way to get the value for the id in the first p tag?
- soup.p.get('id')
- Try again! 'id' is an attribute, not a tag; providing an explicit default, as in get('id', None), is the safer form.
- soup.p.get('id', None)
- Correct! This is the correct way to get the first p tag and then the value of its id attribute.
- soup.p[id]
- Try again! The attribute name is missing quotation marks, and the safer way to get an attribute's value is the get() method.
- soup.p['id']
- Try again! This raises a KeyError if the first p tag has no id attribute; the safer way to get an attribute's value is the get() method.
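A minimal sketch of the difference, using a made-up p tag:
from bs4 import BeautifulSoup

soup = BeautifulSoup("<p id='intro'>Hello</p>", 'html.parser')

print(soup.p.get('id', None))   # 'intro'; returns None instead of erroring if id is absent
print(soup.p['id'])             # also 'intro', but raises KeyError when id is missing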
Q-5: How does one get the first header 1 tag after creating a soup object?
- soup.h1
- Correct! The header 1 tag is h1, and this is the correct way to get the first h1 tag after creating a soup object.
- soup.header1
- Try again! There is no tag called header1.
- soup.h1[0]
- Try again! soup.h1 is already a single tag, not a list, so indexing it with 0 will not give the correct output.
- soup.h1[1]
- Try again! soup.h1 is already a single tag, not a list, so indexing it with 1 will not give the correct output.
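A minimal sketch, reusing the h1 snippet from Q-1:
from bs4 import BeautifulSoup

soup = BeautifulSoup("<h1 class='sports'>Sports News</h1>", 'html.parser')

first_h1 = soup.h1        # attribute access returns the first matching tag
print(first_h1.text)      # Sports News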
Q-6: Which of the following gets the first link tag and returns a dictionary of all attributes and values for that link tag?
- soup.a.attributes
- Try again! .attributes is not the correct way to get a dictionary of all attributes and values for a tag; the correct property is .attrs.
- soup.link.attrs
- Try again! There is no 'link' tag for hyperlinks; instead we use the 'a' tag to find links.
- soup.a.attrs
- Correct! This is the correct way to get the first link tag (soup.a) and a dictionary of all attributes and values for that link tag (.attrs).
- soup.link.attributes
- Try again! There is no 'link' tag for hyperlinks; instead we use the 'a' tag to find links. Also, .attributes is not the correct way to get a dictionary of all attributes and values for a tag.
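A minimal sketch, using a made-up anchor tag:
from bs4 import BeautifulSoup

soup = BeautifulSoup('<a href="https://example.com" id="home">Home</a>', 'html.parser')

# .attrs returns a dictionary of all attributes on the first 'a' tag
print(soup.a.attrs)   # {'href': 'https://example.com', 'id': 'home'}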
Q-7: Which of the following finds all link tags?
- all_links = soup.find('a')
- Try again! This will only find the first link.
- all_links = soup.findall('a')
- Try again! For Beautiful Soup, find_all requires an underscore.
- all_links = soup.findall('link')
- Try again! For Beautiful Soup, find_all requires an underscore. Also, hyperlinks use the 'a' tag, not 'link'.
- all_links = soup.find_all('a')
- Correct! This is the correct way to find all link tags. In HTML, link tags are 'a' tags, and for Beautiful Soup, find_all requires an underscore.
- all_links = soup.find_all('link')
- Try again! Hyperlinks use the 'a' tag, not 'link'.
Q-8: Which of the following finds all paragraph tags with class b-soup?
- all_links = soup.find_all('p', class='b-soup')
- Try again! To search by class in Beautiful Soup, class requires an underscore (class_).
- all_links = soup.find_all('paragraph', class='b-soup')
- Try again! There is no tag called 'paragraph'; instead we use the 'p' tag to find paragraphs. Also, to search by class in Beautiful Soup, class requires an underscore (class_).
- all_links = soup.find_all('p', class_='b-soup')
- Correct! This is the correct way to find all paragraph tags with that class. In HTML, paragraph tags are 'p' tags, and to search by class in Beautiful Soup, class requires an underscore (class_).
- all_links = soup.find_all('paragraph', class_='b-soup')
- Try again! There is no tag called 'paragraph'; instead we use the 'p' tag to find paragraphs.
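A minimal sketch, with a couple of made-up paragraphs in class b-soup:
from bs4 import BeautifulSoup

html = "<p class='b-soup'>First</p><p>Other</p><p class='b-soup'>Second</p>"
soup = BeautifulSoup(html, 'html.parser')

# class_ (with the underscore) avoids clashing with Python's class keyword
b_soup_paragraphs = soup.find_all('p', class_='b-soup')
print([p.text for p in b_soup_paragraphs])   # ['First', 'Second']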
Q-9: After creating an empty dictionary and getting a list of all link tags, how does one put the link_tag text as keys and the link_tag href attribute as values in the dictionary?
- loop through the elements of the list and do dictionary[link_tag.text] = a.get('href', None)
- Try again! Although the 'a' tag is the link tag, the variable that holds each tag in the loop is link_tag, not a.
- loop through the elements of the list and do dictionary[link_tag.text] = a['href']
- Try again! Although the 'a' tag is the link tag, the variable that holds each tag in the loop is link_tag, not a. Also, the tag['attribute_name'] form raises an error if the attribute is not there.
- loop through the elements of the list and do dictionary[link_tag.text] = link_tag.get('href', None)
- Correct! This is the correct way to build a dictionary with link_tag text as keys and href as values. Using .get('attribute_name', None) will not raise an error: it returns the attribute's value if there is one and None otherwise.
- loop through the elements of the list and do dictionary[link_tag.text] = link_tag[href]
- Try again! The attribute name is missing quotation marks, and the tag['attribute_name'] form raises an error if the attribute is not there.
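A minimal sketch of the whole pattern, with made-up links:
from bs4 import BeautifulSoup

html = '<a href="https://example.com/a">First</a><a href="https://example.com/b">Second</a>'
soup = BeautifulSoup(html, 'html.parser')

dictionary = {}
all_links = soup.find_all('a')
for link_tag in all_links:
    # the text becomes the key, and href (or None if missing) becomes the value
    dictionary[link_tag.text] = link_tag.get('href', None)

print(dictionary)   # {'First': 'https://example.com/a', 'Second': 'https://example.com/b'}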
Q-10: Given the HTML below, after importing re, what will be printed when for tag in soup.find_all(re.compile("t")): print(tag.name) is run?
<html>
<head>
<title>Site</title>
</head>
<body>
<p>There is lots of content.</p>
</body>
</html>
- html
- Correct! html is printed because it is the name of a tag that contains the letter 't'.
- title
- Correct! title is printed because it is the name of a tag that contains the letter 't'.
- Site
- Try again! 'Site' is not a tag; it is the content inside the 'title' tag.
- There is lots of content.
- Try again! This isn't a tag; it is the content inside a 'p' tag.
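A minimal runnable version of this question's code, with the HTML supplied as a string:
import re
from bs4 import BeautifulSoup

html = "<html><head><title>Site</title></head><body><p>There is lots of content.</p></body></html>"
soup = BeautifulSoup(html, 'html.parser')

# find_all with a compiled pattern matches against tag names, not tag contents
for tag in soup.find_all(re.compile("t")):
    print(tag.name)   # prints html, then title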
Q-11: What does the following block of code do?
url = "https://www.nytimes.com"
html = urllib.request.urlopen(url, context=ctx).read()
soup = BeautifulSoup(html, 'html.parser')
- retrieves and displays the webpage
- Try again! This does not display the webpage. BeautifulSoup parses the webpage retrieved by urllib.request.
- parses the html content of the "https://www.nytimes.com" webpage.
- Correct! This parses all html tags and contents of the webpage.
- downloads the webpage
- Try again! This does not save any files to the computer.
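The snippet above uses a ctx variable that is not defined here; a common way to create such an SSL context is sketched below, but this setup is an assumption since the original code does not show it:
import ssl
import urllib.request
from bs4 import BeautifulSoup

# Assumed setup: an SSL context that skips certificate verification.
ctx = ssl.create_default_context()
ctx.check_hostname = False
ctx.verify_mode = ssl.CERT_NONE

url = "https://www.nytimes.com"
html = urllib.request.urlopen(url, context=ctx).read()
soup = BeautifulSoup(html, 'html.parser')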
Q-12: What does the following block of code print?
url = "https://www.nytimes.com/"
html = urllib.request.urlopen(url).read()
soup = BeautifulSoup(html, 'html.parser')
tags = soup('img')
for tag in tags:
    print(tag.get('src', None))
- retrieves and displays the webpage
- Try again! urllib retrieves the webpage but does not display it.
- downloads the webpage
- Try again! This does not save any files to the computer.
- prints the images from 'www.nytimes.com'
- Try again! BeautifulSoup and html.parser cannot display images.
- prints all the 'img' sources under 'src' from 'www.nytimes.com'
- Correct! It prints out the image sources listed under 'src' of the 'img' tags.