13.16. Multiple Choice Questions
- urllib
- urllib is a Python library that contains several modules for working with URLs.
- bs4
- bs4 is a Python library for pulling data out of HTML files.
- HTTP
- HTTP is a network protocol that is used to transmit different documents like HTML.
- GET
- GET is an HTTP request method that asks a server for a specified resource.
Q-1: What protocol can be used to retrieve web pages using Python?
- socket
- A socket is a two-way connection endpoint that programs use to send and receive data over a network.
- port
- A port represents an endpoint on a computer that can connect to different network nodes.
- http
- HTTP is a protocol used to transfer data from a web server.
- protocol
- A protocol is a set of rules that determine how data is transmitted over a network.
Q-2: What provides two-way communication between two different programs in a network?
- http
- HTTP is a protocol, not a Python library.
- urllib
- urllib can be used to send and receive data over HTTP instead of doing it manually with a web browser.
- port
- A port is an endpoint that lets a device connect with other devices in a network to transmit similar types of data.
- header
- A header is additional information sent and received along with data.
Q-3: What is a Python library that can be used to send and receive data over HTTP?
- scrape
- Scraping is the act of extracting data from web pages.
- parse
- Parsing is breaking scraped web pages down into useful data.
- BeautifulSoup
- BeautifulSoup is a Python library for extracting data from HTML documents.
- spider
- A spider retrieves a web page and then all the web pages linked from it to build a search index.
Q-4: What is the process by which search engines retrieve webpages and build a search index called?
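The spidering idea in Q-4 can be sketched without touching the network. In this minimal sketch, `PAGES` is a hypothetical in-memory stand-in for the web (the URLs and HTML are made up for illustration), `spider` is a hypothetical helper, and links are pulled out with a simple regular expression rather than a full HTML parser.

```python
import re
from collections import deque

# Hypothetical stand-in for the web: URL -> HTML text (made up for illustration).
PAGES = {
    'http://example.com/': '<a href="http://example.com/a">A</a> <a href="http://example.com/b">B</a>',
    'http://example.com/a': '<a href="http://example.com/b">B</a>',
    'http://example.com/b': '<a href="http://example.com/">Home</a>',
}

def spider(start):
    """Visit start, then every page reachable from it, each page only once."""
    seen = [start]
    queue = deque([start])
    while queue:
        url = queue.popleft()
        html = PAGES.get(url, '')  # a real spider would fetch the page here
        for link in re.findall(r'href="(http[s]?://.*?)"', html):
            if link not in seen:
                seen.append(link)
                queue.append(link)
    return seen

print(spider('http://example.com/'))
```

A real search-engine spider would also store the page contents it visits so they can be indexed; this sketch only records the order in which pages are discovered.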
- It sends a request to retrieve 'romeo.txt' from 'data.pr4e.org'
- This sends a GET request to the web server over port 80.
- It sends the 'romeo.txt' file to 'data.pr4e.org'
- This does not send a file to the webserver.
- It creates a file named 'romeo.txt'
- This does not create a file
- It throws an error because a socket cannot use HTTP.
- Sockets can be used to connect to different types of servers using different protocols.
Q-5: What does the following block of code do?
import socket
mysock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
mysock.connect(('data.pr4e.org', 80))
cmd = 'GET http://data.pr4e.org/romeo.txt HTTP/1.0\r\n\r\n'.encode()
mysock.send(cmd)
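The code in Q-5 only sends the request; the server's reply comes back on the same socket as raw bytes. As a rough sketch of what such a reply looks like, the example below splits a response into its status line, headers, and body. The `raw` bytes are a made-up sample, not a capture from data.pr4e.org, and `split_response` is a hypothetical helper.

```python
def split_response(raw):
    """Split raw HTTP response bytes into (status line, header dict, body text)."""
    head, _, body = raw.partition(b'\r\n\r\n')  # a blank line ends the headers
    lines = head.decode().split('\r\n')
    status = lines[0]
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(':')
        headers[name.strip()] = value.strip()
    return status, headers, body.decode()

# Made-up sample reply (an assumption for illustration only).
raw = (b'HTTP/1.1 200 OK\r\n'
       b'Content-Type: text/plain\r\n'
       b'Content-Length: 5\r\n'
       b'\r\n'
       b'hello')

status, headers, body = split_response(raw)
print(status)
print(headers['Content-Type'])
print(body)
```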
- It creates a file named 'romeo.txt' in 'data.pr4e.org'
- urllib.request cannot create files in a web server.
- It finds the URLs linked from 'data.pr4e.org' and prints them.
- urllib.request is not a spider.
- It opens a file named 'http://data.pr4e.org/romeo.txt' in local storage
- urllib.request does not handle files in local storage
- It prints the contents of 'romeo.txt' after retrieving it from 'data.pr4e.org'
- urllib.request requests the file, and the loop prints each line of the response.
Q-6: What does the following block of code do?
import urllib.request
fhand = urllib.request.urlopen('http://data.pr4e.org/romeo.txt')
for line in fhand:
    print(line.decode().strip())
- It retrieves 'cover3.jpg' and saves it to your computer.
- Running the code does not display any output because it saves the file to your computer.
- It displays the image 'cover3.jpg'.
- It does not output anything on the screen.
- It retrieves the url to download 'cover3.jpg'
- urllib retrieves the file itself, not a URL for downloading it.
Q-7: What does the following block of code do?
import urllib.request, urllib.parse, urllib.error
img = urllib.request.urlopen('http://data.pr4e.org/cover3.jpg').read()
fhand = open('cover3.jpg', 'wb')
fhand.write(img)
fhand.close()
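Q-7 reads the whole image into memory with a single `read()`. For large files, one common alternative is to copy the data in fixed-size blocks. The sketch below shows the idea with `io.BytesIO` objects standing in for the network response and the output file (an assumption that keeps the example runnable offline); `copy_in_chunks` is a hypothetical helper name.

```python
import io

def copy_in_chunks(src, dst, size=100000):
    """Copy src to dst, reading at most size bytes at a time; return bytes copied."""
    total = 0
    while True:
        chunk = src.read(size)
        if len(chunk) < 1:  # an empty read means the source is exhausted
            break
        total += len(chunk)
        dst.write(chunk)
    return total

# Stand-ins for urlopen(...) and open('cover3.jpg', 'wb') (assumption).
src = io.BytesIO(b'x' * 250)
dst = io.BytesIO()
print(copy_in_chunks(src, dst, size=100))
```

With a real download, `src` would be the object returned by `urllib.request.urlopen` and `dst` a file opened in `'wb'` mode.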
- Exact match to 'http[s]?://.+?'
- The regex uses wildcard characters, so it is not an exact match.
- 'http://' or 'http[s]://' followed by one or more characters
- The square brackets denote a character class; '[s]?' matches 0 or 1 's', not the literal text '[s]'.
- 'http://' or 'https://' followed by one or more characters.
- The '[s]?' means 0 or 1 's' and '.+?' means 1 or more characters (matched non-greedily).
- 'https://' followed by one or more characters.
- The regex also accepts 'http://' because '[s]?' means 'http' followed by 0 or 1 's'.
Q-8: What does the following regex match?
http[s]?://.+?
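The behavior of the regex in Q-8 can be checked directly with Python's `re` module. This is a small sketch; the sample strings are chosen for illustration and are not part of the original question.

```python
import re

pattern = re.compile(r'http[s]?://.+?')

# match() anchors at the start of the string and returns None on failure.
for text in ['http://data.pr4e.org', 'https://www.nytimes.com',
             'ftp://example.com', 'http://']:
    print(text, bool(pattern.match(text)))

# The trailing '?' makes '.+' non-greedy, so on its own it stops after one character.
print(pattern.match('https://www.nytimes.com').group())  # https://w
```

The last line shows why a non-greedy pattern like this is usually followed by something that marks the end of the URL, such as a closing quote.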
- retrieves and displays the webpage
- This does not display the webpage. BeautifulSoup parses the webpage retrieved by urllib.request.
- parses the HTML content of the "https://www.nytimes.com" webpage.
- This parses all HTML tags and contents of the webpage.
- downloads the webpage
- This does not save files to the computer
Q-9: What does the following block of code do?
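import ssl, urllib.request
from bs4 import BeautifulSoup
ctx = ssl.create_default_context()  # assumption: ctx is an SSL context created earlier
url = "https://www.nytimes.com"
html = urllib.request.urlopen(url, context=ctx).read()
soup = BeautifulSoup(html, 'html.parser')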
- retrieves and displays the webpage
- urllib retrieves the webpage but does not display it.
- downloads the webpage
- This does not save files to the computer.
- prints the images from 'www.nytimes.com'
- BeautifulSoup and html.parser cannot display images.
- prints the 'src' value of every 'img' tag from 'www.nytimes.com'
- It prints the image sources listed under 'src' of the 'img' tags.
Q-10: What does the following block of code print?
import urllib.request
from bs4 import BeautifulSoup
url = "https://www.nytimes.com/"
html = urllib.request.urlopen(url).read()
soup = BeautifulSoup(html, 'html.parser')
tags = soup('img')
for tag in tags:
    print(tag.get('src', None))
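To see what the loop in Q-10 prints without fetching a live page, the same extraction can be tried on a small inline document. This sketch swaps BeautifulSoup for the standard library's `html.parser.HTMLParser` so that it runs with no extra installs; the HTML string and the `ImgSrcCollector` class are made up for illustration.

```python
from html.parser import HTMLParser

class ImgSrcCollector(HTMLParser):
    """Collect the 'src' attribute of every <img> tag (None if it is missing)."""
    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        if tag == 'img':
            self.sources.append(dict(attrs).get('src', None))

# Made-up sample page (assumption for illustration only).
html = '<p>News</p><img src="a.png"><img alt="no source"><img src="b.jpg">'
collector = ImgSrcCollector()
collector.feed(html)
for src in collector.sources:
    print(src)
```

As with `tag.get('src', None)` in the question, an `img` tag with no `src` attribute yields `None` rather than raising an error.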