13.5. Retrieving web pages with urllib
While we can manually send and receive data over HTTP using the socket library, there is a much simpler way to perform this common task in Python by using the urllib library. Using urllib, you can treat a web page much like a file. You simply indicate which web page you would like to retrieve, and urllib handles all of the HTTP protocol and header details.
We can write the equivalent program to read the romeo.txt file from the web using urllib. Once the web page has been opened with urllib.request.urlopen, we can treat it like a file and read through it using a for loop.
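A minimal sketch of such a program follows, assuming Python 3's urllib.request. So that it runs without network access, it opens a data: URL (a scheme urlopen also accepts) holding two sample lines that stand in for romeo.txt. The helper name read_page_lines and the sample text are illustrative assumptions; to fetch a real page, pass an http:// or https:// URL instead. Because the response yields bytes, each line is decoded before it is stripped.

```python
import urllib.parse
import urllib.request


def read_page_lines(url):
    """Open url with urllib and return its lines as stripped strings."""
    lines = []
    # urlopen returns a file-like response object; iterating over it
    # yields the body one line at a time, as bytes.
    with urllib.request.urlopen(url) as fhand:
        for line in fhand:
            lines.append(line.decode().strip())
    return lines


# Illustrative stand-in for romeo.txt so the sketch runs offline.
# For a real page, pass an http:// or https:// URL instead.
SAMPLE_TEXT = (
    "But soft what light through yonder window breaks\n"
    "It is the east and Juliet is the sun\n"
)
SAMPLE_URL = "data:text/plain;charset=utf-8," + urllib.parse.quote(SAMPLE_TEXT)

if __name__ == "__main__":
    for line in read_page_lines(SAMPLE_URL):
        print(line)
```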
When the program runs, we see only the contents of the file. The headers are still sent, but the urllib code consumes them and returns only the data to us.
Q-3: True or False? The program above shows the headers and contents of the file.
- True: Try again.
- False: The urllib code consumes the headers and returns only the data to us, that is, the contents of the file.
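To see that the headers really are sent and then set aside, the sketch below starts a throwaway HTTP server on the local machine with the standard http.server module and fetches one page from it. The handler class, the served text, and the /romeo.txt path are illustrative assumptions (the handler answers every path the same way). The body returned by read() contains only the data, while the parsed headers stay available on the response's headers attribute. The opener is built with an empty ProxyHandler so that proxy settings in the environment do not redirect this local request.

```python
import http.server
import threading
import urllib.request

# Hypothetical page content served by the local test server.
BODY = b"But soft what light through yonder window breaks\n"


class TextHandler(http.server.BaseHTTPRequestHandler):
    """Illustrative handler: answers every GET with BODY as plain text."""

    def do_GET(self):
        self.send_response(200)  # status line plus Server and Date headers
        self.send_header("Content-Type", "text/plain; charset=utf-8")
        self.send_header("Content-Length", str(len(BODY)))
        self.end_headers()  # writes the blank line that ends the headers
        self.wfile.write(BODY)

    def log_message(self, fmt, *args):
        pass  # keep the default per-request log off stderr


def fetch(url):
    """Return (headers, body); urllib parses the headers out of the stream."""
    # An empty ProxyHandler ignores proxy environment variables, so the
    # request goes straight to the local server.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    with opener.open(url) as resp:
        return resp.headers, resp.read()


def demo():
    # Port 0 lets the OS pick any free port on the loopback interface.
    server = http.server.HTTPServer(("127.0.0.1", 0), TextHandler)
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    try:
        url = "http://127.0.0.1:%d/romeo.txt" % server.server_address[1]
        return fetch(url)
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    headers, body = demo()
    print(body.decode(), end="")  # only the file contents
    print("Content-Type:", headers["Content-Type"])
```

Running it prints the single line of text followed by the Content-Type value, which was read from the headers rather than from the body.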
As an example, we can write a program that retrieves the data for romeo.txt and computes the frequency of each word in the file. Again, once we have opened the web page, we can read it like a local file.
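A minimal sketch of the word-count program, again assuming Python 3 and reading from a data: URL so that it runs offline. The function name word_counts and the sample text are illustrative assumptions; substitute a real http:// or https:// URL to fetch an actual page. It builds the dictionary with the counts.get(word, 0) + 1 idiom, decoding each line from bytes before splitting it into words.

```python
import urllib.parse
import urllib.request


def word_counts(url):
    """Count how often each whitespace-separated word occurs at url."""
    counts = dict()
    with urllib.request.urlopen(url) as fhand:
        for line in fhand:
            # Each line arrives as bytes, so decode before splitting.
            for word in line.decode().split():
                counts[word] = counts.get(word, 0) + 1
    return counts


# Illustrative stand-in for romeo.txt so the sketch runs offline.
SAMPLE_TEXT = (
    "But soft what light through yonder window breaks\n"
    "It is the east and Juliet is the sun\n"
)
SAMPLE_URL = "data:text/plain;charset=utf-8," + urllib.parse.quote(SAMPLE_TEXT)

if __name__ == "__main__":
    print(word_counts(SAMPLE_URL))
```

Swapping in collections.Counter would produce the same counts; the plain dictionary keeps the counting logic visible.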
Q-5: True or False? The urllib library opens a web page and allows it to be read like a file.
- True: Correct! urllib makes it possible to treat a web page like a local file.
- False: Try again.