13.6. Reading binary files using urllib
Sometimes you want to retrieve a non-text (or binary) file such as an image or video file. The data in these files is generally not useful to print out, but you can easily make a copy of a URL to a local file on your hard disk using urllib.
The pattern is to open the URL and use read to download the entire contents of the document into a variable (img), which holds the data as a bytes object, then write that information to a local file.
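The pattern described above can be sketched roughly as follows. The function name download_whole is hypothetical and is used only for illustration, the URL in the commented-out call is the one that appears in the exercise later in this section, and the use of with blocks to close the connection and the file is a stylistic assumption rather than a requirement.

```python
import urllib.request

def download_whole(url, filename):
    """Read everything at url into memory, then write it to filename.

    Hypothetical helper for illustration only.
    """
    with urllib.request.urlopen(url) as response:
        img = response.read()            # the whole file, as one bytes object
    with open(filename, 'wb') as fhand:  # 'wb' = write, binary
        fhand.write(img)
    return len(img)                      # number of bytes copied

# Example call (needs network access, so it is left commented out):
# download_whole('http://data.pr4e.org/cover3.jpg', 'cover.jpg')
```

Because read() is called with no argument, the whole file is held in memory before anything is written to disk.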
This program reads all of the data in at once across the network and stores it in the variable img in the main memory of your computer, then opens the file cover.jpg and writes the data out to your disk. The wb argument for open() opens a binary file for writing only. This program will work if the size of the file is less than the size of the memory of your computer.
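To see what "binary" and "writing only" mean in practice, here is a small sketch that needs no network access. The scratch file demo.bin and the three sample bytes are made up for the demonstration. It checks that a file opened with wb accepts bytes, refuses to be read, and refuses ordinary text strings.

```python
import io
import os
import tempfile

# Hypothetical scratch file in a temporary directory
path = os.path.join(tempfile.mkdtemp(), 'demo.bin')

with open(path, 'wb') as fhand:
    fhand.write(b'\xff\xd8\xff')     # bytes are accepted in binary mode
    try:
        fhand.read()                 # 'wb' is write only, so this fails
        readable = True
    except io.UnsupportedOperation:
        readable = False
    try:
        fhand.write('text')          # a str is not bytes, so this fails too
        accepts_str = True
    except TypeError:
        accepts_str = False

# Reopen in 'rb' (read, binary) mode to get the bytes back
with open(path, 'rb') as fhand:
    data = fhand.read()

print(readable, accepts_str, data)
```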
Q-2: True or False? The wb argument opens a binary file for writing only.
- True
  - Correct! The 'wb' argument stands for Write Binary.
- False
  - Try again!
However, if this is a large audio or video file, this program may crash or at least run extremely slowly when your computer runs out of memory. To avoid running out of memory, we retrieve the data in blocks (or buffers) and write each block to your disk before retrieving the next block. This way the program can read a file of any size without using up all of the memory you have in your computer.
In this example, we read at most 100,000 characters (bytes) at a time and then write those characters to the cover.jpg file before retrieving the next 100,000 characters of data from the web.
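A minimal sketch of this block-by-block approach is shown below. The function name download_in_blocks and its block_size parameter are hypothetical names chosen for illustration, and the URL in the commented-out call is the one used in the exercise that follows.

```python
import urllib.request

def download_in_blocks(url, filename, block_size=100000):
    """Copy url to filename, holding at most block_size bytes in memory.

    Hypothetical helper for illustration only.
    """
    size = 0
    with urllib.request.urlopen(url) as img, open(filename, 'wb') as fhand:
        while True:
            info = img.read(block_size)   # up to block_size bytes
            if len(info) < 1:             # empty result means no more data
                break
            size = size + len(info)
            fhand.write(info)             # write this block before reading the next
    return size                           # total number of bytes copied

# Example call (needs network access, so it is left commented out):
# download_in_blocks('http://data.pr4e.org/cover3.jpg', 'cover.jpg')
```

The loop ends when read returns an empty bytes object, so the file can be any size. Only one block is in memory at a time.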
Q-4: What is the purpose of the line info = img.read(100000) in the following code?
    import urllib.request

    # Open the URL, but do not read the data yet
    img = urllib.request.urlopen('http://data.pr4e.org/cover3.jpg')
    fhand = open('cover3.jpg', 'wb')
    size = 0
    while True:
        info = img.read(100000)
        if len(info) < 1: break
        size = size + len(info)
        fhand.write(info)
    fhand.close()
- It deletes any characters past the first 100,000.
  - This will not delete characters.
- It only reads 100,000 characters then stops.
  - This will continue to read more characters, not stop after 100,000.
- It only looks at 100,000 images.
  - This line deals with the data characters of the images, not the images themselves.
- It limits how many characters are read at a time.
  - read(100000) limits the number of characters read at a time to 100,000.