DEV Community

Sachin

Posted on • Originally published at geekpython.in

Read And Write ZIP Files Without Extracting Using zipfile Module

How often do you work with ZIP files in your day-to-day life?

If you have ever worked with ZIP files, you know that they bundle many files and directories, usually compressed, into a single file with the .zip extension.

Normally, to read those files, we first need to extract them from the archive.

In this tutorial, we will use some Pythonic methods to perform various operations on ZIP files without having to extract them.

For that purpose, we'll use Python's zipfile module, which handles the process cleanly.

What is a ZIP file?

As mentioned above, a ZIP file contains one or more files or directories that have been compressed.

ZIP is an archive file format that supports lossless data compression.

Lossless compression means that the original data can be perfectly reconstructed from the compressed data, without losing any information.

If you wonder what an archive file is, it is a single file that is composed of one or more files along with their metadata.

This format was originally created in 1989 and was first implemented in PKWARE, Inc.'s PKZIP utility, as a replacement for the previous ARC compression format by Thom Henderson. The ZIP format was then quickly supported by many software utilities other than PKZIP. (Source)

[Figure: ZIP-64 internal layout, an illustration showing how the files are placed on the disk. (Source)]

What is the need for a ZIP file?

ZIP files can be crucial for those who work with computers and deal with large amounts of digital information because they allow them to

  • Reduce storage requirements by compressing files without the loss of any information.

  • Improve transfer speed over the network.

  • Bundle all related files into one archive for better management.

  • Provide security by encrypting the files.

How to manipulate ZIP files using Python?

Python provides multiple tools for working with compressed and archived data. These include lower-level modules such as zlib, bz2, and lzma, which compress and decompress data using specific compression algorithms, and tarfile, which handles tar archives.

Apart from these, Python has a high-level module called zipfile that helps us read, write, create, extract, and list the contents of ZIP files.

Python's zipfile

The zipfile module provides convenient classes and functions for reading, writing, and extracting ZIP files.

But it has limitations too:

  • Decryption is slow because it is implemented in pure Python.

  • It can't create encrypted ZIP files.

  • Multi-disk ZIP files aren't currently supported.

Opening ZIP files for Reading & Writing

zipfile has a class ZipFile that allows us to open ZIP files in different modes, much like Python's open() function.

There are four modes available:

  • r: Opens an existing file for reading. This is the default.

  • w: Opens a file for writing, truncating it if it already exists.

  • a: Appends to an existing file.

  • x: Exclusively creates and writes a new file, raising FileExistsError if the file already exists.

ZipFile is also a context manager and therefore supports the with statement. (Source)

import zipfile

with zipfile.ZipFile("sample.zip", mode="r") as arch:
    arch.printdir()

.........
File Name                                             Modified             Size
document.txt                                   2022-07-04 18:13:36           52
data.txt                                       2022-07-04 18:17:30        37538
hello.md                                       2022-07-04 18:33:02         7064

Here, we can see that all the files present in the sample.zip archive have been listed.

The first argument we passed to ZipFile is the path of the file, given as a string.

The second argument is the mode. Reading mode is the default, so it doesn't matter whether you pass it or not.

Then we called .printdir() on arch, which holds the ZipFile instance, to print the table of contents in a user-friendly format with three columns:

  • File Name

  • Modified

  • Size
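The four modes can be compared side by side. The following is a minimal sketch, assuming a scratch directory created with tempfile and a hypothetical archive name modes_demo.zip; .writestr() is used only so that the example does not depend on files already being on disk.

```python
import tempfile
import zipfile
from pathlib import Path

# Work in a throwaway directory so no real files are touched.
workdir = Path(tempfile.mkdtemp())
archive_path = workdir / "modes_demo.zip"  # hypothetical name

# "w" creates the archive (or truncates an existing one).
with zipfile.ZipFile(archive_path, mode="w") as arch:
    arch.writestr("first.txt", "created in write mode")

# "a" keeps the existing members and adds new ones.
with zipfile.ZipFile(archive_path, mode="a") as arch:
    arch.writestr("second.txt", "added in append mode")

# "x" refuses to overwrite: the archive already exists, so this fails.
try:
    with zipfile.ZipFile(archive_path, mode="x") as arch:
        arch.writestr("third.txt", "never written")
    exclusive_failed = False
except FileExistsError:
    exclusive_failed = True

# "r" (the default) only reads.
with zipfile.ZipFile(archive_path) as arch:
    names = arch.namelist()

print(names)
print("x mode refused to overwrite:", exclusive_failed)
```

The archive ends up with the two members written in w and a mode, while the x attempt leaves it untouched.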

Error Handling by using Try & Except

Let's see how to handle a corrupt or invalid archive using the BadZipFile exception, which zipfile raises with an easily readable error message.

# Provided valid zip file
try:
    with zipfile.ZipFile("sample.zip") as arch:
        arch.printdir()
except zipfile.BadZipFile as error:
    print(error)

.........
File Name                                             Modified             Size
document.txt                                   2022-07-04 18:13:36           52
data.txt                                       2022-07-04 18:17:30        37538
hello.md                                       2022-07-04 18:33:02         7064


# Provided bad zip file
try:
    with zipfile.ZipFile("not_valid_zip.zip") as arch:
        arch.printdir()
except zipfile.BadZipFile as error:
    print(error)

.........
File is not a zip file

The first code block ran successfully and printed the contents of sample.zip because it is a valid ZIP file, whereas the second raised BadZipFile, which we caught and printed, because the file we provided was not a valid ZIP file.

We can check whether a file is a valid ZIP file by using the is_zipfile() function.

# Example 1
valid = zipfile.is_zipfile("bad_sample.zip")
print(valid)

.........
False

# Example 2
valid = zipfile.is_zipfile("sample.zip")
print(valid)

........
True

is_zipfile() returns True if the file is a valid ZIP file; otherwise it returns False.

# Print content if a file is valid
if zipfile.is_zipfile("sample.zip"):
    with zipfile.ZipFile("sample.zip") as arch:
        arch.printdir()

else:
    print("This is not a valid ZIP format.")

.........
File Name                                             Modified             Size
document.txt                                   2022-07-04 18:13:36           52
data.txt                                       2022-07-04 18:17:30        37538
hello.md                                       2022-07-04 18:33:02         7064

# Print a message if a file is not valid
if zipfile.is_zipfile("bad_sample.zip"):
    with zipfile.ZipFile("bad_sample.zip") as arch:
        arch.printdir()

else:
    print("This is not a valid ZIP format file.")

.........
This is not a valid ZIP format file.
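Checking with is_zipfile() and then opening the file are two separate steps, so catching the exception directly can be a simpler pattern in a script. The following is a minimal sketch; the helper name list_members and the file names good.zip, bad.zip, and missing.zip are hypothetical and exist only for this illustration.

```python
import tempfile
import zipfile
from pathlib import Path


def list_members(path):
    """Return the member names of a ZIP archive, or None if it cannot be read."""
    try:
        with zipfile.ZipFile(path) as arch:
            return arch.namelist()
    except (FileNotFoundError, zipfile.BadZipFile) as error:
        print(f"Could not read {path}: {error}")
        return None


workdir = Path(tempfile.mkdtemp())

# A real archive with one member.
good = workdir / "good.zip"
with zipfile.ZipFile(good, "w") as arch:
    arch.writestr("note.txt", "hello")

# A plain text file that only has a .zip extension.
bad = workdir / "bad.zip"
bad.write_text("this is not a zip archive")

good_members = list_members(good)
bad_members = list_members(bad)
missing_members = list_members(workdir / "missing.zip")

print(good_members, bad_members, missing_members)
```

Here the valid archive yields its member list, while the invalid and the missing file both produce a printed message and None instead of a traceback.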

Writing the ZIP file

To open a ZIP file for writing, use write mode w.

If the file you are trying to write already exists, w will truncate it and write the new content that you pass in.

import zipfile

# Adding a file
with zipfile.ZipFile('geekpython.zip', 'w') as myzip:
    myzip.write('geek.txt')

    myzip.printdir()

........
File Name                                             Modified             Size
geek.txt                                       2022-07-05 14:52:01           85

After running the code, geekpython.zip is created and geek.txt is added to it.

Adding multiple files

import zipfile

# Adding multiple files
with zipfile.ZipFile('geekpython.zip', 'w') as myzip:
    myzip.write('geek.txt')
    myzip.write('program.py')

    myzip.printdir()

........
File Name                                             Modified             Size
geek.txt                                       2022-07-05 14:52:01           85
program.py                                     2022-07-05 14:52:01          136

Note: The file you pass as an argument to .write() must exist.

If the directory in which you try to create the archive does not exist, or if you pass a file that does not exist, a FileNotFoundError is raised.

import zipfile

# Passing a non-existing directory
with zipfile.ZipFile('hello/geekpython.zip', 'w') as myzip:
    myzip.write('geek.txt')

.........
FileNotFoundError: [Errno 2] No such file or directory: 'hello/geekpython.zip'

import zipfile

# Passing a non-existing file
with zipfile.ZipFile('geekpython.zip', 'w') as myzip:
    myzip.write('hello.txt')

........
FileNotFoundError: [WinError 2] The system cannot find the file specified: 'hello.txt'
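One way to avoid the first error is to create the target directory before opening the archive. The following is a minimal sketch, assuming a scratch directory from tempfile; it also uses the arcname parameter of .write() to control the name stored inside the archive.

```python
import tempfile
import zipfile
from pathlib import Path

workdir = Path(tempfile.mkdtemp())

# A source file to archive (created here so the example is self-contained).
source = workdir / "geek.txt"
source.write_text("Hello from GeekPython")

# The target directory does not exist yet, so create it first.
target = workdir / "hello" / "geekpython.zip"
target.parent.mkdir(parents=True, exist_ok=True)

with zipfile.ZipFile(target, "w") as myzip:
    # arcname sets the name stored in the archive; without it the
    # member name would mirror the path of the source file on disk.
    myzip.write(source, arcname="geek.txt")
    members = myzip.namelist()

print(members)
```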

Appending files to the existing ZIP archive

To append files to an existing ZIP archive, use append mode a.

import zipfile

# Appending files to the existing zip file
with zipfile.ZipFile('geekpython.zip', 'a') as myzip:
    myzip.write("index.html")
    myzip.write("program.py")

    myzip.printdir()

.........
File Name                                             Modified             Size
geek.txt                                       2022-07-05 14:52:00           85
index.html                                     2022-07-05 15:32:35          176
program.py                                     2022-07-05 14:52:01          136
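Append mode does not replace a member that is already present: writing a name that already exists adds a second entry with the same name, and zipfile emits a "Duplicate name" warning. The following is a minimal sketch of one way to skip such names, assuming the new content is held in a hypothetical dictionary called new_members.

```python
import tempfile
import zipfile
from pathlib import Path

workdir = Path(tempfile.mkdtemp())
archive_path = workdir / "geekpython.zip"

# Start with an archive that already holds geek.txt.
with zipfile.ZipFile(archive_path, "w") as myzip:
    myzip.writestr("geek.txt", "first version")

# Hypothetical members to append: one new, one already present.
new_members = {
    "geek.txt": "would be a duplicate",
    "index.html": "<h1>Hello</h1>",
}

with zipfile.ZipFile(archive_path, "a") as myzip:
    existing = set(myzip.namelist())
    for name, content in new_members.items():
        if name in existing:
            print(f"Skipping {name}: already in the archive")
            continue
        myzip.writestr(name, content)
    final_names = myzip.namelist()

print(final_names)
```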

Reading Metadata

There are some methods that help us read the metadata of ZIP archives.

  • .getinfo(filename): Returns a ZipInfo object that holds information about the member file named filename.

  • .infolist(): Returns a list containing a ZipInfo object for each member of the archive.

  • .namelist(): Returns a list of archive members by name.

There is another method, .printdir(), which we have already used.

import zipfile

with zipfile.ZipFile("geekpython.zip", mode="r") as arch:
    myzip = arch.getinfo("geek.txt")


print(myzip.filename)
>>> geek.txt

print(myzip.date_time)
>>> (2022, 7, 5, 14, 52, 0)

print(myzip.file_size)
>>> 85

print(myzip.compress_size)
>>> 85

Extracting information about the files in a specified archive using .infolist()

import zipfile
import datetime

date = datetime.datetime

with zipfile.ZipFile("geekpython.zip", "r") as info:
    for arch in info.infolist():
        print(f"The file name is: {arch.filename}")
        print(f"The file size is: {arch.file_size} bytes")
        print(f"The compressed size is: {arch.compress_size} bytes")
        # date_time holds the last modification time of the member
        print(f"Last modified: {date(*arch.date_time)}")

        print("-" * 15)

.........
The file name is: geek.txt
The file size is: 85 bytes
The compressed size is: 85 bytes
Last modified: 2022-07-05 14:52:00
---------------
The file name is: index.html
The file size is: 176 bytes
The compressed size is: 176 bytes
Last modified: 2022-07-05 15:32:34
---------------
The file name is: program.py
The file size is: 136 bytes
The compressed size is: 136 bytes
Last modified: 2022-07-05 14:52:00
---------------
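The file_size and compress_size attributes can also be summed to see how much space an archive saves overall. The following is a minimal sketch, assuming a hypothetical archive report.zip built on the spot with ZIP_DEFLATED compression (covered later in this article) so that the two sizes actually differ.

```python
import tempfile
import zipfile
from pathlib import Path

workdir = Path(tempfile.mkdtemp())
archive_path = workdir / "report.zip"  # hypothetical name

# Build a compressed archive with one repetitive and one tiny member.
with zipfile.ZipFile(archive_path, "w", compression=zipfile.ZIP_DEFLATED) as arch:
    arch.writestr("repetitive.txt", "spam " * 2000)
    arch.writestr("short.txt", "hi")

with zipfile.ZipFile(archive_path) as arch:
    infos = arch.infolist()
    total_size = sum(info.file_size for info in infos)
    total_compressed = sum(info.compress_size for info in infos)
    for info in infos:
        print(f"{info.filename}: {info.file_size} -> {info.compress_size} bytes")

print(f"Total: {total_size} -> {total_compressed} bytes")
```

Very small members can come out slightly larger after compression, which is why the totals are a more useful figure than any single entry.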

Let's look at some more ZipInfo attributes

import zipfile
with zipfile.ZipFile("sample.zip", "r") as info:
    for arch in info.infolist():
        if arch.create_system == 0:
            system = "Windows"
        elif arch.create_system == 3:
            system = "UNIX"
        else:
            system = "Unknown"

        print(f"ZIP version: {arch.create_version}")
        print(f"Create System: {system}")
        print(f"External Attributes: {arch.external_attr}")
        print(f"Internal Attributes: {arch.internal_attr}")
        print(f"Comments: {arch.comment}")

        print("-" * 15)

.........
ZIP version: 20
Create System: Windows
External Attributes: 32
Internal Attributes: 1
Comments: b''
---------------
ZIP version: 20
Create System: Windows
External Attributes: 32
Internal Attributes: 1
Comments: b''
---------------
ZIP version: 20
Create System: Windows
External Attributes: 32
Internal Attributes: 1
Comments: b''
---------------

.create_system returns an integer that identifies the system on which the archive was created:

  • 0 - for Windows

  • 3 - for Unix

Example for showing the use of .namelist()

import zipfile

with zipfile.ZipFile("geekpython.zip", "r") as files:
    for files_list in files.namelist():
        print(files_list)

.........
geek.txt
index.html
program.py

Reading and Writing Member files

Member files are the files that are present inside a ZIP archive.

To read the content of a member file without extracting it, we use .read(). It takes name, which is the name of the file in the archive, and an optional pwd, which is the password used for encrypted files.

import zipfile

with zipfile.ZipFile("geekpython.zip", "r") as zif:
    for lines in zif.read("intro.txt").split(b"\r\n"):
        print(lines)

.........
b'Hey, Welcome to GeekPython!'
b''
b'Are you enjoying it?'
b''
b"Now it's time, see you later!"
b''

We've added .split() to break the stream of bytes into lines using the separator \r\n, and added b as a prefix because we are working with a bytes object.
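If you would rather work with regular strings than bytes, the content can be decoded first. The following is a minimal sketch, assuming the member is UTF-8 encoded text; the archive is built on the spot so the example runs on its own.

```python
import tempfile
import zipfile
from pathlib import Path

workdir = Path(tempfile.mkdtemp())
archive_path = workdir / "geekpython.zip"

# Build a small archive with Windows-style line endings.
with zipfile.ZipFile(archive_path, "w") as zif:
    zif.writestr("intro.txt", "Hey, Welcome to GeekPython!\r\nAre you enjoying it?\r\n")

with zipfile.ZipFile(archive_path) as zif:
    raw = zif.read("intro.txt")  # bytes
    # decode() turns bytes into str; splitlines() handles \n and \r\n alike.
    lines = raw.decode("utf-8").splitlines()

for line in lines:
    print(line)
```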

Other than .read(), we can use .open(), which lets us read, write, and add a new member file in a flexible way. Just like the built-in open() function, it implements the context manager protocol and therefore supports the with statement.

import zipfile

with zipfile.ZipFile("sample.zip", "r") as my_zip:
    with my_zip.open("document.txt", "r") as data:
        for text in data:
            print(text)

.........
b'Hey, I a document file inside the sample.zip folder.\r\n'
b'\r\n'
b'Are you enjoying it.'
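.open() always returns a binary file object, which is why each line above is printed as bytes. One way to iterate over text instead is to wrap the member in io.TextIOWrapper. This is a minimal sketch, assuming UTF-8 content and an archive created here for the purpose.

```python
import io
import tempfile
import zipfile
from pathlib import Path

workdir = Path(tempfile.mkdtemp())
archive_path = workdir / "sample.zip"

with zipfile.ZipFile(archive_path, "w") as my_zip:
    my_zip.writestr("document.txt", "first line\nsecond line\n")

with zipfile.ZipFile(archive_path) as my_zip:
    with my_zip.open("document.txt") as binary_member:
        # Wrap the binary stream so that iteration yields str, not bytes.
        text_member = io.TextIOWrapper(binary_member, encoding="utf-8")
        text_lines = [line.rstrip("\n") for line in text_member]

print(text_lines)
```

Unlike .read(), this approach does not need to load the whole member into memory before processing it line by line.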

We can use .open() with write mode w to create a new member file and write content to it. Because the archive itself is opened in append mode a, the new member is added to the existing archive.

import zipfile

with zipfile.ZipFile("sample.zip", "a") as my_zip:
    with my_zip.open("file.txt", "w") as data_file:
        data_file.write(b"Hi, I am a new file.")

with zipfile.ZipFile("sample.zip", mode="r") as archive:
    archive.printdir()
    print("-" * 20)
    for line in archive.read("file.txt").split(b"\n"):
        print(line)

.........
File Name                                             Modified             Size
data.txt                                       2022-07-04 18:17:30        37538
hello.md                                       2022-07-04 18:33:02         7064
document.txt                                   2022-07-06 17:08:36           76
file.txt                                       1980-01-01 00:00:00           20
--------------------
b'Hi, I am a new file.'
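In the listing above, file.txt carries the date 1980-01-01 00:00:00. That is the default timestamp of a ZipInfo object, which suggests the member was created from a bare name with no timestamp set. The following is a minimal sketch of one way to control it, by passing .open() a ZipInfo with an explicit date_time; the timestamp used here is an arbitrary example value.

```python
import tempfile
import zipfile
from pathlib import Path

workdir = Path(tempfile.mkdtemp())
archive_path = workdir / "sample.zip"

# Describe the new member explicitly, including its timestamp
# (year, month, day, hour, minute, second); the value is arbitrary.
member = zipfile.ZipInfo("file.txt", date_time=(2022, 7, 6, 17, 30, 0))

with zipfile.ZipFile(archive_path, "w") as my_zip:
    with my_zip.open(member, "w") as data_file:
        data_file.write(b"Hi, I am a new file.")

with zipfile.ZipFile(archive_path) as archive:
    info = archive.getinfo("file.txt")
    content = archive.read("file.txt")

print(info.date_time)
print(content)
```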

Extracting the ZIP archive

There are two methods to extract a ZIP archive:

  • .extractall() - extracts all members of the archive into the current working directory. We can also specify the path of a directory of our choice.
import zipfile

with zipfile.ZipFile("geekpython.zip", "r") as file:
    file.extractall("files")

All the member files will be extracted into the folder named files in your current working directory. You can specify another directory.

  • .extract() - extracts a single member from the archive to the current working directory. Keep in mind that you need to specify the full name of the member or pass a ZipInfo object.
import zipfile
with zipfile.ZipFile("geekpython.zip", "r") as file:
    file.extract("hello.txt")

hello.txt will be extracted from the archive to the current working directory. To choose a different output directory, pass path="output_directory/" as an argument to extract().
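Both methods can be combined with an explicit output directory. The following is a minimal sketch, assuming a scratch directory from tempfile and an archive built on the spot; it also uses the members parameter of .extractall() to extract only selected files.

```python
import tempfile
import zipfile
from pathlib import Path

workdir = Path(tempfile.mkdtemp())
archive_path = workdir / "geekpython.zip"

with zipfile.ZipFile(archive_path, "w") as file:
    file.writestr("hello.txt", "Hello!")
    file.writestr("docs/readme.md", "# Readme")

output_dir = workdir / "output_directory"

with zipfile.ZipFile(archive_path) as file:
    # extract() returns the path of the file it created.
    extracted_path = file.extract("hello.txt", path=output_dir)
    # extractall() can be limited to selected members.
    file.extractall(path=output_dir, members=["docs/readme.md"])

print(extracted_path)
print(sorted(p.relative_to(output_dir).as_posix() for p in output_dir.rglob("*")))
```

The output directory and any subdirectories stored in the member names are created as needed.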

Creating ZIP files

Creating a ZIP file is simply a matter of opening a new archive in write mode and writing existing files to it.

import zipfile

# Creating archive using zipfile module

files = ["hello.txt", "geek.md", "python.txt"]

with zipfile.ZipFile("archive_created.zip", "w") as archive:
    for file in files:
        archive.write(file)

Or you can add the files one by one by specifying each file name directly.

import zipfile

with zipfile.ZipFile("another_archive.zip", "w") as archive:
    archive.write("hello.txt")
    archive.write("geek.md")
    archive.write("python.txt")
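The same loop extends to a whole directory tree. The following is a minimal sketch, assuming a hypothetical folder named source that is created here for the demonstration; pathlib's rglob() walks the tree, and arcname keeps the stored names relative to that folder.

```python
import tempfile
import zipfile
from pathlib import Path

workdir = Path(tempfile.mkdtemp())

# Build a small directory tree to archive.
source = workdir / "source"
(source / "archive").mkdir(parents=True)
(source / "geek.md").write_text("# Geek")
(source / "archive" / "hello.txt").write_text("Hello")

archive_path = workdir / "directory.zip"

with zipfile.ZipFile(archive_path, "w") as archive:
    for path in sorted(source.rglob("*")):
        if path.is_file():
            # Store paths relative to the source folder, not absolute ones.
            archive.write(path, arcname=path.relative_to(source).as_posix())
    names = archive.namelist()

print(names)
```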

Creating ZIP files using shutil

We can also use shutil to make a ZIP archive, and it provides an easy way of doing it.

The shutil module helps in performing high-level file operations in Python.

import shutil

shutil.make_archive("archive", "zip", "files")

Here archive is the base name of the ZIP archive that will be created, zip is the archive format (shutil adds the .zip extension to the file name), and files is the folder whose contents will be archived.

Unpacking the ZIP archive using shutil

import shutil

shutil.unpack_archive("archive.zip", "archive")

Here archive.zip is the ZIP archive and archive is the directory into which its contents will be extracted.
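The two shutil functions can be tried together as a round trip. The following is a minimal sketch, assuming a scratch directory from tempfile; the folder names files and unpacked are only illustrative.

```python
import shutil
import tempfile
from pathlib import Path

workdir = Path(tempfile.mkdtemp())

# A folder with one file to archive.
files_dir = workdir / "files"
files_dir.mkdir()
(files_dir / "hello.txt").write_text("Hello")

# make_archive() appends the ".zip" extension itself and returns the full path.
archive_file = shutil.make_archive(str(workdir / "archive"), "zip", root_dir=str(files_dir))

# unpack_archive() extracts into the given directory, creating it if needed.
extract_dir = workdir / "unpacked"
shutil.unpack_archive(archive_file, str(extract_dir))

print(Path(archive_file).name)
print([p.name for p in extract_dir.iterdir()])
```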

Compressing ZIP files

By default, when we use zipfile to make a ZIP archive, the result is actually uncompressed because it uses the ZIP_STORED compression method.

The member files are simply stored in the archive as they are, as if placed in a container.

So, to compress them, we need to pass the compression argument to ZipFile.

There are three constants for compressing files:

  • zipfile.ZIP_DEFLATED - requires the zlib module; the compression method is Deflate.

  • zipfile.ZIP_BZIP2 - requires the bz2 module; the compression method is BZIP2.

  • zipfile.ZIP_LZMA - requires the lzma module; the compression method is LZMA.

import zipfile
with zipfile.ZipFile("compressed.zip", "w", compression=zipfile.ZIP_DEFLATED) as archive:
    archive.write("geek.md")
    archive.write("python.txt")
    archive.write("hello.txt")

import zipfile
with zipfile.ZipFile("bzip_compressed.zip", "w", compression=zipfile.ZIP_BZIP2) as archive:
    archive.write("geek.md")
    archive.write("python.txt")
    archive.write("hello.txt")

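We can also set a compression level with the compresslevel argument. With ZIP_DEFLATED it accepts integers from 0 to 9, and with ZIP_BZIP2 from 1 to 9, where 9 gives maximum compression. It has no effect with ZIP_STORED or ZIP_LZMA.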

import zipfile

with zipfile.ZipFile("max_compressed.zip", "w", compression=zipfile.ZIP_DEFLATED, compresslevel=9) as archive:
    archive.write("geek.md")
    archive.write("python.txt")
    archive.write("hello.txt")
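To see what the compression methods do in practice, the compressed size of the same data can be compared across them. The following is a minimal sketch, assuming the zlib, bz2, and lzma modules are available in your Python build; the sample data is deliberately repetitive, so real files will show different ratios.

```python
import tempfile
import zipfile
from pathlib import Path

workdir = Path(tempfile.mkdtemp())
data = b"GeekPython " * 5000  # highly repetitive, so it compresses well

methods = {
    "stored": zipfile.ZIP_STORED,
    "deflated": zipfile.ZIP_DEFLATED,
    "bzip2": zipfile.ZIP_BZIP2,
    "lzma": zipfile.ZIP_LZMA,
}

sizes = {}
for label, method in methods.items():
    archive_path = workdir / f"{label}.zip"
    # Write the same data with each compression method.
    with zipfile.ZipFile(archive_path, "w", compression=method) as archive:
        archive.writestr("data.txt", data)
    # Read back the compressed size recorded for the member.
    with zipfile.ZipFile(archive_path) as archive:
        sizes[label] = archive.getinfo("data.txt").compress_size

for label, size in sizes.items():
    print(f"{label}: {size} bytes")
```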

Did you know that zipfile can run from the command line?

Run zipfile from Command Line Interface

Here are some options which allow us to list, create, and extract ZIP archives from the command line.

-l or --list: List files in a zipfile.

python -m zipfile -l data.zip

.........
File Name                                             Modified             Size
Streamlit-Apps-master/                         2022-06-30 23:31:36            0
Streamlit-Apps-master/Covid-19.csv             2022-06-30 23:31:36         1924
Streamlit-Apps-master/Covid-Banner.png         2022-06-30 23:31:36       538140
Streamlit-Apps-master/Procfile                 2022-06-30 23:31:36           40
Streamlit-Apps-master/Readme.md                2022-06-30 23:31:36          901
Streamlit-Apps-master/WebAppPreview.png        2022-06-30 23:31:36       145818
Streamlit-Apps-master/app.py                   2022-06-30 23:31:36         3162
Streamlit-Apps-master/requirements.txt         2022-06-30 23:31:36           46
Streamlit-Apps-master/setup.sh                 2022-06-30 23:31:36          220

It works just like .printdir().


-c or --create: Create zipfile from source files.

python -m zipfile --create shell.zip python.txt hello.txt

It will create a ZIP archive named shell.zip and add the files specified above.

Creating a ZIP file to archive the entire directory

python -m zipfile --create directory.zip source/

python -m zipfile -l directory.zip

.........
File Name                                             Modified             Size
source/                                        2022-07-07 17:22:42            0
source/archive/                                2022-07-07 10:58:06            0
source/archive/hello.txt                       2022-07-07 10:58:06           62
source/archive/index.html                      2022-07-07 10:58:06          176
source/archive/intro.txt                       2022-07-07 10:58:06           86
source/archive/program.py                      2022-07-07 10:58:06          136
source/geek.md                                 2022-07-07 15:50:28           45
source/hello.txt                               2022-07-07 12:21:22           61

-e or --extract: Extract zipfile into the target directory.

python -m zipfile --extract directory.zip extracted/

directory.zip will be extracted into the extracted directory.


-t or --test: Test whether the zipfile is valid or not.

python -m zipfile --test bad_sample.zip

.........
Traceback (most recent call last):
...
BadZipFile: File is not a zip file
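From Python code, a comparable integrity check is available through the .testzip() method of ZipFile. The following is a minimal sketch, assuming an archive built on the spot; a file that is not a ZIP archive at all would still raise BadZipFile when it is opened, as in the traceback above.

```python
import tempfile
import zipfile
from pathlib import Path

workdir = Path(tempfile.mkdtemp())
archive_path = workdir / "shell.zip"

with zipfile.ZipFile(archive_path, "w") as archive:
    archive.writestr("python.txt", "print('hello')")
    archive.writestr("hello.txt", "Hello")

with zipfile.ZipFile(archive_path) as archive:
    # testzip() reads every member and checks its CRC; it returns the
    # name of the first corrupt member, or None if all of them are fine.
    first_bad = archive.testzip()

print("Archive is OK" if first_bad is None else f"Corrupt member: {first_bad}")
```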

Conclusion

Phew, that was a long module to cover, and this article still hasn't covered everything.

However, it is sufficient to get started with the zipfile module and manipulate ZIP archives without extracting them.

ZIP files have some benefits: they save disk storage, transfer faster over a network, and more.

We certainly learned some useful operations that we can perform on ZIP archives with the zipfile module, such as:

  • Reading, writing, and extracting existing ZIP archives

  • Reading the metadata

  • Creating ZIP archives

  • Manipulating member files

  • Running zipfile from command line



That's all for now

Keep Coding✌✌
