DEV Community

Kojo Ben

How to merge PDF files using the PyPDF2 module in Python

Have you ever had multiple PDF files that you needed to merge into a single document? Merging two or more PDFs into one file is easier than you might think in Python using the PyPDF2 module.

PyPDF2 is a Python library for working with PDF files. You can use it to extract document information, split documents page by page, merge multiple pages, encrypt and decrypt files, and more. In this tutorial, you will learn how to merge multiple files using this module.

A program to merge multiple PDF files

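You first need to install the package using pip. The PdfFileMerger and PdfFileReader classes used below were removed in PyPDF2 3.0, so install an earlier release:
pip install "PyPDF2<3"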

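Open any editor of your choice and create a new file, "pdfMerger.py". Make sure the PDF files to be merged are in the same directory as the Python file, and run the script from that directory, because the relative file names used below are resolved against the current working directory.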
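Before merging, it can help to confirm that the input files really are where the script expects them. The following is a minimal sketch using only the standard library. The helper name find_missing and its base_dir parameter are illustrative assumptions and are not part of PyPDF2.

```python
from pathlib import Path


def find_missing(filenames, base_dir="."):
    """Return the names in `filenames` that do not exist under `base_dir`."""
    base = Path(base_dir)
    return [name for name in filenames if not (base / name).is_file()]


if __name__ == "__main__":
    # file1.pdf and file2.pdf are the example names used in this tutorial.
    missing = find_missing(["file1.pdf", "file2.pdf"])
    if missing:
        print("Missing input files:", ", ".join(missing))
    else:
        print("All input files found.")
```

With the default base_dir of ".", the names are checked against the current working directory, which is also where relative file names passed to PyPDF2 are looked up.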

The following block of code merges two PDF files:

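import PyPDF2

mergeFile = PyPDF2.PdfFileMerger()  # object that collects the pages
mergeFile.append(PyPDF2.PdfFileReader('file1.pdf'))  # PdfFileReader takes a path; there is no file-mode argument
mergeFile.append(PyPDF2.PdfFileReader('file2.pdf'))
mergeFile.write("NewMergedFile.pdf")  # write the merged pages to disk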


Line 1: Imports the PyPDF2 module, which provides the PdfFileMerger and PdfFileReader classes.

Line 2: Creates an object of the PdfFileMerger class and assigns it to mergeFile.

Lines 3 and 4: Use the append method to add all pages of each input file to the end of the merged document.

Line 5: Writes the merged pages to NewMergedFile.pdf.

The code block above looks very simple, but what if you would like to merge more than two files? You would have to repeat line 3 for each file you want to add, which quickly becomes repetitive. You can use a for loop in this situation.
The following block of code is another way to merge multiple PDF files:

import PyPDF2

def merge_pdfs(_pdfs):
    mergeFile = PyPDF2.PdfFileMerger()
    for _pdf in _pdfs:
        mergeFile.append(PyPDF2.PdfFileReader(_pdf))  # PdfFileReader takes a path; there is no file-mode argument
    mergeFile.write("New_Merged_File.pdf")  # write the merged pages to disk

if __name__ == '__main__':
    _pdfs = ['file1.pdf', 'file2.pdf', 'file3.pdf']
    merge_pdfs(_pdfs)


Line 2: Defines a function, merge_pdfs, which takes a list, _pdfs, as a parameter.

Line 4: Starts a for loop that goes through the list _pdfs and appends the pages of each file.

Line 7: Checks whether the Python file is being run as the main program rather than imported as a module.

Line 8: Specifies the list of files to merge.

Line 9: Calls the function.
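If you would rather not type every file name by hand, the list can be built from the contents of a folder. The following is a rough sketch, assuming a PyPDF2 release that still provides PdfFileMerger (it was removed in 3.0). The collect_pdfs helper, its exclude parameter, and the output parameter are illustrative assumptions and do not come from the code above.

```python
from pathlib import Path


def collect_pdfs(directory=".", exclude=()):
    """Return the .pdf files in `directory`, sorted for a stable merge order.

    File names listed in `exclude` (for example a previous output file)
    are skipped.
    """
    paths = sorted(Path(directory).glob("*.pdf"))
    return [str(p) for p in paths if p.name not in exclude]


def merge_pdfs(pdf_paths, output="New_Merged_File.pdf"):
    # Imported here so collect_pdfs() can be used without PyPDF2 installed.
    import PyPDF2

    merge_file = PyPDF2.PdfFileMerger()
    for pdf_path in pdf_paths:
        merge_file.append(pdf_path)  # append() also accepts a file path
    merge_file.write(output)
    merge_file.close()  # release the input files opened by append()
```

Calling merge_pdfs(collect_pdfs(exclude=("New_Merged_File.pdf",))) would then merge every PDF in the current directory in sorted order, while skipping the result of an earlier run so it is not merged into itself.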

I hope you enjoyed this short and simple tutorial! 😎

Top comments (1)

Ahsanul Kabir

Nice tutorial!
After merging, the bookmarks of the PDFs don't update to the correct page numbers. Have you faced this issue?