
Brayan Vasquez

Create PDFs using Python and xhtml2pdf

Hi everyone! In this post, I want to share a short guide that shows how to create PDF files using Python and xhtml2pdf.

The xhtml2pdf library creates PDF files from HTML content. This is a guide that I originally wrote for myself, but I want to share it with you.

WARNING: I wrote this guide for experimental purposes and have not used it in a production environment, so keep in mind that using it in production could cause issues. A better alternative to create PDF files is http://weasyprint.org/.

Requirements

  • Python (I use Python 3.8.2)
  • Pip (I use Pip 20.1.1)

Setup Project

To create our project we are going to use virtualenv, which gives us an isolated Python environment for this project. However, you can also use pyenv or venv.

So, first, we have to install virtualenv.



pip3 install virtualenv



Now, we have to create the project folder and set up the virtualenv.



# Creating a project folder
mkdir pdfs-example
cd pdfs-example

# Creating the virtual environment
virtualenv env

# Activate the virtual environment
source env/bin/activate

# Create our main file
touch main.py



NOTE: To exit the environment you just have to write deactivate.

Install dependencies

To create our PDF files we need to install the xhtml2pdf library. This library also depends on html5lib and reportlab.



pip install reportlab # https://pypi.org/project/reportlab/
pip install html5lib # https://pypi.org/project/html5lib/
pip install xhtml2pdf



NOTE: We need an xhtml2pdf version higher than 0.1a1 to work with Python 3.
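
If you want to check the installed versions from Python instead of reading the pip output, here is a minimal sketch. It assumes Python 3.8 or newer, where `importlib.metadata` is part of the standard library. The helper name `installed_version` is my own and is not part of xhtml2pdf.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version of a package, or None if it is missing.

    Hypothetical helper for this guide, built on importlib.metadata.
    """
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    # Print the version of each library this guide relies on.
    for name in ("xhtml2pdf", "reportlab", "html5lib"):
        print(name, installed_version(name) or "not installed")
```

Running this inside the activated environment prints one line per library, with "not installed" for anything that is missing.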

We can see the installed dependencies with the following command.



# Installed dependencies
pip freeze

# The above mentioned command will list something like the following
html5lib==1.1
Pillow==7.2.0
PyPDF2==1.26.0
reportlab==3.5.50
six==1.15.0
webencodings==0.5.1
xhtml2pdf==0.2.4



We can also export our dependencies.



pip freeze > requirements.txt



And install our dependencies from a requirements.txt file.



pip install -r requirements.txt



Generate PDF from string

Now that we have the necessary modules installed, we can start writing code. First, we must import the xhtml2pdf module, which will help us to create our PDF files.



# main.py
# import section ....
from xhtml2pdf import pisa             # import python module
# ....



Now, we can define some constants.



# main.py

# Constants section ....
# Content to write in our PDF file.
SOURCE = "<html><body><p>PDF from string</p></body></html>"

# Filename for our PDF file.
OUTPUT_FILENAME = "test.pdf"
# ....



OK, we will create a base function that the other functions can reuse, so we avoid code duplication.



# main.py

# Methods section ....
def html_to_pdf(content, output):
    """
    Generate a pdf using a string content

    Parameters
    ----------
    content : str
        content to write in the pdf file
    output  : str
        name of the file to create

    Returns
    -------
    bool
        False on success and True on errors
    """
    # Open the output file in binary mode; the with block closes it for us
    with open(output, "w+b") as result_file:
        # convert HTML to PDF
        pisa_status = pisa.CreatePDF(
            content,                   # the HTML to convert
            dest=result_file           # file handle to receive result
        )

    # pisa_status.err is truthy when the conversion reported errors
    result = bool(pisa_status.err)

    if not result:
        print("Successfully created PDF")
    else:
        print("Error: unable to create the PDF")

    # return False on success and True on errors
    return result

# ....


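
Sometimes we want the PDF as bytes instead of a file on disk, for example to return it from a web view. The following is only a sketch of that idea, not something I tested in production. It writes into an `io.BytesIO` buffer instead of an opened file. The function name `html_to_pdf_bytes` and its `create_pdf` parameter are hypothetical: the parameter exists only so the function can be tried with a stand-in converter, and by default it falls back to `pisa.CreatePDF`.

```python
import io


def html_to_pdf_bytes(content, create_pdf=None):
    """Render an HTML string and return the resulting PDF as bytes.

    Hypothetical variant of html_to_pdf that keeps the PDF in memory.
    """
    if create_pdf is None:
        # Imported lazily so this module loads even without xhtml2pdf.
        from xhtml2pdf import pisa
        create_pdf = pisa.CreatePDF

    buffer = io.BytesIO()                        # in-memory binary file
    pisa_status = create_pdf(content, dest=buffer)

    if pisa_status.err:
        raise RuntimeError("Unable to create the PDF")

    return buffer.getvalue()
```

Raising an exception instead of printing a message is a design choice for this sketch: a caller that needs the bytes usually cannot continue without them.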

Once we have the base function we can create our from_text function.



# main.py

# Methods section ....
def from_text(source, output):
    """
    Generate a pdf from a plain string

    Parameters
    ----------
    source : str
        content to write in the pdf file
    output  : str
        name of the file to create
    """
    # Pass the error flag from html_to_pdf back to the caller
    return html_to_pdf(source, output)

# ....



Our main function will be the following.



# main.py
# import section ....
import sys

# Main section ...
if __name__ == "__main__":
    if len(sys.argv) > 1:
        if sys.argv[1] == '--help':
            print('Info: ')
            print('--help List the options to create a PDF file')
            print('--text Create a PDF file from a string')
            print('--template Create a PDF file from a template')
        elif sys.argv[1] == '--text':
            print("Creating a PDF file from a string")
            from_text(SOURCE, OUTPUT_FILENAME)
    else:
        print("Please give the type of PDF to create.")
        print("For help execute `python main.py --help`")


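
As an alternative to checking `sys.argv` by hand, the same options could be declared with the standard library `argparse` module, which also generates the `--help` text for us. This is just a sketch of that approach. The function names `build_parser` and `chosen_mode` are my own, and the option names mirror the ones used in this guide.

```python
import argparse


def build_parser():
    """Build a parser with the two PDF options used in this guide."""
    parser = argparse.ArgumentParser(
        prog="main.py",
        description="Create a PDF file with xhtml2pdf.",
    )
    # Exactly one of the two modes must be given.
    mode = parser.add_mutually_exclusive_group(required=True)
    mode.add_argument("--text", action="store_true",
                      help="Create a PDF file from a string")
    mode.add_argument("--template", action="store_true",
                      help="Create a PDF file from a template")
    return parser


def chosen_mode(argv):
    """Return "text" or "template" for the given argument list."""
    args = build_parser().parse_args(argv)
    return "text" if args.text else "template"
```

In the main section we could then call `chosen_mode(sys.argv[1:])` and dispatch to `from_text` or `from_template` depending on the result.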

We can test our function by executing the following command in our terminal.



python main.py --text

# Creating a PDF file from a string
# Successfully created PDF


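
Besides opening the file in a viewer, we can do a quick sanity check from Python. PDF files begin with the bytes `%PDF-`, so reading the first five bytes is enough for a rough test. This is a minimal sketch: the helper name `looks_like_pdf` is my own, and the check does not validate the rest of the file.

```python
def looks_like_pdf(path):
    """Return True if the file starts with the PDF header bytes."""
    with open(path, "rb") as handle:
        return handle.read(5) == b"%PDF-"
```

For example, `looks_like_pdf("test.pdf")` should return True after running the command above.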


Generate PDF from template

Here, we will generate a PDF file using an HTML template. We have to keep in mind that xhtml2pdf supports only a limited subset of HTML and CSS, so it is best to stick to simple, HTML 4 style markup. So, first, we have to create an HTML file that will act as a template for our PDF file.



touch template.html



Then we define a simple HTML template.



<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<html>
<head>
    <meta charset="utf-8">
    <title>PDF Generator</title>
</head>
<body>
    <h1 style="color:red;">First PDF</h1>
    <h2 style="color:blue;">PDF with html template</h2>
    <p>John</p>
    <p>Snow</p>
    <p>35</p>
</body>
</html>



Create a new constant to define our template file.



# main.py

# Constants section ....
# Template file name
TEMPLATE_FILE = "template.html"
# ....



Now, we can create our function to read the template and create the PDF file.



# main.py

# Methods section ....
def from_template(template, output):
    """
    Generate a pdf from a html file

    Parameters
    ----------
    template : str
        path of the html file to read
    output  : str
        name of the file to create
    """
    # Reading our template; the with block closes the file for us
    with open(template, "r", encoding="utf-8") as source_html:
        content = source_html.read() # the HTML to convert

    # Pass the error flag from html_to_pdf back to the caller
    return html_to_pdf(content, output)

# ....


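
The template in this guide is static, but a template usually has values to fill in. As a sketch of one way to do that with only the standard library, we can put `$name` style placeholders in the HTML and substitute them with `string.Template` before converting. The function name `render_template` and the placeholder names are hypothetical. Values are escaped with `html.escape` so that characters such as `&` or `<` do not break the markup.

```python
from html import escape
from string import Template

# Example template text; in this guide it would come from template.html.
TEMPLATE_TEXT = """<html>
<body>
    <h1>$title</h1>
    <p>$first_name</p>
    <p>$last_name</p>
</body>
</html>"""


def render_template(template_text, **values):
    """Fill the $placeholders of an HTML template with escaped values."""
    safe_values = {key: escape(str(value)) for key, value in values.items()}
    # substitute() raises KeyError if a placeholder has no value.
    return Template(template_text).substitute(safe_values)
```

The rendered string can then be passed to `html_to_pdf` in the same way as the content read by `from_template`.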

Add the option to our main function.



# main.py

# Main section ...
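if __name__ == "__main__":
    if len(sys.argv) > 1:
        if sys.argv[1] == '--help':
            print('Info: ')
            print('--help List the options to create a PDF file')
            print('--text Create a PDF file from a string')
            print('--template Create a PDF file from a template')
        elif sys.argv[1] == '--text':
            print("Creating a PDF file from a string")
            from_text(SOURCE, OUTPUT_FILENAME)
        elif sys.argv[1] == '--template':
            print("Creating a PDF file from a template")
            from_template(TEMPLATE_FILE, OUTPUT_FILENAME)
    else:
        print("Please give the type of PDF to create.")
        print("For help execute `python main.py --help`")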



We can test our function by executing the following command in our terminal.



python main.py --template

# Creating a PDF file from a template
# Successfully created PDF




Final Words

Thanks for reading this post. You can find the code for this guide here.

Top comments (1)

Krunal Rana • Edited

What about page breaks? If we have a table with a dynamic number of rows, how would you handle the page breaks in the PDF?