Jeevachaithanyan Sivanandan

How-to : HTML to PDF conversion using Python + QtWebEngine

Converting an HTML page into a PDF file is frequently a crucial use case for businesses, particularly in tasks such as invoice and report generation. However, relying on the print option from a web browser might not always be practical, especially when the PDF generation needs to occur as a background activity. Typically, this task is accomplished using tools like wkhtmltopdf or by running a headless Chrome browser in the background.
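
For comparison, both conventional tools mentioned above are usually driven from the command line. The sketch below only builds the argument lists and checks whether the executables are on the PATH; it does not run them. The executable names (`wkhtmltopdf`, `google-chrome`) are assumptions that vary by platform and installation, and the helper function names are hypothetical.

```python
import shutil


def wkhtmltopdf_command(url, pdf):
    # wkhtmltopdf takes the input URL followed by the output file.
    return ["wkhtmltopdf", url, pdf]


def headless_chrome_command(url, pdf, binary="google-chrome"):
    # "google-chrome" is an assumed executable name; it differs per platform.
    return [binary, "--headless", f"--print-to-pdf={pdf}", url]


def is_available(command):
    # True if the executable named in command[0] is found on PATH.
    return shutil.which(command[0]) is not None


if __name__ == "__main__":
    commands = (
        wkhtmltopdf_command("https://example.com", "out.pdf"),
        headless_chrome_command("https://example.com", "out.pdf"),
    )
    for cmd in commands:
        state = "available" if is_available(cmd) else "not found"
        print(" ".join(cmd), f"({state})")
```

Either tool adds an external dependency that has to be installed and kept up to date, which is part of the motivation for trying a pure Python + Qt approach here.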

Let's build a basic solution that converts a web page to a PDF. Note that this implementation covers only fundamental features and is not intended for advanced use cases.

We will use Qt WebEngine, a Chromium-based web rendering engine that can render an HTML page and generate PDF files from it. We will also use Qt for Python (PySide6), which provides the official Python bindings for Qt.

Before we get started, make sure you have Python installed on your system. Additionally, install the necessary Python packages using the following command:

pip install PySide6

Importing Necessary Modules

Let's begin by importing the required modules. In your Python script, include the following lines:

import sys
from PySide6 import QtCore, QtWidgets, QtWebEngineCore, QtWebEngineWidgets, QtGui

These modules provide the foundation for creating a headless browser and handling GUI components.
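
If the import line fails, it can help to find out which module is missing before going any further. Here is a minimal sketch using only the standard library; the function name `missing_modules` and the `REQUIRED` list are hypothetical names chosen for this example.

```python
import importlib.util


def missing_modules(names):
    """Return the module names from `names` that cannot be found."""
    missing = []
    for name in names:
        try:
            found = importlib.util.find_spec(name) is not None
        except ImportError:
            # Raised when a parent package (for example PySide6) is absent.
            found = False
        if not found:
            missing.append(name)
    return missing


# The modules imported by the script in this article.
REQUIRED = [
    "PySide6.QtCore",
    "PySide6.QtGui",
    "PySide6.QtWidgets",
    "PySide6.QtWebEngineCore",
    "PySide6.QtWebEngineWidgets",
]

if __name__ == "__main__":
    missing = missing_modules(REQUIRED)
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All required modules are available.")
```

The check only looks the modules up, so it does not start Qt or open any window.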

Defining the Conversion Function

Now, let's define the function. This function will take a URL and a PDF file name as parameters and automate the process of converting the web page to PDF.

def url_to_pdf(url, pdf):
    # Create a QApplication instance
    app = QtWidgets.QApplication(sys.argv)

    # Set desktop user agent string
    profile = QtWebEngineCore.QWebEngineProfile.defaultProfile()
    profile.setHttpUserAgent("Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/108.0.0.0 Safari/537.36")

    # Create the QWebEngineView with viewport size
    view = QtWebEngineWidgets.QWebEngineView()
    view.resize(1920, 1080)  # Adjust viewport size as needed
    page = QtWebEngineCore.QWebEnginePage(view)

Within the function, customize the headless browser by setting the desktop user agent string and adjusting the viewport size.

Define callback functions within the function to handle load and print events:

    # Callback function for handling print finished event
    def handle_print_finished(filename, status):
        print("finished", filename, status)
        app.quit()  # Quit the application after printing

    # Callback function for handling load finished event
    def handle_load_finished(status):
        if status:
            # Adjust print layout
            layout = QtGui.QPageLayout()  # Import from QtGui
            layout.setPageSize(QtGui.QPageSize.A4)  # Or desired page size
            layout.setOrientation(QtGui.QPageLayout.Landscape)  # If content is wider
            layout.setMargins(QtCore.QMarginsF(0, 0, 0, 0))  # Set zero margins
            page.printToPdf(pdf, layout)
        else:
            print("Failed to load page")
            app.quit()

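These callbacks are triggered when the web page finishes loading and when the PDF printing finishes. In the full script below, they are connected to the page's loadFinished and pdfPrintingFinished signals before the page is loaded and the event loop is started.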
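
The layout in handle_load_finished uses A4 in landscape orientation with zero margins. As a rough aid for choosing an orientation, the sketch below converts the A4 dimensions (210 mm by 297 mm) to PDF points (72 per inch) and to CSS pixels (96 per inch). The helper names are hypothetical and are not part of Qt.

```python
MM_PER_INCH = 25.4
A4_MM = (210.0, 297.0)  # A4 in portrait orientation: (width, height)


def mm_to_units(mm, units_per_inch):
    # Convert millimetres to a unit defined as a fraction of an inch.
    return mm / MM_PER_INCH * units_per_inch


def page_size(size_mm, units_per_inch, landscape=False):
    """Return the rounded (width, height) of a page in the given unit."""
    width, height = size_mm
    if landscape:
        width, height = height, width
    return (
        round(mm_to_units(width, units_per_inch)),
        round(mm_to_units(height, units_per_inch)),
    )


if __name__ == "__main__":
    print("A4 landscape in points:", page_size(A4_MM, 72, landscape=True))
    print("A4 landscape in CSS px:", page_size(A4_MM, 96, landscape=True))
```

At 96 pixels per inch, a landscape A4 page is about 1123 CSS pixels wide, which is narrower than the 1920-pixel viewport set earlier, so a page designed for a wide screen may still need a larger page size or a print stylesheet.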

Command-Line Usage

Enable command-line usage by checking the number of provided arguments and extracting the URL and PDF file name:

if __name__ == "__main__":
    # Check if the correct number of command-line arguments is provided
    if len(sys.argv) != 3:
        print("Usage: python converter.py <url> <name_of_pdf_file>")
        sys.exit(1)

    # Extract URL and PDF file name from command-line arguments
    url = sys.argv[1]
    pdf = sys.argv[2]

    # Call the function to convert the web page to PDF
    url_to_pdf(url, pdf)
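
The manual check on sys.argv works, but the standard argparse module can produce the usage message for us and validate the URL at the same time. This is a sketch of one possible variant, not part of the original script: the names `normalize_url`, `parse_args` and `ALLOWED_SCHEMES` are hypothetical, and defaulting to https when no scheme is given is an assumption.

```python
import argparse
from urllib.parse import urlsplit

# Schemes the converter is allowed to load (an assumption for this sketch).
ALLOWED_SCHEMES = {"http", "https", "file"}


def normalize_url(value):
    """Add a default scheme and reject URLs that should not be loaded."""
    url = value if "://" in value else "https://" + value
    parts = urlsplit(url)
    if parts.scheme not in ALLOWED_SCHEMES:
        raise argparse.ArgumentTypeError(f"unsupported URL scheme: {parts.scheme!r}")
    if parts.scheme != "file" and not parts.netloc:
        raise argparse.ArgumentTypeError(f"missing host in URL: {value!r}")
    return url


def parse_args(argv=None):
    # argv=None makes argparse read the real command-line arguments.
    parser = argparse.ArgumentParser(description="Convert a web page to a PDF file.")
    parser.add_argument("url", type=normalize_url, help="address of the page to convert")
    parser.add_argument("pdf", help="name of the PDF file to write")
    return parser.parse_args(argv)


if __name__ == "__main__":
    # Demo with explicit arguments instead of the real command line.
    args = parse_args(["example.com", "page.pdf"])
    print(args.url, args.pdf)
```

In the script, the result would be passed on as url_to_pdf(args.url, args.pdf).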

Here is the full script:

# Import necessary modules
import sys
from PySide6 import QtCore, QtWidgets, QtWebEngineCore, QtWebEngineWidgets, QtGui

# Function to convert a web page to PDF
def url_to_pdf(url, pdf):
    # Create a QApplication instance
    app = QtWidgets.QApplication(sys.argv)

    # Set desktop user agent string
    profile = QtWebEngineCore.QWebEngineProfile.defaultProfile()
    profile.setHttpUserAgent("Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/108.0.0.0 Safari/537.36")

    # Create the QWebEngineView with viewport size
    view = QtWebEngineWidgets.QWebEngineView()
    view.resize(1920, 1080)  # Adjust viewport size as needed
    page = QtWebEngineCore.QWebEnginePage(view)

    # Callback function for handling print finished event
    def handle_print_finished(filename, status):
        print("finished", filename, status)
        app.quit()  # Quit the application after printing

    # Callback function for handling load finished event
    def handle_load_finished(status):
        if status:
            # Adjust print layout
            layout = QtGui.QPageLayout()  # Import from QtGui
            layout.setPageSize(QtGui.QPageSize.A4)  # Or desired page size
            layout.setOrientation(QtGui.QPageLayout.Landscape)  # If content is wider
            layout.setMargins(QtCore.QMarginsF(0, 0, 0, 0))  # Set zero margins
            page.printToPdf(pdf, layout)
        else:
            print("Failed to load page")
            app.quit()

    # Connect signals and load the page
    page.pdfPrintingFinished.connect(handle_print_finished)
    page.loadFinished.connect(handle_load_finished)
    page.load(QtCore.QUrl(url))

    # Start the application event loop
    sys.exit(app.exec())

if __name__ == "__main__":
    # Check if the correct number of command-line arguments is provided
    if len(sys.argv) != 3:
        print("Usage: python converter.py <url> <name_of_pdf_file>")
        sys.exit(1)

    # Extract URL and PDF file name from command-line arguments
    url = sys.argv[1]
    pdf = sys.argv[2]

    # Call the function to convert the web page to PDF
    url_to_pdf(url, pdf)

Save the script as, say, converter.py, then run it as shown below:

python converter.py <url> <name_of_pdf_file>
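
Because url_to_pdf creates a QApplication and ends with sys.exit, each run of the script converts a single page. For background jobs, one option is to launch the script as a child process from another Python program and give it a time limit. The sketch below assumes the script was saved as converter.py in the current directory; `build_command` and `convert` are hypothetical helper names.

```python
import subprocess
import sys
from pathlib import Path


def build_command(url, pdf, script="converter.py"):
    # Use the interpreter running this code so the same environment is used.
    return [sys.executable, script, url, pdf]


def convert(url, pdf, timeout=60, script="converter.py"):
    """Run the converter in a child process; return True if a PDF exists afterwards."""
    try:
        result = subprocess.run(build_command(url, pdf, script), timeout=timeout)
    except subprocess.TimeoutExpired:
        # subprocess.run stops the child process when the timeout expires.
        return False
    # The script exits normally even when the page fails to load,
    # so also check that the output file is there.
    return result.returncode == 0 and Path(pdf).is_file()


if __name__ == "__main__":
    print(" ".join(build_command("https://example.com", "example.pdf")))
```

Note that a PDF left over from an earlier run would also pass the file check, so a real job might delete the old file first.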

The complete code can be found in this repository. I encourage you to test it out, share your feedback through comments, and consider enhancing it by adding additional features if you find opportunities for improvement.

Cheers.
