During my year 3 polytechnic internship, I was given an extremely mundane and repetitive task. I won't go into specifics, but I knew I could automate the process.
This was my solution: a Python app that reads from a .csv file and automatically fills a PDF form with data from the list.
Initially I wrote a simple local app to be run on the command line, but I later improved it into a full-fledged web app.
This post is part one of two: Command Line Interface app
The app essentially makes use of the pdfjinja library together with a function that reads from a .csv file.
Disclaimer: I am a security student with no professional programming or software engineering experience, so my code may not follow best practices... but it works.
All files needed can be found in my github repository. Higher resolution versions of all images used can be found in there too.
Prerequisites
Python
This post is targeted at novice Python programmers with some experience, so pip, virtual environments and dependencies are just a few of the Python concepts you will need to know.
The app is written with Python 3.9.6. You need to have Python installed together with all other libraries and dependencies used. You could create an environment, but I didn’t bother so my dependencies are global hahaha.
And of course, ensure you have pip installed as well.
pdfjinja
Install pdfjinja with pip. It enables variable interpolation within PDF files.
pip install pdfjinja
PDFtk
PDFtk is short for PDF Toolkit. The pdfjinja GitHub page states that PDFtk is needed, so download and install the PDFtk app.
Install pypdftk, a Python wrapper for PDFtk, with pip as well.
pip install pypdftk
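Before going further, it can save some head-scratching to confirm that the PDFtk command-line tool is actually reachable. Here is a minimal sketch that uses only the standard library. It assumes the executable is called `pdftk` and sits on your PATH, which may not match your install, and `find_tool` is just a hypothetical helper name, not part of pdfjinja or pypdftk.

```python
import shutil


def find_tool(name="pdftk"):
    """Return the full path of an executable found on PATH, or None if it is missing."""
    return shutil.which(name)


if __name__ == "__main__":
    location = find_tool()
    if location:
        print("found pdftk at " + location)
    else:
        print("pdftk not found on PATH; install PDFtk or fix your PATH first")
```

If this prints the "not found" message, the compile step later in this post will most likely fail as well.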
PDF Form
And of course, you will need a PDF form to fill. I have included examples in my GitHub repo. The PDF form doesn't have to be an actual form such as an application form or a particulars form. It can be any document with fixed fields to fill, as long as it is built as a PDF form.
To elaborate on what I mean, the app can be used to create name cards that need to be filled with names from a list. It just depends on how the PDF form is designed.
However, to create a PDF form, you need to use Adobe Acrobat Pro. Acrobat Reader cannot create PDF forms. Perhaps there are free alternatives out there, but in my case I used Acrobat Pro.
Jinja Templates
After creating the PDF form, you will need to set "variable names" for the fields you want to programmatically fill.
This is how I did it with Adobe Acrobat Pro.
Right click on the form field and open its properties. In the "Tooltip" field, insert your desired 'variable' name and enclose it with
{{ }}
So for instance, if I want to name the variable "name", I would insert
{{name}}
Dataset
Next, you need to create a .csv with the PDF form field names as the column names. I have included examples in my github repo.
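A mismatch between the tooltip names and the CSV column names is an easy mistake to make, so here is a rough sketch of a check you could run first. Everything in it is an assumption for illustration: the tooltip values are made up, `variable_names` and `missing_columns` are hypothetical helpers, and the regex only understands simple `{{ name }}` tooltips, not full Jinja expressions.

```python
import csv
import io
import re

# Hypothetical tooltip values, as they might be typed into the form fields
TOOLTIPS = ["{{name}}", "{{ email }}", "{{company}}"]


def variable_names(tooltips):
    """Pull the variable name out of each simple {{ name }} tooltip."""
    names = []
    for tip in tooltips:
        match = re.fullmatch(r"\{\{\s*(\w+)\s*\}\}", tip.strip())
        if match:
            names.append(match.group(1))
    return names


def missing_columns(csv_file, tooltips):
    """Return the template variables that have no matching CSV column."""
    header = csv.DictReader(csv_file).fieldnames or []
    return [name for name in variable_names(tooltips) if name not in header]


if __name__ == "__main__":
    # An in-memory stand-in for a real .csv file with two columns
    sample = io.StringIO("name,email\nAlice,alice@example.com\n")
    print(missing_columns(sample, TOOLTIPS))
```

With a real dataset you would pass an opened file instead of the `io.StringIO` stand-in. An empty list means every variable has a column to read from.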
The Code
Oh boy, here comes the spaghetti. This part is a bit more complicated. You need to modify the code to specify where the files will be read from and written to. To keep the input/output simple, run the .py from the directory where you want the I/O to happen, because all paths in the code are relative.
I developed this app in Windows 10 and ran it with PowerShell. You might run into issues if you're using other operating systems. Contact me if you do, I'll try to help.
# Auto PDF filler
import os
import csv
import sys
import pprint
from pdfjinja import PdfJinja
import shutil
import pathlib
import pypdftk
# global variables
datasetPath="ds1.csv"
templatePath="form1.pdf"
group="groupA"
#### <-------[ START OF FUNCTIONS]------->
# Create a list of dictionaries. One dict would be one dataset (one CSV row). The whole list carries all data
def lister(csvPath):
    try:
        print("\nreading CSV from\n"+csvPath, file=sys.stderr)
        # the with block closes the file once all rows have been read
        with open(csvPath, 'r', newline='') as csvFile:
            reader = csv.DictReader(csvFile)
            theList = []
            for line in reader:
                theList.append(line)
        pprint.pprint(theList)
        return theList
    except Exception as e:
        print(e, file=sys.stderr)
# takes in the list and writes one filled PDF per entry
def PDFer(allData, pdfPath, group):
    try:
        print("\nreading PDF from\n"+pdfPath, file=sys.stderr)
        thePDF = PdfJinja(pdfPath)
        print("\ncreating filled PDFs ...")
        # will always overwrite if existing group name folder exists
        shutil.rmtree("./filled/"+group, ignore_errors=True)
        pathlib.Path('./filled/'+group).mkdir(parents=True)
        count = 0
        for x in allData:
            count = count + 1
            pdfout = thePDF(x)
            # the with block closes each output file once it has been written
            with open("./filled/"+group+"/filled-"+group+"-"+str(count)+".pdf", "wb") as outPDF:
                pdfout.write(outPDF)
        print(str(count)+" files created for "+group)
    except Exception as e:
        print(e, file=sys.stderr)
# Reads PDFs from specified directory and compiles them into single PDF with many pages
def masher(pdfdir, group):
    try:
        # listing happens inside the try so a missing folder is reported instead of crashing
        allPDF = []
        for file in os.listdir(pdfdir):
            allPDF.append(os.path.normpath(os.path.join(pdfdir, file)))
        # creates the "compiled" folder. If it already exists, do nothing
        pathlib.Path('./compiled').mkdir(parents=True, exist_ok=True)
        outFilePath = "./compiled/all-forms-"+group+".pdf"
        pypdftk.concat(allPDF, outFilePath)
        print("\n\nCompleted compiling PDFs!")
        print(">>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>")
        outFile = "all-forms-"+group+".pdf"
        return outFile
    except Exception as e:
        print(e)
        print("\nerror compiling PDFs\n")
#### <-------[END OF FUNCTIONS]------->
# Calling the functions
PDFer(lister(datasetPath), templatePath, group)
outFileName = masher("./filled/"+group, group)
# masher returns None when compiling fails, so only print the name on success
if outFileName:
    print("output file name: "+outFileName+"\n\n")
First, you need to have the pdf form and the csv dataset in the same directory as the .py app. It should look something like this.
Next, modify the global variables to fit your pdf form and csv dataset file names. You can also change the group name if you're using this app for several groups. For instance filling forms for Client A and Client B.
You could take the fillerz.py file from my GitHub repo instead.
I know, this is hardcoded and could be better. The web application improves on this!
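If you'd rather not edit the file every time, one option is to read the three settings from the command line with the standard library's argparse. This is only a sketch and not what fillerz.py in my repo does: the flag names `--dataset`, `--template` and `--group` are made up for this example, and the defaults are simply the hardcoded values from above.

```python
import argparse


def parse_args(argv=None):
    """Read the three settings from the command line, falling back to the hardcoded defaults."""
    parser = argparse.ArgumentParser(description="Fill a PDF form from a CSV file")
    parser.add_argument("--dataset", default="ds1.csv", help="path to the .csv dataset")
    parser.add_argument("--template", default="form1.pdf", help="path to the PDF form")
    parser.add_argument("--group", default="groupA", help="name used for the output folder and files")
    # argv=None makes argparse read the real command line (sys.argv)
    return parser.parse_args(argv)
```

In the script you would call `args = parse_args()` and then use `args.dataset`, `args.template` and `args.group` in place of the three global variables. For a quick try without a terminal, `parse_args(["--group", "clientB"])` returns the defaults with only the group changed.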
Run!
With everything set up, you can finally run the app! I used PowerShell on Windows 10. I believe the code can run on Linux too, though you might need to adjust how it handles files and directories. Contact me if you need help with this.
python fillerz.py
You should see the output file appear in a folder called "compiled". I know the app and its process are quite clunky. This was my first attempt to get the job done. See part 2 for a beautified web app solution!
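One thing to watch in the compiled file is page order. masher collects the filled PDFs with os.listdir, which returns names in arbitrary order, and even a plain alphabetical sort puts filled-groupA-10.pdf before filled-groupA-2.pdf. Here is a small sketch that sorts by the counter at the end of the file name instead, so the pages follow the rows of the .csv. It assumes the `filled-<group>-<count>.pdf` naming used by PDFer, and `trailing_number` and `sort_by_count` are hypothetical helper names.

```python
import re


def trailing_number(filename):
    """Return the counter just before '.pdf', e.g. 12 for 'filled-groupA-12.pdf'."""
    match = re.search(r"-(\d+)\.pdf$", filename)
    # Files that do not follow the naming pattern sort first
    return int(match.group(1)) if match else -1


def sort_by_count(filenames):
    """Order the filled PDFs by their counter instead of alphabetically."""
    return sorted(filenames, key=trailing_number)


if __name__ == "__main__":
    names = ["filled-groupA-10.pdf", "filled-groupA-2.pdf", "filled-groupA-1.pdf"]
    print(sort_by_count(names))
```

To use it, you could loop over `sort_by_count(os.listdir(pdfdir))` in masher instead of looping over `os.listdir(pdfdir)` directly.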
That's it!
Thank you for reading. As mentioned in my disclaimer, I'm still learning, I am definitely no expert but this solved my issue and I hope it helps someone out there :)