
Generate a PDF from HTML with puppeteer

Damien Cosset. Originally published at damiencosset.com

Introduction

This is one of those frustration posts where I just spent hours working on something and finally managed to get a working solution. I learned quite a bit, but I feel like it should not have taken me that much time...

Anyway, the goal was to generate a PDF from HTML, then send it back to the browser so the user could download it. I tried a lot of different things, and it's more than likely my solution is not the most elegant, or fast, but fuck it, it works.

I consider this post to be a place where I can store this solution, just in case I forget it in the future. I'll know where to look. Let's jump into the actual solution.

The solution!

Front-end

Let's start with the front-end.

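import { saveAs } from 'file-saver'

const downloadPDF = () => {
    // fetch has no `data` option: the payload goes in `body` as a JSON string
    fetch('/api/invoices/create-pdf', {
        method: 'POST',
        headers: { 'Content-Type': 'application/json' },
        body: JSON.stringify({
            invoiceDetails,
            invoiceSettings,
            itemsDetails,
            organisationInfos,
            otherDetails,
            clientDetails
        })
    })
        // Read the response body as raw binary data
        .then(res => res.arrayBuffer())
        .then(res => {
            // Wrap the bytes in a Blob typed as a PDF, then trigger the download
            const blob = new Blob([res], { type: 'application/pdf' })
            saveAs(blob, 'invoice.pdf')
        })
        // Also catches network errors, not only conversion errors
        .catch(e => alert(e))
}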

This is the function that does everything. We are generating an invoice in my case.

1) A fetch with the POST method. The invoice data is sent to the server, which generates the PDF and sends it back. (server code will follow)

2) The response we get needs to be converted into an ArrayBuffer.

3) We create a Blob (Binary Large Object) with the new Blob() constructor. The Blob takes an array of parts as its first argument, which is why our response turned ArrayBuffer is surrounded by square brackets ([res]). Each part can be an ArrayBuffer, a typed array, another Blob or a string. Also, notice the type application/pdf, which tells the browser that the bytes are a PDF.

4) Finally, I'm using the saveAs function from the file-saver package to create the file on the front end!
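
The ArrayBuffer-to-Blob step can be tried in isolation. This is only a minimal sketch, assuming an environment with a global Blob (any modern browser, or Node 18+). toPdfBlob is a hypothetical helper name, and the five bytes stand in for a real server response.

```javascript
// Hypothetical helper: wraps raw PDF bytes in a Blob the browser can save.
function toPdfBlob(arrayBuffer) {
    // The first argument is an array of parts; here there is a single part.
    return new Blob([arrayBuffer], { type: 'application/pdf' })
}

// Stand-in for the server response: the five bytes of "%PDF-".
const bytes = new Uint8Array([0x25, 0x50, 0x44, 0x46, 0x2d])
const blob = toPdfBlob(bytes.buffer)

console.log(blob.size) // number of bytes held by the blob
console.log(blob.type) // MIME type given to the constructor

// Several parts are concatenated in order, which is why an array is expected.
const doubled = new Blob([bytes.buffer, bytes.buffer], { type: 'application/pdf' })
console.log(doubled.size)
```

In the download function, the value passed to saveAs is this kind of Blob.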

Back-end

Here is the back-end side of things. There is a whole express application behind it, but I'll just show you the controller that handles this PDF problem.

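const fs = require('fs')
const path = require('path')
const puppeteer = require('puppeteer')

module.exports = {
    createPDF: async function(req, res, next) {
        // Read the HTML template from disk
        const content = fs.readFileSync(
            path.resolve(__dirname, '../invoices/templates/basic-template.html'),
            'utf-8'
        )
        const browser = await puppeteer.launch({ headless: true })
        const page = await browser.newPage()
        await page.setContent(content)
        // Render the page to a PDF; page.pdf() resolves with the binary data
        const buffer = await page.pdf({
            format: 'A4',
            printBackground: true,
            margin: {
                left: '0px',
                top: '0px',
                right: '0px',
                bottom: '0px'
            }
        })
        await browser.close()
        // Send the raw PDF bytes back to the front-end
        res.setHeader('Content-Type', 'application/pdf')
        res.end(buffer)
    }
}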

1) I am using puppeteer to create a PDF from the HTML content. The HTML content is read from an HTML file that I simply load with readFileSync.

2) We store the buffer data returned by page.pdf() and return it to the front-end. This is the response that is later converted to an ArrayBuffer.
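
One possible hardening of the rendering step, as a sketch rather than the code I actually run: close the browser even when rendering fails, and wait for external resources before printing. htmlToPdf and launchBrowser are hypothetical names. launchBrowser is assumed to be any function that resolves to a Puppeteer browser, which also lets the helper be exercised with a fake browser.

```javascript
// Hypothetical helper, not part of the original controller.
// `launchBrowser` is assumed to resolve to a Puppeteer browser,
// e.g. () => puppeteer.launch({ headless: true }).
async function htmlToPdf(launchBrowser, html) {
    const browser = await launchBrowser()
    try {
        const page = await browser.newPage()
        // Wait until the network is idle so external CSS and fonts are loaded.
        await page.setContent(html, { waitUntil: 'networkidle0' })
        return await page.pdf({ format: 'A4', printBackground: true })
    } finally {
        // Runs on success and on failure, so no Chromium process is left behind.
        await browser.close()
    }
}
```

Inside the controller this would be called as htmlToPdf(() => puppeteer.launch({ headless: true }), content), with the result passed to res.end().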

Done

Well, looking at the code, it really looks easier now than it actually did when I tried to solve this problem. It took me close to 10 hours to find a proper answer. 10 FREAKING HOURS!!!!

Note to self: if you get frustrated, walk away from the computer, get some fresh air, and come back later...

Happy Coding <3

Discussion

Andrei Gatej

I've gone through something similar a few days ago... and it took me a while to figure it out.

Here's my approach, without using puppeteer.
I'm using Vue in this case, but I'm pretty sure the concept is applicable in other cases as well.

// Client
<template>
    <!-- Use CSS to give it full width & height -->
    <div class="c-file">
        <iframe
            class="c-file__display"
            v-if="src"
            :src="src"
        />
    </div>
</template>

<script>
/* ... */
async created () {
    const config = {
        headers: new Headers({ 'Content-Type': 'application/json' }),
        method: 'POST',
        body: JSON.stringify(body) // Dynamic data
    }
    fetch(url, config)
        .then(res => res.blob())
        .then(res => {
            // Re-wrap the response so the blob carries the PDF MIME type
            const blob = new Blob([res], { type: 'application/pdf' })
            // createObjectURL takes a single argument
            this.src = URL.createObjectURL(blob)
        })
}
</script>

// Server
/* ... */
const pdf = require('html-pdf')
// Generate HTML string
const content = getPDFContent(req.body.data)

res.setHeader('Content-Type', 'application/pdf')
pdf.create(content).toStream((err, stream) => {
    // Without this check, a failed render would crash on stream.pipe
    if (err) {
        res.statusCode = 500
        return res.end()
    }
    stream.pipe(res)
})

Vladimir López Salvador

html-pdf uses phantomjs under the hood. Where is your app hosted? It didn't work for me on Docker based cloud providers like Now

Andrei Gatej

The app isn’t hosted (yet), it is all on localhost. I haven’t tried docker nor used cloud providers too often, but I’m really curious about why this approach wouldn’t work.

Please let me know if you find a solution!

Ian L.

We've had success hosting a similar Puppeteer-based converter using Google Cloud Functions (I don't work for Google): github.com/Courtsite/shuttlepdf. There is a bit of latency, but it is a reasonable trade-off for ease of deployment, scalability, and reliability.

Vladimir López Salvador

Well, in my case, phantomjs wasn't found by the library on my docker based hosting.

There is a dockerized phantom available, but you need full access to the deployment Dockerfile in order to tell it to install it as part of the process.

Overall I would recommend going away from html-pdf because it's not maintained anymore

Ian L.

Generally speaking, you should probably avoid Phantomjs. With headless Chromium, there really isn't any need for it. Indeed, I think it is no longer maintained.

Vladimir López Salvador

Lolly Post.

I understand your frustration.
In my case, I spent "One freaking Week" coding a serverless microservice that takes {html,options} and returns back the buffer, a generic solution.

The challenge was that I needed to embed puppeteer in 50mb which is the maximum size a serverless function can take on Now.

Second challenge was debugging why the f**** HTML didn't render at all.

After hours of trial and error I found that, for some reason, if the html string contains the # character, it becomes somehow "invalid" and puppeteer fails silently 0_0

Third, in my particular case, the HTML uses bootstrap, and just using page.setContent didn't wait for it to load correctly. I needed a workaround: page.goto(`data:text/html,${html}`), which does wait for external resources to load.

Fourth (and this one is the reason why I almost threw the computer through the window), the HTML markup in my case was rendered at runtime using ReactDOM and the go-to-hell react-inline-css library, which ALWAYS WRAPS the generated CSS with this selector: #ReactInlineCss ....

See the # character there? Goto point #2 above (x_x)

Well, thank god the pain is gone. If you wanna learn how to make it serverless like I did, hop over to Now's blog (zeit.co/blog)
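
A possible explanation for the # problem, offered as an assumption rather than a confirmed diagnosis: if the HTML is loaded through a URL (a data: URL in this sketch), everything after the first # is parsed as the URL fragment, so the document is cut short. The sample markup below is made up for illustration, and the standard URL class is used only to show how the string is split.

```javascript
// Made-up markup with a '#' in it, similar to an id selector in inline CSS.
const html = '<style>#invoice { color: red }</style><p id="invoice">Total</p>'

// Unencoded: the URL parser treats everything after the first '#' as a fragment.
const raw = new URL('data:text/html,' + html)
console.log(raw.pathname) // the document body stops right before the '#'
console.log(raw.hash)     // the rest of the markup ended up here

// Encoded: '#' becomes %23, so the whole document stays in the URL body.
const safe = new URL('data:text/html,' + encodeURIComponent(html))
console.log(safe.hash === '')
console.log(decodeURIComponent(safe.pathname.slice('text/html,'.length)) === html)
```

Under that assumption, encoding the HTML with encodeURIComponent before building the URL, or passing the string to page.setContent instead of navigating to it, would avoid the truncation.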

calag4n

Sorry, that made me laugh 😁

Antonio Vassell

Hey Vladimir, how were you able to fix the second point?

Tony Mezzolesta

Sweet lord thank you for this.
I was using the html-pdf package and was having no issues until I had to either deploy to Azure App Service or run it in a docker container.
Didn't realize that phantomJS is pretty much no longer supported so this was a life saver.

Also, if anyone was curious, I was having issues running puppeteer in a docker container. Found this gem in order to make chromium work in the container:

const browser = await puppeteer.launch({
    headless: true,
    args: ['--no-sandbox', '--disable-setuid-sandbox'],
    ignoreHTTPSErrors: true,
    dumpio: false
});

Andrea Gherardi

Interesting post, thanks for sharing!

I have a question: where is the code in which you inject the data into the template?
I mean, I can see where you load the HTML template and render it into a PDF, and I can see the front-end sending some data to customise the PDF like invoiceDetails, clientDetails etc. but how do you actually put that data inside the template?

Damien Cosset Author

I have a "react" template on the front end, the one you see.
There is an HTML template on the back end, with placeholders for the future data. After that, populating the HTML template on the back end is just a matter of String.replace() calls.
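
A rough sketch of what such a replace-based fill could look like. The {{key}} placeholder syntax, the fillTemplate and escapeHtml names and the sample data are all assumptions made for illustration, not the actual template.

```javascript
// Hypothetical placeholder syntax: {{key}}. The real template may use another.
function fillTemplate(template, data) {
    return template.replace(/\{\{\s*(\w+)\s*\}\}/g, (match, key) =>
        // Leave unknown placeholders untouched so missing data is easy to spot.
        Object.prototype.hasOwnProperty.call(data, key) ? escapeHtml(data[key]) : match
    )
}

// Escape user-provided values so they cannot break the markup.
function escapeHtml(value) {
    return String(value)
        .replace(/&/g, '&amp;')
        .replace(/</g, '&lt;')
        .replace(/>/g, '&gt;')
        .replace(/"/g, '&quot;')
}

const template = '<h1>Invoice {{number}}</h1><p>Billed to {{client}}</p>'
const filled = fillTemplate(template, { number: 42, client: 'Smith & Co' })
console.log(filled) // <h1>Invoice 42</h1><p>Billed to Smith &amp; Co</p>
```

The filled string is what would then be handed to page.setContent() in the controller.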

Paweł Kowalski

And it even works on aws lambda ;-)

bhabad1

please provide the complete code

nouman shah

Bootstrap is not working in my pdf.html file.
if there is any solution please let me know