
Pacharapol Withayasakpunt

Originally published at polv.cc

A reliable way to create PDFs from HTML/Markdown, with PDF-specific features

The approach rests on a few observations:

  • Don't simply convert an HTML file to PDF one-to-one; otherwise, you can never control page breaks.
  • HTML rendering is still browser-dependent. (So I am not sure about Pandoc.)
  • CSS is powerful, but are there exceptions?
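To make the page-break point concrete, here is a minimal sketch that joins several HTML fragments into one printable document and forces a break after each fragment except the last. The helper name `buildPrintDocument`, the class names, and the A4 size with a 1cm margin are my own assumptions for illustration, not something from an existing tool.

```typescript
// Hypothetical helper: wraps each HTML fragment in its own <section> and
// forces a page break after every section except the last one.
function buildPrintDocument(sections: string[], title = "Document"): string {
  const body = sections
    .map((html, i) => {
      const isLast = i === sections.length - 1;
      const cls = isLast ? "sheet" : "sheet break-after";
      return `<section class="${cls}">\n${html}\n</section>`;
    })
    .join("\n");

  return [
    "<!DOCTYPE html>",
    '<html lang="en">',
    "<head>",
    '<meta charset="utf-8">',
    `<title>${escapeHtml(title)}</title>`,
    "<style>",
    // Assumed page size and margin; adjust to taste.
    "  @page { size: A4; margin: 1cm; }",
    // break-after is the current property; page-break-after is the legacy alias.
    "  .break-after { break-after: page; page-break-after: always; }",
    "</style>",
    "</head>",
    "<body>",
    body,
    "</body>",
    "</html>",
  ].join("\n");
}

// Escapes the characters that would otherwise be parsed as markup in <title>.
function escapeHtml(text: string): string {
  return text.replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;");
}
```

How exactly the breaks are honored still depends on the browser that prints the page.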

Therefore, I suggest using a web driver plus a PDF library that can both read and modify PDFs.

Currently, the best web driver is either Puppeteer or the Chrome DevTools Protocol.
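As a rough sketch of the web-driver half, the code below prints each URL to its own PDF. It is written against two small interfaces, `PageLike` and `BrowserLike`, which I made up to mirror only the Puppeteer calls used here (`newPage`, `goto`, `pdf`, `close`); I am assuming that a real Puppeteer browser can be passed in their place. The A4 format and the `networkidle0` wait are also assumptions, not requirements.

```typescript
// Hypothetical structural types covering only the Puppeteer calls used below.
// Assumption: a Browser returned by puppeteer.launch() is assignable to BrowserLike.
interface PageLike {
  goto(url: string, options?: { waitUntil?: "load" | "networkidle0" }): Promise<unknown>;
  pdf(options?: {
    format?: "A4" | "Letter";
    printBackground?: boolean;
    preferCSSPageSize?: boolean;
  }): Promise<Uint8Array>;
  close(): Promise<void>;
}

interface BrowserLike {
  newPage(): Promise<PageLike>;
}

// Opens the URL in a fresh page, prints it to PDF, and always closes the page.
async function renderUrlToPdf(browser: BrowserLike, url: string): Promise<Uint8Array> {
  const page = await browser.newPage();
  try {
    // Wait for the network to go quiet so images and fonts are loaded before printing.
    await page.goto(url, { waitUntil: "networkidle0" });
    return await page.pdf({
      format: "A4",
      printBackground: true,
      // Let an @page size in the stylesheet, if any, override the format above.
      preferCSSPageSize: true,
    });
  } finally {
    await page.close();
  }
}

// Renders the URLs one after another and returns one PDF per URL, in order.
async function renderAll(browser: BrowserLike, urls: string[]): Promise<Uint8Array[]> {
  const pdfs: Uint8Array[] = [];
  for (const url of urls) {
    pdfs.push(await renderUrlToPdf(browser, url));
  }
  return pdfs;
}
```

With Puppeteer installed, the intended use would be to call `renderAll(await puppeteer.launch(), urls)` and close the browser afterwards. Keeping one PDF per page leaves the joining step to the PDF library.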

Additionally, it might be possible to distribute the PDF generator via Electron + Puppeteer-in-Electron.


I got this code from another Stack Overflow question:

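import electron from "electron";
import puppeteer from "puppeteer-core";

// Resolves after `ms` milliseconds.
const delay = (ms: number) =>
  new Promise<void>(resolve => {
    setTimeout(resolve, ms);
  });

(async () => {
  try {
    // Launch the Electron binary as the browser that Puppeteer drives.
    const app = await puppeteer.launch({
      // When imported from plain Node, "electron" resolves to the path of the binary.
      executablePath: electron as unknown as string,
      args: ["."],
      headless: false,
      // ... (the excerpt is cut off here)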

The PDF manager that can read and merge PDFs is traditionally either PDFtk (binary) or PDFBox (Java), I think; but I have recently found this:

Hopding / pdf-lib

Create and modify PDF documents in any JavaScript environment
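A sketch of the read-and-merge step might look like the following. It relies on the pdf-lib calls `PDFDocument.create`, `PDFDocument.load`, `getPageIndices`, `copyPages`, `addPage`, and `save`. The interfaces `PdfDocLike` and `PdfLibLike` are hypothetical and only describe those calls, so that the function does not import the library itself; I am assuming that pdf-lib's `PDFDocument` class can be passed as the `lib` argument.

```typescript
// Hypothetical structural types covering only the pdf-lib calls used below.
// Assumption: pdf-lib's PDFDocument class is assignable to PdfLibLike.
interface PdfDocLike {
  getPageIndices(): number[];
  copyPages(src: PdfDocLike, indices: number[]): Promise<unknown[]>;
  addPage(page: unknown): unknown;
  save(): Promise<Uint8Array>;
}

interface PdfLibLike {
  create(): Promise<PdfDocLike>;
  load(bytes: Uint8Array): Promise<PdfDocLike>;
}

// Appends every page of every input PDF, in order, to a new document
// and returns the serialized result.
async function mergePdfs(lib: PdfLibLike, inputs: Uint8Array[]): Promise<Uint8Array> {
  const merged = await lib.create();
  for (const bytes of inputs) {
    const src = await lib.load(bytes);
    // Pages must be copied into the target document before they can be added.
    const pages = await merged.copyPages(src, src.getPageIndices());
    for (const page of pages) {
      merged.addPage(page);
    }
  }
  return merged.save();
}
```

With pdf-lib installed, the intended call would be `mergePdfs(PDFDocument, [coverBytes, bodyBytes])`, where the two byte arrays are placeholder names for PDFs produced earlier.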

As for CSS: yes, CSS can also detect page margins. For example, the following sizes the body to the viewport and centers its content:

  body {
    position: fixed;
    width: 100vw;
    height: 100vh;
    display: flex;
    align-items: center;
    justify-content: center;
  }

This is my attempt so far.

patarapolw / make-pdf

Beautifully make a pdf from couples of image files

So, the answer to the question is no: do not convert a single HTML or Markdown file into one PDF file; instead, combine the files within a folder. Also,

  • Running a web server might be better than using the file:// protocol and relative paths.
  • The choice of web browser might affect the result.
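For the web-server point, a small static server built on Node's own `http` module may be enough. This is only a sketch: the function names, the MIME table, and the folder layout are assumptions of mine, and it does nothing beyond serving files from one folder.

```typescript
import * as http from "node:http";
import * as fs from "node:fs";
import * as path from "node:path";

// Maps a request path to a file inside `root`.
// Returns null if the path is malformed or would escape the root folder.
function resolveInsideRoot(root: string, urlPath: string): string | null {
  const rootAbs = path.resolve(root);
  const q = urlPath.indexOf("?");
  const rawPath = q === -1 ? urlPath : urlPath.slice(0, q);
  let decoded: string;
  try {
    decoded = decodeURIComponent(rawPath);
  } catch {
    return null; // malformed percent-encoding
  }
  const target = path.resolve(rootAbs, "." + decoded);
  const inside = target === rootAbs || target.startsWith(rootAbs + path.sep);
  return inside ? target : null;
}

// A deliberately short MIME table; extend it as needed.
const MIME_TYPES: Record<string, string> = {
  ".html": "text/html; charset=utf-8",
  ".css": "text/css",
  ".js": "text/javascript",
  ".png": "image/png",
  ".jpg": "image/jpeg",
  ".svg": "image/svg+xml",
};

// Creates (but does not start) a server that serves files from `root`.
function createStaticServer(root: string): http.Server {
  return http.createServer((req, res) => {
    const file = resolveInsideRoot(root, req.url ?? "/");
    if (file === null || !fs.existsSync(file) || !fs.statSync(file).isFile()) {
      res.writeHead(404).end("Not found");
      return;
    }
    const type = MIME_TYPES[path.extname(file).toLowerCase()] ?? "application/octet-stream";
    res.writeHead(200, { "Content-Type": type });
    fs.createReadStream(file).pipe(res);
  });
}
```

Something like `createStaticServer("./pages").listen(8080)` would then let the browser load `http://localhost:8080/cover.html`, so that links to stylesheets and images resolve the same way they would on a real site. The folder and file names here are placeholders.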

Also, consider alternatives to PDF that allow easy editing, perhaps ODT or DOCX?

Top comments (2)

Oleksandr Popov

Our team has been working on a document generation project, and we convert HTML to PDF using wkhtmltopdf. For example, we generate documents like this using only HTML and CSS. wkhtmltopdf has great CSS support. As for page breaks, we can control them using the page-break-before and page-break-after properties.

As for alternatives, we recently started to use docx templates, process them with docxtemplater, and convert them to PDF with headless LibreOffice.

Pacharapol Withayasakpunt • Edited

Apparently, pandoc alone can be powerful enough.

A new page is as easy as \newpage. (I know, it is LaTeX syntax in Markdown.)

Also, margins can be set with geometry: margin=1cm in the YAML front matter.
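To show how these two pieces fit together, here is a small sketch that assembles such a Markdown file from separate sections. The helper name `buildPandocMarkdown` is made up for this example.

```typescript
// Hypothetical helper: joins Markdown sections into one Pandoc input file,
// with the margin set in the YAML front matter and \newpage between sections.
function buildPandocMarkdown(sections: string[], margin = "1cm"): string {
  const frontMatter = ["---", `geometry: margin=${margin}`, "---"].join("\n");
  const body = sections.map((s) => s.trim()).join("\n\n\\newpage\n\n");
  return `${frontMatter}\n\n${body}\n`;
}
```

Saving the result as, say, book.md and running `pandoc book.md -o book.pdf` should then produce the PDF, provided a LaTeX engine is installed.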

Also, LaTeX can be used to host and join PDFs.

But is there a single best tool that can easily do all of these?

BTW, I found Puppeteer unreliable.