DEV Community

Krishna Pankhania
Krishna Pankhania

Posted on

Convert first page of PDF to Image| AWS lambda,S3

Recently I was developing a functionality where there was a requirement to get the first page of PDF (stored on s3) and convert it to an image. I have dug up the internet for this one but couldn't find anything to the point which will guide me on how to do this for AWS lambda. So here I am sharing my workaround.

Things you need to do before moving onto the code section

Here are the steps to be followed (I will write steps for code only)

1 => Getting a file from S3 and saving it temporarily.

function getFile(bucket, objectname) {
  return new Promise((res, rej) => {
    var params = { Bucket: bucket, Key: objectname };
    s3.getObject(params, function (err, data) {
      if (err) {
        console.log(err);
        res(null);
      }
      const name = `/tmp/${objectname}`;
      fs.writeFile(name, data.Body, function (err) {
        if (err) res(null);
        res(name);
      });
    });
  });
}
Enter fullscreen mode Exit fullscreen mode
const filepath = await getFile(bucket, key);
Enter fullscreen mode Exit fullscreen mode

2 => Create a helper file for conversion code, name it pdf2Img.js. This code will convert the tmp pdf file to a jpeg image. The code is inspired from pdf2png which is generating png image.

const exec = require("child_process").exec;
const fs = require("fs");
const tmp = require("tmp");

// ghostscript executables path
let projectPath = __dirname.split("\\");
projectPath.pop();
projectPath = projectPath.join("\\");

exports.ghostscriptPath = projectPath + "\\executables\\ghostScript";

exports.convert = (pdfPath, options) => {
  return new Promise((resolve, reject) => {
    if (!options.useLocalGS) {
      process.env.Path += ";" + exports.ghostscriptPath;
    }
    options.quality = options.quality || 100;
    // get temporary filepath
    tmp.file({ postfix: ".jpeg" }, function (err, imageFilepath, fd) {
      if (err) {
        resolve({
          success: false,
          error: "Error getting second temporary filepath: " + err,
        });
        return;
      }

      exec(
        "gs -dQUIET -dPARANOIDSAFER -dBATCH -dNOPAUSE -dNOPROMPT -sDEVICE=jpeg -dTextAlphaBits=4 -dGraphicsAlphaBits=4 -r" +
          options.quality +
          " -dFirstPage=1 -dLastPage=1 -sOutputFile=" +
          imageFilepath +
          " " +
          pdfPath,
        (error, stdout, stderr) => {
          if (error !== null) {
            resolve({
              success: false,
              error: "Error converting pdf to png: " + error,
            });
            return;
          }
          const img = fs.readFileSync(imageFilepath);
          resolve({ success: true, data: img });
        }
      );
    });
  });
};
Enter fullscreen mode Exit fullscreen mode

To generate a jpeg, use the below command in exec

"gs -dQUIET -dPARANOIDSAFER -dBATCH -dNOPAUSE -dNOPROMPT -sDEVICE=jpeg -dTextAlphaBits=4 -dGraphicsAlphaBits=4 -r" +
          options.quality +
          " -dFirstPage=1 -dLastPage=1 -sOutputFile=" +
          imageFilepath +
          " " +
          pdfPath
Enter fullscreen mode Exit fullscreen mode

To generate png use the below command in exec

"gs -dQUIET -dPARANOIDSAFER -dBATCH -dNOPAUSE -dNOPROMPT -sDEVICE=png16m -dTextAlphaBits=4 -dGraphicsAlphaBits=4 -r" +
          options.quality +
          " -dFirstPage=1 -dLastPage=1 -sOutputFile=" +
          imageFilepath +
          " " +
          pdfPath
Enter fullscreen mode Exit fullscreen mode

More details about Ghostscript options you can find it here https://www.ghostscript.com/doc/current/Use.htm

3 => Use helper function code in index file. Also set ghostscriptPath path to "/opt/bin/gs"

const pdf2Img = require("./pdf2Img");
pdf2Img.ghostscriptPath = "/opt/bin/gs";
Enter fullscreen mode Exit fullscreen mode

Create a function that will execute the conversion code;

async function pdfToImage(pdfPath) {
  try {
    const response = await pdf2Img.convert(pdfPath, {});
    if (!response.success) {
      console.log("Error in pdfToImage", response.error);
      return response;
    }
    return {
      contentType: "image/jpeg",
      data: response.data,
    };
  } catch (e) {
    console.log("Error in pdfToImage", e.message);
  }
}
Enter fullscreen mode Exit fullscreen mode
const pdfToImageRes = await pdfToImage(filepath);
Enter fullscreen mode Exit fullscreen mode

4 => Upload the converted image to the bucket.

function uploadFile(bucket, objectname, contentType, data) {
  return new Promise((res, rej) => {
    var params = {
      Bucket: bucket,
      Key: `${somePath}/${objectname}`,
      Body: data,
      ContentType: contentType,
    };
    s3.putObject(params, function (err, data) {
      if (err) {
        console.log(err);
        res(null);
      }
      res(true);
    });
  });
}
Enter fullscreen mode Exit fullscreen mode
const responseUpload = await uploadFile(
  bucket,
  imageName,
  pdfToImageRes.contentType,
  pdfToImageRes.data
);
Enter fullscreen mode Exit fullscreen mode

That's it!

Top comments (0)