Generating PDFs with Node, PDFKit, and Serverless on AWS Lambda

There are already a few blog posts covering this subject, but most of them include more packages or steps than are actually necessary. In this post, I’ll cover only the minimum needed to create a Serverless function on AWS Lambda that generates PDFs using Node and PDFKit. No need for Express, no HTML parsing, and no uploading to S3.

Setting up AWS Lambda with Serverless

Getting started with serverless functions with Node on AWS Lambda is pretty straightforward. We will need a configuration file called serverless.yml (see the Serverless Framework documentation for more details on the file options). This file should look something like this:

service: generatePdf

provider:
  name: aws
  region: us-west-1
  runtime: nodejs10.x

functions: 
  generatePdf:
    handler: src/index.generatePdf
    events:
      - http:
          path: /pdf
          method: get

This configuration assumes we have a function called generatePdf which is exported from the file called index.js located inside a folder called src.
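
To make the handler value concrete: the string is a file path (without its extension) and an export name, separated by the final dot. The sketch below only illustrates that convention. parseHandler is a hypothetical helper, not part of Serverless or AWS, and the .js extension is an assumption.

```javascript
// Hypothetical helper (not a Serverless or AWS API): splits a handler string
// such as "src/index.generatePdf" into the module file and the exported name.
function parseHandler(handler) {
  const lastDot = handler.lastIndexOf(".")
  if (lastDot === -1) {
    throw new Error(`Handler "${handler}" has no export name`)
  }
  return {
    // Assumption: the module is a plain .js file
    file: `${handler.slice(0, lastDot)}.js`,
    exportName: handler.slice(lastDot + 1),
  }
}

console.log(parseHandler("src/index.generatePdf"))
```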

Each serverless function handler is provided with 3 parameters:

  1. The request “event” containing all sorts of details like the route requested, the request method, the request headers, and more.
  2. The Lambda “context”, which provides details about the invocation and execution environment of the function, as well as some methods for the response.
  3. A Node.js-style, error-first “callback” function used to send back the response data.

Here is a very basic handler example. Note that the callback function expects an object for the response (not a Response object), which must have a “body” key:

exports.generatePdf = (event, context, callback) => {
  console.log('details about the event: \n', event)
  console.log('details about the context: \n', context)

  const response = {
    body: "hello world"
  }
  // Error-first callback: pass null as the first argument when there is no error
  callback(null, response)
}
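
The error-first convention can be tried out locally without deploying anything. The following is a minimal sketch. The handler name hello, the report function, and the fail flag on the event are all made up for illustration.

```javascript
// Callback-style handler with the same signature as the example above.
// Assumption: the "fail" field on the event is invented to trigger the error path.
const hello = (event, context, callback) => {
  if (event.fail) {
    // The first argument is the error; no response is passed
    callback(new Error("something went wrong"))
    return
  }
  // null in the error slot signals success; the second argument is the response
  callback(null, { body: "hello world" })
}

// Stand-in for whatever receives the callback result
const report = (error, response) => {
  if (error) {
    console.log("failed:", error.message)
  } else {
    console.log("succeeded:", response.body)
  }
}

hello({}, {}, report)
hello({ fail: true }, {}, report)
```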

If you’re like me and prefer promises, we can convert it to use async/await like so:

exports.generatePdf = async (event, context) => {
  console.log('details about the event: \n', event)
  console.log('details about the context: \n', context)

  const response = {
    body: "hello world"
  }

  // The resolved value of the returned promise is used as the response
  return response
}
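
An async handler is also easy to exercise locally, because it is just a function that returns a promise. The sketch below assumes the event follows the API Gateway proxy format, where httpMethod, path, and queryStringParameters are among the fields provided. summarizeEvent and mockEvent are hypothetical names used only for illustration.

```javascript
// Hypothetical helper: picks out a few fields from an API Gateway proxy event
function summarizeEvent(event) {
  return {
    method: event.httpMethod,
    path: event.path,
    // Assumption: the field is null when the request has no query string
    query: event.queryStringParameters || {},
  }
}

const generatePdf = async (event, context) => {
  const { method, path, query } = summarizeEvent(event)
  return {
    body: `${method} ${path} with ${Object.keys(query).length} query parameter(s)`,
  }
}

// Made-up event containing only the fields used above
const mockEvent = { httpMethod: "GET", path: "/pdf", queryStringParameters: null }

generatePdf(mockEvent, {}).then(response => console.log(response.body))
```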

Cool. Our basic lambda function is working.

Generating PDFs in Node with PDFKit

Next, we will look at generating a PDF in Node. There are a few options, but the one I found most common was PDFKit. You can install it in your project with “npm install pdfkit”.

A basic “hello world” example for generating a PDF in memory requires us to use buffers. It looks something like this:

const PDFDocument = require("pdfkit")

const doc = new PDFDocument()

doc.text('hello world', 100, 50)

doc.end()

// Collect the chunks emitted by the document and join them once it has ended
const buffers = []
doc.on("data", buffers.push.bind(buffers))
doc.on("end", () => {
  const pdfData = Buffer.concat(buffers)
  console.log(pdfData)
})

This is fine, but since we are using async/await, we want to use a Promise instead of a callback:

const PDFDocument = require("pdfkit")

const pdfPromise = new Promise(resolve => {
  const doc = new PDFDocument()

  doc.text('hello world', 100, 50)
  doc.end()

  const buffers = []
  doc.on("data", buffers.push.bind(buffers))
  doc.on("end", () => {
    const pdfData = Buffer.concat(buffers)
    resolve(pdfData)
  })
})
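
Since a PDFKit document is a Node readable stream, the collect-into-a-buffer step can be pulled out into a small reusable function. The sketch below is one way to do that, under a few assumptions: streamToBuffer is a hypothetical name, it additionally rejects on the stream's error event (which the version above does not handle), and a plain Readable from Node's stream module stands in for the PDF document so the snippet runs without PDFKit installed.

```javascript
const { Readable } = require("stream")

// Hypothetical helper: resolves with one Buffer holding everything the stream emits
function streamToBuffer(stream) {
  return new Promise((resolve, reject) => {
    const chunks = []
    stream.on("data", chunk => chunks.push(chunk))
    stream.on("end", () => resolve(Buffer.concat(chunks)))
    // Surface stream errors to the caller as a rejection
    stream.on("error", reject)
  })
}

// Stand-in for a PDFKit document: a readable stream that emits two chunks
const fakeDoc = Readable.from([Buffer.from("hello "), Buffer.from("world")])

streamToBuffer(fakeDoc).then(data => console.log(data.toString()))
```

With PDFKit, the same function would be given the PDFDocument instance in place of fakeDoc.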

Adding PDFs as a serverless response

We’re almost done. We have a serverless endpoint that runs a Node function, and we have a Node function that generates a PDF. However, a little bit of configuration is needed in AWS API Gateway in order to serve PDFs to the browser as binary data.

First, we need to install the Serverless plugins serverless-apigw-binary and serverless-apigwy-binary (it’s not a typo, they are close, but not the same). We can do so with npm install serverless-apigw-binary serverless-apigwy-binary.

With those installed, we also need to make a few changes to our serverless.yml file. We need to tell API Gateway to include binary media types, tell our generatePdf function to serve the content as binary, include the aforementioned plugins, and tell AWS which content type to serve as binary based on the HTTP header it receives:

service: generatePdf

provider:
  name: aws
  region: us-west-1
  runtime: nodejs10.x
  # This is new
  apiGateway:
    binaryMediaTypes:
      - "*/*"

functions: 
  generatePdf:
    handler: src/index.generatePdf
    events:
      - http:
          path: /pdf
          method: get
          # This is new
          contentHandling: CONVERT_TO_BINARY

# This is new
plugins:
  - serverless-apigw-binary
  - serverless-apigwy-binary

# This is new
custom:
  apigwBinary:
    types:
      - "application/pdf"

With that in place, we can edit our previous “hello world” serverless function to use the PDFKit generation. We also have to make sure to base64 encode our PDF buffer, send the appropriate “application/pdf” content-type response header, and set the isBase64Encoded flag for the response to true:

const PDFDocument = require("pdfkit")

exports.generatePdf = async () => {
  const pdfBuffer = await new Promise(resolve => {
    const doc = new PDFDocument()

    doc.text('hello world', 100, 50)
    doc.end()

    const buffers = []
    doc.on("data", buffers.push.bind(buffers))
    doc.on("end", () => {
      const pdfData = Buffer.concat(buffers)
      resolve(pdfData)
    })
  })

  return {
    headers: {
      "content-type": "application/pdf",
    },
    body: pdfBuffer.toString("base64"),
    isBase64Encoded: true,
  }
}
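
The response-building step can also be separated from the PDF generation and checked on its own. In this sketch, toPdfResponse is a hypothetical helper and the bytes are a made-up stand-in rather than a real PDF. It shows that base64 encoding is reversible, so decoding the body yields exactly the bytes of the original buffer.

```javascript
// Hypothetical helper: wraps a Buffer in the response shape used by the handler above
function toPdfResponse(pdfBuffer) {
  return {
    headers: {
      "content-type": "application/pdf",
    },
    body: pdfBuffer.toString("base64"),
    isBase64Encoded: true,
  }
}

// Made-up stand-in bytes, not a valid PDF document
const fakePdf = Buffer.from("%PDF-1.3 not a real document")

const response = toPdfResponse(fakePdf)

// Decoding the body gives back the bytes that went in
const decoded = Buffer.from(response.body, "base64")
console.log(decoded.equals(fakePdf))
```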