I was looking for a way to crop a PDF to its visible bounds, that is, to remove the unnecessary empty space surrounding its contents.
I knew there was a Linux tool called pdfcrop that worked well for my requirement (it uses Ghostscript behind the scenes). The thing is, how could I call pdfcrop if the service that needs the crop is a .NET WebAPI service?
.NET has classes like ProcessStartInfo and Process that allow us to start a process on the host. But it felt hacky to use them in a WebAPI service, and I was not sure whether that could leave zombie processes in the OS.
I wanted to have something separate from the service, so I decided to try an Azure Function with Python. A Function has the benefit of being consumption-based and event-driven, and it can run short bursts of stateless custom code at scale.
My proof of concept
The first step was to initialize the Function App:
func init PdfCropFunctionApp --docker --worker-runtime python
Then the Function itself:
cd .\PdfCropFunctionApp\
func new --name PdfCropFunction --template "HTTP trigger" --language python
The Dockerfile is necessary because we need to install pdfcrop there:
RUN apt-get update && apt-get install -y texlive-extra-utils
Finally, the script, __init__.py:
import base64
import json
import logging
import subprocess
import uuid

import azure.functions as func


def main(req: func.HttpRequest) -> func.HttpResponse:
    logging.info('Python HTTP trigger function processed a request.')

    # Read the base64-encoded PDF from the request body
    req_body = req.get_json()
    pdf_base64 = req_body.get('pdf_base64')

    # Convert to bytes (named pdf_bytes to avoid shadowing the built-in bytes)
    pdf_bytes = base64.b64decode(pdf_base64)

    # Generate a short unique id for the temporary file names
    unique_id = str(uuid.uuid4())[:8]
    logging.info(f'Length in bytes: {len(pdf_bytes)}. Unique Id: {unique_id}')

    # Save the file
    input_path = f"/tmp/pdf_{unique_id}.pdf"
    with open(input_path, "wb") as binary_file:
        binary_file.write(pdf_bytes)

    # Call pdfcrop; by default it writes <name>-crop.pdf next to the input.
    # check=True raises if pdfcrop exits with an error.
    subprocess.run(["/usr/bin/pdfcrop", "--margins", "5 5 5 5", input_path], check=True)

    # Get the resulting file as base64
    with open(f"/tmp/pdf_{unique_id}-crop.pdf", "rb") as cropped_file:
        pdf_base64_cropped = base64.b64encode(cropped_file.read())

    # Return the cropped PDF as base64
    cropped_json = {"pdf_base64_cropped": pdf_base64_cropped.decode("utf-8")}
    return func.HttpResponse(
        status_code=200,
        mimetype="application/json",
        body=json.dumps(cropped_json),
    )
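The proof of concept trusts its input and leaves its files in /tmp. As a possible hardening step, the decoding and cropping could be split into two helpers, sketched below. The names decode_pdf_payload and crop_pdf, the 60-second timeout, and the "%PDF" header check are my own assumptions for illustration, not part of the original function. The sketch relies on the same pdfcrop behavior as the script above: without an explicit output name, the result is written next to the input as <name>-crop.pdf.

```python
import base64
import binascii
import subprocess
import tempfile
from pathlib import Path

# Assumed default, copied from the command used in the script above.
PDFCROP_COMMAND = ("/usr/bin/pdfcrop", "--margins", "5 5 5 5")


def decode_pdf_payload(body):
    """Return the PDF bytes from a parsed JSON body, or raise ValueError."""
    encoded = body.get("pdf_base64") if isinstance(body, dict) else None
    if not isinstance(encoded, str) or not encoded:
        raise ValueError("missing 'pdf_base64' field")
    try:
        pdf_bytes = base64.b64decode(encoded, validate=True)
    except (binascii.Error, ValueError) as exc:
        raise ValueError("'pdf_base64' is not valid base64") from exc
    # Cheap sanity check: PDF files start with the "%PDF" marker.
    if not pdf_bytes.startswith(b"%PDF"):
        raise ValueError("payload does not look like a PDF")
    return pdf_bytes


def crop_pdf(pdf_bytes, command=PDFCROP_COMMAND, timeout=60):
    """Run the crop command on pdf_bytes inside a throwaway directory."""
    with tempfile.TemporaryDirectory() as workdir:
        source = Path(workdir) / "input.pdf"
        source.write_bytes(pdf_bytes)
        # check=True raises on a non-zero exit code; if the timeout expires,
        # subprocess.run kills the child and waits for it before raising.
        subprocess.run([*command, str(source)], check=True, timeout=timeout)
        # The directory and both files are deleted when the block exits.
        return (Path(workdir) / "input-crop.pdf").read_bytes()
```

Inside main, a ValueError from decode_pdf_payload could be turned into a 400 response instead of an unhandled error, and the temporary directory keeps concurrent requests from colliding on file names.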
It can be built and run (locally) as usual:
docker build -t pdfcropimage .
docker run -p 8080:80 -it --rm pdfcropimage
The request should be sent like this:
POST http://localhost:8080/api/PdfCropFunction
{
"pdf_base64": "JVBERi0xLjcKCjQgMC..."
}
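For reference, a request of this shape could be built from Python with only the standard library. This is a minimal sketch: the helper names build_crop_request and send_crop_request are hypothetical, and the URL assumes the local container started with the docker run command above.

```python
import base64
import json
import urllib.request

# Assumed local endpoint, matching the port mapping of the docker run command.
FUNCTION_URL = "http://localhost:8080/api/PdfCropFunction"


def build_crop_request(pdf_bytes, url=FUNCTION_URL):
    """Wrap raw PDF bytes in the JSON body the function expects."""
    payload = {"pdf_base64": base64.b64encode(pdf_bytes).decode("ascii")}
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_crop_request(pdf_bytes, url=FUNCTION_URL, timeout=60):
    """POST the PDF and return the response body as text."""
    request = build_crop_request(pdf_bytes, url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")
```

The "JVBERi0xLjcK" prefix in the sample body is simply the base64 encoding of the "%PDF-1.7" header line that the file starts with.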
And the response will come in a similar format:
{
"pdf_base64_cropped": "JVBERi0xLjcKJdDUxdgKN..."
}
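On the calling side, the cropped file can be recovered by decoding that field again. A small sketch, assuming the response body is available as a JSON string; extract_cropped_pdf is a hypothetical helper name, and the "%PDF" check is only a basic sanity test.

```python
import base64
import json


def extract_cropped_pdf(response_text):
    """Decode the cropped PDF bytes from the function's JSON response."""
    encoded = json.loads(response_text)["pdf_base64_cropped"]
    pdf_bytes = base64.b64decode(encoded)
    # PDF files start with the "%PDF" marker; anything else is suspicious.
    if not pdf_bytes.startswith(b"%PDF"):
        raise ValueError("response does not contain a PDF")
    return pdf_bytes
```

The returned bytes can then be written to disk with open(path, "wb"), or passed on to whatever needs the cropped document.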
This is what it looks like:
[Figure: the original PDF next to the cropped PDF]