PDF Generation At Scale

May 22, 2023

Great article, Ali Hussam, I learned a lot. Here's a question: instead of passing the image as base64 from the caller service in the interpolation payload, could we store the images in a database or storage and load them from inside the Puppeteer service when actually generating and interpolating the PDF? This could also ease the pressure of the 10MB data limit for Cloud Functions.
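A minimal sketch of this suggestion, in TypeScript since Puppeteer is a Node library: the event carries only a storage path, and the PDF worker loads the bytes and inlines them as a data URI just before rendering. `BlobStore`, `InMemoryBlobStore`, `PdfJob`, `buildHtml`, and the `{{logo}}` placeholder are hypothetical names; a real worker would back `BlobStore` with Cloud Storage or a database and then hand the resulting HTML to Puppeteer (for example via `page.setContent` and `page.pdf`).

```typescript
// Hypothetical storage abstraction. In production this could wrap
// @google-cloud/storage, a database, or any other blob store.
interface BlobStore {
  get(path: string): Promise<Uint8Array>;
}

// In-memory stand-in so the sketch runs without cloud credentials.
class InMemoryBlobStore implements BlobStore {
  private readonly blobs: Map<string, Uint8Array>;

  constructor(blobs: Map<string, Uint8Array>) {
    this.blobs = blobs;
  }

  async get(path: string): Promise<Uint8Array> {
    const blob = this.blobs.get(path);
    if (blob === undefined) throw new Error(`blob not found: ${path}`);
    return blob;
  }
}

// Hypothetical event payload: a pointer to the image, not the image itself.
interface PdfJob {
  templateHtml: string; // e.g. '<img src="{{logo}}">'
  logoPath: string;     // e.g. "assets/logo.png"
  logoMimeType: string; // e.g. "image/png"
}

// Base64-encode raw bytes using btoa, which is a global in browsers,
// Deno, and Node.js 16+.
function toBase64(bytes: Uint8Array): string {
  let binary = "";
  for (let i = 0; i < bytes.length; i++) {
    binary += String.fromCharCode(bytes[i]);
  }
  return btoa(binary);
}

// Worker side: resolve the reference and inline the image as a data URI
// just before rendering, so the large string never travels in the event.
async function buildHtml(job: PdfJob, store: BlobStore): Promise<string> {
  const bytes = await store.get(job.logoPath);
  const dataUri = `data:${job.logoMimeType};base64,${toBase64(bytes)}`;
  // split/join replaces every occurrence of the placeholder.
  return job.templateHtml.split("{{logo}}").join(dataUri);
}
```

Only the short path crosses the event boundary; the base64 string exists only inside the worker, so it no longer counts toward the payload limit.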

Ali Hussam

Author

Jan 17, 2024

Good suggestion! We have since moved to Google Cloud Storage to share files among the functions for this feature. 🙌

Abhinav Kumar

Nov 18, 2023

Very well written, Ali Hussam! I have some questions and suggestions about the image-loading part and the architecture. I hope they help.

1. If the caller service has to pass the image data, does it already have a Puppeteer-like implementation that fetches the webpage and then the image data?
2. The combined size of all the images is generally roughly equivalent to the size of the webpage. Is there a reason the entire webpage is sent as an event to another processor? Saving it to a blob and passing its address could be a better alternative.
3. You could use Node streams to write data into storage. Instead of converting the entire file to PDF, holding it in memory, and then storing it, chunks of data can be converted and written to storage as they are produced, using createPDFStream. With this, a large number of files can be scheduled for conversion, and even a considerably smaller worker (with less memory and CPU) can process all of them.
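A minimal sketch of the streaming suggestion, assuming the PDF bytes arrive as a web `ReadableStream<Uint8Array>`. That is what `page.createPDFStream()` returns in recent Puppeteer releases; older releases returned a Node.js `Readable`, so check the version you use. The `ChunkWriter` callback is a hypothetical stand-in for a storage upload stream, and `fakePdfStream` is a hypothetical stand-in for Puppeteer so the example runs on its own.

```typescript
// Hypothetical sink for one chunk of PDF bytes. In production this could
// wrap a storage upload stream; here it is just an async callback.
type ChunkWriter = (chunk: Uint8Array) => Promise<void>;

// Copy a PDF byte stream to storage chunk by chunk. Each write is awaited
// before the next read, so only the current chunk (plus whatever the source
// buffers internally) is held in memory, never the whole document.
async function streamToStorage(
  source: ReadableStream<Uint8Array>,
  write: ChunkWriter,
): Promise<number> {
  const reader = source.getReader();
  let bytesWritten = 0;
  try {
    for (;;) {
      const result = await reader.read();
      if (result.done) break;
      await write(result.value);
      bytesWritten += result.value.byteLength;
    }
  } finally {
    reader.releaseLock();
  }
  return bytesWritten;
}

// Hypothetical stand-in for page.createPDFStream(): emits `chunkCount`
// zero-filled chunks of `chunkSize` bytes, then closes.
function fakePdfStream(chunkCount: number, chunkSize: number): ReadableStream<Uint8Array> {
  let emitted = 0;
  return new ReadableStream<Uint8Array>({
    pull(controller) {
      if (emitted === chunkCount) {
        controller.close();
        return;
      }
      controller.enqueue(new Uint8Array(chunkSize));
      emitted += 1;
    },
  });
}
```

Because memory use is bounded by the chunk size and the source's internal buffer rather than by the full PDF size, a small worker can handle large documents, which is the point of the suggestion.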

Ali Hussam

Author

Jan 17, 2024

Great insights.

1. There is no Puppeteer in the caller function; the caller fetches the images from Google Cloud Storage.
2. As mentioned, our only image requirement is a highly compressed logo (roughly 300KB as a base64 string), and since the callers also run other business processes, it made sense for us to pass it in the event payload. If you need heavier media files, it is always better to pass a URL to media hosted on a CDN or in storage.
3. This is a great suggestion. It will improve performance. 🙌
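The inline-versus-URL trade-off in the author's reply comes down to payload arithmetic: base64 turns every 3 raw bytes into 4 characters, so a logo of about 225KB becomes a string of about 300KB, roughly 3% of a 10MB payload. A small sketch of that decision follows; `chooseTransport` and its `budgetBytes` parameter are hypothetical, and the right budget depends on your own payloads and platform limits.

```typescript
// Length of the base64 encoding of `rawBytes` bytes, padding included:
// every 3 input bytes (or fewer, at the end) become 4 output characters.
function base64Length(rawBytes: number): number {
  return 4 * Math.ceil(rawBytes / 3);
}

type Transport = "inline" | "url";

// Decide how a caller should hand an image to the PDF worker.
// `budgetBytes` is a hypothetical share of the event payload limit that the
// caller is willing to spend on images; tune it to your own payloads.
function chooseTransport(rawBytes: number, budgetBytes: number): Transport {
  return base64Length(rawBytes) <= budgetBytes ? "inline" : "url";
}
```

For example, `base64Length(225_000)` is `300000`, so `chooseTransport(225_000, 1_000_000)` returns `"inline"`, while a 5MB media file (`chooseTransport(5_000_000, 1_000_000)`) returns `"url"`.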