How to split a PDF file or PDF stream data using HummusJS and Node.js? — Part 2

Mahek Chhabra
2 min read · Jun 21, 2020


This is a continuation of the tutorial started over here, which describes splitting a raw PDF file into individual pages; I recommend going through it first. Let's continue with how to work on PDF stream data instead of a raw file.

Introduction

In a cyber-driven world, data security and integrity are a priority. There are times when keeping a raw, downloaded PDF file is not possible or may put the data owner's safety at risk; in such cases, working on a stream and storing the data in a buffer is a useful workaround.

Streams read data byte by byte, piece by piece, and process it without needing the complete file on disk. This makes them efficient in both memory and time.

Read PDF stream data and split it into individual streams, page by page:

After completing the installation and setup, continue with the following steps:

Step 1: Import the ‘hummus’ and ‘memory-streams’ Node modules into a file.
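As a rough sketch, assuming the npm package is memory-streams (the module that exports WritableStream), the imports might look like this:

```javascript
// Sketch: assumes the hummus and memory-streams npm packages are installed.
const hummus = require('hummus');
const streams = require('memory-streams');
const fs = require('fs'); // used below to read the hosted file into a buffer
```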

Step 2: Read the file data as a stream from a buffer using PDFRStreamForBuffer(bufferData). The buffer itself can be built by calling createReadStream() on the hosted file and collecting its chunks (see the sketch after Step 3).

Step 3: To extract the number of pages present in the file, use getPagesCount() on the reader.
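A minimal sketch of Steps 2 and 3, assuming a source file named input.pdf and hummus's PDFRStreamForBuffer export (the read-stream-over-buffer class in current hummus builds):

```javascript
const hummus = require('hummus');
const fs = require('fs');

// Collect the file's chunks into one in-memory buffer instead of saving a copy.
const chunks = [];
fs.createReadStream('input.pdf') // 'input.pdf' stands in for the hosted file
  .on('data', (chunk) => chunks.push(chunk))
  .on('end', () => {
    const bufferData = Buffer.concat(chunks);

    // Step 2: create a reader over the buffer rather than over a file on disk.
    const pdfReader = hummus.createReader(new hummus.PDFRStreamForBuffer(bufferData));

    // Step 3: the total page count drives the split loop in the later steps.
    console.log('Pages:', pdfReader.getPagesCount());
  });
```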

Step 4: The next step is to loop over the total page count and create a write stream for each page (a consolidated sketch follows Figure 1 below).

Step 5: To create each write stream, use the WritableStream() object from memory-streams.

Step 6: Use the createPDFCopyingContext() and appendPDFPageFromPDF() functions to append each page from the source file to its write stream.

Step 7: After appending a page, end the writer and the stream so that no new data (such as the next page) gets added to the same stream.

Figure 1: Read and Split PDF Stream Data
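Putting Steps 4 through 7 together, the flow in Figure 1 might look like the sketch below. The PDFStreamForResponse wrapper (which adapts any Node writable stream to hummus's write interface) and the splitPdfBuffer function name are assumptions for illustration, not necessarily what the original code used:

```javascript
const hummus = require('hummus');
const streams = require('memory-streams');

// bufferData: a Buffer holding the source PDF (see Steps 2 and 3).
function splitPdfBuffer(bufferData) {
  const pdfReader = hummus.createReader(new hummus.PDFRStreamForBuffer(bufferData));
  const pagesCount = pdfReader.getPagesCount();
  const pageStreams = [];

  // Step 4: one write stream per page.
  for (let page = 0; page < pagesCount; page++) {
    // Step 5: an in-memory writable stream from memory-streams.
    const pageStream = new streams.WritableStream();

    // PDFStreamForResponse lets hummus write into the Node stream
    // (assumption: this is how the writer was wired up).
    const pdfWriter = hummus.createWriter(new hummus.PDFStreamForResponse(pageStream));

    // Step 6: copy the current page from the source buffer into the writer.
    const copyingContext = pdfWriter.createPDFCopyingContext(
      new hummus.PDFRStreamForBuffer(bufferData)
    );
    copyingContext.appendPDFPageFromPDF(page);

    // Step 7: finalize the writer and end the stream so the next
    // page's data cannot land in this stream.
    pdfWriter.end();
    pageStream.end();

    pageStreams.push(pageStream);
  }

  return pageStreams; // one in-memory, single-page PDF per entry
}
```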

The writable streams created at the end of this execution can be processed further, or persisted to a hosted location, for example by opening a file with createWriteStream() and writing each stream's contents into it.
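For instance, assuming memory-streams' toBuffer() helper and the splitPdfBuffer sketch above, each page could be written out like this:

```javascript
const fs = require('fs');

const pageStreams = splitPdfBuffer(bufferData); // from the sketch above
pageStreams.forEach((pageStream, index) => {
  // memory-streams' WritableStream exposes its accumulated bytes via toBuffer().
  const out = fs.createWriteStream(`page-${index + 1}.pdf`);
  out.end(pageStream.toBuffer());
});
```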

Result

Using this step-by-step guide, you can split PDF stream data into individual pages without any external split tool, while maintaining the security and integrity of the original data.
