Node.js / JavaScript Tutorial – Convert PDF to HTML

Published Jul 14, 2021

Intro

If you are a Node.js developer, you may have a niche requirement to convert a PDF into HTML, or to extract text content from a PDF for indexing. Here at API2PDF, we have a PDF to HTML endpoint that makes a best effort to extract the text from a PDF and output it as an HTML document.

Our API takes your PDF and converts it to HTML. Just make sure the file is saved as a .pdf and is accessible at a URL that our service can ingest, ideally on a file storage provider like S3 or Azure Blob Storage. See the code sample below, which uses this example file: http://www.api2pdf.com/wp-content/uploads/2021/01/1a082b03-2bd6-4703-989d-0443a88e3b0f-4.pdf
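Since the conversion depends on a well-formed, web-accessible URL, it can help to sanity-check the URL before sending it. The snippet below is a minimal sketch of such a check. The function name isLikelyPdfUrl is our own hypothetical helper and is not part of the api2pdf library. It only inspects the shape of the URL and cannot confirm that the file is actually reachable from our service.

```javascript
// Hypothetical helper for illustration; NOT part of the api2pdf library.
// Returns true when the input is an absolute http(s) URL whose path ends in .pdf.
function isLikelyPdfUrl(candidate) {
  let parsed;
  try {
    parsed = new URL(candidate); // throws on relative or malformed URLs
  } catch (err) {
    return false;
  }
  const isWeb = parsed.protocol === 'http:' || parsed.protocol === 'https:';
  // Check the path only, so query strings (e.g. on signed storage URLs) are ignored.
  return isWeb && parsed.pathname.toLowerCase().endsWith('.pdf');
}

console.log(isLikelyPdfUrl('http://www.api2pdf.com/wp-content/uploads/2021/01/1a082b03-2bd6-4703-989d-0443a88e3b0f-4.pdf')); // true
console.log(isLikelyPdfUrl('/local/path/file.pdf')); // false
```

A local file path fails this check, which is the point: the PDF has to be uploaded somewhere with a URL first.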

Convert PDF to HTML with Node.js / JavaScript

Step 1) Open a terminal in your project directory and run the command

npm install --save api2pdf

Step 2) Grab an API key from https://portal.api2pdf.com. It only takes 60 seconds.

Step 3) Use the sample code below and replace 'YOUR-API-KEY' with the API key you acquired in step 2.

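var Api2Pdf = require('api2pdf');
var a2pClient = new Api2Pdf('YOUR-API-KEY');

// URL of the PDF to convert; it must be reachable by the API2PDF service.
var pdfUrl = 'http://www.api2pdf.com/wp-content/uploads/2021/01/1a082b03-2bd6-4703-989d-0443a88e3b0f-4.pdf';

a2pClient.libreOfficePdfToHtml(pdfUrl)
  .then(function(result) { console.log(result); })   // log the API response
  .catch(function(err) { console.error(err); });     // log a failed request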

And that’s it! Modify the code as you see fit. Hopefully this saves you time and makes converting PDF files to HTML easy and painless in your Node.js / JavaScript code.
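The sample in step 3 simply logs the whole response. In practice you will usually want the link to the generated HTML file. Below is a minimal sketch of how that could look, assuming the response object carries Success, FileUrl, and Error fields. Those field names are an assumption here, so compare them against what console.log(result) actually prints for your account. The function getHtmlUrl is a hypothetical helper of our own and is not part of the api2pdf library.

```javascript
// Hypothetical helper for illustration; NOT part of the api2pdf library.
// ASSUMPTION: the response looks like { Success: true, FileUrl: '...', Error: null }.
function getHtmlUrl(result) {
  if (!result || result.Success !== true) {
    const reason = result && result.Error ? result.Error : 'unknown error';
    throw new Error('PDF to HTML conversion failed: ' + reason);
  }
  if (typeof result.FileUrl !== 'string' || result.FileUrl === '') {
    throw new Error('Conversion succeeded but no FileUrl was returned');
  }
  return result.FileUrl;
}

// Example with a made-up response object (no API call is made here).
console.log(getHtmlUrl({ Success: true, FileUrl: 'https://example.com/output.html', Error: null }));
```

To use it with the sample from step 3, call getHtmlUrl(result) inside the then callback instead of logging the raw response. A thrown error will then reach the promise's catch handler if you have attached one.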

See the full GitHub library

We have a complete Node.js client library for our API that does a lot more than just this. Check out everything the library can do here: https://github.com/Api2Pdf/api2pdf.node