Intro – HTML to PDF with Multiple Languages
A frequent question we receive at API2PDF is whether it is possible to generate PDFs in foreign languages, or in languages that contain special characters. The answer is yes, and these languages are supported in both the wkhtmltopdf and Headless Chrome rendering engines.
Include Special Language Characters in PDFs
We support most characters that Unicode (UTF-8) supports. The trick is to include <meta charset="UTF-8"> in the <head> section of your HTML. If you do this, Unicode characters should appear properly. We tested this by pasting a multilingual HTML sample into the Try It Out section on the home page of api2pdf.com, which uses Headless Chrome.
Supported languages include: Albanian, Arabic, Armenian, Bulgarian, Traditional and Simplified Chinese, Croatian, Czech, Danish, Dutch, Esperanto, Finnish, French, Georgian, German, Greek, Hebrew, Hungarian, Icelandic, Igbo, Interlingua, Italian, Japanese, Korean, Lithuanian, Macedonian, Maltese, Mongolian, Occitan, Persian, Polish, Portuguese, Romanian, Russian, Serbian, Slovenian, Spanish, Swedish, Thai, Ukrainian, Upper Sorbian, Turkish, Uyghur, Vietnamese, and Welsh.
Declaring standard UTF-8 as your charset should be sufficient for most use cases. However, you may find that you need a language that is not in the list above, or one that your chosen font does not cover. That is where a custom font, covered in the next section, comes in.
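As a minimal sketch of the charset tip above, the Python snippet below builds an HTML page that declares UTF-8 in its <head> and encodes it as UTF-8 bytes. The function name `build_html` and the sample text are our own illustrations, not part of any API2PDF library; the point is only that the declared charset and the actual encoding of the HTML should match.

```python
from html import escape


def build_html(body_text, title="Multilingual test"):
    """Return a minimal HTML page that declares UTF-8 in its <head>."""
    return (
        "<!DOCTYPE html>\n"
        "<html>\n"
        "<head>\n"
        '  <meta charset="UTF-8">\n'
        f"  <title>{escape(title)}</title>\n"
        "</head>\n"
        "<body>\n"
        f"  <p>{escape(body_text)}</p>\n"
        "</body>\n"
        "</html>\n"
    )


# Illustrative sample mixing several scripts (Greek, Russian, Japanese, Arabic, Thai).
sample = "Ελληνικά, Русский, 日本語, العربية, ไทย"
page = build_html(sample)

# Encode explicitly as UTF-8 so the bytes match the declared charset.
page_bytes = page.encode("utf-8")
```

The resulting `page` string is the kind of HTML you would paste into a rendering tool or send to a conversion service.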
Custom Fonts and Foreign Language in PDF
We live in the USA, so any language other than English is foreign to us. Forgive our ignorance, but for simplicity, that is the terminology we use here. We received an inquiry from a customer who was required to use Thai characters with the Roboto font. They were testing various fonts using our approach for including custom fonts in PDFs, and they ran into an issue: Thai text rendered correctly with some custom fonts, but not with Roboto.
Now why would Thai work with some custom fonts, but not with Roboto? After a little research, we found the Wikipedia article on Roboto (https://en.wikipedia.org/wiki/Roboto), which states that the font supports only Latin, partial Greek, and Cyrillic. Unfortunately, Roboto has no Thai support.
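To spot this kind of mismatch before rendering, one rough approach is to check which scripts your text uses and compare them with the scripts a font claims to cover. The sketch below is a heuristic of our own using only Python's standard `unicodedata` module: it takes the first word of each letter's Unicode name as the script. The names `ROBOTO_SCRIPTS`, `scripts_in`, and `uncovered_scripts` are hypothetical, and the Roboto set is taken from the Wikipedia description above rather than read from the font file itself.

```python
import unicodedata

# Scripts Roboto covers, per the Wikipedia article cited above (an assumption,
# not something read from the font file).
ROBOTO_SCRIPTS = {"LATIN", "GREEK", "CYRILLIC"}


def scripts_in(text):
    """Approximate the scripts used in `text` from Unicode character names."""
    found = set()
    for ch in text:
        # Only look at letters; skip digits, spaces, punctuation, and marks.
        if unicodedata.category(ch).startswith("L"):
            name = unicodedata.name(ch, "")
            if name:
                # e.g. "THAI CHARACTER SO SUA" -> "THAI"
                found.add(name.split()[0])
    return found


def uncovered_scripts(text, covered=ROBOTO_SCRIPTS):
    """Return the scripts in `text` that the given font coverage lacks."""
    return scripts_in(text) - covered


print(uncovered_scripts("Hello สวัสดี"))  # {'THAI'}
```

A non-empty result suggests the text needs a different font. For an authoritative answer you would inspect the font's character map directly, which this sketch does not do.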
So what do you do?
As it turns out, Google has been working on a project called Noto, which is an alternative to Roboto. Its goal is to support and standardize as many languages as possible. All you need to do is download the Thai Noto font from https://www.google.com/get/noto/#sans-thai and you're good to go.
See a working HTML sample here.
This should work for any language that Noto offers.
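For illustration, here is a sketch of what such an HTML page might look like, generated from Python. It assumes the custom font is loaded with a CSS @font-face rule and that you host the downloaded font file yourself; the URL, the file name `NotoSansThai-Regular.ttf`, and the function name `build_thai_page` are placeholders of ours, not values from the Noto project or from API2PDF.

```python
from html import escape

# Placeholder: point this at wherever you host the downloaded font file.
FONT_URL = "https://example.com/fonts/NotoSansThai-Regular.ttf"


def build_thai_page(text, font_url=FONT_URL):
    """Return an HTML page that renders `text` with a Noto Thai font."""
    # Doubled braces ({{ }}) are literal CSS braces inside the f-string.
    return f"""<!DOCTYPE html>
<html>
<head>
  <meta charset="UTF-8">
  <style>
    @font-face {{
      font-family: "Noto Sans Thai";
      src: url("{font_url}") format("truetype");
    }}
    body {{ font-family: "Noto Sans Thai", sans-serif; }}
  </style>
</head>
<body>
  <p>{escape(text)}</p>
</body>
</html>
"""


thai_page = build_thai_page("สวัสดี")
```

Swapping in another Noto font file and family name should follow the same pattern for other languages.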
What is API2PDF?
API2PDF is a web service that helps you generate PDF files at massive scale. Convert HTML to PDF, Web Pages to PDF, MS Office files to PDF, and merge PDFs together. Grab an API key and check out our documentation to get started in minutes!