Multi-lingual support for HTML to PDF (wkhtmltopdf / Headless Chrome)

Published Mar 25, 2019

Intro – HTML to PDF with Multiple Languages

A frequent question we receive at API2PDF is whether it is possible to generate PDFs in languages other than English, or in languages whose text contains special characters. The answer is yes: both the wkhtmltopdf and Headless Chrome rendering engines can support these languages.

Include Special Language Characters in PDFs

We support most characters that Unicode covers, encoded as UTF-8. The trick is to include <meta charset="UTF-8"> in the <head> section of your HTML. If you do this, Unicode characters should appear properly. We ran a test by pasting the HTML below into the Try It Out section on the home page of api2pdf.com, which uses Headless Chrome.

 

<html style="color:green;">
<head>
<meta charset="UTF-8">
</head>
<body>
<p>Ç'është Unicode?, in Albanian (36 letters)</p>
<p>ما هي الشفرة الموحدة "يونِكود" ؟ in Arabic (36 letters)</p>
<p>Ի՞նչ է Յունիկոդը ? in Armenian(36 letters)</p>
<p>Какво е Unicode ? in Bulgarian (30 letters)</p>
<p>什麽是Unicode(統一碼/標準萬國碼)? in Trad'l Chinese</p>
<p>什么是Unicode(统一码)? in Simplified Chinese</p>
<p>Što je Unicode? in Croatian (30 letters)</p>
<p>Co je Unicode? in Czech (48 letters)</p>
<p>Hvad er Unicode? in Danish(29 letters)</p>
<p>Wat is Unicode? in Dutch(26 letters)</p>
<p>Kio estas Unikodo? in Esperanto(31 letters)</p>
<p>Mikä on Unicode? in Finnish(29 letters)</p>
<p>Qu'est ce qu'Unicode? in French</p>
<p>რა არის უნიკოდი? in Georgian</p>
<p>Was ist Unicode? in German</p>
<p>Τι είναι το Unicode; in Greek (Monotonic)</p>
<p>Τί εἶναι τὸ Unicode; in Greek (Polytonic)</p>
<p>מה זה יוניקוד (Unicode)? in Hebrew</p>
<p>Mi az Unicode? in Hungarian</p>
<p>Hvað er Unicode? in Icelandic</p>
<p>Gịnị bụ Yunikod? in Igbo</p>
<p>Que es Unicode? in Interlingua</p>
<p>Cos'è Unicode? in Italian</p>
<p>ユニコードとはか?in Japanese</p>
<p>유니코드에 대해? in Korean</p>
<p>Kas tai yra Unikodas? in Lithuanian</p>
<p>Што е Unicode? in Macedonian</p>
<p>X'inhu l-Unicode? in Maltese</p>
<p>Unicode гэж юу вэ? in Mongolian</p>
<p>Unicode, qu'es aquò? in Occitan</p>
<p>يونی‌کُد چيست؟ in Persian</p>
<p>Czym jest Unikod? in Polish</p>
<p>O que é Unicode? in Portuguese</p>
<p>Ce este Unicode? in Romanian</p>
<p>Что такое Unicode? in Russian</p>
<p>Šta je Unicode? in Serbian (Latin)</p>
<p>Шта je Unicode? in Serbian</p>
<p>Kaj je Unicode? in Slovenian</p>
<p>¿Qué es Unicode? in Spanish</p>
<p>Vad är Unicode? in Swedish</p>
<p>Unicode คืออะไร? in Thai</p>
<p>Що таке Юнікод? in Ukrainian</p>
<p>Što je Unicode? in Upper Sorbian</p>
<p>Evrensel Kod Nedir? in Turkish</p>
<p>ﻳﯘﻧﯩﻜﻮﺩ ﺩﯨﮕﻪﻥ ﻧﯩﻤﻪ؟ in Uyghur</p>
<p>Unicode dégen néme? in Uyghur (Latin)</p>
<p>Unicode là gì? in Vietnamese</p>
<p>Beth yw Unicode? in Welsh</p>
</body>
</html>

 

The languages in this test are: Albanian, Arabic, Armenian, Bulgarian, Traditional and Simplified Chinese, Croatian, Czech, Danish, Dutch, Esperanto, Finnish, French, Georgian, German, Greek, Hebrew, Hungarian, Icelandic, Igbo, Interlingua, Italian, Japanese, Korean, Lithuanian, Macedonian, Maltese, Mongolian, Occitan, Persian, Polish, Portuguese, Romanian, Russian, Serbian, Slovenian, Spanish, Swedish, Thai, Ukrainian, Upper Sorbian, Turkish, Uyghur, Vietnamese, and Welsh.

Declaring UTF-8 as your charset should be sufficient for most use cases. However, you may need a language that is not in the list above, or one that your chosen font does not cover. In that case, you can use a custom font, as described in the next section.
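If your HTML is assembled by code rather than written by hand, it is easy to forget the charset declaration described above. The following is a minimal sketch, not part of API2PDF, of a check you could run before sending HTML for conversion. The function name `ensure_utf8_meta` is hypothetical, and the regular expressions are a simple heuristic rather than a full HTML parser.

```python
import re

# Matches a <meta> tag that declares UTF-8, in either the short form
# (<meta charset="UTF-8">) or the http-equiv form (content="...; charset=utf-8").
_CHARSET_RE = re.compile(r'<meta[^>]+charset\s*=\s*["\']?\s*utf-8', re.IGNORECASE)

# Matches the opening <head> tag, but not <header>.
_HEAD_RE = re.compile(r'<head(\s[^>]*)?>', re.IGNORECASE)


def ensure_utf8_meta(html: str) -> str:
    """Return html with a UTF-8 charset declaration, adding one if missing.

    Assumption: a declaration of some other charset is left untouched,
    so this only covers the "no UTF-8 declaration found" case.
    """
    if _CHARSET_RE.search(html):
        return html  # already declares UTF-8

    tag = '<meta charset="UTF-8">'
    head = _HEAD_RE.search(html)
    if head:
        # Insert right after the opening <head> tag.
        return html[:head.end()] + tag + html[head.end():]

    # No <head> found (for example, an HTML fragment): prepend the tag.
    return tag + html
```

Calling `ensure_utf8_meta` on a page that already contains either form of the declaration returns it unchanged, so it should be safe to apply to every document.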

Custom Fonts and Foreign Language in PDF

We live in the USA, so any language other than English is foreign to us. Forgive our ignorance, but for simplicity, that is the terminology we will use here. We received an inquiry from a customer who needed to use Thai characters with the Roboto font. They were testing various fonts using our approach for including custom fonts in PDFs, and found that the Thai text rendered with some fonts but not with Roboto.

Now why would Thai work with some custom fonts, but not with Roboto? After a little research, we found the Wikipedia article on Roboto (https://en.wikipedia.org/wiki/Roboto), which states that the font only supports Latin, partial Greek, and Cyrillic. Unfortunately, that means no Thai support in Roboto.

So what do you do?

As it turns out, Google has been working on a project called Noto, which is an alternative to Roboto. Its goal is to support and standardize as many languages as possible. All you need to do is download the Thai Noto font here: https://www.google.com/get/noto/#sans-thai and you’re good to go.

Here is a working HTML sample:

 

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
<head>
<META http-equiv="Content-Type" content="text/html; charset=utf-8">
<style>
@font-face {
font-family: 'Noto Sans Thai';
src: url('https://s3.amazonaws.com/vo-random/ShareX/2019/02/NotoSansThai-Regular.ttf') format('truetype');
}
</style>
</head>
<body>
<div>
<h1 style="font-family:'Noto Sans Thai',sans-serif">Noto Sans Thai: ใครไปมีตน้องแบมที่ภูเก็ตไปรับกันได้น๊า</h1>
</div>
</body>
</html>

 

This should work for any language that Noto offers.
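To decide which Noto fonts a document might need, it can help to see which scripts its text actually contains. The sketch below is a rough heuristic and not part of API2PDF: it takes the first word of each letter's Unicode character name (for example THAI or CYRILLIC) as a script label. The function name `scripts_in` is hypothetical, and the labels come from Unicode character names, not from Noto's own font naming.

```python
import unicodedata


def scripts_in(text: str) -> set:
    """Return rough script labels for the letters found in text.

    Heuristic: the first word of a character's Unicode name is usually
    its script, e.g. 'THAI CHARACTER KO KAI' -> 'THAI'.
    """
    labels = set()
    for ch in text:
        if not ch.isalpha():
            continue  # skip digits, punctuation, spaces, combining marks
        name = unicodedata.name(ch, "")
        if name:
            labels.add(name.split()[0])
    return labels
```

For the Thai line from the first sample, `scripts_in("Unicode คืออะไร?")` reports both LATIN and THAI, which suggests the page needs a font (or a fallback font) that covers Thai in addition to Latin.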

What is API2PDF?

API2PDF is a web service that helps you generate PDF files at massive scale. Convert HTML to PDF, web pages to PDF, and MS Office files to PDF, and merge PDFs together. Grab an API key and check out our documentation to get started in minutes!