Google-sheets – How to get Google Sheets to correctly export Japanese to PDF


Whether I choose "Download as PDF" or "Print" and then "Save as PDF", Google Sheets messes up the formatting of the PDF it generates.

Here's what it's supposed to look like, which is also how it looks in the browser before exporting:

[Screenshot: the sheet as it appears in the browser]

And here is what it exports:

[Screenshot: the exported PDF]

In case it's not clear, it has separated every accent (the dakuten ゛) from its base character:

[Screenshot: close-up of the PDF showing the detached accents]

Note: this appears to be related to how the characters are entered. There are at least two ways to represent them:

  1. As a single precomposed character that includes the accent (e.g. づ, U+3065).
  2. As two characters: the character without the accent (つ, U+3064), followed by a combining accent mark (U+3099).

    Example: 基づき

If you copy the text above, paste it somewhere, and press delete repeatedly, you'll see it go:

   基づき
   基づ
   基つ
   基

Whereas this text, 基づき, will go:

   基づき
   基づ
   基

Still, to be usable for Japanese, it needs to export the first (four-keystroke, decomposed) version correctly, which it currently does not.

As a test, I put the same text into TextEdit on macOS, used the delete test to confirm it was the first version, exported that to PDF, and the result was correct.

NOTE: you'll actually have to edit this post to get the four-character version of the text above. In the editor, deleting it takes four presses, whereas if you copy and paste from the rendered post, you get the three-character version.
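Since copying and pasting can change which version you end up with, a more reliable way to see the two forms is to write them with escape sequences and list their code points. A minimal sketch using only Python's standard `unicodedata` module; the names `precomposed`, `decomposed` and `describe` are illustrative.

```python
import unicodedata

# Written with escapes so that copy/paste cannot silently change the form.
precomposed = "\u57fa\u3065\u304d"        # 基 + づ + き (3 code points)
decomposed = "\u57fa\u3064\u3099\u304d"   # 基 + つ + U+3099 + き (4 code points)

def describe(text: str) -> list[str]:
    """Return 'U+XXXX NAME' for every code point in text."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch)}" for ch in text]

for label, text in (("precomposed", precomposed), ("decomposed", decomposed)):
    print(f"{label}: {len(text)} code points")
    for line in describe(text):
        print("   ", line)

# NFD splits づ into つ + U+3099; NFC joins them back together.
assert unicodedata.normalize("NFD", precomposed) == decomposed
assert unicodedata.normalize("NFC", decomposed) == precomposed
```

Both strings render as 基づき; the only difference is whether づ is one code point or つ followed by the combining voiced sound mark.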

Best Answer

You can fix this by using NFKC normalization, which, among other things, combines a base character followed by combining marks into a single precomposed Unicode character where one exists. So it won't work on あ゛ (there is no precomposed あ with a dakuten), but it will work on examples like the one above.
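A quick check of both halves of that claim. The composition step is the same one NFC performs; NFKC additionally applies compatibility mappings, so it also rewrites characters such as full-width Latin letters (Ａ becomes A). Whether that extra folding is acceptable depends on the sheet's contents, so this sketch compares the two forms side by side.

```python
import unicodedata

decomposed_du = "\u3064\u3099"  # つ + combining dakuten: precomposed づ exists
decomposed_a = "\u3042\u3099"   # あ + combining dakuten: no precomposed form
fullwidth_a = "\uff21"          # FULLWIDTH LATIN CAPITAL LETTER A

for text in (decomposed_du, decomposed_a, fullwidth_a):
    nfc = unicodedata.normalize("NFC", text)
    nfkc = unicodedata.normalize("NFKC", text)
    # ascii() shows the escapes, so the code points are visible in the output
    print(ascii(text), "->", "NFC:", ascii(nfc), "NFKC:", ascii(nfkc))
```

Both forms turn つ + dakuten into づ and leave あ + dakuten as two code points; only NFKC changes the full-width letter.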

In Python:

    import unicodedata
    # a = '[bad text]', e.g. 基 + つ + U+3099 (combining dakuten) + き
    a = '基\u3064\u3099き'
    fixed = unicodedata.normalize('NFKC', a)  # '基づき' with a precomposed づ
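To apply this to a whole sheet, one possible workflow (an assumption, not something verified against Google Sheets here) is to download the sheet as CSV, normalize every cell, and import the result back. A minimal sketch with the standard `csv` module; the function name `normalize_csv` is hypothetical.

```python
import csv
import io
import unicodedata

def normalize_csv(text: str, form: str = "NFKC") -> str:
    """Return CSV text with every cell normalized to the given Unicode form."""
    # newline="" lets the csv module handle line breaks inside quoted cells
    reader = csv.reader(io.StringIO(text, newline=""))
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for row in reader:
        writer.writerow(unicodedata.normalize(form, cell) for cell in row)
    return out.getvalue()
```

In practice you would read the downloaded file (say, a hypothetical `sheet.csv`) with `encoding="utf-8"` and `newline=""`, pass its contents through `normalize_csv`, and write the result to a new file for re-import. The `form` parameter lets you pass `"NFC"` instead if the compatibility folding of NFKC is unwanted.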

If you search for "nfkc normalization online", there are also web forms you can paste into that will do it for you.

It's unfortunate you have to do this, but it's probably easier than waiting on Google to fix things.

I'm curious how you ended up with documents containing so many characters like that. While it's true that they won't normally cause problems in most Japanese applications, I would expect most documents to use precomposed characters by default.