Google Takeout – Fix Non-ASCII Character Issues

gmail google-takeout

I have downloaded emails from Gmail via Google Takeout, one label to be exact. Looking closer at the downloaded contents (in MBOX format), it turns out that all non-ASCII characters have been replaced by the byte sequence EF BF BD: the UTF-8 encoded Unicode REPLACEMENT CHARACTER (U+FFFD: �)!
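To check how badly an export is affected, the raw bytes can be scanned for that sequence. The following is only a minimal sketch: the function names and the file path in the usage note are made up for illustration and are not part of Takeout or any library.

```python
from typing import Iterable

# UTF-8 encoding of U+FFFD REPLACEMENT CHARACTER
REPLACEMENT = b"\xef\xbf\xbd"


def count_replacements(lines: Iterable[bytes]) -> int:
    """Count occurrences of EF BF BD across an iterable of byte lines."""
    return sum(line.count(REPLACEMENT) for line in lines)


def count_in_file(path: str) -> int:
    """Scan a file (for example an MBOX export) line by line in binary mode.

    Reading in binary mode avoids any decoding step that could itself
    alter the bytes being inspected.
    """
    with open(path, "rb") as handle:
        return count_replacements(handle)
```

Calling `count_in_file("MyLabel.mbox")` (a hypothetical file name) returns the number of replacement characters in the export; a non-zero result on mail that you know contained accented characters suggests the export was affected.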

Is there a way to fix or avoid this?

Edit: The emails do not include any Content-Type header, so Gmail cannot know which character set they use. The Gmail web interface displays them just fine, so it probably guesses ISO-8859-1 (Latin-1) or whatever fits the specific email best. Takeout, by contrast, seems to assume UTF-8 and replaces every byte that is not valid UTF-8 with the replacement character…
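The suspected mechanism is easy to reproduce. The sketch below does not show what Takeout actually does internally (that is an assumption); it only illustrates how decoding Latin-1 bytes as UTF-8 with replacement yields exactly the EF BF BD sequence, and why the original character cannot be recovered afterwards.

```python
# "café" as it would appear in a Latin-1 email without a charset declaration
raw = "café".encode("latin-1")           # b'caf\xe9'

# Guessing the right charset gives the intended text
as_latin1 = raw.decode("latin-1")        # 'café'

# Assuming UTF-8 instead: 0xE9 is not valid UTF-8 here, so it is replaced
as_utf8 = raw.decode("utf-8", errors="replace")   # 'caf\ufffd'

# Writing the result back out as UTF-8 produces the EF BF BD bytes
reencoded = as_utf8.encode("utf-8")      # b'caf\xef\xbf\xbd'

print(raw, as_latin1, as_utf8, reencoded)
```

Every invalid byte maps to the same U+FFFD, so once the export has been written the information about which character was originally there is gone; the fix has to happen before or instead of that conversion.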

I have not found any way around this within Takeout, other than making sure the original emails are encoded in UTF-8, which is not always under my control.

Best Answer

I don't know of a way to avoid this annoying problem within Takeout, but you may be able to work around it by enabling IMAP or POP access and downloading the mail to your computer with a client such as Thunderbird.
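If you would rather script the download than use a mail client, Python's standard `imaplib` can fetch the raw messages. This is a rough sketch under several assumptions: that IMAP access is enabled for the account, that the server accepts the credentials you pass (Gmail may require an app password or OAuth rather than the normal password), and that the IMAP server returns the original bytes unchanged, which I have not verified. The function names, the output layout, and the Latin-1 fallback are illustrative choices, not anything prescribed by Gmail.

```python
import imaplib
from pathlib import Path


def decode_with_fallback(raw: bytes, fallback: str = "latin-1") -> str:
    """Decode as UTF-8 if possible, otherwise with an assumed fallback charset.

    The fallback is a guess for mail that declares no charset; pick
    whatever matches your messages.
    """
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        return raw.decode(fallback)


def download_label(host: str, user: str, password: str,
                   label: str, out_dir: str) -> int:
    """Save every message in one IMAP mailbox (Gmail label) as a raw .eml file.

    Returns the number of messages written. The bytes are stored exactly
    as received, so no charset guess is baked into the files.
    """
    target = Path(out_dir)
    target.mkdir(parents=True, exist_ok=True)
    saved = 0
    with imaplib.IMAP4_SSL(host) as conn:
        conn.login(user, password)
        # readonly=True avoids changing message flags on the server.
        # Mailbox names containing spaces may need to be quoted.
        status, _ = conn.select(label, readonly=True)
        if status != "OK":
            raise RuntimeError(f"cannot select mailbox {label!r}")
        status, data = conn.search(None, "ALL")
        if status != "OK":
            raise RuntimeError("search failed")
        for num in data[0].split():
            status, parts = conn.fetch(num, "(RFC822)")
            if status != "OK" or not parts or parts[0] is None:
                continue  # skip messages that could not be fetched
            raw_message = parts[0][1]
            (target / f"{num.decode()}.eml").write_bytes(raw_message)
            saved += 1
    return saved
```

A call might look like `download_label("imap.gmail.com", "me@example.com", "app-password", "MyLabel", "backup")`, where every argument value is a placeholder. Because the files hold raw bytes, `decode_with_fallback` can be applied later when reading them, and a wrong guess can be corrected without downloading again.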