PHP – What does the lack of Unicode support in PHP mean?

PHP unicode

How can the lack of Unicode support in PHP affect a PHP web app?

Best Answer

Any website that purports to be multi-lingual or to deal with documents or content that is not representable in Latin-1 is likely to be problematic if you don't have Unicode support.

  • For example, http://amazon.jp would be toast without Unicode.
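To make "not representable in Latin-1" concrete, here is a minimal sketch that pushes text through ISO-8859-1 and back and checks whether it survives. It assumes PHP 7+ (for the `\u{...}` escapes) and that the mbstring extension is loaded; `roundTripsThroughLatin1()` is a hypothetical helper written for this illustration, not a built-in.

```php
<?php
// Sketch only: assumes PHP 7+ and the mbstring extension.
// roundTripsThroughLatin1() is a hypothetical helper, not a PHP built-in.
function roundTripsThroughLatin1(string $utf8): bool
{
    // Convert UTF-8 to Latin-1 and back. Characters that Latin-1 cannot
    // represent are substituted or dropped, so the text no longer matches.
    $latin1 = mb_convert_encoding($utf8, 'ISO-8859-1', 'UTF-8');
    $back = mb_convert_encoding($latin1, 'UTF-8', 'ISO-8859-1');
    return $back === $utf8;
}

$french = "caf\u{E9}";                   // "café": fits in Latin-1
$japanese = "\u{65E5}\u{672C}\u{8A9E}";  // "日本語": does not fit

var_dump(roundTripsThroughLatin1($french));    // bool(true)
var_dump(roundTripsThroughLatin1($japanese));  // bool(false)
```

Western European text survives the trip; Japanese text does not, which is why a site like the one above cannot be built on an 8-bit character set alone.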

Another problematic use-case is when content might contain mathematical and other symbols.


However, your example of Facebook suggests that you can in fact "do" Unicode in PHP. (The alternative is that http://facebook.jp is not implemented in PHP.) Either way, the home page says:

<meta http-equiv="Content-type" content="text/html; charset=utf-8" />

and has lots of UTF-8 content.


OK, here's what the PHP doc for "String" says:

"A string is series of characters, therefore, a character is the same as a byte. That is, there are exactly 256 different characters possible. This also implies that PHP has no native support of Unicode. See utf8_encode() and utf8_decode() for some basic Unicode functionality."

So PHP does have some Unicode support. It is just that "native strings" are byte-based rather than Unicode-based.

So what it means is that if you need to deal with any language (or set of languages) that cannot be encoded in an 8-bit character set, your PHP code is going to be more cumbersome at any point where it needs to process content as (real) characters.
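As a rough illustration of that extra work, the sketch below compares the byte-oriented native string functions with their mbstring counterparts on a UTF-8 string. It assumes PHP 7+ and that the mbstring extension is available; the variable names are made up for the example.

```php
<?php
// Sketch only: assumes PHP 7+ and the mbstring extension.
$word = "\u{65E5}\u{672C}\u{8A9E}"; // "日本語": 3 characters, 9 bytes in UTF-8

// Native string functions count and slice bytes, not characters.
$byteLength = strlen($word);           // 9
$brokenPrefix = substr($word, 0, 1);   // one byte: a fragment of the first character

// The mbstring functions work on characters once told the encoding.
$charLength = mb_strlen($word, 'UTF-8');       // 3
$firstChar = mb_substr($word, 0, 1, 'UTF-8');  // "日"

var_dump($byteLength, $charLength);
var_dump(mb_check_encoding($brokenPrefix, 'UTF-8')); // bool(false)
var_dump($firstChar === "\u{65E5}");                 // bool(true)
```

Every place that measures, truncates, or indexes user-visible text has to remember to use the character-aware call and pass the right encoding, which is where the cumbersomeness comes from.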
