PHP – Why exactly can’t PHP have full Unicode support?


Everybody knows that PHP has problems with Unicode. Version 6 has effectively been abandoned because of difficulties implementing Unicode. But does anyone know the exact reasons? Architecture/design problems, performance concerns, community problems (I bet not), or something else?

Best Answer

PHP as a language definitely can have it, but I think the problem is compatibility with existing programs. Unicode support can break them in subtle ways, which is the most annoying kind of bug to have.

Currently most string-processing functions in PHP are "binary-safe", which means you can use them to process any file in any encoding, as well as binary formats such as image data.
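As a rough illustration of what "binary-safe" means, here is a sketch that uses Python's `bytes` type as a stand-in for PHP's traditional byte strings. This is an analogy, not PHP code. The same byte-level operations work on UTF-8 text and on a PNG signature alike, but lengths and offsets count bytes, not characters.

```python
# Python's `bytes` is a byte-oriented, "binary-safe" string type, used here
# as a stand-in for PHP's traditional strings (an analogy, not PHP itself).

text = "héllo".encode("utf-8")      # UTF-8 text stored as raw bytes
png_header = b"\x89PNG\r\n\x1a\n"   # the 8-byte PNG file signature

# Byte-level operations treat both the same way: nothing inspects the encoding.
for data in (text, png_header):
    print(len(data), data[:2], data.find(b"\n"))

# The catch: lengths and offsets count bytes, not characters.
print(len(text))                    # 6 bytes ("é" takes two bytes in UTF-8)
print(len(text.decode("utf-8")))    # 5 characters
```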

With the addition of Unicode strings, you'd have to be very careful not to mix Unicode strings with binary strings, which is pretty hard when your strings come from different sources and you never had to worry about it before. And you could no longer be ignorant about encodings (and lots of scripts are!).
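Python 3 made roughly this split between text and binary strings, so it shows the kind of friction involved. This is only an analogy; it is not how PHP 6 was designed. Mixing the two types is rejected outright, and combining them forces you to name an encoding, where a wrong guess silently produces garbled text.

```python
# Python 3 keeps text (str) and binary data (bytes) as separate types.
# Analogy only: this is not PHP 6's actual design.

label = "Name: "                      # text: a sequence of code points
payload = "café".encode("utf-8")      # bytes, e.g. read from a file or socket

try:
    label + payload                   # mixing text and bytes is rejected
except TypeError as exc:
    print("TypeError:", exc)

# Combining them forces a decision about which encoding the bytes are in.
right = label + payload.decode("utf-8")    # 'Name: café'
wrong = label + payload.decode("latin-1")  # 'Name: cafÃ©', silently garbled
print(right)
print(wrong)
```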

Another hard but solvable problem is random access in Unicode strings. Because characters no longer have a fixed width, the implementation of $string[$offset] changes from trivial to either very slow, or a little slow and very complex.
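A minimal sketch of the "very slow" option, assuming UTF-8 storage: indexing by byte is O(1), but finding the n-th character means scanning from the start and counting lead bytes. The function name `char_at` is hypothetical, and the faster, more complex alternatives (such as caching an index of character offsets) are not shown.

```python
def char_at(data: bytes, index: int) -> str:
    """Return the index-th code point of UTF-8 `data` (hypothetical helper).

    A byte offset (data[i]) is O(1), but a character offset is O(n):
    UTF-8 characters are 1 to 4 bytes long, so we must scan and count
    every byte that starts a character.
    """
    if index < 0:
        raise IndexError("character index out of range")
    count = -1
    start = None
    for pos, byte in enumerate(data):
        if (byte & 0xC0) != 0x80:          # not a continuation byte (10xxxxxx)
            count += 1
            if count == index:
                start = pos                # first byte of the wanted character
            elif count == index + 1:
                return data[start:pos].decode("utf-8")
    if start is None:
        raise IndexError("character index out of range")
    return data[start:].decode("utf-8")    # wanted character is the last one


s = "naïve 😀!".encode("utf-8")
print(len(s))           # 12 bytes for 8 characters
print(s[3])             # 175 (0xAF): the second byte of "ï", not a character
print(char_at(s, 6))    # 😀
```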

Also, I think it was a mistake to choose UTF-16 as PHP's internal encoding. It has the same problem as UTF-8 (variable width, because of surrogate pairs) plus the inefficiency of UCS-2 (two bytes per character even for plain ASCII). Maybe they should scrap that and start again with UTF-8?
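To make the UTF-16 point concrete, here is a small sketch using Python's built-in codecs. Characters outside the Basic Multilingual Plane, such as an emoji, need two 16-bit code units (a surrogate pair), so UTF-16 is variable width just like UTF-8. Meanwhile, ASCII text takes twice as many bytes as it does in UTF-8. The helper name `utf16_code_units` is hypothetical.

```python
def utf16_code_units(s: str) -> int:
    """Number of 16-bit code units needed to store `s` in UTF-16."""
    return len(s.encode("utf-16-le")) // 2   # "-le" variant: no BOM added


for s in ["hello", "naïve", "😀"]:
    print(repr(s), len(s), "code points,",
          len(s.encode("utf-8")), "UTF-8 bytes,",
          utf16_code_units(s), "UTF-16 code units")

# U+1F600 is stored in UTF-16 as the surrogate pair D83D DE00.
b = "😀".encode("utf-16-le")
units = [int.from_bytes(b[i:i + 2], "little") for i in range(0, len(b), 2)]
print([hex(u) for u in units])   # ['0xd83d', '0xde00']
```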

