What are you using this trie for? What is the total number of words that you plan to hold, and what is the sparseness of their constituent characters? And most important, is a trie even appropriate (versus a simple map of prefix to list of words)?
Your idea of an intermediate table and replacing pointers with indexes will work, provided that you have a relatively small set of short words and a sparse character set. Otherwise you risk running out of space in your intermediate table. And unless you're looking at an extremely small set of words, you won't really save that much space: 2 bytes for a short versus 4 bytes for a reference on a 32-bit machine. If you're running on a 64-bit JVM, the savings will be more.
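For illustration, here is a minimal sketch of what such an index-based layout might look like in Java, assuming words limited to uppercase A-Z and using 0 as the "no child" marker; the class and field names are made up for the example:

```java
import java.util.Arrays;

// Sketch of a trie whose child links are short indexes into a shared node
// table instead of object references. Assumes uppercase A-Z only; a slot
// value of 0 means "no child", so node 0 is reserved for the root.
class ShortIndexTrie {
    private static final int ALPHABET = 26;
    private short[][] children = { new short[ALPHABET] }; // node 0 = root
    private int size = 1;

    void insert(String word) {
        int node = 0;
        for (char c : word.toCharArray()) {
            int slot = c - 'A';
            if (children[node][slot] == 0) {
                if (size == Short.MAX_VALUE) {
                    // this is the "running out of space" risk mentioned above
                    throw new IllegalStateException("node table full");
                }
                children = Arrays.copyOf(children, size + 1); // naive growth, for brevity
                children[size] = new short[ALPHABET];
                children[node][slot] = (short) size++;
            }
            node = children[node][slot];
        }
    }
}
```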
Your idea about breaking the characters into 4-bit chunks probably won't save you much, unless all of your expected characters are in an extremely limited range (maybe OK for words limited to uppercase US-ASCII, not likely with a general Unicode corpus).
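As a quick illustration of that idea, splitting a Java char into 4-bit chunks could look something like the following sketch (the helper name is made up): each node then needs at most 16 child slots, but every character costs four levels of depth instead of one.

```java
// Splits one 16-bit char into four 4-bit chunks, most significant first.
static int[] nibbles(char c) {
    return new int[] { (c >>> 12) & 0xF, (c >>> 8) & 0xF, (c >>> 4) & 0xF, c & 0xF };
}
```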
If you have a sparse character set, then a HashMap<Character,Map<...>> might be your best implementation. Yes, each entry will be much larger, but if you don't have many entries you'll get an overall win. (As a side note: I always thought it was funny that the Wikipedia article on tries showed -- maybe still does -- an example based on a hashed data structure, completely ignoring the space/time tradeoffs of that choice.)
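A minimal sketch of that map-based node, assuming the usual "insert walks down, creating children as needed" shape (class and field names are illustrative):

```java
import java.util.HashMap;
import java.util.Map;

// Trie node whose children are held in a HashMap keyed by Character.
// Each entry costs more space, but only characters that actually occur
// get a slot.
class MapTrieNode {
    private final Map<Character, MapTrieNode> children = new HashMap<>();
    private boolean isWord;

    void insert(String word) {
        MapTrieNode node = this;
        for (char c : word.toCharArray()) {
            node = node.children.computeIfAbsent(c, k -> new MapTrieNode());
        }
        node.isWord = true;
    }
}
```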
Finally, you might want to avoid a trie altogether. If you're looking at a corpus of normal words in a human language (10,000 words in active use, with words 4-8 characters long), you'll probably be MUCH better off with a HashMap<String,List<String>>, where the key is the entire prefix.
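For a corpus of that size, the prefix map could be as simple as the following sketch (names are illustrative): every prefix of every word maps straight to the words that start with it, so a lookup is a single hash probe. You trade a handful of extra entries per word for not walking a tree at all.

```java
import java.util.*;

// Sketch of the prefix-map alternative: every prefix of every word is a
// key, and the value is the list of words starting with that prefix.
class PrefixIndex {
    private final Map<String, List<String>> byPrefix = new HashMap<>();

    void add(String word) {
        for (int i = 1; i <= word.length(); i++) {
            byPrefix.computeIfAbsent(word.substring(0, i), k -> new ArrayList<>())
                    .add(word);
        }
    }

    List<String> wordsWithPrefix(String prefix) {
        return byPrefix.getOrDefault(prefix, Collections.emptyList());
    }
}
```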
Best Answer
Chances are that this method only supports the Basic Multilingual Plane of Unicode. That plane contains the lowest 64k codepoints and can be represented with a 16-bit data type.
There was a time when the BMP was all the Unicode standard defined, and at that time many languages and/or runtimes added "Unicode support". They assumed that 16 bits would always be enough, and therefore "Unicode" equals "16-bit characters" in many places (even though this is wrong these days). To be fair: the Unicode consortium also thought that 16 bits ought to be enough for everybody.
Unicode 2.0, however, introduced additional planes, and it became clear that 16 bits are no longer enough to represent every possible Unicode codepoint.
The "solution" to this is usually to use UTF-16 instead of UCS-2. I'm not only faulting .NET for this: Java has fallen into the same trap, having a 16-bit
char
data type and now having to supportString
instances that need 2 "characters" to represent a single codepoint.