Twitter – How Did Someone Hack the 140-Character Limit?


Today I was surprised to see that someone had hacked Twitter's 140-character limit: the message is 930 characters long. How is this possible?

The direct link to this tweet is here. For convenience, I'm copying a screenshot of the full tweet below:

[Screenshot of the full tweet]

Best Answer

The message contains Unicode surrogate code points that are improperly encoded as UTF-8. This kind of improper encoding is also called CESU-8. It appears that some Twitter interfaces accept the CESU-8-encoded surrogate code points as characters for the purpose of the 140-character limit, but the display code expects valid UTF-8, and these are not valid UTF-8 sequences. So instead it displays the 3 bytes of each of these sequences as 3 C-style octal escape sequences of 4 characters each, and each surrogate code point ends up being displayed using 12 characters. Since every character outside the Basic Multilingual Plane becomes a surrogate pair, each such character is displayed as 24 characters.
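To make the mechanism concrete, here is a minimal Python sketch (not Twitter's code) that CESU-8-encodes one supplementary character and then simulates a display layer that decodes strict UTF-8 and, as an assumption, replaces every invalid byte with a C-style octal escape. The names `cesu8_encode`, `display_as_utf8` and the `octal_escape` error handler are hypothetical, made up for illustration.

```python
import codecs


def cesu8_encode(text: str) -> bytes:
    """Encode text as CESU-8: characters above U+FFFF are split into a
    UTF-16 surrogate pair, and each surrogate is written as a 3-byte
    UTF-8-style sequence (which strict UTF-8 forbids)."""
    out = bytearray()
    for ch in text:
        cp = ord(ch)
        if cp > 0xFFFF:
            cp -= 0x10000
            high = 0xD800 + (cp >> 10)    # high (lead) surrogate
            low = 0xDC00 + (cp & 0x3FF)   # low (trail) surrogate
            for unit in (high, low):
                # 'surrogatepass' lets Python emit the normally rejected bytes
                out += chr(unit).encode("utf-8", "surrogatepass")
        else:
            out += ch.encode("utf-8")
    return bytes(out)


def _octal_escape(exc: UnicodeDecodeError) -> tuple:
    # Hypothetical display behaviour: each byte the strict UTF-8 decoder
    # rejects is shown as a 4-character C-style octal escape like \355.
    bad = exc.object[exc.start:exc.end]
    return "".join("\\%03o" % b for b in bad), exc.end


codecs.register_error("octal_escape", _octal_escape)


def display_as_utf8(data: bytes) -> str:
    """Decode as strict UTF-8, escaping invalid bytes instead of failing."""
    return data.decode("utf-8", errors="octal_escape")


raw = cesu8_encode("\U0001D4D0")  # U+1D4D0 MATHEMATICAL BOLD SCRIPT CAPITAL A
shown = display_as_utf8(raw)
print(raw)         # b'\xed\xa0\xb5\xed\xb3\x90'
print(shown)       # \355\240\265\355\263\220
print(len(shown))  # 24
```

Valid UTF-8 (ASCII or Cyrillic, for instance) passes through `display_as_utf8` unchanged; only the surrogate sequences are expanded, which is how a short tweet can render as hundreds of characters.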

For example, \355\240\265\355\263\220, decoded as C-escaped UTF-8 but without rejecting surrogates (which a UTF-8 decoder would normally do), yields the surrogate pair U+D835 U+DCD0. Treating this surrogate pair as UTF-16, as is done when decoding CESU-8, produces the Unicode character U+1D4D0 MATHEMATICAL BOLD SCRIPT CAPITAL A (𝓐).
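The same example can be worked through by hand with bit masks instead of a library codec. This is a minimal Python sketch; `decode_c_octal`, `decode_3byte` and `combine_surrogates` are hypothetical helper names, and the pairing step uses the standard UTF-16 surrogate formula.

```python
import unicodedata


def decode_c_octal(escaped: str) -> bytes:
    # Hypothetical helper for input made only of \ooo escapes.
    return bytes(int(tok, 8) for tok in escaped.split("\\")[1:])


def decode_3byte(b0: int, b1: int, b2: int) -> int:
    # 1110xxxx 10yyyyyy 10zzzzzz -> xxxxyyyyyyzzzzzz. Unlike a strict UTF-8
    # decoder, this does not reject results in the surrogate range.
    return ((b0 & 0x0F) << 12) | ((b1 & 0x3F) << 6) | (b2 & 0x3F)


def combine_surrogates(high: int, low: int) -> int:
    # UTF-16: 0x10000 + (high - 0xD800) * 0x400 + (low - 0xDC00)
    if not (0xD800 <= high <= 0xDBFF and 0xDC00 <= low <= 0xDFFF):
        raise ValueError("not a valid surrogate pair")
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


raw = decode_c_octal(r"\355\240\265\355\263\220")
high = decode_3byte(*raw[0:3])
low = decode_3byte(*raw[3:6])
cp = combine_surrogates(high, low)
print(f"U+{high:04X} U+{low:04X} -> U+{cp:X}")  # U+D835 U+DCD0 -> U+1D4D0
print(unicodedata.name(chr(cp)))               # MATHEMATICAL BOLD SCRIPT CAPITAL A
```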

If the C-style octal escaping is decoded and then the result is interpreted as CESU-8, it comes out to:

π“π“›π“œπ“π“£π“¨ π“π“›π“œπ“π“£π“¨ π“π“›π“œπ“π“£π“¨ Π’Π²ΠΈΡ‚Ρ‚ΠΈΠΌ ΠΈ Π½Π΅ ограничиваСмся людиии!!!!!! 140 Π½Π΅ ΠΏΡ€Π΅Π΄Π΅Π»!=)))) π“π“›π“œπ“π“£π“¨ π“π“›π“œπ“π“£π“¨ π“π“›π“œπ“π“£π“¨

Here it is as an image, for those without a full set of Unicode fonts installed:

π“π“›π“œπ“π“£π“¨ π“π“›π“œπ“π“£π“¨ π“π“›π“œπ“π“£π“¨ Π’Π²ΠΈΡ‚Ρ‚ΠΈΠΌ ΠΈ Π½Π΅ ограничиваСмся людиии!!!!!! 140 Π½Π΅ ΠΏΡ€Π΅Π΄Π΅Π»!=)))) π“π“›π“œπ“π“£π“¨ π“π“›π“œπ“π“£π“¨ π“π“›π“œπ“π“£π“¨