What’s the point of adding Unicode identifier support to various language implementations?

unicode

I personally find code full of Unicode identifiers confusing to read. In my opinion, it also makes the code harder to maintain. Not to mention all the effort required from the authors of compilers and interpreters to implement such support. I also keep noticing the lack (or presence) of Unicode identifier support in lists of (dis)advantages of various language implementations (as if it really mattered). I don't get it: why so much attention?

Best Answer

When you think of Unicode, you think of Chinese or Russian characters, which reminds you of some source code written in Russian that you once saw on the internet and that was unusable (unless you know Russian).

But the fact that Unicode can be misused doesn't make it bad in source code by itself.

When writing code for a specialized domain, Unicode lets you shorten your code and make it more readable. Instead of:

const numeric Pi = 3.1415926535897932384626433832795;
numeric firstAlpha = deltaY / deltaX + Pi;
numeric secondAlpha = this.Compute(firstAlpha);
Assert.AreEqual(math.Infinity, secondAlpha);

you can write:

const numeric π = 3.1415926535897932384626433832795;
numeric α₁ = Δy / Δx + π;
numeric α₂ = this.Compute(α₁);
Assert.AreEqual(math.∞, α₂);

which may not be easy to read for an average developer, but is still easy to read for a person who uses mathematical symbols daily.
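As a rough illustration, here is the same calculation as a runnable Python 3 sketch, used only as a stand-in for the C#-style snippets above. It relies on Python's identifier rules (PEP 3131), which other languages may not share, and `compute` is a hypothetical stand-in for `this.Compute`. Not every mathematical glyph counts as a letter, though: in Python the subscript digit in `α₁` and the `∞` sign are rejected as identifier characters, so the sketch uses `α1` and the standard `math.inf` instead.

```python
import math

# Greek letters are Unicode letters, so Python 3 accepts them in names.
π = math.pi
Δy, Δx = 3.0, 4.0

# Subscript digits such as ₁ (U+2081) are not identifier characters
# in Python, so an ordinary digit stands in for the subscript.
α1 = Δy / Δx + π

def compute(value):
    """Hypothetical stand-in for the answer's this.Compute()."""
    return value * 2

α2 = compute(α1)

# ∞ is a math symbol rather than a letter, so it cannot be a name;
# the standard library spells infinity as math.inf.
print(math.inf > α2)          # True
print("α₁".isidentifier())    # False
print("∞".isidentifier())     # False
```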

Or, when writing an application for SLR photography, instead of:

int aperture = currentLens.GetMaximumAperture();
Assert.AreEqual(this.Aperture1_8, aperture);

you can replace "aperture" with its symbol, ƒ, for a notation closer to the familiar ƒ/1.8:

int ƒ = currentLens.GetMaximumƒ();
Assert.AreEqual(this.ƒ1¸8, ƒ);
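A similar Python 3 sketch of the photography example follows. `Lens`, `get_maximum_ƒ` and the 1.8 value are hypothetical names and data mirroring `currentLens.GetMaximumƒ()`, not part of any real camera library. `ƒ` (U+0192) is a Unicode letter, so Python accepts it in names; the cedilla used as a decimal separator in `ƒ1¸8` (U+00B8) is a symbol, so Python would reject that particular name.

```python
class Lens:
    """Hypothetical lens model; not a real camera library API."""

    def __init__(self, maximum_ƒ):
        self._maximum_ƒ = maximum_ƒ

    def get_maximum_ƒ(self):
        return self._maximum_ƒ

current_lens = Lens(1.8)
ƒ = current_lens.get_maximum_ƒ()
print(ƒ)                       # 1.8

# ƒ is a letter and works in names; the cedilla is a symbol and does not.
print("ƒ".isidentifier())      # True
print("ƒ1¸8".isidentifier())   # False
print("ƒ1_8".isidentifier())   # True
```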

This can also be inconvenient. When typing general-purpose C# code, I would rather write:

var productPrices = this.Products.Select(c => c.Price);
double average = productPrices.Average();
double sum = productPrices.Sum();

rather than:

var productPrices = this.Products.Select(c => c.Price);
double average = productPrices.x̅();
double sum = productPrices.Σ();

because in the first case IntelliSense lets me write almost the whole thing with very little typing and, above all, without touching the mouse, whereas in the second case I have no idea how to type those symbols and would have to reach for the mouse to hunt for them in the auto-completion list.
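For what it's worth, the symbolic names in the second snippet are themselves legal in some languages. In Python 3, for instance, `Σ` is a Greek capital letter and `x̅` is an `x` followed by a combining overline, both valid identifier characters. This sketch defines them as hypothetical thin aliases over the built-in `sum()` and `statistics.mean()`, with made-up sample prices; the point above still stands, since typing them is the hard part.

```python
from statistics import mean

def Σ(values):
    """Hypothetical symbolic alias for the built-in sum()."""
    return sum(values)

def x̅(values):
    """Hypothetical symbolic alias for statistics.mean()."""
    return mean(values)

# Hypothetical sample data standing in for this.Products.Select(c => c.Price).
product_prices = [10, 20, 30]
average = x̅(product_prices)
total = Σ(product_prices)
print(average, total)
```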

That said, it's still useful in some cases. The currentLens.GetMaximumƒ() call from my earlier example can still rely on IntelliSense, because the plain ASCII prefix GetMaximum is enough to find it, so it is as easy to type as GetMaximumAperture while being shorter and more readable. Also, in domains with lots of symbols, keyboard shortcuts can make typing the symbols quicker than typing their spelled-out equivalents.

The same applies to comments, by the way. No one wants to read code full of comments in Chinese (unless they read Chinese well). But in some programming languages, Unicode symbols can still be useful in comments. One example is footnotes¹.


¹ I certainly wouldn't enjoy footnotes in C# code, where there is a strict set of style rules for how to write comments. In PHP, on the other hand, if there are lots of things to explain but they are not very important, why not put them at the bottom of the file and reference them with a footnote in the method's PHPDoc?