What is a good way to tell whether a string contains text in a right-to-left (RTL) language?
I have found this question, which suggests the following approach:
public bool IsArabic(string strCompare)
{
    char[] chars = strCompare.ToCharArray();
    foreach (char ch in chars)
        if (ch >= '\u0627' && ch <= '\u0649') return true;
    return false;
}
While this may work for Arabic, it doesn't seem to cover other RTL languages such as Hebrew. Is there a generic way to know whether a particular character belongs to an RTL language?
Best Answer
Unicode characters have different properties associated with them. These properties cannot be derived from the code point; you need a table that tells you if a character has a certain property or not.
You are interested in characters with bidirectional property "R" (right-to-left) or "AL" (Arabic letter), which RFC 3454 groups together as RandALCat.
The complete list as of Unicode 3.2 is given in RFC 3454 (Appendix D.1).
Here's some code to get the complete list as of Unicode 6.0:
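A minimal sketch of such an extraction, assuming the input is the Unicode Character Database file UnicodeData.txt (semicolon-separated fields, with the code point in field 0, the name in field 1, and the bidirectional class in field 4). The names `BidiTableBuilder` and `ReadRandALCat` are hypothetical, chosen for illustration:

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.IO;

public static class BidiTableBuilder
{
    // Reads UnicodeData.txt-formatted lines and returns the inclusive code point
    // ranges whose bidirectional class (field 4) is "R" or "AL".
    // Assumption: entries appear in ascending code point order, as in the UCD file.
    public static List<(int First, int Last)> ReadRandALCat(TextReader reader)
    {
        var ranges = new List<(int First, int Last)>();
        int pendingFirst = 0; // start of a "<..., First>" / "<..., Last>" pair
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            string[] fields = line.Split(';');
            if (fields.Length < 5) continue; // skip blank or malformed lines

            int cp = int.Parse(fields[0], NumberStyles.HexNumber, CultureInfo.InvariantCulture);
            int first = cp;

            // Large blocks are written as two lines: "<Name, First>" and "<Name, Last>".
            if (fields[1].EndsWith(", First>", StringComparison.Ordinal))
            {
                pendingFirst = cp;
                continue;
            }
            if (fields[1].EndsWith(", Last>", StringComparison.Ordinal))
            {
                first = pendingFirst;
            }

            if (fields[4] != "R" && fields[4] != "AL") continue;

            // Merge with the previous range when the code points are contiguous.
            int n = ranges.Count;
            if (n > 0 && ranges[n - 1].Last + 1 == first)
                ranges[n - 1] = (ranges[n - 1].First, cp);
            else
                ranges.Add((first, cp));
        }
        return ranges;
    }
}
```

Passing a `StreamReader` opened on a local copy of UnicodeData.txt for the Unicode version you care about should yield the table for that version; the result can then be printed or embedded as a static array.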
Note that these values are Unicode code points. Strings in C#/.NET are UTF-16 encoded and need to be converted to Unicode code points first (see Char.ConvertToUtf32). Here's a method that checks if a string contains at least one RandALCat character:
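One way such a check might look is sketched below. The range table here is a deliberately abbreviated, illustrative subset (Hebrew and Arabic letters only), not the complete RandALCat list, and the names `RtlDetector`, `SampleRanges`, and `ContainsRandALCat` are hypothetical. A real implementation would substitute the full table for the Unicode version it targets.

```csharp
using System;

public static class RtlDetector
{
    // Illustrative subset of RandALCat ranges (inclusive code points).
    // This is NOT the complete table; replace it with the full list.
    private static readonly (int First, int Last)[] SampleRanges =
    {
        (0x05D0, 0x05EA), // Hebrew letters
        (0x0621, 0x063A), // Arabic letters
        (0x0640, 0x064A), // Arabic tatweel and letters
    };

    // Returns true if the string contains at least one code point
    // that falls inside one of the ranges above.
    public static bool ContainsRandALCat(string text)
    {
        for (int i = 0; i < text.Length; i++)
        {
            int cp = text[i];

            // Combine a valid surrogate pair into a single code point.
            // Lone surrogates are left as-is and simply match no range.
            if (char.IsHighSurrogate(text[i]) && i + 1 < text.Length
                && char.IsLowSurrogate(text[i + 1]))
            {
                cp = char.ConvertToUtf32(text[i], text[i + 1]);
                i++; // skip the low surrogate
            }

            foreach (var range in SampleRanges)
            {
                if (cp >= range.First && cp <= range.Last) return true;
            }
        }
        return false;
    }
}
```

A linear scan over the ranges keeps the sketch short; with the full table, a binary search over sorted ranges would be a natural refinement.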