Google Sheets – Syntax for Unicode Characters in Regex

formulasgoogle sheetsregexunicode

What's the syntax to use for general Unicode characters in regular expressions in Google Sheets?

Use case: When using REGEXREPLACE with user submitted strings in Google Sheets, I have come across a special Unicode white-space character (No-break space, hex 0xA0 or dec 160) that resisted my efforts to be matched by the standard white-space character class \s.

Google products use RE2 for regular expressions. Google Sheets supports RE2 except Unicode character class matching. Learn more on how to use RE2 expressions. Source

The reference is quite dense and a bit confusing, I haven't managed to make it work yet using it.

Best Answer

There are three ways of matching Unicode characters according to Google Sheets' regular expression documentation:

  • Using exactly two digit hex code: \xA0
  • Using up to three digits octal code: \240
  • Using any length of hex: \x{A0} or \x{0A0} or \x{0000000A0} etc. (any reasonable number of leading 0's is allowed apparently)

Other ways of specifying characters that belong to RE2 are sadly not supported by Google Sheets, e.g. \%d160