C++ – When did C++ compilers start considering more than two hex digits in string literal character escapes


I've got a (generated) literal string in C++ that may contain characters that need to be escaped using the \x notation. For example:

char foo[] = "\xABEcho";

However, g++ (version 4.1.2, if it matters) reports an error:

test.cpp:1: error: hex escape sequence out of range

The compiler appears to treat the Ec characters as part of the preceding hex number (because E and c are valid hex digits). Since the four-digit value 0xABEC won't fit in a char, it reports an error. For a wide string literal L"\xABEcho", the first character would be U+ABEC, followed by L"ho".
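A compile-time sketch of the wide case, assuming a C++11 compiler (for constexpr and static_assert). The name kWide is illustrative only. The escape takes every hex digit it sees and stops at the first character that is not one:

```cpp
// \xABEc consumes A, B, E and c, then stops at 'h' (not a hex digit).
// What remains is the ordinary text "ho".
constexpr wchar_t kWide[] = L"\xABEcho";

// Three characters plus the terminating L'\0'.
static_assert(sizeof(kWide) / sizeof(kWide[0]) == 4, "U+ABEC, 'h', 'o', NUL");
static_assert(kWide[0] == 0xABEC, "the escape absorbed four hex digits");
static_assert(kWide[1] == L'h' && kWide[2] == L'o', "the rest is literal text");
```

The same four digits in a narrow literal produce a value that cannot fit in a char, which is the error g++ reports.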

It seems this has changed sometime in the past couple of decades and I never noticed. I'm almost certain that old C compilers would only consider two hex digits after \x, and not look any further.

I can think of one workaround for this:

char foo[] = "\xAB""Echo";

but that's a bit ugly. So I have three questions:

  • When did this change?

  • Why doesn't the compiler accept hex escapes longer than two digits only in wide string literals?

  • Is there a workaround that's less awkward than the above?
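A possible alternative for the third question, sketched under the assumption of C++11 or later. The names kSplit, kOctal and to_c_literal are hypothetical, not part of any library. An octal escape can never be longer than three digits, so writing 0xAB as \253 cannot swallow the following "Echo", and it yields the same bytes as the split-literal workaround. Because the literal here is generated, the generator could simply emit every non-printable byte as a three-digit octal escape:

```cpp
#include <cstdio>
#include <string>

// The question's workaround: the escape ends at the closing quote, and the
// two literals are joined only after escapes have been interpreted.
constexpr char kSplit[] = "\xAB""Echo";

// Octal alternative: 0xAB == 0253, and an octal escape never takes more
// than three digits, so "Echo" stays ordinary text.
constexpr char kOctal[] = "\253Echo";

static_assert(sizeof(kSplit) == 6 && sizeof(kOctal) == 6, "0xAB, E, c, h, o, NUL");
static_assert(kSplit[0] == kOctal[0] && kSplit[1] == 'E' && kOctal[1] == 'E',
              "both spellings produce the same bytes");

// Hypothetical generator helper: converts raw bytes into the body of a
// string literal (without the surrounding quotes).
std::string to_c_literal(const std::string& bytes) {
    std::string out;
    for (unsigned char c : bytes) {
        if (c == '"' || c == '\\' || c == '?') {
            // Escape quote and backslash; '?' is escaped so that a run such
            // as "??=" can never be read as a trigraph.
            out += '\\';
            out += static_cast<char>(c);
        } else if (c >= 0x20 && c < 0x7F) {
            out += static_cast<char>(c);  // printable ASCII is copied as-is
        } else {
            // Always exactly three octal digits (0xAB -> \253), so the next
            // character can never be absorbed into the escape.
            char buf[5];
            std::snprintf(buf, sizeof buf, "\\%03o", static_cast<unsigned>(c));
            out += buf;
        }
    }
    return out;
}
```

With this helper, the bytes from the question come out as \253Echo, which the compiler reads as the byte 0xAB followed by Echo.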

Best Answer

GCC is only following the standard. #877 (C99 6.4.4.4, paragraph 7): "Each [...] hexadecimal escape sequence is the longest sequence of characters that can constitute the escape sequence."
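A small compile-time sketch of what "longest sequence" means in practice, assuming C++11; the array names are illustrative only. An octal escape has at most three digits, so its longest possible sequence is short, while a hex escape has no digit limit and keeps going until it meets a character that is not a hex digit:

```cpp
// Octal: at most three digits, so "\1234" is '\123' followed by '4'.
constexpr char kOct[] = "\1234";
static_assert(sizeof(kOct) == 3, "'\\123', '4', NUL");
static_assert(kOct[0] == '\123' && kOct[1] == '4', "octal escape stops after three digits");

// Hex: 'g' is not a hex digit, so it ends the escape...
constexpr wchar_t kHexThenG[] = L"\x41g";
static_assert(sizeof(kHexThenG) / sizeof(wchar_t) == 3, "L'\\x41', L'g', NUL");

// ...but 'B' is a hex digit, so it is absorbed and the escape yields 0x41B.
constexpr wchar_t kHexThenB[] = L"\x41B";
static_assert(sizeof(kHexThenB) / sizeof(wchar_t) == 2, "one character plus NUL");
static_assert(kHexThenB[0] == 0x41B, "all three hex digits belong to the escape");
```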
