Nginx – HTTP Redirect of Cyrillic URLs: this example works without URLs being encoded. Why?

301-redirect, nginx, redirect, redirection

According to RFC 5987 (https://www.rfc-editor.org/rfc/rfc5987), HTTP header fields should be sent using the ISO-8859-1 character encoding.

This also applies to the Location field used for redirection.
However, looking at the following example, I can't figure out how the redirect works even though the URL is not encoded.

http://goo.gl/m5fDF0

I ran different tools, including the Google Chrome Developer Tools, where the Location field is definitely encoded, but curl or software like Screaming Frog returns a Location written with Cyrillic characters.
In theory the redirect should result in a 404, but I got a 200.

Any idea of how this is possible?

Best Answer

Yes, in fact the redirect contains octets that do not fit in 7 bits (0x80 hexadecimal or greater). Various applications will convert those octets to different visual representations on your screen, depending on which encoding they decide to use.

If somebody used UTF-8, they would likely get proper Cyrillic text, but that is incidental and off-topic to the question.
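As a rough illustration of the point above, here is a minimal Python sketch. The path `/привет` is a made-up example (not the URL from the question), and the sketch assumes the server put the Location value on the wire as raw UTF-8 octets.

```python
# Hypothetical Location value; the Cyrillic path is an illustrative assumption.
location = "/привет"

# The octets that would travel on the wire if the server sends raw UTF-8.
raw = location.encode("utf-8")

# Each Cyrillic letter becomes two octets, all of them 0x80 or greater.
high_octets = [octet for octet in raw if octet >= 0x80]

# A client that guesses UTF-8 recovers the intended Cyrillic text.
as_utf8 = raw.decode("utf-8")

# A client that applies ISO-8859-1 gets different characters,
# although the octets are exactly the same.
as_latin1 = raw.decode("iso-8859-1")

print(raw)
print(as_utf8 == location, as_latin1 == location)
print(ascii(as_latin1))
```

The octets never change; only the interpretation chosen by the displaying tool does, which would explain why different tools show different things for the same header.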

https://www.rfc-editor.org/rfc/rfc7230#section-3.2 states quite precisely that:

Historically, HTTP has allowed field content with text in the
ISO-8859-1 charset [ISO-8859-1], supporting other charsets only
through use of [RFC2047] encoding. In practice, most HTTP header
field values use only a subset of the US-ASCII charset [USASCII].
Newly defined header fields SHOULD limit their field values to
US-ASCII octets. A recipient SHOULD treat other octets in field
content (obs-text) as opaque data.

and

obs-text = %x80-FF

This means that practically any octets can be sent. Software that displays a header, for example a browser that converts the octets to some visible representation on your screen, should use ISO-8859-1 for this conversion.
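The quoted grammar can be turned into a rough check. The Python sketch below is a simplification: it only tests that every octet is HTAB, SP, VCHAR (%x21-7E) or obs-text (%x80-FF), and ignores the whitespace placement rules and obs-fold. The function names are made up for this illustration.

```python
# Octets the RFC 7230 field-value grammar admits (simplified view):
# HTAB, SP, VCHAR (%x21-7E) and obs-text (%x80-FF).
ALLOWED_OCTETS = (
    frozenset([0x09, 0x20])
    | frozenset(range(0x21, 0x7F))
    | frozenset(range(0x80, 0x100))
)


def is_acceptable_field_value(value: bytes) -> bool:
    """Return True if every octet is HTAB, SP, VCHAR or obs-text."""
    return all(octet in ALLOWED_OCTETS for octet in value)


def has_obs_text(value: bytes) -> bool:
    """Return True if the value contains at least one octet >= 0x80."""
    return any(octet >= 0x80 for octet in value)


# Hypothetical Location value sent as raw UTF-8 octets.
cyrillic = "/привет".encode("utf-8")
print(is_acceptable_field_value(cyrillic), has_obs_text(cyrillic))
```

Under this reading, a raw Cyrillic Location passes the grammar as obs-text, while control octets such as CR or LF inside the value do not.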

But the server that receives the data in an HTTP session is also free to use the octets for its own operations, which does not involve displaying any visual representation on any screen. In this case the HTTP server uses the octets to serve you a page. Since the HTTP server just gets some octets of input and produces some octets of output, the "encoding" does not really apply here (the HTTP server never needs to convert bytes into something that it shows on a screen or on a printer).
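A minimal sketch of that idea in Python, not a description of how nginx is implemented: the `PAGES` table, the `status_for` helper and the `/привет` path are all hypothetical, and the sketch assumes a server that undoes percent-escapes before matching the path as plain octets.

```python
from urllib.parse import quote, unquote_to_bytes

# Hypothetical resource table keyed by raw path octets (no text decoding).
PAGES = {"/привет".encode("utf-8"): 200}


def status_for(request_target: bytes) -> int:
    """Look up a request target purely as octets; 404 if unknown."""
    path = unquote_to_bytes(request_target)  # undo %XX escapes, keep bytes
    return PAGES.get(path, 404)


# The same resource, requested with raw octets and with percent-escapes.
raw_target = "/привет".encode("utf-8")
escaped_target = quote(raw_target).encode("ascii")

print(escaped_target)  # b'/%D0%BF%D1%80%D0%B8%D0%B2%D0%B5%D1%82'
print(status_for(raw_target), status_for(escaped_target))  # 200 200
```

Under these assumptions, a client that follows the raw Cyrillic Location and one that percent-encodes it first end up asking for the same octets, so both get a 200 rather than a 404.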
