Character encoding: UTF-8 vs ISO-8859-1

charset

I'm maintaining two largely parallel sites built on a recent release of a well-known PHP-based CMS. One site is in English, the other in Polish. (Polish localization is a standard option for the CMS.) Both are operating normally.

In particular, the Polish site correctly renders Polish diacritic characters as well as a sprinkling of "special" German and Cyrillic characters. When I examine the CMS-generated headers, I see

<meta http-equiv='Content-Type' content='text/html; charset=utf-8' />

exactly as I would expect. Unicode is the way to go.

The English site renders English characters correctly, of course, and it also correctly renders a similar sprinkling of "special" German and Cyrillic characters. When I examine the CMS-generated headers, I see

<meta http-equiv='Content-Type' content='text/html; charset=iso-8859-1' />

which is not what I expect, since iso-8859-1, as far as I can tell, cannot represent Polish diacritics or any Cyrillic characters. (I suppose I must except the non-diacritic Polish characters and the Cyrillic characters that look like Latin ones, but those overlaps are beside the point.)
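As a quick check of that claim, here is a minimal Python sketch. The sample strings and the helper name can_encode are illustrative only and have nothing to do with the CMS; the sketch simply asks whether each string can be encoded in a given charset.

```python
def can_encode(text: str, charset: str) -> bool:
    """Return True if every character in text exists in charset."""
    try:
        text.encode(charset)
        return True
    except UnicodeEncodeError:
        return False


# Illustrative samples (assumed, not taken from either site).
samples = {"German": "Grüße", "Polish": "Łódź", "Cyrillic": "Москва"}

for name, text in samples.items():
    print(name, "iso-8859-1:", can_encode(text, "iso-8859-1"),
          "utf-8:", can_encode(text, "utf-8"))
```

The German sample fits in iso-8859-1, while the Polish and Cyrillic samples do not, so a page that shows them correctly cannot really be iso-8859-1 on the wire.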

Q1: On a page declared in the header to be iso-8859-1 encoded, how is it that the Polish diacritics and Cyrillic characters render correctly? Could the browser be reading the BOM or doing an analysis of the actual content and overriding the header declaration? Or what?

Q2: Is there a good technical reason that the default English installation of the CMS should still use iso-8859-1 encoding instead of utf-8? I think all installations should use utf-8, but there's no pressing reason to convert the English version. Maybe someone here can think of a good reason?

Best Answer

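A1: Probably your web server is configured to send a Content-Type HTTP response header that declares charset=utf-8 before any HTML is sent. A charset declared in the HTTP header takes precedence over the one in the meta tag, so the browser decodes the page as UTF-8 regardless of what the meta tag says. You can inspect the HTTP headers with Firebug or Chrome developer tools (Resources->http://...->Headers->Response Headers).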
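To illustrate the precedence, here is a simplified Python sketch of how a client might pick the charset: the HTTP Content-Type header first, the meta tag only as a fallback. The function names are hypothetical, and the model is deliberately incomplete (it ignores the BOM and the browser's content sniffing), so treat it as an approximation rather than what any particular browser does.

```python
import re
from email.message import Message

# Matches both <meta charset="..."> and the http-equiv form's "charset=..." part.
META_RE = re.compile(r"""<meta[^>]+charset\s*=\s*['"]?([\w-]+)""", re.IGNORECASE)


def charset_from_header(content_type):
    """Extract the charset parameter from a Content-Type header value, if any."""
    if not content_type:
        return None
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_content_charset()  # lowercased, or None when absent


def charset_from_meta(html):
    """Extract the charset declared in the first matching <meta> tag, if any."""
    match = META_RE.search(html)
    return match.group(1).lower() if match else None


def effective_charset(content_type, html):
    """Simplified model: the HTTP header wins; the meta tag is only a fallback."""
    return charset_from_header(content_type) or charset_from_meta(html)


page = "<meta http-equiv='Content-Type' content='text/html; charset=iso-8859-1' />"
print(effective_charset("text/html; charset=UTF-8", page))  # header present
print(effective_charset("text/html", page))                 # header has no charset
```

With a header that declares UTF-8, the meta tag from the question is ignored; only when the header carries no charset does the iso-8859-1 declaration come into play.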

A2: Maybe they're still using ISO-8859-1 simply because they haven't had time to switch to UTF-8?