MySQL – Ensuring Database is Fully in UTF8

charsetdatabaseMySQLunicodeutf-8

After some problems with UTF8 and none-UTF8 strings, we're standardising on UTF8. One thing I need to do is check that everything is in UTF8 in the MySQL database? What do I need to check?

  • Server default characterset
  • Default character set of each database
  • Does each text column have a character set? How do I check for that?

I was thinking of putting all this in a nagios plugin to check that everything is in UTF8. Suggestions?

Best Answer

Rory, First of all, you are correct for wanting to monitor what gets created in you databases. While we all implement steps to prevent mistakes, you cannot assume that the mistakes won't creep in. I do a very similar thing as most of our infrastructure demands UTF8.

The following queries are good for checking stats:

SELECT DEFAULT_CHARACTER_SET_NAME, DEFAULT_COLLATION_NAME, COUNT(0) AS COUNT FROM information_schema.SCHEMATA GROUP BY DEFAULT_CHARACTER_SET_NAME, DEFAULT_COLLATION_NAME;
# to filter schema use SCHEMA_NAME in the where clause
SELECT TABLE_COLLATION, COUNT(0) AS COUNT FROM information_schema.TABLES WHERE TABLE_COLLATION IS NOT NULL GROUP BY TABLE_COLLATION;
# to filter schema use TABLE_SCHEMA in the where clause
SELECT CHARACTER_SET_NAME, COLLATION_NAME, COUNT(0) AS COUNT FROM information_schema.COLUMNS WHERE CHARACTER_SET_NAME IS NOT NULL AND COLLATION_NAME IS NOT NULL GROUP BY CHARACTER_SET_NAME, COLLATION_NAME;
# to filter schema use TABLE_SCHEMA in the where clause

The following queries would be good from a cron job that emails you if any results are found:

SELECT SCHEMA_NAME, DEFAULT_CHARACTER_SET_NAME, DEFAULT_COLLATION_NAME FROM information_schema.SCHEMATA WHERE DEFAULT_CHARACTER_SET_NAME NOT LIKE '%utf8%' OR DEFAULT_COLLATION_NAME NOT LIKE '%utf8%';
# to filter schema use SCHEMA_NAME in the where clause
SELECT TABLE_SCHEMA, TABLE_NAME, TABLE_COLLATION FROM information_schema.TABLES WHERE TABLE_COLLATION IS NOT NULL AND TABLE_COLLATION NOT LIKE '%utf8%';
# to filter schema use TABLE_SCHEMA in the where clause
SELECT TABLE_SCHEMA, TABLE_NAME, COLUMN_NAME, DATA_TYPE, CHARACTER_SET_NAME, COLLATION_NAME FROM information_schema.COLUMNS WHERE CHARACTER_SET_NAME NOT LIKE '%utf8%' OR COLLATION_NAME NOT LIKE '%utf8%';
# to filter schema use TABLE_SCHEMA in the where clause

These queries work for MySQL >= 5.0. If you need help writing the cron job, let me know.

I also have a problem with my developers (mostly the Ruby on Rails team) creating InnoDB tables when they don't need them. To keep this under wraps I monitor the staging database with a cron that calls:

SELECT TABLE_SCHEMA, TABLE_NAME, ENGINE FROM information_schema.TABLES WHERE ENGINE = 'InnoDB';