Rory,
First of all, you are correct for wanting to monitor what gets created in you databases. While we all implement steps to prevent mistakes, you cannot assume that the mistakes won't creep in. I do a very similar thing as most of our infrastructure demands UTF8.
The following queries are good for checking stats:
SELECT DEFAULT_CHARACTER_SET_NAME, DEFAULT_COLLATION_NAME, COUNT(0) AS COUNT FROM information_schema.SCHEMATA GROUP BY DEFAULT_CHARACTER_SET_NAME, DEFAULT_COLLATION_NAME;
# to filter schema use SCHEMA_NAME in the where clause
SELECT TABLE_COLLATION, COUNT(0) AS COUNT FROM information_schema.TABLES WHERE TABLE_COLLATION IS NOT NULL GROUP BY TABLE_COLLATION;
# to filter schema use TABLE_SCHEMA in the where clause
SELECT CHARACTER_SET_NAME, COLLATION_NAME, COUNT(0) AS COUNT FROM information_schema.COLUMNS WHERE CHARACTER_SET_NAME IS NOT NULL AND COLLATION_NAME IS NOT NULL GROUP BY CHARACTER_SET_NAME, COLLATION_NAME;
# to filter schema use TABLE_SCHEMA in the where clause
The following queries would be good from a cron job that emails you if any results are found:
SELECT SCHEMA_NAME, DEFAULT_CHARACTER_SET_NAME, DEFAULT_COLLATION_NAME FROM information_schema.SCHEMATA WHERE DEFAULT_CHARACTER_SET_NAME NOT LIKE '%utf8%' OR DEFAULT_COLLATION_NAME NOT LIKE '%utf8%';
# to filter schema use SCHEMA_NAME in the where clause
SELECT TABLE_SCHEMA, TABLE_NAME, TABLE_COLLATION FROM information_schema.TABLES WHERE TABLE_COLLATION IS NOT NULL AND TABLE_COLLATION NOT LIKE '%utf8%';
# to filter schema use TABLE_SCHEMA in the where clause
SELECT TABLE_SCHEMA, TABLE_NAME, COLUMN_NAME, DATA_TYPE, CHARACTER_SET_NAME, COLLATION_NAME FROM information_schema.COLUMNS WHERE CHARACTER_SET_NAME NOT LIKE '%utf8%' OR COLLATION_NAME NOT LIKE '%utf8%';
# to filter schema use TABLE_SCHEMA in the where clause
These queries work for MySQL >= 5.0. If you need help writing the cron job, let me know.
I also have a problem with my developers (mostly the Ruby on Rails team) creating InnoDB tables when they don't need them. To keep this under wraps I monitor the staging database with a cron that calls:
SELECT TABLE_SCHEMA, TABLE_NAME, ENGINE FROM information_schema.TABLES WHERE ENGINE = 'InnoDB';
mysqldump
and show create table
are usually pretty reliable workhorses and shouldn't change encodings on their own. If they use a "binary" character set, chances are that is actually the character set given to the table (not necessarily the columns, though). Can you re-check the current table character set in a database browser?
I would try a plain, simple mysqldump
into a file, restoring the dump into a mirror database, and take a close look at the special characters. Different from backups taken using phpMyAdmin, those from the command line tend to work very well in my experience.
Best Answer
Use the
option when creating your database, like so:
also, read the docs...