Mysql – InnoDB recovery

innodbMySQL

My MySQL server run to crash due to (page file write error). I did a disk diagnosis but no errors found. I restarted MySQL server. It scanned the bin log like for 1-2 hours and recorded log as below:

InnoDB: Doing recovery: scanned up to log sequence number 51 2341175808
InnoDB: Doing recovery: scanned up to log sequence number 51 2346418688
InnoDB: Doing recovery: scanned up to log sequence number 51 2351661568
InnoDB: Doing recovery: scanned up to log sequence number 51 2356904448
InnoDB: Doing recovery: scanned up to log sequence number 51 2362147328

After that, I see a lot of errors as below:

InnoDB: Number of pending reads 128, pending pread calls 0
InnoDB: Error: InnoDB has waited for 50 seconds for pending
InnoDB: reads to the buffer pool to be finished.
InnoDB: Number of pending reads 128, pending pread calls 0
InnoDB: Error: InnoDB has waited for 50 seconds for pending
InnoDB: reads to the buffer pool to be finished.
InnoDB: Number of pending reads 128, pending pread calls 0
InnoDB: Error: InnoDB has waited for 50 seconds for pending
InnoDB: reads to the buffer pool to be finished.

I have wait a few hours but it just adds those lines to error log repeatly with no sign to stop.

What does above log messages mean? How can I make my MySQL server running again? I have a 80GB InnoDB database running on this server. Is there a way to force recover it without undo any uncommited transactions or try to recover the data from page file which caused the crash? I have no problem drop those data.

I tried "sudo -u mysql /usr/sbin/mysqld –innodb_force_recovery=6" to restart MySQL server but got new repeated errors as below:

InnoDB: stored checksum 2440779633, prior-to-4.0.14-form stored checksum 3425185587
InnoDB: Page lsn 51 2450779673, low 4 bytes of lsn at page end 2450779673
InnoDB: Page number (if stored to page already) 10824,
InnoDB: space id (if created with >= MySQL-4.1.1 and stored already) 0
InnoDB: Page may be an index page where index id is 4294967295 0
InnoDB: (index "CLUST_IND" of table "SYS_IBUF_TABLE_0")
InnoDB: Database page corruption on disk or a failed
InnoDB: file read of page 10824.
InnoDB: You may have to recover from a backup.
InnoDB: It is also possible that your operating
InnoDB: system has corrupted its own file cache
InnoDB: and rebooting your computer removes the
InnoDB: error.
InnoDB: If the corrupt page is an index page
InnoDB: you can also try to fix the corruption
InnoDB: by dumping, dropping, and reimporting
InnoDB: the corrupt table. You can use CHECK
InnoDB: TABLE to scan your table for corruption.
InnoDB: See also http://dev.mysql.com/doc/refman/5.1/en/forcing-recovery.html
InnoDB: about forcing recovery.
InnoDB: Ending processing because of a corrupt database page.

What should I do now?

Best Answer

Without the error that caused the crash, version of MySQL, and your my.cnf, it is difficult to determine what exactly the problem is....That being said, here is some generic advice....

You can find the table that has the corrupt page in a number of ways. The easiest way to shut down the server, and run innochecksum against all of the tables in your database. If it finds any problems, you can start the database in recovery mode by setting innodb_force_recovery and attempt to run a SELECT INTO OUTFILE of the table to dump the contents, then LOAD DATA FROM INFILE to load it into a new table. Start innodb_force_recovery at 1 and if you get a crash while dumping the table(s), continue to increment the innodb_force_recovery value until you can dump the data without it crashing. Make sure no clients are connecting while you do this.

Percona also has the Percona Data Recovery Tool for InnoDB that might minimize downtime, but they require some expertise to use and has potential to mess things up even more. Also, in reality, you can run innochecksum on a live database, but you will be likely to get false positives for corrupt pages. In this case, you can then down the server and only innochecksum the tables that returned page errors while it was live. On a live server, a successful innochecksum will mean that the table is fine, while a failed innochecksum might not be accurate.

When you get this situated, I would recommend creating a slave of this database so that if this happens again, you can easily just begin using the slave and not worry about complex recovery procedures.

Related Topic