MySQL – XFS Check/Repair Fails

Tags: mysql, raid, software-raid, xfs

I have a MySQL database server on an EC2 instance with three drives in a RAID 0 array. This morning the server crashed, and after inspecting the logs I noticed a "Structure needs cleaning" error (system error code 117). I then attempted to run xfs_check on the affected device, but this is what it returned:

xfs_repair: /dev/md0 contains a mounted filesystem

fatal error -- couldn't initialize XFS library

I have all the necessary tools/packages installed, so I checked the syslog and it returned this:

Filesystem "md0": XFS internal error xfs_da_do_buf(2) at line 2112 of file /build/buildd/linux-ec2-2.6.32/fs/xfs/xfs_da_btree.c.  Caller 0xffffffff81261bb5

After attempting an xfs_repair I still see the same output as above and the same syslog entry.

As the issue is confined to one database directory, is there a way either to fix the errors mentioned above or to have MySQL ignore that database directory so it can continue operating (e.g. by manually deleting the database while leaving the rest of the lib directory intact)? Any suggestions would be helpful.

Best Answer

Unmount the filesystem first. xfs_repair will not run on a mounted filesystem, which is what the "contains a mounted filesystem" message is telling you. If you are unable to unmount it, run lsof /mountedpartition to see what might still be holding that partition open.
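As a rough sketch of that first step: the mount point `/mnt/data` below is a placeholder (the question only names the device, `/dev/md0`), and the `run` helper is a hypothetical wrapper that only prints each command unless `DRY_RUN=0` is set.

```shell
#!/bin/sh
# Placeholder mount point; substitute wherever /dev/md0 is actually mounted.
MOUNTPOINT="${MOUNTPOINT:-/mnt/data}"
DRY_RUN="${DRY_RUN:-1}"

# Print the command; execute it only when DRY_RUN=0.
run() {
    echo "+ $*"
    if [ "$DRY_RUN" = "0" ]; then "$@"; fi
}

release_and_unmount() {
    run lsof "$MOUNTPOINT"     # list processes with files open on this filesystem
    run umount "$MOUNTPOINT"   # fails while anything still holds it open
}

release_and_unmount
```

Stop whatever lsof reports (here, most likely mysqld) before retrying the umount with `DRY_RUN=0`.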

Depending on how it shut down, you might need -L to zero the log. That discards any metadata changes still sitting in the log, so try without it first. Regardless of your preference, also pass -P to xfs_repair. It disables prefetching of inode and directory blocks; without it, xfs_repair will usually run for a while, get stuck and just sit there.
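A minimal sketch of the order in which these flags might be tried, assuming the filesystem on `/dev/md0` is already unmounted. The initial `-n` (no-modify) pass is an addition for safety rather than something this answer prescribes, and `run` is a hypothetical helper that only prints commands unless `DRY_RUN=0` is set.

```shell
#!/bin/sh
DEVICE="${DEVICE:-/dev/md0}"
DRY_RUN="${DRY_RUN:-1}"

# Print the command; execute it only when DRY_RUN=0.
run() {
    echo "+ $*"
    if [ "$DRY_RUN" = "0" ]; then "$@"; fi
}

# Report problems without changing anything on disk.
check_xfs() {
    run xfs_repair -n "$DEVICE"
}

# Real repair, with prefetching disabled so it is less likely to stall.
repair_xfs() {
    run xfs_repair -P "$DEVICE"
}

# Last resort: zero the log first. Changes still in the log are lost.
repair_xfs_zero_log() {
    run xfs_repair -L -P "$DEVICE"
}

check_xfs
repair_xfs
```

`repair_xfs_zero_log` is deliberately not called automatically; reach for it only if the plain repair refuses to proceed because of a dirty log.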

If you want to check progress, run strace -p (pid of xfs_repair) from another terminal session and you can see whether it is still doing something.
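One way to find the pid and attach, sketched under the assumption that `pgrep` (from procps) and `strace` are installed. The function name `watch_repair` and the `run` dry-run helper are illustrative only.

```shell
#!/bin/sh
DRY_RUN="${DRY_RUN:-1}"

# Print the command; execute it only when DRY_RUN=0.
run() {
    echo "+ $*"
    if [ "$DRY_RUN" = "0" ]; then "$@"; fi
}

# Attach strace to a running xfs_repair.
# An explicit pid may be passed; otherwise the first match from pgrep is used.
watch_repair() {
    pid="${1:-$(pgrep -x xfs_repair | head -n 1)}"
    if [ -z "$pid" ]; then
        echo "xfs_repair is not running"
        return 1
    fi
    run strace -p "$pid"
}

watch_repair "$@" || true
```

A steady stream of read and write system calls suggests the repair is still working; interrupting strace with Ctrl-C detaches it and leaves xfs_repair running.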

I don't know what version of xfsprogs/xfsdump you're running, but see if there is an upgrade available for your OS. There have been a number of fairly recent releases (within the last 8 months) that address some of the odd buffer overflows. The XFS tools are currently at 3.1.5.

You won't be able to selectively delete directories during a repair very easily. That error message does suggest you will see some data loss, or that you will find the affected files in the filesystem's lost+found directory afterwards. Depending on how badly the structure is damaged, the loss may be small. If you really wanted to go in depth, you could traverse the metadata structures by hand and clear certain bits, but I think you would find that extraordinarily complex.

Alternatively, mkfs and restore from backups.

On rereading, it appears that the filesystem is currently mounted. In that case you can shut down MySQL, move the one bad database directory out of the MySQL data directory, and restart MySQL. The filesystem should still be repaired at some point, but if the only issue is that one directory and you are able to mount the filesystem, you should be able to move that directory out of the way.
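A sketch of that workaround follows. Every path and name in it is an assumption: `/var/lib/mysql` is only the common default data directory, `baddb` stands in for the damaged database, the quarantine directory is made up, and `service mysql stop/start` assumes an init script named `mysql`. The `run` helper prints commands without executing them unless `DRY_RUN=0` is set.

```shell
#!/bin/sh
DATADIR="${DATADIR:-/var/lib/mysql}"                     # assumed data directory
BAD_DB="${BAD_DB:-baddb}"                                # placeholder database name
QUARANTINE="${QUARANTINE:-/var/lib/mysql-quarantine}"    # hypothetical holding area
DRY_RUN="${DRY_RUN:-1}"

# Print the command; execute it only when DRY_RUN=0.
run() {
    echo "+ $*"
    if [ "$DRY_RUN" = "0" ]; then "$@"; fi
}

quarantine_db() {
    run service mysql stop
    run mkdir -p "$QUARANTINE"
    run mv "$DATADIR/$BAD_DB" "$QUARANTINE/"   # take the bad directory out of MySQL's view
    run service mysql start
}

quarantine_db
```

Keeping the quarantine directory on the same filesystem makes the mv a rename rather than a copy, which should touch less of the damaged directory, though it can still fail if the corruption is in the metadata the rename needs.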