Red Hat – How to diagnose very slow ext3 behavior


I'm managing an old admin server running Red Hat WS4 Update 3. It has an ext3 volume mounted on /opt that held a large (30GB) SQLite database.

Every time I run large queries or inserts against this database, I/O wait climbs so high that we can no longer log in to the server, sudo to another user, or edit a crontab file (vi never quits).
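A quick way to put a number on "I/O wait so high" is to sample the aggregate `cpu` line of /proc/stat twice and compute the share of time spent in iowait, which is the same figure `vmstat` reports in its `wa` column. This is a minimal sketch, assuming a 2.6 kernel where the fields are ordered user, nice, system, idle, iowait, irq, softirq; the helper names are hypothetical.

```python
import time


def read_cpu_times(path="/proc/stat"):
    """Return the aggregate 'cpu' counters from /proc/stat as a list of ints."""
    with open(path) as f:
        for line in f:
            if line.startswith("cpu "):
                return [int(x) for x in line.split()[1:]]
    raise RuntimeError("no aggregate 'cpu' line in " + path)


def iowait_percent(before, after):
    """Percentage of CPU time spent in iowait between two samples.

    Assumes the 2.6 field order: user nice system idle iowait irq softirq ...
    so iowait is index 4 of the list returned by read_cpu_times().
    """
    deltas = [b - a for a, b in zip(before, after)]
    total = sum(deltas)
    if total == 0:
        return 0.0
    return 100.0 * deltas[4] / total


def watch_iowait(samples=12, interval=5):
    """Print the iowait percentage `samples` times, `interval` seconds apart."""
    prev = read_cpu_times()
    for _ in range(samples):
        time.sleep(interval)
        cur = read_cpu_times()
        print("iowait: %.1f%%" % iowait_percent(prev, cur))
        prev = cur
```

Running `watch_iowait()` in one session while the slow query or copy runs in another shows whether processes are mostly waiting on the disk rather than on the CPU.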

I'm replacing SQLite with MySQL, and while backing up the 19GB MySQL data directory I hit the same problem.

Note that these operations are run as a regular (non-root) user.
The server is a ProLiant DL385 G1 running the 64-bit kernel 2.6.9-34.ELsmp.

I'm now considering remounting the volume as ext2 to see whether journaling is the source of my problem, but honestly I don't know what to check next.

Every large file copy ends up blocking other users from logging in, and the server returns to normal once the copy finishes.

I need pointers on where to look next to explain this behavior (an old disk getting slower? a kernel with a known bug? a corrupt journal triggering thousands of superfluous reads/writes? etc.)
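One place to look next is which block device is actually being written to, and how often, both while idle and during a copy. The sketch below samples /proc/diskstats twice and reports writes per second and KB/s per whole disk; if the sysstat package is installed, `iostat -x` gives similar per-device figures. It assumes the 2.6 layout where whole-disk lines have at least 14 fields (writes completed at index 7, 512-byte sectors written at index 9); partition lines on older 2.6 kernels carry fewer fields and are skipped. Function names are hypothetical.

```python
import time


def read_disk_writes(path="/proc/diskstats"):
    """Map device name -> (writes completed, sectors written).

    Whole-disk lines: major minor name reads rmerged rsect rms writes wmerged wsect ...
    Lines with fewer than 14 fields (partitions on older 2.6 kernels) are skipped.
    """
    stats = {}
    with open(path) as f:
        for line in f:
            fields = line.split()
            if len(fields) < 14:
                continue
            stats[fields[2]] = (int(fields[7]), int(fields[9]))
    return stats


def write_rates(before, after, seconds):
    """Per-device (writes/s, KB/s written) between two samples; sectors are 512 bytes."""
    rates = {}
    for dev, (w1, s1) in after.items():
        if dev not in before:
            continue
        w0, s0 = before[dev]
        rates[dev] = ((w1 - w0) / float(seconds), (s1 - s0) * 512 / 1024.0 / seconds)
    return rates


def sample(interval=5):
    """Print write activity per device over one `interval`-second window."""
    before = read_disk_writes()
    time.sleep(interval)
    after = read_disk_writes()
    for dev, (wps, kbps) in sorted(write_rates(before, after, interval).items()):
        print("%-12s %8.1f writes/s %10.1f KB/s" % (dev, wps, kbps))
```

A steady stream of small writes on a device while nobody is doing heavy work is a hint that something other than the database or the copy is keeping the disk busy.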

Thanks in advance.

Best Answer

Replying to my own question, as I finally found the real source of the problem.

1. syslog.conf was configured to log to files and flush them immediately.
2. Our proxies had recently been configured to send LDAP authentication attempts to this server's syslog. These arrive at a rate of several per second because of stupid (or misconfigured) update programs, à la Adobe Updater.
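In sysklogd (the syslog daemon shipped with RHEL 4), a file action such as `/var/log/messages` is synced to disk after every message, while prefixing the path with `-` (as in `-/var/log/maillog`) omits that sync. The helper below is a minimal sketch that lists the file actions that will be synced on every write. It is a simplified parser under stated assumptions: it skips comments and blank lines, takes the last whitespace-separated token as the action, and does not handle continuation lines or other syslog daemons' syntax. The function name is hypothetical.

```python
def synced_log_files(conf_text):
    """Return (selector, path) pairs whose file action is synced after each message.

    Simplified sysklogd-style parsing: '/path' means sync on every write,
    '-/path' means no per-message sync. Non-file actions (users, @host,
    pipes, '*') are ignored. Continuation lines are not supported.
    """
    synced = []
    for raw in conf_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        parts = line.split()
        if len(parts) < 2:
            continue
        action = parts[-1]
        if action.startswith("/"):
            synced.append((parts[0], action))
    return synced
```

Feeding it the contents of /etc/syslog.conf lists the entries to consider; changing a busy target from `/var/log/foo` to `-/var/log/foo` and restarting syslogd removes the per-message sync, with the trade-off that the last few messages may be lost if the system crashes.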

In the end, the server was CONSTANTLY flushing buffers to disk, and that showed up every time we tried to write large files.
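The cost of flushing after every message can be reproduced in miniature: append the same number of short, log-like lines to a file once with an fsync after each line and once without, and compare the elapsed time. This is a sketch for local experimentation, not a benchmark of the original server; actual timings depend entirely on the disk and filesystem. As a general point, on ext3 in its default data=ordered mode an fsync forces a journal commit, which is why frequent syncs can interfere with other large writes. Function names and the log text are hypothetical.

```python
import os
import tempfile
import time


def append_lines(path, count, sync_each):
    """Append `count` log-like lines to `path`; return elapsed wall-clock seconds.

    sync_each=True imitates a syslog file action without the '-' prefix
    (flush and fsync after every line); False leaves write-back to the kernel.
    """
    start = time.time()
    with open(path, "a") as f:
        for i in range(count):
            f.write("ldap auth attempt %d\n" % i)
            if sync_each:
                f.flush()
                os.fsync(f.fileno())
    return time.time() - start


def compare(count=200):
    """Time the same workload with and without per-line fsync in a temp directory."""
    d = tempfile.mkdtemp()
    try:
        results = {}
        for sync_each in (False, True):
            path = os.path.join(d, "sync-%s.log" % sync_each)
            results[sync_each] = append_lines(path, count, sync_each)
        return results
    finally:
        for name in os.listdir(d):
            os.remove(os.path.join(d, name))
        os.rmdir(d)
```

Calling `compare()` on a filesystem backed by a real disk returns a dict mapping `False`/`True` to seconds; the per-line fsync case is typically much slower on rotating disks.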