Since the leading cause seemed to be journaling, that would have been my next step. In order to remove journaling, however, I would need to attach the EBS volume to another instance. I decided to test the procedure using a (day-old) snapshot, but before removing journaling, I re-ran the 10-minute iotop test on the test instance. To my surprise, I saw normal (i.e. non-elevated) values, and this was the first time that flush-202 didn't even show up on the list. This was a fully functional instance (I restored snapshots of my data as well) - there had been no changes to the root volume in the 12 hours or so since the snapshot was taken. All tests showed that the same processes were running on both servers. This led me to believe that the cause must come down to some requests that the 'live' server is processing.
Looking at the differences between the iotop outputs of the server displaying the problem and the seemingly identical server that had no problem, the only differences were flush-202 and php-fpm. This got me thinking that, while a long shot, perhaps it was a problem related to the PHP configuration.
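For reference, the kind of comparison described above can be done by capturing iotop in batch mode on both boxes and diffing the process lists. This is a sketch, not the exact commands from the original test - the iotop flags, durations, and file names are my own choices, and the sample process lists below just simulate the two captures:

```shell
# On each server, something like this captures ~10 minutes of accumulated
# i/o, keeping only processes that actually performed i/o (needs root):
#   iotop -b -o -a -d 10 -n 60 | awk '{print $NF}' | sort -u > live-procs.txt
#   iotop -b -o -a -d 10 -n 60 | awk '{print $NF}' | sort -u > test-procs.txt
# With the two process lists in hand, comm(1) shows what is unique to the
# live server. Simulated here with the process names from the post:
printf '%s\n' flush-202 mysqld nginx php-fpm | sort > live-procs.txt
printf '%s\n' mysqld nginx | sort > test-procs.txt
comm -23 live-procs.txt test-procs.txt   # lines only in the live capture
```

Here the unique entries come out as flush-202 and php-fpm, mirroring the difference observed between the two servers.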
Now, this part wasn't ideal - but since none of the services running on the live server would suffer from a few minutes of downtime it didn't really matter. To narrow down the problem, all the major services (postfix, dovecot, imapproxy, nginx, php-fpm, varnish, mysqld, varnishncsa) on the live server were stopped, and the iotop test rerun - there was no elevated disk i/o. The services were restarted in 3 batches, leaving php-fpm until the end. After each batch of restarts, the iotop test confirmed that there was no issue. Once php-fpm was started the issue returned. (It would have been easy enough to simulate a few PHP requests on the test server, but at this point, I wasn't sure it was actually PHP).
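The batched stop/start procedure above can be sketched as a small script. This is a reconstruction, not the commands actually run - the batch groupings beyond "php-fpm last" are my own guess, and DRY_RUN is a hypothetical safety switch that prints the commands instead of executing them:

```shell
# Isolate which service reintroduces the elevated i/o by restarting
# services in batches, re-running the iotop check after each batch.
DRY_RUN=1    # set to 0 to actually stop/start services (needs root)

run() {
  if [ "$DRY_RUN" = 1 ]; then echo "+ $*"; else "$@"; fi
}

batch1="postfix dovecot imapproxy"
batch2="nginx varnish varnishncsa mysqld"
batch3="php-fpm"    # the suspect goes last

# Stop everything and confirm disk i/o is quiet:
for svc in $batch1 $batch2 $batch3; do run service "$svc" stop; done
# run iotop -b -o -a -d 10 -n 60    # baseline with nothing running

# Restart batch by batch, re-checking after each:
for batch in "$batch1" "$batch2" "$batch3"; do
  for svc in $batch; do run service "$svc" start; done
  # run iotop -b -o -a -d 10 -n 60  # did the elevated i/o return?
done
```

Leaving the suspect service for the final batch means that if the problem only returns on the last step, the culprit is identified without having to test each service individually.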
Unfortunately, the server would be rather pointless without PHP, so this wasn't an ideal conclusion. However, since flush-202 seemed to suggest something memory-related (despite there being ample free memory), I decided to disable APC. Rerunning the iotop test showed that disk i/o levels were normal. A closer look into the matter showed that mmap was enabled, and that apc.mmap_file_mask was set to /tmp/apc.XXXXXX (the default for this install). That path sets APC to use file-backed mmap. Simply commenting this line out (thereby using the default - anonymous memory-backed mmap) and rerunning the iotop test showed the problem was resolved.
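For reference, the relevant part of the APC configuration looks roughly like this - the file path and surrounding values are typical for such an install, not quoted from the server in question:

```ini
; /etc/php.d/apc.ini (path varies by distribution)
extension = apc.so
apc.enabled = 1
apc.shm_size = 64M

; File-backed mmap: cache writes hit files under /tmp, which is what
; produced the constant flush-202 disk activity described above.
;apc.mmap_file_mask = /tmp/apc.XXXXXX

; With the line commented out, APC falls back to anonymous mmap,
; which is purely memory-backed and generates no disk i/o.
```

Note that after changing this, php-fpm needs to be restarted for the new setting to take effect.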
I still do not know why none of the diagnostics identified the writes as coming from PHP and going to the APC files in the /tmp directory. The only test that even mentioned the /tmp directory was lsof; however, the files it listed were non-existent.
I've worked on building an automation system for boats, and there was one prerequisite: the power could go down at any moment, and everything had to bootstrap again correctly.
My solution was to build a Gentoo-based initramfs system, with only a read-write folder for applications and configuration (this is the approach used by most router/firewall vendors). This solution adds an additional layer of complexity when dealing with system upgrades, but it assures you that the system will ALWAYS boot.
Regarding your specific question: you should keep the EXT4 journal enabled to get a faster fsck (a matter of a few seconds), use the data=journal mount option, and either lower the commit interval or mount with the sync option to keep the buffers always empty.
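Those mount options would look something like this in /etc/fstab - the device name, mount point, and commit value here are examples, not taken from any particular system:

```
# /etc/fstab
# data=journal writes file data through the journal as well as metadata;
# commit=5 flushes the journal every 5 seconds (the ext4 default).
/dev/sda2  /data  ext4  defaults,data=journal,commit=5  0  2
```

Alternatively, an already-mounted filesystem can be switched to synchronous writes without a reboot, e.g. `mount -o remount,sync /data`.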
Refs: http://www.kernel.org/doc/Documentation/filesystems/ext4.txt
Best Answer
Honestly, I'd hold off on ext4 right now for production use.
There are other options if you're running into real performance problems with the filesystem (and I can understand that situation, at my last job we had performance limitations in an application due to ext3). Depending on your chosen distribution, you might be able to use jfs, xfs, or reiserfs. All three will generally outperform ext3 in different ways, and all three are much more tested and stable than ext4 right now.
So, my recommendation would be multiple parts. First, investigate thoroughly to make sure you're optimizing in the right place. Test your application on different filesystems and ensure that the performance is improved enough to make a filesystem change valid.
Also, depending on your application, adding more RAM might improve performance. Linux, by default, will use any RAM not committed to applications as disk cache. Sometimes having a few GB of "unused" RAM can make a significant performance difference on boxes with heavy disk activity.
Finally, what's your timeline requirement here? If ext3 wasn't cutting it and I had to build a machine with a different filesystem today, I'd probably use xfs or jfs. If I could push it off for 6-8 months, I'd probably wait and see how ext4 has shaped up.