Because EC2 is a shared hosting environment (your instance shares physical hardware with other customers' instances), you can see substantial variability in I/O operations. EBS volumes are essentially network-attached storage and share the same NIC as regular network traffic. Each physical host has only a 1 Gb connection to the backbone. So, not only do you have contention with other customers' network operations, you also have network contention with their disks and your own. In practice, the network contention is not ordinarily a problem unless you are sharing the box with many other high-traffic customers. You can get around some of that by using larger instances: larger instances take up a larger share of the box and thus have fewer shared resources.
What kind of IOPS are you seeing at peak and during these problem periods? (sar -d, the tps column)
What is your steal time during these periods? (iostat -x 1 or sar -u).
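If sar and iostat aren't installed, you can read steal time straight from the kernel. A minimal sketch, assuming a Linux guest with /proc mounted (field 9 of the "cpu" line in /proc/stat is cumulative steal ticks):

```shell
# Cumulative CPU ticks stolen by the hypervisor since boot (field 9),
# and total ticks across all states, from the aggregate "cpu" line.
steal=$(awk '/^cpu /{print $9}' /proc/stat)
total=$(awk '/^cpu /{t=0; for (i=2; i<=NF; i++) t+=$i; print t}' /proc/stat)
echo "steal ticks: $steal of $total total ticks"
```

A consistently growing steal counter during your problem periods means a noisy neighbor is taking CPU time your instance wanted.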
You can increase your IOPS capacity, which should help your iowait time, by software RAIDing multiple EBS volumes together. It sounds kludgy, but it actually works. This will not solve network contention problems, but with your traffic, I highly doubt you are saturating the link. It is possible that another customer is, however, and causing you some pain.
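The software RAID setup is roughly as follows. This is a sketch only: the device names are hypothetical (adjust to whatever EC2 actually attached), it requires root, and you should benchmark before trusting it with data.

```shell
# Stripe four attached EBS volumes into one RAID 0 md device.
# /dev/xvdf..xvdi are placeholder device names -- check yours first.
mdadm --create /dev/md0 --level=0 --raid-devices=4 \
    /dev/xvdf /dev/xvdg /dev/xvdh /dev/xvdi
mkfs.ext3 /dev/md0
mount /dev/md0 /data
```

RAID 0 spreads each request across all member volumes, so your aggregate IOPS scales roughly with the number of volumes; the trade-off is that losing any one volume loses the array.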
Sometimes, unfortunately, a simple solution to this type of problem is to simply respin the instance. It will likely come up on a different host with different shared customers. It is somewhat common practice for EC2 customers to spin instances, run some benchmarks, and respin if they are unhappy with the results.
Another recommendation is to split your web and database tiers into different servers. A single server with web/db is usually a bad idea for a large number of reasons and in this case is probably making it even more difficult to diagnose the bottleneck.
Disk IOs per device (IOs/second)
With traditional hard drives this is a very important number. An I/O operation is a read or write operation to disk. With rotational spindles you can get from a few dozen to perhaps 200 IOPS, depending on the disk speed and its usage pattern.
That's not all there is to it: modern operating systems have I/O schedulers which try to merge several I/O requests into one and make things faster that way. RAID controllers and the like also perform some smart I/O request reordering.
Disk latency per device (Average IO wait)
How long it takes from issuing the I/O request to an individual disk until the data actually comes back. If this hovers around a couple of milliseconds, you are OK; if it's dozens of ms, your disk subsystem is starting to sweat; if it's hundreds of ms or more, you are in big trouble, or at least have a very, very slow system.
IO Service Time
How your disk subsystem (possibly containing lots of disks) is performing overall.
IOStat (blocks/second read/written)
How many disk blocks were read/written per second. Look for spikes and also at the average. If the average starts to approach the maximum throughput of your disk subsystem, it's time to plan a performance upgrade. Actually, plan well before that point.
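You can get the raw counters behind these numbers from /proc/diskstats. A sketch, assuming a Linux system: field 6 is sectors read and field 10 is sectors written since boot, and a sector is 512 bytes regardless of the filesystem block size.

```shell
# Per-device bytes read/written since boot (loop devices filtered out).
report=$(awk '$3 !~ /^loop/ {printf "%s: read %d B, written %d B\n", $3, $6*512, $10*512}' /proc/diskstats)
echo "$report"
```

Sampling this twice and dividing the difference by the interval gives you the same bytes/second figure that sar and iostat report.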
Available entropy (bytes)
Some applications want "true" random data. The kernel gathers that 'true' randomness from several sources, such as keyboard and mouse activity, a hardware random number generator found on many motherboards, or even from video/music files (video-entropyd and audio-entropyd can do that).
If your system runs out of entropy, applications wanting that data stall until more arrives. Personally, I've seen this happen in the past with the Cyrus IMAP daemon and its POP3 service; it generated a long random string before each login, and on a busy server that drained the entropy pool very quickly.
One way to get rid of that problem is to switch the applications to semi-random data (/dev/urandom), but that's beyond the scope of this topic.
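Checking the pool depth is a one-liner on Linux (note the kernel reports it in bits; on recent kernels the value is pinned at 256 because the pool design changed):

```shell
# How much entropy the kernel currently has available, in bits.
entropy=$(cat /proc/sys/kernel/random/entropy_avail)
echo "available entropy: $entropy bits"
```

If this sits near zero while an application is stalled, entropy starvation is a likely culprit.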
VMStat (running/I/O sleep processes)
This tells you how many processes are currently runnable (the r column) and how many are blocked in uninterruptible sleep (the b column), which almost always means they are waiting on I/O. Note these are system-wide counts, not per-process statistics; a persistently high b count means processes are queuing up behind the disk.
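Those two vmstat columns come from counters the kernel exposes directly. A sketch, assuming a Linux system with /proc mounted:

```shell
# procs_running feeds vmstat's "r" column; procs_blocked feeds "b"
# (processes in uninterruptible sleep, usually waiting on I/O).
running=$(awk '/^procs_running/{print $2}' /proc/stat)
blocked=$(awk '/^procs_blocked/{print $2}' /proc/stat)
echo "running=$running blocked=$blocked"
```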
Disk throughput per device (bytes/second read/written)
This is simply bytes read/written per second, which is often a more human-readable form than blocks, since block size may vary. Block size differs depending on the disks used, the file system (and its settings), and so on. Sometimes the block size is 512 bytes, sometimes 4096 bytes, sometimes something else.
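You don't have to guess at the block size; the kernel will tell you per filesystem. A sketch using GNU stat, with "/" as just an example path:

```shell
# Preferred I/O block size for the filesystem holding "/".
bs=$(stat -f -c %s /)
echo "filesystem block size: $bs bytes"
```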
inode table usage
With file systems that allocate inodes dynamically (such as XFS), nothing. With file systems that have static inode tables (such as ext3), everything. If you have a combination of static inodes, a huge file system, and a huge number of directories and small files, you might hit a situation where you cannot create any more files on that partition, even though in theory there is plenty of free space left. No free inodes == bad.
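Checking inode usage is easy with df's -i flag ("/" here is just an example mount point):

```shell
# Inode counts for the filesystem holding "/": total, used, free.
# An IUse% near 100% means new files will fail even with free space.
total=$(df -i / | awk 'NR==2{print $2}')
used=$(df -i / | awk 'NR==2{print $3}')
echo "inodes used: $used of $total"
```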
Best Answer
I was trying to find an answer and finally got to the bottom of it.
The solution was to change the MySQL tmpdir parameter in my.cnf from /tmp/ (which is allocated on disk) to /dev/shm (which lives in RAM).
Because 48.7% [Created_tmp_disk_tables / (Created_tmp_tables + Created_tmp_disk_tables) * 100] of all temporary tables were being written to disk, and with tmpdir allocated on disk, IOPS went up accordingly.
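The change itself is a two-line config fragment (the path to my.cnf varies by distribution; /etc/mysql/my.cnf is just one common location):

```ini
# Point MySQL's on-disk temporary tables at a RAM-backed tmpfs instead.
[mysqld]
tmpdir = /dev/shm
```

Keep in mind /dev/shm is backed by RAM, so very large temporary tables will now consume memory; the longer-term fix is tuning queries and tmp_table_size so fewer temporary tables spill to disk in the first place.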