Linux – How to tell if linux disk IO is causing excessive (> 1 second) application stalls

linux performance redhat storage-area-network veritas

I have a Java application performing a large volume (hundreds of MB) of continuous output (streaming plain text) to about a dozen files on a ~~ext3~~ SAN filesystem. Occasionally, this application pauses for several seconds at a time. I suspect that something related to ~~ext3~~ vxfs (Veritas File System) functionality (and/or how it interacts with the OS) is the culprit.

What steps can I take to confirm or refute this theory? I am aware of iostat and /proc/diskstats as starting points.

Revised title to de-emphasize journaling and emphasize "stalls"

I have done some googling and found at least one article that seems to describe behavior like I am observing: Solving the ext3 latency problem

Additional Information

Red Hat Enterprise Linux Server release 5.3 (Tikanga)
Kernel: 2.6.18-194.32.1.el5
Primary application disk is fiber-channel SAN: lspci | grep -i fibre >> 14:00.0 Fibre Channel: Emulex Corporation Saturn-X: LightPulse Fibre Channel Host Adapter (rev 03)
Mount info: type vxfs (rw,tmplog,largefiles,mincache=tmpcache,ioerror=mwdisable) 0 0
cat /sys/block/VxVM123456/queue/scheduler >> noop anticipatory [deadline] cfq

Best Answer

My guess is that there's some other process that hogs the disk I/O capacity for a while. iotop can help you pinpoint it, if you have a recent enough kernel.

If this is the case, it's not about the filesystem, much less about journalling. The I/O scheduler is what arbitrates between applications competing for the disk. An easy test: check the current scheduler and try a different one. It can be done on the fly, without restarting. For example, on my desktop to check the first disk (/dev/sda):

cat /sys/block/sda/queue/scheduler
=>  noop deadline [cfq]

shows that it's using CFQ, which is a good choice for desktops but not so much for servers. Better set 'deadline':

echo 'deadline' > /sys/block/sda/queue/scheduler
cat /sys/block/sda/queue/scheduler
=>  noop [deadline] cfq

and wait a few hours to see if it improves. If so, set it permanently in the startup scripts (how to do that depends on the distribution).
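
To check which scheduler each block device is using before and after such a change, a small loop over sysfs can help. This is only a sketch: `active_scheduler` is a hypothetical helper name, and it assumes the bracketed-entry format shown in the output above.

```shell
#!/bin/sh
# active_scheduler (hypothetical helper): print the bracketed, i.e. active,
# entry from a scheduler line such as "noop [deadline] cfq".
# Prints nothing if the line contains no bracketed entry.
active_scheduler() {
    printf '%s\n' "$1" | sed -n 's/.*\[\([^]]*\)].*/\1/p'
}

# Report the active scheduler of every block device that exposes one.
for f in /sys/block/*/queue/scheduler; do
    [ -r "$f" ] || continue          # skip if the glob matched nothing
    dev=${f#/sys/block/}             # strip the leading path
    dev=${dev%%/*}                   # keep only the device name
    printf '%s: %s\n' "$dev" "$(active_scheduler "$(cat "$f")")"
done
```

Writing to the `scheduler` file needs root and does not survive a reboot. On kernels of this era the default for all devices can, as far as I know, also be set with the `elevator=deadline` boot parameter.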

Related Solutions

Linux Hard Disk Load – How to Monitor Hard Disk Load on Linux

You can get a pretty good measure of this using the iostat tool.

% iostat -dx /dev/sda 5

Device:         rrqm/s   wrqm/s     r/s     w/s   rsec/s   wsec/s avgrq-sz avgqu-sz   await  svctm  %util
sda               0.78    11.03    1.19    2.82    72.98   111.07    45.80     0.13   32.78   1.60   0.64

The disk utilisation is listed in the last column (%util). This is defined as:

Percentage of CPU time during which I/O requests were issued to the device (band-width utilization for the device). Device saturation occurs when this value is close to 100%.
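
The await and %util figures can also be approximated directly from two /proc/diskstats samples, which is handy when a stall needs to be caught with a custom sampling interval. The sketch below assumes the whole-disk field layout described in the kernel's iostats documentation (reads completed in field 4, ms reading in 7, writes completed in 8, ms writing in 11, ms doing I/O in 13); `diskstats_delta` is a hypothetical helper name. It treats %util as the share of elapsed time the device was busy.

```shell
#!/bin/sh
# diskstats_delta (hypothetical helper): compare two /proc/diskstats lines
# for the same device and print "await_ms util_pct".
#   $1 = earlier line, $2 = later line, $3 = time between them in ms
diskstats_delta() {
    printf '%s\n%s\n' "$1" "$2" | awk -v interval_ms="$3" '
        NR == 1 { r = $4; rms = $7; w = $8; wms = $11; busy = $13 }
        NR == 2 {
            ios     = ($4 - r) + ($8 - w)        # I/Os completed in the interval
            wait_ms = ($7 - rms) + ($11 - wms)   # total time those I/Os took
            await   = (ios > 0) ? wait_ms / ios : 0
            util    = 100 * ($13 - busy) / interval_ms
            printf "%.1f %.1f\n", await, util
        }'
}

# Example use (the device name sda is an assumption, adjust as needed):
#   a=$(grep ' sda ' /proc/diskstats); sleep 1
#   b=$(grep ' sda ' /proc/diskstats)
#   diskstats_delta "$a" "$b" 1000
```

A multi-second application stall caused by the disk should show up as an await value far above the usual level during the same interval.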

Linux – How to run a server on port 80 as a normal user on Linux

Short answer: you can't. Ports below 1024 can be opened only by root. As pointed out in a comment, you actually can by using CAP_NET_BIND_SERVICE, but applying that capability to the java binary lets every Java program run with it, which is undesirable, if not a security risk.

The long answer: you can redirect connections on port 80 to some other port you can open as normal user.

Run as root:

# iptables -t nat -A PREROUTING -p tcp --dport 80 -j REDIRECT --to-ports 8080

As loopback devices (like localhost) do not use the prerouting rules, if you need to use localhost, etc., add this rule as well (thanks @Francesco):

# iptables -t nat -I OUTPUT -p tcp -d 127.0.0.1 --dport 80 -j REDIRECT --to-ports 8080

NOTE: The above solution is not well suited for multi-user systems, as any user can open port 8080 (or any other high port you decide to use), thus intercepting the traffic. (Credits to CesarB).

EDIT: as per comment question - to delete the above rule:

# iptables -t nat --line-numbers -n -L

This will output something like:

Chain PREROUTING (policy ACCEPT)
num  target     prot opt source               destination         
1    REDIRECT   tcp  --  0.0.0.0/0            0.0.0.0/0           tcp dpt:8080 redir ports 8088
2    REDIRECT   tcp  --  0.0.0.0/0            0.0.0.0/0           tcp dpt:80 redir ports 8080

The rule you are interested in is nr. 2, so to delete it:

# iptables -t nat -D PREROUTING 2
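
Reading the rule number by eye is error-prone when the chain is long, so the lookup can be scripted. This is a sketch that assumes the listing format shown above; `redirect_rule_number` is a hypothetical helper name.

```shell
#!/bin/sh
# redirect_rule_number (hypothetical helper): read the output of
#   iptables -t nat --line-numbers -n -L PREROUTING
# on stdin and print the number of each REDIRECT rule whose destination
# port is exactly $1. The trailing space in the pattern keeps port 80
# from also matching 8080.
redirect_rule_number() {
    awk -v port="$1" '$2 == "REDIRECT" && $0 ~ ("dpt:" port " ") { print $1 }'
}

# Example use, as root:
#   n=$(iptables -t nat --line-numbers -n -L PREROUTING | redirect_rule_number 80)
#   [ -n "$n" ] && iptables -t nat -D PREROUTING "$n"
```

Alternatively, `iptables -D` accepts the same rule specification that was used with `-A`, so repeating the original command with `-D` in place of `-A` removes the rule without needing its number.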
