Linux poll() with 1 fd slow after a write, fast after a read

linuxredhat

Problem

I am observing that poll() calls are taking milliseconds when immediately called after a write() operation, yet taking only microseconds when immediately called after a read() operation.

This is in code that uses the ACE C++ communication library.

I'm trying to understand why there is such a significant difference in the system CPU time taken to call poll() after a write() call compared to a read() call.

System

[root@host ~]# lsb_release -a
LSB Version:    :core-4.0-amd64:core-4.0-ia32:core-4.0-noarch:graphics-4.0-amd64:graphics-4.0-ia32:graphics-4.0-noarch:printing-4.0-amd64:printing-4.0-ia32:printing-4.0-noarch
Distributor ID: RedHatEnterpriseServer
Description:    Red Hat Enterprise Linux Server release 5.9 (Tikanga)
Release:    5.9
Codename:   Tikanga

strace

write(234, "\4\0S\0\0\0\0\0\1\0\0\0\0\0\0\0\0\0\3\0\1\0\0\0\0\0\0\0\0\0\0\0"..., 256) = 256 <0.000048>
poll([{fd=234, events=POLLIN}], 1, -1)  = 1 ([{fd=234, revents=POLLIN}]) <0.006036>
read(234, "\4\0S\0\0\0\0\0\1\0\0\0
write(234, "\4\0S\0\0\0\0\0\1\0\0\0\0\0\0\0\0\0\3\0\1\0\0\0\0\0\0\0\0\0\0\0"..., 256) = 256 <0.000192>
poll([{fd=234, events=POLLIN}], 1, -1)  = 1 ([{fd=234, revents=POLLIN}]) <0.004996>
read(234, "\4\20S\0\0\0\0\0\1\0\0\0\200\2\0\0\0\0\3\0\1\0\0\0\0\10\1\0\200\2\0\0"..., 256) = 256 <0.000076>
poll([{fd=234, events=POLLIN}], 1, -1)  = 1 ([{fd=234, revents=POLLIN}]) <0.000022>
read(234, "\2\0\0\0\241\352\1\0\0\0\0\0\25\0\0\0\7\0\0\0\242\1\0\0\25\0\0\0\251\1\0\0"..., 640) = 640 <0.000337>
write(234, "\4\0S\0\0\0\0\0\1\0\0\0\0\0\0\0\0\0\3\0\1\0\0\0\0\0\0\0\0\0\0\0"..., 256) = 256 <0.000096>
poll([{fd=234, events=POLLIN}], 1, -1)  = 1 ([{fd=234, revents=POLLIN}]) <0.004102>
read(234, "\4\20S\0\0\0\0\0\1\0\0\0!\1\0\0\0\0\3\0\1\0\0\0\0\10\0\0!\1\0\0"..., 256) = 256 <0.000037>
poll([{fd=234, events=POLLIN}], 1, -1)  = 1 ([{fd=234, revents=POLLIN}]) <0.000157>
read(234, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\t\0\0\0\10\1\0\0\0\0\0\0\0\0\0\0"..., 289) = 289 <0.000047>
write(234, "\4\0S\0\0\0\0\0\1\0\0\0\0\0\0\0\0\0\3\0\1\0\0\0\0\0\0\0\0\0\0\0"..., 256) = 256 <0.000036>
poll([{fd=234, events=POLLIN}], 1, -1)  = 1 ([{fd=234, revents=POLLIN}]) <0.001882>

Discussion

Even though the ACE library has an option to use it there is probably little benefit in moving to epoll() because poll() is only being called with one file descriptor in this case.

As the system becomes loaded with 600 odd threads this poll() call is consuming around 98% of the system call CPU time of the process involved. So I'm looking for ways to make this more efficient. It does appear, however, that calling poll() after a write() is the issue as this is consistently in the order of 100x slower than after a read() call.

Best Answer

The difference between the time between one syscall and another is simply the difference of time between the precedent and the next one. It is simply a measurement purpose timestamp so that users will know how much time a syscall takes after another and to find out whether the app is hanging on one particular syscall.

However I am a bit befuddled to see that reads are quicker here. Simply because writes are often async but reads tend to be in straight line and if we are hopping across different filesystem blocks, disk seek can make it slow. But see the revents (requested events - third argument in poll() call) is saying POLLIN meaning data to be read. Perhaps that might explain the apparent slowness. Albeit, I will need the entire strace to be sure.