CPU load (what you have) is not the same as CPU utilization (what you're trying to get). Load is a measurement of the average number of processes waiting on the processor(s), whilst utilization is the share of time the processor spent doing work during a given interval. You probably want to look at the counter:
perf_counter[\Processor(_Total)\% Processor Time]
It's been a while since I've used Zabbix, so the syntax might have changed.
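To make the distinction concrete, here is a minimal sketch of how the two numbers are derived. The function names and the sample values are hypothetical, made up for illustration only; they are not taken from Zabbix or from any real host.

```python
def cpu_utilization(prev, curr):
    """Percent of time the CPU was busy between two (busy, total) tick samples."""
    busy_delta = curr[0] - prev[0]
    total_delta = curr[1] - prev[1]
    if total_delta <= 0:
        return 0.0  # no time elapsed between samples
    return 100.0 * busy_delta / total_delta


def average_load(queue_samples):
    """Average number of processes waiting, over a series of queue-length samples."""
    return sum(queue_samples) / len(queue_samples)


# Hypothetical samples: (busy ticks, total ticks) taken a few seconds apart.
before = (1000, 4000)
after = (1300, 5000)
print(cpu_utilization(before, after))  # 30.0

# Hypothetical queue lengths observed over the same period.
print(average_load([0, 3, 5, 4]))  # 3.0
```

Utilization is a percentage of elapsed time, while load is a count of processes, so the two cannot be compared directly.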
I think that the bottleneck is the disk. Here are my reasons for this:
You have a pretty busy web server.
Zabbix is slow, which I suspect is due to reads from the disk (it could also be the network).
Run strace again and find the file descriptor that Zabbix is using in the slow calls.
Then check whether that file descriptor is a file or a socket:
ls -l /proc/<PID_of_straced_process>/fd/<FD_from_strace>
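If you want to script the same check, the sketch below does it in Python, assuming a Linux host with /proc mounted. The function names are my own; the logic relies on the kernel exposing each descriptor as a symlink whose target looks like "socket:[inode]" for sockets, "pipe:[inode]" for pipes, and a path for regular files.

```python
import os


def classify_target(target):
    """Classify the symlink target of a /proc/<pid>/fd/<fd> entry."""
    if target.startswith("socket:"):
        return "socket"
    if target.startswith("pipe:"):
        return "pipe"
    if target.startswith("anon_inode:"):
        return "anon_inode"
    return "file"


def classify_fd(pid, fd):
    """Return (kind, target) for a descriptor of a running process (Linux only)."""
    target = os.readlink(f"/proc/{pid}/fd/{fd}")
    return classify_target(target), target
```

For example, classify_fd(<PID_of_straced_process>, <FD_from_strace>) returns the kind together with the raw link target. Reading another user's process normally requires root.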
EDIT1:
You should not change the TIME_WAIT timeouts. The problem with a short HTTP keep-alive, or with no HTTP keep-alive at all, is that you increase latency and bandwidth use. Instead, you should increase the HTTP keep-alive a little and install/enable SPDY.
EDIT2:
Use dstat -ta 10
and compare the first line with the rest. The first line is the average since boot. The following lines are 10-second averages (the last parameter sets the interval).
EDIT3:
Check whether you are losing packets; use something like smokeping to monitor the server and the website from outside your network. You have a significant number of connections in CLOSING, FIN_WAIT1, FIN_WAIT2, SYN_RECV, LAST_ACK. I think your network is congested or you have a lot of short-lived connections (confirmed by the high TIME_WAIT/ESTABLISHED ratio). See: http://en.wikipedia.org/wiki/Transmission_Control_Protocol#Protocol_operation
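To track that ratio over time, something like the following sketch can tally connection states. It assumes the Linux `netstat -ant` column layout (state in the sixth column); the function names and the sample output are made up for illustration.

```python
from collections import Counter


def count_tcp_states(netstat_output):
    """Count TCP connection states in `netstat -ant` style text."""
    states = Counter()
    for line in netstat_output.splitlines():
        fields = line.split()
        # Data rows start with tcp/tcp6; this skips the header lines.
        if len(fields) >= 6 and fields[0].startswith("tcp"):
            states[fields[5]] += 1
    return states


def time_wait_ratio(states):
    """TIME_WAIT / ESTABLISHED, or None when nothing is established."""
    established = states.get("ESTABLISHED", 0)
    if established == 0:
        return None
    return states.get("TIME_WAIT", 0) / established


# Hypothetical sample, not output from the server discussed here.
SAMPLE = """\
Proto Recv-Q Send-Q Local Address           Foreign Address         State
tcp        0      0 10.0.0.1:80             10.0.0.2:51234          TIME_WAIT
tcp        0      0 10.0.0.1:80             10.0.0.3:51235          TIME_WAIT
tcp        0      0 10.0.0.1:80             10.0.0.4:51236          TIME_WAIT
tcp        0      0 10.0.0.1:80             10.0.0.5:51237          ESTABLISHED
"""

states = count_tcp_states(SAMPLE)
print(time_wait_ratio(states))  # 3.0
```

On a real host you would feed it the text captured from `netstat -ant` instead of SAMPLE.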
Best Answer
From what I gather, this might require some coding. Basically, this question boils down to "how to create a custom perfcounter". Once that is done and working, the Zabbix agent should be able to query it without any additional work.
There's a simplistic-looking howto on the Microsoft site: http://support.microsoft.com/kb/317679