The situation: I have a server hosting 2-3 projects. Recently the server started hanging: we could not connect to it over ssh, and clients that were already connected had to wait 20 minutes for top to give results.
Earlier today I managed to run gstat while it was in this state and saw that it stays at 100% on da0, da0s1 and da0s1f. I don't quite know what those IDs mean, but I understand that some process is overwhelming the hard disk with requests.
I'm asking for suggestions. I don't know how to find the culprit, and I can't prevent this.
The server runs FreeBSD.
Best Answer
If your version of FreeBSD is relatively modern, top has a -m option that shows the top I/O talkers if you supply it with the "io" parameter (top -m io).
In this case, I'd also use the -S option (to show system processes, in case one of them is the culprit). To behave better under load, I would use -q (to renice it to run at a higher priority), and -u (to skip reading /etc/passwd, which should help it load faster).
Since it's taking so long to run top, I'd tell it to display just two passes of its output (-d 2), and then run in batch mode (-b), so it will automatically exit.
The first time that you run top in this way, its first section of output will show cumulative I/O counts for a number of processes going back quite a while (maybe since boot time? I'm actually not sure about this). In the first display, you can see who your top talkers have been over time. In the second display, you can see your top talkers in the past two seconds.
So, putting it all together, the options are -S -m io -q -u -b -d 2; when I tried this, I also ran a find so that some actual I/O was happening.
Once you narrow down which process is doing all of the I/O, you can use truss, or the devel/strace or sysutils/lsof ports, to see what your disk-hungry processes are doing. (If your system is very busy, of course, you won't be able to install the ports easily.)
For example, lsof can show what files and other resources my ntpd process is using, and truss can show what system calls it's making (note that this can be resource-intensive).
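
As a rough sketch of the steps so far, the commands could be wrapped like this. The flags are the ones named above; the function names, the trailing 10 (processes per display) and the PID 1234 are placeholders of mine, not something the original setup prescribes.

```shell
#!/bin/sh
# Sketch only: run as root on FreeBSD.

# Two batch passes of top in I/O mode.
#   -S  include system processes       -m io  show I/O counters
#   -q  renice top to a high priority  -u     skip uid-to-name lookups
#   -b  batch mode (no interaction)    -d 2   exit after two displays
# The trailing 10 (processes per display) is an illustrative choice.
top_io_snapshot() {
    top -S -m io -q -u -b -d 2 10
}

# List the files and other resources one process has open.
# lsof comes from the sysutils/lsof port.
open_files() {
    lsof -p "$1"
}

# Print the system calls of a running process until interrupted.
# This can itself be resource-intensive on a loaded machine.
trace_syscalls() {
    truss -p "$1"
}

# Example session (1234 is a placeholder PID):
#   top_io_snapshot
#   open_files 1234
#   trace_syscalls 1234
```

The first display from top_io_snapshot is the cumulative view and the second is the recent one, so the second is usually the place to look for the current offender.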
devel/strace is similar to truss, but you'll need to have the /proc filesystem mounted; once it is mounted, strace will work.
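
A minimal sketch of that sequence, assuming the usual FreeBSD procfs mount on /proc and run as root. The function names and the PID are hypothetical.

```shell
#!/bin/sh
# Sketch only: mount procfs if needed, then attach strace to a process.

# Mount procfs on /proc unless something is already mounted there.
mount_procfs() {
    if mount | grep -q ' on /proc '; then
        return 0
    fi
    mount -t procfs proc /proc
}

# Attach strace (devel/strace port) to a running process by PID.
trace_with_strace() {
    mount_procfs && strace -p "$1"
}

# Example (1234 is a placeholder PID):
#   trace_with_strace 1234
```

Checking the mount table first avoids an error from a second mount attempt when /proc is already in place.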
Good luck - let us know what you discover! Once you have the process(es) identified, I may be able to assist further.
EDIT: Note that running lsof, truss and strace can themselves be intensive. I've done some minor updates to try to reduce their impact. Also, if a process is spawning many children quickly, you may have to tell truss or strace to follow child processes with the -f argument.
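
For the child-following case, something like the sketch below may help. Writing the trace to a file with -o instead of the terminal is my own suggestion for keeping the output manageable; the function name and the default path under /tmp are made up for illustration.

```shell
#!/bin/sh
# Sketch only: follow a process and its children, logging to a file.

# $1 = PID to attach to
# $2 = output file (optional; the /tmp default is an illustrative choice)
trace_children() {
    pid=$1
    out=${2:-/tmp/truss.$pid.out}
    # -f follows child processes, -o writes the trace to a file
    truss -f -o "$out" -p "$pid"
}

# Example (1234 is a placeholder PID); stop with Ctrl-C, then read the file:
#   trace_children 1234
```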