Java – What are the best practices for monitoring and alerting on low-level JVM metrics

java monitoring

I'm looking to set up monitoring and alerting for a Java server-based app, and I want to find some best practices for monitoring JVM-specific metrics and for designing alerts based on those metrics.

So what are the key JVM metrics to monitor? Some possible contenders:

  • Heap space used
  • CPU usage
  • GC frequency
  • Time spent in GC
  • Thread count
  • Class count
  • Object count
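For reference, most of the metrics in this list can be read in-process through the standard `java.lang.management` MXBeans. The following is only a minimal sketch: the class name `JvmMetricsSnapshot` and its method names are made up for illustration, the system load average stands in for CPU usage (it is a negative value on platforms where it is unavailable), and object count is left out because no standard MXBean exposes it.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;

// Hypothetical helper class; the name and method names are illustrative only.
public class JvmMetricsSnapshot {

    /** Bytes of heap currently in use. */
    public static long heapUsedBytes() {
        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        return heap.getUsed();
    }

    /** Total collections so far, summed over all collectors. */
    public static long gcCount() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            // getCollectionCount() returns -1 when undefined for a collector.
            total += Math.max(0, gc.getCollectionCount());
        }
        return total;
    }

    /** Approximate accumulated GC time in milliseconds, summed over all collectors. */
    public static long gcTimeMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            // getCollectionTime() returns -1 when undefined for a collector.
            total += Math.max(0, gc.getCollectionTime());
        }
        return total;
    }

    /** Live threads, daemon and non-daemon. */
    public static int threadCount() {
        return ManagementFactory.getThreadMXBean().getThreadCount();
    }

    /** Classes currently loaded in the JVM. */
    public static int loadedClassCount() {
        return ManagementFactory.getClassLoadingMXBean().getLoadedClassCount();
    }

    /** System load average over the last minute; negative if not available. */
    public static double systemLoadAverage() {
        return ManagementFactory.getOperatingSystemMXBean().getSystemLoadAverage();
    }

    public static void main(String[] args) {
        System.out.printf("heap used:      %d bytes%n", heapUsedBytes());
        System.out.printf("load average:   %.2f%n", systemLoadAverage());
        System.out.printf("GC count:       %d%n", gcCount());
        System.out.printf("GC time:        %d ms%n", gcTimeMillis());
        System.out.printf("thread count:   %d%n", threadCount());
        System.out.printf("loaded classes: %d%n", loadedClassCount());
    }
}
```

The same beans are also published over JMX, so an external monitoring tool can poll these values without any code changes in the app.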

And once you start watching some metrics, what are good alerting strategies for them? CPU usage seems like an easy one. Heap space seems worth monitoring and graphing, but it doesn't translate well into an alertable metric, since you expect it to grow toward capacity and then trigger GC. Time spent in GC, on the other hand, especially as a ratio of overall time, seems to have good alerting potential.
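To make the GC-ratio idea concrete, here is a rough sketch of how it could be computed: sample the accumulated GC time twice and divide the difference by the wall-clock time between the samples. The class name `GcOverheadAlert`, the 1-second sampling interval, and the 10% threshold are all assumptions picked for illustration, not recommended values. How collection time is reported also varies by collector, so treat the ratio as an approximation.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Hypothetical example class; name, interval, and threshold are illustrative only.
public class GcOverheadAlert {

    /** Accumulated GC time in milliseconds, summed over all collectors. */
    public static long totalGcMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            // getCollectionTime() returns -1 when undefined for a collector.
            total += Math.max(0, gc.getCollectionTime());
        }
        return total;
    }

    /** Fraction of the sampling interval that was spent in GC. */
    public static double gcOverhead(long gcMillisBefore, long gcMillisAfter, long intervalMillis) {
        if (intervalMillis <= 0) {
            return 0.0; // no elapsed time, so no meaningful ratio
        }
        return (double) (gcMillisAfter - gcMillisBefore) / intervalMillis;
    }

    /** True when the measured overhead exceeds the configured threshold. */
    public static boolean shouldAlert(double overhead, double threshold) {
        return overhead > threshold;
    }

    public static void main(String[] args) throws InterruptedException {
        final long sampleIntervalMillis = 1_000; // assumed sampling interval
        final double threshold = 0.10;           // assumed: alert above 10% of time in GC

        long gcBefore = totalGcMillis();
        long start = System.nanoTime();
        Thread.sleep(sampleIntervalMillis);
        long elapsedMillis = (System.nanoTime() - start) / 1_000_000;
        long gcAfter = totalGcMillis();

        double overhead = gcOverhead(gcBefore, gcAfter, elapsedMillis);
        System.out.printf("GC overhead: %.1f%%%n", overhead * 100);
        if (shouldAlert(overhead, threshold)) {
            System.out.println("ALERT: GC overhead above threshold");
        }
    }
}
```

In practice you would probably want the ratio to stay above the threshold for several consecutive samples before alerting, so that a single long collection doesn't page anyone.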

I'm not looking for a tool per se (e.g. Hyperic or Nagios) to perform the monitoring, but if there is one that has an especially good Java template, default graph, or rule set, that would be a handy pointer.

Best Answer

I have used hprof before, which comes bundled with the JRE. It does heap and CPU monitoring. I usually use it to monitor CPU usage and check which thread is taking the majority of the CPU. http://java.sun.com/developer/technicalArticles/Programming/HPROF.html

I have also used JProbe, which is commercial software. http://www.quest.com/jprobe/