Windows – Monitoring server uptime with telegraf / grafana on windows

Tags: grafana, monitoring, uptime, windows

I'm aware that there are countless solutions for monitoring uptime on Windows servers, but I want to ask specifically about Influx's server agent, telegraf.

We currently have a nice influxdb/grafana/telegraf stack monitoring our Linux machines for basic system metrics like cpu, mem, disk, uptime etc., and recently I've started to incorporate some of our Windows boxes into this setup too.

This is easy enough: simply enabling the input plugins in the telegraf.conf file and updating the counters for your needs works as you would expect. I can query the data in influxdb from grafana's UI.

The one I'm having difficulty with is the 'system' input plugin. On the Linux machines this plugin provides metrics that are essentially the same as the output of the unix 'uptime' command: uptime, no. of users, load average etc. We can then have a nice colour-coded 'singlestat' uptime chart in the grafana UI for our individual Linux machines, i.e. coloured green if the machine has been up for more than a second, red if not…

Can anybody suggest whether or not I can do something similar for measuring and displaying the uptime of the windows boxes using the telegraf agent and grafana UI?

I can post up the telegraf.conf on request.

Thanks,
Sam

Best Answer

You can use the Windows "System" object, with the "System Up Time" counter for server up time in seconds. For example, add this to your monitored Windows host's telegraf.conf:

[[inputs.win_perf_counters]]
...
    #####  System  #####
    [[inputs.win_perf_counters.object]]
        ObjectName = "System"
        Counters = ["System Up Time"]
        Instances = ["------"]
        Measurement = "win_system"
        #IncludeTotal=false #Set to true to include _Total instance when querying for all (*).

See the win_perf_counters plugin documentation.
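
Since "System Up Time" is reported in seconds, the green/red singlestat described in the question comes down to a threshold check on that value. Grafana does this itself through the panel's thresholds and a duration unit, so the following Python is only a rough sketch of the logic, not something you need to run. The function names and the one-second default threshold are assumptions made for illustration.

```python
def format_uptime(seconds):
    """Turn an uptime given in seconds into a 'Nd HH:MM:SS' string."""
    total = int(seconds)
    days, rem = divmod(total, 86400)      # 86400 seconds per day
    hours, rem = divmod(rem, 3600)
    minutes, secs = divmod(rem, 60)
    return f"{days}d {hours:02d}:{minutes:02d}:{secs:02d}"


def uptime_colour(seconds, threshold=1.0):
    """Return 'green' if uptime exceeds the threshold, otherwise 'red'.

    A missing value (None) is treated as 'red'. That is an assumption
    here, standing in for a host that has stopped reporting.
    """
    if seconds is not None and seconds > threshold:
        return "green"
    return "red"


if __name__ == "__main__":
    sample = 93784  # hypothetical counter value, in seconds
    print(format_uptime(sample), uptime_colour(sample))
```

In a real panel the equivalent would be a single threshold on the latest value of the uptime field, with the panel unit set to a duration in seconds.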
