Prometheus Alert Rule for Absent Discovered Target

prometheus

I'm trying to write a general rule that fires an alert when a discovered target goes missing, in particular Kubernetes pods that are annotated for scraping and auto-discovered using kubernetes_sd_configs.

Expressions of the form absent(up{job="kubernetes-pods"}==1) do not return any of the additional labels that were available on the up time series. If a pod is deleted (say, by mistake), it disappears as a target from Prometheus. An alert based on absent() fires, but I have no information about which pod has gone missing.

I think the same happens for auto-discovered Kubernetes services: if one is deleted by mistake, it just disappears as a monitored target. I'm not sure whether the behavior is the same for target_groups (https://prometheus.io/blog/2015/06/01/advanced-service-discovery/) with an IP range, that is, whether the metrics just stop and up == 0 is not available when the physical node is turned off.

What is the correct way to detect when an auto-discovered target is gone in a general way? Or do I need to hard code rules for each service/node/pod explicitly, even though it was auto discovered?

Best Answer

Or do I need to hard code rules for each service/node/pod explicitly, even though it was auto discovered?

Yes, you need a rule for every individual thing you want to alert on when it goes missing. Once a target is deleted, service discovery no longer returns it, so Prometheus has nowhere to get its labels from.

The usual alert is absent(up{job="kubernetes-pods"}), which fires only when no up series exists for that job at all.
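As a rough sketch of what that looks like in a rule file, the first rule below is the job-wide alert and the second is an explicit rule for one pod. The group name, alert names, the 5m delay, the pod name my-app-0, and the kubernetes_pod_name label are all assumptions for illustration; the label name in particular depends on your relabel_configs. Because absent() copies the equality matchers from the selector onto its result, the second alert can name the missing pod in its annotation.

```yaml
groups:
  - name: absent-targets            # hypothetical group name
    rules:
      # Fires when no `up` series exists at all for the job.
      - alert: KubernetesPodsJobAbsent
        expr: absent(up{job="kubernetes-pods"})
        for: 5m                     # assumed delay; tune to your scrape interval
        annotations:
          summary: "No targets are being scraped for job kubernetes-pods"

      # Explicit rule for a single pod. The pod name and the
      # kubernetes_pod_name label are assumptions, not taken from the question.
      - alert: MyAppPodAbsent
        expr: absent(up{job="kubernetes-pods", kubernetes_pod_name="my-app-0"})
        for: 5m
        annotations:
          summary: "Pod {{ $labels.kubernetes_pod_name }} is no longer a scrape target"
```

You would repeat the second rule for each pod, service, or node you care about, and promtool check rules can validate the file before you load it.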
