Detecting Kubernetes OOMKilled Events in GKE Logs

google-kubernetes-engine  google-stackdriver  kubernetes

I'd like to set up instrumentation for OOMKilled events, which look like this when examining a pod:

Name:   pnovotnak-manhole-123456789-82l2h
Namespace:  test
Node:   test-cluster-cja8smaK-oQSR/10.x.x.x
Start Time: Fri, 03 Feb 2017 14:34:57 -0800
Labels:   pod-template-hash=123456789
    run=pnovotnak-manhole
Status:   Running
IP:   10.x.x.x
Controllers:  ReplicaSet/pnovotnak-manhole-123456789
Containers:
  pnovotnak-manhole:
    Container ID: docker://...
    Image:    pnovotnak/it
    Image ID:   docker://sha256:...
    Port:
    Limits:
      cpu:  2
      memory: 3Gi
    Requests:
      cpu:    200m
      memory:   256Mi
    State:    Running
      Started:    Fri, 03 Feb 2017 14:41:12 -0800
    Last State:   Terminated
      Reason:   OOMKilled
      Exit Code:  137
      Started:    Fri, 03 Feb 2017 14:35:08 -0800
      Finished:   Fri, 03 Feb 2017 14:41:11 -0800
    Ready:    True
    Restart Count:  1
    Volume Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-tder (ro)
    Environment Variables:  <none>
Conditions:
  Type    Status
  Initialized   True
  Ready   True
  PodScheduled  True
Volumes:
  default-token-46euo:
    Type: Secret (a volume populated by a Secret)
    SecretName: default-token-tder
QoS Class:  Burstable
Tolerations:  <none>
Events:
  FirstSeen LastSeen  Count From                SubObjectPath       Type    Reason    Message
  --------- --------  ----- ----                -------------       --------  ------    -------
  11m   11m   1 {default-scheduler }                      Normal    Scheduled Successfully assigned pnovotnak-manhole-123456789-82l2h to test-cluster-cja8smaK-oQSR
  10m   10m   1 {kubelet test-cluster-cja8smaK-oQSR} spec.containers{pnovotnak-manhole}  Normal    Created   Created container with docker id xxxxxxxxxxxx; Security:[seccomp=unconfined]
  10m   10m   1 {kubelet test-cluster-cja8smaK-oQSR} spec.containers{pnovotnak-manhole}  Normal    Started   Started container with docker id xxxxxxxxxxxx
  11m   4m    2 {kubelet test-cluster-cja8smaK-oQSR} spec.containers{pnovotnak-manhole}  Normal    Pulling   pulling image "pnovotnak/it"
  10m   4m    2 {kubelet test-cluster-cja8smaK-oQSR} spec.containers{pnovotnak-manhole}  Normal    Pulled    Successfully pulled image "pnovotnak/it"
  4m    4m    1 {kubelet test-cluster-cja8smaK-oQSR} spec.containers{pnovotnak-manhole}  Normal    Created   Created container with docker id yyyyyyyyyyyy; Security:[seccomp=unconfined]
  4m    4m    1 {kubelet test-cluster-cja8smaK-oQSR} spec.containers{pnovotnak-manhole}  Normal    Started   Started container with docker id yyyyyyyyyyyy
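
(For reference, the output above comes from something like the following; the pod name and namespace are taken from the output itself.)

kubectl describe pod pnovotnak-manhole-123456789-82l2h --namespace test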

All I get from the pod logs is:

{
 textPayload: "shutting down, got signal: Terminated
"
 insertId: "aaaaaaaaaaaaaaaa"
 resource: {
  type: "container"
  labels: {
   pod_id: "pnovotnak-manhole-123456789-82l2h"
   ...
  }
 }
 timestamp: "2017-02-03T22:34:48Z"
 severity: "ERROR"
 labels: {
  container.googleapis.com/container_name: "POD"
  ...
 }
 logName: "projects/myproj/logs/POD"
}

And the kubelet logs:

{
 insertId: "bbbbbbbbbbbbbb"   
 jsonPayload: {
  _BOOT_ID: "ffffffffffffffffffffffffffffffff"    
  MESSAGE: "I0203 22:41:11.925928    1843 kubelet.go:1816] SyncLoop (PLEG): "pnovotnak-manhole-123456789-82l2h_test(a-uuid)", event: &pleg.PodLifecycleEvent{ID:"another-uuid", Type:"ContainerDied", Data:"..."}"
 ...

That doesn't seem like quite enough to uniquely identify this as an OOM event. Any other ideas?

Best Answer

Although the OOMKilled event isn't present in the logs, if you can detect that a pod was killed, you can then use kubectl get pod -o go-template=... <pod-id> to determine the reason. As an example, straight from the docs:

[13:59:01] $ ./cluster/kubectl.sh  get pod -o go-template='{{range.status.containerStatuses}}{{"Container Name: "}}{{.name}}{{"\r\nLastState: "}}{{.lastState}}{{end}}'  simmemleak-60xbc
Container Name: simmemleak
LastState: map[terminated:map[exitCode:137 reason:OOM Killed startedAt:2015-07-07T20:58:43Z finishedAt:2015-07-07T20:58:43Z containerID:docker://0e4095bba1feccdfe7ef9fb6ebffe972b4b14285d5acdec6f0d3ae8a22fad8b2]]
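
If the go-template syntax feels awkward, a roughly equivalent query with -o jsonpath should also work. This is just a sketch, using the pod name and namespace from the question; it prints each container's name and its last termination reason, so an OOM-killed container should show OOMKilled in the second column:

# Print "<container-name>\t<last termination reason>" for each container in the pod
kubectl get pod pnovotnak-manhole-123456789-82l2h --namespace test \
  -o jsonpath='{range .status.containerStatuses[*]}{.name}{"\t"}{.lastState.terminated.reason}{"\n"}{end}'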

If you're doing this programmatically, a better alternative to parsing kubectl output is to use the Kubernetes REST API's GET /api/v1/pods method. Ways of accessing the API are also covered in the documentation.
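
As a minimal sketch of that approach, the following polls the API for OOM-killed containers. It assumes kubectl proxy is running locally and that jq is installed; the "test" namespace comes from the question:

# Proxy the Kubernetes API to localhost so curl can reach it without auth headers
kubectl proxy --port=8001 &

# List pods in the "test" namespace and report any container whose last state
# is a termination with reason OOMKilled
curl -s http://localhost:8001/api/v1/namespaces/test/pods \
  | jq -r '.items[]
      | .metadata.name as $pod
      | .status.containerStatuses[]?
      | select(.lastState.terminated.reason? == "OOMKilled")
      | "\($pod)/\(.name): OOMKilled (exit code \(.lastState.terminated.exitCode))"'

In a real watcher you would likely use the API's watch parameter or a client library rather than polling, but the field to check is the same lastState.terminated.reason shown above.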