Cron – In Kubernetes, how can a container created from a CronJob find out when it was scheduled

cron, kubernetes

We have a Kubernetes CronJob resource that runs a Job every minute. I need the container that results from the Job to know when it was scheduled to run. Is this possible?


Background info just in case it's useful

The hierarchy of Kubernetes resources is this: CronJob -> Job -> Pod -> initContainer -> Container -> PHP CLI command.

I have noticed the container starts anywhere between 20 seconds and 10 minutes after the minute in which we wanted it to run. There are 2 reasons for the significant startup time:

  1. The init-container does quite a few things, including pulling Docker images, and it also does a cp of the >300MB app source code into a volume.
  2. The container resource limits are rather high, e.g. 2G memory, so there sometimes isn't enough capacity in the cluster. The cluster-autoscaler then has to provision a new node to run the Job Pod, and the bootstrapping process can take a while before the new node joins the cluster.

This startup delay can have interesting effects: a Job scheduled later can start before a Job scheduled earlier, because the later one happens to find the capacity it needs while the earlier one is still waiting for a new node to start.


Some things I've looked at so far to solve my problem

  1. I looked at the Downward API, which lets the Pod look in /etc/labels to see what labels it has, but unfortunately it does not expose the Pod startup time, only metadata such as the Pod name.
  2. I looked at using a dynamic value in a Pod label, i.e. the current timestamp, but as far as I know this isn't possible?
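
For reference, this is roughly how the container could read the labels file exposed by the Downward API. It is only a sketch: it assumes the volume is mounted so that the file appears at /etc/labels (the path mentioned above) and that each line has the form key="value". The function names are made up for illustration.

```python
def parse_downward_labels(text):
    """Parse Downward API label lines of the form key="value" into a dict."""
    labels = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        # Split on the first '=' only; the value is wrapped in double quotes.
        key, _, raw_value = line.partition("=")
        labels[key] = raw_value.strip('"')
    return labels


def read_labels(path="/etc/labels"):
    """Read and parse the labels file (path is an assumption from the question)."""
    with open(path) as handle:
        return parse_downward_labels(handle.read())
```

With the CronJob below, read_labels() would be expected to return the static labels (app, tier, tester, build_id), which is why this approach alone does not give a per-run timestamp.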

Versions

Kubernetes v1.10, running in AWS EKS. Job is a PHP 7.2 CLI command.


Steps to reproduce

  1. Set up a Kubernetes cluster with a cluster-autoscaler installed and enabled.
  2. Create a CronJob. Put the following YAML in a file called cron_test.yaml:

    apiVersion: batch/v1beta1
    kind: CronJob
    metadata:
      name: tomtest
      labels:
        app: test
        tier: test
        tester: tom

    spec:
     schedule: "* * * * *"
     jobTemplate:
       metadata:
         name: tomtest-crons

       spec:
         template:
           metadata:
             labels:
                app: test
                tier: test
                tester: tom
                build_id: tom3

           spec:

             containers:
             - name: cron
               image: giantswarm/tiny-tools
               imagePullPolicy: IfNotPresent
               env:
               - name: TOMTEST
                 value: "3"
               args:
               - /bin/sh
               - -c
               - date;echo hi;sleep 600;echo bye;date
               # resources belongs to the container, not to the Pod spec
               resources:
                 requests:
                   cpu: "1"
                   memory: "2G"
                 limits:
                   cpu: "1"
                   memory: "2G"
             restartPolicy: Never
  3. Start the CronJob on your cluster: kubectl create -f cron_test.yaml
  4. This will launch a container each minute which does nothing but sleep for 10 minutes.
  5. Wait a few minutes. The containers will start to stack up, and because they have high resource limits it's likely the cluster-autoscaler will kick in and add a new node or two. If not, increase the resource limits further.
  6. Do kubectl get pods to find Pods that seem late, i.e. that have a startup time that isn't exactly a minute after the previous one.
  7. Inspect the Pod info: kubectl get pod tomtest-123-456 -o=yaml. Notice there is a creationTimestamp field and a startTime field, but neither is the time at which the Pod was actually scheduled to run.
  8. When finished, clean up: kubectl delete CronJob tomtest (this removes all Jobs and Pods as well).
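
To make the late Pods from the steps above easier to spot, something like the following could be run over the output of kubectl get pods -o json. It is a rough sketch, not part of the original setup: the function names are invented, and it assumes timestamps come back in the usual 2018-06-01T12:00:05Z form. It only measures the gap between creationTimestamp and startTime, which is not the same thing as the cron schedule time.

```python
from datetime import datetime, timezone


def parse_k8s_time(value):
    """Parse a timestamp such as 2018-06-01T12:00:05Z into an aware datetime."""
    parsed = datetime.strptime(value, "%Y-%m-%dT%H:%M:%SZ")
    return parsed.replace(tzinfo=timezone.utc)


def start_lags(pod_list):
    """Map Pod name -> seconds between creationTimestamp and startTime.

    pod_list is the decoded JSON from `kubectl get pods -o json`.
    Pods without a startTime (e.g. still waiting for a node) are skipped.
    """
    lags = {}
    for pod in pod_list.get("items", []):
        start = pod.get("status", {}).get("startTime")
        if start is None:
            continue
        created = pod["metadata"]["creationTimestamp"]
        delta = parse_k8s_time(start) - parse_k8s_time(created)
        lags[pod["metadata"]["name"]] = delta.total_seconds()
    return lags
```

One way to use it would be to pipe the kubectl output into a small script that calls start_lags(json.load(sys.stdin)) and prints the Pods with the largest values.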

Best Answer

I need the container that results from the Job to know when it was scheduled to run. Is this possible?

The short answer is yes.

You can get information about the Job, and the Pod's creation and start timestamps, from the API server.

All you need to do is call $api-server-ip:port/api/v1/namespaces/$namespace-name/pods/$podname

The response is JSON with details about the Pod. You can parse this JSON and read the timestamp from it. The only thing you need is the Pod name (which is usually the container's hostname). For parsing the JSON you can use any JSON library in any programming language.
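
Here is a minimal sketch of that call made from inside the container, in Python for brevity (the same steps apply from PHP). Several things are assumptions rather than part of the answer: the Pod runs with a service account that is allowed to get Pods, the standard service account files and KUBERNETES_SERVICE_HOST / KUBERNETES_SERVICE_PORT variables are present, and HOSTNAME equals the Pod name. The function names are made up for illustration.

```python
import json
import os
import ssl
import urllib.request
from datetime import datetime, timezone

# Standard location of the mounted service account credentials.
SA_DIR = "/var/run/secrets/kubernetes.io/serviceaccount"


def pod_url(host, port, namespace, pod_name):
    """Build the API server URL for a single Pod."""
    return f"https://{host}:{port}/api/v1/namespaces/{namespace}/pods/{pod_name}"


def parse_k8s_time(value):
    """Parse a timestamp such as 2018-06-01T12:00:05Z into an aware datetime."""
    parsed = datetime.strptime(value, "%Y-%m-%dT%H:%M:%SZ")
    return parsed.replace(tzinfo=timezone.utc)


def pod_timestamps(pod):
    """Return (creationTimestamp, startTime); startTime is None if unset."""
    created = parse_k8s_time(pod["metadata"]["creationTimestamp"])
    start = pod.get("status", {}).get("startTime")
    started = parse_k8s_time(start) if start else None
    return created, started


def fetch_own_pod():
    """Fetch this Pod's JSON from the API server (assumes RBAC permits it)."""
    with open(os.path.join(SA_DIR, "token")) as handle:
        token = handle.read().strip()
    with open(os.path.join(SA_DIR, "namespace")) as handle:
        namespace = handle.read().strip()
    url = pod_url(
        os.environ["KUBERNETES_SERVICE_HOST"],
        os.environ["KUBERNETES_SERVICE_PORT"],
        namespace,
        os.environ["HOSTNAME"],  # assumption: hostname equals the Pod name
    )
    context = ssl.create_default_context(cafile=os.path.join(SA_DIR, "ca.crt"))
    request = urllib.request.Request(
        url, headers={"Authorization": f"Bearer {token}"}
    )
    with urllib.request.urlopen(request, context=context) as response:
        return json.load(response)
```

Inside the Pod, created, started = pod_timestamps(fetch_own_pod()) would give both values. Note that these are the Pod's creation and start times, so whether they are close enough to the intended schedule time depends on how quickly the Job and Pod were created.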
