I have a GKE cluster which, for the sake of simplicity, runs just Prometheus, monitoring each member node. I recently upgraded the API server to 1.6 (which introduces RBAC) and had no issues. I then added a new node running version 1.6 of the kubelet. Prometheus could not access the metrics API of this new node.
So, I added a ClusterRole, ClusterRoleBinding and a ServiceAccount to my namespace, and configured the deployment to use the new ServiceAccount. I then deleted the pod for good measure:
apiVersion: v1
kind: ServiceAccount
metadata:
  name: prometheus
---
apiVersion: rbac.authorization.k8s.io/v1beta1
kind: ClusterRole
metadata:
  name: prometheus
rules:
- apiGroups: [""]
  resources:
  - nodes
  - services
  - endpoints
  - pods
  verbs: ["get", "list", "watch"]
- apiGroups: [""]
  resources:
  - configmaps
  verbs: ["get"]
- nonResourceURLs: ["/metrics"]
  verbs: ["get"]
---
apiVersion: rbac.authorization.k8s.io/v1beta1
kind: ClusterRoleBinding
metadata:
  name: prometheus
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: prometheus
subjects:
- kind: ServiceAccount
  name: prometheus
  namespace: default
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: prometheus
  namespace: default
secrets:
- name: prometheus-token-xxxxx
---
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  labels:
    app: prometheus-prometheus
    component: server
    release: prometheus
  name: prometheus-server
  namespace: default
spec:
  replicas: 1
  selector:
    matchLabels:
      app: prometheus-prometheus
      component: server
      release: prometheus
  strategy:
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 1
    type: RollingUpdate
  template:
    metadata:
      labels:
        app: prometheus-prometheus
        component: server
        release: prometheus
    spec:
      dnsPolicy: ClusterFirst
      restartPolicy: Always
      schedulerName: default-scheduler
      serviceAccount: prometheus
      serviceAccountName: prometheus
      ...
But the situation remains unchanged.
The metrics endpoint returns HTTP/1.1 401 Unauthorized, and when I modify the Deployment to include another container with bash + curl installed and make the request manually, I get:
# curl -vsSk -H "Authorization: Bearer $(</var/run/secrets/kubernetes.io/serviceaccount/token)" https://$NODE_IP:10250/metrics
* Trying $NODE_IP...
* Connected to $NODE_IP ($NODE_IP) port 10250 (#0)
* found XXX certificates in /etc/ssl/certs/ca-certificates.crt
* found XXX certificates in /etc/ssl/certs
* ALPN, offering http/1.1
* SSL connection using TLS1.2 / ECDHE_RSA_AES_128_GCM_SHA256
* server certificate verification SKIPPED
* server certificate status verification SKIPPED
* common name: node-running-kubelet-1-6@000000000 (does not match '$NODE_IP')
* server certificate expiration date OK
* server certificate activation date OK
* certificate public key: RSA
* certificate version: #3
* subject: CN=node-running-kubelet-1-6@000000000
* start date: Fri, 07 Apr 2017 22:00:00 GMT
* expire date: Sat, 07 Apr 2018 22:00:00 GMT
* issuer: CN=node-running-kubelet-1-6@000000000
* compression: NULL
* ALPN, server accepted to use http/1.1
> GET /metrics HTTP/1.1
> Host: $NODE_IP:10250
> User-Agent: curl/7.47.0
> Accept: */*
> Authorization: Bearer **censored**
>
< HTTP/1.1 401 Unauthorized
< Date: Mon, 10 Apr 2017 20:04:20 GMT
< Content-Length: 12
< Content-Type: text/plain; charset=utf-8
<
* Connection #0 to host $NODE_IP left intact
- Why doesn't that token allow me to access that resource?
- How does one check the access granted to a ServiceAccount?
Best Answer
I ran into the same issue and created ticket https://github.com/prometheus/prometheus/issues/2606 for it. Out of the discussion there, I updated the configuration examples via PR https://github.com/prometheus/prometheus/pull/2641.
You can see the updated relabeling for the kubernetes-nodes job at https://github.com/prometheus/prometheus/blob/master/documentation/examples/prometheus-kubernetes.yml#L76-L84
Copied for reference:
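The relabeling in the linked example looks roughly like the following. This is a sketch based on the upstream example file around the time of that PR, so the linked file remains authoritative; the scheme, TLS and token settings shown are the usual in-cluster defaults and are assumptions here. The point of the change is that Prometheus no longer talks to each kubelet on port 10250 directly, but scrapes node metrics through the API server's node proxy path, where the service account token is accepted.

```yaml
scrape_configs:
- job_name: 'kubernetes-nodes'
  scheme: https
  tls_config:
    # CA bundle mounted into every pod that has a service account
    ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
  bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
  kubernetes_sd_configs:
  - role: node
  relabel_configs:
  # Copy Kubernetes node labels onto the scraped series
  - action: labelmap
    regex: __meta_kubernetes_node_label_(.+)
  # Send every scrape to the API server instead of the kubelet
  - target_label: __address__
    replacement: kubernetes.default.svc:443
  # Use the API server's proxy path for the discovered node
  - source_labels: [__meta_kubernetes_node_name]
    regex: (.+)
    target_label: __metrics_path__
    replacement: /api/v1/nodes/${1}/proxy/metrics
```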
For RBAC itself, you need to run Prometheus with its own service account, which has to be created first.
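A declarative sketch of such a service account, assuming the name prometheus and the default namespace used in the question (adjust both to your setup):

```yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: prometheus     # assumed name, matching the question
  namespace: default   # assumed namespace, matching the question
```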
Make sure to pass that service account into the pod with the following pod spec:
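A minimal sketch of the relevant part of the Deployment, assuming the service account is named prometheus as in the question. Only the field that matters here is shown; the rest of the pod template is left out.

```yaml
# Fragment of a Deployment manifest, not a complete object
spec:
  template:
    spec:
      # Mounts this account's token into the pod at
      # /var/run/secrets/kubernetes.io/serviceaccount/token
      serviceAccountName: prometheus
```

The question's Deployment already sets this field, so in that setup the missing piece is on the scrape configuration and RBAC side rather than here.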
Then apply the Kubernetes manifests that set up the appropriate RBAC role and binding, giving the prometheus service account access to the required API endpoints. They are at https://github.com/prometheus/prometheus/blob/master/documentation/examples/rbac-setup.yml
Copied for reference
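A sketch of the role and binding, based on the upstream example as of that PR (the linked file remains authoritative). The names and the default namespace are assumptions carried over from the question. The notable difference from the ClusterRole in the question is the nodes/proxy resource, which is what permits scraping via /api/v1/nodes/<name>/proxy/metrics.

```yaml
apiVersion: rbac.authorization.k8s.io/v1beta1
kind: ClusterRole
metadata:
  name: prometheus
rules:
- apiGroups: [""]
  resources:
  - nodes
  - nodes/proxy   # needed for scraping through the API server proxy
  - services
  - endpoints
  - pods
  verbs: ["get", "list", "watch"]
- nonResourceURLs: ["/metrics"]
  verbs: ["get"]
---
apiVersion: rbac.authorization.k8s.io/v1beta1
kind: ClusterRoleBinding
metadata:
  name: prometheus
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: prometheus
subjects:
- kind: ServiceAccount
  name: prometheus
  namespace: default   # assumed; use the namespace Prometheus runs in
```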
Replace the namespace in all manifests with the one you run Prometheus in, then apply the manifests using an account with cluster-admin rights.
I haven't tested this in a cluster without ABAC fallback, so the RBAC role might still be missing something essential.
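One way to check whether the role covers a given request is to ask the API server directly with a SubjectAccessReview. The sketch below assumes the service account default/prometheus from the question and a hypothetical node name. Submitting it with kubectl create -f <file> -o yaml returns the object with a status.allowed field. Note that this only reflects what the API server's authorizers decide; it says nothing about whether a kubelet accepts the token when contacted directly on port 10250.

```yaml
apiVersion: authorization.k8s.io/v1beta1   # newer clusters serve this as authorization.k8s.io/v1
kind: SubjectAccessReview
spec:
  # Service accounts authenticate as system:serviceaccount:<namespace>:<name>
  user: system:serviceaccount:default:prometheus
  resourceAttributes:
    verb: get
    resource: nodes
    subresource: proxy
    name: example-node   # hypothetical node name
```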