The command being run inside the containers is:
echo never | tee /sys/kernel/mm/transparent_hugepage/enabled
Both containers run as privileged, but in the Kubernetes-managed Docker container the command fails with this error:
tee: /sys/kernel/mm/transparent_hugepage/enabled: Read-only file system
whereas in a plain container started with docker run -it --privileged alpine /bin/sh the command works fine.
I have used docker inspect on both the k8s and non-k8s containers to verify their privileged status and don't see anything else listed that should cause this problem. I ran diff on the two outputs and then used docker run with modifications to try to reproduce the problem in plain Docker, but failed (it keeps working). Any idea why the Kubernetes Docker container fails and the plain Docker container succeeds?
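One way to compare the two containers beyond docker inspect is to look at how /sys is actually mounted inside each of them. The snippet below is a minimal sketch, not something from the original post: the helper names mount_opts and mount_mode are made up for illustration, and it assumes a POSIX shell with awk available (as in the alpine image).

```shell
# Print the mount options for a mount point, reading a mounts table in
# /proc/mounts format (defaults to /proc/mounts itself).
mount_opts() {
    awk -v mp="$1" '$2 == mp { print $4 }' "${2:-/proc/mounts}"
}

# Report whether a mount point is read-only or read-write.
# If several entries exist for the same mount point, the last one wins.
mount_mode() {
    case ",$(mount_opts "$1" "$2" | tail -n 1)," in
        *,ro,*) echo "read-only" ;;
        *,rw,*) echo "read-write" ;;
        *)      echo "unknown" ;;
    esac
}

# Run this in both the Kubernetes container and the plain Docker container.
mount_mode /sys
```

If the Kubernetes container reports read-only and the plain Docker one reports read-write, the difference lies in how the runtime mounts sysfs rather than in the privileged flag itself.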
This is reproducible with the pod definition here:
apiVersion: v1
kind: Pod
metadata:
  name: sys-fs-edit
spec:
  containers:
  - image: alpine
    command:
    - /bin/sh
    args:
    - -c
    - echo never | tee /sys/kernel/mm/transparent_hugepage/enabled && sysctl -w net.core.somaxconn=8192 vm.overcommit_memory=1 && sleep 9999999d
    imagePullPolicy: Always
    name: sysctl-buddy
    securityContext:
      privileged: true
Workaround
While I still don't know the cause of the discrepancy, the problem can be mitigated by mounting the host's /sys into the container as a read-write hostPath volume:
apiVersion: v1
kind: Pod
metadata:
  name: sys-fs-edit
spec:
  containers:
  - image: alpine
    command:
    - /bin/sh
    args:
    - -c
    - echo never | tee /sys/kernel/mm/transparent_hugepage/enabled && sysctl -w net.core.somaxconn=8192 vm.overcommit_memory=1 && sleep 9999999d
    imagePullPolicy: Always
    name: sysctl-buddy
    securityContext:
      privileged: true
    volumeMounts:
    - mountPath: /sys
      name: sys
      readOnly: false
  volumes:
  - hostPath:
      path: /sys
    name: sys
Best Answer
On Kubernetes it works a bit differently. Setting privileged: true in the securityContext of a container is not enough to be able to modify any sysctl of that container. Take a look at the section of the official Kubernetes docs that describes Using sysctls in a Kubernetes Cluster.
So in short, there are safe and unsafe sysctls. Most of them are considered unsafe, even many of those that are namespaced. Unsafe sysctls must additionally be enabled by the cluster admin on a node-by-node basis.
So you cannot simply set any sysctl arbitrarily, even from a privileged container running on your Kubernetes cluster.
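As a rough illustration of what the docs describe, a namespaced sysctl such as net.core.somaxconn can be requested through the pod-level securityContext instead of running sysctl -w inside the container. This is only a sketch under assumptions: the pod name is hypothetical, and it assumes the cluster admin has started the kubelet on the target node with --allowed-unsafe-sysctls=net.core.somaxconn. It does not cover vm.overcommit_memory, which is not namespaced and has to be set on the node itself, nor the transparent_hugepage setting, which lives under /sys and is not a sysctl at all.

```yaml
# Sketch only. Assumes the node's kubelet was started with:
#   --allowed-unsafe-sysctls=net.core.somaxconn
# Without that flag the pod will fail to start on that node.
apiVersion: v1
kind: Pod
metadata:
  name: sysctl-example        # hypothetical name
spec:
  securityContext:
    sysctls:                  # pod-level, applies to the pod's namespaces
    - name: net.core.somaxconn
      value: "8192"
  containers:
  - name: sysctl-buddy
    image: alpine
    command:
    - /bin/sh
    args:
    - -c
    - sysctl net.core.somaxconn && sleep 3600   # print the effective value
```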