(Reposted from original post at: https://stackoverflow.com/questions/73012913/kubernetes-pull-from-image-private-network-fails-to-respect-etc-hosts-of-serv as this is a more appropriate place to ask the question)
I am running a small 3 node test kubernetes cluster (using kubeadm) running on Ubuntu Server 22.04, with Flannel as the network fabric. I also have a separate gitlab private server, with container registry set up and working.
The problem I am running into: I have a simple test deployment, and when I apply the deployment YAML, it fails to pull the image from the GitLab private server.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: platform-deployment
spec:
  replicas: 1
  selector:
    matchLabels:
      app: platform-service
  template:
    metadata:
      labels:
        app: platform-service
    spec:
      containers:
        - name: platform-service
          image: registry.example.com/demo/platform-service:latest
Ubuntu Server: /etc/hosts (the relevant line)
192.168.1.30 registry.example.com
The Error
Failed to pull image "registry.example.com/demo/platform-service:latest":
rpc error: code = Unknown desc = failed to pull and unpack image
"registry.example.com/demo/platform-service:latest": failed to resolve reference
"registry.example.com/demo/platform-service:latest": failed to do request: Head
"https://registry.example.com/v2/demo/platform-service/manifests/latest": dial tcp
xxx.xxx.xxx.xxx:443: i/o timeout
The 'xxx.xxx.xxx.xxx' is the public IP that the domain resolves to in external DNS. All of my internal machines are configured to use the internal address instead, and 'registry.example.com' stands in for my actual domain.
Also to note:
docker pull registry.example.com/demo/platform-service:latest
works perfectly fine from the command line of the server; it just does not work from the Kubernetes deployment YAML.
The problem
While the network and the hosts file on the server are configured correctly, the image pull fails because, when I apply the deployment, it does not use the IP configured in /etc/hosts but a public IP that points to a different server. The timeout happens because that public-facing server is not set up the same way.
When I run kubectl apply -f platform-service.yaml, why does it not respect the hosts file of the server, and is there a way to configure hosts inside Kubernetes?
(If this problem is not clear, I apologize, I am quite new, and still learning terminology, maybe why google is not helping me with this problem.)
The closest S/O I could find is:
(SO Answer #1): hostAliases (this applies to the pod itself, not to pulling the image). My cluster was also installed through apt/package manager rather than snap, and the rest of that answer suggests changing the distribution; I would rather keep my current setup than change it.
— Update(s):
- I have narrowed down the problem (I believe) to needing settings in containerd, but have not yet found how to set the hosts there to match the server's /etc/hosts file.
- I created a second Kubernetes cluster, using k3s instead of kubeadm (instructions found at https://computingforgeeks.com/install-kubernetes-on-ubuntu-using-k3s/), and am encountering the same problem.
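For the containerd route mentioned above, one option (a sketch I have not verified on this cluster) is containerd's per-registry host configuration: point config_path at a certs.d directory and give the registry a hosts.toml that targets the internal endpoint directly. The paths and IP below match this setup; yours may differ.

```toml
# /etc/containerd/config.toml (fragment) — enable per-registry host config
[plugins."io.containerd.grpc.v1.cri".registry]
  config_path = "/etc/containerd/certs.d"
```

```toml
# /etc/containerd/certs.d/registry.example.com/hosts.toml
server = "https://registry.example.com"

[host."https://192.168.1.30"]
  capabilities = ["pull", "resolve"]
  # skip_verify = true  # only if the internal endpoint's TLS cert does not match
```

After changing this, restart containerd (systemctl restart containerd) on each node.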
Update
Attempts to add the hosts to CoreDNS are not working either:
(https://stackoverflow.com/questions/65283827/how-to-change-host-name-resolve-like-host-file-in-coredns)
kubectl -n kube-system edit configmap/coredns
...
.:53 {
    errors
    health {
        lameduck 5s
    }
    ready
    hosts custom.hosts registry.example.com {
        192.168.1.30 registry.example.com
        fallthrough
    }
    kubernetes cluster.local in-addr.arpa ip6.arpa {
        pods insecure
        fallthrough in-addr.arpa ip6.arpa
        ttl 30
    }
    prometheus :9153
    forward . /etc/resolv.conf {
        max_concurrent 1000
    }
    cache 30
    loop
    reload
    loadbalance
}
...
I deleted the CoreDNS pods (so they are recreated), and the image pull on the deployment still fails with the external IP address instead of the internal one. (In hindsight this makes sense: CoreDNS serves DNS to pods, while image pulls are performed by the container runtime on the node, which uses the node's own resolver.)
Best Answer
After going through many different solutions and a lot of research and testing, the answer was actually very simple.
Solution in my case
The /etc/hosts file MUST contain the entry for the registry (and possibly an entry for the GitLab instance as well) on EVERY node of the cluster, including the master node.
Once I added it on each of the two worker nodes, the deployment attempted to pull the image and failed with a credentials error (which I was expecting to see once the hosts issue was resolved). From there I was able to add the credentials, and now the image pulls fine from the private registry instead of the public-facing one.
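To apply that on every node without duplicating lines, a small idempotent snippet helps. This is a sketch: the IP and hostname are the examples from this setup, and it defaults to a demo file (hosts.demo) so it can be tried safely; on a real node you would run it as root against /etc/hosts.

```shell
# Append the registry mapping to a hosts file only if it is not already there.
# Pass /etc/hosts as the first argument (as root) on real cluster nodes.
hosts_file="${1:-hosts.demo}"
entry="192.168.1.30 registry.example.com"
touch "$hosts_file"
grep -qF "$entry" "$hosts_file" || printf '%s\n' "$entry" >> "$hosts_file"
# Running it again is a no-op, so it is safe to rerun on every node.
grep -qF "$entry" "$hosts_file" || printf '%s\n' "$entry" >> "$hosts_file"
echo "occurrences: $(grep -cF "$entry" "$hosts_file")"
```

Running the snippet prints "occurrences: 1" no matter how many times it is repeated.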
Bonus: Fix for credentials error connecting to private registry (not part of the original question, but part of the setup process for connecting)
After fixing the /etc/hosts issue, you will probably need to set up 'regcred' credentials to access the private registry. The Kubernetes documentation provides the steps for that part:
https://kubernetes.io/docs/tasks/configure-pod-container/pull-image-private-registry/
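For reference, the shape of that setup (the secret name regcred follows the docs; the username/password values are placeholders for your registry credentials): create a docker-registry secret, then reference it from the deployment's pod spec.

```yaml
# First create the secret (placeholders for the credentials):
#   kubectl create secret docker-registry regcred \
#     --docker-server=registry.example.com \
#     --docker-username=<user> --docker-password=<password>
# Then reference it in the pod template of the deployment:
spec:
  template:
    spec:
      imagePullSecrets:
        - name: regcred
      containers:
        - name: platform-service
          image: registry.example.com/demo/platform-service:latest
```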