Scenario / Questions

I’m using kubeadm to try to set up a dev master. I’m running into an issue where the health check for the kubelet is failing, and I’m looking for direction on how to debug it. Running the command suggested for debugging (systemctl status kubelet), I don’t see the cause of the error:

kubelet.service - kubelet: The Kubernetes Node Agent
   Loaded: loaded (/etc/systemd/system/kubelet.service; enabled; vendor preset: disabled)
  Drop-In: /etc/systemd/system/kubelet.service.d
           └─10-kubeadm.conf
   Active: activating (auto-restart) (Result: exit-code) since Thu 2017-10-05 15:04:23 CDT; 4s ago
     Docs: http://kubernetes.io/docs/
  Process: 4786 ExecStart=/usr/bin/kubelet $KUBELET_KUBECONFIG_ARGS $KUBELET_SYSTEM_PODS_ARGS $KUBELET_NETWORK_ARGS $KUBELET_DNS_ARGS $KUBELET_AUTHZ_ARGS $KUBELET_CADVISOR_ARGS $KUBELET_CGROUP_ARGS $KUBELET_CERTIFICATE_ARGS $KUBELET_EXTRA_ARGS (code=exited, status=1/FAILURE)
 Main PID: 4786 (code=exited, status=1/FAILURE)

Oct 05 15:04:23 master.domain.com systemd[1]: Unit kubelet.service entered failed state.
Oct 05 15:04:23 master.domain.com systemd[1]: kubelet.service failed.

Where can I find a specific error message to indicate why this isn’t running?

After running swapoff -a to disable swap, I’m still not able to provision Kubernetes.
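
(To verify that swap really is off, swapon --show should print nothing and the Swap row of free -h should read 0B:)

swapon --show
free -h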

Here’s the full output from kubeadm init:

$ kubeadm init
[kubeadm] WARNING: kubeadm is in beta, please do not use it for production clusters.
[init] Using Kubernetes version: v1.8.2
[init] Using Authorization modes: [Node RBAC]
[preflight] Running pre-flight checks
[preflight] WARNING: docker version is greater than the most recently validated version. Docker version: 17.09.0-ce. Max validated version: 17.03
[preflight] Starting the kubelet service
[kubeadm] WARNING: starting in 1.8, tokens expire after 24 hours by default (if you require a non-expiring token use --token-ttl 0)
[certificates] Generated ca certificate and key.
[certificates] Generated apiserver certificate and key.
[certificates] apiserver serving cert is signed for DNS names [master.my-domain.com kubernetes kubernetes.default kubernetes.default.svc kubernetes.default.svc.cluster.local] and IPs [10.xx.xx.xx 10.xx.xx.xx]
[certificates] Generated apiserver-kubelet-client certificate and key.
[certificates] Generated sa key and public key.
[certificates] Generated front-proxy-ca certificate and key.
[certificates] Generated front-proxy-client certificate and key.
[certificates] Valid certificates and keys now exist in "/etc/kubernetes/pki"
[kubeconfig] Wrote KubeConfig file to disk: "admin.conf"
[kubeconfig] Wrote KubeConfig file to disk: "kubelet.conf"
[kubeconfig] Wrote KubeConfig file to disk: "controller-manager.conf"
[kubeconfig] Wrote KubeConfig file to disk: "scheduler.conf"
[controlplane] Wrote Static Pod manifest for component kube-apiserver to "/etc/kubernetes/manifests/kube-apiserver.yaml"
[controlplane] Wrote Static Pod manifest for component kube-controller-manager to "/etc/kubernetes/manifests/kube-controller-manager.yaml"
[controlplane] Wrote Static Pod manifest for component kube-scheduler to "/etc/kubernetes/manifests/kube-scheduler.yaml"
[etcd] Wrote Static Pod manifest for a local etcd instance to "/etc/kubernetes/manifests/etcd.yaml"
[init] Waiting for the kubelet to boot up the control plane as Static Pods from directory "/etc/kubernetes/manifests"
[init] This often takes around a minute; or longer if the control plane images have to be pulled.
[kubelet-check] It seems like the kubelet isn't running or healthy.
[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10255/healthz' failed with error: Get http://localhost:10255/healthz: dial tcp 127.0.0.1:10255: getsockopt: connection refused.
[kubelet-check] It seems like the kubelet isn't running or healthy.
[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10255/healthz' failed with error: Get http://localhost:10255/healthz: dial tcp 127.0.0.1:10255: getsockopt: connection refused.
[kubelet-check] It seems like the kubelet isn't running or healthy.
[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10255/healthz' failed with error: Get http://localhost:10255/healthz: dial tcp 127.0.0.1:10255: getsockopt: connection refused.
[kubelet-check] It seems like the kubelet isn't running or healthy.
[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10255/healthz/syncloop' failed with error: Get http://localhost:10255/healthz/syncloop: dial tcp 127.0.0.1:10255: getsockopt: connection refused.
[kubelet-check] It seems like the kubelet isn't running or healthy.
[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10255/healthz/syncloop' failed with error: Get http://localhost:10255/healthz/syncloop: dial tcp 127.0.0.1:10255: getsockopt: connection refused.
[kubelet-check] It seems like the kubelet isn't running or healthy.
[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10255/healthz/syncloop' failed with error: Get http://localhost:10255/healthz/syncloop: dial tcp 127.0.0.1:10255: getsockopt: connection refused.
[kubelet-check] It seems like the kubelet isn't running or healthy.
[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10255/healthz' failed with error: Get http://localhost:10255/healthz: dial tcp 127.0.0.1:10255: getsockopt: connection refused.
[kubelet-check] It seems like the kubelet isn't running or healthy.
[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10255/healthz/syncloop' failed with error: Get http://localhost:10255/healthz/syncloop: dial tcp 127.0.0.1:10255: getsockopt: connection refused.
[kubelet-check] It seems like the kubelet isn't running or healthy.
[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10255/healthz' failed with error: Get http://localhost:10255/healthz: dial tcp 127.0.0.1:10255: getsockopt: connection refused.

Unfortunately, an error has occurred:
    timed out waiting for the condition

This error is likely caused by that:
    - The kubelet is not running
    - The kubelet is unhealthy due to a misconfiguration of the node in some way (required cgroups disabled)
    - There is no internet connection; so the kubelet can't pull the following control plane images:
        - gcr.io/google_containers/kube-apiserver-amd64:v1.8.2
        - gcr.io/google_containers/kube-controller-manager-amd64:v1.8.2
        - gcr.io/google_containers/kube-scheduler-amd64:v1.8.2

You can troubleshoot this for example with the following commands if you're on a systemd-powered system:
    - 'systemctl status kubelet'
    - 'journalctl -xeu kubelet'
couldn't initialize a Kubernetes cluster

I’ve also tried removing the Docker repository and installing Docker 1.12, which won’t even start: Error starting daemon: SELinux is not supported with the overlay graph driver on this kernel. Either boot into a newer kernel or disable selinux ...

Below are all the suggested solutions and workarounds for the questions above.

Suggestion: 1

Found an issue regarding this: https://github.com/kubernetes/kubernetes/issues/53333

The previous answer worked for me, but the resolution offered in the linked issue did not.

So perhaps following their suggestion of editing 90-kubeadm.conf (instead of 10-kubeadm.conf) would work.
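
To see which drop-in your kubelet unit actually loads (and therefore which file is worth editing), systemctl can print the unit together with all of its drop-ins:

# Prints kubelet.service plus every active drop-in, each preceded by its file path.
systemctl cat kubelet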

Suggestion: 2

This was solved by setting --fail-swap-on=false in the kubelet’s systemd drop-in.
Just make the modification in the file /etc/systemd/system/kubelet.service.d/10-kubeadm.conf:

Environment="KUBELET_SYSTEM_PODS_ARGS=--pod-manifest-path=/etc/kubernetes/manifests --allow-privileged=true --fail-swap-on=false"

then reload systemd and restart the kubelet:

systemctl daemon-reload
systemctl restart kubelet
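
To confirm the kubelet is healthy again, you can re-run the same check kubeadm was making in the output above (port 10255 is the kubelet’s read-only port in these versions):

curl -sSL http://localhost:10255/healthz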

Suggestion: 3

This is all already covered in the issue that Atom posted, so I don’t feel like I’m contributing an awful lot, but I can replicate your issue if swap is turned on. So for me the solution is to disable swap and retry the init:

sudo -i
swapoff -a
kubeadm reset
kubeadm init

The answer posted by dirtbag worked for me as well, but just to be safe after systemctl daemon-reload, I did a full kubeadm reset and kubeadm init, not just a systemctl restart kubelet.
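
One caveat: swapoff -a only lasts until the next reboot. To keep swap disabled permanently, the swap entry in /etc/fstab needs to be commented out as well; a minimal sketch, assuming GNU sed and a standard fstab layout:

# Comment out every fstab line that mounts a swap device so it stays off after reboot.
sed -i '/\sswap\s/ s/^/#/' /etc/fstab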

If this doesn’t work for you, can you please paste the new output of kubeadm init after disabling swap?

Suggestion: 4

Check the error with: journalctl -xeu kubelet

Note: Make sure that the cgroup driver used by the kubelet is the same as the one used by Docker. To ensure compatibility, you can point Docker at the systemd cgroup driver, like so:

cat << EOF > /etc/docker/daemon.json
{
  "exec-opts": ["native.cgroupdriver=systemd"]
}
EOF
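
After changing daemon.json, Docker has to be restarted for the new driver to take effect; docker info lets you verify the result:

systemctl restart docker
# Should now report "Cgroup Driver: systemd"
docker info | grep -i cgroup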

Suggestion: 5

I had the same issue, but on Fedora 30 with kubelet 1.15.3 and docker-ce 19.03.1.
The output of systemctl status kubelet contained the same Drop-In as in your case:

Drop-In: /etc/systemd/system/kubelet.service.d
           └─10-kubeadm.conf

Steps to solve it were:
1. Check that the files kubelet.service and 10-kubeadm.conf exist at the following paths:

ls /usr/lib/systemd/system/kubelet.service.d/10-kubeadm.conf
ls /usr/lib/systemd/system/kubelet.service

10-kubeadm.conf:

more /usr/lib/systemd/system/kubelet.service.d/10-kubeadm.conf 

# Note: This dropin only works with kubeadm and kubelet v1.11+
[Service]
Environment="KUBELET_KUBECONFIG_ARGS=--bootstrap-kubeconfig=/etc/kubernetes/bootstrap-kubelet.conf --kubeconfig=/etc/kubernetes/kubelet.conf"
Environment="KUBELET_CONFIG_ARGS=--config=/var/lib/kubelet/config.yaml"
# This is a file that "kubeadm init" and "kubeadm join" generates at runtime, populating the KUBELET_KUBEADM_ARGS variable dynamically
EnvironmentFile=-/var/lib/kubelet/kubeadm-flags.env
# This is a file that the user can use for overrides of the kubelet args as a last resort. Preferably, the user should use
# the .NodeRegistration.KubeletExtraArgs object in the configuration files instead. KUBELET_EXTRA_ARGS should be sourced from this file.
EnvironmentFile=-/etc/sysconfig/kubelet
ExecStart=
ExecStart=/usr/bin/kubelet $KUBELET_KUBECONFIG_ARGS $KUBELET_CONFIG_ARGS $KUBELET_KUBEADM_ARGS $KUBELET_EXTRA_ARGS

kubelet.service:

more /usr/lib/systemd/system/kubelet.service

[Unit]
Description=Kubernetes Kubelet Server
Documentation=https://github.com/GoogleCloudPlatform/kubernetes
After=docker.service
Requires=docker.service

[Service]
WorkingDirectory=/var/lib/kubelet
EnvironmentFile=-/etc/kubernetes/config
EnvironmentFile=-/etc/kubernetes/kubelet
ExecStart=/usr/bin/kubelet \
        $KUBE_LOGTOSTDERR \
        $KUBE_LOG_LEVEL \
        $KUBELET_API_SERVER \
        $KUBELET_ADDRESS \
        $KUBELET_PORT \
        $KUBELET_HOSTNAME \
        $KUBE_ALLOW_PRIV \
        $KUBELET_ARGS
Restart=on-failure
KillMode=process

[Install]
WantedBy=multi-user.target

  2. Delete the systemd unit files for the kubelet in /etc/systemd/system/:

    rm -R /etc/systemd/system/kubelet.service.d (confirm "y" for each file)
    rm /etc/systemd/system/kubelet.service
    
  3. Reload all systemd unit files and recreate the dependency tree:

    systemctl daemon-reload

  4. Restart the kubelet:

    systemctl restart kubelet

The output of systemctl status kubelet should then show the unit loaded from /usr/lib/systemd/system:

  Loaded: loaded (/usr/lib/systemd/system/kubelet.service; enabled; vendor preset: disabled)
  Drop-In: /usr/lib/systemd/system/kubelet.service.d
           └─10-kubeadm.conf

  5. Initialize a Kubernetes control-plane node:

    kubeadm reset
    systemctl daemon-reload
    kubeadm init --pod-network-cidr=10.244.0.0/16
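
The 10.244.0.0/16 pod CIDR matches Flannel’s default configuration. If you stick with it, you would typically apply a pod network add-on right after init; the manifest URL below is the commonly cited Flannel one from that era and may have moved since:

kubectl apply -f https://raw.githubusercontent.com/coreos/flannel/master/Documentation/kube-flannel.yml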
    

Note: your kubeadm output also listed one possible cause:

`- There is no internet connection; so the kubelet can't pull the following control plane images:`

Try to pull them manually:

kubeadm config images pull

You may also need to upgrade kubeadm, kubelet, and kubectl.
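
On Fedora, assuming the packages were installed from the upstream Kubernetes repository, the upgrade could look like this (pin versions as appropriate for your target cluster version):

dnf upgrade -y kubeadm kubelet kubectl
systemctl daemon-reload
systemctl restart kubelet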