LABS – CERTIFIED KUBERNETES ADMINISTRATOR WITH PRACTICE TESTS > TROUBLESHOOTING
Troubleshooting
01. Application Failure
02. Control Plane Failure
03. Worker Node Failure
04. Troubleshoot Network
01. Fix the broken cluster
- Fix node01
# hint
Step 1. Check the status of the services on the nodes.
Step 2. Check the service logs using:
journalctl -u kubelet
Step 3. If a service is stopped, start it.
For example, run:
# ssh node01
# service kubelet start
Step 1: Check the status of the nodes:
root@controlplane:~# kubectl get nodes
NAME           STATUS     ROLES                  AGE     VERSION
controlplane   Ready      control-plane,master   6m38s   v1.20.0
node01         NotReady   <none>                 4m59s   v1.20.0
root@controlplane:~#
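To pick out unhealthy nodes quickly, you can filter on the STATUS column. The sketch below runs awk over a copy of the output shown above; the `nodes_output` variable is only a stand-in for live data. On a real cluster you would pipe `kubectl get nodes --no-headers` into the same awk filter.

```shell
# Sample rows copied from the listing above; on a live cluster this would
# come from: kubectl get nodes --no-headers
nodes_output='controlplane Ready control-plane,master 6m38s v1.20.0
node01 NotReady <none> 4m59s v1.20.0'

# Print the name of every node whose STATUS is anything other than plain
# "Ready" (this also flags cordoned nodes such as Ready,SchedulingDisabled).
not_ready=$(printf '%s\n' "$nodes_output" | awk '$2 != "Ready" {print $1}')
echo "$not_ready"
```

Here the filter reports node01, which is the node to SSH into next.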
Step 2: SSH to node01 and check the status of container runtime (docker, in this case) and the kubelet service.
root@node01:~# systemctl status kubelet
● kubelet.service - kubelet: The Kubernetes Node Agent
Loaded: loaded (/lib/systemd/system/kubelet.service; enabled; vendor preset: enabled)
Drop-In: /etc/systemd/system/kubelet.service.d
└─10-kubeadm.conf
Active: inactive (dead) since Sun 2021-07-25 07:46:58 UTC; 5min ago
Docs: https://kubernetes.io/docs/home/
Process: 1917 ExecStart=/usr/bin/kubelet $KUBELET_KUBECONFIG_ARGS $KUBELET_CONFIG_ARGS $KUBELET_KUBEADM_ARGS $KUBELET_EXTRA_ARGS (code=exited,
Main PID: 1917 (code=exited, status=0/SUCCESS)
Since the kubelet is not running, attempt to start it by running:
root@node01:~# systemctl start kubelet
root@node01:~# systemctl status kubelet
● kubelet.service - kubelet: The Kubernetes Node Agent
Loaded: loaded (/lib/systemd/system/kubelet.service; enabled; vendor preset: enabled)
Drop-In: /etc/systemd/system/kubelet.service.d
└─10-kubeadm.conf
Active: active (running) since Sun 2021-07-25 07:53:35 UTC; 2s ago
Docs: https://kubernetes.io/docs/home/
node01 should now return to the Ready state.
02. The cluster is broken again. Investigate and fix the issue.
- Fix cluster
journalctl -u kubelet -f
The kubelet has stopped running on node01 again.
Since this is a systemd-managed system,
we can check the kubelet log by running journalctl.
Here is a snippet showing the error with kubelet:
root@node01:~# journalctl -u kubelet
.
.
Jul 25 07:54:50 node01 kubelet[5681]: F0725 07:54:50.831238 5681 server.go:257]
unable to load client CA file /etc/kubernetes/pki/WRONG-CA-FILE.crt:
open /etc/kubernetes/pki/WRONG-CA-FILE.crt: no such file or directory
Jul 25 07:55:01 node01 kubelet[5710]: F0725 07:55:01.339531 5710 server.go:257]
.
.
The kubelet configuration appears to use the wrong path for the
client CA certificate. This can be corrected by updating the
clientCAFile entry in /var/lib/kubelet/config.yaml.
Once this is fixed, restart the kubelet service
(as we did in the previous question) and node01 should
return to a working state.
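The edit described above can be sketched with sed. This example works on a throwaway copy, so it is safe to run anywhere. The file contents are a trimmed, hypothetical stand-in for /var/lib/kubelet/config.yaml, and /etc/kubernetes/pki/ca.crt is assumed to be the correct CA path (the kubeadm default); confirm the file exists on your node before using it.

```shell
# Hypothetical, trimmed stand-in for /var/lib/kubelet/config.yaml.
cfg=$(mktemp)
cat > "$cfg" <<'EOF'
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
authentication:
  x509:
    clientCAFile: /etc/kubernetes/pki/WRONG-CA-FILE.crt
EOF

# Assumption: the real CA certificate lives at the kubeadm default path.
sed 's|/etc/kubernetes/pki/WRONG-CA-FILE.crt|/etc/kubernetes/pki/ca.crt|' \
  "$cfg" > "$cfg.fixed" && mv "$cfg.fixed" "$cfg"

# Show the corrected entry, then clean up the temporary copy.
ca_line=$(grep 'clientCAFile' "$cfg")
echo "$ca_line"
rm -f "$cfg"

# On node01, after editing the real file:
#   systemctl restart kubelet
```

On the node itself, editing the file by hand with vi works just as well; the point is only that the single clientCAFile line changes.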
03. The cluster is broken again. Investigate and fix the issue.
- Fix Cluster
Check the kubelet.conf file at /etc/kubernetes/kubelet.conf.
Once again the kubelet service has stopped working. Checking the logs, we can see that this time, it is not able to reach the kube-apiserver.
root@node01:~# journalctl -u kubelet
.
.
Jul 25 08:05:26 node01 kubelet[7966]: E0725 08:05:26.426155 7966 reflector.go:138] k8s.io/kubernetes/pkg/kubelet/config/apiserver.go:46: Failed to watch *v1.Pod: failed to list *v1.Pod: Get "https://controlplane:6553/api/v1/pods?fieldSelector=spec.nodeName%3Dnode01&limit=500&resourceVersion=0": dial tcp 10.1.126.9:6553: connect: connection refused
.
.
As the log shows, the kubelet is trying to connect to the API server on the controlplane node on port 6553. This is incorrect; the kube-apiserver listens on port 6443 by default.
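In a long log, it can help to pull the endpoint out of the error line rather than read it by eye. A minimal sketch, assuming the message text has the same shape as the line above (the `log_line` variable holds a copy of that message, not live journalctl output):

```shell
# Message portion copied from the journalctl line above.
log_line='Failed to watch *v1.Pod: failed to list *v1.Pod: Get "https://controlplane:6553/api/v1/pods?fieldSelector=spec.nodeName%3Dnode01&limit=500&resourceVersion=0": dial tcp 10.1.126.9:6553: connect: connection refused'

# Extract scheme://host:port, stopping at the first "/" or quote after the host.
endpoint=$(printf '%s\n' "$log_line" | grep -o 'https://[^/"]*' | head -n 1)
echo "$endpoint"
```

On node01, the same grep can be fed from `journalctl -u kubelet` instead of the variable.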
To fix this, correct the port in the kubeconfig file used by the kubelet (/etc/kubernetes/kubelet.conf):
apiVersion: v1
clusters:
- cluster:
    certificate-authority-data: --REDACTED---
    server: https://controlplane:6443
Restart the kubelet after this change.
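The port correction can also be scripted. The sketch below applies it to a temporary file whose contents are a hypothetical, trimmed stand-in for /etc/kubernetes/kubelet.conf; on node01 you would run the same substitution against the real file (after taking a backup) and then restart the kubelet.

```shell
# Hypothetical, trimmed stand-in for /etc/kubernetes/kubelet.conf.
kc=$(mktemp)
cat > "$kc" <<'EOF'
apiVersion: v1
clusters:
- cluster:
    certificate-authority-data: REDACTED
    server: https://controlplane:6553
EOF

# Replace the wrong API server port (6553) with 6443.
sed 's|https://controlplane:6553|https://controlplane:6443|' \
  "$kc" > "$kc.fixed" && mv "$kc.fixed" "$kc"

# Show the corrected entry, then clean up the temporary copy.
server_line=$(grep 'server:' "$kc")
echo "$server_line"
rm -f "$kc"

# On node01, after editing the real file:
#   systemctl restart kubelet
```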