
10. Troubleshooting - Control Plane Failure

Troubleshooting

01. Application Failure
02. Control Plane Failure
03. Worker Node Failure
04. Troubleshoot Network
 

01. The cluster is broken again. We tried deploying an application but it's not working.

Troubleshoot and fix the issue.


Start looking at the deployments.

  • Fix the cluster
Check the status of all control plane components and identify the pod that has an issue.

Run the command: kubectl get pods -n kube-system. 
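A sketch of what the output might look like; pod names and ages are illustrative, and the CrashLoopBackOff (or Error) status on the scheduler pod is the symptom to look for:

kubectl get pods -n kube-system
NAME                                   READY   STATUS             RESTARTS   AGE
coredns-74ff55c5b-abcde                1/1     Running            0          20m
etcd-controlplane                      1/1     Running            0          20m
kube-apiserver-controlplane            1/1     Running            0          20m
kube-controller-manager-controlplane   1/1     Running            0          20m
kube-proxy-abcde                       1/1     Running            0          20m
kube-scheduler-controlplane            0/1     CrashLoopBackOff   5          3m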

Check the kube-scheduler manifest file at /etc/kubernetes/manifests/kube-scheduler.yaml and fix the issue.
The command run by the scheduler pod is incorrect.

Here is a snippet of the corrected YAML file.

spec:
  containers:
  - command:
    - kube-scheduler
    - --authentication-kubeconfig=/etc/kubernetes/scheduler.conf
    - --authorization-kubeconfig=/etc/kubernetes/scheduler.conf
    - --bind-address=127.0.0.1
    - --kubeconfig=/etc/kubernetes/scheduler.conf
    - --leader-elect=true
    - --port=0
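Because kube-scheduler runs as a static pod, the kubelet recreates it automatically whenever its manifest is saved. A minimal sketch of the edit-and-verify loop (the manifest path is the kubeadm default):

# Fix the misspelled scheduler command in the static pod manifest
vi /etc/kubernetes/manifests/kube-scheduler.yaml

# Watch the scheduler pod get recreated and become Ready
kubectl get pods -n kube-system --watch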

 

02. Scale the deployment app to 2 pods.

  • Scale Deployment to 2 PODs

kubectl scale --replicas=2 deployment app
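The scale command updates the desired replica count immediately, even if no new pod appears. A quick check, with illustrative output (in this lab READY stays at 1/2, which leads into the next question):

kubectl get deployment app
NAME   READY   UP-TO-DATE   AVAILABLE   AGE
app    1/2     1            1           10m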

03. Even though the deployment was scaled to 2, the number of PODs does not seem to increase. Investigate and fix the issue.

Inspect the component responsible for managing deployments and ReplicaSets.


  • Fix issue
  • Wait for deployment to actually scale
Run the command kubectl get pods -n kube-system, then check the logs of the kube-controller-manager pod to find the failure reason: kubectl logs -n kube-system kube-controller-manager-controlplane.
Then check the kube-controller-manager manifest file at /etc/kubernetes/manifests/kube-controller-manager.yaml and fix the issue.

root@controlplane:/etc/kubernetes/manifests# kubectl -n kube-system logs kube-controller-manager-controlplane
Flag --port has been deprecated, see --secure-port instead.
I0725 07:25:16.842138       1 serving.go:331] Generated self-signed cert in-memory
stat /etc/kubernetes/controller-manager-XXXX.conf: no such file or directory
root@controlplane:/etc/kubernetes/manifests# 
The configuration file specified (/etc/kubernetes/controller-manager-XXXX.conf) does not exist.
Correct the path: /etc/kubernetes/controller-manager.conf
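In a kubeadm-generated manifest, this path typically appears in the --kubeconfig flag (and, depending on the Kubernetes version, also in --authentication-kubeconfig and --authorization-kubeconfig). A sketch of the corrected command section, assuming the kubeadm defaults:

spec:
  containers:
  - command:
    - kube-controller-manager
    - --authentication-kubeconfig=/etc/kubernetes/controller-manager.conf
    - --authorization-kubeconfig=/etc/kubernetes/controller-manager.conf
    - --kubeconfig=/etc/kubernetes/controller-manager.conf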

 

04. Something is wrong with scaling again.

We just tried scaling the deployment to 3 replicas. But it's not happening.


Investigate and fix the issue.

 
  • Fix Issue
  • Wait for deployment to actually scale

 

Check the volume mount path in kube-controller-manager manifest file at /etc/kubernetes/manifests.

Just as we did in the previous question, inspect the logs of the kube-controller-manager pod:

root@controlplane:/etc/kubernetes/manifests# kubectl -n kube-system logs kube-controller-manager-controlplane
Flag --port has been deprecated, see --secure-port instead.
I0725 07:29:06.155330       1 serving.go:331] Generated self-signed cert in-memory
unable to load client CA file "/etc/kubernetes/pki/ca.crt": open /etc/kubernetes/pki/ca.crt: no such file or directory
root@controlplane:/etc/kubernetes/manifests# 

It appears the path /etc/kubernetes/pki is not mounted from the controlplane node into the kube-controller-manager pod. If we inspect the pod manifest file, we can see that an incorrect hostPath is used for the volume:

WRONG

- hostPath:
    path: /etc/kubernetes/WRONG-PKI-DIRECTORY
    type: DirectoryOrCreate

CORRECT

- hostPath:
    path: /etc/kubernetes/pki
    type: DirectoryOrCreate

Once the path is corrected, the pod will be recreated and our deployment should eventually scale up to 3 replicas.
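A quick way to confirm recovery once the manifest is fixed (output illustrative):

# The controller manager pod should return to Running
kubectl get pods -n kube-system | grep kube-controller-manager

# The deployment should reach the desired replica count
kubectl get deployment app
NAME   READY   UP-TO-DATE   AVAILABLE   AGE
app    3/3     3            3           25m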