
05. Cluster Maintenance - OS Upgrades

Cluster Maintenance

1) OS Upgrades
2) Cluster Upgrade Process
3) Backup and Restore Methods
 

01. Let us explore the environment first. How many nodes do you see in the cluster?

Including the controlplane and worker nodes.

 

Answer: 2

root@controlplane:~# kubectl get nodes
NAME           STATUS   ROLES                  AGE   VERSION
controlplane   Ready    control-plane,master   16m   v1.20.0
node01         Ready    <none>                 16m   v1.20.0

 

02. How many applications do you see hosted on the cluster?

Check the number of deployments.

Answer: 1

root@controlplane:~# kubectl get deployments
NAME   READY   UP-TO-DATE   AVAILABLE   AGE
blue   3/3     3            3           27s

 

03. Which nodes are the applications hosted on?

Answer: node01

root@controlplane:~# kubectl get pod -o wide
NAME                    READY   STATUS    RESTARTS   AGE   IP           NODE     NOMINATED NODE   READINESS GATES
blue-746c87566d-8mq86   1/1     Running   0          84s   10.244.1.2   node01   <none>           <none>
blue-746c87566d-bglgl   1/1     Running   0          84s   10.244.1.3   node01   <none>           <none>
blue-746c87566d-xmsws   1/1     Running   0          84s   10.244.1.4   node01   <none>           <none>

04. We need to take node01 out for maintenance.

Empty the node of all applications and mark it unschedulable.

  • Node node01 Unschedulable
  • Pods evicted from node01

 

# hint
Run the command kubectl drain node01 --ignore-daemonsets

root@controlplane:~# kubectl drain node01 --ignore-daemonsets
node/node01 cordoned
WARNING: ignoring DaemonSet-managed Pods: kube-system/kube-flannel-ds-nrqgl, kube-system/kube-proxy-74gmz
evicting pod default/blue-746c87566d-xmsws
evicting pod default/blue-746c87566d-8mq86
evicting pod default/blue-746c87566d-bglgl
pod/blue-746c87566d-8mq86 evicted
pod/blue-746c87566d-bglgl evicted
pod/blue-746c87566d-xmsws evicted
node/node01 evicted
# actual CKA exam question

Q1. Make a specific node unschedulable.

kubectl drain <node-to-drain> --ignore-daemonsets




Tip

Drain & Cordon

To upgrade node01, you first need to take it out of service. Do that with the drain command:
kubectl drain node01 --ignore-daemonsets
To make the node schedulable again, use the uncordon command:
kubectl uncordon node01
The pods that were running on node01 are evicted to other schedulable (untainted) nodes,
and they do not move back even after node01 is uncordoned.
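A quick way to confirm the node is really out of rotation (not part of the lab steps, just a sanity check): a drained or cordoned node reports SchedulingDisabled in its status.

kubectl get nodes
# node01 should now show STATUS "Ready,SchedulingDisabled"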

 
If you run the drain command while node01 has a pod that drain cannot evict, for example a bare pod that is not managed by a controller, you will see an error like this:

error: unable to drain node "node01", aborting command...
There are pending nodes to be drained:
 node01
error: cannot delete Pods not managed by ReplicationController, ReplicaSet, Job, DaemonSet or StatefulSet (use --force to override): default/hr-app

 
An error like the one above means that the pod hr-app on node01 is preventing the drain.

In this case, you can append --force to ignore it and drain the node anyway:


kubectl drain node01 --ignore-daemonsets --force
Be careful: if you drain with --force, hr-app is deleted for good, because no controller will recreate it.

If hr-app is a pod that must never be deleted, you can keep it safe with the cordon command below instead.

cordon is similar to drain, but while drain evicts the pods on the node so they are recreated elsewhere,
cordon only marks the node unschedulable and leaves the pods already running on it untouched.

kubectl cordon node01



05. What nodes are the apps on now?

Answer: controlplane
 
root@controlplane:~# kubectl get pod -o wide
NAME                    READY   STATUS    RESTARTS   AGE     IP           NODE           NOMINATED NODE   READINESS GATES
blue-746c87566d-bljgm   1/1     Running   0          2m55s   10.244.0.4   controlplane   <none>           <none>
blue-746c87566d-c249z   1/1     Running   0          2m55s   10.244.0.6   controlplane   <none>           <none>
blue-746c87566d-m5tss   1/1     Running   0          2m55s   10.244.0.5   controlplane   <none>           <none>
 

06. The maintenance tasks have been completed. Configure the node node01 to be schedulable again.

  • Node01 is Schedulable
# hint
Run the command kubectl uncordon node01

root@controlplane:~# kubectl uncordon node01
node/node01 uncordoned

 

07. How many pods are scheduled on node01 now?

 

Answer: 0

 

root@controlplane:~# kubectl get pods -o wide
NAME                    READY   STATUS    RESTARTS   AGE   IP           NODE           NOMINATED NODE   READINESS GATES
blue-746c87566d-bljgm   1/1     Running   0          12m   10.244.0.4   controlplane   <none>           <none>
blue-746c87566d-c249z   1/1     Running   0          12m   10.244.0.6   controlplane   <none>           <none>
blue-746c87566d-m5tss   1/1     Running   0          12m   10.244.0.5   controlplane   <none>           <none>

 

08. Why are there no pods on node01?

 

# hint
Running the uncordon command on a node will not automatically schedule pods on the node.

When new pods are created, they will be placed on node01.

 

1) node01 did not upgrade successfully

2) node01 is faulty

3) Only when new pods are created will they be scheduled (correct answer)

4) node01 is cordoned
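As the hint above notes, uncordoning does not move existing pods back. If you did want the blue pods spread across node01 again, one option (a sketch, not something the lab asks for) is to have the Deployment roll out fresh pods, which the scheduler is then free to place on node01:

kubectl rollout restart deployment blue
kubectl get pods -o wide   # some of the new blue pods may now land on node01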

 

09. Why are the pods placed on the controlplane node?

Check the controlplane node details.

1) controlplane node is cordoned

2) controlplane node does not have any taints (correct answer)

3) you can never have pods on master nodes

4) controlplane node is faulty

5) controlplane node has taints set on it

# hint
Use the command kubectl describe node controlplane

# solution
Since there are no taints on the controlplane node, the pods evicted by the kubectl drain node01 command were rescheduled onto it.
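A quick way to verify this yourself, assuming the default describe output:

kubectl describe node controlplane | grep -i taints
# should print something like: Taints: <none>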

 

10. Time travelling to the next maintenance window… (info)

Ok

 

11. We need to carry out a maintenance activity on node01 again.

Try draining the node again using the same command as before:

kubectl drain node01 --ignore-daemonsets

Did that work?

 

Answer: No

root@controlplane:~# kubectl drain node01 --ignore-daemonsets
node/node01 cordoned
error: unable to drain node "node01", aborting command...

There are pending nodes to be drained:
 node01
error: cannot delete Pods not managed by ReplicationController, ReplicaSet, Job, DaemonSet or StatefulSet (use --force to override): default/hr-app
root@controlplane:~# kubectl drain node01 --ignore-daemonsets
node/node01 already cordoned

 

12. Why did the drain command fail on node01? It worked the first time!

1) node01 was not upgraded correctly the last time

2) no pods on node01

3) node01 tainted

4) there is a pod on node01 which is not part of a replicaset (correct answer)

 

root@controlplane:~# kubectl get pod -o wide
NAME                    READY   STATUS    RESTARTS   AGE     IP           NODE           NOMINATED NODE   READINESS GATES
blue-746c87566d-bljgm   1/1     Running   0          31m     10.244.0.4   controlplane   <none>           <none>
blue-746c87566d-c249z   1/1     Running   0          31m     10.244.0.6   controlplane   <none>           <none>
blue-746c87566d-m5tss   1/1     Running   0          31m     10.244.0.5   controlplane   <none>           <none>
hr-app                  1/1     Running   0          4m17s   10.244.1.5   node01         <none>           <none>

 

13. What is the name of the POD hosted on node01 that is not part of a replicaset?

Answer: hr-app
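One way to confirm that hr-app has no owning controller (a sketch; it relies on the Controlled By field of kubectl describe, which is absent for bare pods):

kubectl describe pod hr-app | grep "Controlled By"                  # prints nothing: hr-app is a bare pod
kubectl describe pod blue-746c87566d-bljgm | grep "Controlled By"   # shows Controlled By: ReplicaSet/blue-746c87566d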

 

14. What would happen to hr-app if node01 is drained forcefully? (Try it and see for yourself.)

1) hr-app will be re-created on master

2) hr-app will continue to run as a Docker container

3) hr-app will be recreated on other nodes

4) hr-app will be lost forever (correct answer)

 

A forceful drain of the node will delete any pod that is not part of a replicaset.

 

info

15. Oops! We did not want to do that! hr-app is a critical application that should not be destroyed.

We have now reverted to the previous state and re-deployed hr-app as a Deployment.

Ok
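Because hr-app is now owned by a Deployment (via a ReplicaSet), an eviction would no longer lose it: the ReplicaSet would simply recreate the pod on another schedulable node. A quick check (a sketch, not part of the lab steps):

kubectl get deployment hr-app
kubectl get replicaset | grep hr-app
# the pod name now carries the ReplicaSet hash, e.g. hr-app-76d475c57d-dnqqn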

 

16. hr-app is a critical app and we do not want it to be removed and we do not want to schedule any more pods on node01.
Mark node01 as unschedulable so that no new pods are scheduled on this node.

Make sure that hr-app is not affected.

  • Node01 Unschedulable
  • hr-app still running on node01?
# hint
Do not drain node01; instead, use the kubectl cordon node01 command. This ensures that no new pods are scheduled on the node, while the existing pods are not affected.

root@controlplane:~# kubectl cordon node01
node/node01 cordoned
root@controlplane:~# kubectl get pods
NAME                      READY   STATUS    RESTARTS   AGE
blue-746c87566d-bljgm     1/1     Running   0          42m
blue-746c87566d-c249z     1/1     Running   0          42m
blue-746c87566d-m5tss     1/1     Running   0          42m
hr-app-76d475c57d-dnqqn   1/1     Running   0          2m58s

 


Bookmark

Problem 1. Upgrade the master node (from v1.20 to v1.21)
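A rough sketch of the control-plane upgrade flow described in the kubeadm documentation linked below; the package versions (1.21.0-00) are illustrative for a v1.20 to v1.21 upgrade, and the apt commands assume a Debian/Ubuntu node:

# on the controlplane node
apt-mark unhold kubeadm && apt-get update && apt-get install -y kubeadm=1.21.0-00 && apt-mark hold kubeadm
kubeadm upgrade plan
kubeadm upgrade apply v1.21.0
kubectl drain controlplane --ignore-daemonsets
apt-mark unhold kubelet kubectl && apt-get install -y kubelet=1.21.0-00 kubectl=1.21.0-00 && apt-mark hold kubelet kubectl
systemctl daemon-reload && systemctl restart kubelet
kubectl uncordon controlplane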

Upgrading kubeadm clusters: https://kubernetes.io/docs/tasks/administer-cluster/kubeadm/kubeadm-upgrade/#upgrading-control-plane-nodes

 

Disruptions: https://kubernetes.io/ko/docs/concepts/workloads/pods/disruptions/

Safely Drain a Node: https://kubernetes.io/docs/tasks/administer-cluster/safely-drain-node/