Cluster Maintenance
1) OS Upgrades
2) Cluster Upgrade Process
3) Backup and Restore Methods
01. Let us explore the environment first. How many nodes do you see in the cluster?
Including the controlplane and worker nodes.
Answer: 2
root@controlplane:~# kubectl get nodes
NAME STATUS ROLES AGE VERSION
controlplane Ready control-plane,master 16m v1.20.0
node01 Ready <none> 16m v1.20.0
02. How many applications do you see hosted on the cluster?
Check the number of deployments.
Answer: 1
root@controlplane:~# kubectl get deployments
NAME READY UP-TO-DATE AVAILABLE AGE
blue 3/3 3 3 27s
03. Which nodes are the applications hosted on?
Answer: node01
root@controlplane:~# kubectl get pod -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
blue-746c87566d-8mq86 1/1 Running 0 84s 10.244.1.2 node01 <none> <none>
blue-746c87566d-bglgl 1/1 Running 0 84s 10.244.1.3 node01 <none> <none>
blue-746c87566d-xmsws 1/1 Running 0 84s 10.244.1.4 node01 <none> <none>
04. We need to take node01 out for maintenance.
Empty the node of all applications and mark it unschedulable.
- Node node01 Unschedulable
- Pods evicted from node01
# hint
Run the command kubectl drain node01 --ignore-daemonsets
root@controlplane:~# kubectl drain node01 --ignore-daemonsets
node/node01 cordoned
WARNING: ignoring DaemonSet-managed Pods: kube-system/kube-flannel-ds-nrqgl, kube-system/kube-proxy-74gmz
evicting pod default/blue-746c87566d-xmsws
evicting pod default/blue-746c87566d-8mq86
evicting pod default/blue-746c87566d-bglgl
pod/blue-746c87566d-8mq86 evicted
pod/blue-746c87566d-bglgl evicted
pod/blue-746c87566d-xmsws evicted
node/node01 evicted
# Actual CKA exam question Q1. A specific node must be made unschedulable: kubectl drain <node-to-drain> --ignore-daemonsets
Tip
Drain & Cordon
To take node01 down for an upgrade, it first has to be emptied and paused. Use the drain command:
kubectl drain node01 --ignore-daemonsets
To make the node schedulable again, use uncordon:
kubectl uncordon node01
The pods that were running on node01 are moved to other schedulable (untainted) nodes, and they do not move back even after node01 is made schedulable again.
If the drain command is run while a pod cannot be moved to another node, you get an error like this:
error: unable to drain node "node01", aborting command...
There are pending nodes to be drained:
node01
error: cannot delete Pods not managed by ReplicationController, ReplicaSet, Job, DaemonSet or StatefulSet (use --force to override): default/hr-app
In other words, the drain fails because of the hr-app pod on node01. In that case you can add --force to ignore it and drain anyway:
kubectl drain node01 --ignore-daemonsets --force
Be careful: running with --force deletes hr-app permanently. If hr-app is a pod that must never be deleted, use cordon instead. cordon is similar to drain, but while drain evicts the existing pods from the node, cordon only marks the node unschedulable; the pods already running on it keep running.
kubectl cordon node01
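To see what each of these commands actually changed, watching the node's scheduling state is enough. A minimal sketch, using the node name from this lab:
kubectl drain node01 --ignore-daemonsets   # cordons the node and evicts the managed pods
kubectl get nodes                          # node01 now reports STATUS "Ready,SchedulingDisabled"
kubectl uncordon node01                    # allow scheduling again
kubectl get nodes                          # node01 is back to plain "Ready"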
05. What nodes are the apps on now?
Answer: controlplane
root@controlplane:~# kubectl get pod -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
blue-746c87566d-bljgm 1/1 Running 0 2m55s 10.244.0.4 controlplane <none> <none>
blue-746c87566d-c249z 1/1 Running 0 2m55s 10.244.0.6 controlplane <none> <none>
blue-746c87566d-m5tss 1/1 Running 0 2m55s 10.244.0.5 controlplane <none> <none>
06. The maintenance tasks have been completed. Configure the node node01 to be schedulable again.
- Node01 is Schedulable
# hint
Run the command kubectl uncordon node01
root@controlplane:~# kubectl uncordon node01
node/node01 uncordoned
07. How many pods are scheduled on node01 now?
Answer: 0
root@controlplane:~# kubectl get pods -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
blue-746c87566d-bljgm 1/1 Running 0 12m 10.244.0.4 controlplane <none> <none>
blue-746c87566d-c249z 1/1 Running 0 12m 10.244.0.6 controlplane <none> <none>
blue-746c87566d-m5tss 1/1 Running 0 12m 10.244.0.5 controlplane <none> <none>
08. Why are there no pods on node01?
# hint
Running the uncordon command on a node will not automatically schedule pods on the node.
When new pods are created, they will be placed on node01 (one way to trigger this is shown in the sketch after the options below).
1) node01 did not upgrade successfully
2) node01 is faulty
3) Only when new pods are created they will be scheduled (correct)
4) node01 is cordoned
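If you do want the existing pods to end up back on node01 after the uncordon, one option (not required by the lab) is to recreate them so the scheduler runs again. A sketch using the blue Deployment from this lab:
kubectl uncordon node01
kubectl rollout restart deployment blue   # recreates the blue pods; the scheduler is now free to place them on node01
kubectl get pods -o wide                  # check where the new pods landed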
09. Why are the pods placed on the controlplane node?
Check the controlplane node details.
1) controlplane node is cordoned
2) controlplane node does not have any taints (correct)
3) you can never have pods on master nodes
4) controlplane node is faulty
5) controlplane node has taints set on it
# hint
Use the command kubectl describe node controlplane
# solution
Since there are no taints on the controlplane node,
all the pods were started on it when we ran the kubectl drain node01 command.
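To confirm this, and to restore the usual behaviour of keeping workloads off the control plane, you can inspect and re-add the taint. A sketch; the node-role.kubernetes.io/master key matches the v1.20 cluster in this lab (newer releases use node-role.kubernetes.io/control-plane), so treat the exact key as an assumption for other clusters:
kubectl describe node controlplane | grep -i taints                          # shows "Taints: <none>" in this lab
kubectl taint nodes controlplane node-role.kubernetes.io/master=:NoSchedule  # re-add the default taint
# NoSchedule only blocks new scheduling; the blue pods already running there are not evicted.
kubectl taint nodes controlplane node-role.kubernetes.io/master-             # the trailing '-' removes the taint again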
10. Time travelling to the next maintenance window… (info)
Ok
11. We need to carry out a maintenance activity on node01 again.
Try draining the node again using the same command as before:
kubectl drain node01 --ignore-daemonsets
Did that work?
Answer: No
root@controlplane:~# kubectl drain node01 --ignore-daemonsets
node/node01 cordoned
error: unable to drain node "node01", aborting command...
There are pending nodes to be drained:
node01
error: cannot delete Pods not managed by ReplicationController, ReplicaSet, Job, DaemonSet or StatefulSet (use --force to override): default/hr-app
root@controlplane:~# kubectl drain node01 --ignore-daemonsets
node/node01 already cordoned
12. Why did the drain command fail on node01? It worked the first time!
1) node01 was not upgraded correctly the last time
2) no pods on node01
3) node01 tainted
4) there is a pod in node01 which is not part of a replicaset (correct)
root@controlplane:~# kubectl get pod -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
blue-746c87566d-bljgm 1/1 Running 0 31m 10.244.0.4 controlplane <none> <none>
blue-746c87566d-c249z 1/1 Running 0 31m 10.244.0.6 controlplane <none> <none>
blue-746c87566d-m5tss 1/1 Running 0 31m 10.244.0.5 controlplane <none> <none>
hr-app 1/1 Running 0 4m17s 10.244.1.5 node01 <none> <none>
13. What is the name of the POD hosted on node01 that is not part of a replicaset?
Answer: hr-app
14. What would happen to hr-app if node01 is drained forcefully? (Try it and see for yourself.)
hr-app will be re-created on master
hr-app will continue to run as a Docker container
hr-app will be recreated on other nodes
hr-app will be lost forever (correct)
A forceful drain of the node deletes any pod that is not managed by a controller (ReplicationController, ReplicaSet, Job, DaemonSet or StatefulSet), and such a pod is not recreated anywhere.
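For reference, the forceful drain the question asks you to try looks like this; only run it against a pod you are willing to lose:
kubectl drain node01 --ignore-daemonsets --force   # evicts managed pods and DELETES unmanaged ones such as hr-app
kubectl get pods                                   # hr-app is no longer listed, and nothing recreates it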
15. Oops! We did not want to do that! hr-app is a critical application that should not be destroyed. (info)
We have now reverted back to the previous state and re-deployed hr-app as a deployment.
Ok
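For reference, re-deploying a bare pod such as hr-app as a Deployment can be as simple as the sketch below; the image name is a placeholder, since the lab does not show the real one:
kubectl create deployment hr-app --image=nginx --replicas=1   # hypothetical; substitute the application's real image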
16. hr-app is a critical app and we do not want it to be removed and we do not want to schedule any more pods on node01.
Mark node01 as unschedulable so that no new pods are scheduled on this node.
Make sure that hr-app is not affected.
- Node01 Unschedulable
- hr-app still running on node01?
# hint
Do not drain node01; instead, use the kubectl cordon node01 command. This ensures that no new pods are scheduled on the node and that the existing pods are not affected.
root@controlplane:~# kubectl cordon node01
node/node01 cordoned
root@controlplane:~# kubectl get pods
NAME READY STATUS RESTARTS AGE
blue-746c87566d-bljgm 1/1 Running 0 42m
blue-746c87566d-c249z 1/1 Running 0 42m
blue-746c87566d-m5tss 1/1 Running 0 42m
hr-app-76d475c57d-dnqqn 1/1 Running 0 2m58s
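The pod list above does not show node placement, so a quick check that the cordon only blocked new scheduling, using the same commands as earlier in the lab:
kubectl get nodes                 # node01 shows "Ready,SchedulingDisabled"
kubectl get pods -o wide          # hr-app-76d475c57d-dnqqn is still running on node01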
Question 1. Upgrade Master Node (upgrade the master node from v1.20 to v1.21)
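A minimal sketch of the control-plane upgrade flow for that question, assuming a Debian/Ubuntu node and the apt package names used for the v1.20 to v1.21 releases (apt-mark hold/unhold steps omitted); always confirm the target version with kubeadm upgrade plan:
kubectl drain controlplane --ignore-daemonsets            # empty the control-plane node first
apt-get update && apt-get install -y kubeadm=1.21.0-00    # upgrade kubeadm itself
kubeadm upgrade plan                                      # lists the versions you can upgrade to
kubeadm upgrade apply v1.21.0                             # upgrade the control-plane components
apt-get install -y kubelet=1.21.0-00 kubectl=1.21.0-00    # then upgrade kubelet and kubectl
systemctl daemon-reload && systemctl restart kubelet
kubectl uncordon controlplane                             # make the node schedulable again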
https://kubernetes.io/ko/docs/concepts/workloads/pods/disruptions/
https://kubernetes.io/docs/tasks/administer-cluster/safely-drain-node/