Cluster Maintenance
1) OS Upgrades
2) Cluster Upgrade Process
3) Backup and Restore Methods
01. We have a working Kubernetes cluster with a set of applications running. Let us first explore the setup.
How many deployments exist in the cluster?
Answer: 2
root@controlplane:~# kubectl get deployments
NAME READY UP-TO-DATE AVAILABLE AGE
blue 3/3 3 3 105s
red 2/2 2 2 105s
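Since the restore step at the end of this lab compares against this initial state, it can also help to note the services now (a quick additional check, not asked for by the question itself):
kubectl get services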
02. What is the version of ETCD running on the cluster?
Check the ETCD Pod or Process.
Answer: v3.4.13
# Hint
Look at the ETCD logs OR check the image used by the ETCD pod.
# Solution
Look at the ETCD Logs using the command
kubectl logs etcd-controlplane -n kube-system
or check the image used by the ETCD pod:
kubectl describe pod etcd-controlplane -n kube-system
root@controlplane:~# kubectl describe pod etcd-controlplane -n kube-system
Name: etcd-controlplane
Namespace: kube-system
Priority: 2000001000
Priority Class Name: system-node-critical
Node: controlplane/10.67.217.9
Start Time: Mon, 24 Jan 2022 04:53:00 +0000
Labels: component=etcd
tier=control-plane
Annotations: kubeadm.kubernetes.io/etcd.advertise-client-urls: https://10.67.217.9:2379
kubernetes.io/config.hash: 985502623bdbef6ebebf0be608405ef3
kubernetes.io/config.mirror: 985502623bdbef6ebebf0be608405ef3
kubernetes.io/config.seen: 2022-01-24T04:52:57.846251613Z
kubernetes.io/config.source: file
Status: Running
IP: 10.67.217.9
IPs:
IP: 10.67.217.9
Controlled By: Node/controlplane
Containers:
etcd:
Container ID: docker://b144d4942e8650ac111a2a5d9c6e3f4ea70c6fb853b748e89bc8e965a7d0ed4d
Image: k8s.gcr.io/etcd:3.4.13-0
Image ID: docker-pullable://k8s.gcr.io/etcd@sha256:4ad90a11b55313b182afc186b9876c8e891531b8db4c9bf1541953021618d0e2
Port: <none>
Host Port: <none>
Command:
etcd
--advertise-client-urls=https://10.67.217.9:2379
--cert-file=/etc/kubernetes/pki/etcd/server.crt
--client-cert-auth=true
--data-dir=/var/lib/etcd
--initial-advertise-peer-urls=https://10.67.217.9:2380
--initial-cluster=controlplane=https://10.67.217.9:2380
--key-file=/etc/kubernetes/pki/etcd/server.key
--listen-client-urls=https://127.0.0.1:2379,https://10.67.217.9:2379
--listen-metrics-urls=http://127.0.0.1:2381
--listen-peer-urls=https://10.67.217.9:2380
--name=controlplane
--peer-cert-file=/etc/kubernetes/pki/etcd/peer.crt
--peer-client-cert-auth=true
--peer-key-file=/etc/kubernetes/pki/etcd/peer.key
--peer-trusted-ca-file=/etc/kubernetes/pki/etcd/ca.crt
--snapshot-count=10000
--trusted-ca-file=/etc/kubernetes/pki/etcd/ca.crt
State: Running
Started: Mon, 24 Jan 2022 04:52:40 +0000
Ready: True
Restart Count: 0
Requests:
cpu: 100m
ephemeral-storage: 100Mi
memory: 100Mi
Liveness: http-get http://127.0.0.1:2381/health delay=10s timeout=15s period=10s #success=1 #failure=8
Startup: http-get http://127.0.0.1:2381/health delay=10s timeout=15s period=10s #success=1 #failure=24
Environment: <none>
Mounts:
/etc/kubernetes/pki/etcd from etcd-certs (rw)
/var/lib/etcd from etcd-data (rw)
Conditions:
Type Status
Initialized True
Ready True
ContainersReady True
PodScheduled True
Volumes:
etcd-certs:
Type: HostPath (bare host directory volume)
Path: /etc/kubernetes/pki/etcd
HostPathType: DirectoryOrCreate
etcd-data:
Type: HostPath (bare host directory volume)
Path: /var/lib/etcd
HostPathType: DirectoryOrCreate
QoS Class: Burstable
Node-Selectors: <none>
Tolerations: :NoExecute op=Exists
Events: <none>
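A quicker way to pull out just the image tag, instead of reading the full describe output (a sketch using the same pod name):
kubectl -n kube-system get pod etcd-controlplane -o jsonpath='{.spec.containers[0].image}'
This prints k8s.gcr.io/etcd:3.4.13-0, i.e. version 3.4.13.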
03. At what address can you reach the ETCD cluster from the controlplane node?
Answer: 127.0.0.1:2379
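The address comes from the --listen-client-urls flag passed to etcd; one way to confirm it (a sketch):
kubectl -n kube-system describe pod etcd-controlplane | grep listen-client-urls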
04. Where is the ETCD server certificate file located?
Note this path down as you will need to use it later
# Answer
/etc/kubernetes/pki/etcd/server.crt
# hint
--cert-file=/etc/kubernetes/pki/etcd/server.crt
05. Where is the ETCD CA Certificate file located?
Note this path down as you will need to use it later.
Answer:
/etc/kubernetes/pki/etcd/ca.crt
# hint
--trusted-ca-file=/etc/kubernetes/pki/etcd/ca.crt
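Both certificate paths (questions 04 and 05) can be pulled out of the pod spec in one go (a sketch; the peer variants of the flags will match as well):
kubectl -n kube-system describe pod etcd-controlplane | grep -E 'cert-file|trusted-ca-file'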
06. The master nodes in our cluster are planned for a regular maintenance reboot tonight.
While we do not anticipate anything to go wrong, we are required to take the necessary backups.
Take a snapshot of the ETCD database using the built-in snapshot functionality.
Store the backup file at location /opt/snapshot-pre-boot.db
- Backup ETCD to /opt/snapshot-pre-boot.db
# Backup
ETCDCTL_API=3 etcdctl snapshot save /opt/snapshot-pre-boot.db \
--endpoints=https://127.0.0.1:2379 \
--cacert="/etc/kubernetes/pki/etcd/ca.crt" \
--cert="/etc/kubernetes/pki/etcd/server.crt" \
--key="/etc/kubernetes/pki/etcd/server.key"
# hint
Use the etcdctl snapshot save command.
You will have to make use of additional flags to connect to the ETCD server.
--endpoints: Optional Flag, points to the address where ETCD is running (127.0.0.1:2379)
--cacert: Mandatory Flag (Absolute Path to the CA certificate file)
--cert: Mandatory Flag (Absolute Path to the Server certificate file)
--key: Mandatory Flag (Absolute Path to the Key file)
# solution
root@controlplane:~# ETCDCTL_API=3 etcdctl --endpoints=https://[127.0.0.1]:2379 \
--cacert=/etc/kubernetes/pki/etcd/ca.crt \
--cert=/etc/kubernetes/pki/etcd/server.crt \
--key=/etc/kubernetes/pki/etcd/server.key \
snapshot save /opt/snapshot-pre-boot.db
Snapshot saved at /opt/snapshot-pre-boot.db
# my answer
root@controlplane:~# ETCDCTL_API=3 etcdctl snapshot save /opt/snapshot-pre-boot.db \
> --endpoints=https://127.0.0.1:2379 \
> --cacert="/etc/kubernetes/pki/etcd/ca.crt" \
> --cert="/etc/kubernetes/pki/etcd/server.crt" \
> --key="/etc/kubernetes/pki/etcd/server.key"
Snapshot saved at /opt/snapshot-pre-boot.db
root@controlplane:~#
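Before the reboot it is worth confirming that the snapshot file is readable; etcdctl snapshot status prints its hash, revision, key count and size (a sketch; the status subcommand reads the local file, so no endpoint or certificate flags are needed):
ETCDCTL_API=3 etcdctl snapshot status /opt/snapshot-pre-boot.db --write-out=table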
07. Great!
Let us now wait for the maintenance window to finish. Go get some sleep. (Don't actually go.)
Ok.
It's about 2 AM! You get a call!
08. Wake up! We have a conference call!
After the reboot the master nodes came back online, but none of our applications are accessible.
Check the status of the applications on the cluster. What's wrong?
1) Deployments are not present
2) Services are not present
3) All of the above
4) Pods are not present
# hint
Are you able to see any deployments/pods or services in the default namespace?
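A quick way to check all three at once (a sketch):
kubectl get deployments,pods,services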
09. Luckily we took a backup. Restore the original state of the cluster using the backup file.
- Deployments: 2
- Services: 3
# hint
Restore the etcd data to a new directory from the snapshot
by using the etcdctl snapshot restore command.
Once the directory is restored,
update the ETCD configuration to use the restored directory.
A generic restore example (note: the snapshot path and data directory below differ from this lab):
sudo ETCDCTL_API=3 etcdctl snapshot restore \
/home/cloud_user/etcd_backup.db \
--data-dir="/var/lib/etcd" \
--initial-advertise-peer-urls="http://localhost:2380" \
--initial-cluster="default=http://localhost:2380"
root@controlplane:~# ETCDCTL_API=3 etcdctl --data-dir /var/lib/etcd-from-backup \
snapshot restore /opt/snapshot-pre-boot.db
2021-03-25 23:52:59.608547 I | mvcc: restore compact to 6466
2021-03-25 23:52:59.621400 I | etcdserver/membership: added member 8e9e05c52164694d [http://localhost:2380] to cluster cdf818194e3a8c32
root@controlplane:~#
Note: In this case, we are restoring the snapshot to a different directory but on the same server where we took the backup (the controlplane node). As a result, the only required option for the restore command is --data-dir.
Next, update the /etc/kubernetes/manifests/etcd.yaml:
We have now restored the etcd snapshot to a new path on the controlplane
(/var/lib/etcd-from-backup),
so the only change to be made in the YAML file
is to change the hostPath for the volume called etcd-data
from the old directory (/var/lib/etcd) to the new directory (/var/lib/etcd-from-backup).
volumes:
- hostPath:
path: /var/lib/etcd-from-backup
type: DirectoryOrCreate
name: etcd-data
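A quick way to confirm the manifest edit before moving on (a sketch):
grep etcd-from-backup /etc/kubernetes/manifests/etcd.yaml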
With this change, /var/lib/etcd inside the container points
to /var/lib/etcd-from-backup on the controlplane (which is what we want).
When this file is updated, the ETCD pod is automatically re-created, as this
is a static pod placed under the /etc/kubernetes/manifests directory.
Note: as the ETCD pod has changed, it will automatically restart,
along with kube-controller-manager and kube-scheduler.
Wait 1-2 minutes for these pods to restart.
You can run a watch "docker ps | grep etcd" command to see
when the ETCD pod is restarted.
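On clusters that use containerd instead of Docker, the equivalent check would be (a sketch, assuming crictl is installed on the node):
watch "crictl ps | grep etcd"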
Note 2: If the etcd pod does not reach Ready 1/1,
restart it with kubectl delete pod -n kube-system etcd-controlplane
and wait 1 minute.
Note 3: This is the simplest way to make sure that ETCD uses the restored data
after the ETCD pod is recreated.
You don't have to change anything else.
If you do change --data-dir to /var/lib/etcd-from-backup in the YAML file,
make sure that the volumeMounts entry for etcd-data is updated as well,
with the mountPath pointing to /var/lib/etcd-from-backup.
(This complete step is optional and is not needed to complete the restore.)
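Once the etcd pod is Running and Ready again, confirm that the cluster state matches the expected counts from question 09 (a sketch):
kubectl get deployments
kubectl get services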
Bookmark
https://kubernetes.io/docs/tasks/administer-cluster/configure-upgrade-etcd/