Introduction
The purpose of this guide is to describe the steps to upgrade Kubernetes from version 1.19.16 to 1.21.14 for the Fortanix Data Security Manager (DSM) 4.15 release.
Overview
The Fortanix DSM 4.15 release will upgrade the system from Kubernetes version 1.19 to 1.21.
Subsequent Kubernetes upgrades will be released as part of regular upgrades or could continue to be independent upgrades.
After upgrading Fortanix DSM to the 4.15 version, you will not be able to downgrade to previous releases. The Fortanix DSM UI will not allow a downgrade after 4.15 is installed. Please work with Fortanix Support to ensure you have a valid backup that can be used to perform a manual recovery.
Also, you will need to upgrade Fortanix DSM to 4.15 before moving to any future release.
Prerequisites
Before upgrading Kubernetes, perform the following steps:
Run the following command to check that more than 15 GB of disk space is available in /var and the root directory (/):

root@us-west-eqsv2-13:~# df -h /var/ /

The following is the output:

Filesystem            Size  Used  Avail  Use%  Mounted on
/dev/mapper/main-var  46G   28G   16G    64%   /var
/dev/sda2             46G   21G   24G    47%   /

If less than 15 GB is available, delete the oldest version of Fortanix DSM from the UI, then run the command again to confirm that space was freed:

$ df -h /var/ /
Filesystem            Size  Used  Avail  Use%  Mounted on
/dev/mapper/main-var  47G   26G   21G    56%   /var
/dev/sda2             47G   13G   33G    28%   /

Verify the following keys in the kube-apiserver.yaml file of each node and ensure that the assigned IP address is the same as the host IP:

kubeadm.kubernetes.io/kube-apiserver.advertise-address.endpoint
advertise-address
startupProbe IP
readinessProbe IP
livenessProbe IP

In case of any mismatch, edit the YAML file to replace the assigned IP address with the host IP.
The following lines are references from the /etc/kubernetes/manifests/kube-apiserver.yaml file.

Annotation:

annotations:
  kubeadm.kubernetes.io/kube-apiserver.advertise-address.endpoint: 172.31.1.166:6443

Advertise-address:

spec:
  containers:
  - command:
    - kube-apiserver
    - --advertise-address=172.31.1.166

livenessProbe:

livenessProbe:
  failureThreshold: 8
  httpGet:
    host: 172.31.1.166

readinessProbe:

readinessProbe:
  failureThreshold: 3
  httpGet:
    host: 172.31.1.166

startupProbe:

startupProbe:
  failureThreshold: 24
  httpGet:
    host: 172.31.1.166
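As a quick consistency check, all five fields can be verified in one pass by counting occurrences of the host IP in the manifest. The following is a minimal sketch run against a hypothetical sample fragment (the IP 172.31.1.166 and the temporary file are illustrative only):

```shell
# Hypothetical host IP; on a real node this would be the node's own address.
HOST_IP=172.31.1.166
# Sample fragment standing in for /etc/kubernetes/manifests/kube-apiserver.yaml.
manifest=$(mktemp)
cat > "$manifest" << EOF
    kubeadm.kubernetes.io/kube-apiserver.advertise-address.endpoint: ${HOST_IP}:6443
    - --advertise-address=${HOST_IP}
      host: ${HOST_IP}
      host: ${HOST_IP}
      host: ${HOST_IP}
EOF
# One annotation, one advertise-address, and three probe hosts: expect 5 matches.
count=$(grep -c "$HOST_IP" "$manifest")
echo "host IP occurrences: $count"
rm -f "$manifest"
```

On a real node, the same grep against /etc/kubernetes/manifests/kube-apiserver.yaml should report five matches; fewer suggests a field is still pointing at a stale IP.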
Run the following command to check if all software versions are available in all the endpoints:

root@us-west-eqsv2-13:~# kubectl get ep -n swdist

The following is the output:

NAME     ENDPOINTS                                          AGE
swdist   10.244.0.212:22,10.244.1.191:22,10.244.2.152:22    242d
v2649    10.244.0.212:22,10.244.1.191:22,10.244.2.152:22    4d
v2657    10.244.0.212:22,10.244.1.191:22,10.244.2.152:22    2d

Run the following command to check the status of the Docker registry:

systemctl status docker-registry

Ensure that the status is active and running before and after the software is uploaded.
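Every version endpoint should list the same set of node addresses. The following is a minimal sketch of that comparison, run against a hypothetical sample of the `kubectl get ep -n swdist` output (the addresses are illustrative):

```shell
# Sample output standing in for `kubectl get ep -n swdist`, header removed.
sample='swdist 10.244.0.212:22,10.244.1.191:22,10.244.2.152:22 242d
v2649 10.244.0.212:22,10.244.1.191:22,10.244.2.152:22 4d
v2657 10.244.0.212:22,10.244.1.191:22,10.244.2.152:22 2d'
# Count distinct ENDPOINTS values; 1 means every version is served by every node.
distinct=$(printf '%s\n' "$sample" | awk '{print $2}' | sort -u | wc -l)
echo "distinct endpoint sets: $distinct"
```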
Run the following command on each node to ensure that the overlay mount matches the following:

cat /etc/systemd/system/var-opt-fortanix-swdist_overlay.mount.d/options.conf
[Mount]
Options=lowerdir=/var/opt/fortanix/swdist/data/vXXXX/registry:/var/opt/fortanix/swdist/data/vYYYY/registry

Here, 'vXXXX' is the previous version and 'vYYYY' is the upgraded version.

Ensure that the latest backup is triggered and verify that it is successful (check its size and so on).
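The two version directories can be extracted from the Options line mechanically rather than by eye. The following is a minimal sketch, assuming the line has the shape shown above (the version numbers here are placeholders):

```shell
# Sample Options line as it appears in options.conf (versions are placeholders).
line='Options=lowerdir=/var/opt/fortanix/swdist/data/v2649/registry:/var/opt/fortanix/swdist/data/v2657/registry'
# Strip the prefix, split the two layers on ':', and keep the version path component.
versions=$(printf '%s\n' "${line#Options=lowerdir=}" | tr ':' '\n' | awk -F/ '{print $(NF-1)}')
echo "$versions"
```

The first value printed should be the previous version and the second the upgraded version.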
All nodes must report as healthy and be running Kubernetes version 1.19.16, Docker version 18.6, and kernel 5.8. Run the following command to list the nodes and their IPs:

kubectl get nodes -o wide

Look for the version number under the VERSION column; it must be v1.19.16 for each of the nodes.

NAME  STATUS  ROLES   AGE  VERSION    EXTERNAL-IP  OS-IMAGE             KERNEL-VERSION    CONTAINER-RUNTIME
ali1  Ready   master  2d   v1.19.16                Ubuntu 20.04.3 LTS   5.8.0-50-generic  docker://19.3.11
nuc3  Ready   master  3d   v1.19.16                Ubuntu 20.04.3 LTS   5.8.0-50-generic  docker://19.3.11

All pods must be healthy in the default, swdist, and kube-system namespaces.

Run the following command to check the kubeadm configuration on the cluster:

kubectl get configmap kubeadm-config -oyaml -nkube-system

This should return the following values for parameters in the master configuration:

kubernetesVersion: v1.19.16
imageRepository: containers.fortanix.com:5000
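On larger clusters, the VERSION column can be checked mechanically rather than by eye. The following is a minimal sketch, run against a hypothetical two-node sample of the `kubectl get nodes` output with the header removed (node names and ages are illustrative):

```shell
# Sample `kubectl get nodes` output, header stripped (names/ages are illustrative).
nodes='ali1 Ready master 2d v1.19.16
nuc3 Ready master 3d v1.19.16'
expected=v1.19.16
# Count nodes whose VERSION column (field 5) deviates from the expected version.
bad=$(printf '%s\n' "$nodes" | awk -v v="$expected" '$5 != v' | wc -l)
echo "nodes not at $expected: $bad"
```

A result of 0 means every node is at the required pre-upgrade version.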
Check the status of etcd and ensure that isLeader=true is assigned to one of the etcd nodes. etcd should be TLS migrated.

Run the following command to generate the list of etcd members, where peerURLs should have both ports listed, 2380 (http) and 2382 (https). Here, <etcd-pod-name> is a placeholder for the name of an etcd pod on the node:

sudo KUBECONFIG=/etc/kubernetes/admin.conf kubectl exec <etcd-pod-name> -nkube-system -- etcdctl --cacert /etc/kubernetes/pki/etcd/ca.crt --cert /etc/kubernetes/pki/etcd/healthcheck-client.crt --key /etc/kubernetes/pki/etcd/healthcheck-client.key --endpoints https://127.0.0.1:2379 member list

The following is a sample output of the above command:

6eac4cd6e44f7cb0: name=srv1-sitlab-dc peerURLs=https://10.4.65.11:2382 clientURLs=https://10.4.65.11:2379 isLeader=true
e6214c803ea4e0c6: name=nuc3 peerURLs=http://10.197.192.12:2380,https://10.197.192.12:2382 clientURLs=http://10.197.192.12:2379 isLeader=true
Run the following command to ensure that the version of etcd on each of the etcd pods is 3.4.13-0:

sudo KUBECONFIG=/etc/kubernetes/admin.conf kubectl exec <etcd-pod-name> -nkube-system -- etcd --version

Run the following command to check the health of the etcd cluster and ensure that it is healthy:

root@ip-172-31-0-188:/home/administrator# sudo KUBECONFIG=/etc/kubernetes/admin.conf kubectl exec etcd-ip-172-31-0-188 -nkube-system -- etcdctl --cacert /etc/kubernetes/pki/etcd/ca.crt --cert /etc/kubernetes/pki/etcd/healthcheck-client.crt --key /etc/kubernetes/pki/etcd/healthcheck-client.key --endpoints https://127.0.0.1:2379 endpoint health

The following is the desired output:

Defaulted container "etcd" out of: etcd, etcd-wait (init)
https://127.0.0.1:2379 is healthy: successfully committed proposal: took = 7.526257ms

On each node, navigate to the /etc/kubernetes/manifests directory and check the image versions for all Kubernetes control plane components in the following files:

etcd.yaml
kube-apiserver.yaml
kube-controller-manager.yaml
kube-scheduler.yaml

The following is the desired output:

root@ip-172-31-0-231:/etc/kubernetes/manifests# cat etcd.yaml | grep "image:"
    image: containers.fortanix.com:5000/etcd:3.4.13-0
    image: containers.fortanix.com:5000/etcd:3.4.13-0
root@ip-172-31-0-231:/etc/kubernetes/manifests# cat kube-apiserver.yaml | grep "image:"
    image: containers.fortanix.com:5000/kube-apiserver:v1.19.16
root@ip-172-31-0-231:/etc/kubernetes/manifests# cat kube-controller-manager.yaml | grep "image:"
    image: containers.fortanix.com:5000/kube-controller-manager:v1.19.16
root@ip-172-31-0-231:/etc/kubernetes/manifests# cat kube-scheduler.yaml | grep "image:"
    image: containers.fortanix.com:5000/kube-scheduler:v1.19.16

Perform the following steps to check the expiry of the Kubernetes certificates.
Check the certificates under the /etc/kubernetes/pki and /etc/kubernetes/pki/etcd directories.

Run the following command to renew the expired certificates:

/opt/fortanix/sdkms/bin/renew-k8s-certs.sh
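Expiry can be read from each certificate with `openssl x509 -noout -enddate` and compared against the current date. The following is a minimal sketch of that comparison, using a hypothetical `notAfter=` value in place of real certificate output (GNU `date` is assumed):

```shell
# Hypothetical output of: openssl x509 -noout -enddate -in /etc/kubernetes/pki/apiserver.crt
enddate='notAfter=Dec 31 23:59:59 2030 GMT'
# Convert the expiry timestamp and the current time to epoch seconds, then diff in days.
expiry_epoch=$(date -d "${enddate#notAfter=}" +%s)
now_epoch=$(date +%s)
days_left=$(( (expiry_epoch - now_epoch) / 86400 ))
echo "days until expiry: $days_left"
```

A non-positive result means the certificate has expired and should be renewed with the script above.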
Run the following commands on each node to check the status of the kubelet, docker, and docker-registry services:

systemctl status docker
systemctl status kubelet
systemctl status docker-registry

NOTE
Ensure that the status of the services is Running.
Upgrading Kubernetes from 1.19 to 1.21
Ensure that you read the ‘Prerequisites’ section before upgrading.
Post Upgrade Procedure
The following are the post-upgrade checks to perform.
Run the following command to check the status of the deploy job:

# pod status
$ sudo KUBECONFIG=/etc/kubernetes/admin.conf kubectl get pods | grep deploy
deploy-rrv8v   0/1   Completed   0   18d

# job status
$ sudo KUBECONFIG=/etc/kubernetes/admin.conf kubectl get job deploy
NAME     COMPLETIONS   DURATION   AGE
deploy   1/1           4h54m      18d

NOTE
Ensure that the status of the pod is Completed.

Run the following command to check if the Kubernetes version is upgraded to v1.21.14 (including the kubeadm, kubectl, and kubelet packages):

$ dpkg -l | grep kube
ii  kubeadm         1.21.14-00fortanix   amd64   Kubernetes Cluster Bootstrapping Tool
ii  kubectl         1.21.14-00           amd64   Kubernetes Command Line Tool
ii  kubelet         1.21.14-00           amd64   Kubernetes Node Agent
ii  kubernetes-cni  0.8.7-00             amd64   Kubernetes CNI

Run the following command to check if the image tag 0.18.0 for the swdist container is updated:

$ sudo -E kubectl describe ds swdist -nswdist | grep Image
Image: containers.fortanix.com:5000/swdist:0.18.0
Image: containers.fortanix.com:5000/swdist:0.18.0
Image: containers.fortanix.com:5000/swdist:0.18.0
Image: containers.fortanix.com:5000/swdist:0.18.0
Image: containers.fortanix.com:5000/swdist:0.18.0
Image: containers.fortanix.com:5000/swdist:0.18.0
Image: containers.fortanix.com:5000/swdist:0.18.0
Image: containers.fortanix.com:5000/swdist:0.18.0

If you are using DC Labeling (fortanix-data-security-manager-data-center-labeling), run the following command to verify that the zone label is added in the YAML of the node:

kubectl get node node_name -o yaml | grep -i 'zone'

Run the following command on each of the etcd pods in the cluster to check if the etcd version is upgraded to 3.4.13-0:

$ sudo -E kubectl describe pod etcd-sdkms-server-1 -nkube-system | grep Image
Image:      containers.fortanix.com:5000/etcd:3.4.13-0
Image ID:   docker-pullable://containers.fortanix.com:5000/etcd@sha256:1d142ee20719afc2168b2caa3df0c573d6b51741b2f47ea29c5afafa1e3bbe41
Image:      containers.fortanix.com:5000/etcd:3.4.13-0
Image ID:   docker-pullable://containers.fortanix.com:5000/etcd@sha256:1d142ee20719afc2168b2caa3df0c573d6b51741b2f47ea29c5afafa1e3bbe41

Run the following command to check if kube-proxy is upgraded to image v1.21.14.3-dfc5441dc370bc:

$ sudo -E kubectl describe ds kube-proxy -nkube-system | grep Image
Image: containers.fortanix.com:5000/kube-proxy:v1.21.14.3-dfc5441dc370bc

Run the following command to check if the kured pod is running with image version 1.8.1:

$ sudo -E kubectl describe ds kured -nkube-system | grep Image
Image: containers.fortanix.com:5000/kured:1.8.1

Run the following command on each of the nodes in the cluster to check if kube-apiserver, kube-controller-manager, and kube-scheduler are upgraded to 1.21.14:

$ sudo cat /etc/kubernetes/manifests/kube-scheduler.yaml | grep "image:"
    image: containers.fortanix.com:5000/kube-scheduler:v1.21.14
$ sudo cat /etc/kubernetes/manifests/kube-controller-manager.yaml | grep "image:"
    image: containers.fortanix.com:5000/kube-controller-manager:v1.21.14
$ sudo cat /etc/kubernetes/manifests/kube-apiserver.yaml | grep "image:"
    image: containers.fortanix.com:5000/kube-apiserver:v1.21.14

Run the following command to check the status of the nodes and the Kubernetes version:

root@ip-172-31-1-89:/home/administrator# kubectl get nodes
NAME              STATUS   ROLES                  AGE   VERSION
ip-172-31-0-56    Ready    control-plane,master   19h   v1.21.14
ip-172-31-1-188   Ready    control-plane,master   19h   v1.21.14
ip-172-31-1-89    Ready    control-plane,master   20h   v1.21.14

During the Kubernetes upgrade, kube-dns is migrated to coredns. Run the following commands to check the coredns deployment or pods and their associated resources:

root@ip-172-31-1-89:/home/administrator# kubectl get deployments -A | grep kube-dns
kube-system   kube-dns-autoscaler   1/1   1   1   20h
root@ip-172-31-1-89:/home/administrator# kubectl get pods -A | grep kubedns
root@ip-172-31-1-89:/home/administrator#
root@ip-172-31-0-188:/home/administrator# kubectl get deployments -A | grep coredns
kube-system   coredns   3/3   3   3   6d16h
root@ip-172-31-0-188:/home/administrator# kubectl get pods -A | grep coredns
kube-system   coredns-6d4d95746b-bndwr   1/1   Running   0   6d16h
kube-system   coredns-6d4d95746b-m88d9   1/1   Running   0   6d15h
kube-system   coredns-6d4d95746b-wsr79   1/1   Running   0   6d16h
root@ip-172-31-0-188:/home/administrator# kubectl get clusterrole -A | grep coredns
system:coredns   2022-11-21T14:51:38Z
root@ip-172-31-0-188:/home/administrator# kubectl get clusterrolebinding -A | grep coredns
system:coredns   ClusterRole/system:coredns   6d16h
root@ip-172-31-0-188:/home/administrator# kubectl get svc -A | grep kube-dns
kube-system   kube-dns   ClusterIP   10.245.0.10   53/UDP,53/TCP,9153/TCP   6d16h

During the Kubernetes upgrade, docker is migrated to containerd. You can check the containerd version available after the upgrade, and use the ctr or crictl CLI tools as a replacement for the docker CLI.

root@ip-172-31-0-188:/home/administrator# systemctl status docker
Unit docker.service could not be found.
root@ip-172-31-0-188:/home/administrator# systemctl status containerd | grep active
Active: active (running) since Mon 2022-11-21 14:51:10 UTC; 6 days ago
root@ip-172-31-1-89:/home/administrator# apt list --installed | grep containerd
WARNING: apt does not have a stable CLI interface. Use with caution in scripts.
containerd-config/focal,now 1.0 all [installed]
containerd.io/focal,now 1.6.16-1 amd64 [installed]

Run the following command to verify that the etcd peer communication port has moved from 2382 to 2380:

$ sudo KUBECONFIG=/etc/kubernetes/admin.conf kubectl exec etcd-ip-172-31-0-56 -nkube-system -- etcdctl --cacert /etc/kubernetes/pki/etcd/ca.crt --cert /etc/kubernetes/pki/etcd/healthcheck-client.crt --key /etc/kubernetes/pki/etcd/healthcheck-client.key --endpoints https://127.0.0.1:2379 member list
Defaulted container "etcd" out of: etcd, etcd-wait (init)
cacc42c8d609b98, started, ip-172-31-0-56, https://172.31.0.56:2380, https://172.31.0.56:2379, false
4a92930a7b7fdf0c, started, ip-172-31-1-188, https://172.31.1.188:2380, https://172.31.1.188:2379, false
cf0a232fb92a9a5b, started, ip-172-31-1-89, https://172.31.1.89:2380, https://172.31.1.89:2379, false
Troubleshooting
In case the kubelet client certificates expire (/var/lib/kubelet/pki/kubelet-client.crt) and there is no /var/lib/kubelet/pki/kubelet-client-current.pem file present, you can recreate the certificates using the following commands:

TEMP_DIR=/etc/kubernetes/tmp
mkdir -p $TEMP_DIR
BACKUP_PEM="/var/lib/kubelet/pki/kubelet-client-current.pem"
KEY="/var/lib/kubelet/pki/kubelet-client.key"
CERT="/var/lib/kubelet/pki/kubelet-client.crt"
KUBELET_CONF="/etc/kubernetes/kubelet.conf"  # assumed standard kubeadm path; not defined in the original snippet
echo "Stopping kubelet service"
systemctl stop kubelet
echo "Creating a new key and cert file for kubelet auth"
nodename=$(echo "$HOSTNAME" | awk '{print tolower($0)}')
openssl req -out $TEMP_DIR/tmp.csr -new -newkey rsa:2048 -nodes -keyout $TEMP_DIR/tmp.key -subj "/O=system:nodes/CN=system:node:$nodename"
cat > $TEMP_DIR/kubelet-client.ext << HERE
keyUsage = critical,digitalSignature,keyEncipherment
extendedKeyUsage = clientAuth
HERE
echo "Signing the generated csr with kubernetes CA"
openssl x509 -req -days 365 -in $TEMP_DIR/tmp.csr -CA /etc/kubernetes/pki/ca.crt -CAkey /etc/kubernetes/pki/ca.key -CAcreateserial -out $TEMP_DIR/tmp.crt -sha256 -extfile $TEMP_DIR/kubelet-client.ext
cp $TEMP_DIR/tmp.crt $CERT
cp $TEMP_DIR/tmp.key $KEY
chmod 644 $CERT
chmod 600 $KEY
if grep -q "client-certificate-data" $KUBELET_CONF; then
  echo "Updating file $KUBELET_CONF to add reference to restored certificates"
  sed -i "s|\(client-certificate-data:\s*\).*\$|client-certificate: $CERT|" $KUBELET_CONF
  sed -i "s|\(client-key-data:\s*\).*\$|client-key: $KEY|" $KUBELET_CONF
fi
echo "Starting kubelet service"
systemctl start kubelet

An upgrade on a 2-node cluster can fail due to etcd quorum failure. In such a scenario, if the pods are healthy, you can re-run the deploy job manually using the following command. This will eventually upgrade the cluster to Kubernetes 1.21.

sdkms-cluster deploy --stage DEPLOY --version <version>

WARNING
2 node upgrades are not recommended.
When a cluster is upgraded from build 4.2.2087 to <4.3.xxxx> on a 3-node cluster, it is possible that the deploy job exits and is marked completed before the cluster upgrade finishes. In such a scenario, if all the pods are healthy, you can deploy the version again.