1.0 Introduction
The purpose of this article is to describe the steps to upgrade Kubernetes from version 1.29.6 to 1.30.5 for the Fortanix Data Security Manager (DSM) release 4.34.
2.0 Overview
The Fortanix DSM 4.34 release will upgrade the system from Kubernetes version 1.29 to 1.30.
Subsequent Kubernetes upgrades will either be delivered as part of regular Fortanix DSM releases or continue to be shipped as independent upgrades.
After upgrading Fortanix DSM to version 4.34, you will not be able to downgrade to previous releases; the Fortanix DSM UI will not allow a downgrade after 4.34 is installed. Work with Fortanix Support to ensure that you have a valid backup that can be used to perform a manual recovery.
You must also upgrade Fortanix DSM to 4.34 before moving to any future release.
3.0 Pre-Upgrade Checks
Before upgrading Kubernetes, ensure the following:
3.1 Check and Manage Disk Space
Run the following command to check that more than 15 GB of disk space is available in the /var and root (/) directories. If less than 15 GB is available, delete the oldest version of Fortanix DSM from the user interface (UI).
sudo df -h /var/ /
The following is the sample output:
Filesystem Size Used Avail Use% Mounted on
/dev/nvme0n1p1 993G 29G 964G 3% /
/dev/nvme0n1p1 993G 29G 964G 3% /
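If you prefer a scripted check, the following is a minimal sketch (the 15 GB threshold mirrors the requirement above) that flags either filesystem when available space is low:
# Warn when /var or / has less than 15 GB available (threshold taken from the step above)
for mount in /var /; do
  avail_kb=$(df -k --output=avail "$mount" | tail -n 1 | tr -d ' ')
  if [ "$avail_kb" -lt $((15 * 1024 * 1024)) ]; then
    echo "WARNING: less than 15 GB available on $mount"
  fi
done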
3.2 Check Software Versions in Endpoints
Run the following command to check if all software versions are available in all the endpoints:
sudo KUBECONFIG=/etc/kubernetes/admin.conf kubectl get ep -n swdist
The following is the sample output:
NAME ENDPOINTS AGE
swdist 10.244.0.212:22,10.244.1.191:22,10.244.2.152:22 242d
v2649 10.244.0.212:22,10.244.1.191:22,10.244.2.152:22 4d
v2657 10.244.0.212:22,10.244.1.191:22,10.244.2.152:22 2d
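To confirm at a glance that each version endpoint lists one address per node, the following sketch (it assumes one swdist endpoint address per node is expected) compares the counts:
# Compare the number of addresses behind each swdist endpoint with the number of nodes
nodes=$(sudo KUBECONFIG=/etc/kubernetes/admin.conf kubectl get nodes --no-headers | wc -l)
sudo KUBECONFIG=/etc/kubernetes/admin.conf kubectl get ep -n swdist --no-headers | \
  awk -v n="$nodes" '{ c = split($2, a, ","); if (c != n) print "WARNING: " $1 " lists " c " addresses, expected " n }'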
3.3 Check Cluster and Node Health
Run the following command on each node to ensure that the overlay mount matches the following:
sudo cat /etc/systemd/system/var-opt-fortanix-swdist_overlay.mount.d/options.conf
[Mount]
Options=lowerdir=/var/opt/fortanix/swdist/data/vXXXX/registry:/var/opt/fortanix/swdist/data/vYYYY/registry
Here, 'vXXXX' is the previous version and 'vYYYY' is the upgraded version.
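As a quick sanity check (a sketch, assuming the version directories follow the vXXXX naming shown above), the following prints each registry path referenced by the mount; exactly two should be listed, one per version:
sudo grep -o '/var/opt/fortanix/swdist/data/v[^/:]*/registry' /etc/systemd/system/var-opt-fortanix-swdist_overlay.mount.d/options.conf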
Ensure that the latest backup is triggered and verify that it is successful (size and other metrics).
All nodes must report as healthy and be running Kubernetes version 1.29.6 and kernel 5.4.0-190-generic. Run the following command to get the nodes and list their IPs:
sudo KUBECONFIG=/etc/kubernetes/admin.conf kubectl get nodes -o wide
Look for the version number under the VERSION column; it must be v1.29.6 for each of the nodes.
NAME STATUS ROLES AGE VERSION INTERNAL-IP EXTERNAL-IP OS-IMAGE KERNEL-VERSION CONTAINER-RUNTIME
ip-172-31-0-189 Ready control-plane 3h44m v1.29.6 172.31.0.189 <none> Ubuntu 20.04.6 LTS 5.4.0-190-generic containerd://1.7.12
ip-172-31-1-110 Ready control-plane 3h37m v1.29.6 172.31.1.110 <none> Ubuntu 20.04.6 LTS 5.4.0-190-generic containerd://1.7.12
ip-172-31-2-217 Ready control-plane 3h33m v1.29.6 172.31.2.217 <none> Ubuntu 20.04.6 LTS 5.4.0-190-generic containerd://1.7.12
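If you prefer a narrower view than the full -o wide table, a jsonpath sketch such as the following prints only the node name, kubelet version, and kernel version; every line should show v1.29.6 and 5.4.0-190-generic:
sudo KUBECONFIG=/etc/kubernetes/admin.conf kubectl get nodes -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.nodeInfo.kubeletVersion}{"\t"}{.status.nodeInfo.kernelVersion}{"\n"}{end}'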
All pods must be healthy in the default, swdist, and kube-system namespaces.
Run the following command to check the kubeadm configuration on the cluster:
sudo KUBECONFIG=/etc/kubernetes/admin.conf kubectl get configmap kubeadm-config -oyaml -nkube-system
This should return the following values for parameters in the master configuration:
kubernetesVersion: v1.29.6
imageRepository: containers.fortanix.com:5000/
3.4 Check Etcd Cluster and Component
Check the status of etcd and verify that isLeader=true is assigned to one of the etcd nodes. etcd must be TLS migrated.
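One way to surface the leader flag is etcdctl's table output; the following is a sketch (the -w table option is standard etcdctl, and <etcd-pod-name> is a placeholder). Run it against the etcd pod on each node; exactly one endpoint should report IS LEADER as true:
sudo KUBECONFIG=/etc/kubernetes/admin.conf kubectl exec <etcd-pod-name> -nkube-system -- etcdctl --cacert /etc/kubernetes/pki/etcd/ca.crt --cert /etc/kubernetes/pki/etcd/healthcheck-client.crt --key /etc/kubernetes/pki/etcd/healthcheck-client.key --endpoints https://127.0.0.1:2379 -w table endpoint status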
Run the following command to generate the list of etcd members:
sudo KUBECONFIG=/etc/kubernetes/admin.conf kubectl exec <etcd-pod-name-from-any-node> -nkube-system -- etcdctl --cacert /etc/kubernetes/pki/etcd/ca.crt --cert /etc/kubernetes/pki/etcd/healthcheck-client.crt --key /etc/kubernetes/pki/etcd/healthcheck-client.key --endpoints https://127.0.0.1:2379 member list
The following is the sample output of the above command:
Defaulted container "etcd" out of: etcd, etcd-wait (init)
bf2dc0512cac45c3, started, dev-test-3, https://10.197.192.251:2380, https://10.197.192.251:2379, false
Run the following command to ensure that the version of etcd on each of the etcd pods is 3.5.12:
sudo KUBECONFIG=/etc/kubernetes/admin.conf kubectl exec <etcd-pod-name-from-any-node> -nkube-system -- etcd --version
The following is the sample output of the above command:
Defaulted container "etcd" out of: etcd, etcd-wait (init)
etcd Version: 3.5.12
Git SHA: e7b3bb6cc
Go Version: go1.20.13
Go OS/Arch: linux/amd64
Run the following command to check the health of the etcd cluster and ensure that the cluster is healthy:
sudo KUBECONFIG=/etc/kubernetes/admin.conf kubectl exec <etcd-pod-name-from-any-node> -nkube-system -- etcdctl --cacert /etc/kubernetes/pki/etcd/ca.crt --cert /etc/kubernetes/pki/etcd/healthcheck-client.crt --key /etc/kubernetes/pki/etcd/healthcheck-client.key --endpoints https://127.0.0.1:2379 endpoint health
The following is the sample output of the above command:
Defaulted container "etcd" out of: etcd, etcd-wait (init)
https://127.0.0.1:2379 is healthy: successfully committed proposal: took = 5.953722ms
On each node, navigate to the /etc/kubernetes/manifests directory and run the following command to check the image versions for all Kubernetes control-plane components:
sudo grep -i "image:" /etc/kubernetes/manifests/*.yaml
Perform the following steps to check the expiry of the Kubernetes certificates.
Run the following commands to check the expiry of the certificates under the /etc/kubernetes/pki and /etc/kubernetes/pki/etcd directories:
sudo find /etc/kubernetes/pki/ -name '*.crt' -exec openssl x509 -noout -dates -in {} \; | grep notAfter
sudo find /etc/kubernetes/pki/etcd -name '*.crt' -exec openssl x509 -noout -dates -in {} \; | grep notAfter
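If you also want an explicit warning for certificates that are close to expiry, the following sketch uses openssl's -checkend option (the 30-day window is an assumption, not part of the official procedure):
# Flag any certificate under /etc/kubernetes/pki that expires within 30 days (2592000 seconds)
for crt in $(sudo find /etc/kubernetes/pki/ -name '*.crt'); do
  sudo openssl x509 -checkend 2592000 -noout -in "$crt" >/dev/null || echo "WARNING: $crt expires within 30 days"
done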
Run the following command to renew the expired certificates:
sudo /opt/fortanix/sdkms/bin/renew-k8s-certs.sh
Run the following commands on each node to check the status of the containerd, kubelet, and docker-registry services:
sudo systemctl status containerd
sudo systemctl status kubelet
sudo systemctl status docker-registry
NOTE
Ensure that the status of the services is Running.
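A compact alternative (a sketch only) is to query just the active state of each service; each should report active:
for svc in containerd kubelet docker-registry; do
  echo "$svc: $(systemctl is-active "$svc")"
done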
4.0 Post-Upgrade Checks
Ensure that you have completed the checks in Section 3.0: Pre-Upgrade Checks before upgrading Kubernetes. After the upgrade, perform the following checks:
4.1 Check Node and Deployment Status
Run the following command to check the status of the deploy job:
sudo KUBECONFIG=/etc/kubernetes/admin.conf kubectl get pods | grep deploy
The following is the sample output of the above command:
deploy-vqq7r 0/1 Completed 0 32m
NOTE
Ensure that the status of the pod is Completed.
Run the following command to list the deploy job:
sudo KUBECONFIG=/etc/kubernetes/admin.conf kubectl get job deploy
The following is the sample output of the above command:
NAME COMPLETIONS DURATION AGE
deploy 1/1 20m 41m
NOTE
Verify the completion and duration of the job.
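If you prefer to block until the job finishes instead of polling, kubectl wait can be used; the following is a sketch and the 30-minute timeout is an assumption:
sudo KUBECONFIG=/etc/kubernetes/admin.conf kubectl wait --for=condition=complete job/deploy --timeout=30m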
If you are using DC Labeling, run the following command to verify that the zone label is present in the node YAML:
sudo KUBECONFIG=/etc/kubernetes/admin.conf kubectl get node <node_name> -o yaml | grep -i 'zone'
Run the following command to check the status of the nodes; the Kubernetes version must be v1.30.5 and the role must be control-plane:
sudo KUBECONFIG=/etc/kubernetes/admin.conf kubectl get nodes -o wide
The following is the sample output of the above command:
NAME STATUS ROLES AGE VERSION INTERNAL-IP EXTERNAL-IP OS-IMAGE KERNEL-VERSION CONTAINER-RUNTIME
dsm-test-1 Ready control-plane 62m v1.30.5 10.197.192.252 <none> Ubuntu 20.04.6 LTS 5.4.0-196-generic containerd://1.7.12
NOTE
Ensure the following:
STATUS of the nodes is Ready
VERSION column reflects v1.30.5
ROLES column reflects control-plane
KERNEL-VERSION reflects 5.4.0-196-generic
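For a quick pass/fail view (a sketch, assuming every node should already report v1.30.5), the following prints any node that is still on a different version; no output means all nodes are upgraded:
sudo KUBECONFIG=/etc/kubernetes/admin.conf kubectl get nodes --no-headers | awk '$5 != "v1.30.5" {print $1 " is still on " $5}'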
4.2 Check Kubernetes and Component Version
Run the following command to generate the list of etcd members:
sudo KUBECONFIG=/etc/kubernetes/admin.conf kubectl exec etcd-dsm-test-1 -nkube-system -- etcdctl --cacert /etc/kubernetes/pki/etcd/ca.crt --cert /etc/kubernetes/pki/etcd/healthcheck-client.crt --key /etc/kubernetes/pki/etcd/healthcheck-client.key --endpoints https://127.0.0.1:2379 member list
The following is the sample output of the above command:
Defaulted container "etcd" out of: etcd, etcd-wait (init)
ff5eaaee755acae0, started, dsm-test-1, https://10.197.192.252:2380, https://10.197.192.252:2379, false
Run the following command to check if kube-proxy is upgraded to image v1.30.5-2-955034e555cfd2:
sudo KUBECONFIG=/etc/kubernetes/admin.conf kubectl describe ds kube-proxy -nkube-system | grep Image
The following is the sample output of the above command:
Image: containers.fortanix.com:5000/kube-proxy:v1.30.5-2-955034e555cfd2
Run the following command to check if the kured pod is running with image version 1.16.0:
sudo KUBECONFIG=/etc/kubernetes/admin.conf kubectl describe ds kured -nkube-system | grep Image
The following is the sample output of the above command:
Image: containers.fortanix.com:5000/kured:1.16.0
Run the following command on each of the nodes in the cluster to check if kube-apiserver, kube-controller-manager, and kube-scheduler are upgraded to 1.30.5:
sudo grep -i "image:" /etc/kubernetes/manifests/*.yaml
Run the following command to check the version of etcd:
sudo KUBECONFIG=/etc/kubernetes/admin.conf kubectl get pod etcd-ip-172-31-0-189 -n kube-system -o yaml | grep image:
The following is the sample output of the above command:
image: containers.fortanix.com:5000/etcd:3.5.15-0
image: containers.fortanix.com:5000/etcd:3.5.15-0
image: containers.fortanix.com:5000/etcd:3.5.15-0
image: containers.fortanix.com:5000/etcd:3.5.15-0
Run the following command to check the version of the cert-manager helm chart:
sudo helm list -A
The following is the sample output of the above command:
NAME NAMESPACE REVISION UPDATED STATUS CHART APP VERSION
certmanager cert-manager 2 2024-07-10 11:57:19.980606498 +0000 UTC deployed cert-manager-v1.15.3 v1.15.3
csiplugin cert-manager 2 2024-07-10 11:57:22.410910496 +0000 UTC deployed cert-manager-csi-driver-v0.10.1 v0.10.1
NOTE
Ensure that the helm chart version is 1.15.3 and the csiplugin version is 0.10.1.
Run the following command to check if the Kubernetes version is upgraded to v1.30.5 (including the kubeadm, kubectl, and kubelet packages):
sudo dpkg -l | grep kube
The following is the sample output of the above command:
ii kubeadm 1.30.5-1.1fortanix amd64 Kubernetes Cluster Bootstrapping Tool
ii kubectl 1.30.5-1.1 amd64 Kubernetes Command Line Tool
ii kubelet 1.30.5-1.1 amd64 Kubernetes Node Agent
ii kubernetes-cni 1.2.0-00 amd64 Kubernetes CNI
Run the following command to check if the image tag for the swdist container is updated to 0.29.0:
sudo KUBECONFIG=/etc/kubernetes/admin.conf kubectl describe ds swdist -nswdist | grep Image
The following is the sample output of the above command:
Image: containers.fortanix.com:5000/swdist:0.29.0
Image: containers.fortanix.com:5000/swdist:0.29.0
Image: containers.fortanix.com:5000/swdist:0.29.0
Image: containers.fortanix.com:5000/swdist:0.29.0
Image: containers.fortanix.com:5000/swdist:0.29.0
Image: containers.fortanix.com:5000/swdist:0.29.0
Image: containers.fortanix.com:5000/swdist:0.29.0
Image: containers.fortanix.com:5000/swdist:0.29.0
Run the following command to check the replicas of the coredns deployment:
sudo KUBECONFIG=/etc/kubernetes/admin.conf kubectl get pods -nkube-system -owide | grep coredns
The following is the sample output of the above command:
coredns-5bdcd56d4b-6t7g2 1/1 Running 0 32m 10.244.0.61 dsm-test-1 <none> <none>
NOTE
Ensure that the number of coredns replicas is equal to the number of nodes in the cluster.
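To compare the two counts quickly, the following sketch can be used (it assumes the coredns pods carry the standard k8s-app=kube-dns label); both commands should print the same number:
sudo KUBECONFIG=/etc/kubernetes/admin.conf kubectl get pods -nkube-system -l k8s-app=kube-dns --no-headers | wc -l
sudo KUBECONFIG=/etc/kubernetes/admin.conf kubectl get nodes --no-headers | wc -l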
Run the following command to check the versions of flannel and flannel-plugin:
sudo KUBECONFIG=/etc/kubernetes/admin.conf kubectl get ds kube-flannel-ds -n kube-system -o yaml | grep image:
The following is the sample output of the above command:
image: containers.fortanix.com:5000/flannel:v0.25.6
image: containers.fortanix.com:5000/flannel-cni-plugin:v1.5.1-flannel2
image: containers.fortanix.com:5000/flannel:v0.25.6
NOTE
Ensure that the flannel version is 0.25.6 and the flannel plugin version is 1.5.1-flannel2.
4.3 Check cert-manager Configuration
Run the following command to check all the resources of cert-manager:
sudo KUBECONFIG=/etc/kubernetes/admin.conf kubectl get all -n cert-manager
The following is the sample output of the above command:
NAME READY STATUS RESTARTS AGE
pod/cert-manager-csi-driver-9lvw2 3/3 Running 4 (14h ago) 15h
pod/certmanager-cert-manager-5fd9f859bb-7slz2 1/1 Running 0 14h
pod/certmanager-cert-manager-cainjector-5998546469-pk9kb 1/1 Running 0 14h
pod/certmanager-cert-manager-webhook-878f95fb5-699lp 1/1 Running 0 14h
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
service/certmanager-cert-manager ClusterIP 10.245.213.126 <none> 9402/TCP 15h
service/certmanager-cert-manager-webhook ClusterIP 10.245.20.237 <none> 443/TCP 15h
NAME DESIRED CURRENT READY UP-TO-DATE AVAILABLE NODE SELECTOR AGE
daemonset.apps/cert-manager-csi-driver 1 1 1 1 1 <none> 15h
NAME READY UP-TO-DATE AVAILABLE AGE
deployment.apps/certmanager-cert-manager 1/1 1 1 15h
deployment.apps/certmanager-cert-manager-cainjector 1/1 1 1 15h
deployment.apps/certmanager-cert-manager-webhook 1/1 1 1 15h
NAME DESIRED CURRENT READY AGE
replicaset.apps/certmanager-cert-manager-5fd9f859bb 1 1 1 14h
replicaset.apps/certmanager-cert-manager-6c6bdd85d9 0 0 0 15h
replicaset.apps/certmanager-cert-manager-cainjector-5998546469 1 1 1 14h
replicaset.apps/certmanager-cert-manager-cainjector-7b7cbc6988 0 0 0 15h
replicaset.apps/certmanager-cert-manager-webhook-555cbb78cd 0 0 0 15h
replicaset.apps/certmanager-cert-manager-webhook-878f95fb5 1 1 1 14h
5.0 Troubleshooting
In case the kubelet client certificates expire (/var/lib/kubelet/pki/kubelet-client.crt) and there is no /var/lib/kubelet/pki/kubelet-client-current.pem file present, you can recreate the certificates using the following commands:
TEMP_DIR=/etc/kubernetes/tmp
mkdir -p $TEMP_DIR
BACKUP_PEM="/var/lib/kubelet/pki/kubelet-client-current.pem"
KEY="/var/lib/kubelet/pki/kubelet-client.key"
CERT="/var/lib/kubelet/pki/kubelet-client.crt"
# Path to the kubelet kubeconfig (standard kubeadm location; adjust if your cluster differs)
KUBELET_CONF=/etc/kubernetes/kubelet.conf
echo "Stopping kubelet service"
systemctl stop kubelet
echo "Creating a new key and cert file for kubelet auth"
nodename=$(echo "$HOSTNAME" | awk '{print tolower($0)}')
openssl req -out $TEMP_DIR/tmp.csr -new -newkey rsa:2048 -nodes -keyout $TEMP_DIR/tmp.key -subj "/O=system:nodes/CN=system:node:$nodename"
cat > $TEMP_DIR/kubelet-client.ext << HERE
keyUsage = critical,digitalSignature,keyEncipherment
extendedKeyUsage = clientAuth
HERE
echo "Signing the generated csr with kubernetes CA"
openssl x509 -req -days 365 -in $TEMP_DIR/tmp.csr -CA /etc/kubernetes/pki/ca.crt -CAkey /etc/kubernetes/pki/ca.key -CAcreateserial -out $TEMP_DIR/tmp.crt -sha256 -extfile $TEMP_DIR/kubelet-client.ext
cp $TEMP_DIR/tmp.crt $CERT
cp $TEMP_DIR/tmp.key $KEY
chmod 644 $CERT
chmod 600 $KEY
if grep -q "client-certificate-data" $KUBELET_CONF; then
  echo "Updating file $KUBELET_CONF to add reference to restored certificates"
  sed -i "s|\(client-certificate-data:\s*\).*\$|client-certificate: $CERT|" $KUBELET_CONF
  sed -i "s|\(client-key-data:\s*\).*\$|client-key: $KEY|" $KUBELET_CONF
fi
echo "Starting kubelet service"
systemctl start kubelet
An upgrade on a two-node cluster can fail due to etcd quorum failure. In such a scenario, if the pods are healthy, you can re-run the deploy job manually using the following command. This will eventually upgrade the cluster to 1.30.5.
sudo sdkms-cluster deploy --stage DEPLOY --version <version>
WARNING
Two-node upgrades are not recommended.