Fortanix Data Security Manager (Release 4.34) Kubernetes Version Upgrade to 1.30 K8s

1.0 Introduction

This article describes the steps to upgrade Kubernetes from version 1.29.6 to 1.30.5 for Fortanix Data Security Manager (DSM) release 4.34.

2.0 Overview

The Fortanix DSM 4.34 release will upgrade the system from Kubernetes version 1.29 to 1.30.

Subsequent Kubernetes upgrades will be released as part of regular upgrades or could continue to be independent upgrades.

After upgrading Fortanix DSM to the 4.34 version, you will not be able to downgrade to previous releases. The Fortanix DSM UI will not allow a downgrade after 4.34 is installed. Kindly work with Fortanix Support to ensure you have a valid backup that can be used to perform a manual recovery.

Also, you will need to upgrade Fortanix DSM to 4.34 before moving to any future release.

3.0 Pre-Upgrade Checks

Before upgrading Kubernetes, ensure the following:

3.1 Check and Manage Disk Space

Ensure that more than 15 GB of disk space is available in the /var and root (/) directories. If less than 15 GB is available, delete the oldest version of Fortanix DSM from the user interface (UI). Run the following command to check the available disk space:

sudo df -h /var/ /

The following is the sample output:

Filesystem Size Used Avail Use% Mounted on
/dev/nvme0n1p1 993G 29G 964G 3% /
/dev/nvme0n1p1 993G 29G 964G 3% /
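
The 15 GB threshold can also be checked in a script. The following is a minimal sketch and not part of the Fortanix tooling: the function name `low_space_mounts` is hypothetical, and it assumes the POSIX `df -Pk` output format, in which the fourth column is the available space in KB and the sixth column is the mount point.

```shell
# Hypothetical helper: reads `df -Pk` output on stdin and prints every
# mount point with less than the given number of GB available (default 15).
low_space_mounts() {
  min_gb="${1:-15}"
  # df -Pk reports sizes in 1024-byte blocks, so convert GB to KB.
  awk -v min_kb="$((min_gb * 1024 * 1024))" \
    'NR > 1 && $4 + 0 < min_kb + 0 { print $6 }'
}
```

For example, `df -Pk /var / | low_space_mounts 15` prints nothing when both directories have enough free space.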

3.2 Check Software Versions in Endpoints

Run the following command to check if all software versions are available in all the endpoints:

sudo KUBECONFIG=/etc/kubernetes/admin.conf kubectl get ep -n swdist

The following is the sample output:

NAME      ENDPOINTS                                         AGE
swdist    10.244.0.212:22,10.244.1.191:22,10.244.2.152:22   242d
v2649     10.244.0.212:22,10.244.1.191:22,10.244.2.152:22   4d
v2657     10.244.0.212:22,10.244.1.191:22,10.244.2.152:22   2d
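
The comparison of endpoint lists can be scripted. The sketch below is an illustration only: the function name `mismatched_endpoints` is hypothetical, and it assumes that a version counts as available everywhere when its ENDPOINTS column is identical to that of the swdist entry. kubectl may abbreviate long endpoint lists, so treat this as a quick check rather than a definitive one.

```shell
# Hypothetical helper: reads `kubectl get ep -n swdist` output on stdin and
# prints each entry whose ENDPOINTS column differs from the swdist entry.
# Assumption: identical endpoint lists mean the version is on every node.
mismatched_endpoints() {
  awk '
    $1 == "NAME" { next }                 # skip the header row
    { n++; name[n] = $1; eps[n] = $2 }    # remember every entry
    $1 == "swdist" { ref = $2 }           # reference endpoint list
    END {
      for (i = 1; i <= n; i++)
        if (eps[i] != ref) print name[i]
    }'
}
```

For example, `sudo KUBECONFIG=/etc/kubernetes/admin.conf kubectl get ep -n swdist | mismatched_endpoints` prints nothing when every version lists the same endpoints as swdist.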

3.3 Check Cluster and Node Health

Run the following command on each node and ensure that the overlay mount options match the sample output below:

sudo cat /etc/systemd/system/var-opt-fortanix-swdist_overlay.mount.d/options.conf

[Mount]
Options=lowerdir=/var/opt/fortanix/swdist/data/vXXXX/registry:/var/opt/fortanix/swdist/data/vYYYY/registry

Here, ‘vXXXX’ is the previous version and ‘vYYYY’ is the upgraded version.

Ensure that the latest backup is triggered and verify it is successful (size and other metrics).

All nodes must report as healthy and be running Kubernetes version 1.29.6 and kernel 5.4.0-190-generic. Run the following command to get the nodes and list the IP:

sudo KUBECONFIG=/etc/kubernetes/admin.conf kubectl get nodes -o wide

Look for the version number under the column VERSION and it must be v1.29.6 for each of the nodes.

NAME              STATUS   ROLES           AGE     VERSION   INTERNAL-IP    EXTERNAL-IP   OS-IMAGE             KERNEL-VERSION      CONTAINER-RUNTIME
ip-172-31-0-189   Ready    control-plane   3h44m   v1.29.6   172.31.0.189   <none>        Ubuntu 20.04.6 LTS   5.4.0-190-generic   containerd://1.7.12
ip-172-31-1-110   Ready    control-plane   3h37m   v1.29.6   172.31.1.110   <none>        Ubuntu 20.04.6 LTS   5.4.0-190-generic   containerd://1.7.12
ip-172-31-2-217   Ready    control-plane   3h33m   v1.29.6   172.31.2.217   <none>        Ubuntu 20.04.6 LTS   5.4.0-190-generic   containerd://1.7.12
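
The STATUS and VERSION columns can be checked without reading the table by eye. This is a minimal sketch under stated assumptions: the function name `nodes_not_at_version` is hypothetical, and it relies on STATUS being the second column and VERSION the fifth, as in the output above.

```shell
# Hypothetical helper: reads `kubectl get nodes --no-headers` output on stdin
# and prints each node that is not Ready or not at the expected version.
nodes_not_at_version() {
  want="$1"
  awk -v want="$want" '$2 != "Ready" || $5 != want { print $1 }'
}
```

For example, `sudo KUBECONFIG=/etc/kubernetes/admin.conf kubectl get nodes --no-headers | nodes_not_at_version v1.29.6` prints nothing when every node is Ready and running v1.29.6.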

Ensure that all pods are healthy in the default, swdist, and kube-system namespaces.
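
One way to check pod health is to filter the pod listing of each namespace. The sketch below is illustrative: the function name `unhealthy_pods` is hypothetical, and it assumes that a pod is healthy when it is either Completed or Running with all of its containers ready.

```shell
# Hypothetical helper: reads `kubectl get pods --no-headers` output on stdin
# and prints pods that are neither Completed nor Running with all
# containers ready (READY column of the form n/n).
unhealthy_pods() {
  awk '
    $3 == "Completed" { next }
    {
      split($2, ready, "/")
      if ($3 != "Running" || ready[1] != ready[2]) print $1
    }'
}
```

For example, run `sudo KUBECONFIG=/etc/kubernetes/admin.conf kubectl get pods -n kube-system --no-headers | unhealthy_pods`, and repeat for the default and swdist namespaces. No output means no unhealthy pods were found.
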
Run the following command to check kubeadm configuration on the cluster:
```
sudo KUBECONFIG=/etc/kubernetes/admin.conf kubectl get configmap kubeadm-config -oyaml -nkube-system
```
This should return the following values for parameters in the master configuration:
- kubernetesVersion: v1.29.6
- imageRepository: http://containers.fortanix.com:5000/

3.4 Check Etcd Cluster and Component

Run the following commands to check the status of etcd and verify that isLeader=true is assigned to one of the etcd nodes.
- etcd should be TLS migrated.

Run the following command to generate the list of etcd members:

sudo KUBECONFIG=/etc/kubernetes/admin.conf kubectl exec <etcd-pod-name-fromanynode> -nkube-system -- etcdctl --cacert /etc/kubernetes/pki/etcd/ca.crt --cert /etc/kubernetes/pki/etcd/healthcheck-client.crt --key /etc/kubernetes/pki/etcd/healthcheck-client.key --endpoints https://127.0.0.1:2379 endpoint status

The following is the sample output of the above command:

Defaulted container "etcd" out of: etcd, etcd-wait (init)
bf2dc0512cac45c3, started, dev-test-3, https://10.197.192.251:2380, https://10.197.192.251:2379, false

Run the following command to ensure that the version of etcd on each of the etcd pods is 3.5.12:

sudo KUBECONFIG=/etc/kubernetes/admin.conf kubectl exec <etcd-pod-name-fromanynode> -nkube-system -- etcd --version

The following is the sample output of the above command:

Defaulted container "etcd" out of: etcd, etcd-wait (init)
etcd Version: 3.5.12
Git SHA: e7b3bb6cc
Go Version: go1.20.13
Go OS/Arch: linux/amd64

Run the following command to check the health of the etcd cluster and ensure that the cluster reports as healthy:

sudo KUBECONFIG=/etc/kubernetes/admin.conf kubectl exec <etcd-pod-name-fromanynode> -nkube-system -- etcdctl --cacert /etc/kubernetes/pki/etcd/ca.crt --cert /etc/kubernetes/pki/etcd/healthcheck-client.crt --key /etc/kubernetes/pki/etcd/healthcheck-client.key --endpoints https://127.0.0.1:2379 endpoint health

The following is the sample output of the above command:

Defaulted container "etcd" out of: etcd, etcd-wait (init)
https://127.0.0.1:2379 is healthy: successfully committed proposal: took = 5.953722ms
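
The health output can be filtered so that only problem endpoints remain. This is a minimal sketch: the function name `unhealthy_etcd_endpoints` is hypothetical, and it assumes that each healthy endpoint is reported on one line in the form shown in the sample output above.

```shell
# Hypothetical helper: reads `etcdctl endpoint health` output on stdin and
# prints every endpoint line that is not reported as healthy.
unhealthy_etcd_endpoints() {
  awk '/^https?:\/\// && !/ is healthy:/ { print $1 }'
}
```

Pipe the output of the `endpoint health` command into `unhealthy_etcd_endpoints`. No output means every listed endpoint reported healthy.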

On each node, run the following command to check the image versions for all Kubernetes control-plane components defined in the /etc/kubernetes/manifests directory:
```
sudo grep -i "image:" /etc/kubernetes/manifests/*.yaml
```
Perform the following steps to check the expiry of the Kubernetes certificates.
1. Run the following commands to check the expiry of the certificates under /etc/kubernetes/pki and /etc/kubernetes/pki/etcd directories:
```
sudo find /etc/kubernetes/pki/ -name '*.crt' -exec openssl x509 -noout -dates -in {} \; | grep notAfter
sudo find /etc/kubernetes/pki/etcd -name '*.crt' -exec openssl x509 -noout -dates -in {} \; | grep notAfter
```
2. Run the following command to renew the expired certificates:
```
sudo /opt/fortanix/sdkms/bin/renew-k8s-certs.sh
```
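
To flag certificates that are close to expiry instead of reading each notAfter date, `openssl x509 -checkend` can be used. The following is a sketch under assumptions: the function name `expiring_certs` is hypothetical and the 30-day default window is an illustrative choice, not a Fortanix recommendation.

```shell
# Hypothetical helper: prints every *.crt file under the given directory
# that expires within the given number of days (default 30).
# Files that openssl cannot parse are printed as well.
expiring_certs() {
  dir="$1"
  days="${2:-30}"
  find "$dir" -name '*.crt' | while IFS= read -r crt; do
    # -checkend exits non-zero if the certificate expires within N seconds.
    if ! openssl x509 -checkend "$((days * 86400))" -noout -in "$crt" >/dev/null 2>&1; then
      echo "$crt"
    fi
  done
}
```

For example, `sudo sh -c '. ./certs.sh; expiring_certs /etc/kubernetes/pki 30'` lists the certificates to renew, where `certs.sh` is a hypothetical file that contains the function.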
Run the following commands on each node to check the status of the containerd, kubelet, and docker-registry services:
```
sudo systemctl status containerd
sudo systemctl status kubelet
sudo systemctl status docker-registry
```
NOTE
Ensure that the status of each service is active (running).
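
The same check can be scripted with `systemctl is-active`. This is a minimal sketch; the function name `inactive_services` is hypothetical.

```shell
# Hypothetical helper: prints each of the given systemd units that is
# not currently active.
inactive_services() {
  for svc in "$@"; do
    if ! systemctl is-active --quiet "$svc" 2>/dev/null; then
      echo "$svc"
    fi
  done
}
```

For example, `inactive_services containerd kubelet docker-registry` prints nothing when all three services are active.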

4.0 Post-Upgrade Checks

Ensure that the checks in Section 3.0: Pre-Upgrade Checks were completed before upgrading Kubernetes. After the upgrade, perform the following checks:

4.1 Check Node and Deployment Status

Run the following command to check the status of the deploy job:
```
sudo KUBECONFIG=/etc/kubernetes/admin.conf kubectl get pods | grep deploy
```
The following is the sample output of the above command:
```
deploy-vqq7r     0/1     Completed   0    32m
```
NOTE
Ensure that the status of the pod is Completed.
Run the following command to get the details of the deploy job:
```
sudo KUBECONFIG=/etc/kubernetes/admin.conf kubectl get job deploy
```
The following is the sample output of the above command:
```
NAME     COMPLETIONS   DURATION   AGE
deploy   1/1           20m        41m
```
NOTE
Verify the completion and duration of the job.
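
The COMPLETIONS column can be evaluated in a script. The sketch below is illustrative: the function name `job_completed` is hypothetical, and it assumes that the job is done when both numbers in the COMPLETIONS column are equal (for example 1/1).

```shell
# Hypothetical helper: reads one `kubectl get job --no-headers` line on
# stdin and succeeds only when the COMPLETIONS column shows n/n.
job_completed() {
  awk '
    { split($2, c, "/"); ok = (c[1] != "" && c[1] == c[2]) }
    END { exit (ok ? 0 : 1) }'
}
```

For example, `sudo KUBECONFIG=/etc/kubernetes/admin.conf kubectl get job deploy --no-headers | job_completed && echo "deploy job completed"`.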
If you are using DC Labeling, run the following command to verify that the zone label is present in the YAML of the node:
```
sudo KUBECONFIG=/etc/kubernetes/admin.conf kubectl get node <node-name> -o yaml | grep -i 'zone'
```

Run the following command to check the status, Kubernetes version, and role of the nodes. The role must be control-plane:

sudo KUBECONFIG=/etc/kubernetes/admin.conf kubectl get nodes -o wide

The following is the sample output of the above command:

NAME         STATUS   ROLES           AGE   VERSION   INTERNAL-IP      EXTERNAL-IP   OS-IMAGE             KERNEL-VERSION      CONTAINER-RUNTIME
dsm-test-1   Ready    control-plane   62m   v1.30.5   10.197.192.252   <none>        Ubuntu 20.04.6 LTS   5.4.0-196-generic   containerd://1.7.12

NOTE
Ensure the following:
- STATUS of the nodes is Ready
- VERSION column reflects v1.30.5
- ROLES column reflects control-plane
- KERNEL-VERSION reflects 5.4.0-196-generic

4.2 Check Kubernetes and Component Version

Run the following command to generate the list of etcd members:

sudo KUBECONFIG=/etc/kubernetes/admin.conf kubectl exec etcd-dsm-test-1 -nkube-system -- etcdctl --cacert /etc/kubernetes/pki/etcd/ca.crt --cert /etc/kubernetes/pki/etcd/healthcheck-client.crt --key /etc/kubernetes/pki/etcd/healthcheck-client.key --endpoints https://127.0.0.1:2379 member list

The following is the sample output of the above command:

Defaulted container "etcd" out of: etcd, etcd-wait (init)
ff5eaaee755acae0, started, dsm-test-1, https://10.197.192.252:2380, https://10.197.192.252:2379, false

Run the following command to check if kube-proxy is upgraded to image v1.30.5-2-955034e555cfd2:

sudo KUBECONFIG=/etc/kubernetes/admin.conf kubectl describe ds kube-proxy -nkube-system | grep Image

The following is the sample output of the above command:

Image:      containers.fortanix.com:5000/kube-proxy:v1.30.5-2-955034e555cfd2

Run the following command to check if kured pod is running with image version 1.16.0:

sudo KUBECONFIG=/etc/kubernetes/admin.conf kubectl describe ds kured -nkube-system | grep Image

The following is the sample output of the above command:

Image: containers.fortanix.com:5000/kured:1.16.0

Run the following command on each of the nodes in the cluster to check if kube-apiserver, kube-controller-manager, and kube-scheduler are upgraded to 1.30.5:
```
sudo grep -i "image:" /etc/kubernetes/manifests/*.yaml
```
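
The grep output can be filtered to show only control-plane images that are not at the expected version. This is a sketch under assumptions: the function name `outdated_control_plane_images` is hypothetical, and it assumes that the image reference is the last field of each line and that the image tag starts with the Kubernetes version (as the kube-proxy tag v1.30.5-2-955034e555cfd2 does).

```shell
# Hypothetical helper: reads `grep -i "image:" /etc/kubernetes/manifests/*.yaml`
# output on stdin and prints kube-apiserver, kube-controller-manager, and
# kube-scheduler images whose tag does not start with the expected version.
outdated_control_plane_images() {
  want="$1"
  awk -v want="$want" '
    { img = $NF }                          # image reference is the last field
    img ~ /\/kube-(apiserver|controller-manager|scheduler):/ {
      tag = img
      sub(/^.*:/, "", tag)                 # keep only the text after the last colon
      if (index(tag, want) != 1) print img
    }'
}
```

For example, `sudo grep -i "image:" /etc/kubernetes/manifests/*.yaml | outdated_control_plane_images v1.30.5` prints nothing when all three components are upgraded.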

Run the following command to check the version of etcd:

sudo KUBECONFIG=/etc/kubernetes/admin.conf kubectl get pod etcd-ip-172-31-0-189 -n kube-system -o yaml | grep image:

The following is the sample output of the above command:

image: containers.fortanix.com:5000/etcd:3.5.15-0
image: containers.fortanix.com:5000/etcd:3.5.15-0
image: containers.fortanix.com:5000/etcd:3.5.15-0
image: containers.fortanix.com:5000/etcd:3.5.15-0

Run the following command to check the version of cert-manager helm chart:

sudo helm list -A

The following is the sample output of the above command:

NAME          NAMESPACE      REVISION   UPDATED                                   STATUS     CHART                             APP VERSION
certmanager   cert-manager   2          2024-07-10 11:57:19.980606498 +0000 UTC   deployed   cert-manager-v1.15.3              v1.15.3
csiplugin     cert-manager   2          2024-07-10 11:57:22.410910496 +0000 UTC   deployed   cert-manager-csi-driver-v0.10.1   v0.10.1

NOTE
Ensure that the cert-manager helm chart version is 1.15.3 and the csiplugin version is 0.10.1.

Run the following command to check if the Kubernetes version is upgraded to v1.30.5 (including kubeadm, kubectl, kubelet packages):

sudo dpkg -l | grep kube

The following is the sample output of the above command:

ii kubeadm 1.30.5-1.1fortanix amd64 Kubernetes Cluster Bootstrapping Tool
ii kubectl 1.30.5-1.1 amd64 Kubernetes Command Line Tool
ii kubelet 1.30.5-1.1 amd64 Kubernetes Node Agent
ii kubernetes-cni 1.2.0-00 amd64 Kubernetes CNI
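
The package versions can also be compared in a script. The sketch below is illustrative: the function name `packages_not_at_version` is hypothetical, and it expects one "package version" pair per line, which `dpkg-query` can produce with a custom output format.

```shell
# Hypothetical helper: reads "<package> <version>" lines on stdin and prints
# each package whose version does not start with the expected version.
packages_not_at_version() {
  want="$1"
  awk -v want="$want" 'index($2, want) != 1 { print $1 }'
}
```

For example, `dpkg-query -W -f='${Package} ${Version}\n' kubeadm kubectl kubelet | packages_not_at_version 1.30.5` prints nothing when all three packages are at 1.30.5.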

Run the following command to check if the swdist container image is updated to tag 0.29.0:

sudo KUBECONFIG=/etc/kubernetes/admin.conf kubectl describe ds swdist -nswdist | grep Image

The following is the sample output of the above command:

    Image:      containers.fortanix.com:5000/swdist:0.29.0
    Image:      containers.fortanix.com:5000/swdist:0.29.0
    Image:      containers.fortanix.com:5000/swdist:0.29.0
    Image:      containers.fortanix.com:5000/swdist:0.29.0
    Image:      containers.fortanix.com:5000/swdist:0.29.0
    Image:      containers.fortanix.com:5000/swdist:0.29.0
    Image:      containers.fortanix.com:5000/swdist:0.29.0
    Image:      containers.fortanix.com:5000/swdist:0.29.0

Run the following command to check the replicas of coredns deployment:

sudo KUBECONFIG=/etc/kubernetes/admin.conf kubectl get pods -nkube-system -owide | grep coredns

The following is the sample output of the above command:

coredns-5bdcd56d4b-6t7g2               1/1     Running   0             32m   10.244.0.61      dsm-test-1   <none>           <none>

NOTE
Ensure that the number of coredns replicas is equal to the number of nodes in the cluster.
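
The replica count can be obtained by counting the Running coredns pods and comparing the result with the number of nodes. This is a minimal sketch; the function name `running_coredns_count` is hypothetical.

```shell
# Hypothetical helper: reads `kubectl get pods -n kube-system --no-headers`
# output on stdin and prints the number of coredns pods in the Running state.
running_coredns_count() {
  awk '$1 ~ /^coredns-/ && $3 == "Running" { n++ } END { print n + 0 }'
}
```

Compare the printed number with the output of `sudo KUBECONFIG=/etc/kubernetes/admin.conf kubectl get nodes --no-headers | wc -l`.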

Run the following command to check the version of flannel and flannel-plugin:

sudo KUBECONFIG=/etc/kubernetes/admin.conf kubectl get ds kube-flannel-ds -n kube-system -o yaml | grep image:

The following is the sample output of the above command:

image: containers.fortanix.com:5000/flannel:v0.25.6
image: containers.fortanix.com:5000/flannel-cni-plugin:v1.5.1flannel2
image: containers.fortanix.com:5000/flannel:v0.25.6

NOTE
Ensure that the flannel version is 0.25.6 and flannel plugin version is 1.5.1flannel2.

4.3 Check cert-manager Configuration

Run the following command to check all the resources of cert-manager:

sudo KUBECONFIG=/etc/kubernetes/admin.conf kubectl get all -n cert-manager

The following is the sample output of the above command:

NAME                                                       READY   STATUS    RESTARTS      AGE
pod/cert-manager-csi-driver-9lvw2                          3/3     Running   4 (14h ago)   15h
pod/certmanager-cert-manager-5fd9f859bb-7slz2              1/1     Running   0             14h
pod/certmanager-cert-manager-cainjector-5998546469-pk9kb   1/1     Running   0             14h
pod/certmanager-cert-manager-webhook-878f95fb5-699lp       1/1     Running   0             14h

NAME                                       TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)    AGE
service/certmanager-cert-manager           ClusterIP   10.245.213.126   <none>        9402/TCP   15h
service/certmanager-cert-manager-webhook   ClusterIP   10.245.20.237    <none>        443/TCP    15h

NAME                                     DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR   AGE
daemonset.apps/cert-manager-csi-driver   1         1         1       1            1           <none>          15h

NAME                                                  READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/certmanager-cert-manager              1/1     1            1           15h
deployment.apps/certmanager-cert-manager-cainjector   1/1     1            1           15h
deployment.apps/certmanager-cert-manager-webhook      1/1     1            1           15h

NAME                                                             DESIRED   CURRENT   READY   AGE
replicaset.apps/certmanager-cert-manager-5fd9f859bb              1         1         1       14h
replicaset.apps/certmanager-cert-manager-6c6bdd85d9              0         0         0       15h
replicaset.apps/certmanager-cert-manager-cainjector-5998546469   1         1         1       14h
replicaset.apps/certmanager-cert-manager-cainjector-7b7cbc6988   0         0         0       15h
replicaset.apps/certmanager-cert-manager-webhook-555cbb78cd      0         0         0       15h
replicaset.apps/certmanager-cert-manager-webhook-878f95fb5       1         1         1       14h

5.0 Troubleshooting

If the kubelet client certificate (/var/lib/kubelet/pki/kubelet-client.crt) expires and the /var/lib/kubelet/pki/kubelet-client-current.pem file is not present, you can create the certificates using the following commands:

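# Run these commands as root.
TEMP_DIR=/etc/kubernetes/tmp
mkdir -p $TEMP_DIR
BACKUP_PEM="/var/lib/kubelet/pki/kubelet-client-current.pem"
KEY="/var/lib/kubelet/pki/kubelet-client.key"
CERT="/var/lib/kubelet/pki/kubelet-client.crt"
# Kubelet kubeconfig used further down; assumes the default kubeadm location.
KUBELET_CONF="/etc/kubernetes/kubelet.conf"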

echo "Stopping kubelet service"
systemctl stop kubelet

echo "Creating a new key and cert file for kubelet auth"
nodename=$(echo "$HOSTNAME" | awk '{print tolower($0)}')
openssl req -out $TEMP_DIR/tmp.csr -new -newkey rsa:2048 -nodes -keyout $TEMP_DIR/tmp.key -subj "/O=system:nodes/CN=system:node:$nodename"
cat > $TEMP_DIR/kubelet-client.ext << HERE
keyUsage = critical,digitalSignature,keyEncipherment
extendedKeyUsage = clientAuth
HERE
echo "Signing the generated csr with kubernetes CA"
openssl x509 -req -days 365 -in $TEMP_DIR/tmp.csr -CA /etc/kubernetes/pki/ca.crt -CAkey /etc/kubernetes/pki/ca.key -CAcreateserial -out $TEMP_DIR/tmp.crt -sha256 -extfile $TEMP_DIR/kubelet-client.ext
cp $TEMP_DIR/tmp.crt $CERT
cp $TEMP_DIR/tmp.key $KEY

chmod 644 $CERT
chmod 600 $KEY

if grep -q "client-certificate-data" $KUBELET_CONF; then
    echo "Updating file $KUBELET_CONF to add reference to restored certificates"
    sed -i "s|\(client-certificate-data:\s*\).*\$|client-certificate: $CERT|" $KUBELET_CONF
    sed -i "s|\(client-key-data:\s*\).*\$|client-key: $KEY|" $KUBELET_CONF
fi

echo "Starting kubelet service"
systemctl start kubelet

An upgrade on a two-node cluster can fail due to etcd quorum failure. In such a scenario, if the pods are healthy, you can re-run the deploy job manually using the following command. This will eventually upgrade the cluster to 1.30.5.
```
sudo sdkms-cluster deploy --stage DEPLOY --version <version>
```
WARNING
Upgrades of two-node clusters are not recommended.
