Fortanix Data Security Manager (Release 4.15) Kubernetes Version Upgrade to 1.21

Introduction

The purpose of this guide is to describe the steps to upgrade Kubernetes from version 1.19.16 to 1.21.14 for Fortanix DSM release 4.15.

Overview

The Fortanix DSM 4.15 release will upgrade the system from Kubernetes version 1.19 to 1.21.

Subsequent Kubernetes upgrades may be delivered as part of regular Fortanix DSM releases or continue to be shipped as independent upgrades.

After upgrading Fortanix DSM to the 4.15 version, you will not be able to downgrade to previous releases. The Fortanix DSM UI will not allow a downgrade after 4.15 is installed. Please work with Fortanix Support to ensure you have a valid backup that can be used to perform a manual recovery.

Also, you will need to upgrade Fortanix DSM to 4.15 before moving to any future release.

Prerequisites

Before upgrading Kubernetes, perform the following checks:

  • Run the following command to check that more than 15 GB of disk space is available in /var and in the root directory (/):

    root@us-west-eqsv2-13:~# df -h /var/ /

    The following is the output:

    Filesystem            Size  Used Avail Use% Mounted on
    /dev/mapper/main-var   46G   28G   16G  64% /var
    /dev/sda2              46G   21G   24G  47% /

    If less than 15 GB is available, delete the oldest version of Fortanix DSM from the UI, then re-run the command to confirm that space has been freed:

    $ df -h /var/ /
    Filesystem            Size  Used Avail Use% Mounted on
    /dev/mapper/main-var   47G   26G   21G  56% /var
    /dev/sda2              47G   13G   33G  28% /    
          
  • Verify the following keys in the kube-apiserver.yaml file on each node and ensure that the assigned IP address is the same as the host IP.

    • kubeadm.kubernetes.io/kube-apiserver.advertise-address.endpoint

    • advertise-address

    • startupProbe IP

    • readinessProbe IP

    • livenessProbe IP

      In case of any mismatch, edit the YAML file to replace the assigned IP address with the host IP. A quick command to list these fields on a node is shown after the YAML excerpts below.

      The following lines are a reference from the /etc/kubernetes/manifests/kube-apiserver.yaml file:

    • Annotation:

      annotations:
        kubeadm.kubernetes.io/kube-apiserver.advertise-address.endpoint: 172.31.1.166:6443
    • Advertise-address:

      spec:
        containers:
        - command:
          - kube-apiserver
          - --advertise-address=172.31.1.166
    • livenessProbe:

      livenessProbe:
           failureThreshold: 8 
           httpGet: 
             host: 172.31.1.166
    • ReadinessProbe:

      readinessProbe:
           failureThreshold: 3
           httpGet:
             host: 172.31.1.166
    • startupProbe:

      startupProbe:
           failureThreshold: 24
           httpGet:
             host: 172.31.1.166
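
    To quickly compare these fields against the host IP on a node, the advertise address and probe hosts can be listed together. This is a minimal sketch that assumes the default manifest path shown above:

    # List the advertise address and the probe host fields from the manifest
    grep -nE 'advertise-address|host:' /etc/kubernetes/manifests/kube-apiserver.yaml
    # Print the host IP addresses to compare against the values above
    hostname -I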
  • Run the following command to check that all software versions are available on all the endpoints:

    root@us-west-eqsv2-13:~# kubectl  get ep -n swdist

    The following is the output:

    NAME      ENDPOINTS                                         AGE
    swdist    10.244.0.212:22,10.244.1.191:22,10.244.2.152:22   242d
    v2649     10.244.0.212:22,10.244.1.191:22,10.244.2.152:22   4d
    v2657     10.244.0.212:22,10.244.1.191:22,10.244.2.152:22   2d
  • Run the following command to check the status of the docker-registry service:

    systemctl status docker-registry

    Ensure that the status is active and running before and after the software is uploaded.

  • Run the following command on each node to verify that the overlay mount options match the following:

    cat /etc/systemd/system/var-opt-fortanix-swdist_overlay.mount.d/options.conf
    [Mount]
    Options=lowerdir=/var/opt/fortanix/swdist/data/vXXXX/registry:/var/opt/fortanix/swdist/data/vYYYY/registry

    Here, ‘vXXXX’ is the previous version and ‘vYYYY’ is the upgraded version.

  • Ensure that the latest backup has been triggered and verify that it completed successfully (check the backup size, and so on).

  • All nodes must report as healthy and be running Kubernetes version 1.19.16, Docker version 18.6, and kernel 5.8. Run the following command to list the nodes and their IP addresses:

    kubectl get nodes -o wide

    Look for the version number under the VERSION column; it must be v1.19.16 for each of the nodes.

    NAME STATUS ROLES AGE VERSION EXTERNAL-IP OS-IMAGE KERNEL-VERSION CONTAINER-RUNTIME
    ali1 Ready master 2d v1.19.16  Ubuntu 20.04.3 LTS 5.8.0-50-generic docker://19.3.11
    nuc3 Ready master 3d v1.19.16  Ubuntu 20.04.3 LTS 5.8.0-50-generic docker://19.3.11
  • All pods must be healthy in the default, swdist, and kube-system namespaces, as shown in the example check below.
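
    The following is one way to confirm this, assuming kubectl access on a master node; any pod that is not Running or Completed should be investigated before upgrading:

    kubectl get pods -n default
    kubectl get pods -n swdist
    kubectl get pods -n kube-system
    # Alternatively, list any pod that is not Running or Completed across all namespaces
    kubectl get pods -A | grep -vE 'Running|Completed'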

  • Run the following command to check kubeadm configuration on the cluster:

    kubectl get configmap kubeadm-config -oyaml -nkube-system

    This should return the following values for parameters in the master configuration:

    • kubernetesVersion: v1.19.16

    • imageRepository: containers.fortanix.com:5000

  • Run the following command to check the status of etcd and confirm that isLeader=true is assigned to one of the etcd nodes.

    • etcd should be TLS migrated.
      Run the following command to list the etcd members; peerURLs should have both ports listed, 2380 (http) and 2382 (https):

      sudo KUBECONFIG=/etc/kubernetes/admin.conf kubectl exec <etcd-pod-name> -nkube-system -- etcdctl --cacert /etc/kubernetes/pki/etcd/ca.crt --cert /etc/kubernetes/pki/etcd/healthcheck-client.crt --key /etc/kubernetes/pki/etcd/healthcheck-client.key --endpoints https://127.0.0.1:2379 member list

      The following is the sample output of the above command:

      6eac4cd6e44f7cb0: name=srv1-sitlab-dc peerURLs=https://10.4.65.11:2382 clientURLs=https://10.4.65.11:2379 isLeader=true
      e6214c803ea4e0c6: name=nuc3 peerURLs=http://10.197.192.12:2380,https://10.197.192.12:2382 clientURLs=http://10.197.192.12:2379 isLeader=true
  • Run the following command to ensure that the version of etcd on each of the etcd pods is 3.4.13-0:

    sudo KUBECONFIG=/etc/kubernetes/admin.conf kubectl exec <etcd-pod-name> -nkube-system -- etcd --version
  • Run the following command to check the health of the etcd cluster and ensure that each endpoint reports as healthy:

    root@ip-172-31-0-188:/home/administrator# sudo KUBECONFIG=/etc/kubernetes/admin.conf kubectl exec etcd-ip-172-31-0-188 -nkube-system -- etcdctl --cacert /etc/kubernetes/pki/etcd/ca.crt --cert /etc/kubernetes/pki/etcd/healthcheck-client.crt --key /etc/kubernetes/pki/etcd/healthcheck-client.key --endpoints https://127.0.0.1:2379 endpoint health

    The following is the desired output:

    Defaulted container "etcd" out of: etcd, etcd-wait (init)
    https://127.0.0.1:2379 is healthy: successfully committed proposal: took = 7.526257ms
    root@ip-172-31-0-188:/home/administrator#
  • On each node, navigate to the /etc/kubernetes/manifests directory and check the image versions of all Kubernetes control plane components in the following manifest files:

    etcd.yaml kube-apiserver.yaml kube-controller-manager.yaml kube-scheduler.yaml

    The following is the desired output:

    root@ip-172-31-0-231:/etc/kubernetes/manifests# cat etcd.yaml | grep "image:"
    image: containers.fortanix.com:5000/etcd:3.4.13-0
    image: containers.fortanix.com:5000/etcd:3.4.13-0
    root@ip-172-31-0-231:/etc/kubernetes/manifests# cat kube-apiserver.yaml | grep "image:"
    image: containers.fortanix.com:5000/kube-apiserver:v1.19.16
    root@ip-172-31-0-231:/etc/kubernetes/manifests# cat kube-controller-manager.yaml | grep "image:"
    image: containers.fortanix.com:5000/kube-controller-manager:v1.19.16
    root@ip-172-31-0-231:/etc/kubernetes/manifests# cat kube-scheduler.yaml | grep "image:"
    image: containers.fortanix.com:5000/kube-scheduler:v1.19.16
    root@ip-172-31-0-231:/etc/kubernetes/manifests#
  • Perform the following steps to check the expiry of the Kubernetes certificates.

    1. Check the certificates under the /etc/kubernetes/pki and /etc/kubernetes/pki/etcd directories. A minimal openssl check is shown after these steps.

    2. Run the following command to renew the expired certificates:

      /opt/fortanix/sdkms/bin/renew-k8s-certs.sh
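
    For example, the following is a minimal openssl-based check that prints the expiry date of each certificate under the default kubeadm directories listed in step 1:

      for cert in /etc/kubernetes/pki/*.crt /etc/kubernetes/pki/etcd/*.crt; do
        echo -n "$cert: "
        openssl x509 -noout -enddate -in "$cert"
      done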
  • Run the following commands on each node to check the status of the kubelet, docker, and docker-registry services:

    systemctl status docker
    systemctl status kubelet
    systemctl status docker-registry

    NOTE

    Ensure that the status of each service is active and running. A quick combined check is shown after this note.
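
    As a convenience, the three services can be checked together with a short loop such as the following; it supplements, but does not replace, the individual status commands above:

    for svc in docker kubelet docker-registry; do
      echo -n "$svc: "
      systemctl is-active "$svc"
    done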

Upgrading Kubernetes from 1.19 to 1.21

Ensure that you read the ‘Prerequisites’ section before upgrading.

Post Upgrade Procedure

The following are the post-upgrade checks to perform.

  • Run the following command to check the status of the deploy job:

    # pod status
    $ sudo KUBECONFIG=/etc/kubernetes/admin.conf kubectl get pods | grep deploy
    deploy-rrv8v 0/1 Completed 0 18d
    
    # job status
    $ sudo KUBECONFIG=/etc/kubernetes/admin.conf kubectl get job deploy
    NAME COMPLETIONS DURATION AGE
    deploy 1/1 4h54m 18d

    NOTE

    Ensure that the status of the pod is Completed.

  • Run the following command to check if the Kubernetes version is upgraded to v1.21.14 (including the packages kubeadm, kubectl, kubelet):

    $ dpkg -l | grep kube
    ii kubeadm 1.21.14-00fortanix amd64 Kubernetes Cluster Bootstrapping Tool
    ii kubectl 1.21.14-00 amd64 Kubernetes Command Line Tool
    ii kubelet 1.21.14-00 amd64 Kubernetes Node Agent
    ii kubernetes-cni 0.8.7-00 amd64 Kubernetes CNI
  • Run the following command to check that the image tag for the swdist container is updated to 0.18.0:

    $ sudo -E kubectl describe ds swdist -nswdist | grep Image
    Image: containers.fortanix.com:5000/swdist:0.18.0
    Image: containers.fortanix.com:5000/swdist:0.18.0
    Image: containers.fortanix.com:5000/swdist:0.18.0
    Image: containers.fortanix.com:5000/swdist:0.18.0
    Image: containers.fortanix.com:5000/swdist:0.18.0
    Image: containers.fortanix.com:5000/swdist:0.18.0
    Image: containers.fortanix.com:5000/swdist:0.18.0
    Image: containers.fortanix.com:5000/swdist:0.18.0
  • If you are using DC Labeling (refer to the Fortanix Data Security Manager Data Center Labeling guide), run the following command to verify that the zone label has been added to the node's YAML:

    kubectl get node node_name -o yaml | grep -i 'zone'
  • Run the following command on each of the etcd pods in the cluster to check if the etcd version is upgraded to 3.4.13-0:

    $ sudo -E kubectl describe pod etcd-sdkms-server-1 -nkube-system | grep Image
    Image: containers.fortanix.com:5000/etcd:3.4.13-0
    Image ID: docker-pullable://containers.fortanix.com:5000/etcd@sha256:1d142ee20719afc2168b2caa3df0c573d6b51741b2f47ea29c5afafa1e3bbe41
    Image: containers.fortanix.com:5000/etcd:3.4.13-0
    Image ID: docker-pullable://containers.fortanix.com:5000/etcd@sha256:1d142ee20719afc2168b2caa3df0c573d6b51741b2f47ea29c5afafa1e3bbe41
  • Run the following command to check if kube-proxy is upgraded to image v1.21.14.3-dfc5441dc370bc:

    $ sudo -E kubectl describe ds kube-proxy -nkube-system | grep Image
    Image: containers.fortanix.com:5000/kube-proxy:v1.21.14.3-dfc5441dc370bc
  • Run the following command to check if kured pod is running with image version 1.8.1:

    $ sudo -E kubectl describe ds kured -nkube-system | grep Image
    Image: containers.fortanix.com:5000/kured:1.8.1 
  • Run the following commands on each of the nodes in the cluster to check if kube-apiserver, kube-controller-manager, and kube-scheduler are upgraded to 1.21.14:

    $ sudo cat /etc/kubernetes/manifests/kube-scheduler.yaml | grep "image:"
    image: containers.fortanix.com:5000/kube-scheduler:v1.21.14
    $ sudo cat /etc/kubernetes/manifests/kube-controller-manager.yaml | grep "image:"
    image: containers.fortanix.com:5000/kube-controller-manager:v1.21.14
    $ sudo cat /etc/kubernetes/manifests/kube-apiserver.yaml | grep "image:"
    image: containers.fortanix.com:5000/kube-apiserver:v1.21.14
  • Run the following command to check the status of the nodes and the k8s version:

    root@ip-172-31-1-89:/home/administrator# kubectl get nodes
    NAME STATUS ROLES AGE VERSION
    ip-172-31-0-56 Ready control-plane,master 19h v1.21.14
    ip-172-31-1-188 Ready control-plane,master 19h v1.21.14
    ip-172-31-1-89 Ready control-plane,master 20h v1.21.14
  • During the upgrade of Kubernetes, kube-dns is migrated to coredns. Run the following commands to check the coredns deployment or pods and their associated resources:

    root@ip-172-31-1-89:/home/administrator# kubectl get deployments -A | grep kube-dns
    kube-system kube-dns-autoscaler 1/1 1 1 20h
    root@ip-172-31-1-89:/home/administrator# kubectl get pods -A | grep kubedns
    root@ip-172-31-1-89:/home/administrator#
    root@ip-172-31-0-188:/home/administrator# kubectl get deployments -A | grep coredns
    kube-system coredns 3/3 3 3 6d16h
    root@ip-172-31-0-188:/home/administrator#
    root@ip-172-31-0-188:/home/administrator# kubectl get pods -A | grep coredns
    kube-system coredns-6d4d95746b-bndwr 1/1 Running 0 6d16h
    kube-system coredns-6d4d95746b-m88d9 1/1 Running 0 6d15h
    kube-system coredns-6d4d95746b-wsr79 1/1 Running 0 6d16h
    root@ip-172-31-0-188:/home/administrator#
    root@ip-172-31-0-188:/home/administrator# kubectl get clusterrole -A | grep coredns
    system:coredns 2022-11-21T14:51:38Z
    root@ip-172-31-0-188:/home/administrator# kubectl get clusterrolebinding -A | grep coredns
    system:coredns ClusterRole/system:coredns 6d16h
    root@ip-172-31-0-188:/home/administrator# kubectl get svc -A | grep kube-dns
    kube-system kube-dns ClusterIP 10.245.0.10  53/UDP,53/TCP,9153/TCP 6d16h
  • During the upgrade of Kubernetes, docker is migrated to containerd. You can check the containerd version available after the upgrade, and use the ctr or crictl CLI tools as a replacement for the docker CLI, as in the example after the output below.

    root@ip-172-31-0-188:/home/administrator# systemctl status docker
    Unit docker.service could not be found.
    root@ip-172-31-0-188:/home/administrator# systemctl status containerd | grep active
    Active: active (running) since Mon 2022-11-21 14:51:10 UTC; 6 days ago
    root@ip-172-31-0-188:/home/administrator#
    root@ip-172-31-1-89:/home/administrator# apt list --installed | grep containerd
    WARNING: apt does not have a stable CLI interface. Use with caution in scripts.
    containerd-config/focal,now 1.0 all [installed]
    containerd.io/focal,now 1.6.16-1 amd64 [installed]
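
    For example, the following crictl and ctr commands (assuming the default containerd socket and the k8s.io namespace) provide rough equivalents of the docker ps and docker images commands:

    # List running containers (replacement for 'docker ps')
    crictl ps
    # List container images known to the runtime (replacement for 'docker images')
    crictl images
    # List images in the k8s.io containerd namespace using ctr
    ctr -n k8s.io images list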
  • Run the following command to verify that the etcd peer communication port has moved from 2382 to 2380:

    $ sudo KUBECONFIG=/etc/kubernetes/admin.conf kubectl exec etcd-ip-172-31-0-56 -nkube-system -- etcdctl --cacert /etc/kubernetes/pki/etcd/ca.crt --cert /etc/kubernetes/pki/etcd/healthcheck-client.crt --key /etc/kubernetes/pki/etcd/healthcheck-client.key --endpoints https://127.0.0.1:2379 member list
    Defaulted container "etcd" out of: etcd, etcd-wait (init)
    cacc42c8d609b98, started, ip-172-31-0-56, https://172.31.0.56:2380, https://172.31.0.56:2379, false
    4a92930a7b7fdf0c, started, ip-172-31-1-188, https://172.31.1.188:2380, https://172.31.1.188:2379, false
    cf0a232fb92a9a5b, started, ip-172-31-1-89, https://172.31.1.89:2380, https://172.31.1.89:2379, false

Troubleshooting

  • In case the kubelet client certificate expires (/var/lib/kubelet/pki/kubelet-client.crt) and there is no /var/lib/kubelet/pki/kubelet-client-current.pem file present, you can recreate the certificate using the following script. A quick verification is shown after the script.

    TEMP_DIR=/etc/kubernetes/tmp
    mkdir -p $TEMP_DIR
    # Kubeconfig used by kubelet; it references the client certificate restored below
    KUBELET_CONF="/etc/kubernetes/kubelet.conf"
    BACKUP_PEM="/var/lib/kubelet/pki/kubelet-client-current.pem"
    KEY="/var/lib/kubelet/pki/kubelet-client.key"
    CERT="/var/lib/kubelet/pki/kubelet-client.crt"
    
    echo "Stopping kubelet service"
    systemctl stop kubelet
    
    echo "Creating a new key and cert file for kubelet auth"
    nodename=$(echo "$HOSTNAME" | awk '{print tolower($0)}')
    openssl req -out $TEMP_DIR/tmp.csr -new -newkey rsa:2048 -nodes -keyout $TEMP_DIR/tmp.key -subj "/O=system:nodes/CN=system:node:$nodename"
    cat > $TEMP_DIR/kubelet-client.ext << HERE
    keyUsage = critical,digitalSignature,keyEncipherment
    extendedKeyUsage = clientAuth
    HERE
    echo "Signing the generated csr with kubernetes CA"
    openssl x509 -req -days 365 -in $TEMP_DIR/tmp.csr -CA /etc/kubernetes/pki/ca.crt -CAkey /etc/kubernetes/pki/ca.key -CAcreateserial -out $TEMP_DIR/tmp.crt -sha256 -extfile $TEMP_DIR/kubelet-client.ext
    cp $TEMP_DIR/tmp.crt $CERT
    cp $TEMP_DIR/tmp.key $KEY
    
    chmod 644 $CERT
    chmod 600 $KEY
    
    if grep -q "client-certificate-data" $KUBELET_CONF; then
        echo "Updating file $KUBELET_CONF to add reference to restored certificates"
        sed -i "s|\(client-certificate-data:\s*\).*\$|client-certificate: $CERT|" $KUBELET_CONF
        sed -i "s|\(client-key-data:\s*\).*\$|client-key: $KEY|" $KUBELET_CONF
    fi
    
    echo "Starting kubelet service"
    systemctl start kubelet
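
    After the script completes, the regenerated certificate and the kubelet can be verified. This is a minimal check that assumes the paths used in the script above:

    # Confirm the subject and the new expiry date of the regenerated certificate
    openssl x509 -noout -subject -enddate -in /var/lib/kubelet/pki/kubelet-client.crt
    # Confirm that kubelet restarted cleanly and that the node reports Ready
    systemctl is-active kubelet
    kubectl get nodes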
  • Upgrade on a 2-node cluster can fail due to etcd quorum failure. In such a scenario, if the pods are healthy, you can re-run the deploy job manually using the following command. This will eventually upgrade the cluster to 1.21.

    sdkms-cluster deploy --stage DEPLOY --version <version>

    WARNING

    2-node upgrades are not recommended.

  • When a cluster is upgraded from build 4.2.2087 to <4.3.xxxx> on a 3-node cluster, it is possible that the deploy job exits and is marked as completed before the cluster upgrade finishes. In such a scenario, if all the pods are healthy, you can deploy the version again.