Introduction
This guide describes the steps to upgrade Kubernetes from version 1.10 to version 1.14 for Fortanix DSM release 4.3.
Overview
The Fortanix DSM 4.3 release upgrades the system from Kubernetes version 1.10 to version 1.14.
Subsequent Kubernetes upgrades will be released as part of regular upgrades or could continue to be independent upgrades.
After upgrading Fortanix DSM to the 4.3 version, you will not be able to downgrade to previous releases. The Fortanix DSM UI will not allow a downgrade after 4.3 is installed. Please work with Fortanix Support to ensure you have a valid backup that can be used to perform a manual recovery.
Also, you will need to upgrade Fortanix DSM to 4.3 before moving to any future release.
Prerequisites
The following are the prerequisites before upgrading:
Ensure that more than 15 GB of disk space is available in /var and the root directory (/) by executing the following command:
root@us-west-eqsv2-13:~# df -h /var/ /
The following is the output:
Filesystem            Size  Used  Avail  Use%  Mounted on
/dev/mapper/main-var  46G   28G   16G    64%   /var
/dev/sda2             46G   21G   24G    47%   /
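The disk space prerequisite above can also be checked mechanically. The following is a minimal sketch, not part of the Fortanix tooling: the function name low_space_mounts is hypothetical, and it assumes df is run with -Pk so that the available space is reported in 1 KB blocks in the fourth column.

```shell
#!/bin/sh
# Hypothetical helper: reads `df -Pk` output on stdin and prints every mount
# point whose available space is NOT more than the given threshold (in GB).
low_space_mounts() {
    min_gb="$1"
    # Column 4 is the available space in KB, column 6 is the mount point.
    awk -v min_kb="$((min_gb * 1024 * 1024))" 'NR > 1 && $4 <= min_kb { print $6 }'
}

# Usage on a node (not executed here):
#   df -Pk /var / | low_space_mounts 15
```

An empty result means both file systems have more than 15 GB available.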
Ensure that all software versions are available in all the endpoints by executing the following command:
root@us-west-eqsv2-13:~# kubectl get ep -n swdist
The following is the output:
NAME    ENDPOINTS                                         AGE
swdist  10.244.0.212:22,10.244.1.191:22,10.244.2.152:22   242d
v2649   10.244.0.212:22,10.244.1.191:22,10.244.2.152:22   4d
v2657   10.244.0.212:22,10.244.1.191:22,10.244.2.152:22   2d
Ensure that the Docker registry service is active and running (systemctl status docker-registry) before and after the software is uploaded. Also, ensure that the overlay mount options on each node match the following. The following is the command:
cat /etc/systemd/system/var-opt-fortanix-swdist_overlay.mount.d/options.conf
The following is the output:
[Mount]
Options=lowerdir=/var/opt/fortanix/swdist/data/vXXXX/registry:/var/opt/fortanix/swdist/data/vYYYY/registry
Here, 'vXXXX' is the previous version and 'vYYYY' is the upgraded version.
Ensure that the latest backup is triggered and verify that it is a successful backup (size and so on).
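For the overlay mount check in the prerequisites above, it can help to extract only the version directories from the Options line so they can be compared across nodes. The following is a minimal sketch under that assumption; the function name overlay_versions is hypothetical, and it relies on the -o option of grep, which GNU, BSD, and BusyBox grep provide.

```shell
#!/bin/sh
# Hypothetical helper: reads the options.conf content on stdin and prints the
# version directory names found in the lowerdir= option, one per line,
# in the order they appear.
overlay_versions() {
    grep '^Options=' \
        | grep -o '/data/v[^/]*/registry' \
        | sed 's|^/data/||; s|/registry$||'
}

# Usage on a node (not executed here):
#   overlay_versions < /etc/systemd/system/var-opt-fortanix-swdist_overlay.mount.d/options.conf
```

Running this on every node and comparing the results shows quickly whether one node still references a different pair of versions.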
Upgrading Kubernetes from 1.10 to 1.14
Ensure that you read the ‘Prerequisites’ section before upgrading.
Pre Checks
All nodes should report healthy and should be running Kubernetes version v1.10.13, docker version 17.3.1, and kernel 5.4.
Run the command kubectl get nodes and look for the version number under the column VERSION. It should show v1.10.13 for each of the nodes. The following is an example output:
NAME  STATUS  ROLES   AGE  VERSION   EXTERNAL-IP  OS-IMAGE            KERNEL-VERSION    CONTAINER-RUNTIME
ali1  Ready   master  2d   v1.10.13  <none>       Ubuntu 20.04.3 LTS  5.4.0-81-generic  docker://17.3.1
nuc3  Ready   master  3d   v1.10.13  <none>       Ubuntu 20.04.3 LTS  5.4.0-81-generic  docker://17.3.1
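On clusters with many nodes, the node pre-check above can be scripted instead of read by eye. The following is a minimal sketch; the function name nodes_failing_precheck is hypothetical, and it assumes the column order shown in the example output (STATUS in column 2, VERSION in column 5).

```shell
#!/bin/sh
# Hypothetical helper: reads `kubectl get nodes` output on stdin and prints
# the name of every node that is not exactly "Ready" or is not running the
# expected Kubernetes version.
nodes_failing_precheck() {
    want="$1"
    awk -v want="$want" 'NR > 1 && ($2 != "Ready" || $5 != want) { print $1 }'
}

# Usage on a node (not executed here):
#   kubectl get nodes | nodes_failing_precheck v1.10.13
```

An empty result means every node passed. A cordoned node (status Ready,SchedulingDisabled) is reported as well, which is usually what you want before an upgrade.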
All pods should be healthy in the default, swdist, and kube-system namespaces.
Check the etcd status.
etcd should be TLS migrated. The following command should output the list of etcd members, where peerURLs should list both ports, 2380 (http) and 2382 (https).
sudo KUBECONFIG=/etc/kubernetes/admin.conf kubectl exec <etcd pod name> -nkube-system -- etcdctl --ca-file /etc/kubernetes/pki/etcd/ca.crt --cert-file /etc/kubernetes/pki/etcd/healthcheck-client.crt --key-file /etc/kubernetes/pki/etcd/healthcheck-client.key --endpoints https://127.0.0.1:2379 member list
The output of the above command should be similar to:
37bc079e7f15c970: name=ali1 peerURLs=http://10.197.192.14:2380,https://10.197.192.14:2382 clientURLs=https://10.197.192.14:2379 isLeader=false
e6214c803ea4e0c6: name=nuc3 peerURLs=http://10.197.192.12:2380,https://10.197.192.12:2382 clientURLs=http://10.197.192.12:2379 isLeader=true
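The TLS migration pre-check on the member list above can be automated by looking for both peer ports on every member. The following is a minimal sketch; the function name members_missing_peer_port is hypothetical, and it assumes the etcdctl v2 member list format shown above (space-separated key=value fields).

```shell
#!/bin/sh
# Hypothetical helper: reads `etcdctl member list` output on stdin and prints
# the name of every member whose peerURLs do not contain BOTH port 2380
# (http) and port 2382 (https).
members_missing_peer_port() {
    awk '{
        name = ""; peers = ""
        for (i = 1; i <= NF; i++) {
            if ($i ~ /^name=/)     name  = substr($i, 6)
            if ($i ~ /^peerURLs=/) peers = substr($i, 10)
        }
        if (name != "" && (peers !~ /:2380(,|$)/ || peers !~ /:2382(,|$)/))
            print name
    }'
}

# Usage (not executed here): pipe the output of the member list command
# shown above into members_missing_peer_port.
```

An empty result means every member advertises both ports, as expected before the upgrade.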
Check the etcd version on each of the etcd pods. It should be 3.2.18.
sudo KUBECONFIG=/etc/kubernetes/admin.conf kubectl exec <etcd pod name> -nkube-system -- etcd --version
Check the etcd cluster health. It should report that the cluster is healthy. For example:
$ sudo KUBECONFIG=/etc/kubernetes/admin.conf kubectl exec etcd-us-west-eqsv2-prod-1 -nkube-system -- etcdctl --ca-file /etc/kubernetes/pki/etcd/ca.crt --cert-file /etc/kubernetes/pki/etcd/healthcheck-client.crt --key-file /etc/kubernetes/pki/etcd/healthcheck-client.key --endpoints https://127.0.0.1:2379 cluster-health
member 60bf9f6f5fbf9ee3 is healthy: got healthy result from https://10.197.64.4:2379
member 63e02b05bdd1e768 is healthy: got healthy result from https://10.197.64.5:2379
member 6a6e23ad086373b9 is healthy: got healthy result from https://10.197.64.7:2379
member b4f9fd50b1fc3926 is healthy: got healthy result from https://10.197.64.1:2379
member efe173d68e9fe699 is healthy: got healthy result from https://10.197.64.3:2379
cluster is healthy
Check the image versions for all Kubernetes control plane components. On each node, go to the directory /etc/kubernetes/manifests and run the following commands. The desired output should be the following.
administrator@us-west-eqsv2-prod-1:/etc/kubernetes/manifests$ sudo cat etcd.yaml | grep "image:"
image: containers.fortanix.com:5000/etcd-amd64:3.2.18
image: containers.fortanix.com:5000/etcd-amd64:3.2.18
administrator@us-west-eqsv2-prod-1:/etc/kubernetes/manifests$ sudo cat kube-scheduler.yaml | grep "image:"
image: containers.fortanix.com:5000/kube-scheduler-amd64:v1.10.13
administrator@us-west-eqsv2-prod-1:/etc/kubernetes/manifests$ sudo cat kube-controller-manager.yaml | grep "image:"
image: containers.fortanix.com:5000/kube-controller-manager-amd64:v1.10.13
administrator@us-west-eqsv2-prod-1:/etc/kubernetes/manifests$ sudo cat kube-apiserver.yaml | grep "image:"
image: containers.fortanix.com:5000/kube-apiserver-amd64:v1.10.13
Make sure the Kubernetes certificates have not expired and are not about to expire.
Check the certificates under /etc/kubernetes/pki and /etc/kubernetes/pki/etcd. If the certificates have expired, renew them using /opt/fortanix/sdkms/bin/renew-k8s-certs.sh.
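The guide does not give a command for the certificate check above. One common way to do it is the -checkend option of openssl x509, which reports whether a certificate is still valid a given number of seconds from now. The following is a minimal sketch, not the Fortanix procedure; the function name cert_expires_within_days is hypothetical, and the 30-day window in the usage comment is an arbitrary example.

```shell
#!/bin/sh
# Hypothetical helper: returns success (0) if the certificate file expires
# within the given number of days (or cannot be read), and failure otherwise.
cert_expires_within_days() {
    cert="$1"
    days="$2"
    # `openssl x509 -checkend N` exits 0 when the certificate is still valid
    # N seconds from now, so the result is negated here.
    ! openssl x509 -checkend "$((days * 86400))" -noout -in "$cert" >/dev/null 2>&1
}

# Usage on a node (not executed here):
#   for c in /etc/kubernetes/pki/*.crt /etc/kubernetes/pki/etcd/*.crt; do
#       cert_expires_within_days "$c" 30 && echo "expires soon: $c"
#   done
```

Any certificate reported by the loop is a candidate for renewal before starting the upgrade.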
The kubelet, docker, and docker-registry services should be running on each node.
systemctl status docker
systemctl status kubelet
systemctl status docker-registry
There should be more than 15 GB of disk space available in /var and /. If not, please delete the oldest version of Fortanix DSM from the UI.
$ df -h /var/ /
Filesystem            Size  Used  Avail  Use%  Mounted on
/dev/mapper/main-var  47G   26G   21G    56%   /var
/dev/sda2             47G   13G   33G    28%   /
Post Upgrade
The following are the post-upgrade details to note.
Kubernetes is now upgraded from v1.10.13 to v1.14.10. This means that the packages kubeadm, kubectl, and kubelet are upgraded to v1.14.10.
kubectl does not support the -a option now. The command kubectl get pods -a throws an error.
The kubelet configuration now uses a file called /etc/default/kubelet for extra arguments.
sudo dpkg -l | grep 1.14.10
ii kubeadm 1.14.10-00fortanix amd64 Kubernetes Cluster Bootstrapping Tool
ii kubectl 1.14.10-00 amd64 Kubernetes Command Line Tool
ii kubelet 1.14.10-00 amd64 Kubernetes Node Agent
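The package check above can be turned into a pass/fail test by looking at the three package names directly instead of grepping for the version string. The following is a minimal sketch; the function name k8s_packages_not_at is hypothetical, and it assumes the dpkg -l column layout shown above (package name in column 2, version in column 3).

```shell
#!/bin/sh
# Hypothetical helper: reads `dpkg -l` output on stdin and prints
# "<package> <version>" for each of kubeadm, kubectl, and kubelet whose
# version does not start with the expected prefix.
k8s_packages_not_at() {
    want="$1"
    awk -v want="$want" '
        $2 == "kubeadm" || $2 == "kubectl" || $2 == "kubelet" {
            if (index($3, want) != 1) print $2 " " $3
        }'
}

# Usage on a node (not executed here):
#   dpkg -l | k8s_packages_not_at 1.14.10
```

An empty result means all three packages are at the expected version.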
The docker version is upgraded from version 17.03 to version 18.06.
$ sudo dpkg -l | grep docker-ce
ii docker-ce 18.06.3~ce~3-0~ubuntu amd64 Docker: the open-source application container engine
etcd is upgraded to 3.3.10 from 3.2.18.
The member list should show all members listening to peers on 2382 (https) and to clients on 2379 (https). There could be members that still show peers listening on port 2380 (http) alongside https.
$ sudo KUBECONFIG=/etc/kubernetes/admin.conf kubectl exec etcd-sdkms-server-1 -nkube-system -- etcdctl --ca-file /etc/kubernetes/pki/etcd/ca.crt --cert-file /etc/kubernetes/pki/etcd/healthcheck-client.crt --key-file /etc/kubernetes/pki/etcd/healthcheck-client.key --endpoints https://127.0.0.1:2379 member list
9c5daa70fca6432: name=sdkms-server-1 peerURLs=https://10.197.192.43:2382 clientURLs=https://10.197.192.43:2379 isLeader=true
8d2bc14567dfd781: name=sdkms-server-2 peerURLs=https://10.197.192.44:2382 clientURLs=https://10.197.192.44:2379 isLeader=false
Flannel is upgraded to v0.11.0-1-g3b757492 from v0.9.0.
The Flannel CNI docker image is no longer used. Only a flannel docker image is used. The Flannel CNI version in the kube-flannel configmap is updated from 0.3.0 to 0.3.1.
$ sudo KUBECONFIG=/etc/kubernetes/admin.conf kubectl describe ds kube-flannel-ds -nkube-system | grep Image
Image: containers.fortanix.com:5000/flannel:v0.11.0-1-g3b757492
Image: containers.fortanix.com:5000/flannel:v0.11.0-1-g3b757492
$ sudo -E kubectl get configmap kube-flannel-cfg -nkube-system -oyaml | grep "cniVersion"
"cniVersion": "0.3.1"
Pause docker image updated from version 3.0 to 3.1. To check this on the node, run the following command:
docker ps | grep pause
swdist is updated because its Dockerfile includes a specific kubectl version. This causes swdist to roll out, which happens after the nodes have been upgraded to v1.14. Check the age of the pods using the command below:
kubectl get pods -owide -nswdist
Kube Proxy patch:
The kube-proxy daemon set is patched after each Kubernetes version upgrade to use a patched kube-proxy docker image. This fixes a bug in kube-proxy and avoids contention on the iptables lock.
The kube-proxy docker image versions for each Kubernetes version are:
v1.11.10 - v1.11.10-3-cab99e3cb4b51f
v1.12.10 - v1.12.10-3-85f7b5925c428e
v1.13.12 - v1.13.12-3-6e71bdf7e97b1c
v1.14.10 - v1.14.10-5-740026d6e146df
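The version list above can be expressed as a small lookup, which is convenient when checking the image actually running in the daemon set. The following is a minimal sketch; the function name kube_proxy_patch_tag is hypothetical, and the daemon set name kube-proxy in the usage comment is the kubeadm default and is an assumption for this deployment.

```shell
#!/bin/sh
# Hypothetical helper: prints the patched kube-proxy image tag listed above
# for a given Kubernetes version, or fails for a version not in the list.
kube_proxy_patch_tag() {
    case "$1" in
        v1.11.10) echo "v1.11.10-3-cab99e3cb4b51f" ;;
        v1.12.10) echo "v1.12.10-3-85f7b5925c428e" ;;
        v1.13.12) echo "v1.13.12-3-6e71bdf7e97b1c" ;;
        v1.14.10) echo "v1.14.10-5-740026d6e146df" ;;
        *) return 1 ;;
    esac
}

# Usage on a node (not executed here; daemon set name is an assumption):
#   kubectl describe ds kube-proxy -nkube-system | grep Image \
#       | grep -q "$(kube_proxy_patch_tag v1.14.10)" && echo "patched image in use"
```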
The kernel is upgraded from 5.4.0-81-generic to 5.8.0-50-generic. The Kubernetes version on each node is upgraded to 1.14 prior to doing the kernel upgrade. This can be checked with the following command under the column KERNEL-VERSION.
$ kubectl get nodes -owide
NAME            STATUS  ROLES   AGE  VERSION   INTERNAL-IP    EXTERNAL-IP  OS-IMAGE            KERNEL-VERSION    CONTAINER-RUNTIME
sdkms-server-3  Ready   master  8h   v1.14.10  10.197.192.45  <none>       Ubuntu 20.04.3 LTS  5.8.0-50-generic  docker://18.6.3
The Kured version is upgraded from 1.1.0 to 1.2.0.
Troubleshooting
If the kubelet client certificate (/var/lib/kubelet/pki/kubelet-client.crt) expires and there is no /var/lib/kubelet/pki/kubelet-client-current.pem file present, you can create the certificates using the following commands:
TEMP_DIR=/etc/kubernetes/tmp
mkdir -p $TEMP_DIR
BACKUP_PEM="/var/lib/kubelet/pki/kubelet-client-current.pem"
KEY="/var/lib/kubelet/pki/kubelet-client.key"
CERT="/var/lib/kubelet/pki/kubelet-client.crt"
# Assumption: the original commands use $KUBELET_CONF without defining it.
# /etc/kubernetes/kubelet.conf is the default kubeadm location; adjust if yours differs.
KUBELET_CONF="/etc/kubernetes/kubelet.conf"
echo "Stopping kubelet service"
systemctl stop kubelet
echo "Creating a new key and cert file for kubelet auth"
nodename=$(echo "$HOSTNAME" | awk '{print tolower($0)}')
openssl req -out $TEMP_DIR/tmp.csr -new -newkey rsa:2048 -nodes -keyout $TEMP_DIR/tmp.key -subj "/O=system:nodes/CN=system:node:$nodename"
cat > $TEMP_DIR/kubelet-client.ext << HERE
keyUsage = critical,digitalSignature,keyEncipherment
extendedKeyUsage = clientAuth
HERE
echo "Signing the generated csr with kubernetes CA"
openssl x509 -req -days 365 -in $TEMP_DIR/tmp.csr -CA /etc/kubernetes/pki/ca.crt -CAkey /etc/kubernetes/pki/ca.key -CAcreateserial -out $TEMP_DIR/tmp.crt -sha256 -extfile $TEMP_DIR/kubelet-client.ext
cp $TEMP_DIR/tmp.crt $CERT
cp $TEMP_DIR/tmp.key $KEY
chmod 644 $CERT
chmod 600 $KEY
# If kubelet.conf embeds the certificate data, point it at the restored files instead.
if grep -q "client-certificate-data" $KUBELET_CONF; then
    echo "Updating file $KUBELET_CONF to add reference to restored certificates"
    sed -i "s|\(client-certificate-data:\s*\).*\$|client-certificate: $CERT|" $KUBELET_CONF
    sed -i "s|\(client-key-data:\s*\).*\$|client-key: $KEY|" $KUBELET_CONF
fi
echo "Starting kubelet service"
systemctl start kubelet
An upgrade on a two-node cluster can fail due to etcd quorum failure. In such a scenario, if the pods are healthy, you can re-run the deploy job manually using the following command. This will eventually upgrade the cluster to 1.14.
sdkms-cluster deploy --stage DEPLOY --version <version>
WARNING
Two-node upgrades are not recommended.
When a cluster is upgraded from build 4.2.2087 to <4.3.xxxx> on a three-node cluster, the deploy job may exit and be marked completed before the cluster upgrade finishes. In such a scenario, if all the pods are healthy, you can deploy the version again.