Fortanix Data Security Manager (Release 3.24) Kubernetes Version Upgrade to 1.10 K8s

Introduction

The purpose of this article is to describe the steps to upgrade Kubernetes for Fortanix Data Security Manager (DSM) release 3.24 from version 1.8 to version 1.10.

Overview

Fortanix DSM 3.24 is the first release to include a Kubernetes upgrade. It upgrades the system from Kubernetes version 1.8 to 1.10.

Subsequent Kubernetes upgrades will be delivered either as part of regular releases or as independent upgrades.

After upgrading Fortanix DSM to version 3.24, you will not be able to downgrade to a previous release. The Fortanix DSM UI will not allow a downgrade after 3.24 is installed. Please work with Fortanix Support to ensure that you have a valid backup that can be used to perform a manual recovery.

Also, you will need to upgrade Fortanix DSM to 3.24 before moving to any future release.

Prerequisites

The following are the prerequisites before upgrading:

  • Ensure that more than 15 GB of disk space is available in /var and in the root directory (/) by executing the following command:
    root@us-west-eqsv2-13:~# df -h /var/ /

    The following is the output:

    Filesystem            Size  Used Avail Use% Mounted on
    /dev/mapper/main-var   46G   28G   16G  64% /var
    /dev/sda2              46G   21G   24G  47% /
  • Ensure that all software versions are available on all the endpoints by executing the following command:
    root@us-west-eqsv2-13:~# kubectl get ep -n swdist
    The following is the output:
    NAME      ENDPOINTS                                         AGE
    swdist    10.244.0.212:22,10.244.1.191:22,10.244.2.152:22   242d
    v2649     10.244.0.212:22,10.244.1.191:22,10.244.2.152:22   4d
    v2657     10.244.0.212:22,10.244.1.191:22,10.244.2.152:22   2d
  • Ensure that the Docker registry status (“systemctl status docker-registry”) is active and running before and after the software is uploaded. Also ensure that the overlay mount options match the following on each node (these checks are combined in the sketch after this list). The following is the command:
    cat /etc/systemd/system/var-opt-fortanix-swdist_overlay.mount.d/options.conf
    [Mount]
    Options=lowerdir=/var/opt/fortanix/swdist/data/vXXXX/registry:/var/opt/fortanix/swdist/data/vYYYY/registry
    Here, ‘vXXXX’ is the previous version and ‘vYYYY’ is the upgraded version.
  • Ensure that the latest backup has been triggered and verify that it completed successfully (check the backup size, and so on).
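
The following is a minimal sketch that combines the prerequisite checks above on a single node. It only repeats the commands already listed in this section; replace ‘vXXXX’ and ‘vYYYY’ in the overlay mount output with the versions present on your cluster:

    # Check that more than 15 GB is free in /var and in the root directory (/)
    df -h /var/ /

    # Check that all software versions are available on all swdist endpoints
    kubectl get ep -n swdist

    # Check that the Docker registry service is active and running
    systemctl status docker-registry

    # Check that the overlay mount options list both version directories
    cat /etc/systemd/system/var-opt-fortanix-swdist_overlay.mount.d/options.conf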

Upgrading Kubernetes from 1.8 to 1.10

Ensure that you read the ‘Prerequisites’ section before upgrading.

The following are the steps for upgrading Kubernetes from 1.8 to 1.10:

  1. Make sure that the Kubernetes certificates have not expired and are not about to expire. Use the following paths to verify:
    • To check the expiration of the certificates, check the .crt files of all certificates in ‘/etc/kubernetes/pki’ (see the sketch after these steps).

      The following command will display the expiration date:

      openssl x509 -enddate -noout -in 
    • If the certificates have expired, please renew the certificates using the following procedure:
      1. Execute the following script with ‘sudo’ privilege:
        sudo /opt/fortanix/sdkms/bin/renew-k8s-certs.sh
      2. Reboot the node.
        Complete the above steps on one node before moving on to the next one.
  2. Upgrade the current cluster to the 3.23 patch release (Feb 5, 2021). For more information, refer to https://support.fortanix.com/hc/en-us/articles/360057264491--3-23-Patch-Feb-5-2021
  3. The 3.24 Fortanix DSM upgrade procedure is then the same as for any other release. The upgrade of a 3-node cluster takes around 45 to 60 minutes.
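
As referenced in step 1, the following is a minimal sketch for checking certificate expiration dates. It assumes the certificates live under ‘/etc/kubernetes/pki’ (including the ‘etcd’ subdirectory); the loop itself is illustrative and not part of the Fortanix tooling:

    # Print the expiration date of every Kubernetes certificate
    # (paths are assumptions based on the default kubeadm layout)
    for cert in /etc/kubernetes/pki/*.crt /etc/kubernetes/pki/etcd/*.crt; do
        echo "${cert}: $(openssl x509 -enddate -noout -in "${cert}")"
    done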

Verifying That the Upgrade is Successful (Post-Health Check of the Upgrade)

The following is the procedure to verify that the upgrade is successful:

  1. Verify that all nodes are healthy and are running Kubernetes version 1.10 by executing the following command (a consolidated sketch of these checks follows this list):
    kubectl get nodes
  2. Verify that the deploy job is successful and that the latest deploy pod is in the Completed state.
  3. Verify that all the pods in the respective namespaces are healthy (for example, in default, swdist, and kube-system).
  4. Verify that the latest version installed on the cluster is 3.24.
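
The following is a minimal sketch that consolidates these checks; the ‘grep deploy’ filter is an assumption about the deploy pod name and may need to be adjusted for your cluster:

    # All nodes should be Ready and report a 1.10.x Kubernetes version
    kubectl get nodes

    # The latest deploy pod should be in the Completed state
    # (the 'deploy' filter is an assumption about the pod name)
    kubectl get pods | grep deploy

    # Pods in the relevant namespaces should be Running or Completed
    kubectl get pods -n default
    kubectl get pods -n swdist
    kubectl get pods -n kube-system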

Troubleshooting

If any of the scripts fail at any step, they will print a detailed error message. Please report the problem to Fortanix support (support@fortanix.com) and provide the script output.

The following are some of the troubleshooting instructions:

  • Check the status of the kubelet and docker-registry services.
  • Logs are stored in the /var/log/fortanix directory on each of the nodes:
    • The log file is named kubernetes_update_from_<v1>_to_<v2>.<log/err/debug>
    • You can check the log and error files when troubleshooting.
  • You can verify that etcd peer TLS communication has been added after the Kubernetes version is upgraded to 1.10.13 using the following command:
    sudo KUBECONFIG=/etc/kubernetes/admin.conf kubectl exec  -nkube-system -- etcdctl --ca-file /etc/kubernetes/pki/etcd/ca.crt --cert-file /etc/kubernetes/pki/etcd/healthcheck-client.crt --key-file /etc/kubernetes/pki/etcd/healthcheck-client.key --endpoints https://127.0.0.1:2379 member list

    The following is an example output of the above command. The peerURLs should list both ports [2380 (http) and 2382 (https)]:

    37bc079e7f15c970: name=ali1 peerURLs=http://10.197.192.14:2380,https://10.197.192.14:2382 clientURLs=https://10.197.192.14:2379 isLeader=false
    e6214c803ea4e0c6: name=nuc3 peerURLs=http://10.197.192.12:2380,https://10.197.192.12:2382 clientURLs=http://10.197.192.12:2379 isLeader=true
  • If the updater pod is not ready, it can be due to the following reasons:
    • The node is not ready. You can verify this using the command:
      kubectl get nodes
    • The Fortanix DSM pod on the node is not ready (0/1). The following is the command:
      kubectl get pods -lapp=sdkms -owide | grep <nodename>
    • The /var/run/reboot-required file is present on the node.
    • The package version in ‘/etc/fortanix/sdkms_version/version’ has not been updated to the upgraded version after Kubernetes has been upgraded to 1.10.13.
    • Etcd has not been migrated to use peer TLS communication after the node is upgraded to version 1.10.13.
  • The node is in the SchedulingDisabled state (cordoned off):
    • Check the logs in /var/log/fortanix for any errors in the Kubernetes upgrade logs. Node draining and uncordoning happen each time the Kubernetes version, more specifically the kubelet and kubectl versions, is upgraded.
    • If everything seems fine, some packages installed after the Kubernetes upgrade to 1.10.13 might require the node to be rebooted.
      • Check for the /var/run/reboot-required file on the node (see the sketch after this list).
      • If present, check the kured pod logs in the kube-system namespace to see whether it was able to drain the node and uncordon it.
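
The following is a minimal sketch of the reboot-required check described above; ‘<kured-pod-name>’ is a placeholder for the kured pod returned by the preceding kubectl command:

    # Check whether a post-upgrade reboot is pending on this node
    ls -l /var/run/reboot-required

    # If the file exists, check the kured pod logs in the kube-system namespace
    kubectl get pods -n kube-system | grep kured
    kubectl logs -n kube-system <kured-pod-name>

    # Confirm whether the node is still cordoned (SchedulingDisabled)
    kubectl get nodes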
