1.0 Introduction
This article describes the prechecks that the System Administrator must perform before a Fortanix Data Security Manager (DSM) software upgrade.
2.0 Prechecks
The run_prechecks.sh script provides three options during execution. The user selects the most suitable one by entering the corresponding option number.
Remote – Executes the /opt/fortanix/sdkms/bin/check-dsm-health.sh script remotely on all the cluster nodes, fetches the related files, formats them, and displays the status on the screen.
Local – Executes the /opt/fortanix/sdkms/bin/check-dsm-health.sh script locally on the same machine.
IPMI – Verifies the Intelligent Platform Management Interface (IPMI) connectivity.
2.1 Option 1 – Remote
Perform the following steps to execute the Fortanix DSM precheck script remotely on all the nodes:
1. Run the following commands to go to the dsm_prechecks directory:
   sudo su
   cd /opt/fortanix/sdkms/bin/dsm_prechecks
2. Run the following command to create the config.txt file:
   kubectl get no --no-headers -owide | awk '{print $6 }' > config.txt
3. In the parameters.txt file, update the values for the following parameters:
   REMOTE_USER=""
   AUTH_TYPE="PASSWORD" or "PRIVATE_KEY"
   PRIVATE_KEY_FILE=""
   NO_PASSWORD="true" or "false"
   Where,
   REMOTE_USER refers to the name of the user. For example, administrator.
   AUTH_TYPE can be either PASSWORD or PRIVATE_KEY.
   PRIVATE_KEY_FILE refers to the path of the private key. This is applicable only if AUTH_TYPE is set to PRIVATE_KEY.
   NO_PASSWORD can be either true or false, based on the sudo profile of the remote user.
4. Run the following command to execute the Fortanix DSM prechecks:
   ./run_prechecks.sh
5. Enter 1 to select "remote".
   The screen prompts for a password only if AUTH_TYPE=PASSWORD was selected in Step 3.
   NOTE
   Create a node_password.txt file if you want to provide a password in the format IP|PASSWORD through the file.
The user can check the detailed Fortanix DSM precheck logs in the following directory created by the script:
/tmp/health_checks/remote_logs
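Before running the remote option, it can help to sanity-check the values entered in parameters.txt against the rules described in Step 3. The following is a minimal sketch, not part of the Fortanix tooling: the function name validate_params is hypothetical, and the commented usage assumes that parameters.txt can be sourced as shell variable assignments.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not shipped with Fortanix DSM): validates the values
# described for parameters.txt.
# Arguments: REMOTE_USER AUTH_TYPE PRIVATE_KEY_FILE NO_PASSWORD
validate_params() {
    local remote_user="$1" auth_type="$2" key_file="$3" no_password="$4"

    if [ -z "$remote_user" ]; then
        echo "REMOTE_USER must not be empty"
        return 1
    fi

    case "$auth_type" in
        PASSWORD) ;;
        PRIVATE_KEY)
            # The key file only matters when key-based authentication is chosen.
            if [ ! -r "$key_file" ]; then
                echo "PRIVATE_KEY_FILE is not readable: $key_file"
                return 1
            fi
            ;;
        *)
            echo "AUTH_TYPE must be PASSWORD or PRIVATE_KEY"
            return 1
            ;;
    esac

    case "$no_password" in
        true|false) ;;
        *)
            echo "NO_PASSWORD must be true or false"
            return 1
            ;;
    esac

    echo "parameters look valid"
}

# Example usage (ASSUMPTION: parameters.txt is sourceable shell syntax):
# . ./parameters.txt
# validate_params "$REMOTE_USER" "$AUTH_TYPE" "$PRIVATE_KEY_FILE" "$NO_PASSWORD"
```

The function returns a non-zero exit code on the first problem it finds, so it can gate a wrapper script that calls ./run_prechecks.sh afterwards.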
2.2 Option 2 – Local
Perform the following steps to execute the Fortanix DSM precheck script locally:
1. Run the following command to execute the Fortanix DSM prechecks:
   sudo ./run_prechecks.sh
2. Enter 2 to select "local".
The user can check the detailed Fortanix DSM precheck logs in the following directory created by the script:
/tmp/health_checks/logs
2.3 Option 3 – IPMI
Perform the following steps to verify the Intelligent Platform Management Interface (IPMI) connectivity:
NOTE
Ensure that the IPMI is in the same network.
1. Run the following command to go to the dsm_prechecks directory:
   cd /opt/fortanix/sdkms/bin/dsm_prechecks
2. Run the following command to create the ipmi.txt file:
   vi ipmi.txt
3. Add the IPMI address and username in the ipmi.txt file in the following format:
   10.10.10.10|admin
   11.11.11.11|admin
4. Run the following command to execute the Fortanix DSM prechecks:
   sudo ./run_prechecks.sh
5. Enter 3 to select "ipmi connectivity check".
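A malformed line in ipmi.txt is easy to miss, so the file can be checked against the IP|USERNAME format before running the script. The following is a minimal sketch under stated assumptions: the function names are hypothetical, IPMI addresses are assumed to be IPv4, and blank lines are assumed to be harmless.

```shell
#!/usr/bin/env bash
# Hypothetical helper: returns 0 when a line follows the IPv4|USERNAME format.
is_valid_ipmi_line() {
    local line="$1" ip user octet
    ip="${line%%|*}"     # text before the first "|"
    user="${line#*|}"    # text after the first "|"

    [ "$ip" != "$line" ] || return 1           # no "|" separator present
    [ -n "$user" ] || return 1                 # empty username
    case "$user" in *"|"*) return 1 ;; esac    # more than one separator

    # Dotted-quad shape first, then range-check each octet.
    [[ "$ip" =~ ^([0-9]{1,3}\.){3}[0-9]{1,3}$ ]] || return 1
    local IFS=.
    for octet in $ip; do
        [ "$((10#$octet))" -le 255 ] || return 1
    done
    return 0
}

# Hypothetical helper: reports every malformed line in the given file.
check_ipmi_file() {
    local file="$1" line n=0 bad=0
    while IFS= read -r line || [ -n "$line" ]; do
        n=$((n + 1))
        [ -z "$line" ] && continue    # ASSUMPTION: blank lines are ignored
        if ! is_valid_ipmi_line "$line"; then
            echo "line $n is malformed: $line"
            bad=1
        fi
    done < "$file"
    return "$bad"
}

# Example usage:
# check_ipmi_file ipmi.txt && sudo ./run_prechecks.sh
```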
2.4 Cluster Health Checks
These cluster-level health checks help confirm that the cluster meets the necessary configuration, security, and readiness requirements before proceeding with the upgrade process.
The results, along with the precheck outputs, must be shared with the Fortanix Support Team to verify the cluster’s health before the upgrade.
After completing the Fortanix DSM prechecks, run the following commands from any node within the cluster to validate its overall health:
cat /etc/fortanix/sdkms_version/sdkms_version
kubectl get nodes,pods -owide -L datacenter
kubectl get pods -n kube-system -owide
kubectl get jobs
kubectl exec cassandra-0 -- nodetool status public;
kubectl get deployment -owide
kubectl get sts -owide
kubectl get ep -n swdist -owide
kubectl get cj -owide
kubectl get po -o custom-columns=POD:.metadata.name,IMAGE:.spec.containers[*].image
kubectl get po -o custom-columns=POD:.metadata.name,IMAGE:.spec.containers[*].image -n kube-system
sdkms-cluster get config --system | grep -v password
3.0 Handling Upgrade Issues Related to Attestation Value (Non-SGX Only)
If you are upgrading to a version lower than Fortanix DSM 4.31 and using a non-SGX build, check the value for the config-values parameter using the following command:
sdkms-cluster get config --system
If the attestation value is ias, follow the workaround outlined below. For more information, contact the Fortanix Support Team.
Perform the following steps:
1. Add the following parameter in the config.yaml file to update the attestation value:
   attestation: null
2. Run the following deploy command to apply the updated configurations:
   sdkms-cluster --config config.yaml --stage DEPLOY
3. Monitor the deployment job and ensure all pods are running. Run the following command to view the status of the pods:
   kubectl get pods -A -owide
4. Run the following command to verify the attestation value in the config-values ConfigMap:
   kubectl get cm config-values -o yaml
   The following must be the output of the command:
   apiVersion: v1
   data:
     config-values: |-
       ---
       global:
         attestation: ~
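The attestation value can also be extracted from the command output instead of being read by eye. The following is a minimal sketch: the function names are hypothetical, and it assumes that the output contains a line of the form "attestation: VALUE", as in the expected output shown in this section.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads YAML-like text on stdin and prints the value of
# the first "attestation:" key it finds.
attestation_value() {
    sed -n 's/^[[:space:]]*attestation:[[:space:]]*\([^[:space:]#]*\).*$/\1/p' | head -n 1
}

# Hypothetical helper: succeeds when the workaround in this section applies,
# that is, when the attestation value read from stdin is "ias".
needs_attestation_workaround() {
    [ "$(attestation_value)" = "ias" ]
}

# Example usage (both commands appear in this section):
# sdkms-cluster get config --system | needs_attestation_workaround && echo "apply workaround"
# kubectl get cm config-values -o yaml | attestation_value
```

After the workaround is applied, the second usage line is expected to print ~, which is the YAML notation for null.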
4.0 Checks Handled by the Script
Check Name | Check Type | Purpose |
|---|---|---|
HEALTH_CHECK_QUORUM | NODE | Checks the health of the sdkms and Cassandra pods, the script |
HEALTH_CHECK_ALL | NODE | Checks the health of the sdkms and Cassandra pods, the script When we send |
DISK_CHECK | NODE | Checks the |
NTP_CHECK | NODE | The script |
API_CERT | NODE | Checks for the availability of |
KUBELET_CERT | NODE | Checks for the If it is pointing to some other file in If |
DOCKER_REGISTRY_SRVC_CHECK | NODE | Checks whether the docker-registry service is up and running. |
KUBELET_SRVC_CHECK | NODE | Checks whether the kubelet service is up and running. |
CRI_SRVC_CHECK | NODE | Checks whether the docker or containerd service is up and running. |
1M_CPU_CHECK | NODE | Checks whether the 1 minute CPU load average is less than the defined threshold value (90%). |
5M_CPU_CHECK | NODE | Checks whether the 5 minute CPU load average is less than the defined threshold value (90%). |
15M_CPU_CHECK | NODE | Checks whether the 15 minute CPU load average is less than the defined threshold value (90%). |
MEM_CHECK | NODE | Checks whether the memory utilization is less than the defined threshold value (75%). |
DB_FILES_PERM_CHECK | NODE | Checks whether the file permissions of |
CONN_Q_CHECK | NODE | Checks the number of public connections for each node querying the Metrics API, if the value is greater than 2000 then it flags it as Checks the backend Q size querying Metrics API, if the value is greater than 10000 then flag it as |
CAS_ACCT_CHECK | NODE | Checks the count of rows in |
CAS_ADMIN_ACCT_CHECK | NODE | Checks the count of sysadmin accounts. If the count is <= 1 then |
SGX_CHECK | NODE | Checks whether DSM pod is running “enclave” process and also whether the machine has If driver is present and sdkms pod is not running “enclave” process, then report Flag it as |
NODE_CHECK | CLUSTER | Checks whether all the nodes are in ready status |
PODS_CHECK | CLUSTER | Checks whether all pods are in Running or Completed status, if not, then flag them as |
JOB_CHECK | CLUSTER | Checks whether all jobs are in “Complete” status; if not, it flags them as The running jobs are flagged because before you start the upgrade, all jobs must be in completed status. |
CONTAINER_CHECK | CLUSTER | Checks whether all containers within pods are in a |
IPMI_INFO_CHK | NODE | Verifies the IPMI IP address, default gateway mac and default gateway IP are valid, if not, then flags them as |
CAS_REP_CHECK | CLUSTER | Checks whether the added DC Labeling is matching with the replication strategy in Cassandra. (Please note this validation is for Network Topology and fully replicated cluster) if there is any mismatch then it flags it as If in case the DC labeling is present, but the Strategy is Simple Strategy in Cassandra, then it flags it as |
CAS_CERT_CHECK | NODE | Checks whether the Cassandra pod certificate is valid if not it flags it as |
REPLICA_CHECK | CLUSTER | Checks the and compares that with |
BIOS_VER_CHECK | NODE | Verifies that the node has latest BIOS installed, if not it flags it as |
IMAGE_VERSION_CHECK | CLUSTER | Validates the version value from If there is any mismatch, then it flags it as |
BACKUP_SETUP_CHECK | CLUSTER | Checks if the CRON job |
KUBEAPI_IP_CHECK | CLUSTER | Checks whether the |
SWDIST_CHECK | CLUSTER | Checks the count of directories in the path |
CAS_NODETOOL_CHECK | CLUSTER | Executes the nodetool status command and checks whether there are any nodes which are not of the pattern Checks whether the count of pattern |
ETCD_HEALTH_CHECK | CLUSTER | Verifies whether the etcd cluster is healthy, if not it flags it as |
SWDIST_OVERLAY_SRVC_CHECK | NODE | Verifies the status of |
PERM_DAEMON_SRVC_CHECK | NODE | Verifies the status of |
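The threshold-based checks in the table (1M_CPU_CHECK, 5M_CPU_CHECK, 15M_CPU_CHECK, and MEM_CHECK) can be illustrated with a short sketch. This is not the logic of the Fortanix script: the function names are hypothetical, expressing the load average as a percentage of the CPU core count is an assumption, and reporting WARN when the threshold is reached is also an assumption.

```shell
#!/usr/bin/env bash
# ASSUMPTION: the load average is compared to the threshold as a percentage of
# the number of CPU cores. The actual precheck script may compute it differently.

# Convert a load average into a whole-number percentage of the core count.
load_percent() {
    awk -v load="$1" -v cores="$2" 'BEGIN { printf "%d", (load / cores) * 100 }'
}

# Print "<name> OK" when the value is below the threshold, "<name> WARN" otherwise.
check_threshold() {
    local name="$1" value="$2" threshold="$3"
    if [ "$value" -lt "$threshold" ]; then
        echo "$name OK"
    else
        echo "$name WARN"
    fi
}

# Read the three load averages on a Linux node and apply the 90% limit.
report_cpu_load() {
    local one five fifteen cores
    read -r one five fifteen _ < /proc/loadavg
    cores="$(nproc)"
    check_threshold 1M_CPU_CHECK "$(load_percent "$one" "$cores")" 90
    check_threshold 5M_CPU_CHECK "$(load_percent "$five" "$cores")" 90
    check_threshold 15M_CPU_CHECK "$(load_percent "$fifteen" "$cores")" 90
}
```

Calling report_cpu_load on a Linux node prints one line per CPU check. The same check_threshold function can be reused with the 75% limit that the table lists for MEM_CHECK.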
5.0 Fortanix DSM Prechecks Output Status
This section describes the following three types of precheck output status:
OK: The check was successful and no action is required.
WARN: The check requires attention. The user needs to check the logs created on the node in /tmp/health_checks/ and must share the log details with the Fortanix Support Team.
SKIPPED: The check was not executed due to some issue in the cluster.
For example, the Cassandra Replication Strategy check requires Cassandra to be healthy to fetch details from the Cassandra pod. If the pod is not healthy, that check is SKIPPED.
Hence, the user must check the logs to understand why the check was skipped and share the log details with the Fortanix Support Team.
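When many checks are reported, a quick count of each status shows whether any logs need to be reviewed. The following is a minimal sketch under a loud assumption: it expects one check per line with the status as the last whitespace-separated field (for example, "NTP_CHECK OK"). The real on-screen format of the script may differ, and the function name is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: counts OK, WARN, and SKIPPED statuses read from stdin.
# ASSUMPTION: each input line ends with the status as its last field.
summarize_status() {
    awk '
        $NF == "OK" || $NF == "WARN" || $NF == "SKIPPED" { count[$NF]++ }
        END {
            printf "OK=%d WARN=%d SKIPPED=%d\n",
                count["OK"], count["WARN"], count["SKIPPED"]
        }
    '
}

# Example usage with a saved copy of the precheck output (hypothetical file name):
# summarize_status < precheck_output.txt
```

Any non-zero WARN or SKIPPED count means that the logs under /tmp/health_checks/ should be reviewed and shared with the Fortanix Support Team, as described above.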