1.0 Introduction
This article describes the prechecks to be performed by the System Administrator before performing a Fortanix DSM software upgrade.
2.0 Prechecks
The run_prechecks.sh script provides three options during execution. Select the most suitable option by entering its number:
Remote – Executes the /opt/fortanix/sdkms/bin/check-dsm-health.sh script remotely on all the cluster nodes, fetches the related files, formats them, and displays the status on the screen.
Local – Executes the /opt/fortanix/sdkms/bin/check-dsm-health.sh script locally on the current machine.
IPMI – Verifies the Intelligent Platform Management Interface (IPMI) connectivity.
2.1 Option 1 – Remote
Perform the following steps to execute the Fortanix DSM precheck script remotely on all the nodes:
1. Run the following commands to switch to the root user and go to the dsm_prechecks directory:
sudo su
cd /opt/fortanix/sdkms/bin/dsm_prechecks
2. Run the following command to create the config.txt file, which lists the internal IP address of each cluster node (the sixth column of the kubectl get nodes -o wide output):
kubectl get no --no-headers -owide | awk '{print $6 }' > config.txt
3. In the parameters.txt file, update the values for the following parameters:
REMOTE_USER=""
AUTH_TYPE="PASSWORD" or "PRIVATE_KEY"
PRIVATE_KEY_FILE=""
NO_PASSWORD="true" or "false"
Where,
REMOTE_USER refers to the name of the remote user. For example, administrator.
AUTH_TYPE can be either PASSWORD or PRIVATE_KEY.
PRIVATE_KEY_FILE refers to the path of the private key. This applies only if AUTH_TYPE is set to PRIVATE_KEY.
NO_PASSWORD can be either true or false, based on the sudo profile of the remote user.
4. Run the following command to execute the Fortanix DSM prechecks:
./run_prechecks.sh
5. Enter 1 to select "remote".
The screen prompts for a password only if the user set AUTH_TYPE=PASSWORD in Step 3.
NOTE
To provide the passwords through a file instead, create a node_password.txt file with one entry per node in the format IP|PASSWORD.
The user can check the detailed DSM precheck logs in the following directory created by the script:
/tmp/health_checks/remote_logs
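Before running Option 1, it can help to confirm that the input files are well formed. The following POSIX shell sketch is illustrative only and is not part of the Fortanix tooling: the function names are hypothetical, and it assumes that config.txt holds one IPv4 address per line, that node_password.txt uses the IP|PASSWORD format described above, and that parameters.txt consists of shell-style KEY="value" assignments that can safely be sourced.

```shell
#!/bin/sh
# Hypothetical preflight helpers for the Option 1 (Remote) input files.
# Assumptions: config.txt = one IPv4 per line; node_password.txt = IP|PASSWORD;
# parameters.txt = shell-style KEY="value" lines.

# is_ipv4 ADDR: succeeds if ADDR is a dotted quad with every octet <= 255.
is_ipv4() {
  printf '%s\n' "$1" | grep -Eq '^([0-9]{1,3}\.){3}[0-9]{1,3}$' || return 1
  printf '%s\n' "$1" | awk -F. '{ for (i = 1; i <= 4; i++) if ($i + 0 > 255) exit 1 }'
}

# check_config_file FILE: every non-empty line must be an IPv4 address.
check_config_file() {
  local file="$1" line count=0
  [ -s "$file" ] || { echo "missing or empty: $file" >&2; return 1; }
  while IFS= read -r line || [ -n "$line" ]; do
    [ -z "$line" ] && continue
    if ! is_ipv4 "$line"; then
      echo "not an IPv4 address in $file: $line" >&2
      return 1
    fi
    count=$((count + 1))
  done < "$file"
  [ "$count" -gt 0 ]
}

# check_password_file FILE CONFIG: each line must be IP|PASSWORD, with the
# IP also listed in CONFIG. Passwords are never printed.
check_password_file() {
  local file="$1" config="$2" ip password
  while IFS='|' read -r ip password || [ -n "$ip" ]; do
    [ -z "$ip" ] && continue
    if ! is_ipv4 "$ip" || [ -z "$password" ]; then
      echo "expected IP|PASSWORD in $file (entry for $ip)" >&2
      return 1
    fi
    if ! grep -qxF "$ip" "$config"; then
      echo "$ip is not listed in $config" >&2
      return 1
    fi
  done < "$file"
}

# check_parameters_file PATH: runs in a subshell so sourced values do not leak.
check_parameters_file() (
  . "$1" || exit 1
  [ -n "$REMOTE_USER" ] || { echo "REMOTE_USER is empty" >&2; exit 1; }
  case $AUTH_TYPE in
    PASSWORD) ;;
    PRIVATE_KEY)
      [ -r "$PRIVATE_KEY_FILE" ] || { echo "PRIVATE_KEY_FILE is not readable" >&2; exit 1; } ;;
    *) echo "AUTH_TYPE must be PASSWORD or PRIVATE_KEY" >&2; exit 1 ;;
  esac
  case $NO_PASSWORD in
    true|false) ;;
    *) echo "NO_PASSWORD must be true or false" >&2; exit 1 ;;
  esac
)
```

For example, from the dsm_prechecks directory: check_config_file config.txt && check_parameters_file ./parameters.txt. The ./ prefix matters because the POSIX . command searches PATH for a name that contains no slash.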
2.2 Option 2 - Local
Perform the following steps to execute the Fortanix DSM precheck script locally:
1. Run the following command to execute the Fortanix DSM prechecks:
sudo ./run_prechecks.sh
2. Enter 2 to select "local".
The user can check the detailed Fortanix DSM precheck logs in the following directory created by the script:
/tmp/health_checks/logs
2.3 Option 3 - IPMI
Perform the following steps to verify the Intelligent Platform Management Interface (IPMI) connectivity:
NOTE
Ensure that the IPMI interfaces are on the same network as the node running the script.
1. Run the following command to go to the dsm_prechecks directory:
cd /opt/fortanix/sdkms/bin/dsm_prechecks
2. Run the following command to create the ipmi.txt file:
vi ipmi.txt
3. Add the IPMI address and username to the ipmi.txt file, one entry per line, in the following format:
10.10.10.10|admin
11.11.11.11|admin
4. Run the following command to execute the Fortanix DSM prechecks:
sudo ./run_prechecks.sh
5. Enter 3 to select "ipmi connectivity check".
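As an alternative to editing ipmi.txt interactively in vi, the file can be written and checked from the shell. This is a hypothetical sketch, not part of run_prechecks.sh: write_ipmi_file and check_ipmi_file are illustrative names, and the check covers only the IP|USERNAME format, not actual IPMI reachability.

```shell
#!/bin/sh
# Hypothetical helpers for ipmi.txt (format: IP|USERNAME, one entry per line).

# is_ipv4 ADDR: succeeds if ADDR is a dotted quad with every octet <= 255.
is_ipv4() {
  printf '%s\n' "$1" | grep -Eq '^([0-9]{1,3}\.){3}[0-9]{1,3}$' || return 1
  printf '%s\n' "$1" | awk -F. '{ for (i = 1; i <= 4; i++) if ($i + 0 > 255) exit 1 }'
}

# write_ipmi_file FILE ENTRY...: writes one entry per line, overwriting FILE.
write_ipmi_file() {
  local file="$1"
  shift
  printf '%s\n' "$@" > "$file"
}

# check_ipmi_file FILE: succeeds only if every non-empty line is IP|USERNAME.
check_ipmi_file() {
  local file="$1" ip user count=0
  while IFS='|' read -r ip user || [ -n "$ip" ]; do
    [ -z "$ip" ] && continue
    if ! is_ipv4 "$ip" || [ -z "$user" ]; then
      echo "expected IP|USERNAME in $file, got: $ip|$user" >&2
      return 1
    fi
    # read puts any remaining "|field" into user, so reject extra fields.
    case $user in
      *'|'*) echo "unexpected extra field for $ip in $file" >&2; return 1 ;;
    esac
    count=$((count + 1))
  done < "$file"
  [ "$count" -gt 0 ] || { echo "no entries in $file" >&2; return 1; }
}
```

For example: write_ipmi_file ipmi.txt '10.10.10.10|admin' '11.11.11.11|admin' && check_ipmi_file ipmi.txt. The single quotes are required because an unquoted | would be interpreted by the shell as a pipe.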
3.0 Handling Upgrade Issues Related to Attestation Value (Non-SGX Only)
If you are upgrading to a version lower than Fortanix DSM 4.31 and using a non-SGX build, check the value of the config-values parameter using the following command:
sdkms-cluster get config --system
If the attestation value is ias, follow the workaround outlined below. For more information, contact the Fortanix Support team.
Perform the following steps:
1. Add the following parameter to the config.yaml file to update the attestation value:
attestation: null
2. Run the following deploy command to apply the updated configuration:
sdkms-cluster --config config.yaml --stage DEPLOY
3. Monitor the deployment job and ensure that all pods are running.
4. Run the following command to view the status of the pods:
kubectl get pods -A -owide
5. Run the following command to verify the attestation value in the config-values file:
kubectl get cm config-values -o yaml
The output of the command must contain the following (~ is the YAML representation of null):
apiVersion: v1
data:
  config-values: |-
    ---
    global:
      attestation: ~
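The checks in this section can be scripted with a few text filters. The sketch below is hypothetical: the function names are illustrative, attestation_value assumes the key appears as attestation: VALUE on its own line (as in the expected output above), and pods_not_ready assumes STATUS is the fourth column of the kubectl get pods -A output.

```shell
#!/bin/sh
# Hypothetical filters for verifying the attestation workaround.

# attestation_value: reads YAML on stdin and prints the value of the first
# "attestation:" key (for example "~", "null", or "ias").
attestation_value() {
  awk '$1 == "attestation:" { print $2; exit }'
}

# needs_attestation_workaround: succeeds when the value on stdin is "ias".
needs_attestation_workaround() {
  [ "$(attestation_value)" = "ias" ]
}

# attestation_cleared: succeeds when the value on stdin is YAML null.
attestation_cleared() {
  case $(attestation_value) in
    '~'|null) return 0 ;;
    *) return 1 ;;
  esac
}

# pods_not_ready: reads "kubectl get pods -A" output (header included) and
# prints namespace/name and STATUS for pods not Running or Completed.
# Assumption: STATUS is the fourth whitespace-separated column.
pods_not_ready() {
  awk 'NR > 1 && $4 != "Running" && $4 != "Completed" { print $1 "/" $2 ": " $4 }'
}
```

For example, kubectl get cm config-values -o yaml | attestation_cleared && echo "attestation cleared", and kubectl get pods -A | pods_not_ready, which prints nothing once every pod is Running or Completed. If the output of sdkms-cluster get config --system also shows attestation: VALUE on its own line (an assumption), it can be piped into needs_attestation_workaround in the same way.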
4.0 Checks Handled by the Script
Check Name | Check Type | Purpose |
---|---|---|
HEALTH_CHECK_QUORUM | NODE | Checks the health of the sdkms and Cassandra pods, the script |
HEALTH_CHECK_ALL | NODE | Checks the health of the sdkms and Cassandra pods, the script When we send |
DISK_CHECK | NODE | Checks the |
NTP_CHECK | NODE | The script |
API_CERT | NODE | Checks for the availability of |
KUBELET_CERT | NODE | Checks for the If it is pointing to some other file in If |
DOCKER_REGISTRY_SRVC_CHECK | NODE | Checks whether the docker-registry service is up and running. |
KUBELET_SRVC_CHECK | NODE | Checks whether the kubelet service is up and running. |
CRI_SRVC_CHECK | NODE | Checks whether the docker or containerd service is up and running. |
1M_CPU_CHECK | NODE | Checks whether the 1-minute CPU load average is less than the defined threshold value (90%). |
5M_CPU_CHECK | NODE | Checks whether the 5-minute CPU load average is less than the defined threshold value (90%). |
15M_CPU_CHECK | NODE | Checks whether the 15-minute CPU load average is less than the defined threshold value (90%). |
MEM_CHECK | NODE | Checks whether the memory utilization is less than the defined threshold value (75%). |
DB_FILES_PERM_CHECK | NODE | Checks whether the file permissions of |
CONN_Q_CHECK | NODE | Checks the number of public connections for each node by querying the Metrics API; if the value is greater than 2000, it flags it as Checks the backend queue size by querying the Metrics API; if the value is greater than 10000, it flags it as |
CAS_ACCT_CHECK | NODE | Checks the count of rows in |
CAS_ADMIN_ACCT_CHECK | NODE | Checks the count of sysadmin accounts. If the count is <= 1 then |
SGX_CHECK | NODE | Checks whether DSM pod is running “enclave” process and also whether the machine has If driver is present and sdkms pod is not running “enclave” process, then report Flag it as |
NODE_CHECK | CLUSTER | Checks whether all the nodes are in ready status |
PODS_CHECK | CLUSTER | Checks whether all pods are in Running or Completed status, if not, then flag them as |
JOB_CHECK | CLUSTER | Checks whether all jobs are in “Complete” status; if not, it flags them as The running jobs are flagged because before you start the upgrade, all jobs must be in completed status. |
CONTAINER_CHECK | CLUSTER | Checks whether all containers within pods are in a |
IPMI_INFO_CHK | NODE | Verifies that the IPMI IP address, default gateway MAC address, and default gateway IP address are valid; if not, it flags them as |
CAS_REP_CHECK | CLUSTER | Checks whether the added DC labeling matches the replication strategy in Cassandra (this validation applies to Network Topology and fully replicated clusters). If there is any mismatch, it flags it as If the DC labeling is present but the strategy in Cassandra is Simple Strategy, it flags it as |
CAS_CERT_CHECK | NODE | Checks whether the Cassandra pod certificate is valid; if not, it flags it as |
REPLICA_CHECK | CLUSTER | Checks the and compares that with |
BIOS_VER_CHECK | NODE | Verifies that the node has the latest BIOS installed; if not, it flags it as |
IMAGE_VERSION_CHECK | CLUSTER | Validates the version value from If there is any mismatch, then it flags it as |
BACKUP_SETUP_CHECK | CLUSTER | Checks if the CRON job |
KUBEAPI_IP_CHECK | CLUSTER | Checks whether the |
SWDIST_CHECK | CLUSTER | Checks the count of directories in the path |
CAS_NODETOOL_CHECK | CLUSTER | Executes the nodetool status command and checks whether there are any nodes which are not of the pattern Checks whether the count of pattern |
ETCD_HEALTH_CHECK | CLUSTER | Verifies whether the etcd cluster is healthy; if not, it flags it as |
SWDIST_OVERLAY_SRVC_CHECK | NODE | Verifies the status of |
PERM_DAEMON_SRVC_CHECK | NODE | Verifies the status of |
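The CPU and memory rows compare a measured value against a fixed threshold (90% and 75%). The sketch below shows one way such comparisons could be made on a Linux node; it is not the logic of check-dsm-health.sh. It assumes that the CPU percentage is the load average divided by the CPU count reported by nproc, and that memory utilization is (MemTotal - MemAvailable) / MemTotal from /proc/meminfo. The OK/WARN lines it prints are illustrative and do not reproduce the script's output format.

```shell
#!/bin/sh
# Illustrative threshold comparisons; the formulas below are assumptions.
CPU_THRESHOLD=90   # percent, from the 1M/5M/15M_CPU_CHECK rows
MEM_THRESHOLD=75   # percent, from the MEM_CHECK row

# load_pct LOAD NCPU: load average as a whole-number percentage of CPU capacity.
load_pct() {
  awk -v load="$1" -v ncpu="$2" 'BEGIN { printf "%.0f\n", load / ncpu * 100 }'
}

# mem_used_pct TOTAL_KB AVAILABLE_KB: used memory as a whole-number percentage.
mem_used_pct() {
  echo $(( ($1 - $2) * 100 / $1 ))
}

# below VALUE THRESHOLD: succeeds only if VALUE is strictly less than THRESHOLD.
below() {
  awk -v v="$1" -v t="$2" 'BEGIN { exit !(v + 0 < t + 0) }'
}

# report NAME VALUE THRESHOLD: prints an illustrative OK or WARN line.
report() {
  if below "$2" "$3"; then echo "$1 OK ($2%)"; else echo "$1 WARN ($2%)"; fi
}

# check_node_resources: reads the live values from /proc on a Linux node.
check_node_resources() {
  local ncpu l1 l5 l15 rest total avail
  ncpu=$(nproc)
  read -r l1 l5 l15 rest < /proc/loadavg
  report 1M_CPU "$(load_pct "$l1" "$ncpu")" "$CPU_THRESHOLD"
  report 5M_CPU "$(load_pct "$l5" "$ncpu")" "$CPU_THRESHOLD"
  report 15M_CPU "$(load_pct "$l15" "$ncpu")" "$CPU_THRESHOLD"
  total=$(awk '/^MemTotal:/ { print $2 }' /proc/meminfo)
  avail=$(awk '/^MemAvailable:/ { print $2 }' /proc/meminfo)
  report MEM "$(mem_used_pct "$total" "$avail")" "$MEM_THRESHOLD"
}
```

Running check_node_resources prints one line per check. A value exactly equal to the threshold is reported as WARN, because these checks pass only when the value is less than the threshold.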
5.0 Fortanix DSM Prechecks Output Status
This section describes the three types of precheck output status:
OK: The check succeeded; no action is required.
WARN: The check requires attention. The user must review the logs created on the node in /tmp/health_checks/ and share the log details with the Fortanix Support team.
SKIPPED: The check was not executed because of an issue in the cluster.
For example, the Cassandra Replication Strategy check must fetch details from the Cassandra pod, so it requires Cassandra to be healthy; if the pod is not healthy, the check is SKIPPED.
Therefore, the user must check the logs to understand why a check was skipped and share the log details with the Fortanix Support team.
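Because both WARN and SKIPPED results call for sharing logs with the Fortanix Support team, it can be convenient to package the log directory into a single archive. A minimal sketch, assuming the logs are under /tmp/health_checks as described above; bundle_precheck_logs is a hypothetical helper, and the archive name format is arbitrary.

```shell
#!/bin/sh
# Hypothetical helper: archive the precheck logs for a Support request.
# Assumption: logs are stored under /tmp/health_checks unless SRC is given.

# bundle_precheck_logs [SRC] [OUT_DIR]: creates a .tar.gz of SRC in OUT_DIR
# and prints the archive path.
bundle_precheck_logs() {
  local src="${1:-/tmp/health_checks}" out_dir="${2:-.}" archive
  if [ ! -d "$src" ]; then
    echo "log directory not found: $src" >&2
    return 1
  fi
  archive="$out_dir/dsm-prechecks-$(uname -n)-$(date +%Y%m%d%H%M%S).tar.gz"
  # -C stores paths relative to the parent directory, e.g. health_checks/logs/...
  tar -czf "$archive" -C "$(dirname "$src")" "$(basename "$src")" || return 1
  echo "$archive"
}
```

For example, bundle_precheck_logs /tmp/health_checks /root prints the path of the archive it created, which can then be attached to the Support request.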