Fortanix Data Security Manager Software Pre-Upgrade Checks - Manual

1.0 Introduction

This article describes the prechecks that the System Administrator must perform before a Fortanix DSM software upgrade.

2.0 Prechecks

The run_prechecks.sh script provides three options during execution, and the user can select the most suitable one by entering the corresponding input number.

  1. Remote – It executes the /opt/fortanix/sdkms/bin/check-dsm-health.sh script remotely on all the cluster nodes and fetches the related files, formats them, and displays the status on the screen.

  2. Local – It executes the /opt/fortanix/sdkms/bin/check-dsm-health.sh script locally on the same machine.

  3. IPMI – It verifies the Intelligent Platform Management Interface (IPMI) connectivity.

2.1 Option 1 - Remote

Perform the following steps to execute the Fortanix DSM precheck script remotely on all the nodes:

  1. Run the following command to go to the dsm_prechecks directory:

    sudo su
    cd /opt/fortanix/sdkms/bin/dsm_prechecks
  2. Run the following command to create the config.txt file:

    kubectl get no --no-headers -owide | awk '{print $6}' > config.txt
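
    The awk filter captures the INTERNAL-IP column of the kubectl get nodes -owide output, so config.txt contains one node IP address per line, for example (illustrative addresses):

    10.10.10.10
    11.11.11.11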
  3. In the parameters.txt file, update the values for the following parameters (a filled-in example is shown after the parameter descriptions below):

    REMOTE_USER=""
    AUTH_TYPE="PASSWORD" or "PRIVATE_KEY"
    PRIVATE_KEY_FILE=""
    NO_PASSWORD="true" or "false"

    Where,

    • REMOTE_USER refers to the name of the user. For example, administrator.

    • AUTH_TYPE can be either PASSWORD or PRIVATE_KEY.

    • PRIVATE_KEY_FILE refers to the path of the private key. This is applicable only if AUTH_TYPE is set to PRIVATE_KEY.

    • NO_PASSWORD can be either true or false, depending on whether the remote user's sudo profile allows passwordless sudo.
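
    For example, a parameters.txt configured for password-based authentication as the user administrator (all values are illustrative):

    REMOTE_USER="administrator"
    AUTH_TYPE="PASSWORD"
    PRIVATE_KEY_FILE=""
    NO_PASSWORD="false"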

  4. Run the following command to execute the Fortanix DSM prechecks:

    ./run_prechecks.sh
  5. Enter 1 to select "remote".
    The script prompts for a password only if AUTH_TYPE="PASSWORD" was selected in Step 3.

    NOTE

    Create a node_password.txt file if you want to provide the passwords through a file, in the format IP|PASSWORD.
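
    For example, a node_password.txt with one entry per node (illustrative values):

    10.10.10.10|examplePassword1
    11.11.11.11|examplePassword2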

  6. The user can check the detailed Fortanix DSM precheck logs in the following directory created by the script:

    /tmp/health_checks/remote_logs

2.2 Option 2 - Local

Perform the following steps to execute the Fortanix DSM precheck script locally:

  1. Run the following command to execute the Fortanix DSM prechecks:

    sudo ./run_prechecks.sh
  2. Enter 2 to select "local".

  3. The user can check the detailed Fortanix DSM precheck logs in the following directory created by the script:

    /tmp/health_checks/logs

2.3 Option 3 - IPMI

Perform the following steps to verify the Intelligent Platform Management Interface (IPMI) connectivity:

NOTE

Ensure that the IPMI interfaces are on the same network as the node running the script.

  1. Run the following command to go to the dsm_prechecks directory:

    cd /opt/fortanix/sdkms/bin/dsm_prechecks
  2. Run the following command to create the ipmi.txt file:

    vi ipmi.txt
  3. Add the IPMI IP address and username for each node in the ipmi.txt file, one entry per line, in the following format:

    10.10.10.10|admin
    11.11.11.11|admin
  4. Run the following command to execute the Fortanix DSM prechecks:

    sudo ./run_prechecks.sh
  5. Enter 3 to select "ipmi connectivity check".
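
To spot-check a single IPMI interface manually outside the script, a minimal sketch using ipmitool (assuming ipmitool is installed on the node; the address and username are the illustrative values from ipmi.txt, and <password> is a placeholder):

    ipmitool -I lanplus -H 10.10.10.10 -U admin -P '<password>' chassis status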

3.0 Handling Upgrade Issues Related to Attestation Value (Non-SGX Only)

If you are upgrading to a version lower than Fortanix DSM 4.31 and using a non-SGX build, check the attestation value in the system configuration using the following command:

sdkms-cluster get config --system

If the attestation value is ias, follow the workaround outlined below. For more information, contact the Fortanix Support Team.
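
To isolate the attestation value from the command output, you can filter it; a simple example (assuming the output contains a line with the attestation key):

sdkms-cluster get config --system | grep -i attestation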

Perform the following steps:

  1. Add the following parameter in the config.yaml file to update the attestation value:

    attestation: null
  2. Run the following deploy command to apply the updated configurations:

    sdkms-cluster deploy --config config.yaml --stage DEPLOY

    Monitor the deployment job and ensure all pods are running.

  3. Run the following command to view the status of the pods:

    kubectl get pods -A -owide
  4. Run the following command to verify the attestation value in the config-values file:

    kubectl get cm config-values -o yaml

    The output must contain the following (the value ~ denotes YAML null, which matches the attestation: null setting from Step 1):

    apiVersion: v1
    data:
      config-values: |-
        ---
        global:
          attestation: ~

4.0 Checks Handled by the Script

Each entry below lists the check name, its type (NODE or CLUSTER), and its purpose.

HEALTH_CHECK_QUORUM (NODE)
  Checks the health of the sdkms and Cassandra pods. The check-dsm-health.sh script makes the API call sys/v1/health?consistency=quorum.

HEALTH_CHECK_ALL (NODE)
  Checks the health of the sdkms and Cassandra pods. The check-dsm-health.sh script makes the API call sys/v1/health?consistency=all. Sending "all" means the client expects a response from all replicas (for a 3-node fully replicated cluster, a response from 3 Cassandra pods is considered successful).

DISK_CHECK (NODE)
  Checks the "/var", "/", and "/data" directory usage. The threshold is 70%; if disk usage is higher than that, the status shows "WARN".

NTP_CHECK (NODE)
  The check-dsm-health.sh script executes the ntpq -p command and checks whether the node is syncing with at least one NTP server.

API_CERT (NODE)
  Checks the availability and expiry date of the /etc/kubernetes/pki/apiserver.crt certificate. If the certificate expires in less than 30 days, it is flagged as "WARN".

KUBELET_CERT (NODE)
  Checks the /etc/kubernetes/kubelet.conf file and verifies whether the certificate is embedded; if yes, it checks the expiry of the certificate-data. If kubelet.conf points to another file in /var/lib/kubelet/pki, it checks the expiry of that certificate. If kubelet.conf points to /var/lib/kubelet/pki/kubelet-cert-current.pem, no action is required.

DOCKER_REGISTRY_SRVC_CHECK (NODE)
  Checks whether the docker-registry service is up and running.

KUBELET_SRVC_CHECK (NODE)
  Checks whether the kubelet service is up and running.

CRI_SRVC_CHECK (NODE)
  Checks whether the docker or containerd service is up and running.

1M_CPU_CHECK (NODE)
  Checks whether the 1-minute CPU load average is less than the defined threshold value (90%).

5M_CPU_CHECK (NODE)
  Checks whether the 5-minute CPU load average is less than the defined threshold value (90%).

15M_CPU_CHECK (NODE)
  Checks whether the 15-minute CPU load average is less than the defined threshold value (90%).

MEM_CHECK (NODE)
  Checks whether the memory utilization is less than the defined threshold value (75%).

DB_FILES_PERM_CHECK (NODE)
  Checks whether the file permissions of /data/cassandra/public are valid (user 999 and group docker); if not, it is reported as "WARN".

CONN_Q_CHECK (NODE)
  Checks the number of public connections for each node by querying the Metrics API; if the value is greater than 2000, it is flagged as "WARN", else "OK". Also checks the backend queue size by querying the Metrics API; if the value is greater than 10000, it is flagged as "WARN", else "OK".

CAS_ACCT_CHECK (NODE)
  Checks that the row counts of the account and account_primary tables match; if not, it is flagged as "WARN", else "OK".

CAS_ADMIN_ACCT_CHECK (NODE)
  Checks the count of sysadmin accounts. If the count is <= 1, it is flagged as "WARN", else "OK".

SGX_CHECK (NODE)
  Checks whether the DSM pod is running the "enclave" process and whether the machine has the /dev/*sgx driver; if both, it reports "SGX SDKMS POD". If the driver is present but the sdkms pod is not running the "enclave" process, it reports "NON SGX SDKMS POD ON SGX MACHINE" and flags it as "WARN", because an SGX deployment is expected.

NODE_CHECK (CLUSTER)
  Checks whether all the nodes are in Ready status (kubectl get nodes).

PODS_CHECK (CLUSTER)
  Checks whether all pods are in Running or Completed status; if not, they are flagged as "WARN".

JOB_CHECK (CLUSTER)
  Checks whether all jobs are in "Complete" status; if not, all running and failed jobs are flagged as "WARN". Running jobs are flagged because all jobs must be in completed status before you start the upgrade.

CONTAINER_CHECK (CLUSTER)
  Checks whether all containers within pods are in a "ready" state; if not, they are flagged as "WARN".

IPMI_INFO_CHK (NODE)
  Verifies that the IPMI IP address, default gateway MAC address, and default gateway IP address are valid; if not, they are flagged as "WARN".

CAS_REP_CHECK (CLUSTER)
  Checks whether the configured DC labeling matches the replication strategy in Cassandra (this validation covers Network Topology and fully replicated clusters); any mismatch is flagged as "WARN". If DC labeling is present but the strategy in Cassandra is Simple Strategy, it is flagged as "WARN" (Simple Strategy is not recommended for production clusters).

CAS_CERT_CHECK (NODE)
  Checks whether the Cassandra pod certificate is valid; if not, it is flagged as "WARN".

REPLICA_CHECK (CLUSTER)
  Checks the "replicas" field value in the configuration map using the command sdkms-cluster get config --system and compares it with the "replicas" of the deployment and statefulset. If they do not match, it is flagged as "WARN", else "OK".

BIOS_VER_CHECK (NODE)
  Verifies that the node has the latest BIOS installed; if not, it is flagged as "WARN".

IMAGE_VERSION_CHECK (CLUSTER)
  Validates the version value in the /etc/fortanix/sdkms_version/sdkms_version file against the image version of the SDKMS/SDKMS-PROXY/SDKMS-UI deployments. Any mismatch is flagged as "WARN". (This check is also useful after an upgrade or cluster creation.)

BACKUP_SETUP_CHECK (CLUSTER)
  Checks whether the sdkms-backup CRON job exists; if it is not present, it is flagged as "WARN".

KUBEAPI_IP_CHECK (CLUSTER)
  Checks whether the kube-apiserver manifest file has the correct IP address; if not, it is flagged as "WARN".

SWDIST_CHECK (CLUSTER)
  Checks that the count of directories in /var/opt/fortanix/swdist/data/, the Swdist endpoints, and the versions file (/var/opt/fortanix/swdist/versions) match.

CAS_NODETOOL_CHECK (CLUSTER)
  Executes the nodetool status command and checks whether any nodes do not show the "UN" (Up and Normal) state. Also checks whether the count of "UN" entries matches the Cassandra pod count.

ETCD_HEALTH_CHECK (CLUSTER)
  Verifies whether the etcd cluster is healthy; if not, it is flagged as "WARN".

SWDIST_OVERLAY_SRVC_CHECK (NODE)
  Verifies the status of the /var/opt/fortanix/swdist_overlay service.

PERM_DAEMON_SRVC_CHECK (NODE)
  Verifies the status of the perm_daemon.service service.
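
Many of these checks can be reproduced manually when investigating a WARN status. For example, a minimal sketch of the HEALTH_CHECK_QUORUM API call using curl (assuming the Fortanix DSM API is reachable at <cluster-url>, a placeholder; the command prints the HTTP status code):

    curl -sk -o /dev/null -w "%{http_code}\n" "https://<cluster-url>/sys/v1/health?consistency=quorum"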

5.0 Fortanix DSM Prechecks Output Status

This section describes the following three types of precheck output status:

  • OK: The check passed; no action is required.

  • WARN: The check requires attention. Check the logs created on the node under /tmp/health_checks/ and share the log details with the Fortanix Support team.

  • SKIPPED: The check was not executed due to an issue in the cluster.
    For example, the Cassandra Replication Strategy check requires a healthy Cassandra pod to fetch details from; if the pod is not healthy, that check is SKIPPED.
    Check the logs to understand why a check was skipped, and share the log details with the Fortanix Support team.