---
title: "Upgrade Prechecks Using Sensu - Automated"
slug: "fortanix-data-security-manager-upgrade-prechecks-using-sensu-automated"
updated: 2026-04-01T07:30:29Z
published: 2025-09-19T08:34:26Z
---

# Upgrade Prechecks Using Sensu - Automated

## 1.0 Introduction

Welcome to the Fortanix Data Security Manager (DSM) Administration guide. This guide describes the automated pre-upgrade checks performed by a script, so that you can configure a Sensu check and run it through a Sensu agent for real-time monitoring.

## 2.0 Prerequisites

- The automated precheck script, available starting with release 4.16 at the following location: `/opt/fortanix/sdkms/bin/check-dsm-health.sh`
- A Sensu agent running on a Fortanix DSM host, with sudo permission to run `/opt/fortanix/sdkms/bin/check-dsm-health.sh`.
- A Sensu backend server to configure checks and monitor alerts.
- A Sensu check configured in the Sensu dashboard to execute the script.

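Before configuring Sensu, it can help to confirm on each node that the script is actually present. The following is a minimal sketch and not part of the Fortanix tooling: `check_prereqs` is a hypothetical helper that only verifies the script path from the prerequisites exists and is executable.

```bash
# Hypothetical helper: confirm the precheck script is present and executable.
# The default path comes from the prerequisites; the messages are illustrative.
check_prereqs() {
  local script="${1:-/opt/fortanix/sdkms/bin/check-dsm-health.sh}"
  if [[ ! -f "$script" ]]; then
    echo "MISSING: $script"
    return 1
  fi
  if [[ ! -x "$script" ]]; then
    echo "NOT EXECUTABLE: $script"
    return 1
  fi
  echo "FOUND: $script"
}

check_prereqs || echo "Precheck script unavailable; is the node on release 4.16 or later?"
```
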
## 3.0 Checks Handled by the Script

| **Check Name** | **Check Type** | **Purpose** |
| --- | --- | --- |
| **HEALTH_CHECK_QUORUM** | NODE | Checks the health of the sdkms and Cassandra pods. The script `check-dsm-health.sh` makes the API call `sys/v1/health?consistency=quorum`. |
| **HEALTH_CHECK_ALL** | NODE | Checks the health of the sdkms and Cassandra pods. The script `check-dsm-health.sh` makes the API call `sys/v1/health?consistency=all`. With `all`, the client expects a response from all replicas (for a 3-node fully replicated cluster, a response from all 3 Cassandra pods is considered successful). |
| **DISK_CHECK** | NODE | Checks the usage of the `/var`, `/`, and `/data` directories. The threshold is 70%; if disk usage is higher than that, the status is `WARN`. |
| **NTP_CHECK** | NODE | The script `check-dsm-health.sh` executes the `ntpq -p` command and checks whether the node is syncing with at least one NTP server. |
| **API_CERT** | NODE | Checks that the `/etc/kubernetes/pki/apiserver.crt` certificate is available and checks its expiry date. If the certificate expires in less than 30 days, it flags it as `WARN`. |
| **KUBELET_CERT** | NODE | Checks the `kubelet.conf` file (`/etc/kubernetes/kubelet.conf`) and verifies whether the certificate is embedded; if so, it checks the expiry of the embedded certificate data. If the file points to another file in `/var/lib/kubelet/pki`, it checks the expiry of that certificate. If `kubelet.conf` points to `/var/lib/kubelet/pki/kubelet-cert-current.pem`, no action is required. |
| **DOCKER_REGISTRY_SRVC_CHECK** | NODE | Checks whether the docker-registry service is up and running. |
| **KUBELET_SRVC_CHECK** | NODE | Checks whether the kubelet service is up and running. |
| **CRI_SRVC_CHECK** | NODE | Checks whether the docker or containerd service is up and running. |
| **1M_CPU_CHECK** | NODE | Checks whether the 1-minute CPU load average is less than the defined threshold value (90%). |
| **5M_CPU_CHECK** | NODE | Checks whether the 5-minute CPU load average is less than the defined threshold value (90%). |
| **15M_CPU_CHECK** | NODE | Checks whether the 15-minute CPU load average is less than the defined threshold value (90%). |
| **MEM_CHECK** | NODE | Checks whether the memory utilization is less than the defined threshold value (75%). |
| **DB_FILES_PERM_CHECK** | NODE | Checks whether the file permissions of `/data/cassandra/public` are valid (user `999` and group `docker`); if not, it reports `WARN`. |
| **CONN_Q_CHECK** | NODE | Checks the number of public connections for each node using the Metrics API; if the value is greater than 2000, it flags it as `WARN`, else `OK`. Also checks the backend queue size using the Metrics API; if the value is greater than 10000, it flags it as `WARN`, else `OK`. |
| **CAS_ACCT_CHECK** | NODE | Checks the row counts of the `account` and `account_primary` tables; if they do not match, it flags it as `WARN`, else `OK`. |
| **CAS_ADMIN_ACCT_CHECK** | NODE | Checks the count of sysadmin accounts. If the count is <= 1, the status is `WARN`, else `OK`. |
| **SGX_CHECK** | NODE | Checks whether the sdkms pod is running the `enclave` process and whether the machine has the `/dev/*sgx` driver; if both are true, it reports `SGX SDKMS POD`. If the driver is present but the sdkms pod is not running the `enclave` process, it reports `NON SGX SDKMS POD ON SGX MACHINE` and flags it as `WARN`, because an SGX machine is expected to run an SGX deployment. |
| **NODE_CHECK** | CLUSTER | Checks whether all the nodes are in Ready status (`kubectl get nodes`). |
| **PODS_CHECK** | CLUSTER | Checks whether all pods are in Running or Completed status; if not, it flags them as `WARN`. |
| **JOB_CHECK** | CLUSTER | Checks whether all jobs are in Complete status; running and failed jobs are flagged as `WARN`. Running jobs are flagged because all jobs must be complete before you start the upgrade. |
| **CONTAINER_CHECK** | CLUSTER | Checks whether all containers within pods are in a ready state; if not, it flags them as `WARN`. |
| **IPMI_INFO_CHK** | NODE | Verifies that the IPMI IP address, default gateway MAC address, and default gateway IP address are valid; if not, it flags them as `WARN`. |
| **CAS_REP_CHECK** | CLUSTER | Checks whether the added DC labeling matches the replication strategy in Cassandra (this validation applies to Network Topology Strategy and fully replicated clusters); any mismatch is flagged as `WARN`. If DC labeling is present but the strategy in Cassandra is Simple Strategy, it is also flagged as `WARN` (this is not recommended for production clusters). |
| **CAS_CERT_CHECK** | NODE | Checks whether the Cassandra pod certificate is valid; if not, it flags it as `WARN`. |
| **REPLICA_CHECK** | CLUSTER | Checks the `replicas` field value in the configuration map using the command `sdkms-cluster get config --system` and compares it with the `replicas` value of the deployment and statefulset. If they do not match, it flags it as `WARN`, else `OK`. |
| **BIOS_VER_CHECK** | NODE | Verifies that the node has the latest BIOS installed; if not, it flags it as `WARN`. |
| **IMAGE_VERSION_CHECK** | CLUSTER | Validates the version value from the `/etc/fortanix/sdkms_version/sdkms_version` file against the image version of the `SDKMS/SDKMS-PROXY/SDKMS-UI` deployments. Any mismatch is flagged as `WARN`. (This check is also useful after an upgrade or cluster creation.) |
| **BACKUP_SETUP_CHECK** | CLUSTER | Checks whether the cron job `sdkms-backup` exists; if it is not present, it flags it as `WARN`. |
| **KUBEAPI_IP_CHECK** | CLUSTER | Checks whether the `kube-apiserver` manifest file has the correct IP address; if not, it flags it as `WARN`. |
| **SWDIST_CHECK** | CLUSTER | Checks whether the count of directories in the path `/var/opt/fortanix/swdist/data/`, the Swdist endpoints, and the versions file (`/var/opt/fortanix/swdist/versions`) match. |
| **CAS_NODETOOL_CHECK** | CLUSTER | Executes the `nodetool status` command and checks whether any nodes are not in the `UN` (Up and Normal) state. Also checks whether the count of `UN` entries matches the Cassandra pod count. |
| **ETCD_HEALTH_CHECK** | CLUSTER | Verifies whether the etcd cluster is healthy; if not, it flags it as `WARN`. |

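To make the threshold semantics of a row such as DISK_CHECK concrete, the sketch below classifies disk usage against the documented 70% limit. It is an illustration only and does not reproduce the logic inside `check-dsm-health.sh`; the function names `classify_usage` and `disk_used_pct` are hypothetical.

```bash
# Illustrative only: this is not the logic inside check-dsm-health.sh.

# classify_usage prints WARN when usage is above the threshold, else OK.
classify_usage() {
  local used_pct="$1" threshold="${2:-70}"
  if (( used_pct > threshold )); then
    echo "WARN"
  else
    echo "OK"
  fi
}

# disk_used_pct prints the usage percentage of the filesystem holding a path,
# read from column 5 of POSIX-format `df -P` output (for example "42%").
disk_used_pct() {
  df -P "$1" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

for mount in /var / /data; do
  pct="$(disk_used_pct "$mount" 2>/dev/null)"
  # Skip paths that do not exist or report no numeric usage.
  [[ "$pct" =~ ^[0-9]+$ ]] || continue
  echo "$mount: ${pct}% $(classify_usage "$pct")"
done
```
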
> [!NOTE]
> NOTE
> 
> The execution time is 6 seconds for node checks and 5.5 seconds for cluster checks.

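The certificate checks in the table (for example API_CERT with its 30-day window) can be reproduced by hand when investigating a `WARN`. The following is a minimal sketch using the standard `openssl x509 -checkend` option; it is not the script's actual implementation, `cert_status` is a hypothetical name, and reporting `WARN` for an unreadable file is an assumption.

```bash
# Illustrative only: not the script's actual implementation.
# cert_status prints OK if the certificate stays valid for at least
# `days` more days, and WARN if it expires sooner or cannot be read
# (treating an unreadable file as WARN is an assumption).
cert_status() {
  local cert="$1" days="${2:-30}"
  if [[ ! -r "$cert" ]]; then
    echo "WARN"
    return
  fi
  # -checkend N exits 0 when the certificate is still valid N seconds from now.
  if openssl x509 -in "$cert" -noout -checkend "$(( days * 86400 ))" >/dev/null 2>&1; then
    echo "OK"
  else
    echo "WARN"
  fi
}

cert_status /etc/kubernetes/pki/apiserver.crt
```

Reading files under `/etc/kubernetes/pki` typically requires root, so run the helper with the same privileges the Sensu agent uses.
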
## 4.0 Execution Using Script

The node level and cluster level checks can be executed using the following commands:

```bash
./check-dsm-health.sh --node [info|monitor] [--ignore-checks=check1,check2,...]
./check-dsm-health.sh --cluster [info|monitor] [--ignore-checks=check1,check2,...]
```

Where,

- The first parameter, `--node` or `--cluster`, determines which type of checks is executed.
- The second parameter takes one of two values:
  - `info`: Turns off alerting and only publishes the check status.
  - `monitor`: Turns on alerting; if any check has WARN status, it generates an alert.
- The third parameter, `--ignore-checks`, is optional and takes a comma-separated list of check names. Use it for known issues that require time to resolve: checks in the ignore list do not trigger alerts.

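The three parameters combine as sketched below. `build_health_cmd` is a hypothetical wrapper, not something shipped with Fortanix DSM; it only validates the first two parameters and assembles the command line, joining any check names into the comma-separated ignore list.

```bash
# Hypothetical wrapper: assemble the check-dsm-health.sh command line.
# Usage: build_health_cmd <node|cluster> <info|monitor> [CHECK_NAME...]
build_health_cmd() {
  local scope="$1" mode="$2"
  shift 2
  case "$scope" in
    node|cluster) ;;
    *) echo "scope must be node or cluster" >&2; return 2 ;;
  esac
  case "$mode" in
    info|monitor) ;;
    *) echo "mode must be info or monitor" >&2; return 2 ;;
  esac

  local cmd="sudo /opt/fortanix/sdkms/bin/check-dsm-health.sh --$scope $mode"
  if (( $# > 0 )); then
    # Join the remaining arguments with commas for --ignore-checks.
    local IFS=,
    cmd+=" --ignore-checks=$*"
  fi
  echo "$cmd"
}

build_health_cmd node monitor NTP_CHECK BIOS_VER_CHECK
```

The printed string can be pasted into the command field of a Sensu check, or passed to `eval` for a manual run.
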
## 5.0 Check Status

Status can take one of three values: **OK**, **WARN**, or **SKIPPED**.

- **OK:** The check was successful and no action is required.
- **WARN:** The check requires attention; review the logs created on the node in `/tmp/health_checks/logs`.
- **SKIPPED:** The check was not executed because of an issue in the cluster. For instance, the Cassandra replication strategy check needs a healthy Cassandra pod to fetch details from; if the pod is not healthy, the check is **SKIPPED**. Check the logs to understand why the check was skipped.

> [!NOTE]
> NOTE
> 
> Logs older than seven days will be cleaned up by the script.

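When a check reports WARN or SKIPPED, the most recent log files are usually the relevant ones. The sketch below lists logs modified within the last seven days, newest first. `recent_health_logs` is a hypothetical helper, the log file naming is not assumed, and `-printf` requires GNU `find`.

```bash
# Illustrative helper: list health-check logs modified within the last N days,
# newest first. The default directory is the one named for WARN status; the
# seven-day default mirrors the script's cleanup window.
recent_health_logs() {
  local dir="${1:-/tmp/health_checks/logs}" days="${2:-7}"
  if [[ ! -d "$dir" ]]; then
    echo "no log directory at $dir" >&2
    return 1
  fi
  # Prefix each path with its modification time, sort descending, drop the prefix.
  find "$dir" -type f -mtime "-$days" -printf '%T@ %p\n' | sort -rn | cut -d' ' -f2-
}

recent_health_logs || true
```
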
## 6.0 Examples

Configure the check on the Sensu dashboard: click **Configuration** → **Checks** → **New**.

Execute the node check with the `monitor` parameter using the following command:

```bash
sudo /opt/fortanix/sdkms/bin/check-dsm-health.sh --node monitor
```

Sensu dashboard snapshot:

![SENSU_DASHBOARD__NODE_CHECKS_.png](https://cdn.us.document360.io/c3bd85d2-4ad8-4d85-9f60-f1c168a3aad9/Images/Documentation/14456300480404.png)

**Figure 1: Sensu Dashboard [Node Checks]**

Execute the cluster check with the `info` parameter using the following command:

```bash
sudo /opt/fortanix/sdkms/bin/check-dsm-health.sh --cluster info
```

Configure a separate check to run in round robin, because these checks are cluster-wide and any one node can execute them.

![Picture4.png](https://cdn.us.document360.io/c3bd85d2-4ad8-4d85-9f60-f1c168a3aad9/Images/Documentation/14456252229268.png)

![](https://cdn.us.document360.io/c3bd85d2-4ad8-4d85-9f60-f1c168a3aad9/Images/Documentation/image(6).png)

**Figure 2: Round Robin Cluster Checks**

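For environments that manage checks from the command line, the round-robin check could also be registered with `sensuctl`. This sketch assumes Sensu Go and its `sensuctl` CLI, which this guide does not otherwise cover; the check name `dsm-cluster-precheck`, the subscription `dsm-nodes`, and the 300-second interval are hypothetical values. By default the function only prints the command so it can be reviewed first.

```bash
# Sketch under assumptions: Sensu Go with sensuctl already configured.
# The check name, subscription, and interval are hypothetical.
create_cluster_check() {
  local action="${1:-print}"
  local cmd=(
    sensuctl check create dsm-cluster-precheck
    --command "sudo /opt/fortanix/sdkms/bin/check-dsm-health.sh --cluster info"
    --interval 300
    --subscriptions dsm-nodes
    --round-robin
  )
  if [[ "$action" == "run" ]]; then
    # Execute the command against the configured Sensu backend.
    "${cmd[@]}"
  else
    # Print a shell-quoted version of the command for review.
    printf '%q ' "${cmd[@]}"
    printf '\n'
  fi
}

create_cluster_check print
```
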
If required, add checks to the ignore list as follows:

```bash
sudo /opt/fortanix/sdkms/bin/check-dsm-health.sh --node monitor --ignore-checks=NTP_CHECK
```

Fortanix Data Security Manager (DSM) is the world’s first cloud service secured with Intel® SGX. With Fortanix DSM, you can securely generate, store, and use cryptographic keys and certificates, as well as other secrets such as passwords, API keys, tokens, or any blob of data. Your business-critical applications and containers can integrate with Fortanix DSM using legacy cryptographic interfaces (PKCS#11, CNG, and JCE) or using the native Fortanix DSM RESTful interface.

## Related

- [Port Requirements](/fortanix-data-security-manager-port-requirements.md)
- [Software Upgrade](/fortanix-data-security-manager-software-upgrade.md)
- [Software Pre-Upgrade Checks - Manual](/fortanix-data-security-manager-software-pre-upgrade-checks-manual.md)
