---
title: "Cluster Management Quick Reference"
slug: "fortanix-data-security-manager-cluster-management-quick-reference"
updated: 2026-04-01T07:32:51Z
published: 2026-03-05T05:32:29Z
---

# Cluster Management Quick Reference

## 1.0 Introduction

This article helps you assess the health of a **Fortanix Data Security Manager (DSM)** cluster. It covers prechecks, commonly used commands, and troubleshooting steps for known issues in the Fortanix DSM cluster environment.

This quick reference guide is intended for technical stakeholders who set up and manage Fortanix DSM clusters.

## 2.0 Fortanix DSM Cluster Management Commands

The following table lists the commands used to manage a Fortanix DSM cluster:

> [!NOTE]
> NOTE
> 
> Before executing `kubectl` commands, ensure the `admin.conf` file is loaded using the command:
> 
> ```bash
> export KUBECONFIG=/etc/kubernetes/admin.conf
> ```

| **TASK** | **COMMAND** |
| --- | --- |
| Verify nodes and pods status | Non-root users: `sudo -E kubectl get nodes,pods -owide` Root users: `kubectl get nodes,pods -owide` |
| Verify pods status in `kube-system` namespace | `sudo -E kubectl get pods -n kube-system` |
| List only the Cassandra pods or only the Fortanix DSM pods | - Cassandra: `sudo -E kubectl get pods -l app=cassandra -owide` - Fortanix DSM: `sudo -E kubectl get pods -l app=sdkms -owide` |
| Capture pod logs | `sudo -E kubectl logs pod_name -f` |
| Capture pod logs of different namespace | `sudo -E kubectl logs pod_name -n namespace_name` |
| Open a shell in the pod | `sudo -E kubectl exec -it pod_name -- bash` |
| Label the nodes | `sudo -E kubectl label node nodename key=value` |
| Verify `nodetool` status | `sudo -E kubectl exec -it cassandra-0 -- nodetool status` |
| Verify replication strategy | `sudo -E kubectl exec cassandra-0 -- cqlsh -e "select * from system_schema.keyspaces where keyspace_name = 'public';"` |
| Check the current system configuration | `sdkms-cluster get config --system` |
| Check the initial system configuration | `sdkms-cluster get config --user` |
| Create and list `kubeadm` tokens | - `kubeadm token create` - `kubeadm token list` |
| Create a cluster | `sdkms-cluster create --self=self_ip_address --config config.yaml` Where `self_ip_address` is the IP address of the node. |
| Join node to a cluster | `sdkms-cluster join --peer=ip_address --token= --self=self_ip_address` |
| Join node to a cluster with DC labeling | Increment replication factor by 1 and run the following command: `sdkms-cluster join --peer=ip_address --token= --self=self_ip_address --label datacenter=dc_label` |
| Initiate cluster join with DC labeling | `sdkms-cluster join --peer=ip_address --token= --self=self_ip_address --label datacenter=""` |
| Reset the cluster | `sdkms-cluster reset --delete-data --reset-iptables` **NOTE:** Do not run this command if any active node is associated with the cluster. |
| Remove the node from the cluster | `sdkms-cluster remove --force --node nodename` **NOTE:** Select the appropriate node to remove from the active cluster. |
| Remove the node from the cluster with DC labeling | `sudo sdkms-cluster remove --node <node-name> --force` Reduce the replication factor by 1 after removing the node. |
| Re-deploy the cluster after modifying the configuration file (`config.yaml`) | `sdkms-cluster deploy --config config.yaml --stage DEPLOY` |
| Perform Fortanix DSM pods rolling restart | Run the `/opt/fortanix/sdkms/bin/dsm_backend_rolling_restart.sh` script to restart the Fortanix DSM pods. |
| View all cronjobs | `sudo -E kubectl get cronjobs` |
| Disable all cronjobs | `sudo -E kubectl get cj --no-headers \| awk '{print $1}' \| while read name; do sudo -E kubectl patch cronjob $name -p '{"spec": {"suspend": true}}'; done` |
| Enable all cronjobs | `sudo -E kubectl get cj --no-headers \| awk '{print $1}' \| while read name; do sudo -E kubectl patch cronjob $name -p '{"spec": {"suspend": false}}'; done` |

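The two cronjob rows in the table above differ only in the `suspend` value, so they can be folded into one helper. The following is a minimal sketch, not a Fortanix-provided tool: the function name `set_cronjobs_suspended` is hypothetical, and it calls `kubectl` directly, so non-root users would need to adapt it to the `sudo -E kubectl` form used in the table.

```bash
# Sketch: suspend or resume every CronJob in the current namespace.
# set_cronjobs_suspended is a hypothetical helper name, not part of Fortanix DSM.
set_cronjobs_suspended() {
    local state="$1"    # expected value: true or false
    case "$state" in
        true|false) ;;
        *) echo "usage: set_cronjobs_suspended true|false" >&2; return 2 ;;
    esac
    # "-o name" prints one "cronjob.batch/<name>" entry per line.
    kubectl get cronjobs -o name | while read -r cj; do
        kubectl patch "$cj" -p "{\"spec\": {\"suspend\": $state}}"
    done
}

# Example: set_cronjobs_suspended true    (disable all cronjobs)
# Example: set_cronjobs_suspended false   (enable all cronjobs)
```

Building the JSON patch from plain ASCII double quotes avoids the quoting problems that appear when the one-liners are copied from a rendered page.
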
## 3.0 Stop or Restart a Fortanix DSM Cluster (Safe Shutdown)

This section describes how to safely stop a Fortanix DSM cluster node running in a Kubernetes environment.

Perform the following steps to ensure a safe and controlled shutdown of a Fortanix DSM node in the Kubernetes cluster:

1. Run the following command to prevent new pods from being scheduled on the node before shutdown:

```bash
kubectl cordon <node-name>
```

Here, `<node-name>` refers to the name of the Fortanix DSM node.

> [!NOTE]
> NOTE
> 
> Ensure that the cluster has met global quorum. Removing the node should not impact services.

This marks the node as unschedulable, ensuring that no new workloads are assigned.
2. Run the following command to safely move workloads from the node before shutdown:

```bash
kubectl drain <node-name> --ignore-daemonsets
```

Here, `<node-name>` refers to the name of the Fortanix DSM node.
3. Run the shutdown command to shut down the node:

```bash
sudo shutdown -h now
```

> [!NOTE]
> NOTE
> 
> - For hardware DSM machines, ensure Intelligent Platform Management Interface (IPMI) access is available in case of issues bringing the server online.
> - For virtual machines (VMs) hosted on ESXi/vSphere, web console access is required to power on the machine.

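The cordon and drain steps above can be wrapped so that the drain only runs if the cordon succeeded. This is a minimal sketch under stated assumptions: `prepare_node_for_shutdown` is a hypothetical helper name, it calls `kubectl` directly, and it deliberately leaves the final `sudo shutdown -h now` to the operator on the node itself.

```bash
# Sketch: cordon and drain a node ahead of a planned shutdown.
# prepare_node_for_shutdown is a hypothetical helper name.
prepare_node_for_shutdown() {
    local node="$1"
    if [ -z "$node" ]; then
        echo "usage: prepare_node_for_shutdown <node-name>" >&2
        return 2
    fi
    # Mark the node unschedulable so no new pods land on it.
    kubectl cordon "$node" || return 1
    # Evict workloads; DaemonSet-managed pods are left in place.
    kubectl drain "$node" --ignore-daemonsets || return 1
    echo "Node $node is cordoned and drained."
}

# Example: prepare_node_for_shutdown <node-name>
```

Stopping at the first failure keeps the node from being shut down while workloads are still running on it.
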
### 3.1 Restarting Fortanix DSM Server

You can power on the Fortanix DSM node from IPMI and then uncordon the node to end the maintenance.

Perform the following steps to restart the Fortanix DSM server:

1. Power on the machine using the IPMI console.
2. Once the node is back online, access it using SSH.
3. Run the following command to allow scheduling on the node:

```bash
kubectl uncordon <node-name>
```

Here, `<node-name>` refers to the name of the Fortanix DSM node.
4. Run the following command to verify the status of the node and workloads:

```bash
kubectl get nodes,pods
```

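The uncordon and verification steps above can be combined with a wait on the node's `Ready` condition. The following is a sketch only: `return_node_to_service` and its default timeout of `300s` are assumptions chosen for illustration, and `kubectl` is called directly rather than through `sudo -E`.

```bash
# Sketch: return a node to service after maintenance and verify it.
# return_node_to_service and the 300s default timeout are illustrative assumptions.
return_node_to_service() {
    local node="$1"
    local timeout="${2:-300s}"
    if [ -z "$node" ]; then
        echo "usage: return_node_to_service <node-name> [timeout]" >&2
        return 2
    fi
    # Allow the scheduler to place pods on the node again.
    kubectl uncordon "$node" || return 1
    # Block until the node reports the Ready condition or the timeout expires.
    kubectl wait --for=condition=Ready "node/$node" --timeout="$timeout" || return 1
    # Show node and pod placement for a final visual check.
    kubectl get nodes,pods -o wide
}

# Example: return_node_to_service <node-name>
```
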
## 4.0 Fortanix DSM Prechecks

Fortanix DSM runs the `run_health_checks.sh` script to analyze the cluster health status. The script is located at `/opt/fortanix/sdkms/bin/dsm_healthchecks/run_health_checks.sh`.

The following table lists the Fortanix DSM cluster management prechecks performed by this script:

| **CHECK NAME** | **CHECK TYPE** | **PURPOSE** |
| --- | --- | --- |
| **SWDIST_CHECK** | NODE | To check for discrepancies in **swdist** endpoint files and directories |
| **KUBEAPI_IP_CHECK** | NODE | To check for inconsistencies in the `kube-apiserver` IP address |
| **CAS_ADMIN_ACCT_CHECK** | NODE | - To check the total number of **Sysadmin** accounts - To check that more than one user is a **Sysadmin**, as recommended by best practices |
| **1M_CPU_CHECK** | NODE | To check last minute CPU load average |
| **SWDIST_OVERLAY_SRVC_CHECK** | NODE | To verify `SWDIST_OVERLAY` is up and running |
| **SGX_CHECK** | NODE | To verify if the machine supports Intel Software Guard Extensions (SGX) technology |
| **PERM_DAEMON_SRVC_CHECK** | NODE | To verify if `PERM_DAEMON` is up and running |
| **NTP_CHECK** | NODE | To verify if Network Time Protocol (NTP) is configured |
| **MEM_CHECK** | NODE | To check system memory utilization |
| **KUBELET_SRVC_CHECK** | NODE | To verify if `Kubelet` is up and running |
| **KUBELET_CERT** | NODE | To verify `Kubelet` certificate validity |
| **KUBEAPI_SERVER_CERT** | NODE | To check `KUBEAPI SERVER` certificate validity |
| **HEALTH_CHECK_QUORUM** | NODE | To confirm the quorum status of nodes, distinguishing between local and global quorum |
| **HEALTH_CHECK_ALL** | NODE | To verify responses from all the Fortanix DSM pods in the cluster. If they pass, it returns `OK`. |
| **DOCKER_REGISTRY_SRVC_CHECK** | NODE | To verify `DOCKER_REGISTRY` is up and running |
| **DISK_CHECK_[/var]** | NODE | To verify disk space usage |
| **DISK_CHECK_[/]** | NODE | To verify disk space usage |
| **DISK_CHECK_[/data]** | NODE | To verify disk space usage |
| **DB_FILES_PERM_CHECK** | NODE | To verify whether the `/data/Cassandra/public` files have executable permissions |
| **CRI_SRVC_CHECK** | NODE | To verify if CRI (Container Runtime Interface) is up and running |
| **CPU_MODEL_CHECK** | NODE | To list out the information on CPU, attestation, and Fortanix DSM appliance series type |
| **CAS_CERT_CHECK** | NODE | To verify Cassandra certificate expiry |
| **CAS_ACCT_CHECK** | NODE | To check for discrepancies between the Cassandra `account_primary` and account tables |
| **CAS_EP_CHECK** | NODE | To flag if there are any stale IP addresses still part of Cassandra endpoint list |
| **CAS_TOMBSTONE_CHECK** | NODE | To validate the number of tombstones that a query or operation can generate when there are huge deletions in the cluster |
| **5M_CPU_CHECK** | NODE | To check the last 5 minutes CPU load average |
| **15M_CPU_CHECK** | NODE | To check the last 15 minutes CPU load average |
| **IPMI_INFO_CHECK** | NODE | To print IPMI configuration info such as IPMI IP address, default gateway mac address, and default gateway IP address. |
| **SWDIST_DUP_RULE_CHECK** | NODE | To check for duplicate `iptables` entries |
| **CONTAINER_CHECK** | CLUSTER | To check the container readiness |
| **BACKUP_SETUP_CHECK** | CLUSTER | To verify if the backup has been configured for the cluster |
| **REPLICA_CHECK** | CLUSTER | To verify replica counts in deployment and `configmap` |
| **PODS_CHECK** | CLUSTER | To verify the health status of the pods and their readiness |
| **NODE_CHECK** | CLUSTER | To verify if nodes are in a ready state |
| **LB_SETUP_CHECK** | CLUSTER | To verify whether the load balancer setup (`LB_setup`) is external or internal |
| **JOB_CHECK** | CLUSTER | To validate if the jobs are executed and completed |
| **IMAGE_VERSION_CHECK** | CLUSTER | To verify and report the Fortanix DSM version |
| **ETCD_HEALTH_CHECK** | CLUSTER | To verify if `ETCD` cluster is healthy |
| **CAS_REP_CHECK** | CLUSTER | To report the replication strategy, which can be `SimpleStrategy` or `NetworkTopologyStrategy` |
| **CAS_NODETOOL_CHECK** | CLUSTER | To verify Cassandra `nodetool` status |
| **CONN_Q_CHECK** | CLUSTER | To calculate all public connections to the SDKMS pod |

### 4.1 Fortanix DSM Prechecks Output Status

The following are the types of Fortanix DSM precheck output status:

- **OK**: The check succeeded and no action is required.
- **WARN**: The check requires attention. Review the logs created on the node in `/tmp/health_checks/` and share the log details with the Fortanix Support team.
- **SKIPPED**: The check was not executed due to an issue in the cluster. For example, the Cassandra replication strategy check needs a healthy Cassandra pod to fetch its details, so the check is **SKIPPED** when the pod is unhealthy. Review the logs to understand why the check was skipped and share the log details with the Fortanix Support team.

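When a run produces many results, a quick tally of the three statuses helps decide whether the logs need attention. The sketch below makes two assumptions that this guide does not confirm: that the health check script writes its results to standard output, and that each result line contains the status (`OK`, `WARN`, or `SKIPPED`) as a separate whitespace-delimited word. The function name `count_check_statuses` is hypothetical.

```bash
# Sketch: tally precheck statuses from text read on standard input.
# Assumes (unverified) that each result line carries OK, WARN, or SKIPPED
# as a standalone whitespace-delimited word.
count_check_statuses() {
    awk '
        {
            # Count only the first status word found on each line.
            for (i = 1; i <= NF; i++) {
                if ($i == "OK" || $i == "WARN" || $i == "SKIPPED") {
                    count[$i]++
                    break
                }
            }
        }
        END {
            printf "OK=%d WARN=%d SKIPPED=%d\n", count["OK"], count["WARN"], count["SKIPPED"]
        }
    '
}

# Example, if the script prints results to standard output:
#   /opt/fortanix/sdkms/bin/dsm_healthchecks/run_health_checks.sh | count_check_statuses
```

If the real output uses a different layout, only the field test inside the `awk` program needs to change.
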
## 5.0 Troubleshooting

The following tables list potential causes of errors and exceptions, along with details on how to fix them for various Fortanix DSM use cases:

### 5.1 Cluster Create and Cluster Join

| **ISSUE** | **DESCRIPTION** | **RESOLUTION** |
| --- | --- | --- |
| Hostname of the node is in uppercase. Sample error log: `[etcd] Waiting for the etcd pod to come up (this might take 2 minutes)` `Error from server (NotFound): pods "etcd-DEV-FRTNX01" not found` | The Fortanix DSM hostname must be in lowercase letters. Cluster creation fails while waiting for the etcd pods to become ready. | Set the hostname using the command `sudo hostnamectl set-hostname newhostname`, then reset the cluster and reinitialize cluster creation. |
| Domain name resolution fails. Sample error log from `sudo sdkms-cluster create --self=ip_address --config config.yaml`: `sudo: unable to resolve host cslab-5 Temporary failure in name resolution` `[sdkms-cluster] WARNING: BIOS version file not found. Skipping test` `ERROR: Error parsing /etc/resolv.conf` | The `/etc/resolv.conf` file must not be empty. | Verify `/etc/network/interfaces` for the DNS nameservers and add the same entries in `/etc/resolv.conf`. |
| [coredns] Waiting for coredns pod to be ready | Cluster creation fails when the network configuration does not have DNS nameservers configured. | Verify `/etc/network/interfaces` for the DNS nameservers and add the same entries in `/etc/resolv.conf`. |
| NTP servers on the nodes are not in sync | The node join fails when NTP is not configured or when the clocks of the peer node and the joining node differ, which causes a clock difference in the **etcd** pod. | Configure NTP correctly on all nodes. |
| Port requirements for intra-cluster communication between the nodes | Communication between various Kubernetes control plane components, such as the API server, scheduler, controller manager, and etcd, also occurs over specific ports. | *Refer to* [*Fortanix Data Security Manager Port Requirements*](/v1/docs/fortanix-data-security-manager-port-requirements) *to ensure all the required ports are open for communication.* |
| The IAS (Intel Attestation Service) URLs must be reachable from the joining node before initiating the `sdkms-join` process. Otherwise, `sdkms-join` pods can fail to attest and halt the upgrade. | Fortanix DSM communicates with IAS during: - Cluster creation - Addition of a new node - Software upgrade | *Refer to* [*Fortanix Data Security Manager Cluster Attestation Guide (on-prem only)*](/v1/docs/fortanix-data-security-manager-cluster-attestation-guide-on-prem-only)*.* Ensure that the nodes can connect to IAS using the command `nc -v iasproxy.fortanix.com 443`. |
| [kubelet] Waiting for node to become ready [kubelet] Installing cluster configuration ERROR: Found unreplaced IP address in `manifests/etcd.yaml` | The error message indicates that there is an unreplaced IP address in the `manifests/etcd.yaml` file. | Raise a support ticket with the output of the following commands: `ls -lrt /etc/kubernetes/pki/etcd`, `ls -lrt /etc/kubernetes`, and `cat /etc/kubernetes/bootstrap-kubelet.conf`. |

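Several of the issues in the table above (uppercase hostname, empty `/etc/resolv.conf`, unreachable IAS) can be checked before running `sdkms-cluster create` or `sdkms-cluster join`. The following is a minimal sketch, not a Fortanix-provided script: all three function names are hypothetical, and the `nc -z -w 5` flags assume a netcat variant that supports them.

```bash
# Sketch: pre-flight checks before cluster create or join.
# All function names here are hypothetical helpers.

# Succeeds when the name is non-empty and has no uppercase letters.
hostname_is_lowercase() {
    case "$1" in
        "") return 1 ;;
        *[[:upper:]]*) return 1 ;;
        *) return 0 ;;
    esac
}

# Succeeds when the file has at least one uncommented "nameserver" entry.
resolv_conf_has_nameserver() {
    local file="${1:-/etc/resolv.conf}"
    grep -Eq '^[[:space:]]*nameserver[[:space:]]+[^[:space:]]+' "$file" 2>/dev/null
}

# Runs all checks on the local node and reports each failure.
precheck_node() {
    local status=0
    hostname_is_lowercase "$(hostname)" \
        || { echo "FAIL: hostname contains uppercase letters"; status=1; }
    resolv_conf_has_nameserver \
        || { echo "FAIL: /etc/resolv.conf has no nameserver entry"; status=1; }
    # -z only tests the TCP connection; -w 5 gives up after five seconds.
    nc -z -w 5 iasproxy.fortanix.com 443 \
        || { echo "FAIL: cannot reach iasproxy.fortanix.com:443"; status=1; }
    return "$status"
}

# Example: precheck_node && echo "Pre-flight checks passed"
```

NTP synchronization and port requirements are not covered by this sketch and still need to be verified separately.
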
### 5.2 Pod Status

| **STATUS** | **DESCRIPTION** | **RESOLUTION** |
| --- | --- | --- |
| ERROR | Pods can enter an error state for various reasons. A detailed analysis of the pod logs helps to identify the root cause. | Raise a support ticket if you encounter any pods in the ERROR state, and include the output of the following commands: `sudo -E kubectl describe pod pod_name`, `sudo -E kubectl get pods -owide`, `sudo -E kubectl get pods -n kube-system`, and `sudo -E kubectl logs pod_name`. |
| PENDING | When pods remain pending without being placed on any nodes, several factors could be responsible for preventing pod scheduling. A detailed analysis of logs is required to identify the underlying issues. | - Verify that the node is in a ready state using the command `sudo -E kubectl get nodes -o wide`. - Verify the status of the kube-system pods using the command `sudo -E kubectl get pods -n kube-system`. - Describe the pod to list the errors using the command `sudo -E kubectl describe pod pod_name`. |
| IMAGEPULLBACKOFF | When a pod cannot fetch the container images it requires, it reports an `ImagePullBackOff` error. | - Run the script `/opt/fortanix/sdkms/bin/restart-docker-registry.sh`. - Delete the pods that are in the `ImagePullBackOff` state so that they are recreated in a healthy state. If the issue persists, reach out to Fortanix Support. |
| CRASHLOOPBACKOFF | A detailed analysis of pod logs is required to identify the underlying issues. | Reach out to Fortanix Support with the logs listed for the ERROR status. |
| CREATECONFIGERROR | Pod creation sometimes fails to fetch a required resource. | Restart the pod by deleting it using the command `sudo -E kubectl delete pod pod_name`. |

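To find the pods that need attention and gather the output requested for a support ticket, the commands in the table above can be scripted. This is a sketch under stated assumptions: `list_unhealthy_pods` and `collect_pod_diagnostics` are hypothetical names, the output file names are arbitrary, and the filter relies on the default `kubectl get pods --no-headers` column order (NAME, READY, STATUS).

```bash
# Sketch: spot unhealthy pods and save diagnostics for a support ticket.
# Function names and output file names are illustrative assumptions.

# Reads "kubectl get pods --no-headers" output on standard input and prints
# the name and status of every pod that is neither Running nor Completed.
list_unhealthy_pods() {
    awk '$3 != "Running" && $3 != "Completed" { print $1, $3 }'
}

# Saves the describe output and logs of one pod into a directory.
collect_pod_diagnostics() {
    local pod="$1"
    local outdir="${2:-.}"
    if [ -z "$pod" ]; then
        echo "usage: collect_pod_diagnostics <pod_name> [output_dir]" >&2
        return 2
    fi
    mkdir -p "$outdir" || return 1
    kubectl describe pod "$pod" > "${outdir}/${pod}.describe.txt"
    kubectl logs "$pod" > "${outdir}/${pod}.logs.txt"
}

# Example: kubectl get pods --no-headers | list_unhealthy_pods
# Example: collect_pod_diagnostics pod_name /tmp/dsm-diagnostics
```
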
### 5.3 Node Not Ready

| **ISSUE** | **DESCRIPTION** | **RESOLUTION** |
| --- | --- | --- |
| `kubectl` commands are not accessible from nodes that are not in the ready state. | Nodes may enter the not-ready state due to various factors. A detailed analysis is required to determine the root cause. | Create a support ticket and share the output of the following commands: `systemctl status kubelet` and `journalctl -fu kubelet`. |
| Running `kubectl` commands from other nodes within the cluster reports the affected nodes as 'not ready'. | - | Create a support ticket and share the output of the following commands: `sudo -E kubectl get node,pods -owide` and `sudo -E kubectl get pods -n kube-system`. |

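The two rows above cover different vantage points: a healthy node can list which nodes are not ready, while the affected node itself can only provide kubelet state. The following sketch reflects that split. The function names and output file names are hypothetical, the node filter assumes the default `kubectl get nodes --no-headers` column order (NAME, STATUS), and the journal capture takes the last 500 lines instead of following the log as `journalctl -fu kubelet` does. Typical use would be `kubectl get nodes --no-headers | list_not_ready_nodes` from a healthy node and `capture_kubelet_state /tmp/dsm-diagnostics` on the affected node.

```bash
# Sketch: identify not-ready nodes and capture kubelet state for support.
# Function names and output file names are illustrative assumptions.

# Reads "kubectl get nodes --no-headers" output on standard input and prints
# nodes whose status does not start with "Ready" (cordoned nodes still count
# as ready because they report "Ready,SchedulingDisabled").
list_not_ready_nodes() {
    awk '$2 !~ /^Ready/ { print $1 }'
}

# Run on the affected node: saves kubelet service status and recent logs.
capture_kubelet_state() {
    local outdir="${1:-.}"
    mkdir -p "$outdir" || return 1
    systemctl status kubelet --no-pager > "${outdir}/kubelet.status.txt" 2>&1
    journalctl -u kubelet --no-pager -n 500 > "${outdir}/kubelet.journal.txt" 2>&1
}
```
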
## Related

- [Fortanix DSM as a KMS to Secure VMware Virtual Environments](/using-fortanix-data-security-manager-as-a-kms-to-secure-vmware-virtual-environments.md)
- [Data Center Labeling](/fortanix-data-security-manager-data-center-labeling.md)
- [CDK Protection (Non-SGX) - Fortanix DSM Version Below 5.6](/cluster-deployment-key-protection-non-sgx-dsm-version-below-5-6.md)
- [Port Requirements](/fortanix-data-security-manager-port-requirements.md)
- [High Availability Concepts](/fortanix-data-security-manager-high-availability-concepts-on-prem-only.md)
