1.0 Introduction
This article covers the following topics related to Fortanix-Data-Security-Manager (DSM) backup and restore:
High Availability
Disaster Recovery Plan
Configuring Backup
Data Recovery
2.0 High Availability
Fortanix DSM provides a highly available service as long as the majority of nodes in a cluster are active. An ideal multi-site deployment of Fortanix DSM covers at least 3 data centres (Availability Zones), so that the failure of a single site can still leave a majority of nodes active in the KMS cluster and the service remains available. See the following figure for a proposed architecture.

Figure 1: A Typical Fortanix DSM Deployment
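The majority rule above can be made concrete with a little arithmetic. The following is a minimal sketch, assuming that "majority" means strictly more than half of the nodes in the cluster. The function names are illustrative and are not part of any Fortanix tooling.

```shell
#!/bin/sh
# Illustrative helpers (hypothetical names, not part of Fortanix DSM).
# Assumption: "majority" means strictly more than half of the cluster nodes.

# Smallest number of active nodes that forms a majority of $1 nodes.
majority() {
    echo $(( $1 / 2 + 1 ))
}

# Number of nodes that can be down while a majority stays active.
tolerated_failures() {
    echo $(( $1 - ($1 / 2 + 1) ))
}

for nodes in 3 5; do
    echo "nodes=$nodes majority=$(majority "$nodes") tolerated_failures=$(tolerated_failures "$nodes")"
done
```

Under this assumption, a cluster with one node in each of three data centres keeps a majority (2 of 3) when any single site is lost.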
3.0 Disaster Recovery Plan (Fortanix DSM Version 3.15 onwards)
A disaster is defined here as a situation where the entire production Fortanix DSM cluster goes down. Because of the built-in HA capabilities of Fortanix DSM, this is an extremely unlikely event. Extreme scenarios that may lead to a disaster include a region-wide weather event, a complete network failure or DDoS attack, or a software bug that leads to unrecoverable data corruption. Fortanix provides a DR plan to recover from some of these scenarios, as explained below. These steps can be practiced periodically to test DR readiness.
Designate one Fortanix DSM node in the out-of-region (OOR) data centre for DR. This node is referred to as the “DR node”.
Make the DR node join the production cluster. During this process, the DR node obtains attestation, gets a copy of the cluster master key (CMK), encrypts the CMK with its sealing key (derived specifically for that node and that version of the Fortanix DSM software), and stores the encrypted CMK in the database.
Obtain the join token by running the following commands on an existing node of the cluster. The token for the new node to join the cluster is generated at the time of cluster creation. This token is valid for 24 hours. If you do not see any valid token in the output of “kubeadm token list”, then run the “kubeadm token create” command to generate a new token:
sudo KUBECONFIG=/etc/kubernetes/admin.conf kubeadm token list
# Optional, required only when the token has expired, for example when adding a DR node after the cluster was created.
sudo KUBECONFIG=/etc/kubernetes/admin.conf kubeadm token create
Run the following command on the DR node to join the cluster:
sudo sdkms-cluster join --peer=10.0.1.4/24 --token=fc853c.6fd193ba34870ddf
Configure backup in Fortanix DSM to take periodic backups of the Cassandra database of the production cluster. Refer to Section 5.0: References for the procedure to configure the backups.
Store the backups of the Cassandra database in the OOR data centre.
Set aside the DR node so that if the production cluster goes down in the future, you can restore the cluster from the backup. Run the following command from any node in the cluster to remove the DR node from the cluster:
sudo sdkms-cluster remove --node <DR node name>
After the DR node is successfully removed from the cluster, log in to the DR node and run the following command to clean up data on this DR node:
sudo sdkms-cluster reset --delete-data
NOTE
This command must be run on the removed DR node after the node has been removed successfully. You must not run this command on any existing node of the cluster or on the DR node if it is still part of the cluster.
You must bootstrap the Fortanix DSM cluster on this DR node prior to restoring the data. Run the following command to create a cluster:
sudo sdkms-cluster create --self=<server ip address/subnet mask> --config ./config.yaml
Perform the steps to restore the backups. Refer to Section 5.0: References for the procedure to restore the backups.
The DR node should now have the production data. Add two or more Fortanix DSM nodes to the cluster bootstrapped on the DR node to create a fault-tolerant Fortanix DSM cluster.
Figure 2: Disaster Recovery Using a Node in Out-of-Region Data Center
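The command sequence from the steps above can be collected into a small runbook printer for DR rehearsals. This is a sketch only: it prints the commands that appear in the steps, grouped by the node each one runs on, and executes nothing. The function names are hypothetical, the angle-bracket values are placeholders, and the token check assumes the usual kubeadm bootstrap token format (six characters, a dot, sixteen characters, all lowercase letters or digits). The backup and restore steps are not scripted here because they are covered by the articles in Section 5.0.

```shell
#!/bin/sh
# Prints the DR-node command sequence from the steps above, grouped by the
# node each command runs on. Nothing is executed.
# Assumptions: function names are hypothetical; angle-bracket values are
# placeholders that the operator must fill in.

# Assumed kubeadm bootstrap token format: 6 chars, a dot, 16 chars ([a-z0-9]).
valid_token() {
    printf '%s\n' "$1" | grep -Eq '^[a-z0-9]{6}\.[a-z0-9]{16}$'
}

# Usage: dr_runbook <peer ip/mask> <token> <DR node name> <DR node ip/mask>
dr_runbook() {
    peer=$1; token=$2; dr_node=$3; dr_self=$4
    if ! valid_token "$token"; then
        echo "error: token does not look like a kubeadm bootstrap token" >&2
        return 1
    fi
    echo "[existing node] sudo KUBECONFIG=/etc/kubernetes/admin.conf kubeadm token list"
    echo "[DR node]       sudo sdkms-cluster join --peer=$peer --token=$token"
    # Backups are configured and stored in the OOR data centre at this point.
    echo "[any node]      sudo sdkms-cluster remove --node $dr_node"
    echo "[DR node]       sudo sdkms-cluster reset --delete-data"
    echo "[DR node]       sudo sdkms-cluster create --self=$dr_self --config ./config.yaml"
    # The restore procedure follows once the cluster is created on the DR node.
}

dr_runbook "10.0.1.4/24" "fc853c.6fd193ba34870ddf" \
    "<DR node name>" "<server ip address/subnet mask>"
```

Printing rather than executing is deliberate: the commands run on different nodes and at different times (join and remove before a disaster, create and restore after it), so the output is meant as a checklist rather than an automation.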
4.0 Backup Sizing
The configured backups are full backups, so each backup consumes space based on the total size of the objects at the time it is taken.
For estimation, use the following guideline:
The size of a database backup for a cluster is 800 MB with the following objects:
100,000 security objects
10,000 groups
2,000 apps
1,000 accounts
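Because every backup is a full backup, the storage needed at the backup destination grows with the number of backups retained. The following sketch uses the 800 MB reference figure as the default size and assumes that every retained backup is roughly that size. The function name and the retention values are illustrative inputs, not Fortanix DSM settings, and actual sizes depend on the total size of objects in the cluster.

```shell
#!/bin/sh
# Rough storage estimate for retained full backups.
# Assumptions: every retained backup is about the same size as the reference
# backup (800 MB for the object counts listed above); the function name and
# the retention values are illustrative.

REFERENCE_BACKUP_MB=800

# Usage: backup_storage_mb <backups per day> <days retained> [size per backup in MB]
backup_storage_mb() {
    per_day=$1; days=$2; size_mb=${3:-$REFERENCE_BACKUP_MB}
    echo $(( per_day * days * size_mb ))
}

echo "1 backup/day kept 30 days: $(backup_storage_mb 1 30) MB"
echo "4 backups/day kept 7 days: $(backup_storage_mb 4 7) MB"
```

For the reference cluster, one daily backup retained for 30 days comes to 24000 MB, and four daily backups retained for 7 days come to 22400 MB.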
5.0 References
Refer to the following articles for the backup and restore procedures on SGX and non-SGX machines:
Fortanix DSM Password-Based Backup and Restore Using SCP - SGX
Fortanix DSM Passwordless SSH Key-Based Backup and Restore Using SCP - SGX