1.0 Introduction
A Fortanix Data Security Manager (DSM) cluster is fully operational when it has a global quorum, which means that a majority of the nodes in the cluster (across all data centers) are available.
This article describes how Fortanix DSM operates in Read-Only mode when it loses the global quorum.
2.0 Read-Only Mode
When Fortanix DSM loses the global quorum, it can still operate in Read-Only mode if the local quorum at the data center level is still met. In Read-Only mode, Fortanix DSM loses write access to Cassandra but still allows read access to all existing entities in an account.
Data center labeling must be configured on the Fortanix DSM cluster for Read-Only mode to work. To set up data center labeling for a multi-site cluster, refer to fortanix-data-security-manager-data-center-labeling.
Consider the following scenarios:
The nodes are equally distributed across data centers (for example, 1+1+1 or 3+3+3), and a majority of the data centers (say DC1 and DC2) become unavailable. Nodes in DC3 will serve requests in Read-Only mode.
The nodes are asymmetrically distributed across data centers (for example, 2+1 or 3+2), and the data center with the larger number of nodes (say DC1) becomes unavailable. Nodes in DC2 will serve requests in Read-Only mode.
NOTE
In the asymmetric case, if DC2 (the smaller data center) is lost instead, requests are still served with full operational capability, because the global quorum is still available.
The ability to serve requests in Read-Only mode provides business continuity, even when the larger data center or a majority of data centers is unavailable.
If both the global and local quorums are lost, requests are not served in any mode. For example, in a 2+1 cluster, if nodes are lost from both DC1 and DC2, no data center retains a local quorum.
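The scenarios above reduce to a small calculation. The following Bash sketch is illustrative only: the function name quorum_mode and its TOTAL:UP argument format (nodes in a data center : nodes still up) are hypothetical, and it assumes a quorum is a strict majority, floor(n/2) + 1, counted over all nodes for the global quorum and over one data center's nodes for the local quorum. It treats every node as one voting replica, which may not match how a given cluster's Cassandra replication is configured.

```shell
#!/usr/bin/env bash
# Hypothetical helper: predict the DSM operating mode from per-data-center node counts.
# Assumption: quorum = strict majority (floor(n/2) + 1), each node counted as one replica.

# Usage: quorum_mode TOTAL_DC1:UP_DC1 TOTAL_DC2:UP_DC2 ...
quorum_mode() {
  local total=0 up=0 local_ok=0 dc dc_total dc_up
  for dc in "$@"; do
    dc_total=${dc%%:*}   # text before the colon: nodes in this data center
    dc_up=${dc##*:}      # text after the colon: nodes still available
    total=$((total + dc_total))
    up=$((up + dc_up))
    # Local quorum: a strict majority of this data center's nodes is up.
    if (( dc_up >= dc_total / 2 + 1 )); then
      local_ok=1
    fi
  done
  if (( up >= total / 2 + 1 )); then
    echo "full"          # global quorum held: fully operational
  elif (( local_ok )); then
    echo "read-only"     # no global quorum, but some data center has a local quorum
  else
    echo "unavailable"   # neither quorum: requests are not served
  fi
}
```

For example, quorum_mode 2:0 1:1 (a 2+1 cluster that has lost DC1) prints read-only, while quorum_mode 2:2 1:0 (the same cluster after losing DC2) prints full.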
2.1 Operations Allowed
The following operations are allowed in the Read-Only mode:
User and app authentication. The UI remains available for login.
View/List all Groups, Apps, Administrative Apps, Security Objects, and Plugins.
View any active quorum policy, crypto policy, and key custodian attached to a Group/Account.
View all audit logs.
All cryptographic operations that require only read access to Security Objects (encryption, decryption, sign, verify, and key wrapping).
Execution of Plugins that require only read access to Security Objects.
A system administrator can view:
All the accounts present in the system.
All the users and system administrators present in the system.
All the audit logs for the system administrator account.
All the policies, settings, authentication, and log management configurations.
2.2 Operations Not Allowed
The following operations are not allowed in the Read-Only mode:
Create/Update Groups, Apps, Administrative Apps, Security Objects, and Plugins.
Rotate/Deactivate/Disable/Enable a Security Object.
Invite another user to an account.
Edit/Add quorum policies, crypto policies, and key custodians.
Creation of approval and component requests.
Key unwrapping, derivation, and agreement operations.
Editing/Adding new SSO configurations, log management configurations, and authentication configurations to an account.
A system administrator cannot:
View cluster status and summary.
Perform a software update for the cluster.
Invite another user to the system administrator account.
Edit or add any policies, settings, or log management configurations.
For security reasons, users cannot log in to DSM in Read-Only mode using Single Sign-On (SSO) with SAML.
2.3 Monitoring
To verify the state of the cluster, run the following health check:
curl "https://<dsm-endpoint>/sys/v1/health?consistency=quorum"
The command returns an HTTP 204 response when the global quorum is healthy. If the endpoint is reachable but returns an error, the global quorum is lost and the cluster is in Read-Only mode.
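To run this check from a monitoring job, a minimal Bash sketch like the following can wrap the curl command. The function names classify_health and check_dsm_quorum are hypothetical, and the script assumes that any status other than 204 means the global quorum is lost. It uses only standard curl options (-s, -o, --max-time, and -w '%{http_code}') and reports 000 (no HTTP response received) separately, because an unreachable node says nothing about the quorum.

```shell
#!/usr/bin/env bash
# Sketch: check the DSM global-quorum health endpoint and label the result.
# Assumption: 204 = global quorum healthy; any other HTTP status = quorum lost.

# Map an HTTP status code to a cluster state label.
classify_health() {
  case "$1" in
    204) echo "global-quorum-ok" ;;
    000) echo "unreachable" ;;        # curl prints 000 when no HTTP response arrived
    *)   echo "global-quorum-lost" ;; # error response: expect Read-Only mode
  esac
}

# Query the health check on the given endpoint (hostname placeholder).
check_dsm_quorum() {
  local endpoint="$1" code
  # -s: silent, -o /dev/null: discard the body, -w: print only the status code.
  code=$(curl -s -o /dev/null --max-time 10 -w '%{http_code}' \
    "https://${endpoint}/sys/v1/health?consistency=quorum")
  classify_health "$code"
}

# Run only when an endpoint is passed, for example: ./check-dsm-quorum.sh dsm.example.com
if [[ $# -gt 0 ]]; then
  check_dsm_quorum "$1"
fi
```

Keeping the status classification in its own function lets the mapping be tested without network access and reused by other monitoring scripts.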
2.4 Recovery
Once the lost nodes are repaired, they automatically rejoin the current cluster on reboot, and the quorum is restored. Eventually, the health check above should succeed and all services (pods) should report a healthy status.
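While the nodes rejoin, the health check can be polled until it reports a healthy global quorum. The following is a minimal Bash sketch; the function names quorum_status and wait_for_quorum are hypothetical, and the default attempt count (30) and interval (20 seconds) are arbitrary illustration values, not Fortanix recommendations.

```shell
#!/usr/bin/env bash
# Sketch: poll the global-quorum health check until it returns HTTP 204.
# Defaults for attempts and interval are illustrative assumptions only.

# Print the HTTP status of the quorum health check (000 if no response).
quorum_status() {
  curl -s -o /dev/null --max-time 10 -w '%{http_code}' \
    "https://$1/sys/v1/health?consistency=quorum"
}

# Usage: wait_for_quorum ENDPOINT [MAX_ATTEMPTS] [INTERVAL_SECONDS]
# Returns 0 once the global quorum is healthy, 1 if all attempts fail.
wait_for_quorum() {
  local endpoint="$1" max="${2:-30}" interval="${3:-20}" attempt code
  for (( attempt = 1; attempt <= max; attempt++ )); do
    code=$(quorum_status "$endpoint")
    if [[ "$code" == "204" ]]; then
      echo "global quorum restored after $attempt attempt(s)"
      return 0
    fi
    echo "attempt $attempt/$max: status $code" >&2
    # Do not sleep after the final attempt.
    if (( attempt < max )); then
      sleep "$interval"
    fi
  done
  return 1
}
```

A successful return only confirms the global quorum; the status of the services (pods) should still be checked separately.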
If nodes were replaced as part of recovery, then do the following:
Remove the existing faulty nodes from the cluster. For instructions, refer to Section 8 of the Fortanix Data Security Manager Installation Guide.
Add a new node to the cluster. Make sure the software version installed on the new node matches that of the existing cluster. For instructions, refer to Section 7 of the Fortanix Data Security Manager Installation Guide.