1.0 Introduction
This article describes the various monitoring and alerting procedures available for Fortanix Data Security Manager (DSM).
It provides the following information about the Fortanix DSM:
Software Components
System Capabilities
Deployment
Checks and Alerts
2.0 Prerequisites
Download server artifacts from here.
3.0 Technology References
Fortanix DSM – Fortanix Data Security Manager
KMS – Key Management Service
SNMP – Simple Network Management Protocol
FQDN – Fully Qualified Domain Name
NTP – Network Time Protocol
TLS – Transport Layer Security
HSM – Hardware Security Module
HMG – HSM Management Gateway
4.0 Software Components
Fortanix DSM Monitoring and Alerting Solution includes the following open-source software components:
4.1 Monitoring Client Components
All FX2200 servers have a monitoring agent (sensu-agent) pre-packaged.
4.2 Monitoring Server Components
Alerting is based on “Sensu Go”, and the following components are installed on the monitoring server:
Sensu-Go backend.
Sensu Assets related to checks and handlers for notification.
5.0 Architecture
This solution is delivered as a completely self-contained virtual appliance that customers can install and set up on their own in their deployment environment. It is based on client-server architecture.
The following diagram shows the architecture of the solution:

Figure 1: Fortanix DSM Monitoring and Alerting Solution - Deployment Architecture
Fortanix DSM nodes come pre-installed with a monitoring agent that:
Performs various health checks
Publishes check results to the defined transport mechanism, which are then received by the Sensu server running in the solution VM
The Fortanix DSM node just needs to be configured to point to the customer’s deployed instance of the Fortanix DSM monitoring and alerting solution.
6.0 System Capabilities
The solution shows checks and alerts data in a web-based UI. Users can connect to the VM deployed in the customer environment to view this information. The dashboard has the following capabilities:
Show the checks that are currently configured
Show the currently active alerts in the system, by node
Allow users to silence individual alerts/checks based on rules; this is useful while maintenance or an upgrade is in progress in the cluster
6.1 Alerting Mechanisms
When the system detects an alert, it can deliver a notification through a configured mechanism.
The solution supports the following alerting mechanisms by default:
Email: Requires an SMTP email configuration to deliver emails.
Slack: Requires a Slack API key to push alerts into Slack.
SNMP Trap: Requires SNMP trap receiver information to send traps to.
Custom: This is based on a shell script that the alerting server invokes to send out alert notifications. This mechanism can be used to invoke any third-party client/executable; a sketch of such a script is shown below.
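For reference, the following is a minimal sketch of such a custom handler script. It assumes the Sensu pipe-handler convention of receiving the event as JSON on standard input and that jq is installed on the monitoring server; the notify-tool command is a hypothetical placeholder for your third-party executable.
#!/bin/bash
# Minimal sketch of a custom alert handler (hypothetical example).
# Sensu pipe handlers receive the event as JSON on standard input; jq is assumed to be installed.
event=$(cat)
entity=$(echo "$event" | jq -r '.entity.metadata.name')
check=$(echo "$event" | jq -r '.check.metadata.name')
status=$(echo "$event" | jq -r '.check.status')
# Forward the alert to any third-party client/executable (notify-tool is a placeholder).
/usr/local/bin/notify-tool --subject "DSM alert: ${check} on ${entity} (status ${status})"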
6.2 Adding SNMP Trap Handler on Sensu Monitoring Server
The Sensu handler (sensu-snmp-trap-handler) sends alerts to an SNMP manager using SNMP traps. You should have an SNMP manager/trap receiver on your network to receive these traps.
Please download the artifacts from here.
SNMP handler requirements:
SNMP trap receiver FQDN hostname or IP address
SNMP trap receiver port (default is UDP port 162)
Community string (optional)
The following are the steps to add an SNMP trap handler to the Sensu Monitoring Server:
Copy the handler asset file sensu-snmp-trap-handler_fortanix_0.2.2_linux_amd64.tar.gz to your web server's document root folder (/var/www/html).
Run the following script to add the SNMP trap handler asset:
./add_snmp_assets.sh
This will prompt you for your Sensu server's web server URL. Enter the URL as http://<Sensu server IP> or http://<Sensu server FQDN>.
Edit the snmp-handler.yml file to add the IP address or FQDN of your SNMP manager/trap receiver. Edit the following line by replacing SNMP_TRAP_RECEIVER with the actual value:
sensu-snmp-trap-handler --host SNMP_TRAP_RECEIVER
The following additional flags are supported and can be added if needed (see the example after this list):
--community string: The SNMP community string to use when sending traps (default "public")
--port int: The SNMP manager trap port (UDP) (default 162)
--version string: The SNMP version to use (1, 2, 2c) (default "2")
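For example, with a trap receiver at 10.0.0.50 (an example address) using SNMP v2c on the default port, the edited line in snmp-handler.yml would look like this:
# Example values only; replace 10.0.0.50 with your trap receiver address
sensu-snmp-trap-handler --host 10.0.0.50 --port 162 --community public --version 2c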
Add the SNMP trap handler by running the following script:
./add_snmp_handlers.sh
MIB files are included here. If needed, copy the MIB files under the "mibs" folder to your SNMP manager/trap receiver.
7.0 Deployment
Fortanix DSM Monitoring and Alerting Solution will be delivered as a software bundle that users can deploy on their server or virtual machine.
7.1 Minimum Server Specification
2 CPUs with 2 cores each
8 GB RAM
7.2 List of Required Ports
The following ports must be accessible on the monitoring server for the web UI and for receiving notifications from the monitoring agents. The values listed below are defaults and can be changed; an example of opening them with firewalld follows the table.
Protocol | Port Number | Purpose |
---|---|---|
TCP | 8081 | Receive notifications from agents |
TCP | 3000 | Dashboard Web UI |
TCP | 80 | Asset download from Sensu Web Server |
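For example, if the monitoring server uses firewalld (an assumption; adjust for whatever firewall you run, and for any non-default port numbers), the default ports can be opened as follows:
# Example only; assumes firewalld on the RHEL monitoring server
sudo firewall-cmd --permanent --add-port=8081/tcp
sudo firewall-cmd --permanent --add-port=3000/tcp
sudo firewall-cmd --permanent --add-port=80/tcp
sudo firewall-cmd --reload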
7.3 Setting Up Sensu Server
The procedure described here for setting up the Sensu server requires a Red Hat machine or VM. We have tested with Red Hat 7.6 and 7.8. You will also need to have Apache installed on this machine.
Copy the server artifacts tarball (Monitoring-Server-Artifacts.tgz) to your designated server/VM.
Untar the tarball using the following command:
tar zxvf Monitoring-Server-Artifacts.tgz
Go to the Monitoring-Server-Artifacts folder.
Install the sensu-backend and sensu-cli packages:
sudo rpm -i sensu-go-backend-6.11.0-7218.x86_64.rpm
sudo rpm -i sensu-go-cli-6.11.0-7218.x86_64.rpm
Edit the backend.yml file and copy it to /etc/sensu:
cp backend.yml /etc/sensu
Start the sensu-backend service.
systemctl start sensu-backend
Check the status of the sensu-backend service.
systemctl status sensu-backend
Enable the sensu-backend service to start automatically on reboot.
systemctl enable sensu-backend
Initialize the sensu-backend service. Set the cluster admin username and password in the environment variables, and then run the init command (see the example after this step):
export SENSU_BACKEND_CLUSTER_ADMIN_USERNAME=
export SENSU_BACKEND_CLUSTER_ADMIN_PASSWORD=
sensu-backend init
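For example (the values below are examples only, matching the credentials used in the sensuctl configure step; choose your own admin username and a strong password):
# Example values only; use your own username and a strong password
export SENSU_BACKEND_CLUSTER_ADMIN_USERNAME=admin
export SENSU_BACKEND_CLUSTER_ADMIN_PASSWORD='P@ssw0rd!'
sensu-backend init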
Configure the command-line tool sensuctl:
sensuctl configure -n --username 'admin' --password 'P@ssw0rd!' --namespace default --url 'http://127.0.0.1:8080'
It is strongly recommended to change the default admin password.
sensuctl user change-password --interactive
Sensu also creates a default agent user with the password P@ssw0rd! that corresponds to the defaults the Sensu agent uses. It is strongly recommended to change the default agent password:
sensuctl user change-password agent --current-password 'P@ssw0rd!' --new-password fortanix
If you have a web server of your own, then copy the asset tar files (from the sensu-assets folder) to the document root folder of your web server so that they can be fetched by the Fortanix servers.
NOTE
If the assets are being installed on a TLS-enabled web server, then install the web server CA root and the intermediate certificates in the trust store of both your Sensu systems and the DSM nodes using the following commands:
sudo apt-get install ca-certificates -y
sudo ln -sfv /etc/sensu/tls/ca.pem /usr/local/share/ca-certificates/sensu-ca.crt
sudo update-ca-certificates
If you do not have a web server of your own, then start the included web server:
cd sensu-assets
sudo ./web_server &
cd ..
Create assets.
./add_assets.sh
Create checks.
./add_checks.sh
Create handlers.
./add_handlers.sh
Go to the Sensu dashboard and verify that all checks are present and you can log in.
http://<Sensu Server IP Address>:3000
If you want to use TLS to secure communication between the agent and server, make the following changes now.
Copy the TLS certificate, key, and CA certificate files to /etc/sensu.
Change the following in backend.yml:
api-url – change the prefix from http to https:
api-url: "https://localhost:8080"
ssl configuration section – set the following lines (change the file names based on your files):
cert-file: "/etc/sensu/cert.pem"
key-file: "/etc/sensu/key.pem"
trusted-ca-file: "/etc/sensu/ca.pem"
insecure-skip-tls-verify: true
Restart the sensu-backend.
Access the Sensu dashboard using https://<Sensu Server IP Address>:3000. To learn how to integrate Splunk with an existing Sensu server, refer to the article Splunk with Sensu Server Integration.
NOTE
If you are unable to access the dashboard, please make sure that port 3000 is not blocked by a firewall.
7.4 Set Up Active Directory/LDAP Authentication on the Sensu Server
This section describes the procedure for setting up Active Directory/LDAP authentication on the Sensu server.
Create a new file.
vi ad.yml
The following are the contents of the file:
type: ad
api_version: authentication/v2
metadata:
  name: ActiveDirectory
spec:
  groups_prefix: ad
  servers:
  - binding:
      password: <bind account password>
      user_dn: cn=<bindaccount>,ou=<group>,dc=<domain>,dc=com
    default_upn_domain: <domain.com>
    include_nested_groups: true
    host: <domain controller FQDN>
    insecure: true
    port: 636
    security: tls
    trusted_ca_file: /etc/ssl/certs/downstairs-root-ca.pem
    user_search:
      attribute: sAMAccountName
      base_dn: <DN for root of search>
      name_attribute: displayName
      object_class: user
    group_search:
      attribute: member
      base_dn: ou=groups,dc=downstairs,dc=com
      name_attribute: cn
      object_class: group
  username_prefix: ad
Create the auth resource.
sensuctl create --file /location/ad.yml
Verify that the auth resource was created successfully.
sensuctl auth list
Log in with a user that falls within the search root.
They will be able to log in but will not see any namespaces or other resources.
Either kill the sensu-backend service and run it without systemd to watch real-time interactions, or for troubleshooting, execute the following:
journalctl -xe | grep sensu
Create a role that determines the resource permissions.
sensuctl role create djuser --namespace sdkms --resource=checks,entities,events --verb=get,list
Create a binding between a group and a role.
sensuctl role-binding create djuser --role=djuser --group=ad:sensu --namespace sdkms
List roles.
sensuctl role list
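Optionally, you can also list the role bindings to confirm that the binding created above is present:
sensuctl role-binding list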
7.5 Setup on Fortanix Servers
Run the following on each Fortanix Server:
Install the Fortanix DSM Monitoring package.
sudo apt-get install sdkms-monitoring
Copy the file /opt/fortanix/sdkms/monitoring/agent.yml to some location, and edit it to point to your Sensu server VM. Change the following line:
backend-url:
  - "ws://<YOUR SERVER IP ADDRESS>:8081"
If you want to use TLS to secure communication between the agent and the server, then do the following:
Copy the CA file for the TLS certificate being used by the Sensu server to the /etc/sensu folder.
Set the following line (change the file name based on your files):
trusted-ca-file: "/etc/sensu/ca.pem"
For backend-url, use the protocol prefix "wss" instead of "ws".
If the certificate is self-signed and its root CA is not on the Fortanix servers, then add the following line:
insecure-skip-tls-verify: true
Copy the edited agent.yml file to the /etc/sensu folder:
sudo cp agent.yml /etc/sensu/
Start the sensu-agent service and enable it to start automatically on reboot:
sudo systemctl daemon-reload
sudo systemctl start sensu-agent
sudo systemctl enable sensu-agent
Check the status of the sensu-agent service.
sudo systemctl status sensu-agent
8.0 Checks and Alerts
The checks and alerts are designed to check the health of each node (server) in the Fortanix DSM cluster and the services that each node runs. By default, the solution performs the following checks and alerts. This is easily extensible and configurable: based on customer needs, Fortanix can add checks and customize the alert thresholds and intervals. The following is a list of currently supported checks and alerts, along with recommended actions when an alert is triggered.
8.1 System Component: CPU
Metric: Temperature
Threshold: Warning & Critical
Alert Categorization: Low
Issue Description: This alert indicates environmental issues in the data center resulting in non-ambient temperature for appliances.
Recommended Action:
Check data center environmental controls.
If the data center temperature setting is okay, escalate to Fortanix support.
8.2 System Component: Memory
Metric: Utilization
Threshold: 80% Warning & 90% Critical
Alert Categorization: Low
Issue Description: These alerts mean that the memory utilization on the host has reached its limits. It is indicative of a high workload; if this persists for an extended period, capacity expansion is required.
Recommended Action:
This is not necessarily an indication of failure, but indicative of high requests from clients. Please wait for at least 15 minutes to allow the temporary workloads to be completed.
Verify the process with high memory usage:
ps aux | sort -nrk 4,4 | head -n 3
If the above output contains CassandraDaemon or Elasticsearch or /root/enclave-runner /root/backend.sgxs, then the issue is due to high traffic. Otherwise, note the output and escalate to Fortanix support.
If, due to high traffic, the alarm appears only on a few hosts, then it indicates suboptimal load balancing. Check with Fortanix support.
If the alarm appears on many hosts, identify the client using the audit logs: log in to the Fortanix DSM UI and note the application performing a high number of transactions. Notify the client team that owns the application to verify whether this is non-standard traffic and to bring it down if possible. If this is expected traffic, then escalate to Fortanix support for capacity addition.
8.3 System Component: Disk
Metric: Space utilization
Threshold: 80% Warning & 90% Critical
Alert Categorization: Low
Issue Description: These alerts indicate that the disk utilization on the host has reached its limits, which means a purge of old data is required.
Recommended Action:
This is indicative of:
Cassandra data occupying disk space limits.
Check the exact cause:
For Cassandra disk usage, run the command:
du -sh /data/cassandra
If the above output shows that the disk usage is very high (hundreds of GB), it means that old data needs to be purged.
Stale accounts or keys: Identify accounts and keys that have not been used recently, and then delete them from the Fortanix DSM UI.
If none of the above methods work, then the cause might be an unaccounted-for log file taking up space. Escalate to Fortanix support for the correct identification of this log file and remediation. An example command for checking the overall filesystem usage follows this list.
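To see the overall filesystem usage alongside the Cassandra data directory (paths as used above):
# Shows the filesystem containing /data and the size of the Cassandra data directory
df -h /data
du -sh /data/cassandra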
8.4 System Component: NTP
Metric: sync offset, stratum & unsynced
Threshold: 20 ms offset – Warning, 200 ms offset – Critical, if stratum > 15 & NTP is not synced
Alert Categorization: Low
Issue Description: This alert indicates a possible failure to reach the external NTP server, which is very important for database synchronization.
Recommended Action:
Verify that the network link to the NTP servers is up using ping. If network connectivity is fine, then the alert could indicate a failure in the service (the Network Time Protocol daemon (ntpd) crashed).
To verify, run the command:
ntpq -p
The output should show a correct sync state and reachability: the row for the selected peer is marked with '*' to indicate the sync connection. If none of the rows are marked, then the sync is not established (see the illustrative output after these steps).
To fix the sync, run:
sudo service ntp restart
This activity will be performed by E & T after the initial troubleshooting performed by the SPS team.
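For reference, ntpq -p output generally has the following layout (server names and values here are illustrative only); the '*' in the first column marks the peer the node is synchronized to:
ntpq -p
#      remote           refid        st t  when poll reach   delay   offset  jitter
# *ntp1.example.com    .GPS.          1 u    45   64   377    0.420    0.012   0.004
#  ntp2.example.com    203.0.113.5    2 u    50   64   377    1.230    0.105   0.020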
8.5 System Component: SDKMS REST API
port-check-external for the following services and ports:
sdkms-rest-api 443
sdkms-kmip-api 5696
sdkms-ui-nginx 4445
sdkms-proxy 4445
sdkms-server 443
Metric: Reachability
Threshold: If not reachable
Alert Categorization: Low
Issue Description: This alert indicates service reachability failures.
Recommended Action:
Usually, intermittent failures are recoverable and the service will be restarted automatically. Hence, wait for at least 10 minutes for the alarm to recover. You can also verify reachability manually from a client machine (see the example at the end of this section).
If you are unable to recover from the failure, then escalate to Fortanix support with the following information for easy debugging:
SSH into one of the nodes of the cluster.
Run the following command to get the status of all the pods and copy the output:
sudo KUBECONFIG=/etc/kubernetes/admin.conf kubectl get pods -owide
From the above command, verify whether all the Fortanix DSM pods are in the running state (usually 0/1 if not READY and 1/1 if READY). Verify specifically whether all the Cassandra pods are up and running. If a Cassandra pod is not in the READY state, note down the pod name, run the following command, and capture the output:
sudo KUBECONFIG=/etc/kubernetes/admin.conf kubectl logs -f sdkms-xxx-xxxxx
Get the list of jobs that ran and collect the information about their status using the following command:
kubectl get jobs
From the output of the command in step b, verify whether all the Cassandra pods are up and running (usually 0/1 if not READY and 1/1 if READY). If a Cassandra pod is not in the READY state, then note down the pod name and run the following command:
sudo KUBECONFIG=/etc/kubernetes/admin.conf kubectl logs -f cassandra-x
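In addition, the reachability of the external ports can be verified manually from a client machine, for example (dsm.example.com is a placeholder for your DSM cluster FQDN):
# Placeholder hostname; replace with your DSM cluster FQDN
nc -zv dsm.example.com 443
nc -zv dsm.example.com 5696
openssl s_client -connect dsm.example.com:443 -servername dsm.example.com </dev/null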
8.6 System Component: api-service
Metric: Status
Threshold: If the state is down
Alert Categorization: Low
Issue Description: This alert indicates Fortanix DSM API service reachability failures.
Recommended Action:
Usually, intermittent failures will be recoverable, and the service will be automatically restarted. Wait for at least 10 minutes for the alarm to recover.
If it does not recover, escalate to Fortanix support with the following information for easy debugging.
SSH into one of the nodes of the cluster.
Run the following command to get the status of all the pods and copy the output:
sudo KUBECONFIG=/etc/kubernetes/admin.conf kubectl get pods -owide
From the command above, verify whether all the pods are in the running state (usually 0/1 or 1/2 if not READY and 1/1 or 2/2 if READY). Verify specifically whether all the Cassandra pods are up and running. If a Cassandra pod is not in the READY state, note down the pod name, run the following command, and capture the output:
sudo KUBECONFIG=/etc/kubernetes/admin.conf kubectl logs -f sdkms-xxx-xxxxx
Get the list of completed jobs and collect the information on their status using the following command:
KUBECONFIG=/etc/kubernetes/admin.conf kubectl get jobs
From the output of the command in step b, verify whether all the Cassandra pods are up and running (usually 0/1 if not READY and 1/1 if READY). If a Cassandra pod is not in the READY state, then note down the pod name and run the following command:
sudo KUBECONFIG=/etc/kubernetes/admin.conf kubectl logs -f cassandra-x
8.7 System Component: Fortanix DSM-https-certificate-dsm
Metric: days-to-expire
Threshold: 60 days – Warning & 15 days – Critical
Alert Categorization: Low
Issue Description: This alert indicates certificate expiration in the near future and acts as a reminder for certificate renewal.
Recommended Action:
Renew the certificates: SSH into any of the machines in the cluster.
Generate a new CSR by running the command sudo get_csrs and install the new certificates using sudo install_certs.
8.8 System Component: Fortanix DSM-https-certificate-dsm-ui
Metric: days-to-expire
Threshold: 60 days – Warning & 15 days – Critical
Alert Categorization: Low
Issue Description: This alert indicates certificate expiration in the near future and acts as a reminder for certificate renewal.
Recommended Action:
Renew the certificates: SSH into any of the machines in the cluster.
Generate a new CSR by running the command sudo get_csrs and install the new certificates using sudo install_certs.
8.9 System Component: Fortanix DSM-Cassandra-cluster
Metric: schema-status
Threshold: Bad nodes > 1 – Critical
Alert Categorization: Low
Issue Description: This alert indicates that the Cassandra cluster status is Bad for the connected nodes.
Recommended Action:
Usually, intermittent failures will be recoverable, and the service will be automatically restarted. Hence wait for at least 10 minutes for the alarm to recover.
If the cluster state is reported as yellow, then it is just a warning and no immediate action is required.
If the cluster state is reported as red and does not recover automatically, then it is critical and escalation to Fortanix support is required. In case the cluster state is in red, the following steps would help in debugging the problem.
SSH into one of the nodes of the cluster.
Run the following command to get the status of all the pods, and copy the output:
sudo KUBECONFIG=/etc/kubernetes/admin.conf kubectl get pods -owide
From the command above, verify whether all the pods are in the running state (usually 0/1 or 1/2 if not READY and 1/1 or 2/2 if READY). Verify specifically whether all the Cassandra pods are up and running. If a Cassandra pod is not in the READY state, note down the pod name, run the following command, and capture the output:
sudo KUBECONFIG=/etc/kubernetes/admin.conf kubectl logs -f cassandra-x
Go into the Cassandra pod that is failing by running the command:
sudo KUBECONFIG=/etc/kubernetes/admin.conf kubectl exec -ti cassandra-0 bash
On the Cassandra pod terminal, run the following command to get the list of nodes and capture the output (see the illustrative layout after this step):
nodetool status
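In a healthy cluster, every node is listed with UN (Up/Normal) in the first column of the nodetool status output. The layout is roughly as follows (addresses and values are illustrative only):
# --  Address      Load      Tokens  Owns    Host ID    Rack
# UN  10.0.0.11    1.2 GiB   256     ...     ...        rack1
# DN  10.0.0.12    1.1 GiB   256     ...     ...        rack1   (a down node shows DN)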
9.0 Monitoring and Alerting HSM Management Gateway
This section describes the process of adding checks on the Sensu monitoring server to monitor HSM gateway instances.
These checks are run on the Sensu server using an agent installed on the Sensu server.
9.1 Setting Up Sensu Agent on Sensu Server
If you do not have a Sensu agent installed on your Sensu server, then follow the instructions in this section to set up a Sensu agent on the Sensu server for running checks for the HSM gateway instances. If you already have the Sensu agent set up, then skip step 1 below.
Download and install the Sensu agent on the Sensu server.
Download the file sensu-go-agent-6.11.0-7218.x86_64.rpm from the URL http://sensu.io/downloads.
Install the Sensu Go agent using the following command:
sudo yum install sensu-go-agent-6.11.0-7218.x86_64.rpm
Copy the following content to the file /etc/sensu/agent.yml and edit it as explained below:
# Sensu agent configuration

##
# agent overview
##
#name: "hostname"
#namespace: "default"
subscriptions:
  - sdkms-monitoring
#labels:
#  example_key: "example value"
#annotations:
#  example/key: "example value"

##
# agent configuration
##
backend-url:
  - "wss://127.0.0.1:8081"
#cache-dir: "/var/cache/sensu/sensu-agent"
#config-file: "/etc/sensu/agent.yml"
#log-level: "warn" # available log levels: panic, fatal, error, warn, info, debug

##
# api configuration
##
#api-host: "127.0.0.1"
#api-port: 3031
#disable-api: false
#events-burst-limit: 10
#events-rate-limit: 10.0

##
# authentication configuration
##
user: "USER_AGENT_NAME"
password: "AGENT_PASSWORD"

##
# monitoring configuration
##
#deregister: false
#deregistration-handler: "example_handler"
#keepalive-timeout: 120
#keepalive-interval: 20

##
# security configuration
##
insecure-skip-tls-verify: true
#redact:
#  - password
#  - passwd
#  - pass
#  - api_key
#  - api_token
#  - access_key
#  - secret_key
#  - private_key
#  - secret
trusted-ca-file: "/etc/sensu/ssl/ca.pem"

##
# socket configuration
##
#disable-sockets: false
#socket-host: "127.0.0.1"
#socket-port: 3030

##
# statsd configuration
##
#statsd-disable: false
#statsd-event-handlers:
#  - example_handler
#statsd-flush-interval: 10
#statsd-metrics-host: "127.0.0.1"
#statsd-metrics-port: 8125
Edit the following lines:
backend-url – If not using TLS, change the value to "ws://127.0.0.1:8081".
trusted-ca-file – If not using TLS, comment it out. If using TLS, make sure the path points to the CA certificate.
Replace USER_AGENT_NAME and AGENT_PASSWORD with appropriate values for your setup.
Download and copy the following asset files to your web server's document root folder (for example, /var/www/html):
https://assets.bonsai.sensu.io/a2115474fe198f3895b953f6d90de86607f33722/sensu-plugins-network-checks_5.0.0_centos7_linux_amd64.tar.gz
https://assets.bonsai.sensu.io/102d2b8c9dc264b98fa7973bf7657e9216bfb0a8/sensu-ruby-runtime_0.1.0_ruby-2.4.4_centos7_linux_amd64.tar.gz
Create assets.
Create a file sdkms-monitoring-rhel-assets.yml with the following content and replace SENSU_SERVER_IP with your server's IP address or FQDN:
type: Asset
api_version: core/v2
metadata:
  annotations:
    io.sensu.bonsai.api_url: https://bonsai.sensu.io/api/v1/assets/sensu-plugins/sensu-plugins-network-checks
    io.sensu.bonsai.name: sensu-plugins-network-checks
    io.sensu.bonsai.namespace: sensu-plugins
    io.sensu.bonsai.tags: ruby-runtime-2.4.4
    io.sensu.bonsai.tier: Community
    io.sensu.bonsai.url: https://bonsai.sensu.io/assets/sensu-plugins/sensu-plugins-network-checks
    io.sensu.bonsai.version: 5.0.0
  name: sensu-plugins-network-check-rhel
  namespace: default
spec:
  builds: null
  filters:
  - entity.system.os == 'linux'
  - entity.system.arch == 'amd64'
  - entity.system.platform_family == 'rhel'
  headers: null
  sha512: f0a229918245d2156fcc34e272cb351d09f3d7ee79057cccaa88121d837723951c816593104ff959528b0dec7f18901b6735f7b7cf765ddcce85c6fdbb559378
  url: http://SENSU_SERVER_IP/sensu-plugins-network-checks_5.0.0_centos7_linux_amd64.tar.gz
---
type: Asset
api_version: core/v2
metadata:
  annotations:
    io.sensu.bonsai.api_url: https://bonsai.sensu.io/api/v1/assets/sensu/sensu-ruby-runtime
    io.sensu.bonsai.name: sensu-ruby-runtime
    io.sensu.bonsai.namespace: sensu
    io.sensu.bonsai.tags: ""
    io.sensu.bonsai.tier: Community
    io.sensu.bonsai.url: https://bonsai.sensu.io/assets/sensu/sensu-ruby-runtime
    io.sensu.bonsai.version: 0.1.0
  name: sensu-ruby-runtime-rhel
  namespace: default
spec:
  builds: null
  filters:
  - entity.system.os == 'linux'
  - entity.system.arch == 'amd64'
  - entity.system.platform_family == 'rhel'
  headers: null
  sha512: 2d7800432f90625a02aec4a10b084bc72e253572970694e932b5ccdc72fb30f5cf91ed4b51f90942965df5228e521b8f5f06da3d52b886b172ba08d4130251dc
  url: http://SENSU_SERVER_IP/sensu-ruby-runtime_0.1.0_ruby-2.4.4_centos7_linux_amd64.tar.gz
Run the following command:
sensuctl create --file sdkms-monitoring-rhel-assets.yml
Add checks.
Create a file sdkms-hmg-monitoring-checks.yml with the following content:
type: Check
api_version: core/v2
metadata:
  name: sdkms-hmg-check
  namespace: default
spec:
  check_hooks: null
  command: 'check-port.rb -H HMG_IP -p 4442'
  env_vars: null
  executed: 0
  handlers:
  - YOUR_HANDLER_NAME
  high_flap_threshold: 0
  history: null
  interval: 60
  issued: 0
  last_ok: 0
  low_flap_threshold: 0
  occurrences: 0
  occurrences_watermark: 0
  output: ""
  output_metric_format: ""
  output_metric_handlers: null
  proxy_entity_name: ""
  publish: true
  round_robin: false
  runtime_assets:
  - sensu-plugins-network-check-rhel
  - sensu-ruby-runtime-rhel
  status: 0
  stdin: false
  subdue: null
  subscriptions:
  - sdkms-monitoring
  timeout: 0
  total_state_change: 0
  ttl: 0
In the above content:
Replace HMG_IP with the FQDN or IP address of your HMG instance. For more than one HMG instance, specify the addresses as comma-separated values.
The default value of the HMG port is 4442. This is set in the check definition file. If your HMG instance is running on a different port, then replace the port number.
Replace YOUR_HANDLER_NAME with the name of the handler in your environment. To check the handler name, run the following command:
sensuctl handler list
Run the following command using the file created in step a.
sensuctl create --file sdkms-hmg-monitoring-checks.yml
List the checks to verify the new checks have been added.
sensuctl check list
Start the agent using the following commands.
sudo systemctl daemon-reload
sudo systemctl enable sensu-agent
sudo systemctl start sensu-agent
sudo systemctl status sensu-agent
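Once the agent is running, you can optionally confirm that the HMG check is producing events on the Sensu server:
# The sdkms-hmg-check entry should appear after the first 60-second check interval
sensuctl event list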
10.0 Fortanix Data Security Manager Metrics
Starting with Fortanix DSM version 3.21, metrics data is available in Prometheus format; it can be scraped by a Prometheus server and visualized using a tool such as Grafana.
In Fortanix DSM version 3.21, we provide two categories of metrics time series data on each node. Currently, each category publishes the following metrics data. Later versions will add more data in each category.
Node Metrics
CPU Usage
Load Average
Memory usage
Disk I/O statistics
Filesystem usage
Network usage
DSM Metrics
Number of active connections
Public (port 443)
KMIP (port 5696)
Internal Admin (port 4444)
Logging Backlog queue length
Elasticsearch
Splunk
Other log integrations
10.1 DSM Monitoring Package Installation
If the Fortanix DSM monitoring package has not been installed yet, then install the monitoring package by running the following command on each Fortanix DSM node.
sudo apt-get install sdkms-monitoring
If this package was already installed before upgrading to 3.21, then this step is not required.
10.2 Setup
To enable publishing metrics information, you need to enable and run a few services. Use the following commands to enable and start these services:
sudo cp /opt/fortanix/sdkms/monitoring/node_exporter.default /etc/default/node_exporter
sudo cp /opt/fortanix/sdkms/monitoring/sdkms_exporter.default /etc/default/sdkms_exporter
sudo systemctl enable node-exporter
sudo systemctl start node-exporter
sudo systemctl status node-exporter
sudo systemctl enable sdkms-metrics
sudo systemctl start sdkms-metrics
sudo systemctl status sdkms-metrics
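Optionally, you can verify that the exporters respond by querying the default endpoints described in Section 10.4 (NODE_IP_ADDRESS is a placeholder for the node's IP address):
curl "http://NODE_IP_ADDRESS:9999/proxy?module=node"
curl "http://NODE_IP_ADDRESS:9999/proxy?module=sdkms"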
10.3 Configuring TLS for Metrics Collection
If you want to use TLS on your metrics collection endpoint, you can configure the sdkms-metrics service to use TLS. Follow the instructions below to set up TLS:
Get the TLS private key and certificate that you will use for this service and save the files in any location. Both the certificate and private key should be in PEM format. We recommend storing them in the folder /opt/fortanix/sdkms/monitoring/.
Edit the service file /etc/systemd/system/sdkms-metrics.service to change the ExecStart line as follows:
ExecStart=/opt/fortanix/sdkms/monitoring/exporter_exporter -config.file /opt/fortanix/sdkms/monitoring/sdkms_exporter.yml -web.tls.cert /opt/fortanix/sdkms/monitoring/CERT_FILENAME -web.tls.key /opt/fortanix/sdkms/monitoring/KEY_FILENAME -web.tls.listen-address :9998
NOTE
Replace CERT_FILENAME and KEY_FILENAME with the names of the files where you stored the certificate and private key, respectively.
In the example above, port 9998 is used. You can change it to any port number you want.
sudo systemctl daemon-reload
sudo systemctl restart sdkms-metrics.service
10.4 Metrics Collection Endpoints
Metrics data is published on the following endpoints by default.
NOTE
If you are using TLS, then please change the endpoint URL to use “https” and the corresponding port number.
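For example, with the TLS setup from Section 10.3 listening on port 9998, a quick test of the endpoint could look like this (-k skips certificate verification and should only be used for testing):
curl -k "https://NODE_IP_ADDRESS:9998/proxy?module=sdkms"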
10.4.1 Node Metrics
http://NODE_IP_ADDRESS:9999/proxy?module=node
Example Data
This is the data available using Prometheus Node/system metrics exporter.
10.4.2 DSM Metrics
http://NODE_IP_ADDRESS:9999/proxy?module=sdkms
Example Data
# HELP es_backlog Number of pending ES documents
# TYPE es_backlog gauge
es_backlog 0
# HELP other_log_integrations Number of pending audit logs
# TYPE other_log_integrations gauge
other_log_integrations 0
# HELP kmip_connections Number of active kmip connections
# TYPE kmip_connections gauge
kmip_connections 0
# HELP splunk_queue_len Number of pending Splunk log events
# TYPE splunk_queue_len gauge
splunk_queue_len 0
# HELP splunk_pending_logs Number of pending Splunk logs
# TYPE splunk_pending_logs gauge
splunk_pending_logs 0
# HELP admin_connections Number of active admin connections
# TYPE admin_connections gauge
admin_connections 1
# HELP public_connections Number of active public connections
# TYPE public_connections gauge
public_connections 1
10.5 Prometheus Configuration
You can add jobs to your existing Prometheus configuration to collect these metrics. Here is an example of how to add jobs to scrape metrics from Fortanix DSM Node. Fill in targets based on your deployment.
- job_name: 'node_metrics'
scrape_interval: 300s
metrics_path: /proxy
params:
module:
- node
static_configs:
- targets: ['NODE1_IP:9999']
- targets: ['NODE2_IP:9999']
- targets: ['NODE3_IP:9999']
- job_name: 'sdkms_metrics'
scrape_interval: 60s
metrics_path: /proxy
params:
module:
- sdkms
static_configs:
- targets: ['NODE1_IP:9999']
- targets: ['NODE2_IP:9999']
- targets: ['NODE3_IP:9999']
10.6 Visualization
If you are using the Prometheus server to collect these metrics data, you can use Grafana to visualize the data.
For node metrics, use the “Node Exporter” dashboard to visualize the data and customize it as needed.
Here is an example dashboard using Grafana Node Exporter.

Figure 2: Visualization
For Fortanix DSM metrics, you can create your own dashboard using the collected data. For example, to visualize the number of active connections, you can use the public_connections metric and create a dashboard as shown below:

Figure 3: Dashboard
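As a quick check that the metric is being collected, you can also query it directly from the Prometheus HTTP API (PROMETHEUS_SERVER is a placeholder for your Prometheus host):
curl "http://PROMETHEUS_SERVER:9090/api/v1/query?query=public_connections"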