1.0 Introduction
This article describes how to integrate Fortanix Data Security Manager (DSM) with Databricks to protect sensitive data in the Databricks SQL warehouse. The primary goal of this integration is to enforce robust data security protocols, ensuring the confidentiality of sensitive data stored in the warehouse.
Tokenization and detokenization serve as fundamental techniques in data security, enabling the protection of sensitive data while preserving its usability.
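The round-trip property described above can be illustrated with a toy example. The sketch below is not the algorithm Fortanix DSM uses and is not secure; it only shows that a token can keep the format of the original value (same length, same separators) and still be reversed by a holder of the key. The function names and the key are hypothetical.

```python
import hashlib
import hmac

DIGITS = "0123456789"


def _digit_shifts(key: bytes, count: int) -> list[int]:
    """Derive `count` key-dependent shifts in the range 0-9 using HMAC-SHA256."""
    stream = b""
    block = 0
    while len(stream) < count:
        stream += hmac.new(key, block.to_bytes(4, "big"), hashlib.sha256).digest()
        block += 1
    return [byte % 10 for byte in stream[:count]]


def _transform(value: str, key: bytes, direction: int) -> str:
    """Shift each ASCII digit by its keyed offset; leave other characters untouched."""
    shifts = iter(_digit_shifts(key, sum(ch in DIGITS for ch in value)))
    return "".join(
        DIGITS[(int(ch) + direction * next(shifts)) % 10] if ch in DIGITS else ch
        for ch in value
    )


def toy_tokenize(value: str, key: bytes) -> str:
    """Toy stand-in for tokenization (illustration only, NOT secure)."""
    return _transform(value, key, +1)


def toy_detokenize(token: str, key: bytes) -> str:
    """Reverse toy_tokenize when given the same key."""
    return _transform(token, key, -1)


if __name__ == "__main__":
    demo_key = b"demo-key"  # hypothetical key, for illustration only
    token = toy_tokenize("555-867-5309", demo_key)
    print(token)                            # same layout, different digits
    print(toy_detokenize(token, demo_key))  # original value restored
```

In the actual integration, the key never leaves Fortanix DSM; Databricks only sends values to the DSM API and receives tokens back.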
2.0 Overview
Within this integration framework, the Fortanix DSM's vaultless tokenization is used with Databricks Notebooks and Python User-defined functions (UDFs) to facilitate tokenization and detokenization processes within the Databricks SQL warehouse.
Method 1 - Databricks Notebooks: These are essential tools for data science and machine learning workflows. They offer real-time co-authoring, automatic versioning, and built-in data visualizations. In this approach, Fortanix DSM Python Software Development Kits (SDKs) are used to connect to Fortanix DSM API endpoints within Databricks Notebooks for tokenizing and detokenizing sensitive data stored in the SQL warehouse. This ensures secure transformation of sensitive data columns in the table.
For more information, refer to the Introduction to Databricks Notebooks documentation.
Method 2 - Python User-defined functions (UDFs): These allow for secure and governed execution of Python code through SQL functions. By integrating UDFs with the Fortanix DSM API, you can perform tokenization operations to redact email and phone information from JSON strings, returning the redacted string and preventing unauthorized access.
For more information, refer to the Introduction to User-defined functions (UDFs) in Unity Catalog documentation.
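As a rough illustration of Method 2, the sketch below redacts the email and phone fields of a JSON string. It assumes the fields are top-level string values named email and phone, and it takes the tokenizer as a parameter; fake_tokenize is a hypothetical placeholder standing in for a call to the Fortanix DSM API, not the real UDF logic.

```python
import json
from typing import Callable, Iterable


def redact_contact_fields(
    json_text: str,
    tokenize: Callable[[str], str],
    fields: Iterable[str] = ("email", "phone"),
) -> str:
    """Return the JSON string with the named top-level string fields tokenized."""
    record = json.loads(json_text)
    if not isinstance(record, dict):
        raise ValueError("expected a JSON object")
    for field in fields:
        value = record.get(field)
        if isinstance(value, str):
            record[field] = tokenize(value)
    return json.dumps(record)


def fake_tokenize(value: str) -> str:
    # Placeholder only: a real UDF would call the Fortanix DSM API here.
    return "tok_" + value[::-1]


if __name__ == "__main__":
    raw = '{"name": "Ada", "email": "ada@example.com", "phone": "555-0100"}'
    print(redact_contact_fields(raw, fake_tokenize))
```

Passing the tokenizer in as an argument keeps the JSON handling testable without network access to Fortanix DSM.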
3.0 Prerequisites
Ensure the following:
You must have a premium subscription account in the Databricks application.
To use User-defined functions (UDFs), you must have access to Python UDFs, which are currently in preview.
The Fortanix DSM application is accessible. For more information, refer to Section 6.1: Signing Up and Section 6.2: Creating an Account.
4.0 Product Tested Version
Fortanix DSM version 4.30 and above.
5.0 Architecture Diagram

Figure 1: Architecture Diagram
The integration architecture is divided into two main planes: the Databricks Control Plane and the Databricks Data Plane, connected to the Fortanix DSM for secure data operations.
The Databricks Control Plane is where credentials for connecting to Fortanix DSM APIs are managed, and where SQL commands for operations on sensitive data are written. It handles configuration, authentication, and orchestration.
The Databricks Data Plane is where data resides and undergoes processing. It includes resources like SQL warehouses and data lakes. The Databricks Data Plane executes the SQL commands for tokenizing and detokenizing sensitive data.
First, in the Databricks Control Plane, Notebooks or Python UDFs connect to the Fortanix DSM using the Fortanix DSM Python SDK. Then, the SQL commands to tokenize and detokenize sensitive data in the Databricks SQL warehouse are written in the Control Plane and executed in the Data Plane.
6.0 Configure Fortanix DSM
A Fortanix DSM service must be configured, and the URL must be accessible. To create a Fortanix DSM account and group, refer to the following sections:
6.1 Signing Up
To get started with the Fortanix Data Security Manager (DSM) cloud service, you must register an account at <Your_DSM_Service_URL>. For example, https://eu.smartkey.io.
For detailed steps on how to set up the Fortanix DSM, refer to the User's Guide: Sign Up for Fortanix Data Security Manager SaaS documentation.
6.2 Creating an Account
Access the <Your_DSM_Service_URL> on the web browser and enter your credentials to log in to the Fortanix DSM.

Figure 2: Logging In
6.3 Creating a Group
Perform the following steps to create a group in the Fortanix DSM:
Click the Groups menu item in the DSM left navigation panel and click the + button on the Groups page to add a new group.
Figure 3: Add Group
On the Adding new group page, enter the following details:
Title: Enter a title for your group.
Description (optional): Enter a short description for the group.
Click the SAVE button to create the new group.
The new group has been added to the Fortanix DSM successfully.
6.4 Creating an Application
Perform the following steps to create an application (app) for the group created in the previous section:
Click the Apps menu item in the DSM left navigation panel and click the + button on the Apps page to add a new app.
Figure 4: Add Application
On the Adding new app page, enter the following details:
App name: Enter the name of your application.
Interface (optional): Select the interface type as REST API from the drop down menu.
ADD DESCRIPTION (optional): Enter a short description for the application.
Authentication method: Select the default API Key as the method of authentication from the drop down menu. For more information on the authentication methods, refer to the User's Guide: Authentication documentation.
Assigning the new app to groups: Select the group created in Section 6.3: Creating a Group.
Click the SAVE button to add the new application.
The new application has been added to the Fortanix DSM successfully.
6.5 Copying the API Key
Perform the following steps to copy the API key from the Fortanix DSM:
Navigate to the Apps menu item in the DSM left navigation panel and click the app created in Section 6.4: Creating an Application to go to the detailed view of the app.
On the INFO tab, click the VIEW API KEY DETAILS button.
From the API Key Details dialog box, copy the API Key of the app to be used later.
6.6 Creating a Security Object
Perform the following steps to generate a tokenization key in the Fortanix DSM:
Click the Security Objects menu item in the DSM left navigation panel and click the + button on the Security Objects page to add a security object.
Figure 5: Add Security Object
On the Add New Security Object page, enter the following details:
Security Object name: Enter the name of your security object. For example, db_name_token.
Group: Select the group as created in Section 6.3: Creating a Group.
Select the GENERATE radio button.
Choose a type: Select the Tokenization key type.
Key Size: Indicates the size of the key in bits.
Data type: Indicates the type of the security object token. For more information, refer to the User's Guide: Tokenization documentation.
Key operations permitted: Select the required operations to define the actions that can be performed with the cryptographic keys, such as encryption, decryption, signing, and verifying.
Click the GENERATE button to create the new security object.
Similarly, repeat Steps 1 to 3 to create a security object for the email address. For example, db_email_token.

Figure 6: Security Objects Added
The two new security objects have been added to the Fortanix DSM successfully.
7.0 Creating a Databricks Secret
You can use Databricks Secret Management to store the Fortanix DSM API key instead of hardcoding it in Notebooks or Python UDFs. This gives Notebooks and UDFs secure, authorized access to the API key at runtime.
Perform the following steps using Databricks CLI:
Run the following commands to create a Databricks secrets scope:
SECRETS_SCOPE_NAME="<your secret scope name>"
databricks secrets create-scope $SECRETS_SCOPE_NAME
Run the following command to list the scopes:
databricks secrets list-scopes
The output of the command will be:
Scope      Backend Type
hr_scope   DATABRICKS
Run the following command to add users to the scope to access the secret:
databricks secrets put-acl $SECRETS_SCOPE_NAME <user_email> MANAGE
Run the following command to view the list of users who have access to the secret:
databricks secrets list-acls $SECRETS_SCOPE_NAME
The output of the command will be:
[
  {
    "permission":"MANAGE",
    "principal":"<user_email>"
  }
]
Run the following command to add a secret:
databricks secrets put-secret $SECRETS_SCOPE_NAME FORTANIX_API_KEY --string-value "<API_KEY_VALUE>"
Run the following command to view the list of secrets:
databricks secrets list-secrets $SECRETS_SCOPE_NAME
The output of the command will be:
Key               Last Updated Timestamp
FORTANIX_API_KEY  1714076216500
Run the following command to get the value of the secret:
databricks secrets get-secret $SECRETS_SCOPE_NAME FORTANIX_API_KEY
The output of the command will be:
{ "key":"FORTANIX_API_KEY", "value":"<base64_encoded_value>" }
Run the following command to get the base64-decoded value of the secret:
databricks secrets get-secret $SECRETS_SCOPE_NAME FORTANIX_API_KEY | jq -r '.value' | base64 -d
For more information, refer to the Access Control List documentation.
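Where jq is not available, the decoding step above can be reproduced in Python. The following is a minimal sketch that assumes the get-secret output is the JSON object shown earlier, with the secret in a base64-encoded value field; decode_secret_value is a hypothetical helper name, and the sample value is a dummy string rather than a real API key.

```python
import base64
import json


def decode_secret_value(cli_output: str) -> str:
    """Extract and base64-decode the `value` field from get-secret JSON output."""
    payload = json.loads(cli_output)
    return base64.b64decode(payload["value"]).decode("utf-8")


if __name__ == "__main__":
    # Dummy output in the same shape as the CLI response shown above.
    sample_output = json.dumps(
        {
            "key": "FORTANIX_API_KEY",
            "value": base64.b64encode(b"not-a-real-key").decode("ascii"),
        }
    )
    print(decode_secret_value(sample_output))
```

Avoid printing the decoded value in shared Notebooks or logs, since it is the live Fortanix DSM API key.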
8.0 Methods
This section describes the steps for each of the two Databricks methods defined in Section 2.0: Overview.
8.1 Using Databricks Notebooks
Perform the following steps to create a new Databricks Notebook to import Fortanix DSM Python SDK, and define the tokenization and detokenization functions:
Log in to your Databricks account.
From the left navigation panel, click the + NEW button → Notebook option to create a new Notebook.
Figure 7: Add Notebook
On the next screen, click the File tab → Import notebook… option from the drop down menu to import a sample DSM Notebook.
Figure 8: Import Notebook
In the Import dialog box, enter the following details:
Import from: Select the URL radio button.
Target Folder: The path to the target folder is selected by default.
URL: Enter the following URL in the provided space:
https://github.com/fortanix/databricks/blob/main/notebook/DSM_notebook.py
Figure 9: Import Dialog Box
Click the Import button to import the sample DSM Notebook file.
This Notebook fetches the API key from the Databricks secrets as configured in Section 7.0: Creating a Databricks Secret.
api_key = dbutils.secrets.get(scope="hr_scope", key="FORTANIX_API_KEY")
Where,
scope refers to the secret scope name defined in the Databricks Secret Management.
FORTANIX_API_KEY refers to the Fortanix DSM API key of the app copied in Section 6.5: Copying the API Key.
Click the Run cell option to run the Python script in the Notebook to validate if everything works.
Figure 10: Run Button
In the Attach to an existing compute resource form, select the General compute radio button and click the Start, attach and run button to execute the Notebook.
Wait for a few minutes to validate the status of the Notebook.
The Notebook will be created with the name DSM_notebook.
After the status of the Notebook is validated, create a second Notebook in the account to define the cryptographic key UUIDs and call tokenize_col and detokenize_col as defined in the previous steps.
Perform the following steps:
From the left navigation panel, click the + NEW button → Notebook option.
Figure 11: Add Notebook
On the next screen, click the File button → Import notebook… option from the drop down menu.
Figure 12: Import Notebook
In the Import dialog box, enter the following details:
Import from: Select the URL radio button.
Target Folder: The path to the target folder is selected by default.
URL: Enter the following URL in the provided space:
https://github.com/fortanix/databricks/blob/main/notebook/tokenize_and_detokenize_sample_notebook.py.
Click the Import button to import the new sample DSM Notebook file.
This Notebook references the DSM_notebook notebook created in the previous section. It defines the key UUID values and invokes the tokenization (tokenize_col) and detokenization (detokenize_col) functions described as follows:
These functions expect the table name, column name, and Key UUIDs as inputs.
Run the following command to source the Fortanix DSM Notebook:
%run "./DSM_notebook"
Run the following command to define the Fortanix Key UUIDs:
lname_kid = "<LAST_NAME_KEY_UUID>"
email_kid = "<EMAIL_ADDRESS_KEY_UUID>"
Where,
lname_kid refers to the Fortanix DSM key UUID for the last name key as created in Section 6.6: Creating a Security Object.
email_kid refers to the Fortanix DSM key UUID for the email key as created in Section 6.6: Creating a Security Object.
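Because the key UUID variables start out as placeholders, it can help to check them before calling the tokenization functions. The sketch below is an optional addition, not part of the sample Notebook; require_uuid is a hypothetical helper that relies only on the Python standard library.

```python
import uuid


def require_uuid(name: str, value: str) -> str:
    """Return the canonical UUID string, or raise ValueError naming the variable."""
    try:
        return str(uuid.UUID(value))
    except ValueError as exc:
        raise ValueError(f"{name} is not a valid key UUID: {value!r}") from exc


if __name__ == "__main__":
    # A well-formed UUID passes through unchanged.
    print(require_uuid("email_kid", "fdc25a25-f75b-4a5a-83d6-002c8ed71fba"))
    # An unreplaced placeholder is rejected before any API call is made.
    try:
        require_uuid("lname_kid", "<LAST_NAME_KEY_UUID>")
    except ValueError as err:
        print(err)
```

This check only confirms the format of the value; it does not confirm that the key exists in Fortanix DSM or that the app has access to it.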
In this sample notebook, the pre-created table has the following structure:
# "employees.data"
# employee_id bigint
# fname string
# lname string
# email string
Figure 13: Employees Data Table
Run the following command to tokenize the columns:
tokenize_col("employees","data",["lname","email"],[lname_kid,email_kid])
# Comment out the call to insert_tokenizedData() in tokenize_col() in DSM_notebook if you do not need another table to be created.
This sample function tokenizes the lname and email columns and creates a new table called tokenized_<table_name>.
Run the following command to detokenize the columns by specifying the table and column names:
detokenize_col("employees","tokenized_data",["lname","email"],[lname_kid,email_kid])
Click the Run cell option to run the Python script for the second Notebook.
Figure 14: Sample Notebook
In the Attach to an existing compute resource form, select the General compute radio button and click the Start, attach and run button to execute the following functions in the Notebook:
Tokenize the selected columns, such as lname and email.
Create a new table named employees.tokenized_data in the Catalog.
Detokenize the same columns from table employees.tokenized_data.
Figure 15: Employees Tokenized Data Table
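The column-to-key pairing used by tokenize_col and detokenize_col can be pictured with a small in-memory sketch. This is not the code in DSM_notebook, whose implementation may differ; tokenize_rows and the stub tokenizer are hypothetical, and plain Python dictionaries stand in for table rows so the example runs without a cluster or a Fortanix DSM connection.

```python
from typing import Callable, Dict, List, Sequence

Row = Dict[str, object]


def tokenize_rows(
    rows: List[Row],
    columns: Sequence[str],
    key_ids: Sequence[str],
    tokenize: Callable[[str, str], str],
) -> List[Row]:
    """Return copies of `rows` with each listed column passed through `tokenize`."""
    if len(columns) != len(key_ids):
        raise ValueError("columns and key_ids must have the same length")
    key_for = dict(zip(columns, key_ids))  # column name -> key UUID
    result = []
    for row in rows:
        new_row = dict(row)  # leave the input rows unmodified
        for column, key_id in key_for.items():
            if new_row.get(column) is not None:
                new_row[column] = tokenize(str(new_row[column]), key_id)
        result.append(new_row)
    return result


if __name__ == "__main__":
    def stub_tokenize(value: str, key_id: str) -> str:
        # Stand-in for a Fortanix DSM call; only marks which key would be used.
        return f"{key_id}:{value.upper()}"

    employees = [
        {"employee_id": 1, "fname": "Ada", "lname": "Lovelace", "email": "ada@example.com"}
    ]
    print(tokenize_rows(employees, ["lname", "email"], ["k1", "k2"], stub_tokenize))
```

Each column is paired with the key at the same position in the key list, which is why the column list and the key UUID list passed to the Notebook functions must be in the same order.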
8.2 Using Python User-defined functions
Perform the following steps to implement tokenization and detokenization operations in Databricks SQL warehouse using Python UDFs:
From the left navigation panel, click the SQL Editor button.
In the New query working space, paste the content from the Tokenization UDF Function file.
Click the run icon to execute the function file.
Figure 16: Run Button
The results are displayed in the Raw results section.
After the script execution is complete, click the + button to add a new query.
Figure 17: Create New Query
Similarly, in the New query working space, paste the content from the Detokenization UDF Function file.
Click the run icon to execute the function file.
Figure 18: Run Button
Create a new query to tokenize the email column of the employees.data table available in the Catalog.
Copy and paste the following command to tokenize the email column:
SELECT fortanix_tokenize(
  email,
  map(
    'fortanix_api_endpoint', 'https://apac.smartkey.io',
    'fortanix_api_key', secret('hr_scope', 'FORTANIX_API_KEY'),
    'key_id', 'fdc25a25-f75b-4a5a-83d6-002c8ed71fba'
  )
) FROM data;
Where,
fortanix_api_endpoint refers to the URL endpoint for the Fortanix DSM API.
fortanix_api_key refers to the Fortanix DSM API key of the app copied in Section 6.5: Copying the API Key.
secret refers to the Databricks SQL function that reads the FORTANIX_API_KEY secret from the hr_scope secret scope defined in the Databricks Secret Management.
key_id refers to the unique identifier of the key used for tokenization.
Click the run button to execute the query and view the tokenized email column in the Raw results section:
Figure 19: Tokenized Query
Create a new query to detokenize the email column of the employees.tokenized_data table available in the Catalog.
Copy and paste the following command to detokenize the email column:
SELECT fortanix_detokenize(
  email,
  map(
    'fortanix_api_endpoint', 'https://apac.smartkey.io',
    'fortanix_api_key', secret('hr_scope', 'FORTANIX_API_KEY'),
    'key_id', 'fdc25a25-f75b-4a5a-83d6-002c8ed71fba'
  )
) FROM tokenized_data;
Where,
fortanix_api_endpoint refers to the URL endpoint for the Fortanix DSM API.
fortanix_api_key refers to the Fortanix DSM API key of the app copied in Section 6.5: Copying the API Key.
key_id refers to the unique identifier of the key used for tokenization.
Click the run button to execute the query and view the detokenized email column in the Raw results section:
Figure 20: Detokenized Query
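To show how the three map entries in the queries above could be used inside a UDF, the sketch below assembles a REST request without sending it. It is not the content of the Tokenization or Detokenization UDF Function files. The /crypto/v1/encrypt and /crypto/v1/decrypt paths, the AES and FPE request fields, and Basic authentication with the API key are assumptions that should be confirmed against the Fortanix DSM REST API documentation; build_fpe_request is a hypothetical name.

```python
import base64
import json
from typing import Dict, Tuple


def build_fpe_request(
    operation: str, value: str, options: Dict[str, str]
) -> Tuple[str, Dict[str, str], bytes]:
    """Return (url, headers, body) for a tokenize or detokenize call.

    ASSUMPTIONS: the endpoint paths, the AES/FPE fields, and the Basic
    authentication scheme are not taken from this article; verify them
    against the Fortanix DSM REST API documentation before use.
    """
    if operation == "tokenize":
        path, field = "/crypto/v1/encrypt", "plain"
    elif operation == "detokenize":
        path, field = "/crypto/v1/decrypt", "cipher"
    else:
        raise ValueError(f"unknown operation: {operation!r}")

    url = options["fortanix_api_endpoint"].rstrip("/") + path
    headers = {
        "Authorization": "Basic " + options["fortanix_api_key"],
        "Content-Type": "application/json",
    }
    body = {
        "key": {"kid": options["key_id"]},
        "alg": "AES",
        "mode": "FPE",
        # The value is sent base64-encoded in the request body.
        field: base64.b64encode(value.encode("utf-8")).decode("ascii"),
    }
    return url, headers, json.dumps(body).encode("utf-8")


if __name__ == "__main__":
    options = {
        "fortanix_api_endpoint": "https://apac.smartkey.io",
        "fortanix_api_key": "PLACEHOLDER_API_KEY",  # never hardcode a real key
        "key_id": "fdc25a25-f75b-4a5a-83d6-002c8ed71fba",
    }
    url, headers, body = build_fpe_request("tokenize", "ada@example.com", options)
    print(url)
    print(body.decode("utf-8"))
```

Keeping request construction separate from the network call makes it possible to inspect what would be sent for a given row before running the UDF against the whole table.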