Workflow Applications Using Fortanix Enclave OS - AWS Nitro

1.0 Introduction

This article describes how to create and run a Nitro Workflow on an Amazon Web Services (AWS) node.

2.0 Prerequisites

Ensure that you have enrolled a compute node using AWS Nitro on Amazon Linux. For more information, refer to User's Guide: Enroll a Compute Node Using AWS Nitro on Amazon Linux.

3.0 Create Input and Output Datasets

Datasets are definitions containing the location and access credentials of data; they allow the Enclave OS in the Workflow to download and upload that data. In this example, we use an AWS S3 bucket to:

  • Store an encrypted file that will be downloaded and decrypted by the Enclave OS.

  • Upload an encrypted file using the credentials provided in the dataset.

  • Provide a directory that is accessible using a URL.

3.1 Input User (Data Provider)

Consider that a Data Owner has access to sensitive information and wants to allow an Application Owner to process it.
This sensitive data is stored in a conditions.csv file.

In this example, the file is encrypted, uploaded to a storage solution (AWS S3), and a dataset is configured with credentials and an encryption key for enclave access or processing:

  1. If you have not already done so, download and untar the tarball below:

    workflow-scripts.tar.gz (2.88 KB)

    tar xvfz workflow-scripts.tar.gz
  2. Obtain a copy of the CSV sample data (also available here: https://synthea.mitre.org/downloads):

    wget https://synthetichealth.github.io/synthea-sample-data/downloads/synthea_sample_data_csv_apr2020.zip
    unzip synthea_sample_data_csv_apr2020.zip

    You will only be using the file csv/conditions.csv. To protect it, generate a key locally, encrypt the file, and store the key securely in a KMS:

    1. Run the following command to generate an encryption key:

      xxd -u -l 32 -p /dev/random | tr -d '\n' > ./key.hex
    2. Use the aes_256_gcm.py script included in the tarball that you downloaded. Run the following command to encrypt the file (a reference sketch of this encryption appears after these steps):

      ./workflow_scripts/aes_256_gcm.py enc -K ./key.hex -in ./csv/conditions.csv -out ./conditions.csv.enc
  3. Run the following command to upload the encrypted file to a secure storage location such as AWS S3:

    aws s3 --profile upload cp ./conditions.csv.enc <s3-directory> 
  4. Generate a pre-signed URL to access the file (this avoids embedding the whole AWS SDK in the example).
    Use the presign.py script included in the tarball and run the following command:

    ./workflow_scripts/presign.py default download <s3-directory> conditions.csv.enc 86400

    Where <s3-directory> is the directory in your S3 bucket.
    As an example, the output from the above command will be as shown below:

    https://fortanix-pocs-data.s3.amazonaws.com/conditions.csv.enc?AWSAccessKeyId=&Signature=PcpH99nszG2%2Fv85z4IbgwgVDywc%3D&Expires=1613817035

    The output consists of two parts:

    • The location - https://fortanix-pocs-data.s3.amazonaws.com/conditions.csv.enc

    • Query parameters: AWSAccessKeyId=&Signature=PcpH99nszG2%2Fv85z4IbgwgVDywc%3D&Expires=1613817035
      The query parameters must be base64 encoded for the dataset definition using the following command:

      echo -n 'AWSAccessKeyId=<key>&Signature=PcpH99nszG2%2Fv85z4IbgwgVDywc%3D&Expires=1613817035' | base64

      The output will be as follows:

      QVdTQWNjZXNzS2V5SWQ9QUtJQVhPNU42R0dOQ05WMzUzV1MmU2lnbmF0dXJlPVBjcEg5OW5zekcyJTJGdjg1ejRJYmd3Z1ZEeXdjJTNEJkV4cGlyZXM9MTYxMzgxNzAzNQ==

    At this point, users accessing the URL above with the full query string parameters will be able to download the encrypted file until it expires in one day. If the URL expires, the dataset will need to be updated with new query parameters.

    NOTE

    If you access the URL without the string following '?', you will get a 403 Forbidden error.

    Hence, treat the query parameters as access credentials.
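
For reference, the following is a minimal Python sketch of the encryption that aes_256_gcm.py likely performs in step 2, assuming the nonce || ciphertext || tag layout that the enclave code in Section 3.3 expects (the script name encrypt_sketch.py is hypothetical; use the provided aes_256_gcm.py for the actual steps):

#!/usr/bin/env python3
# Sketch: AES-256-GCM file encryption producing nonce || ciphertext || tag.
import os
import sys

from cryptography.hazmat.backends import default_backend
from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes

NONCE_SIZE = 12

def encrypt_file(key_path, in_path, out_path):
    # key.hex holds 64 hex characters, i.e. a 32-byte AES-256 key
    key = bytes.fromhex(open(key_path).read().strip())
    nonce = os.urandom(NONCE_SIZE)

    cipher = Cipher(algorithms.AES(key), modes.GCM(nonce), backend=default_backend())
    encryptor = cipher.encryptor()
    ciphertext = encryptor.update(open(in_path, "rb").read()) + encryptor.finalize()

    # Prepend the nonce and append the 16-byte GCM tag
    open(out_path, "wb").write(nonce + ciphertext + encryptor.tag)

if __name__ == "__main__":
    encrypt_file(sys.argv[1], sys.argv[2], sys.argv[3])

For example: ./encrypt_sketch.py ./key.hex ./csv/conditions.csv ./conditions.csv.enc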

Perform the following steps to create an input dataset:

  1. Click the Datasets menu item in the CCM UI left navigation bar and click the CREATE DATASET button to create a new dataset. 

    dataset-landing-screen.png

    Figure 1: Create a New Dataset

  2. In the Create new dataset form, enter the following details:

    • Name – Enter the name of the dataset. For example: Conditions Data.

    • Description (optional) – Enter the description of the dataset. For example: Patients with associated conditions.

    • Labels (optional) – Attach one or more key-value labels to the dataset. For example: Key: Location and Value: East US

    • Group – Select the required group name from the drop-down menu to associate this dataset with that group.

    • Location – The AWS S3 URL where data can be accessed. For example: https://fortanix-pocs-data.s3.amazonaws.com/conditions.csv.enc

    • Long Description (optional) – Enter the content in GitHub-flavoured Markdown file format. You can also use the Fetch Long Description button to get the Markdown file content from an external URL.

      Fetch Long Description Dialog Box.png

      Figure 2: Fetch Long Description Dialog Box

      The following is the sample long description in Markdown format:

      - Strikethrough Text
      ~~It is Strikethrough test..~~
      
      - Blockquote Text
      > Test Blockquote.
      
      - Bold
      **A sample Description.**
      
      - Italic
      *It is Italics*
      
      - Bold and Italic
      ***Bold and Italics text***
      
      - Link
      This is [Website](https://www.fortanix.com/)?
    • Credentials – the credentials needed to access the data. The credentials must be in the correct JSON format and consist of:

      • Query parameters that were base64 encoded.

      • The key that was used to encrypt the file.

      {
        "query_string": "<my-query-string>",
        "encryption": {
          "key": "<my-key>"
        }
      }

      For example:

      {
        "query_string": "QVdTQWNjZXNzS2V5SWQ9QUtJQVhPNU42R0dOQ05WMzUzV1MmU2lnbmF0dXJlPVBjcEg5OW5zekcyJTJGdjg1ejRJYmd3Z1ZEeXdjJTNEJkV4cGlyZXM9MTYxMzgxNzAzNQ==",
        "encryption": {
          "key": "63F0E4C07666126226D795027862ACC5848E939881C3CFE8CB3EB47DD7B3D24A"
        }
      }

      TIP

      Before saving the dataset, it is a good idea to verify that the JSON is correct. After saving the dataset you will not be able to view the credentials, and access to data may fail. Any online JSON formatting tool can be used to validate the JSON (a small Python sketch for assembling and validating the credentials appears after these steps).

      NOTE

      • The credentials are only passed as text when creating the dataset over an HTTPS connection.

      • They are then stored in a KMS (Fortanix Data Security Manager) and are only accessible to approved enclaves.

      • Not even the Data Owner can retrieve the credentials.

    create-dataset-updated-field.png

    Figure 3: Create Input Dataset

  3. Click the CREATE DATASET button to create the input dataset.
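
Since the saved credentials cannot be viewed later, it can help to assemble and validate the JSON programmatically before pasting it into the Credentials field. The following is a small sketch under the assumptions above (the placeholder query string comes from the pre-signed URL, and key.hex is the key generated earlier):

#!/usr/bin/env python3
# Sketch: build and validate the dataset credentials JSON.
import base64
import json

query_string = "AWSAccessKeyId=<key>&Signature=...&Expires=..."  # from presign.py
key_hex = open("./key.hex").read().strip()

credentials = {
    "query_string": base64.b64encode(query_string.encode()).decode("ascii"),
    "encryption": {"key": key_hex},
}

# Round-tripping through json.dumps/json.loads confirms the value
# is well-formed JSON before it is saved.
assert json.loads(json.dumps(credentials)) == credentials
print(json.dumps(credentials, indent=2))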

3.2 Output User (Data Receiver)

Once the data has been received by the Enclave, the user application will run within the Enclave and generate output data (processed data). This data should be encrypted (using your key) before being uploaded to an untrusted store. This is achieved by defining an output dataset to be used by the Workflow.
Perform the following steps:

  1. Run the following command to generate an encryption key:

    xxd -u -l 32 -p /dev/random | tr -d '\n' > ./key_out.hex
  2. Use the presign.py script included in the tarball. Run the following command to generate a pre-signed URL for the upload (a boto3-based sketch appears at the end of this section):

    ./workflow_scripts/presign.py default upload <s3-directory> conditions_output.csv.enc 86400

    Where <s3-directory> is the directory in your S3 bucket.
    As an example, the output of the above command will be as shown below:

    https://fortanix-pocs-data.s3.amazonaws.com/conditions_output.csv.enc?AWSAccessKeyId=<key>&Signature=HFvhxaiKY0cGR9XqgGLp5zcAWac%3D&Expires=1613817880

    The output consists of two parts:

    • The location: https://fortanix-pocs-data.s3.amazonaws.com/conditions_output.csv.enc

    • Query parameters: AWSAccessKeyId=<key>&Signature=HFvhxaiKY0cGR9XqgGLp5zcAWac%3D&Expires=1613817880
      The query parameters must be base64 encoded for the dataset definition using the following command:

      echo -n 'AWSAccessKeyId=<key>&Signature=HFvhxaiKY0cGR9XqgGLp5zcAWac%3D&Expires=1613817880' | base64

      The output will be as follows:

      QVdTQWNjZXNzS2V5SWQ9QUtJQVhPNU42R0dOTk1SWFZLUEEmU2lnbmF0dXJlPUhGdmh4YWlLWTBjR1I5WHFnR0xwNXpjQVdhYyUzRCZFeHBpcmVzPTE2MTM4MTc4ODA=
  3. Create an output dataset with the following sample values:

    • Name – Enter the name of the dataset. For example: Conditions processing output.

    • Description (optional) – Enter the description of the dataset. For example: Patients with associated conditions.

    • Labels (optional) – Attach one or more key-value labels to the dataset. For example: Key: Location and Value: East US.

    • Group – Select the required group name from the drop-down menu to associate this dataset with that group.

    • Location – The AWS S3 URL where data can be accessed. For example: https://fortanix-pocs-data.s3.amazonaws.com/conditions_output.csv.enc 

    • Long Description (optional) – Enter the content in GitHub-flavoured Markdown file format. You can also use the Fetch Long Description button to get the Markdown file content from an external URL.

      Fetch Long Description Dialog Box.png

      Figure 4: Fetch Long Description Dialog Box

      The following is the sample long description:

      - Strikethrough Text
      ~~It is Strikethrough test..~~
      
      - Blockquote Text
      > Test Blockquote.
      
      - Bold
      **A sample Description.**
      
      - Italic
      *It is Italics*
      
      - Bold and Italic
      ***Bold and Italics text***
      
      - Link
      This is [Website](https://www.fortanix.com/)?
    • Credentials – the credentials needed to access the data. The credentials must be in the correct JSON format and consist of:

      • Query parameters that were base64 encoded.

      • The key that was used to encrypt the file.

        {
          "query_string": "<my-query-string>",
          "encryption": {
            "key": "<my-key>"
          }
        }

        For example:

        {
          "query_string": "QVdTQWNjZXNzS2V5SWQ9QUtJQVhPNU42R0dOTk1SWFZLUEEmU2lnbmF0dXJlPUhGdmh4YWlLWTBjR1I5WHFnR0xwNXpjQVdhYyUzRCZFeHBpcmVzPTE2MTM4MTc4ODA=","encryption": {
            "key": "63F0E4C07666126226D795027862ACC5848E939881C3CFE8CB3EB47DD7B3D24A"
          }
        }

        TIP

        Before saving the dataset, it is a good idea to verify that the JSON is correct. After saving the dataset you will not be able to view the credentials and access to data may fail. Any online JSON formatting tool can be used to validate that the JSON is correct.

        NOTE

        • The credentials are only passed as text when creating the dataset over an HTTPS connection.

        • They are then stored in a KMS (Fortanix Data Security Manager) and are only accessible to approved enclaves.

        • Not even the Data Owner can retrieve the credentials.

create-dataset-updated-field.png

Figure 5: Create Output Dataset
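
The presign.py script is included in the tarball. Under the assumption that it wraps the standard S3 pre-signing call, an equivalent boto3 sketch for the download and upload URLs used in Sections 3.1 and 3.2 would look like this (the bucket name is a placeholder):

#!/usr/bin/env python3
# Sketch: generate pre-signed S3 URLs for the input and output objects.
import boto3

session = boto3.Session(profile_name="default")
s3 = session.client("s3")

# Download URL for the input dataset, valid for 86400 seconds (1 day)
print(s3.generate_presigned_url(
    "get_object",
    Params={"Bucket": "<my-bucket>", "Key": "conditions.csv.enc"},
    ExpiresIn=86400,
))

# Upload URL for the output dataset
print(s3.generate_presigned_url(
    "put_object",
    Params={"Bucket": "<my-bucket>", "Key": "conditions_output.csv.enc"},
    ExpiresIn=86400,
))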

3.3 Create a General Purpose Python Docker Image

Create a Docker image that will run arbitrary protected Python code. The following files are used; they are included in the tarball provided with this example:

Dockerfile:

FROM python:3.6

RUN apt-get update && apt-get install -y python3-cryptography python3-requests python3-pandas
RUN mkdir -p /opt/fortanix/enclave-os/app-config/rw
RUN mkdir -p /demo/code /demo/input

COPY ./start.py ./utils.py ./describe.py /demo/

CMD ["/demo/start.py"]

start.py: This file is the main entry point into the application.

#!/usr/bin/python3
  
import os
import utils
import hashlib
from subprocess import PIPE, run

def main():
    input_folder = "/opt/fortanix/enclave-os/app-config/rw/"  # trailing slash: dataset names are appended directly below
    
    command = ["/usr/bin/python3", "/demo/describe.py"]
    
    # This downloads and decrypts all input data. File names are the object names from app config.
    for i in utils.read_json_datasets("input"):
        decrypted = utils.get_dataset(i)
        open(input_folder + i.name, 'wb').write(decrypted)
        
        # Add the file as input argument for our script
        command.append(input_folder + i.name)
        
    print("Running script")
    result = run(command, stdout=PIPE, stderr=PIPE, universal_newlines=True)
    
    # For simplicity uploading just stdout/stderr/returncode.
    utils.upload_result("output", result.returncode, result.stdout, result.stderr)
    
    print("Execution complete")

if __name__ == "__main__":
    main()

utils.py: This file contains the set of library functions.

#!/usr/bin/env python3

import os
import sys
import string
import json
import requests
import base64
import hashlib
from subprocess import PIPE, run

from cryptography.hazmat.backends import default_backend
from cryptography.hazmat.primitives.ciphers import (
    Cipher, algorithms, modes
)

NONCE_SIZE=12
TAG_SIZE=16
KEY_SIZE=32

PORTS_PATH="/opt/fortanix/enclave-os/app-config/rw/"

def convert_key(key_hex):
    key = key_hex.rstrip()
    if len(key) != 64 or not all(c in string.hexdigits for c in key):
        raise Exception("Key must be a 64-character hex string for AES-256-GCM")

    return bytes.fromhex(key)

def encrypt_buffer(key, data):
    # Output layout: nonce (12 bytes) || ciphertext || GCM tag (16 bytes)
    nonce = os.urandom(NONCE_SIZE)

    cipher = Cipher(algorithms.AES(key), modes.GCM(nonce), backend=default_backend())
    encryptor = cipher.encryptor()

    return nonce + encryptor.update(data.encode()) + encryptor.finalize() + encryptor.tag

def decrypt_buffer(key, data):
    # Input layout mirrors encrypt_buffer: nonce || ciphertext || tag
    tag = data[-TAG_SIZE:]
    nonce = data[:NONCE_SIZE]

    cipher = Cipher(algorithms.AES(key), modes.GCM(nonce, tag), backend=default_backend())
    decryptor = cipher.decryptor()

    return decryptor.update(data[NONCE_SIZE:len(data)-TAG_SIZE]) + decryptor.finalize()

class JsonDataset:
    def __init__(self, location, credentials, name):
        self.location = location
        self.credentials = credentials
        self.name = name

def read_json_datasets(port):
    # Each port directory contains one subfolder per connected dataset;
    # the Enclave OS writes the dataset's location and credentials there.
    ports = []
    for folder in os.listdir(PORTS_PATH + port):
        subfolder = PORTS_PATH + port + "/" + folder
        if os.path.exists(subfolder + "/dataset"):
            credentials = json.load(open(subfolder + "/dataset/credentials.bin", "r"))
            location = open(subfolder + "/dataset/location.txt", "r").read()
            ports.append(JsonDataset(location, credentials, folder))

    return ports

def get_dataset(dataset):
    # Rebuild the pre-signed URL from the stored location and the
    # base64-decoded query string, then download and decrypt the object
    url = dataset.location + "?" + base64.b64decode(dataset.credentials["query_string"]).decode('ascii')
    r = requests.get(url, allow_redirects=True)
    r.raise_for_status()

    print("Retrieved dataset from location: " + dataset.location)
    key = convert_key(dataset.credentials["encryption"]["key"])
    return decrypt_buffer(key, r.content)

class RunResult:
    def __init__(self, returncode, stdout, stderr):
        self.returncode = returncode
        self.stdout = base64.b64encode(stdout.encode()).decode('ascii')
        self.stderr = base64.b64encode(stderr.encode()).decode('ascii')

def upload_result(port, returncode, stdout, stderr):
    # Encrypt the JSON-serialized result and PUT it to each output
    # dataset's pre-signed URL
    result = RunResult(returncode, stdout, stderr)
    json_str = json.dumps(result.__dict__)

    for dataset in read_json_datasets(port):
        url = dataset.location + "?" + base64.b64decode(dataset.credentials["query_string"]).decode('ascii')
        key = convert_key(dataset.credentials["encryption"]["key"])

        print("Writing output to location: " + dataset.location)
        requests.put(url, encrypt_buffer(key, json_str))

describe.py: This file contains the custom code called by start.py.

#!/usr/bin/python3

# Print summary statistics for the DESCRIPTION column of each CSV
# file passed on the command line.
import pandas as pd
import sys

for i in sys.argv[1:]:
    df_do = pd.read_csv(i)
    print("Dataset: " + i + "\n")
    print(df_do['DESCRIPTION'].describe())
    print("")

A standard docker build and docker push must be used to build your Docker image and push it to your registry. For example:

docker build -t <my-registry>/simple-python-sgx .
docker push <my-registry>/simple-python-sgx

Where <my-registry> is the location of your Docker registry.

4.0 Create a Nitro Enclave OS Application and Image

For the steps to create a Nitro Enclave OS application and image, refer to User's Guide: Add and Edit an Application.

NOTE

Ensure that the Nitro File persistence option is disabled during the image creation.

5.0 Approve Tasks

Navigate to the Tasks menu item in the Fortanix CCM UI left navigation bar, fetch the domain and build whitelisting tasks, and approve the tasks.

pending-landing-screen.png

Figure 6: Tasks

6.0 Create Application Configuration

The Python script in the Docker image discovers its input and output datasets through an Application Configuration, which defines the ports.

Perform the following steps to create an Application Configuration:

  1. Navigate to Applications → Configurations in the Fortanix CCM UI left navigation bar.

  2. Click ADD CONFIGURATION to add a new configuration. 

    add-configuration-landing-screen.png

    Figure 7: Create App Configuration

    • Image – Select the application image, such as <my-registry>/simple-python-sgx:latest, for which you want to create a configuration.
      Where <my-registry> is the location of your Docker registry.

    • Configuration Name – Enter a name for the configuration.

    • Group – Select the required group name from the drop-down menu to associate this configuration with that group.

    • Description – Enter the description of the configuration.

    • Ports – Specify the ports to be used in the Workflow. Multiple ports can be added, depending on the required connections. For example: input, output, heartbeat, and so on. A sketch of how ports appear inside the enclave follows at the end of this section.

    • Labels – Attach one or more key-value labels to the application configuration.

    • Configuration items – These are key-value pairs used to configure the application.

    NOTE

    For ACI applications, Fortanix permits only files in the path /opt/fortanix/.

    ADD APP CONFIGURATION.png

    Figure 8: Save Configuration

  3. Click SAVE CONFIGURATION to save the configuration.  

    SAVE BUTTON.png

    Figure 9: Configuration Saved
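
At runtime, each port defined in the configuration is exposed to the application as a directory under /opt/fortanix/enclave-os/app-config/rw/, with one subfolder per connected dataset; this is the layout that utils.py in Section 3.3 reads. The following sketch, assuming that layout, lists what an application sees inside the enclave:

#!/usr/bin/env python3
# Sketch: enumerate the datasets wired to each port inside the enclave.
import os

PORTS_PATH = "/opt/fortanix/enclave-os/app-config/rw/"

for port in sorted(os.listdir(PORTS_PATH)):
    print("Port:", port)
    for name in sorted(os.listdir(os.path.join(PORTS_PATH, port))):
        location_file = os.path.join(PORTS_PATH, port, name, "dataset", "location.txt")
        if os.path.exists(location_file):
            print("  Dataset", name, "->", open(location_file).read().strip())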

7.0 Create a Workflow

Perform the following steps to create a Workflow:

  1. Click the Workflows menu item in the Fortanix CCM UI left navigation bar.

  2. On the Workflows page, click + WORKFLOW to create a new Workflow. 

    add-workflow-button.png

    Figure 10: Create Workflow

  3. In the CREATE NEW WORKFLOW dialog box, enter the Workflow Name, assign it to a Group, and provide a Description (optional). Click CREATE WORKFLOW to access the Workflow graph.

    Figure 19.png

    Figure 11: Created the Workflow

  4. To add an application to the Workflow graph, drag the "App" icon and drop it into the graph area. Click the + APPLICATION button. In the ADD APPLICATION dialog box, select an existing application name and image, for example <my-registry>/simple-python-sgx:latest, from the list of available application images.
    Where <my-registry> is the location of your registry.

    CreateWorkflow.png

    Figure 12: Add Application

  5. Click the + ADD NEW CONFIGURATION button to either add a new application configuration or select an existing one. 

    AppConfig.png

    Figure 13: Add Application Configuration

  6. Add input and output datasets to the Workflow graph by dragging the dataset icon and placing it in the graph area. Click the + DATASET button. In the ADD DATASET dialog box, select from an existing dataset created in the previous section.

    AddDataset.png

    Figure 14: Add Dataset Workflow

  7. Establish connections between the applications and input/output datasets. To do this, connect the Input Dataset to the Application by selecting the "Input" Target Port. Repeat this process to connect the Application to the Output Dataset with the "Output" Target Port.

    SelectPort.png

    Figure 15: Create Connection

  8. After the Workflow is complete, click the REQUEST APPROVAL button to initiate the approval process for the Workflow.  

    WorkflowApproval.png

    Figure 16: Request Workflow Approval

    WARNING

    When a draft Workflow is submitted for approval, it is removed from the drafts list; once it is in a "pending" or "approved" state, it can no longer be edited directly.

  9. The workflow remains in a “pending” state until it receives approval from all users. In the Pending menu item, click SHOW APPROVAL REQUEST to approve a Workflow.  

    WorkflowApprovalPending.png

    Figure 17: Workflow Pending Approval

  10. In the APPROVAL REQUEST - CREATE WORKFLOW dialog box, you can either APPROVE or DECLINE a workflow.  

    show-approval-request-dialog-box.png

    Figure 18: Approve the Workflow

    NOTE

    • A user can also approve/decline a Workflow from the CCM Tasks menu item.

    • Notice that the users who have approved the Workflow have a green tick  WorkflowEx27.png against their icon.

  11. All the users of a Workflow must approve the Workflow to finalize it. If a user declines a Workflow, the Workflow is rejected. When all the users approve the Workflow, it is deployed.

    1. CCM configures apps to access the datasets.

    2. CCM creates the Workflow Application Configs.

    3. CCM returns the list of hashes needed to start the apps.

8.0 Run Nitro Workflow

Perform the following steps to run a Nitro Workflow:

  1. Run the following command to execute the application image:

    docker run -it --rm --privileged -v /run/nitro_enclaves:/run/nitro_enclaves   -e NODE_AGENT=http://172.31.9.232:9092/v1/ -e CCM_BACKEND=ccm.fortanix.com:443 -e APPCONFIG_ID=e545d0ba32c0edf86226306cad924bcfa2ad9f7fd74dafd0f4c1c7724759a9df 513076507034.dkr.ecr.us-west-1.amazonaws.com/development-images/ccm-automation-output-images:python-converted642

    Where,

    • 9092 is the default node agent port.

    • 172.31.9.232 is the node agent host IP address.

    • APPCONFIG_ID is the runtime configuration hash of the workflow app, which can be copied from the app info of the workflow.

    • 513076507034.dkr.ecr.us-west-1.amazonaws.com/development-images/ccm-automation-output-images:python-converted642 is the converted app, found under the Image Name column in the Images table.

      NOTE

      It is recommended to use your own inputs for node IP address and converted image in the above format. The command contains sample values only.

  2. To verify that the application is running, click Applications in the Fortanix CCM UI menu and confirm that a running application image is displayed in the application's detailed view.

  3. After the App Owner starts the application with the application config identifier, the Data Output Owner can view the output using the following steps:

    1. Run the following command to download the output file:

      aws s3 --profile download cp <s3-directory>/conditions_output.csv.enc .

      For example:

      aws s3 --profile download cp s3://fortanix-pocs-data/conditions_output.csv.enc .
    2. Use the aes_256_gcm.py script provided in the tarball. Run the following command to decrypt the file (a Python sketch of this step appears at the end of this section):

      ./workflow_scripts/aes_256_gcm.py dec -K ./key_out.hex -in ./conditions_output.csv.enc -out ./output.txt
      $ cat output.txt | jq .
      {
      "returncode": 0,
      "stdout": "RGF0YXNldDogL29wdC9mb3J0YW5peC9lbmNsYXZlLW9zL2FwcC1jb25maWcvaW5wdXQvY3hoY2Z4ZHZsCgpjb3VudCAgICAgICAgICAgICAgICAgICAgICAgICAgIDgzNzYKdW5pcXVlICAgICAgICAgICAgICAgICAgICAgICAgICAgMTI5CnRvcCAgICAgICBWaXJhbCBzaW51c2l0aXMgKGRpc29yZGVyKQpmcmVxICAgICAgICAgICAgICAgICAgICAgICAgICAgIDEyNDgKTmFtZTogREVTQ1JJUFRJT04sIGR0eXBlOiBvYmplY3QKCg==",
      "stderr": ""
      }

      The following is the expected decoded output:

      $ cat output.txt | jq -r .stdout | base64 -d
      Dataset: /opt/fortanix/enclave-os/app-config/input/cxhcfxdvl
      count                 8376 
      unique                129 
      top                   Viral sinusitis (disorder) 
      freq                  1248 
      Name: DESCRIPTION, dtype: object
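
      The same verification can be done in Python. The following is a minimal sketch, assuming key_out.hex from Section 3.2 and the nonce || ciphertext || tag layout used throughout this example:

      #!/usr/bin/env python3
      # Sketch: decrypt the downloaded result and decode the embedded stdout.
      import base64
      import json

      from cryptography.hazmat.backends import default_backend
      from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes

      NONCE_SIZE, TAG_SIZE = 12, 16

      key = bytes.fromhex(open("./key_out.hex").read().strip())
      blob = open("./conditions_output.csv.enc", "rb").read()

      # Split the blob into its nonce, ciphertext, and GCM tag parts
      nonce, body, tag = blob[:NONCE_SIZE], blob[NONCE_SIZE:-TAG_SIZE], blob[-TAG_SIZE:]

      cipher = Cipher(algorithms.AES(key), modes.GCM(nonce, tag), backend=default_backend())
      decryptor = cipher.decryptor()
      result = json.loads(decryptor.update(body) + decryptor.finalize())

      print("returncode:", result["returncode"])
      print(base64.b64decode(result["stdout"]).decode())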