User's Guide: Data Ingestion

Data Ingestion

Data Ingestion is the first phase of the Confidential AI flow, where the input data is provided and captured as datasets. Datasets can be created either by connecting to an S3 bucket or by uploading a file to the Confidential AI platform.

Collect Data by Creating a Dataset

To collect the data:

  1. On the Data Ingestion page, click CREATE DATASET, then select CSV Dataset if you have structured tabular data (rows and columns) in CSV format, or Image Dataset if your data is in one of these formats: bmp, jpg, jpeg, png, tif, tiff, dng.
    Figure 1: Create dataset
  2. For either a CSV Dataset or an Image Dataset, provide the following details:
    1. Dataset name - Enter the Dataset name. For example: patient_input_dataset.
    2. Upload a file - Select this option if you want to upload your data directly to the Fortanix Confidential AI platform.
      • Upload a *.csv file for a tabular dataset.
      • Upload a tar.gz file for an image dataset. This file will contain images in the format: *.bmp, *.jpg, *.jpeg, *.png, *.tif, *.tiff, *.dng.
        1. In the File Upload section, upload the file.
          For a CSV dataset, after the file is uploaded, the headers (column names) are detected and displayed. For example: Name, Weight, Age, and so on. The number of rows is also detected and displayed.
        Input data size is limited to 200 MB per dataset for the 3.9 release.
    Figure 2: Upload a file
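Before uploading, it can help to check a file locally against the constraints described above. The following is a minimal sketch, not part of the Fortanix product: the function names `summarize_csv` and `unsupported_images` are hypothetical, the 200 MB limit is assumed to mean 200 × 10^6 bytes, and file extensions are assumed to be matched case-insensitively.

```python
import csv
import io
from pathlib import PurePosixPath

# Assumption: "200 MB" is read as 200 * 10^6 bytes (the conservative reading).
MAX_UPLOAD_BYTES = 200 * 1000 * 1000

# Image formats listed in this guide for image datasets.
IMAGE_EXTENSIONS = {".bmp", ".jpg", ".jpeg", ".png", ".tif", ".tiff", ".dng"}


def summarize_csv(data: bytes) -> dict:
    """Return the header names and the number of data rows in CSV content."""
    if len(data) > MAX_UPLOAD_BYTES:
        raise ValueError("file exceeds the 200 MB per-dataset limit")
    reader = csv.reader(io.StringIO(data.decode("utf-8"), newline=""))
    headers = next(reader, None)
    if not headers:
        raise ValueError("CSV has no header row")
    row_count = sum(1 for row in reader if row)  # blank lines are not counted
    return {"headers": headers, "rows": row_count}


def unsupported_images(names: list) -> list:
    """Return the archive member names whose extension is not a listed format."""
    return [n for n in names
            if PurePosixPath(n).suffix.lower() not in IMAGE_EXTENSIONS]


if __name__ == "__main__":
    sample = b"Name,Weight,Age\nAnn,60,30\nBob,80,41\n"
    print(summarize_csv(sample))
    print(unsupported_images(["scan1.png", "notes.gif"]))
```

The header and row summary mirrors what the platform displays after a CSV upload, so a mismatch between the local result and the displayed values is a hint that the file was not parsed as expected.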
    S3 URL - Select this option to bring your data by connecting to an S3 account. For details on how to prepare your S3 bucket for Confidential AI, refer to the <C-AI detailed guide>.
    1. S3 bucket URL: Enter the URL of the S3 location. The URL should start with `s3://`.
    2. Access Key ID and Secret Key: Enter the Access Key ID and Secret Key that C-AI uses to access the data in your S3 account.
    3. Encryption key (optional): If the data in the S3 account is encrypted, provide the encryption key that was used to encrypt it. The supported encryption is AES-256-GCM, and the key is expected to be a 64-character hex string (32 bytes).
    4. Click RETRIEVE to save the S3 details and retrieve your data.
      Figure 3: S3 details
      After the S3 location is saved, if you selected a CSV dataset, the headers (column names) are detected and displayed. For example, Name, Weight, Age, and so on. The number of rows is also detected and displayed.
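The format rules for the S3 fields can be checked before they are entered. This is an illustrative sketch only: `check_s3_details` is a hypothetical helper, not a Fortanix API, and accepting both uppercase and lowercase hex digits in the key is an assumption.

```python
import re
from typing import Optional

# 64 hex characters encode the 32 bytes (256 bits) of an AES-256 key.
# Assumption: both uppercase and lowercase hex digits are acceptable.
HEX_KEY_RE = re.compile(r"[0-9a-fA-F]{64}")


def check_s3_details(url: str, encryption_key: Optional[str] = None) -> list:
    """Return a list of problems found; an empty list means the checks passed."""
    problems = []
    if not url.startswith("s3://") or len(url) == len("s3://"):
        problems.append("S3 URL should start with s3:// followed by a location")
    if encryption_key is not None and not HEX_KEY_RE.fullmatch(encryption_key):
        problems.append("encryption key should be a 64-character hex string")
    return problems


if __name__ == "__main__":
    # Hypothetical bucket name and a placeholder key, for illustration only.
    print(check_s3_details("s3://example-bucket/patients.csv", "ab" * 32))
    print(check_s3_details("https://example-bucket/patients.csv", "not-hex"))
```

These checks cover only the format of the values. Whether the credentials and key actually grant access to the data is determined when you click RETRIEVE.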
    5. Add Labels: To track what the data is used for, you can optionally add labels in the form of “Key:Value” pairs.

      Enter the Key and Value pair and click the LABEL button to save the label. The newly created label will appear in the Labels Added field.

      • A label’s key and value can have a maximum of 256 characters and are case-sensitive.
      • Some keys are reserved for internal use; these are called system-defined labels. A key is reserved if it is:
        • 'Fortanix', 'fortanix', 'CCM', 'ccm', or 'confidentialcomputingmanager', or
        • {Fortanix|fortanix|CCM|ccm|confidentialcomputingmanager|Confidentialcomputingmanager}<Any_Non-Alphanumeric-Char><Any-Char>.
      • A dataset can have multiple labels.

      An example of a “Key:Value” pair is “usage:purpose”, where “usage” is the key and “purpose” is the value of the key, such as “Training” or “Inference”.
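The label rules above can be expressed as a small check. This is a sketch under stated assumptions, not the platform's own validation: `check_label` is a hypothetical name, the 256-character limit is assumed to apply to the key and the value separately, and `<Any-Char>` in the reserved pattern is read as one or more characters.

```python
import re

MAX_LEN = 256  # assumed to apply to the key and the value separately

# Reserved words as listed in this guide; matching is case-sensitive.
_RESERVED = (r"(?:Fortanix|fortanix|CCM|ccm|"
             r"confidentialcomputingmanager|Confidentialcomputingmanager)")

# A key is treated as reserved if it is exactly a reserved word, or a reserved
# word followed by a non-alphanumeric character and at least one more character
# (an assumed reading of the <Any-Char> part of the pattern).
RESERVED_KEY_RE = re.compile(_RESERVED + r"(?:[^A-Za-z0-9].+)?", re.DOTALL)


def check_label(key: str, value: str) -> list:
    """Return a list of problems found; an empty list means the label looks valid."""
    problems = []
    if len(key) > MAX_LEN:
        problems.append("key is longer than 256 characters")
    if len(value) > MAX_LEN:
        problems.append("value is longer than 256 characters")
    if RESERVED_KEY_RE.fullmatch(key):
        problems.append("key is reserved for system-defined labels")
    return problems


if __name__ == "__main__":
    print(check_label("usage", "Training"))   # an ordinary label
    print(check_label("ccm/owner", "team-a"))  # a key in the reserved pattern
```

Because labels are case-sensitive, a key such as `FORTANIX` does not match any reserved word in this sketch, while `fortanix` does.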

  3. Click CREATE DATASET to save the data. The saved dataset now appears in the dataset table.
    Figure 4: Saved dataset

