Security Objects Tokenization Best Practices

1.0 Introduction

This article describes best practices for performing tokenization using the Fortanix DSM API.

For detailed information about FpeConstraints, refer to the Fortanix DSM REST API page.

2.0 API

FpeOptions::Advanced allows specifying a char_set, which is used for both input decoding and output encoding. Tokenization requires the ability to use one character set for input decoding and a different character set for output encoding. The cardinality (alphabet size) of the two character sets must be the same for every part.

2.1 Description

An AES security object is required for tokenization. The tokenization options are specified in the FPE field of SobjectRequest.

  • A new field named cipher_char_set is added to FpeEncryptedPart to support specifying two character sets. Token values use the cipher_char_set alphabet, which defaults to char_set if not specified (see the example after this list).

  • The cipher_char_set field is used for output encoding during tokenization and input decoding during de-tokenization.

  • Any constraints specified on a part apply to both the char_set and cipher_char_set alphabets, and are validated to ensure that they can be applied to both; otherwise, the API responds with an appropriate error message.
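
For illustration, the following sketch shows an encrypted part that decodes its input over the ten digits 0–9 and encodes its token output over the ten letters A–J, so both alphabets have the same cardinality. The range syntax for cipher_char_set is assumed here to mirror that of char_set:

{
  "char_set": [["0", "9"]],
  "cipher_char_set": [["A", "J"]],
  "min_length": 6,
  "max_length": 6
}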

2.2 Response Body

The response Sobject struct will have new fields as described above. There is no other change.

pub struct Sobject {
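    // Existing fields are unchanged; the new fields described above are added.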
...
}

3.0 JSON Examples

3.1 Applying Constraints to Subparts of an OR Part

For OR parts, the applies_to field of the constraints field (if specified) cannot select subparts of the OR to apply the constraints to. Instead, the constraints should be specified directly in those subparts themselves. Thus, the following example is incorrect:

{
"or": [
    {"literal": ["12344"]},
    {"literal": ["abcdefg"]},
    {
      "char_set": [["0", "9"]],
      "min_length": 10,
      "max_length": 10
    }
  ],
  "constraints": {
    "luhn_check": true,
    "applies_to": {
      "0": "all",
      "2": "all"
    }
  }
}

The following example is correct; the Luhn check is specified directly on the encrypted subpart, while the literal subparts carry no constraints (their values are fixed, and "12344" already satisfies the Luhn check):

{
  "or": [
    {"literal": ["12344"]},
    {"literal": ["abcdefg"]},
    {
      "char_set": [["0", "9"]],
      "min_length": 10,
      "max_length": 10,
      "constraints": {
        "luhn_check": true
      }
    }
  ]
}

3.2 Bad Preserve or Mask-related Fields

3.2.1 Indices Out of Range

In the example below, the preserve field on the encrypted part is incorrect because the index -4 is out of range. (The encrypted part has a minimum length of 3, and if the input is exactly three digits long, -4 falls outside it. The fact that the maximum length of 7 could accommodate that index is irrelevant; preserve indices must be valid even for the shortest possible input.)

{
  "char_set": [["0", "9"]],
  "min_length": 3,
  "max_length": 7,
  "preserve": [-4, -3]
}
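
For contrast, a sketch of a version that is valid even for a three-digit input; because negative indices count from the end, -3 and -2 are in range for any input of at least three digits:

{
  "char_set": [["0", "9"]],
  "min_length": 3,
  "max_length": 7,
  "preserve": [-3, -2]
}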

3.2.2 Specifying Fields at Two Different “Levels”

If a preserve or mask field is specified on a compound part, then none of its subparts (or their descendants) may specify that field. The example below violates this rule. (Even though the two mask fields do not actually conflict here, specifying the field at both levels is still disallowed.)

{
  "concat": [
    {"literal": ["hello"]},
    {
      "char_set": [["0", "9"]],
      "min_length": 6,
      "max_length": 6,
      "mask": []
    }
  ],
  "mask": false
}
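
A sketch of one way to fix it, specifying the mask field on the encrypted subpart only:

{
  "concat": [
    {"literal": ["hello"]},
    {
      "char_set": [["0", "9"]],
      "min_length": 6,
      "max_length": 6,
      "mask": "all"
    }
  ]
}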

To preserve or mask an entire part, the string "all" is used for encrypted parts, and the boolean true is used for compound parts. The example below mixes up the two usages.

{
  "concat": [
    {
      "char_set": [["0", "9"]],
      "min_length": 10,
      "max_length": 10,
      "preserve": true
    },
    {
      "concat": [
        {
          "char_set": [["A", "Z"]],
          "min_length": 10,
          "max_length": 10
        },
        {
          "char_set": [["a", "z"]],
          "min_length": 10,
          "max_length": 10
        }
      ],
      "preserve": "all"
    }
  ]
}
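
A sketch of the same structure with the two usages swapped into their correct places (the parts are fully preserved here purely to illustrate the field values):

{
  "concat": [
    {
      "char_set": [["0", "9"]],
      "min_length": 10,
      "max_length": 10,
      "preserve": "all"
    },
    {
      "concat": [
        {
          "char_set": [["A", "Z"]],
          "min_length": 10,
          "max_length": 10
        },
        {
          "char_set": [["a", "z"]],
          "min_length": 10,
          "max_length": 10
        }
      ],
      "preserve": true
    }
  ]
}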

3.2.3 Preserving Only the Month Part of a Date

Dates have rules as to which parts can be preserved and which cannot. The example below is invalid due to its attempt to preserve only the month part.

{
  "concat": [
    {
      "char_set": [["0", "9"]],
      "min_length": 2,
      "max_length": 2,
      "preserve": "all"
      "constraints": {
        "date": "month"
      }
    },
    {"literal": ["/"]},
    {
      "char_set": [["0", "9"]],
      "min_length": 2,
      "max_length": 2,
      "constraints": {
        "date": "year"
      }
    }
  ],
  "constraints": {
    "date": {
      "month_year_date": {}
    },
    "applies_to": {
      "0": "all",
      "2": "all"
    }
  }
}
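
A sketch of the same format with the offending preserve removed; whether other combinations (for example, preserving both date subparts) are allowed is governed by the date rules in the Fortanix DSM REST API reference:

{
  "concat": [
    {
      "char_set": [["0", "9"]],
      "min_length": 2,
      "max_length": 2,
      "constraints": {
        "date": "month"
      }
    },
    {"literal": ["/"]},
    {
      "char_set": [["0", "9"]],
      "min_length": 2,
      "max_length": 2,
      "constraints": {
        "date": "year"
      }
    }
  ],
  "constraints": {
    "date": {
      "month_year_date": {}
    },
    "applies_to": {
      "0": "all",
      "2": "all"
    }
  }
}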

3.2.4 Luhn Check Constraints and Other Constraints

Any encrypted part under a Luhn check constraint cannot specify any other constraints unless it is guaranteed to be fully preserved. The first and second encrypted parts in the example below do not satisfy this requirement. (The third one does: even if the part is at its maximum length of 5, the indices 0, 1, -3, -2, and -1 cover every index.)

{
  "concat": [
    {
      "char_set": [["0", "9"]],
      "min_length": 5,
      "max_length": 5,
      "preserve": [2],
      "constraints": {
        "num_ne": [90, 292]
      }
    },
    {
      "char_set": [["0", "9"]],
      "min_length": 5,
      "max_length": 5,
      "constraints": {
        "num_lt": 99021
      }
    },
    {
      "char_set": [["0", "9"]],
      "min_length": 3,
      "max_length": 5,
      "preserve": [0, 1, -3, -2, -1],
      "constraints": {
        "num_ne": [902]
      }
    }
  ],
  "constraints": {
    "luhn_check": true
  }
}
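
A sketch of one way to satisfy the rule, dropping the extra constraints from the two parts that are not fully preserved while the fully preserved third part keeps its num_ne constraint:

{
  "concat": [
    {
      "char_set": [["0", "9"]],
      "min_length": 5,
      "max_length": 5,
      "preserve": [2]
    },
    {
      "char_set": [["0", "9"]],
      "min_length": 5,
      "max_length": 5
    },
    {
      "char_set": [["0", "9"]],
      "min_length": 3,
      "max_length": 5,
      "preserve": [0, 1, -3, -2, -1],
      "constraints": {
        "num_ne": [902]
      }
    }
  ],
  "constraints": {
    "luhn_check": true
  }
}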

3.2.5 Nested Luhn Checks

An encrypted part cannot fall under more than one Luhn check constraint. The following example is invalid because the first encrypted part falls under two Luhn check constraints: one specified on the outer concatenation and one specified on the inner concatenation.

{
  "concat": [
    {
      "concat": [
        {
          "char_set": [["0", "9"]],
          "min_length": 5,
          "max_length": 5
        },
        {"literal": ["-"]},
        {
          "char_set": [["0", "9"]],
          "min_length": 5,
          "max_length": 5
        }
      ],
      "constraints": {
        "luhn_check": true,
        "applies_to": {
          "0": "all",
          "2": "all"
        }
      }
    },
    {
      "char_set": [["0", "9"]],
      "min_length": 6,
      "max_length": 6
    }
  ],
  "constraints": {
    "luhn_check": true,
    "applies_to": {
      "0": {
        "0": "all"
      },
      "1": "all"
    }
  }
}
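
A sketch of one way to fix it, removing the Luhn check from the inner concatenation so that no encrypted part falls under more than one:

{
  "concat": [
    {
      "concat": [
        {
          "char_set": [["0", "9"]],
          "min_length": 5,
          "max_length": 5
        },
        {"literal": ["-"]},
        {
          "char_set": [["0", "9"]],
          "min_length": 5,
          "max_length": 5
        }
      ]
    },
    {
      "char_set": [["0", "9"]],
      "min_length": 6,
      "max_length": 6
    }
  ],
  "constraints": {
    "luhn_check": true,
    "applies_to": {
      "0": {
        "0": "all"
      },
      "1": "all"
    }
  }
}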

3.3 Date Parts Underneath an OR Part

The encrypted parts that comprise a date cannot be underneath an or or multiple part (with the exception that the entire date part can be underneath an or or multiple part). Thus, the example below is invalid, as the year parts of the date constraint are specified underneath an or part.

{
  "concat": [
    {
      "char_set": [["0", "9"]],
      "min_length": 2,
      "max_length": 2,
      "constraints": {
        "date": "month"
      }
    },
    {
      "or": [
        {
          "char_set": [["0", "9"]],
          "min_length": 2,
          "max_length": 2,
          "constraints": {
            "date": "year"
          }
        },
        {
          "char_set": [["0", "9"]],
          "min_length": 4,
          "max_length": 4,
          "constraints": {
            "date": "year"
          }
        }
      ]
    }
  ],
  "constraints": {
    "date": {
      "month_year_date": {}
    }
  }
}
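
A sketch of one way to restructure it, using the stated exception and placing each complete date underneath the or; whether month_year_date accepts a four-digit year should be confirmed against the Fortanix DSM REST API reference:

{
  "or": [
    {
      "concat": [
        {
          "char_set": [["0", "9"]],
          "min_length": 2,
          "max_length": 2,
          "constraints": {
            "date": "month"
          }
        },
        {"literal": ["/"]},
        {
          "char_set": [["0", "9"]],
          "min_length": 2,
          "max_length": 2,
          "constraints": {
            "date": "year"
          }
        }
      ],
      "constraints": {
        "date": {
          "month_year_date": {}
        }
      }
    },
    {
      "concat": [
        {
          "char_set": [["0", "9"]],
          "min_length": 2,
          "max_length": 2,
          "constraints": {
            "date": "month"
          }
        },
        {"literal": ["/"]},
        {
          "char_set": [["0", "9"]],
          "min_length": 4,
          "max_length": 4,
          "constraints": {
            "date": "year"
          }
        }
      ],
      "constraints": {
        "date": {
          "month_year_date": {}
        }
      }
    }
  ]
}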

Unlike ordinary FF1 encryption, when handling date-related parts the resulting output token may differ in length from the input token.

NOTE

A duplicate name field is added when serializing FpeOptions values of the Advanced variant, to accommodate users with legacy code that expects the name field to be present.

However, if the user only wants to tokenize or de-tokenize an input token, then the exact order of the characters within a character set is unimportant. The only exception is that "numeric" character sets must consist of exactly the digits '0' through '9', in that order. This is relevant for constraints, as all currently available constraints apply only to numeric parts.

NOTE

A Unicode codepoint does not necessarily correspond to the general idea of a “character”; for example, “é” can be represented by U+00E9, or by the sequence U+0065 U+0301 (the two representations being canonically equivalent). This API treats all three codepoints as distinct, and hence performs no Unicode normalization of any kind.

4.0 More Information