1.0 Introduction
This article provides example FPE (format-preserving encryption) schemas for the predefined datatypes available in the Fortanix user interface, along with examples for street addresses and language characters.
2.0 General Category
2.1 Credit Card Number
The credit card number has the following schema:
{
"min_length": 13,
"max_length": 19,
"char_set": [["0", "9"]],
"constraints": {
"luhn_check": true
}
}
This Encrypted part defines a credit card number (all digits, with no delimiters). It is at least 13 characters and at most 19 characters long. In addition, the number must satisfy the Luhn checksum formula.
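The Luhn constraint can also be checked on the client side before a value is submitted for tokenization. The following is a minimal Python sketch of the standard Luhn algorithm combined with the length and character rules from the schema above. The function names `luhn_valid` and `matches_credit_card_schema` are illustrative and are not part of the Fortanix API.

```python
def luhn_valid(number: str) -> bool:
    """Return True if the ASCII digit string passes the Luhn checksum."""
    if not (number.isascii() and number.isdigit()):
        return False
    total = 0
    # Walk the digits from right to left, doubling every second digit.
    for index, char in enumerate(reversed(number)):
        digit = int(char)
        if index % 2 == 1:
            digit *= 2
            if digit > 9:
                digit -= 9
        total += digit
    return total % 10 == 0


def matches_credit_card_schema(number: str) -> bool:
    """Mirror the schema: 13 to 19 digits that satisfy the Luhn check."""
    return 13 <= len(number) <= 19 and luhn_valid(number)
```

A 16-digit test number such as 4111111111111111 passes both checks, while changing its last digit to 2 breaks the checksum.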
2.2 IMSI
The IMSI data type has the following schema:
{
"char_set": [["0", "9"]],
"min_length": 14,
"max_length": 15
}
This is a 14- to 15-digit long numeric string that is free of any delimiters.
2.3 IMEI
The IMEI data type uses the following schema:
{
"char_set": [["0", "9"]],
"min_length": 15,
"max_length": 15,
"constraints": {
"luhn_check": true
}
}
This is a 15-digit numeric token that must meet the Luhn check condition.
2.4 IPv4 Address
The IPv4 address type uses the following schema:
{
"concat": [
{
"multiple": {
"concat": [
{
"min_length": 1,
"max_length": 3,
"char_set": [["0", "9"]],
"constraints": {
"num_lt": 256
}
},
{"literal": ["."]}
]
},
"min_repetitions": 2,
"max_repetitions": 2
},
{
"min_length": 1,
"max_length": 3,
"char_set": [["0", "9"]],
"constraints": {
"num_lt": 256
}
},
{"literal": ["."]},
{
"min_length": 1,
"max_length": 3,
"char_set": [["0", "9"]],
"constraints": {
"num_lt": 256
}
}
]
}
This is a token with four groups of digits, separated by periods (“.”). Each group of digits is one to three characters long and must be a number less than 256.
For example, the following values match the schema:
123.123.123.123
2.3.23.53
The following values do not match the schema:
3 212 2 2 (the digit groups are not separated by periods)
123123123123 (there are no delimiters)
789.890.490.0 (each digit group must be less than 256)
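The rules described in this section can be reproduced with a short client-side check. The following Python sketch is an illustration only: the name `matches_ipv4_schema` is hypothetical, and the check mirrors the schema (four groups of one to three digits, each numerically less than 256) rather than calling any Fortanix API.

```python
import re

# Four groups of one to three ASCII digits separated by literal periods.
IPV4_SHAPE = re.compile(r"[0-9]{1,3}(?:\.[0-9]{1,3}){3}")


def matches_ipv4_schema(value: str) -> bool:
    """Check the overall shape, then apply num_lt: 256 to every group."""
    if IPV4_SHAPE.fullmatch(value) is None:
        return False
    return all(int(group) < 256 for group in value.split("."))
```

Like the schema, this sketch accepts groups with leading zeros (for example, 001.002.003.004), because the constraint is purely numeric.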
2.5 Phone Number
Phone numbers in the North American Numbering Plan (used by countries such as the US and Canada, but not, for example, Mexico) have the general format NPA-NXX-XXXX, consisting of a three-digit area code (NPA) and a seven-digit subscriber number (NXX-XXXX).
The phone number type uses the following schema:
{
"concat": [
{"literal": ["+1 ", ""]},
{
"char_set": [["0", "9"]],
"min_length": 3,
"max_length": 3,
"constraints": {
"num_gt": 199
}
},
{"literal": ["-", "."]},
{
"char_set": [["0", "9"]],
"min_length": 3,
"max_length": 3,
"constraints": {
"num_gt": 199
}
},
{"literal": ["-", "."]},
{
"char_set": [["0", "9"]],
"min_length": 4,
"max_length": 4
}
]
}
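Read as a pattern, the schema above accepts an optional "+1 " prefix, a three-digit area code greater than 199, a three-digit exchange greater than 199, and a four-digit line number, with each separator being either a hyphen or a period. The following Python sketch expresses that reading. The function name and the sample numbers in the usage note are illustrative assumptions, not values taken from the Fortanix documentation.

```python
import re

# Optional "+1 " prefix, then NPA, NXX, and the four-digit line number.
PHONE_SHAPE = re.compile(r"(?:\+1 )?([0-9]{3})[-.]([0-9]{3})[-.]([0-9]{4})")


def matches_phone_schema(value: str) -> bool:
    """Check the shape, then apply num_gt: 199 to the first two groups."""
    match = PHONE_SHAPE.fullmatch(value)
    if match is None:
        return False
    area_code, exchange, _line = match.groups()
    return int(area_code) > 199 and int(exchange) > 199
```

Under this reading, "+1 415-555-0123" and "415.555-0123" are accepted (the two separators are chosen independently), while "123-555-0123" is rejected because the area code is not greater than 199.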
2.6 Fax Number
The fax number type uses the following schema:
{
"concat": [
{
"char_set": [["0", "9"]],
"min_length": 3,
"max_length": 3
},
{"literal": ["-"]},
{
"char_set": [["0", "9"]],
"min_length": 3,
"max_length": 3
},
{"literal": ["-"]},
{
"char_set": [["0", "9"]],
"min_length": 4,
"max_length": 4
}
]
}
2.7 Date
The date datatype encodes a date in one of the following formats:
MM DD YYYY
DD MM YYYY
The date type uses the following schema:
{
"concat": [
{
"min_length": 2,
"max_length": 2,
"char_set": [["0", "9"]],
"constraints": {
"date": "day"
}
},
{"literal": ["/", ".", "-", " "]},
{
"min_length": 2,
"max_length": 2,
"char_set": [["0", "9"]],
"constraints": {
"date": "month"
}
},
{"literal": ["/", ".", "-", " "]},
{
"min_length": 4,
"max_length": 4,
"char_set": [["0", "9"]],
"constraints": {
"date": "year"
}
}
],
"constraints": {
"date": { "dmy_date": {} },
"applies_to": {
"0": "all",
"2": "all",
"4": "all"
}
}
}
This encodes a date of the form DD/MM/YYYY, where the slash delimiters can be replaced by periods (.), hyphens (-), or spaces ( ). The schema does not enforce consistency in delimiter choice, so a value such as 03/04.2021 is accepted.
You can restrict the minimum and maximum allowed dates. This information is added to the constraints.date.dmy_date field in the schema above.
For example, to allow only dates that fall between January 1, 2000 and December 31, 3000 (inclusive), set the constraints.date.dmy_date field as follows:
{
"before": {
"year": 3001,
"month": 1,
"day": 1
},
"after": {
"year": 1999,
"month": 12,
"day": 31
}
}
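Because the inclusive range January 1, 2000 to December 31, 3000 is written with a "before" value of January 1, 3001 and an "after" value of December 31, 1999, both bounds behave as exclusive limits. The following Python sketch makes that explicit. The names `BEFORE`, `AFTER`, and `within_bounds` are hypothetical and only illustrate the comparison.

```python
from datetime import date

# Bounds copied from the example above; both are treated as exclusive.
BEFORE = date(3001, 1, 1)
AFTER = date(1999, 12, 31)


def within_bounds(day: int, month: int, year: int) -> bool:
    """Return True for a real calendar date strictly between AFTER and BEFORE."""
    try:
        candidate = date(year, month, day)
    except ValueError:
        # For example, day 31 in month 2 is not a valid calendar date.
        return False
    return AFTER < candidate < BEFORE
```

With these bounds, 01/01/2000 and 31/12/3000 are accepted, while 31/12/1999 and 01/01/3001 are rejected.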
2.8 Email Address
The current API cannot handle the complete email address format specification, which contains rules such as limiting the number of UTF-8 bytes in the local part, disallowing consecutive dots (for example, [email protected] is invalid), and so on.
Our simplified email datatype consists of the following three sections (in the order below):
A local part (for example, the “test” in “test@fortanix.com”), which consists of printable ASCII characters. In particular,
Upper and lowercase letters (A-Z and a-z) are allowed.
Digits from 0 to 9 are also allowed.
The symbols “!”, “#”, “$”, “%”, “&”, a single quote, “.”, “*”, “+”, “-”, “/”, “=”, “?”, “^”, “_”, a backtick, opening and closing braces, “|”, and “~” are allowed.
The entire local part must be 1 to 64 characters long.
The “@” character
A domain name containing one or more dot-separated DNS labels (for example, “fortanix” and “com” in “test@fortanix.com”). The entire domain is limited to 255 characters.
Each DNS label must be at most 63 characters long.
The DNS labels consist of only ASCII alphanumeric characters (where the letters can be uppercase or lowercase) and dashes (“-”).
Hence, the JSON FpeDataPart will look like the following:
{
"concat": [
{
"char_set": [
["!", "!"],
["#", "'"],
["*", "+"],
["-", "9"],
["=", "="],
["?", "?"],
["A", "Z"],
["^", "~"]
],
"min_length": 1,
"max_length": 64
},
{"literal": ["@"]},
{
"concat": [
{
"char_set": [
["0", "9"],
["A", "Z"],
["a", "z"],
["-", "-"]
],
"min_length": 1,
"max_length": 63
},
{
"multiple": {
"concat": [
{"literal": ["."]},
{
"char_set": [
["0", "9"],
["A", "Z"],
["a", "z"],
["-", "-"]
],
"min_length": 1,
"max_length": 63
}
]
}
}
],
"max_length": 255
}
]
}
To specify multiple dot-separated DNS labels, the domain name has been encoded as follows:
An Encrypted part representing the first DNS label.
A Multiple part representing multiple occurrences of a dot-DNS label combination (for example, “.com” is one dot-DNS label combination; “.gov.uk” contains two dot-DNS label combinations).
The dot-DNS label combination is represented by a concatenation of two subparts: a Literal part for the dot and an Encrypted part for the DNS label.
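The compact char_set ranges used for the local part can be cross-checked against the list of allowed characters given earlier in this section. The following Python sketch expands each inclusive range into individual characters and compares the result with that list. The helper name `expand_char_set` is an illustrative assumption, not a Fortanix function.

```python
import string

# Local-part ranges copied from the email schema.
LOCAL_PART_RANGES = [
    ("!", "!"), ("#", "'"), ("*", "+"), ("-", "9"),
    ("=", "="), ("?", "?"), ("A", "Z"), ("^", "~"),
]


def expand_char_set(ranges):
    """Expand inclusive (first, last) codepoint ranges into a set of characters."""
    chars = set()
    for first, last in ranges:
        chars.update(chr(code) for code in range(ord(first), ord(last) + 1))
    return chars


LOCAL_PART_CHARS = expand_char_set(LOCAL_PART_RANGES)

# Characters named in the prose: letters, digits, and the allowed symbols.
EXPECTED_CHARS = set(string.ascii_letters + string.digits + "!#$%&'.*+-/=?^_`{}|~")
```

The two sets are equal (82 characters each), which confirms that ranges such as ["-", "9"] and ["^", "~"] cover exactly the hyphen, period, slash, digits, lowercase letters, and remaining symbols described above, and nothing else.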
3.0 Language Characters
The Fortanix tokenization API supports all Unicode codepoints including Chinese, Japanese and Korean characters.
NOTE
Tokenizing with large character sets may result in slow tokenization.
As the tokenization API supports Unicode, it is possible to support Japanese characters in the Fortanix tokens. For example, a 10-character-long datatype format using CJK ideographs:
{
"min_length": 10,
"max_length": 10,
"char_set": [["\u4E00", "\u9FFF"]]
}
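The Encrypted part describes a 10-character-long string drawn from the CJK Unified Ideographs block (U+4E00 to U+9FFF), which contains the Han characters used in Chinese, Japanese, and Korean writing.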
As the tokenization API supports Unicode, it is possible to support Korean characters in the Fortanix tokens. For example, a 10-character-long datatype format using precomposed Korean characters (Hangul):
{
"min_length": 10,
"max_length": 10,
"char_set": [["\uAC00", "\uD7A3"]]
}
NOTE
The character set here is precisely all 11172 assigned codepoints in the “Hangul Syllables” block, starting from “가” and ending at “힣”.
Large character sets may result in slow tokenization and detokenization.
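To get a sense of how large these character sets are, the size of an inclusive codepoint range can be computed directly from its endpoints. The following Python sketch does this for the two ranges used in this section. The helper name `range_size` is illustrative.

```python
def range_size(first: str, last: str) -> int:
    """Number of codepoints in the inclusive range [first, last]."""
    return ord(last) - ord(first) + 1


# Ranges copied from the two schemas in this section.
CJK_SIZE = range_size("\u4E00", "\u9FFF")
HANGUL_SIZE = range_size("\uAC00", "\uD7A3")
```

The Hangul range contains 11172 codepoints, matching the note above, and the CJK range contains 20992 codepoints.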
3.1 Emoji
The emoji type uses the following schema, which describes a string of 10 emoji characters:
{
"min_length": 10,
"max_length": 10,
"char_set": [["😀", "😐"]]
}
The character set ranges from U+1F600 to U+1F610 and consists of the first 17 characters in the “Emoticons” block in Unicode.
NOTE
If Unicode escapes are used in the JSON above, encode each of the two emoji characters as a surrogate pair, because a JSON \u escape takes only four hexadecimal digits and cannot directly represent a character outside the Basic Multilingual Plane.
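One way to obtain the surrogate-pair form is to let a JSON library produce it. The following Python sketch uses the standard `json` module, whose default `ensure_ascii=True` setting escapes non-ASCII characters and writes characters outside the Basic Multilingual Plane as surrogate pairs. The variable names are illustrative.

```python
import json

# The emoji schema, with the range endpoints written as Python escapes.
schema = {
    "min_length": 10,
    "max_length": 10,
    "char_set": [["\U0001F600", "\U0001F610"]],
}

# ensure_ascii=True (the default) emits \uXXXX escapes; characters outside
# the Basic Multilingual Plane become UTF-16 surrogate pairs.
escaped_char_set = json.dumps(schema["char_set"])

# Parsing the escaped text yields the original characters again.
round_tripped = json.loads(escaped_char_set)
```

Here `escaped_char_set` is the text `[["\ud83d\ude00", "\ud83d\ude10"]]`, and parsing it returns the same two emoji characters.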
4.0 Street Address
Street addresses can take various forms. The example in this section assumes a street number of 1 to 5 digits, followed by 1 to 10 words that are each preceded by a single space. Each word consists of at least one letter (uppercase or lowercase), but no more than 20 letters.
This format covers inputs such as 800 West El Camino Real, but not 800B Some Street (due to the letter in the street number) or 800 B Street, CA (notice the comma).
The street address has the following schema:
{
"concat": [
{
"max_length": 5,
"min_length": 1,
"char_set": [["0", "9"]]
},
{
"multiple": {
"concat": [
{"literal": [" "]},
{
"max_length": 20,
"min_length": 1,
"char_set": [["A", "Z"], ["a", "z"]]
}
]
},
"max_repetitions": 10,
"min_repetitions": 1
}
]
}
JSON structure: this is a Concat consisting of the following subparts:
Encrypted part of the street number.
Multiple part
Subpart to be repeated: a Concat part, representing a space-word combination.
Literal part, representing a space.
Encrypted part, representing a word.
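The structure described above maps naturally onto a regular expression, which can be handy for testing sample inputs before building a datatype. The following Python sketch is an illustration under that assumption. The name `matches_street_schema` is hypothetical and the check runs locally, without calling the Fortanix API.

```python
import re

# Street number (1 to 5 digits), then 1 to 10 repetitions of
# a single space followed by a word of 1 to 20 ASCII letters.
STREET_SHAPE = re.compile(r"[0-9]{1,5}(?: [A-Za-z]{1,20}){1,10}")


def matches_street_schema(value: str) -> bool:
    """Return True if the whole value has the street address shape."""
    return STREET_SHAPE.fullmatch(value) is not None
```

With this pattern, "800 West El Camino Real" is accepted, while "800B Some Street" and "800 B Street, CA" are rejected, in line with the examples given earlier in this section.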
5.0 Identification Numbers Category
5.1 Social Security Number (SSN)
In the United States, a Social Security Number has the general format AAA-GG-SSSS, where AAA, GG, and SSSS are groups of decimal digits, separated by hyphens. Furthermore, the AAA group must be less than 900, and cannot be equal to 0 or 666. The other two digit groups also cannot be equal to 0.
This can be expressed through a Concat that consists of the AAA section, a hyphen, the GG section, another hyphen, and the SSSS section.
The SSN has the following schema:
{
"concat": [
{
"min_length": 3,
"max_length": 3,
"char_set": [["0", "9"]],
"constraints": {
"num_lt": 900,
"num_ne": [0, 666]
}
},
{"literal": ["-"]},
{
"min_length": 2,
"max_length": 2,
"char_set": [["0", "9"]],
"constraints": {
"num_ne": [0]
}
},
{"literal": ["-"]},
{
"min_length": 4,
"max_length": 4,
"char_set": [["0", "9"]],
"constraints": {
"num_ne": [0]
}
}
]
}
The constraints.num_ne field within each Encrypted part is a list of numbers that the token section should not be equal to.
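The combined effect of the num_lt and num_ne constraints can be sketched as a small client-side check. The following Python example mirrors the schema above. The function name `matches_ssn_schema` and the sample values in the usage note are illustrative assumptions.

```python
import re

# AAA-GG-SSSS: three groups of ASCII digits separated by hyphens.
SSN_SHAPE = re.compile(r"([0-9]{3})-([0-9]{2})-([0-9]{4})")


def matches_ssn_schema(value: str) -> bool:
    """Check the shape, then apply the numeric constraints to each group."""
    match = SSN_SHAPE.fullmatch(value)
    if match is None:
        return False
    area, group, serial = (int(part) for part in match.groups())
    # num_lt: 900 and num_ne: [0, 666] on AAA; num_ne: [0] on GG and SSSS.
    return area < 900 and area not in (0, 666) and group != 0 and serial != 0
```

For instance, "123-45-6789" satisfies every constraint, whereas "666-45-6789", "900-45-6789", and "123-00-6789" each violate one of them.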
5.2 Passport Number
The passport number has the following schema:
{
"char_set": [["0", "9"], ["A", "Z"], ["a", "z"]],
"min_length": 6,
"max_length": 9
}
The token should be an ASCII alphanumeric string, between six and nine characters long (inclusive). This may approximate the actual US passport format.
5.3 Driver's License
The driver's license has the following schema:
{
"char_set": [["0", "9"], ["A", "Z"], ["a", "z"]],
"min_length": 4,
"max_length": 100
}
The token should be an ASCII alphanumeric string, 4 to 100 characters long, with no delimiters allowed.
5.4 Individual Taxpayer Identification Number (ITIN)
The ITIN has the following schema:
{
"concat": [
{
"min_length": 3,
"max_length": 3,
"char_set": [["0", "9"]],
"constraints": {
"num_gt": 899
}
},
{"literal": ["-"]},
{
"min_length": 2,
"max_length": 2,
"char_set": [["0", "9"]],
"constraints": {
"num_ne": [0]
}
},
{"literal": ["-"]},
{
"min_length": 4,
"max_length": 4,
"char_set": [["0", "9"]]
}
]
}
The format is similar to the format of SSN (AAA-GG-SSSS), where three digit groups are separated by dashes (“-”). Furthermore, the following constraints apply:
The AAA group must be at least 900.
The second digit group cannot be zero.
Real ITINs are likely subject to additional restrictions, which this schema does not cover.
For example, the following value matches the schema:
900-45-6789
The following values do not match the schema:
90-0456789 (the delimiters are in the wrong place)
900 45 6789 (the delimiters must be dashes)
400-12-2334 (the first digit group must be at least 900)
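The ITIN examples in this section can be verified with a short client-side check that mirrors the schema. The following Python sketch is illustrative only, and the name `matches_itin_schema` is hypothetical.

```python
import re

# AAA-GG-SSSS: three groups of ASCII digits separated by dashes.
ITIN_SHAPE = re.compile(r"([0-9]{3})-([0-9]{2})-([0-9]{4})")


def matches_itin_schema(value: str) -> bool:
    """Check the shape, then apply num_gt: 899 and num_ne: [0]."""
    match = ITIN_SHAPE.fullmatch(value)
    if match is None:
        return False
    first, second, _third = (int(part) for part in match.groups())
    return first > 899 and second != 0
```

Like the schema, this check places no constraint on the last digit group.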
5.5 Employer Identification Number (EIN)
The EIN has the following schema:
{
"concat": [
{
"min_length": 2,
"max_length": 2,
"char_set": [["0", "9"]],
"constraints": {
"num_ne": [0]
}
},
{"literal": ["-", " ", ""]},
{
"min_length": 3,
"max_length": 3,
"char_set": [["0", "9"]]
},
{
"min_length": 4,
"max_length": 4,
"char_set": [["0", "9"]]
}
]
}
The format here is one of the following:
AA-BBBBBBB
AA BBBBBBB
AABBBBBBB
Here, AA and BBBBBBB are groups of digits, and the AA group cannot be 0.
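The literal part with the options "-", " ", and "" makes the delimiter optional, which is why all three formats are accepted. The following Python sketch shows one way to express that, with the empty literal modelled as an optional character class. The function name and the sample values in the usage note are illustrative assumptions.

```python
import re

# Two digits, an optional hyphen or space, then three plus four digits.
EIN_SHAPE = re.compile(r"([0-9]{2})[- ]?[0-9]{3}[0-9]{4}")


def matches_ein_schema(value: str) -> bool:
    """Check the shape, then apply num_ne: [0] to the AA group."""
    match = EIN_SHAPE.fullmatch(value)
    if match is None:
        return False
    return int(match.group(1)) != 0
```

Under this sketch, "12-3456789", "12 3456789", and "123456789" are all accepted, while "00-3456789" is rejected because the AA group is 0.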
6.0 Military Service Numbers (MSN) Category
6.1 Army and Air Force Service Number
The Army and Air Force service number has the following schema:
{
"format": {
"concat": [
{
"char_set": [["0", "9"]],
"min_length": 2,
"max_length": 2
},
{"literal": ["-"]},
{
"char_set": [["0", "9"]],
"min_length": 3,
"max_length": 3
},
{"literal": ["-"]},
{
"char_set": [["0", "9"]],
"min_length": 3,
"max_length": 3
}
]
},
"description": "Army and Air Force Service Number (USA)"
}
The general format is AA-BBB-CCC, three digit groups separated by dashes (“-”).
6.2 Navy Service Number
The Navy service number has the following schema:
{
"format": {
"concat": [
{
"char_set": [["0", "9"]],
"min_length": 3,
"max_length": 3
},
{"literal": ["-"]},
{
"char_set": [["0", "9"]],
"min_length": 2,
"max_length": 2
},
{"literal": ["-"]},
{
"char_set": [["0", "9"]],
"min_length": 2,
"max_length": 2
}
]
},
"description": "Navy Service Number (USA)"
}
The general format is AAA-BB-CC, three digit groups separated by dashes (“-”).
6.3 Coast Guard Service Number
The Coast Guard service number has the following schema:
{
"format": {
"concat": [
{
"char_set": [["0", "9"]],
"min_length": 4,
"max_length": 4
},
{"literal": ["-"]},
{
"char_set": [["0", "9"]],
"min_length": 3,
"max_length": 3
}
]
},
"description": "Coast Guard Service Number"
}
The general format is AAAA-BBB, a group of four digits, followed by a dash (“-”) followed by a group of three digits.
6.4 Marine Corps Service Number
The Marine Corps service number has the following schema:
{
"name": "Marine Corps Service Number (USA)",
"radix": 10,
"min_length": 6,
"max_length": 6
}
The token must have six numeric digits.
6.5 Military Officers Service Number
The military officers service number has the following schema:
{
"name": "Military Officers Service Number (USA)",
"radix": 10,
"min_length": 5,
"max_length": 5
}
The token must have five numeric digits.