Skip to main content
Versa Networks

DLP Predefined Patterns Quick-Reference

This guide is a detailed reference for all predefined DLP data patterns available in Versa Networks, including:

  • Pattern definitions
  • Matching logic
  • Severity levels
  • Supported formats
  • Real-world match examples

Each pattern entry documents the detection logic, keywords, proximity rules, and format specifications used by the DLP engine to identify sensitive data in real time.

DLP Predefined Data Patterns

Data Loss Prevention (DLP) helps organizations detect and prevent accidental or malicious leaks of sensitive information across network traffic, cloud applications, and endpoints.

Versa Networks DLP uses predefined data patterns to identify common forms of sensitive data such as credit card numbers, government-issued IDs, healthcare codes, or source code without requiring administrators to manually build complex regular expressions. These patterns are bundled in the Versa Security Pack (Spack) and can be applied directly in DLP Content Analysis rules.

How Predefined Patterns Work

Every predefined pattern is built using three detection components that must all fire together for a match to be reported:

Keyword (Mandatory) One or more trigger words that must be present near the sensitive data. Case-insensitive. The keyword may appear before or after the pattern.
Keyword also help reduce false positive with just matching patterns.
Without a keyword match, the pattern will not fire even if the regex matches.
Examples
:

"aadhaar",

"credit

card",

"ssn",

"passport".
Regex (Pattern) A structured regular expression defining the format of the sensitive data to detect (e.g., a 12-digit Aadhaar number or a Luhn-validated credit card number). Operates within the proximity window of the keyword.
Proximity (Range Window) Maximum distance allowed between the keyword and the regex match. Ranges from 25 to 200 bytes depending on the pattern. A match is only reported when both keyword and regex appear within this window.
Key Rule
A pattern match fires ONLY when BOTH a matching keyword AND a matching regex are found within the defined proximity window. Keyword-only or Regex-only matches are insufficient.

DLP Pattern Severity Levels

When you assign a predefined pattern to a DLP Content Analysis rule, you also assign a Severity Level. This controls how many times a pattern must match within the scanned content before the rule triggers an action.

Severity Level Default Threshold Recommended Use
Low 1 Minimum — flags even a single occurrence. Best for highly sensitive identifiers (SSN, Aadhaar, Passport).
Medium 10 Moderate requires 10 matches. Good for data common in business documents (dates, phone numbers).
High 20 High-confidence triggers 20+ matches. Useful for patterns with higher false-positive risk.
Critical 50 Extreme volume — designed for bulk-exfiltration scenarios.

Severity Value Override

If a Severity Value is explicitly defined in the rule configuration, it overrides the default threshold for that severity level.

Variation & Description Example(s)
High (default 20 matches) with Severity Value = 5 Rule triggers after 5 matches, not 20
Low (default 1 match) with Severity Value = 3 Rule triggers after 3 matches, not 1

Best Practice
Use Low severity for highly sensitive individual identifiers. Use Medium or High for patterns prone to false positives. Always test in Alert-only mode before switching to Block or Quarantine.

Available DLP Actions

The following actions can be configured when a DLP predefined pattern match triggers a rule:

Action Description
Allow Permits the traffic/content to pass without restriction.
Alert Allows content but generates a log and alert for administrator review.
Block Stops the transmission of the matched content.
Reject Blocks and sends a rejection response to the sender.
Quarantine Holds the content for administrator review before release or deletion.

Detailed Pattern Descriptions

This section provides in-depth descriptions for every documented pattern, organized by category and country, including all match examples from Versa's internal documentation.

Payment Card Patterns

Payment card patterns cover all elements of PCI DSS-sensitive data: card numbers, cardholder names, CVV codes, PINs, expiry dates, and service codes. These patterns are global and not country specific.

CREDIT_CARD_NUMBER

  • Continuous, spaced, and hyphenated PAN formats are supported.
  • Example numbers shown are for testing. For production validation, use Luhn-valid PANs to reduce false positives.
  • Some 6-series issuer ranges overlap (for example, RuPay, UnionPay, and Discover).
  • The pattern reports a match only when a valid payment-card keyword is present within the proximity window.
Data Type Payment Card Primary Account Number (PAN)
Keywords Rupay
Visa
mastercard, mastercards
quicksilver
capitalone
debitcard
united
credit, creditcard
amex
american express, americanexpress
master
diner's club, diners club, dinersclub
discover, discover card, discovercard, discover cards
JCB
BrandSmart
card numbers
CC No
CC#
credit card No
CCN
Proximity Must be within 100 bytes
Format PANs of 13–19 digits, continuous or grouped with spaces or hyphens. Supported issuers:
Visa: starts with 4, 16 digits
Mastercard: 51–55 or 2221–2720, 16 digits
Amex: 34/37, 15 digits, 4-6-5 grouping
Discover/RuPay/UnionPay: 6011, 65xx, 64[4–9], 62x, 16 digits
Diners Club: 300–305 / 360–369 / 380–389, 14 digits
JCB: 3528–3589, 16 digits
Maestro/Dankort: various 50/60-series prefixes, 16 digits

 

Examples: What It Will Match

Variation & Description Example(s)
Visa (16 digits, grouped or continuous) Visa 4111 1111 1111 1111
credit card 4539-1488-0343-6467
Mastercard (51–55 or 2221–2720, 16 digits) mastercard 5555 5555 5555 4444
creditcard 2223-0000-4841-0010
American Express (34/37, 15 digits, 4-6-5) american express 3714 496353 98431
amex 3782-822463-10005
Discover / 6-series (6011/65xx/64[4–9]/62x) discover card 6011 1111 1111 1117
credit 6510-0000-0000-0004
Diners Club (300–305 / 36 / 38–39, 14 digits) diners club 3056 930902 5904
diner's Club 3852-000002 3237
JCB (3528–3589, 16 digits) JCB 3566 0020 2036 0505
card numbers 3530-1113-3330-0000
Maestro / Dankort (common prefixes, 16 digits) CC No 5019 7170 0101 3742
CC# 6759-0000-0000-0008
Generic keyword variants creditcard 4716 7361 1384 2732
credit card No 5372-1653-9838-6508

CARD_HOLDER_NAME

  • Each name segment must be capitalized and can contain letters only, with an optional hyphen (-) or apostrophe (').
  • Non-Latin scripts, all-lowercase names, or names containing digits/symbols (other than - or ') will not match this predefined pattern.
  • Ensure a non-alphanumeric boundary immediately after the name (for example, a comma, period, or space) to satisfy the detector.
  • Names with apostrophe as the second character (e.g., D'Arcy) are not reliably matched, the apostrophe is treated as a boundary character. Suffixes (Jr., Sr., II, III, IV) require a comma before them (e.g., 'Smith, Jr.').
Data Type Cardholder / Account Holder Name
Keywords name
account holder
account name
ac name
Proximity Must be within 70 bytes
Format Allowed structure (must be bounded by non-alphanumeric characters):
Optional salutation: Mr/Ms/Mrs/Miss/Dr/Sir (with or without ".")
First name: capitalized, up to 15 letters, optional hyphenated segment
Optional middle name or initial
Last name: capitalized; may include hyphen or apostrophe (e.g., O'Connor, Smith-Jones)
Optional suffix: Jr., Sr., II, III, IV

Examples: What It Will Match

Variation & Description Example(s)
Simple first + last & with salutation name John Smith,
ac name Alice Brown.
account holder Mr. Rahul Verma:
account name Dr Anita Menon,
Middle initial / middle name Name Daniel K Gupta.
name Priya M Iyer,
Hyphenated last name & apostrophe in last name ac name Maria Lopez-Garcia.
name Sean O'Connor.
account name Darcy Patel,
Suffix present & compact spacing name Robert Singh, Jr.
account name Neeraj Kumar, II.
ac name R Sharma.

CVV

  • Matches only 3- or 4-digit numeric values.
  • CVV alone (without keywords) will not trigger a match.
Data Type Card Verification Value (CVV/CVV2/CVC2/CSC/Security Code)
Keywords cvv
cvv2
cvc2
csc
security code
security number
card identification number
card verification
issue number
pin block
Proximity Must be within 30 bytes
Format A 3- or 4-digit numeric code:
3-digit: Visa/Mastercard (typically printed on the back of the card)
4-digit: American Express (typically printed on the front of the card)

Examples: What It Will Match

Variation & Description Example(s)
Standard 3-digit CVV cvv 123
security code 456
csc 321
4-digit CVV (e.g., Amex) cvv2 9876
cvc2 number 1234
card identification number 678
Alternate labels & symbol boundaries security number 432
issue no 789
cvv# 654

PIN_NUMBER

Matches strictly numeric PINs only, no alphanumeric.

Data Type Personal Identification Number (PIN)
Keywords PIN
Proximity Must be within 30 bytes
Format Strictly numeric PIN values:
4-digit PINs (e.g., ATM PINs)
6-digit PINs (banking/2FA applications)

Examples: What It Will Match

Variation & Description Example(s)
4-digit PIN PIN 1234
PIN: 9876
6-digit PIN with delimiter PIN 456789
PIN=7410
PIN -- 556677

EXPIRY_DATE

  • Matches both short (MM/YY) and long (MM/YYYY) formats.
  • Accepts month names (short or full) in English followed by year.
  • Works in combination with CREDIT_CARD_NUMBER and CVV patterns to identify full payment card data exposure.
Data Type Payment Card Expiry Date
Keywords expiration
expire
expires
expiry
validity
valid thru
valid
expiry date
date expiry
Proximity Must be within 30 bytes
Format Supported expiry date formats:
Numeric: MM/YY, MM-YY, MM YYYY, MM-YYYY
Textual: Jan 25, December/2029, Oct-24

Examples: What It Will Match

Variation & Description Example(s)
Standard 2-digit year (MM/YY) expiry 05/25
valid thru 12-24
4-digit year (MM/YYYY) expiration date 07/2028
validity 03-2026
Textual month + 2-digit year expires Sep-25
expiry date October 29
Textual month + 4-digit year valid Dec 2029
expiry Jan 2031
Mixed delimiters (/, -, space) expire 04 25
valid thru 06-2027

SERVICE_CODE

Data Type Payment Card Service Code (ISO/IEC 7813)
Keywords service code
Proximity Must be within 30 bytes
Format Service code formats:
3- or 4-digit numeric code
Often found in payment card Track 1/Track 2 data alongside card number, expiry, and CVV

Examples: What It Will Match

Variation & Description Example(s)
3-digit service code service code 201
service code 601
4-digit with punctuation service code 2200
service code: 250
service code - 7210

SWIFT_CODE

  • Valid formats are 8 or 11 characters. The country code is validated against ISO 3166-1 alpha-2 country codes.
  • Regex ensures format compliance but does not validate if the SWIFT code is actively registered.
Data Type SWIFT / BIC Bank Identifier Code (ISO 9362)
Keywords swift#
swiftcode
swiftnumber
swiftroutingnumber
swift code
swift number #
swift routing number
bic number
bic code
bic#
bic #
bank identifier code
iso9362
iso 9362
international organization for standardization 9362
Organisation internationale de normalisation 9362
rapide #
code SWIFT
Proximity Must be within 70 bytes
Format SWIFT/BIC structure:
Length: 8 or 11 alphanumeric characters
Bank Code: 4 letters
Country Code: 2 letters (ISO 3166-1)
Location Code: 2 characters
Optional Branch Code: 3 characters
Examples: HSBCGB2LXXX, DEUTDEFF500
Must be bounded by non-alphabetic characters

Examples: What It Will Match

Variation & Description Example(s)
8-character SWIFT (no branch code) swift code HDFCINBB
bic number BOFAUS3N
11-character SWIFT (with branch code) swift# SBININBBXXX
bic code CHASUS33NYC
Other keyword variants & punctuation bank identifier code ICICINBB
swift code: HSBCGB2L,
iso9362 DEUTDEFF500

Banking Patterns

Banking patterns help identify common banking identifiers used in payment and funds-transfer workflows, including bank account numbers and routing/clearing codes (for example, US ABA routing numbers, UK sort codes, and AU BSB codes). These patterns are country-specific where applicable and are typically used together (e.g., account number + routing/sort/BSB) to reduce false positives and improve detection confidence.

US_BANK_ACCOUNT_NUMBER

  • Not intended for routing numbers, IBAN, or SWIFT — those have separate patterns.
  • Combine with routing number and account holder name patterns in a DLP rule for stricter detection.
Data Type US Bank Account Number (generic DDA/Savings/Checking)
Keywords Acct (No./Number/#)
Account (#/No.)
Checking Account
Checking Acct Number/#No.
Bank Account Number/#/No.
Bank Acct Number/#/No.
Savings Account Number/No./#
Savings Acct Number/#No.
Debit Account Number/#/No.
Debit Acct Number/#/No.
Proximity Must be within 50 bytes
Format A standalone numeric string (word-boundary delimited):
Length: 4–17 digits
Digits only (no spaces or hyphens inside the number)
Format-based and issuer-agnostic — does not validate against a specific bank or checksum

Examples: What It Will Match

Variation & Description Example(s)
Short account (4–6 digits) Acct No. 4729 | Account # 120345
Typical 8–12 digits Checking Account 00349872105 | Savings Acct Number 78543210987
Long form up to 17 digits Bank Account No. 12004567890123456
Debit account phrasing Debit Acct # 9988776655
With punctuation/spacing Account Number: 004567891234,
Savings Account No. 5500123499.

US_BANK_ROUTING_NUMBER

Data Type US Bank Routing Number (ABA / RTN / MICR)
Keywords aba number
aba#
aba no.
abarouting number/#/no.
americanbankassociationrouting number/#/no.
bankrouting number/#/no.
routing number
routing no.
routing #
routing transit number/no./#
RTN
MICR number/#/no.
Proximity Must be within 30 bytes
Format 9-digit ABA routing number with optional hyphens. First digit constrained to 0–3 or 6–8 per ABA specifications. Example:
021000021
1234-5678-9

Examples: What It Will Match

Variation & Description Example(s)
9 digits, with hyphens & MICR phrasing Routing Number 123456789
ABA No. 1234-5678-9
MICR # 021200025
Routing transit & RTN acronym Routing Transit Number: 2110-0000-3
RTN 021000021

UK_BANK_ACCOUNT_NUMBER

Data Type UK Bank Account Number
Keywords Acct (No./Number/#)
Account (#/No.)
Checking Account
Bank Account (Number/#/No.)
Savings Account (Number/No./#)
Debit Account (Number/#/No.)
Proximity Must be within 50 bytes
Format 7–8 digit numeric string, word-boundary delimited.

AUSTRALIA_BANK_NUMBER

Data Type Australian Bank Account Number
Keywords Bank Account
account number
banking information
correspondent bank
bank details
information account
Proximity Must be within 60 bytes
Format Supported formats:
6 digits (base value)
Optional grouping: NNN-NNN
Optional 4-digit extension is also supported (NNN-NNN-NNNN)
Examples: 123-456, 123-456-7890

AUSTRALIA_BANK_STATE_BRANCH

Data Type Australian Bank State Branch (BSB) Code
Keywords Bank state branch
bsb
bsd:
bsb-
Proximity Must be within 60
Format 6-digit code in NNN-NNN format. Example: 062-000.

SORT_CODE

Data Type UK Bank Sort Code
Keywords sort code
sort_code
Proximity Must be within 50 bytes
Format Exactly 6 digits, in one of the following formats (word boundaries enforced):
Continuous: NNNNNN
Spaced: NN NN NN
Hyphenated: NN-NN-NN

Examples: What It Will Match

Variation & Description Example(s)
Continuous, spaced & hyphenated formats sort code 123456
sortcode 12 34 56
sort_code 12-34-56
Sort Code: 12-34-56

Government & National ID Patterns - India

India government and national ID patterns help detect commonly used Indian identifiers such as Aadhaar, PAN, and GSTIN. These patterns are designed for high-confidence detection by combining country-specific formats with mandatory keyword context and strict boundary rules to reduce false positives in general text.

INDIA_AADHAAR_INDIVIDUAL

  • First digit must be 2–9 (Aadhaar numbers starting with 0 or 1 are invalid).
  • Uses the Verhoeff algorithm to detect valid numbers
Data Type India National ID (Aadhaar)
Keywords aadhaar
aadhaar card
Proximity Must be within 30 bytes
Format Aadhaar number formats:
Valid 12-digit number starting with 2–9
Separators: may be continuous, or grouped with spaces or hyphens
Grouping: XXXX XXXX XXXX or XXXX-XXXX-XXXX
Must have a non-alphanumeric boundary before and after the number (e.g., space, :, #, =, ., comma, ), newline)

Examples: What It Will Match

Variation & Description Example(s)
Continuous 12 digits aadhaar 745167489123.
(aadhaar) 293412345678,
Grouped with spaces (NNNN NNNN NNNN) aadhaar 2161 6729 3627,
aadhaar card 2008 8057 0558:
Grouped with hyphens (NNNN-NNNN-NNNN) aadhaar card 2934-1234-5678.
aadhaar 8384-2795-9970,
Surrounded by non-alphanumeric boundaries aadhaar: 875465433456
aadhaar#7451 2345 6789
aadhaar card = 9283 5567 4421
Keywords before number within 30-byte limit aadhaar (valid) 2008 8057 0558

INDIA_PAN_INDIVIDUAL

  • Ensure a non-alphanumeric character after the PAN (e.g., . , : or space) to satisfy the boundary requirement.
  • The 4th character is constrained to specific letters (A/B/C/F/G/H/L/J/P/T); other letters there will not match.
Data Type India Tax ID (PAN — Permanent Account Number)
Keywords pan#
pancard number
pan number
permanent account number
Proximity Must be within 40 bytes
Format 10 characters total (AAAAA9999A): Positions 1–3: letters A–Z.
Position 4: one of A, B, C, F, G, H, L, J, P, T (entity type per PAN rules).
Position 5: letter A–Z.
Positions 6–9: digits 0–9.
Position 10: letter A–Z.
Must have non-alphanumeric boundary before and after.

Examples: What It Will Match

Variation & Description Example(s)
Standard continuous PAN (5 letters + 4 digits + 1 letter) pan number ABCPE1234F.
Different 4th-char class (from allowed set) pancard number AABTF1234K,
pan# CCHPL6789Q:
With punctuation/spacing boundaries permanent account number: AAAPL1234C,
Short phrase before PAN (keyword within 40 bytes) pan number (applicant) AAAAA9999Z.
Alternate keyword forms pan# ABCPQ1234M,
pancard number ZZZFG4321P.

INDIA_GST_INDIVIDUAL

  • Regex enforces state code (01–37), PAN-based core, entity code, Z, and check digit.
  • Non-alphanumeric characters before and after GSTIN are required for a match.
Data Type India GSTIN (Goods and Services Tax Identification Number)
Keywords gst#
gst number
Proximity Must be within 70 bytes
Format GSTIN structure (15-character alphanumeric):
State code: 2 digits (01–37)
PAN-based core: 10 characters (5 letters + 4 digits + 1 letter)
The embedded PAN follows standard PAN rules, the 4th letter must be one of A, B, C, F, G, H, J, L, P, T (entity type).
Entity code: 1 character (alphanumeric 1–9 or A–Z)
Fixed value: 1 character always ‘Z’
Check digit: 1 alphanumeric character
Example: 22AAAAA0000A1Z4
Must have a non-alphanumeric boundary before and after

Examples: What It Will Match

Variation & Description Example(s)
Continuous GSTIN (15 chars) gst number 22AAAAA0000A1Z5
With punctuation before/after gst# 27ABCFE1234F2Z7,
gst number: 29AAACB2894G1ZJ
Different state codes (01–37) gst number 07ABCFE1234F1Z2
gst# 19ABCFE1234F1Z3
Entity code variations (13th char varies) gst number 22AAAAA0000A9Z1
gst# 27ABCFE1234FAZ6
Both keyword variants gst number 21AAAAA0000A1Z5
gst# 09ABCFE1234F1Z7

Government & National ID Patterns - United States

United States government and national ID patterns help detect widely used US identifiers such as Social Security Numbers (SSN), passports, and tax identifiers (EIN/ITIN), as well as regulated healthcare identifiers (e.g., MBI) and state-issued driver’s license numbers. These patterns rely on US-specific formats and mandatory keyword context (with defined proximity windows) to improve match precision and reduce false positives in general text.

US_SOCIAL_SECURITY_NUMBER

  • Regex enforces validity of SSN ranges, rejects impossible values like 000-12-3456 or 666-99-9999.
  • Hyphens/spaces are optional; detector accepts either.
Data Type US Social Security Number (SSN)
Keywords social security
ssn
social
security
SSA Number
social security number
social security #
social security#
social security no
soc sec
ss no
ssid
ssn#
SS#
SSNS
Proximity Must be within 50 bytes
Format AAA-GG-SSSS structure with the following validity constraints (hyphens/spaces between groups are optional):
Area Number (AAA): 001–665 (excludes 000 and 666) and 700–799
Group Number (GG): 01–99
Serial Number (SSSS): 0001–9999 (excludes 0000)

Examples: What It Will Match

Variation & Description Example(s)
Standard with hyphens SSN 123-45-6789
social security # 068-05-1120
Continuous digits (no separators) ssn# 123456789
social security 457889123
With spaces instead of hyphens social sec 123 45 6789
Keyword variants SSA Number 345-67-8901
ssid 223-45-9999
Non-alphanumeric boundaries social security number: 321-54-6789.

US_PASSPORT

  • A non-alphanumeric character must appear before and after the number to satisfy boundary requirements.
Data Type United States Passport Number
Keywords passport
passport#
passport #
passportid
passports
passportno
passport no
passport number
passportnumbers
passport numbers
Proximity Must be within 200 bytes
Format Either 1 letter + 8 digits (e.g., Z12345678) or 9 digits (e.g., 123456789). Must be a distinct token with non-alphanumeric boundaries before and after.

Examples: What It Will Match

Variation & Description Example(s)
Letter + 8 digits (current format) passport number Z12345678,
passportid A98765432.
9 digits only (older format) & keyword variants passport 123456789.
passport# B10293847;
passports C12345678,

US_EMPLOYER_IDENTIFICATION_NUMBER (EIN)

Standalone 9-digit numbers without a hyphen are not valid EINs under this pattern.

Data Type US Federal Tax Identifier (Employer Identification Number)
Keywords ein
employer identification number
employer identification#
ein#
Proximity Must be within 60 bytes
Format 9-digit EIN in NN-NNNNNNN format (always with a hyphen):
Prefix NN must fall within valid IRS-assigned ranges: 01–16, 20–27, 30–59, 60–68, 71–77, 80–89, 90–92, 94–99.
Standalone 9-digit numbers without a hyphen are NOT valid EINs under this pattern.

Examples: What It Will Match

Variation & Description Example(s)
Standard NN-NNNNNNN format with keyword variants ein 12-3456789
employer identification number 94-1234567
ein# 27-9876543

US_INDIVIDUAL_TAXPAYER_IDENTIFICATION_NUMBER (ITIN)

  • ITINs always begin with 9, distinguishing them from SSNs.
Data Type US Individual Taxpayer Identification Number (ITIN)
Keywords taxpayer
tax id
tax identification
itin
i.t.i.n
ssn
tin
social security
tax payer
itins
taxid
individual taxid
ITIN#
individual tax id#
individual tax id No
Proximity Must be within 50 bytes
Format 9-digit number beginning with 9.
The 4th digit must be 7 or 8 (issued in ranges 70–88, 90–92, 94–99).
Typically formatted as 9XX-7X-XXXX or 9XX-8X-XXXX.
May appear with hyphens, spaces, or no separators.
Must have non-alphanumeric boundary before and after.

Examples: What It Will Match

Variation & Description Example(s)
Hyphenated format (XXX-XX-XXXX) ITIN# 912-78-3456
individual tax id 987 83 4567
Continuous digits & boundary variants itin 912783456
taxpayer: 901-78-4567,
tax id=987 84 1234

 

US_MOBILE_NUMBER

  • Captures standard US 10-digit mobile numbers only.
  • Often paired with PII like name, address, or SSN, making it sensitive under HIPAA, PCI DSS, and GDPR.
Data Type United States Mobile/Cell Phone Number
Keywords cell number
Mobile Number
mobile
phone number
phone
contact number
Proximity Must be within 40 bytes
Format Valid US mobile numbers:
(XXX) XXX-XXXX, XXX-XXX-XXXX, XXX.XXX.XXXX, or XXX XXX XXXX
Area codes and exchanges must begin with digits 2–9
Total length = 10 digits

Examples: What It Will Match

Variation & Description Example(s)
Hyphen & dot separated Mobile Number: 415-678-9021
Phone 650.234.7788
Space separated & parentheses area code Contact number 213 555 7890
cell number (408) 333-4455

US_DEA_NUMBER

  • The detector enforces format only (2 letters + 7 digits). The internal DEA check-digit rule is NOT validated by this predefined pattern.
  • Numbers appear without spaces/hyphens; word boundaries ensure the token is not embedded in other text.
Data Type US DEA Registration Number (Drug Enforcement Administration)
Keywords DEA
DEA Registration Number
DEA Number
Proximity Must be within 60 bytes
Format 2 letters + 7 digits.
1st letter = registrant type (A, B, F, G, M, P, R).
2nd character = registrant's last-name initial (A–Z).
Final digit is a check digit.
Matched as a separate token (word boundaries).
Example: AB1234563.

Examples: What It Will Match

Variation & Description Example(s)
Standard 2-letter + 7-digit format DEA AB1234563
DEA Number MZ2468028
Keyword variants & punctuation DEA: BF1112226
DEA Registration Number AZ9301456
DEA Number -- GM3336669

US_MEDICARE_BENEFICIARY_IDENTIFIER (MBI)

  • Both with and without dashes are accepted. Consistent with official CMS rules for MBI format.
Data Type US Medicare Beneficiary Identifier (MBI)
Keywords mbi
mbi#
medicare beneficiary #
medicare beneficiary identifier
medicare beneficiary no
medicare beneficiary number
medicare beneficiary#
Proximity Must be within 60 bytes
Format Always 11 characters long (letters + digits).
Uses only approved letters, excludes S, L, O, I, B, Z.
Supports hyphenated (e.g., 3AD4-TR7-MC91) and continuous (e.g., 1EG4TE5MK73) formats.
Must be bounded by non-alphanumeric characters.

Examples: What It Will Match

Variation & Description Example(s)
Continuous & hyphenated formats mbi# 1EG4TE5MK73
Medicare Beneficiary Identifier: 3AD4-TR7-MC91
With punctuation & keyword after number (MBI) 9CN7-PT4-MD82,
Number 4EG7TN2MC91 is Medicare Beneficiary Number

US_DRIVERS_LICENSE_NUMBER (All States)

The US Drivers License patterns cover all 50 states plus Washington DC, each with the specific format used by that state's DMV. All state patterns share the same keywords and proximity setting:

Keywords (all states) DL
Driving
License
driving license
Proximity Must be within 200 bytes
Boundary Rule Most patterns require a non-alphanumeric (or word boundary) before/after the DL token.

US Drivers License - State-by-State Format Reference

Shared settings: Proximity = 200 bytes. Boundary = non-alphanumeric (or word boundary) before/after the DL token.

State Keywords Number Format Example(s)
Alabama DL, driving, license, driving license, Alabama 1–8 digits license 12345678,
Alaska DL, driving, license, driver license, Alaska 1–7 digits DL 7654321.
Arizona DL, driving, license, driving license Letter + 8 digits or 9 digits driving license A12345678, license 123456789.
Arkansas DL, driving, license, driving license 9 digits starting with 9 driving license 912345678.
California DL, driving, license, driving license 1 letter + 7 digits license A1234567,
Colorado DL, driving, license, driving license 1 letter + 3–6 digits, or 2 letters + 2–5 digits, or 2-3-4 numeric with hyphens DL C123456, license 12-345-6789.
Connecticut DL, driving, license, driving license 9 digits license 123456789.
Delaware DL, driving, license, driving license 1–7 digits DL 1234567,
District of Columbia DL, driving, license, driving license 7 or 9 digits license 1234567, driving 123456789.
Florida DL, driving, license, driving license 1 letter + 12 digits, grouped as L-###-###-##-###-# driving license F123-456-78-901-2
Georgia DL, driving, license, driver’s license 7–9 digits license 987654321.
Hawaii DL, driving, license, driver license 1 letter + 8 digits or ##-##-##### driver license H12345678,
Idaho DL, driving, license, driver’s license 2 letters + 6 digits + 1 letter or 9 digits DL AB123456C, license 123456789.
Illinois DL, driving, license, driver’s license Letter + 11 digits or L-###-####-#### driving S123-4567-8901, license T12345678901.
Indiana DL, driving, license, driver’s license Letter + 9 digits or 9 digits or ####-##-#### DL I123456789, license 1234-56-7890.
Iowa DL, driving, license, driver license ###LL#### or 9 digits driving 123AB4567, license 123456789.
Kansas DL, driving, license, driver license A9A9A or A##-##-#### or 9 digits license K1K1K, DL 123456789.
Kentucky DL, driving, license, driver’s license A##-###-### or 9 digits license K12-345-678, DL 123456789.
Louisiana DL, driving, license 1–9 digits driving 123456789.
Maine DL, driving, license, driver’s license 7–8 digits or 7 digits + 1 letter license 12345678, DL 1234567A.
Maryland DL, driving, license 1 letter + four groups of 3 digits driving M 123 456 789 012.
Massachusetts DL, driving, license 1 letter + 8–9 digits DL M12345678, license M123456789.
Michigan DL, driving, license 1 letter + 12 digits or 1 letter + four groups of 3 digits license A123456789012.
Minnesota DL, driving, license 1 letter + four groups of 3 digits DL M 123 456 789 012.
Mississippi DL, driving, license ###-##-#### (spaces/hyphens optional) driving 123-45-6789.
Missouri DL, driving, license Letter + 5–9 digits or ########AA or #########A license M123456, DL A123456r.
Montana DL, driving, license Letter + 8 digits or 9/13/14 digits license M12345678, DL 123456789.
Nebraska DL, driving, license 1 letter + 6–8 digits DL N123456, license N12345678.
Nevada DL, driving, license X######## or 9/10/12 digits driving X12345678, license 1234567890.
New Hampshire DL, driving, license MM(01–12) + LLL + YY + DD(01–31) + D (10 chars) license 08ABC25071
New Jersey DL, driving, license A + 14 digits or A####-#####-##### driving A12345678901234.
New Mexico DL, driving, license 8 or 9 digits DL 12345678, license 123456789.
New York DL, driving, license A####### or 9 digits or ###-###-### or 16 digits or 8 letters license A1234567, DL 123-456-789.
North Carolina DL, driving, license 1–12 digits DL 123456789012.
North Dakota DL, driving, license AAA##-#### or 9 digits license ABC-12-3456, driving 123456789.
Ohio DL, driving, license Letter + 4–8 digits or 2 letters + 3–7 digits or 8 digits DL A1234567, license AB12345.
Oklahoma DL, driving, license Letter + 9 digits or 9 digits license A123456789, driving 123456789.
Oregon DL, driving, license 1–9 digits DL 123456789.
Pennsylvania DL, driving, license ##-###-### (spaces/hyphens optional) license 12-345-678.
Rhode Island DL, driving, license 7 digits or Letter + 6 digits driving 1234567, license R123456.
South Carolina DL, driving, license 5–11 digits license 12345, DL 12345678901.
South Dakota DL, driving, license 6–10 digits or 12 digits driving 123456, license 123456789012.
Tennessee DL, driving, license 7–9 digits DL 1234567, license 123456789.
Texas DL, driving, license 7–8 digits driving 1234567, license 12345678.
Utah DL, driving, license 4–10 digits license 1234, DL 1234567890.
Vermont DL, driving, license 8 digits or 7 digits + letter (A/a) driving 12345678, license 1234567A.
Virginia DL, driving, license Letter + 2 digits + hyphen + 2 digits + hyphen + 4 digits DL V12-34-5678.
Washington DL, driving, license 3 letters + 2 chars + 2 letters + 3 digits + 2 alphanumeric (12 chars) license ABCXYEF123AB
West Virginia DL, driving, license 7 digits or 1–2 letters + 5–6 digits DL 1234567, license WV123456.
Wisconsin DL, driving, license 1 letter + 3 digits + two groups of 4 digits + 2 digits driving W123-4567-8901-23.
Wyoming DL, driving, license 9–10 digits or 6 digits + hyphen/space + 3 digits DL 123456789, license 123456-789.

Government & National ID Patterns - United Kingdom

United Kingdom government and national ID patterns help detect key UK identifiers such as National Insurance Numbers (NINO), NHS numbers, UK driving licence numbers, UK passports, and taxpayer references. These patterns use UK-specific formats along with mandatory keyword context and proximity windows to improve match precision and reduce false positives in general text.

UK_NATIONAL_INSURANCE_NUMBER (NINO)

  • Valid pattern: AA 99 99 99 A, with optional spaces/hyphens. Final letter is A/B/C/D/F/M/P.
  • Non-alphanumeric boundary required before and after the NINO to avoid matching inside longer tokens.
Data Type UK National Insurance Number (NINO)
Keywords national insurance number
national insurance contributions
protection act
insurance
insurance #
insurance no
insurance number
insurance application
medical insurance application
medical application
social insurance
medical attention
social security
NI Number
NI No
NI #
NI#
insurance#
insurancenumber
nationalinsurance#
national insurance #
nationalinsurancenumber
Proximity Must be within 200 bytes
Format Two letters (A–Z, with exclusions: first letter cannot be D/F/I/Q/U/V; second cannot be O) + six digits grouped as 2-2-2 + final letter (A/B/C/D/F/M/P).
Separators: optional spaces or hyphens between digit pairs; optional space before final letter.
Must have non-alphanumeric boundary before and after.
Example: AB 12 34 56 A.

Examples: What It Will Match

Variation & Description Example(s)
Standard spacing (AA 12 34 56 A) NI Number AB 12 34 56 A.
Continuous digits with final letter national insurance number JN123456C,
Hyphen separators (AA-12-34-56-A) insurance no JP-12-34-56 A:
Keyword variants & mixed case ni no GH 01 23 45 B.
Insurance # EQ 66 55 44 P,
social security CD 98 76 54 M;
With filler text within 200 bytes national insurance contributions (employee) TK 22 33 44 F.

UK_NATIONAL_HEALTH_SERVICE_NUMBER

  • Regex enforces structure but does not check whether the NHS number passes the official check digit validation.
Data Type UK NHS Patient Identifier
Keywords national health service
nhs
health services authority
health authority
patient id
patient identification
patient no
patient number
GP
Proximity Must be within 200 bytes
Format 10 digits total, usually grouped as 3-3-4 (e.g., 943 476 5919).
Groups may be separated by spaces.
Continuous 10-digit form is also valid.
Must have non-digit boundary before and after the number.

Examples: What It Will Match

Variation & Description Example(s)
Standard 3-3-4 spacing nhs 943 476 5919
patient number: 123 456 7890
Continuous 10-digit format health authority 9434765919,
patient id 9876543210.
Keyword variants national health service 123 456 7890
GP: 321 654 0987;

UK_DRIVERS_LICENSE_NUMBER

Data Type United Kingdom Driving Licence Number (DVLA format)
Keywords DL
driving
license
driving license
Proximity Must be within 200 bytes
Format Surname code: 5 characters (letters, with 9 used as padding)
Birth date/sex block: 6 digits (year decade, birth month 01-12 for male / 51-62 for female, birth day, year unit)
Initials: 2 characters (9 may be used as padding)
Check digit: 1 digit
Licence type suffix: 2 letters
Must have non-alphanumeric boundary before and after

Examples: What It Will Match

Variation & Description Example(s)
Male, standard month (01–12) driving license SMITH807153AB1XY,
Female, month +50 (51–62) license JONES962034CD2ZZ.
Surname / initials padded with '9' driving LEE99011237AB4GH:
DL MILLR159286A93QW,
With punctuation & alternate keywords driving license: BROWN012318DJ7KT;
driving TAYLO605012ZZ5AA.

UK_PASSPORT

Data Type United Kingdom Passport Number
Keywords passport#
passport #
passportid
passports
passportno
passport no
passport number
passportnumbers
passport numbers
Proximity Must be within 200 bytes
Format Either 1 letter followed by 8 digits (e.g., A12345678)
Or 9 digits (e.g., 123456789)
Must be bounded by non-alphanumeric characters before and after

Examples: What It Will Match

Variation & Description Example(s)
Letter + 8 digits (current UK format) passport number B12345678
passport# Z98765432,
9 digits (older format) & keyword variants passport no 123456789.
passportnumbers C23456789
Passport Number X12345678.

UK_TAXPAYER_REFERENCE

Data Type UK Taxpayer Reference (Self Assessment / TIN)
Keywords tax number
tax file
tax id
tax identification no
tax identification number
tax no#
tax no
tax registration number
taxid#
taxidno#
taxidnumber#
taxno#
taxnumber#
tax number #
taxnumber
tin id
tin no
tin#
Proximity Must be within 200 bytes
Format 10-digit numeric identifier
No letters or special symbols inside the number
Must be bounded by non-alphanumeric characters before and after

Examples: What It Will Match

Variation & Description Example(s)
10-digit format with keyword variants tax id 1234567890
tin no 9876543210
taxid# 8765432109
With punctuation & filler text tax number: 1029384756,
national tax file reference: 1122334455 was submitted.

Government & National ID Patterns - Australia

Australia government and national ID patterns help detect commonly used Australian identifiers such as Tax File Numbers (TFN), driver licence numbers, Medicare numbers, and Australian passport numbers. These patterns apply Australia-specific formats together with mandatory keyword context and proximity windows to improve match precision and reduce false positives in general text.

AUSTRALIA_TAX_FILE_NUMBER

  • Although the keyword list includes 'australian business number', this pattern expects TFN-length numbers (8–9 digits). Use a dedicated ABN pattern for 11-digit ABNs.
  • Keep the keyword before the number and within 100 bytes to satisfy detection.
Data Type Australia Tax File Number (TFN)
Keywords australian business number
tax file number
tfn number
tin number
Proximity Must be within 100 bytes
Format 8 or 9 digits total.
May be written as continuous digits, or grouped as 3-3-3 or 3-3-2 with spaces or hyphens.
Must have non-alphanumeric boundary before and after.
Example: 123 456 782 or 864-203-791.

Examples: What It Will Match

Variation & Description Example(s)
Continuous 9 digits tax file number 123456782.
Grouped with spaces (3-3-3) & hyphens (3-3-3) tfn number 579 246 813,
tin number 864-203-791:
Grouped 3-3-2 (8-digit style) & keyword variants tax file number 246 813 57;
tax file number: 345-678-129. tfn number -- 482 913 705,

AUSTRALIA_DRIVERS_LICENSE_NUMBER

Data Type Australia Driver Licence Number
Keywords Australia DriverLicence
Australia Driver Licence
Australia Driver'Licence
Australia Driver' Licence
Australia Driver'sLicence
Australia Driver's Licences
Australia Driver Lic
Australia DriversLic
Australia driving permit
Australia DriverLic#
Australia Driverlicence#
Australia Driver Lic#
Proximity Must be within 100 bytes
Format Three formats accepted: NNN[-]NNN[-]NNN (9 digits, optional spaces or hyphens, 3-3-3 groups); AA?NNNN..
(1–2 letters followed by 4–9 digits); or NNNNNNN..
(7–10 continuous digits).
Number must be a distinct token (not embedded in a longer word).

Examples: What It Will Match

Variation & Description Example(s)
9 digits grouped with spaces & hyphens (3-3-3) Australia Driver Licence 123 456 789
Australia Driver's Licence 123-456-789
Continuous 7–10 digits & letter prefix formats Australia driving permit 9876543210
Australia Driver Licence Q1234567
Australia DriversLic NS12345678
Keyword variants & hash-sign forms Australia Driver' Licence 456-123-789
Australia DriverLic# 321 654 987
Australia Driver Lic# 12345678

AUSTRALIA_MEDICARE_NUMBER

Data Type Australia Medicare Number
Keywords medical account
medicare account
Proximity Must be within 100 bytes
Format Starts with digits 2–6.
Can appear as 9 or 10 digits (optional check/issue digit at end).
May be grouped with spaces or hyphens as 4+5+optional 1 digit.
Must be a separate token (not embedded in text).

Examples: What It Will Match

Variation & Description Example(s)
Continuous 9 & 10 digits medicare account 246812345
medical account 3456789012
Grouped with spaces & hyphens (4+5+optional 1) medicare account 4567 89012 3
medicare account 5678-90123-4
medical account 2345-67890

AUSTRALIA_PASSPORT

Data Type Australia Passport Number
Keywords passport#
passport #
passportid
passports
passportno
passport no
passportnumber
passport number
passport details
Proximity Must be within 150 bytes
Format Either 1 letter from N, E, D, F, A, C, U, X followed by 7 digits
Or 2 letters starting with P (second letter from A, B, C, D, E, F, U, W, X, Z) followed by 7 digits
Always a distinct token (not embedded in another word)

Examples: What It Will Match

Variation & Description Example(s)
Single-letter prefix + 7 digits passport number N1234567
passportid E7654321
Two-letter prefix (starts with P) & keyword variants passport details PA1234567
passportno PW9876543
passport# D2345678

Healthcare & Medical Patterns

Healthcare and medical patterns help detect regulated health-related identifiers and clinical codes commonly found in claims, billing, and patient records. These patterns cover classification systems (for example, ICD codes) and other healthcare identifiers, using strict formats and mandatory keyword context to improve precision and reduce false positives in general text.

ICD10_CODE

  • Pattern accepts major codes (3 characters) as well as detailed subcodes (up to 7 characters with decimals).
  • Valid leading letters are A–T, V–Z (U is reserved and excluded).
  • Regex enforces format compliance but does not validate if a code is clinically meaningful.
Data Type ICD-10 Medical Classification Code
Keywords icd10
icd10_code
icd10 number
icd10 code
disease code
disease id
Proximity Must be within 70 bytes
Format A letter (A-Z, excluding U) followed by two digits (e.g., A10, C34)
Optional letter A or B in the third position (e.g., A01B)
Optional decimal point followed by up to 4 additional alphanumeric characters (e.g., E11.9, F32.1A)
Must be a distinct token (word boundaries required)

Examples: What It Will Match

Variation & Description Example(s)
Basic 3-char code & with A/B suffix icd10 A10 | disease code C34 | icd10_code A01B
With decimal subcodes & keyword variants icd10 number E11.9 | disease id F32.1A | icd10 G30.A | disease code B20

ICD9_CODE

  • Accepts ICD-9 numeric categories (3 digits), subcategories (with decimals), and special V/E codes.
  • Regex enforces format but does not validate whether the code is still in use (ICD-9 is largely retired but included for legacy data coverage).
Data Type ICD-9 Medical Classification Code
Keywords icd9_code
icd9
disease code
disease id
icd9 number
icd9 code
Proximity Must be within 70 bytes
Format V-codes: V + 2 digits, optional decimal + up to 2 digits (e.g., V01, V01.2, V12.34).
Numeric codes: 3 digits, optional decimal + up to 2 digits (e.g., 250, 250.0, 250.12).
E-codes (external cause): E + 3 digits, optional decimal + 1 digit (e.g., E800, E800.1).
Matched as standalone tokens.

Examples: What It Will Match

Variation & Description Example(s)
Numeric code (plain & with decimal) icd9 250 | disease code 401 | icd9 code 250.0 | icd9 number 401.9
V-code & E-code formats disease id V01 | icd9_code V12.34 | icd9 E800 | icd9 code E800.1

Personal Information (PII) Patterns

Personal Information (PII) patterns help detect commonly shared identifiers about an individual—such as names, dates of birth, email addresses, phone numbers, and addresses—as well as certain sensitive attributes. Because many PII elements can appear in everyday business text, these patterns rely on keyword context and proximity windows to improve precision and reduce false positives.

FULL_NAME

  • Keywords are mandatory and must precede the value (≤70 bytes).
  • Enforces capitalised segments and allows only letters plus - or ' inside names.
  • Non-Latin scripts or all-lowercase names will not match this predefined pattern.
Data Type Person's Full Name (Given / Middle / Surname with optional salutation & suffix)
Keywords full name
name
fullname
Proximity Must be within 70 bytes
Format Optional salutation: Mr, Ms, Miss, Mrs, Dr, Sir (with or without period)
First name: capitalized, up to 15 letters, may include a hyphenated segment
Optional middle name or initial
Last name: capitalized, up to 15 letters, may include hyphen (-) or apostrophe (')
Optional suffix: Jr., Sr., II, III, IV
Must be bounded by non-alphanumeric characters before and after

Examples: What It Will Match

Variation & Description Example(s)
First + Last (plain & with salutation) full name John Smith, | name Asha Menon. | fullname Mr. Rahul Verma: | full name Dr Anita Joseph,
Middle initial / middle name name Daniel K. Gupta. | full name Priya Meera Iyer,
Hyphenated last name & apostrophe in last name full name Maria Lopez-Garcia. | fullname Seán O'Connor. | name D'Arcy Patel,
With suffix & compact initials name Robert Singh, Jr. | full name Neeraj Kumar II. | fullname R Sharma.

LAST_NAME

  • Regex enforces capitalization at the start of each segment, reducing false positives.
  • Will not match all possible global last name formats (e.g., non-Latin alphabets).
Data Type Last Name / Surname
Keywords lastname
last name
Proximity Must be within 200 bytes
Format Begins with a capital letter, followed by up to 10 alphabetic characters
May contain an optional hyphen (-) or apostrophe (') for compound names
Optional second capitalised segment of up to 15 characters
Examples: Smith, O'Connor, Johnson-Smith

Examples: What It Will Match

Variation & Description Example(s)
Simple surname & shorter forms last name Smith
lastname Li
last name Kim
Hyphenated, apostrophe & two-part surnames last name Johnson-Smith
lastname O'Connor
last name McDonald

DATE

  • Keyword is mandatory and must precede the date (≤30 bytes).
  • Leading zeros in day/month are optional (9/3/2024 and 09/03/2024 both match).
  • This pattern focuses on format; impossible dates (e.g., 31/02/2024) may still match the format unless you add extra validation upstream.
Data Type Calendar Date
Keywords Date
Proximity Must be within 30 bytes
Format ISO style: YYYY-MM-DD / YYYY/MM/DD
US style: MM/DD/YYYY / MM-DD-YYYY
EU style: DD/MM/YYYY / DD-MM-YYYY
Month name first: Mon DD, YYYY / Month DD YYYY
Day first with month name: DD Mon YYYY
Year first with month name: YYYY Mon DD
2-digit year support: MM/DD/YY / DD/MM/YY
Separators: hyphen (-), slash (/), spaces, and optional commas
Matched as a word-boundary token

Examples: What It Will Match

Variation & Description Example(s)
ISO (YYYY-MM-DD / YYYY/MM/DD) Date 2024-09-30
Date: 2024/09/30
US (MM/DD/YYYY) & EU (DD/MM/YYYY) Date 09/30/2024
Date 30/09/2024
Month name first & day first with month name Date Sep 30, 2024
Date September 30 2024
Date 30 Sep 2024
Year first & 2-digit year Date 2024 Sep 30
Date 09/30/24
Date 30/09/24

DATE_OF_BIRTH

The keyword is mandatory and must appear before the date (≤30 bytes). Leading zeros in day/month are optional.

Data Type Date of Birth
Keywords DOB
date of birth
birthdate
Birth Date
Proximity Must be within 30 bytes
Format Same date format variations as the DATE pattern
ISO style: YYYY-MM-DD / YYYY/MM/DD
US style: MM/DD/YYYY / MM-DD-YYYY
EU style: DD/MM/YYYY / DD-MM-YYYY
Month name first: Mon DD, YYYY / Month DD YYYY
Day first with month name: DD Mon YYYY
Year first with month name: YYYY Mon DD
2-digit year support: MM/DD/YY / DD/MM/YY
Separators: hyphen (-), slash (/), spaces, and optional commas

Examples: What It Will Match

Variation & Description Example(s)
ISO (YYYY-MM-DD / YYYY/MM/DD) DOB 1985-07-15
Birth Date 2000/12/01
US (MM/DD/YYYY) & EU (DD/MM/YYYY) date of birth 07/15/1985
DOB 15/07/1985
Month name first & day first with month name Birth Date Jul 15, 1985
birthdate 01 December 2000
Year first & 2-digit year DOB 1985 Jul 15
Birth Date 15/07/85

EMAIL_ADDRESS

Validates common email formats with TLDs of 2–3 letters. Must start with a letter and include @ and valid domain.

Data Type Email Address
Keywords email
mail-id
e-mail
mail address
Proximity Must be within 100 bytes
Format Username: starts with a letter, up to 26 characters (letters, digits, period, underscore, hyphen)
Mandatory @ symbol
Domain: 1-25 characters (letters, digits, period, underscore, hyphen)
TLD: 2-3 letters
Must be bounded by non-alphanumeric characters before and after

Examples: What It Will Match

Variation & Description Example(s)
Standard format & special chars in username email john.doe@example.com
mail-id a_b-test@domain.in
e-mail user123@corp.org
Subdomain & different TLDs mail address jane@marketing.example.co
email raj@company.in
mail-id: a.brown@university.edu

GENDER

Accepted values are limited to Male, Female, M, F, others. Regex enforces non-alphanumeric boundaries so partial matches inside words are excluded.

Data Type Gender / Sex Identifier
Keywords gender
sex
Proximity Must be within 50 bytes
Format Values: Male, Female, M, F, and other gender identifiers
Must have non-alphanumeric characters before and after (space, colon, comma, etc.)
Ensures the value is not part of a larger word

Examples: What It Will Match

Variation & Description Example(s)
Full word values & single-letter abbreviation gender: Male
sex = Female
gender M
sex: F
Other category & punctuation boundaries gender: others
sex - Male
gender, Female

IP_ADDRESS

  • Only IPv4 addresses are matched (IPv6 is not included in this pattern).
  • Regex ensures octets are within valid range (0–255), avoiding false matches like 999.999.999.999.
Data Type IPv4 Address
Keywords ip
ip addr
ip address
internet protocol
ipv4
ip-address
IP number
Proximity Must be within 70 bytes
Format Four octets separated by dots.
Each octet: 0–255.
First octet must be 1–255 (not 0).
Must have non-numeric character before and after.
Example: 192.168.1.1, 10.0.0.255.

Examples: What It Will Match

Variation & Description Example(s)
Private, loopback & broadcast ranges ip address 192.168.0.1
ip 127.0.0.1
ipv4 255.255.255.255
Keyword variants & punctuation internet protocol: 172.16.254.1,
ip-address 8.8.8.8
IP number 10.0.0.255

STREET_ADDRESS

  • The detector expects a US ZIP or ZIP+4 at the end; non-US postcodes will not match this predefined pattern.
  • Hyphens/apostrophes within street names and house-number qualifiers are supported.
Data Type Street / Postal Address (US format)
Keywords Address
Street
street name
residential address
resident at
apartment location
home location
home address
house number
complex number
postal address
Proximity Must be within 100 bytes
Format House number: 1–5 digits, with optional sub-number (e.g., 12-3A)
Optional door/side/unit letter qualifier (e.g., B in 221B)
Street name: letters, hyphens, and apostrophes supported (e.g., O'Connell)
Street type indicator: Boulevard, Drive, Lane, Ave, Dr, Rd, Blvd, Plaza, Road, Strasse, Street, Walk, Way
City name followed by US ZIP code: 5-digit (e.g., 01103) or ZIP+4 (e.g., 02110-1234)
Ordinal street names supported when spelled out (e.g., Fifth Avenue), numeric form (e.g., 5th) is not matched
PO Box format (e.g., P.O. Box 123) is not reliably matched, requires street-style structure after the box number

Examples: What It Will Match

Variation & Description Example(s)
Ordinal street name (spelled out) residential address 100 Fifth Avenue, New York, 10011
Standard number + street + city + ZIP home address 55 Market Street, Springfield, 01103
House sub-number & street with apostrophe/hyphen house number 12-3A Elm Road, Denver, 80203
home location 221B O'Connell Avenue, Austin, 78701
Street type abbreviations Address 100 Maple Dr, Raleigh, 27601
ZIP+4 format home address 123 Main Road, Boston, 02110-1234

EYE_COLOR

Data Type Eye Colour / Eye Color
Keywords eye colour
eye color
eye col
Proximity Must be within 50 bytes
Format Valid values:
Amber
Black
Blue
Brown
Gray
Green
Hazel. Case-insensitive. Must be surrounded by non-word characters to avoid false matches inside larger strings.

Examples: What It Will Match

Variation & Description Example(s)
Standard names & abbreviated keyword Eye Colour: Blue
eye color Brown
eye col Green
Different cases & punctuation EYE COLOUR Amber
(eye color: gray)
eye colour -- black

HAIR_COLOR

Data Type Hair Colour / Hair Color
Keywords hair colour
hair col
Proximity Must be within 50 bytes
Format Case-insensitive. Must be surrounded by non-word characters to avoid false positives inside unrelated words.
Valid values:
Black
Blonde
Brown
Red
Gray
Bald
Other

Examples: What It Will Match

Variation & Description Example(s)
Standard names & abbreviated keyword Hair Colour: Brown
hair color Black
hair col Blonde
Different cases & punctuation HAIR COLOUR Red
(hair colour: bald)
hair col -- other

HEIGHT

Boundary conditions ensure valid heights only — not mismatched as random decimal numbers.

Data Type Physical Attribute Height
Keywords height
high
body length
bodylength
Proximity Must be within 30 bytes
Format Metric (meters): X.Y where X = 1–8, Y = 00–11.
Examples: 1.75, 2.05, 1.9.
Imperial (feet & inches): X' Y" or X'Y" formats.
Examples: 5'11", 6' 0", 5'09".
Surrounded by non-alphanumeric characters to avoid partial matches.

POLITICS

Sensitive PII category. Recommended at Medium or High severity to avoid excessive alerts in normal business communications.

Data Type Political Affiliation / Views
Keywords political views
political affiliation
political
politics
Proximity Must be within 50 bytes
Format Matches known political ideologies and their variants: republican, democrat, liberal, liberalism, conservative, conservatism, populism, populist, libertarianism, libertarian, socialist, socialism, communist, communism, fascist, fascism, statist, statism.

Source Code Detection Patterns

Versa DLP includes multiple specialized patterns to detect source code files and snippets across a wide range of programming languages. These are critical for preventing intellectual property leakage and protecting proprietary algorithms.

C_CODES

Data Type C / C++ Source Code
Keywords #include
#define
#ifndef
def
unsigned
Proximity Must be within 100 bytes
Pattern Detects C data types and constructs: const, double, union, struct, short, bool, int, uint, void, for, while, switch, do, return statements.

JAVA_CODES

Data Type Java Source Code Snippet
Keywords import
package
Proximity Must be within 100 bytes
Pattern Detects public class declarations, public or private method definitions, main or static method references, and Java data types (boolean, string, int, float, char).

Examples: What It Will Match

Variation & Description Example(s)
Import + public class + main + int import java.util.*; public class Demo { public static void main(String[] args) { int x = 10; } }
Package declaration + private static + String package myapp.core; public class HelloWorld { private static String msg = "Hi"; public static void main(String[] args) { System.out.println(msg); } }
Different data types (boolean, char, float) public class Flags { public static boolean isActive() { return true; } } public class Letters { private char grade = 'A'; private float rate = 3.14f; }

PHP_CODES

Data Type PHP Source Code Snippet
Keywords <?php (mandatory opening tag)
Proximity Must be within 100 bytes
Pattern Detects presence of common PHP constructs: namespace, use statements, print, echo, loops (for, while, do, foreach), control structures (switch), function/class declarations (function, class, extends), access modifiers (protected).

Examples: What It Will Match

Variation & Description Example(s)
Basic script & namespace/imports <?php echo "Hello, World!"; ?>
<?php namespace MyApp; use Library\Module; ?>
Function, class & loops <?php function add($a, $b) { return $a + $b; } ?> <?php class Dog extends Animal { protected $breed; } ?> <?php for ($i=0; $i<10; $i++) { echo $i; } ?>

JAVASCRIPT_CODES

  • The text/javascript anchor is required — without it, the predefined pattern will not fire.
  • Ensures closing tags and recognizable JS constructs are present to reduce false positives.
Data Type JavaScript / HTML Source Snippet
Keywords text/javascript (MIME type hint — mandatory)
Proximity Must be within 100 bytes
Pattern Detects HTML document fragments containing <html><head>...</head><body>...</body></html> scaffolding with at least one of <title>, <script>, \ and a JavaScript construct such as document, function(...), for(...), while(...), switch(...), do...while, or foreach.

Examples: What It Will Match

Variation & Description Example(s)
Inline script with function text/javascript ... <html><head><script> function greet(){ document.write('hi'); } </script></head><body></body></html>
Loop & switch constructs text/javascript ... <html><head><script> for (let i=0;i<3;i++){ } </script></head><body></body></html> text/javascript ... <html><head><script> switch(x){case 1:break;} while(x<5){x++;} </script></head>...</html>

ASM_CODES

  • Designed to catch source code leakage in tickets, emails, and chats — not compiled binaries.
  • Supports labels (name:), identifiers with underscores, hex immediates (0x...), and optional trailing comments starting with ;
Data Type Assembly-language Source Snippet (generic x86/x64-style tokens)
Keywords section '.text
section '.bss
section.data
_start:
start:
add
mov
inc
msg
len (case-sensitive as written)
Proximity Must be within 50 bytes
Pattern A single assembly line or label/instruction line containing a valid label/identifier, optional operands (registers, symbols, hex immediates like 0xNN), optional comma-separated arguments, and optional end-of-line comment starting with ;.

Examples: What It Will Match

Variation & Description Example(s)
Section declarations & entry labels section '.text
section '.bss
section.data
_start:
start:
Instructions with registers, operands & comments mov eax, 0x1
mov rdx, msg ; load addr
add ebx, [len]
inc rcx

AWK_CODES

Data Type AWK Source Snippets (inline / command-line)
Keywords awk
gawk
AWK (detected even when preceded by non-alphanumeric boundaries)
Proximity Must be within 40 bytes
Pattern Detects AWK command-line options (-f, -F, -v, -b, -c, etc.) and script constructs: BEGIN { ... }, END { ... }, field and record variables (NF, NR, FS, RS, OFS, ORS), and print/printf commands.

Examples: What It Will Match

Variation & Description Example(s)
Simple inline & with option flags awk '{print $1, $2}' file.txt
gawk -F: '{print NF, $1}' /etc/passwd
BEGIN/END blocks & field/record variables awk 'BEGIN { FS=","; OFS="|" } {print $1, $2}'
awk '{total+=$1} END {print total}' sales.txt awk '{printf "%s has %d fields\n", $0, NF}' input.txt

AWK_CODES_FILE

This detector is for script files, not one-liners. The shebang is mandatory and acts as the anchor.

Data Type AWK Script File (shebang-style)
Keywords #!/bin/awk -f or #!/usr/bin/awk -f (shebang — mandatory)
Proximity Must be within 40 bytes
Pattern A real AWK script file starting with the AWK shebang, containing a BEGIN { } or END { } program block AND at least one AWK construct: find, searchterm, replaceterm, if(...), print, function <name>(...), or break.

Examples: What It Will Match

Variation & Description Example(s)
Shebang + BEGIN block + print / if() #!/usr/bin/awk -f BEGIN { FS=","; print "start" }
#!/bin/awk -f BEGIN { if (NF > 2) print $1 }
Shebang + function & END block + search keywords

#!/usr/bin/awk -f function clean(x)

{ gsub(/+/, "", x); print x } #!/bin/awk -f BEGIN { find="foo"; replaceterm="bar"; print find }

NAWK_CODES

Data Type NAWK Script File (shebang-based)
Keywords #!/bin/nawk -f or #!/usr/bin/nawk -f (shebang — mandatory)
Proximity Must be within 40 bytes
Pattern A valid NAWK script file with the NAWK shebang containing BEGIN { } or END { } block plus at least one AWK construct: find, searchterm, replaceterm, if(...), print, function <name>(...), or break.

Examples: What It Will Match

Variation & Description Example(s)
Shebang + BEGIN block + print / conditional #!/usr/bin/nawk -f BEGIN { print "NAWK start" }
#!/bin/nawk -f BEGIN { if (NF > 3) print $2 }
Shebang + function & keyword references #!/usr/bin/nawk -f function clean(val) { gsub(/[ ]+/, "", val); print val } #!/usr/bin/nawk -f BEGIN { searchterm="foo"; replaceterm="bar"; print searchterm }

GIT_DIFF_CODE

Keyword 'diff --git' is non-optional — ensures context-specific accuracy and prevents false positives from unrelated 'diff' text.

Data Type Git Diff / Patch Snippet
Keywords diff --git (mandatory — indicates the start of a Git patch or diff block)
Proximity Must be within 40 bytes
Pattern Matches Git diff headers including diff --git a/file b/file, index <hash>..<hash>, context markers with --- (original file) and +++ (new file). Ensures presence of commit hashes, index references, and file path markers.

Examples: What It Will Match

Variation & Description Example(s)
Standard file change & multiple file diff diff --git a/file1.txt b/file1.txt index 6dcb09b..d9e2647 100644 --- a/file1.txt +++ b/file1.txt
diff --git a/app/main.py b/app/main.py index 91f2ab4..cdbe873 100644
Binary file diff & renamed file diff --git a/image.png b/image.png index e69de29..c22f8b7
diff --git a/old_name.c b/new_name.c index 123abcd..456efgh 100644

DIFF_CODE

  • Complements GIT_DIFF_CODE by catching non-Git diffs (legacy or standalone diff outputs).
  • Often used in UNIX/Linux patches where version control is not Git. Helps prevent leaks of patch data where sensitive lines (e.g., config, credentials) may be exposed.
Data Type Classic UNIX diff / patch snippet (non-Git unified diff format)
Keywords Line-based diff markers using patterns like 1
23c45 / 2a7 / 10d12 (traditional UNIX diff command outputs — not Git diffs)
Proximity Must be within 40 bytes
Pattern Identifies change indicators (a=addition, c=change, d=deletion) between line ranges. Validates context structure including < and > markers used in patch blocks. Looks for --- separator lines and lines starting with < (removals) or > (additions). Non-alphanumeric boundaries prevent false positives from random numbers.

Examples: What It Will Match

Variation & Description Example(s)
Change (c) & addition (a) 1,4c1,4 < text line old --- > text line new
7a8,10 > new line 1 > new line 2
Deletion (d) & mixed edit 10,12d7 < old line 1 < old line 2
3c3 < old content --- > new content

LUA_CODES_1 - Module Loading

Data Type Lua Source Code Snippet (module-style)
Keywords #!/usr/bin/lua
__lua__
.Class:new():register
Proximity Must be within 100 bytes
Pattern A Lua snippet containing two local assignments, with a require in the first: local <name> = require ... followed (nearby) by local <name> = ...

Examples: What It Will Match

Variation & Description Example(s)
Shebang + require + second local #!/usr/bin/lua local json = require "cjson" local data = { id = 1 }
__lua__ & .Class anchors with require + local -- __lua__ local http = require("socket.http") local body = ""
-- .Class:new():register local app = require "myapp.core" local router = app:router()

LUA_CODES_2 - Functions & Control Flow

Data Type Lua Source Code Snippet (functions / control flow)
Keywords #!/usr/bin/lua
__lua__
.Class:new():register
Proximity Must be within 100 bytes
Pattern Function definitions (_init(), _something()), general function name(...) ... end blocks, control flow (if ... then ... end), and for <var> ... do ... end loops.

Examples: What It Will Match

Variation & Description Example(s)
Shebang + _init() & anchor + private function #!/usr/bin/lua function _init() print("boot") end
-- __lua__ function _refresh() -- refresh cache end
Generic function, if/then & for loop function process(data) return data.id end
if enabled then print("on") end
for i=1,3 do print(i) end

LUA_CODES_3 - Initialization & Return Pattern

  • LUA_CODES_1 covers module loading (require, locals).
  • LUA_CODES_2 covers function/control flow (_init, if, for).
  • LUA_CODES_3 covers initialization methods / return true...end blocks.
  • All three together provide comprehensive Lua detection coverage.
Data Type Lua Source Code Snippet (specialized initialization / return pattern)
Keywords #!/usr/bin/lua
__lua__
.Class:new():register
Proximity Must be within 100 bytes
Pattern Function definitions containing :initialize( (OOP-style method initialization), or code blocks ending with return true ... end.

Examples: What It Will Match

Variation & Description Example(s)
Shebang + :initialize() & class-style anchor #!/usr/bin/lua function Player:initialize(name) self.name = name end
.Class:new():register function Engine:initialize(config) self.cfg = config end
Return true...end block -- __lua__ if ready then return true end

PYTHON_CODE_1

Targets the canonical Python entry-point guard. Does not match variations like __name__ == '__main__' with extra whitespace or formatting differences beyond quote style.

Data Type Python Source Code Snippet (entry-point detection)
Keywords import, def
Proximity 100 bytes
Pattern Detects the standard Python entry-point idiom: if __name__ == "__main__": (with single or double quotes). Fires when this construct appears near an import or def keyword.

Examples: What It Will Match

Variation & Description Example(s)
import + main guard (double quotes) import sys
def run(): pass
if __name__ == "__main__": run()
def + main guard (single quotes) def main(): pass
if __name__ == '__main__': main()

 

PYTHON_CODE_2

  • Broadest of the three Python detectors, covering multiple shebang styles, error handling (try/except), and branching (if/elif/else).
  • Complements PYTHON_CODE_1 (entry point) and PYTHON_CODE_3 (minimal). Together, the three patterns provide layered Python detection.
Data Type Python Source Code Snippet (control flow and error handling)
Keywords import, if, try:, #!/usr/bin/python, #!/usr/local/bin/python, #!/usr/bin/env python
Proximity 100 bytes
Pattern Detects Python control-flow and error-handling constructs: elif <condition>:, else:, except, import, and try: blocks. Fires when any of these appear near a keyword such as import, if, try:, or a Python shebang line.

Examples: What It Will Match

Variation & Description Example(s)
Shebang + try/except block #!/usr/bin/python
try:
import os
except ImportError:
pass
if/elif/else chain import sys
if x > 0:
print(x)
elif x == 0:
pass
else:
exit()
env-style shebang + import + except #!/usr/bin/env python
import json
try:
data = json.load(f)
except ValueError:
pass

PYTHON_CODE_3

A lighter variant of PYTHON_CODE_2, detecting only elif or try: near import/def. Use when minimal Python detection is sufficient.

Data Type Python Source Code Snippet (minimal control flow)
Keywords import, def
Proximity 100 bytes
Pattern Detects minimal Python control-flow constructs: elif <condition>: or try: blocks. Fires when either appears near an import or def keyword. A lighter variant of PYTHON_CODE_2.

Examples: What It Will Match

Variation & Description Example(s)
import + elif import math
if n < 0:
return None
elif n == 0:
return 1
def + try block def load(path):
try:
open(path)
except IOError:
pass

PERL_CODES

  • The Perl shebang (#!/usr/bin/perl or #!/usr/local/bin/perl) is mandatory. Without it, the pattern will not fire.
  • Covers both procedural Perl (sub, print, for) and module-style Perl (package, use, =head1 POD documentation).
Data Type Perl Source Code Snippet
Keywords #!/usr/bin/perl, #!/usr/local/bin/perl
Proximity 100 bytes
Pattern Detects common Perl constructs when preceded by a Perl shebang: =head1 (POD documentation), use/package declarations, =encoding, subroutine definitions (sub), print statements, and control-flow keywords (if, for, while, foreach).

Examples: What It Will Match

Variation & Description Example(s)
Shebang + use + subroutine + print #!/usr/bin/perl
use strict;
use warnings;
sub greet { print "Hello\n"; }
Shebang + package + control flow #!/usr/local/bin/perl
package MyApp;
if ($ready) { for my $i (1..10) { print $i; } }
Shebang + POD + foreach #!/usr/bin/perl
=head1 NAME
MyModule
=cut
foreach my $item (@list) { process($item); }

PASCAL_CODES

  • The keyword list is broad (program, unit, function, var, begin, etc.) but the value regex is narrow (end;), so both must be present to fire.
  • Designed for Object Pascal / Delphi as well as standard Pascal.
Data Type Pascal Source Code Snippet
Keywords program, unit, function, uses type, for, if, var, while, else, interface, class, Procedure, implementation, begin (each followed by identifiers and Pascal punctuation: ; , : ( ' { )
Proximity 100 bytes
Pattern Detects the Pascal end; terminator. Fires when end; appears near any of the Pascal keywords listed above. The keyword list covers program structure (program, unit, uses, interface, implementation, begin), declarations (var, function, Procedure, class), and control flow (for, if, while, else).

Examples: What It Will Match

Variation & Description Example(s)
program + begin/end program Hello;
begin
writeln('Hello');
end;
unit + interface + implementation unit Utils;
interface
function Add(a, b: Integer): Integer;
implementation
function Add(a, b: Integer): Integer;
begin Result := a + b; end;
end;
Procedure + var + control flow Procedure Process;
var i: Integer;
begin
for i := 1 to 10 do
if i mod 2 = 0 then writeln(i);
end;

TCL_CODES

Requires a Tcl anchor keyword (package require Tcl or #!/usr/bin/tclsh). Without it, common words like set, if, for will not trigger a match on their own.

Data Type Tcl Source Code Snippet
Keywords package require Tcl, #!/usr/bin/tclsh
Proximity 100 bytes
Pattern Detects common Tcl constructs: namespace, proc (procedure definitions), set (variable assignment), put (output), if, foreach, while, and for. Fires when any of these appear near a Tcl keyword anchor (package require Tcl or tclsh shebang).

Examples: What It Will Match

Variation & Description Example(s)
Shebang + proc + set + if #!/usr/bin/tclsh
proc greet {name} { set msg "Hello $name"; if {$name eq ""} { return } ; puts $msg }
package require + namespace + foreach package require Tcl 8.6
namespace eval myapp { foreach item $list { puts $item } }
Shebang + while + set #!/usr/bin/tclsh
set i 0
while {$i < 10} { puts $i; set i [expr {$i + 1}] }

 

Usage Guidance and Best Practices

This section provides practical guidance for using predefined patterns effectively in DLP Content Analysis policies. It covers recommended ways to combine patterns, tune severity thresholds, and validate detections to minimize false positives while maintaining strong protection for sensitive data across different industries and use cases.

Applying Predefined Patterns in Policies

Step Action
Step 1 Navigate to Security Policies > DLP > Content Analysis in the Versa Director.
Step 2 Create or edit a DLP rule and select the predefined pattern(s) to match against.
Step 3 Assign a Severity Level (Low / Medium / High / Critical) for each pattern.
Step 4 Optionally override the threshold using Severity Value if the default is too permissive or strict.
Step 5 Set the Action (Alert, Block, Quarantine, Redact, etc.) appropriate for your use case.
Step 6 Test in Alert-only (monitor) mode before switching to blocking actions.

Industry-Specific Pattern Recommendations

Industry Recommended Patterns
Financial Services CREDIT_CARD_NUMBER, CVV, PIN_NUMBER, SWIFT_CODE, US_BANK_ACCOUNT_NUMBER, US_BANK_ROUTING_NUMBER, US_SOCIAL_SECURITY_NUMBER, US_EMPLOYER_IDENTIFICATION_NUMBER, IBAN_CODE
Healthcare / Hospitals ICD9_CODE, ICD10_CODE, US_MEDICARE_BENEFICIARY_IDENTIFIER, UK_NATIONAL_HEALTH_SERVICE_NUMBER, US_DEA_NUMBER, DATE_OF_BIRTH, HEALTH_CONDITION, US_MEDICAL_ACCOUNT_NUMBER
Technology / Software C_CODES, JAVA_CODES, PYTHON_CODE_1/2/3, GIT_DIFF_CODE, AWS_CREDENTIALS, PASSWORD, AUTH_TOKEN, JSON_WEB_TOKEN, GCP_API_KEY, AZURE_AUTH_TOKEN
Retail / eCommerce CREDIT_CARD_NUMBER, CARD_HOLDER_NAME, EXPIRY_DATE, CVV, EMAIL_ADDRESS, STREET_ADDRESS, US_MOBILE_NUMBER, US_BANK_ACCOUNT_NUMBER
Government / Public Sector US_SOCIAL_SECURITY_NUMBER, US_PASSPORT, INDIA_AADHAAR_INDIVIDUAL, INDIA_PAN_INDIVIDUAL, UK_NATIONAL_INSURANCE_NUMBER, DOCUMENT_TYPE/LEGAL/*
Global Enterprise (Multi-region) Country-specific passport and national ID patterns + IBAN_CODE, SWIFT_CODE, EMAIL_ADDRESS, IP_ADDRESS, FULL_NAME, DATE_OF_BIRTH, SORT_CODE

Detection Context and Scope

This section explains where DLP inspection and predefined pattern matching occur, and what content is in scope for detection. It also outlines the primary threat types these patterns are designed to identify (for example, accidental data leakage, policy violations, or data exfiltration attempts) so you can align rules to your deployment and risk model.

Where DLP Patterns Are Applied

Context Description
Header Scans HTTP/email headers for sensitive metadata (e.g., Subject lines, custom headers).
Body Inspects the body of HTTP traffic, emails, chat messages, and web forms.
Attachment Analyzes file attachments in emails and uploads for sensitive content within files.
App ID Matches against application identifiers , enables app-specific DLP policies.
Device ID Ties detection to specific device identifiers for endpoint-aware policies.

Complete Pattern Reference

All predefined DLP patterns available in the Versa Security Pack, organized by category. Patterns marked '(ML-based)' use machine learning or file-type heuristics and do not require a keyword.

Payment & Financial

Pattern Name Category
CREDIT_CARD_NUMBER Payment Cards
CARD_HOLDER_NAME Payment Cards
CVV Payment Cards
PIN_NUMBER Payment Cards
EXPIRY_DATE Payment Cards
PRIMARY_ACCOUNT_NUMBER Payment Cards
SERVICE_CODE Payment Cards
CREDIT_CARD_TRACK_NUMBER Payment Cards
IBAN_CODE Banking
SWIFT_CODE Banking
US_BANK_ACCOUNT_NUMBER Banking - USA
US_BANK_ROUTING_NUMBER Banking - USA
US_BANK_ROUTING_FRACTION Banking - USA
US_BANK_NAME Banking - USA
UK_BANK_ACCOUNT_NUMBER Banking - UK
UK_BANK_NAME Banking - UK
AUSTRALIA_BANK_NUMBER Banking - Australia
AUSTRALIA_BANK_STATE_BRANCH Banking - Australia
SORT_CODE Banking - UK
AMERICAN_BANKERS_CUSIP_ID Finance

Government & National ID

Pattern Name Category
INDIA_AADHAAR_INDIVIDUAL India
INDIA_PAN_INDIVIDUAL India
INDIA_GST_INDIVIDUAL India
US_SOCIAL_SECURITY_NUMBER USA
US_PASSPORT USA
US_DRIVERS_LICENSE_NUMBER_ALL_STATES USA
US_EMPLOYER_IDENTIFICATION_NUMBER USA
US_INDIVIDUAL_TAXPAYER_IDENTIFICATION_NUMBER USA
US_ADOPTION_TAXPAYER_IDENTIFICATION_NUMBER USA
UK_NATIONAL_INSURANCE_NUMBER UK
UK_NATIONAL_HEALTH_SERVICE_NUMBER UK
UK_DRIVERS_LICENSE_NUMBER UK
UK_TAXPAYER_REFERENCE UK
UK_PASSPORT UK
AUSTRALIA_DRIVERS_LICENSE_NUMBER Australia
AUSTRALIA_MEDICARE_NUMBER Australia
AUSTRALIA_PASSPORT Australia
AUSTRALIA_TAX_FILE_NUMBER Australia
BRAZIL_CPF_NUMBER Brazil
BRAZIL_TIN_PERSONAL Brazil
BRAZIL_TIN_CORPORATE Brazil
CANADA_SOCIAL_INSURANCE_NUMBER Canada
CANADA_PASSPORT Canada
FRANCE_PASSPORT France
FRANCE_NIR France
FRANCE_CNI France
GERMANY_IDENTITY_CARD_NUMBER Germany
GERMANY_PASSPORT Germany
GERMANY_DRIVERS_LICENSE_NUMBER Germany
GERMANY_TAXPAYER_IDENTIFICATION_NUMBER Germany
KOREA_RRN South Korea
KOREA_PASSPORT South Korea
CHINA_RESIDENT_ID_NUMBER China
CHINA_PASSPORT China
JAPAN_INDIVIDUAL_NUMBER Japan
JAPAN_PASSPORT Japan
SINGAPORE_NATIONAL_REGISTRATION_ID_NUMBER Singapore
SINGAPORE_PASSPORT Singapore
IRELAND_PPSN Ireland
IRELAND_PASSPORT Ireland
SPAIN_DNI_NUMBER Spain
SPAIN_PASSPORT Spain
PASSPORT Global

Healthcare & Medical

Pattern Name Category
ICD10_CODE Healthcare
ICD9_CODE Healthcare
US_MEDICARE_BENEFICIARY_IDENTIFIER USA - Healthcare
US_DEA_NUMBER USA - Healthcare
US_MEDICAL_ACCOUNT_NUMBER USA - Healthcare
UK_NATIONAL_HEALTH_SERVICE_NUMBER UK - Healthcare
AUSTRALIA_MEDICARE_NUMBER Australia - Healthcare
HEALTH_CONDITION Healthcare
MEDICAL_TERM Healthcare
FDA_CODE Healthcare

Personal Information (PII)

Pattern Name Category
FULL_NAME PII - General
CARD_HOLDER_NAME PII - General
LAST_NAME PII - General
DATE_OF_BIRTH PII - General
DATE PII - General
AGE PII - General
GENDER PII - General
EMAIL_ADDRESS PII - General
PHONE_NUMBER PII - General
US_MOBILE_NUMBER PII - USA
STREET_ADDRESS PII - General
EYE_COLOUR PII - Physical
HAIR_COLOUR PII - Physical
HEIGHT PII - Physical
WEIGHT PII - Physical
RACE PII - Sensitive
RELIGION PII - Sensitive
POLITICAL PII - Sensitive
BIOMETRIC PII - Sensitive
ETHNIC_GROUP PII - Sensitive
CRIMINAL_RECORD PII - Sensitive
IP_ADDRESS Network
MAC_ADDRESS Network
DOMAIN_NAME Network
URL Network

Credentials & Security

Pattern Name Category
PASSWORD Credentials
AUTH_TOKEN Credentials
BASIC_AUTH_HEADER Credentials
ENCRYPTION_KEY Credentials
JSON_WEB_TOKEN Credentials
AWS_CREDENTIALS Credentials
AZURE_AUTH_TOKEN Credentials
GCP_API_KEY Credentials
GCP_CREDENTIALS Credentials
STORAGE_SIGNED_URL Credentials
STORAGE_SIGNED_POLICY_DOCUMENT Credentials
WEAK_PASSWORD_HASH Credentials
HTTP_COOKIE Credentials
BAD_FQDN Network / Threat
BAD_IP_ADDRESS Network / Threat

Source Code Detection

Pattern Name Category
C_CODES Source Code - C/C++
JAVA_CODES Source Code - Java
PHP_CODES Source Code - PHP
JAVASCRIPT_CODES Source Code - JavaScript
PYTHON_CODE_1 Source Code - Python
PYTHON_CODE_2 Source Code - Python
PYTHON_CODE_3 Source Code - Python
PERL_CODES Source Code - Perl
PASCAL_CODES Source Code - Pascal
AWK_CODES Source Code - AWK
AWK_CODES_FILE Source Code - AWK
NAWK_CODES Source Code - NAWK
LUA_CODES_1 Source Code - Lua
LUA_CODES_2 Source Code - Lua
LUA_CODES_3 Source Code - Lua
TCL_CODES Source Code - TCL
ASM_CODES Source Code - Assembly
GIT_DIFF_CODE Source Code - Diff
DIFF_CODE Source Code - Diff
DOCUMENT_TYPE/FINANCE/REGULATORY Document Type
DOCUMENT_TYPE/FINANCE/SEC_FILING Document Type
DOCUMENT_TYPE/HR/RESUME Document Type
DOCUMENT_TYPE/LEGAL/BRIEF Document Type
DOCUMENT_TYPE/LEGAL/COURT_ORDER Document Type
DOCUMENT_TYPE/LEGAL/LAW Document Type
DOCUMENT_TYPE/R&D/SOURCE_CODE Document Type
DOCUMENT_TYPE/R&D/PATENT Document Type
DOCUMENT_TYPE/R&D/DATABASE_BACKUP Document Type
GLOBAL_BAD_WORDS Content Policy
ALL_BASIC Catch-All
GENERIC_ID Catch-All
ADVERTISING_ID Device
IMEI_HARDWARE_ID Device

Glossary of Terms

Term Definition
DLP Data Loss Prevention. Technology to detect and prevent unauthorized transmission of sensitive data.
Predefined Pattern A pre-built detection rule combining keyword, regex, and proximity. Ready to use without custom configuration.
Spack Versa Security Pack. The bundle that delivers DLP patterns, signatures, and updates to Versa devices.
Regex Regular Expression. A pattern used to match character sequences in text.
Proximity Maximum byte distance between a keyword and a regex match for a detection to fire.
Severity Level Controls the minimum number of pattern matches required before a DLP rule triggers.
Severity Value An optional override that replaces the severity level's default match threshold.
PAN Primary Account Number. The main card number on a payment card.
CVV Card Verification Value. The 3 or 4 digit security code on a payment card.
SSN Social Security Number. A 9-digit US government ID number.
ITIN Individual Taxpayer Identification Number. A US tax processing number starting with 9.
EIN Employer Identification Number. A US business tax ID.
NINO National Insurance Number. UK social security equivalent.
NHS National Health Service — UK public health system; NHS numbers are patient identifiers.
TFN Tax File Number — Australia's individual tax identification number.
ICD International Classification of Diseases — standard system for coding medical diagnoses.
DEA Drug Enforcement Administration - DEA numbers identify US licensed prescribers.
MBI Medicare Beneficiary Identifier - US Medicare patient ID replacing SSN-based HIC.
BIC / SWIFT Bank Identifier Code - international standard identifying specific banks globally.
ML Machine Learning - some patterns use trained models instead of regex for detection.
PII Personally Identifiable Information - data that can identify a specific individual.
PHI Protected Health Information - health data covered by HIPAA and similar regulations.
MIP Microsoft Information Protection - Microsoft's sensitivity label and document classification system.
GSTIN Goods and Services Tax Identification Number - India's business tax registration number.
BSB Bank State Branch. Australian code identifying bank branches.

 

  • Was this article helpful?