DLP Predefined Patterns Quick-Reference
This guide is a detailed reference for all predefined DLP data patterns available in Versa Networks, including:
- Pattern definitions
- Matching logic
- Severity levels
- Supported formats
- Real-world match examples
Each pattern entry documents the detection logic, keywords, proximity rules, and format specifications used by the DLP engine to identify sensitive data in real time.
DLP Predefined Data Patterns
Data Loss Prevention (DLP) helps organizations detect and prevent accidental or malicious leaks of sensitive information across network traffic, cloud applications, and endpoints.
Versa Networks DLP uses predefined data patterns to identify common forms of sensitive data such as credit card numbers, government-issued IDs, healthcare codes, or source code without requiring administrators to manually build complex regular expressions. These patterns are bundled in the Versa Security Pack (Spack) and can be applied directly in DLP Content Analysis rules.
How Predefined Patterns Work
Every predefined pattern is built using three detection components that must all fire together for a match to be reported:
| Keyword (Mandatory) | One or more trigger words that must be present near the sensitive data. Case-insensitive. The keyword may appear before or after the pattern. Keyword also help reduce false positive with just matching patterns. Without a keyword match, the pattern will not fire even if the regex matches. Examples : "aadhaar", "credit card", "ssn", "passport". |
| Regex (Pattern) | A structured regular expression defining the format of the sensitive data to detect (e.g., a 12-digit Aadhaar number or a Luhn-validated credit card number). Operates within the proximity window of the keyword. |
| Proximity (Range Window) | Maximum distance allowed between the keyword and the regex match. Ranges from 25 to 200 bytes depending on the pattern. A match is only reported when both keyword and regex appear within this window. |
| Key Rule A pattern match fires ONLY when BOTH a matching keyword AND a matching regex are found within the defined proximity window. Keyword-only or Regex-only matches are insufficient. |
DLP Pattern Severity Levels
When you assign a predefined pattern to a DLP Content Analysis rule, you also assign a Severity Level. This controls how many times a pattern must match within the scanned content before the rule triggers an action.
| Severity Level | Default Threshold | Recommended Use |
|---|---|---|
| Low | 1 | Minimum — flags even a single occurrence. Best for highly sensitive identifiers (SSN, Aadhaar, Passport). |
| Medium | 10 | Moderate requires 10 matches. Good for data common in business documents (dates, phone numbers). |
| High | 20 | High-confidence triggers 20+ matches. Useful for patterns with higher false-positive risk. |
| Critical | 50 | Extreme volume — designed for bulk-exfiltration scenarios. |
Severity Value Override
If a Severity Value is explicitly defined in the rule configuration, it overrides the default threshold for that severity level.
| Variation & Description | Example(s) |
|---|---|
| High (default 20 matches) with Severity Value = 5 | Rule triggers after 5 matches, not 20 |
| Low (default 1 match) with Severity Value = 3 | Rule triggers after 3 matches, not 1 |
Best Practice
Use Low severity for highly sensitive individual identifiers. Use Medium or High for patterns prone to false positives. Always test in Alert-only mode before switching to Block or Quarantine.
Available DLP Actions
The following actions can be configured when a DLP predefined pattern match triggers a rule:
| Action | Description |
|---|---|
| Allow | Permits the traffic/content to pass without restriction. |
| Alert | Allows content but generates a log and alert for administrator review. |
| Block | Stops the transmission of the matched content. |
| Reject | Blocks and sends a rejection response to the sender. |
| Quarantine | Holds the content for administrator review before release or deletion. |
Detailed Pattern Descriptions
This section provides in-depth descriptions for every documented pattern, organized by category and country, including all match examples from Versa's internal documentation.
Payment Card Patterns
Payment card patterns cover all elements of PCI DSS-sensitive data: card numbers, cardholder names, CVV codes, PINs, expiry dates, and service codes. These patterns are global and not country specific.
CREDIT_CARD_NUMBER
- Continuous, spaced, and hyphenated PAN formats are supported.
- Example numbers shown are for testing. For production validation, use Luhn-valid PANs to reduce false positives.
- Some 6-series issuer ranges overlap (for example, RuPay, UnionPay, and Discover).
- The pattern reports a match only when a valid payment-card keyword is present within the proximity window.
| Data Type | Payment Card Primary Account Number (PAN) |
|---|---|
| Keywords | Rupay Visa mastercard, mastercards quicksilver capitalone debitcard united credit, creditcard amex american express, americanexpress master diner's club, diners club, dinersclub discover, discover card, discovercard, discover cards JCB BrandSmart card numbers CC No CC# credit card No CCN |
| Proximity | Must be within 100 bytes |
| Format | PANs of 13–19 digits, continuous or grouped with spaces or hyphens. Supported issuers: Visa: starts with 4, 16 digits Mastercard: 51–55 or 2221–2720, 16 digits Amex: 34/37, 15 digits, 4-6-5 grouping Discover/RuPay/UnionPay: 6011, 65xx, 64[4–9], 62x, 16 digits Diners Club: 300–305 / 360–369 / 380–389, 14 digits JCB: 3528–3589, 16 digits Maestro/Dankort: various 50/60-series prefixes, 16 digits |
Examples: What It Will Match
| Variation & Description | Example(s) |
|---|---|
| Visa (16 digits, grouped or continuous) | Visa 4111 1111 1111 1111 credit card 4539-1488-0343-6467 |
| Mastercard (51–55 or 2221–2720, 16 digits) | mastercard 5555 5555 5555 4444 creditcard 2223-0000-4841-0010 |
| American Express (34/37, 15 digits, 4-6-5) | american express 3714 496353 98431 amex 3782-822463-10005 |
| Discover / 6-series (6011/65xx/64[4–9]/62x) | discover card 6011 1111 1111 1117 credit 6510-0000-0000-0004 |
| Diners Club (300–305 / 36 / 38–39, 14 digits) | diners club 3056 930902 5904 diner's Club 3852-000002 3237 |
| JCB (3528–3589, 16 digits) | JCB 3566 0020 2036 0505 card numbers 3530-1113-3330-0000 |
| Maestro / Dankort (common prefixes, 16 digits) | CC No 5019 7170 0101 3742 CC# 6759-0000-0000-0008 |
| Generic keyword variants | creditcard 4716 7361 1384 2732 credit card No 5372-1653-9838-6508 |
CARD_HOLDER_NAME
- Each name segment must be capitalized and can contain letters only, with an optional hyphen (-) or apostrophe (').
- Non-Latin scripts, all-lowercase names, or names containing digits/symbols (other than - or ') will not match this predefined pattern.
- Ensure a non-alphanumeric boundary immediately after the name (for example, a comma, period, or space) to satisfy the detector.
- Names with apostrophe as the second character (e.g., D'Arcy) are not reliably matched, the apostrophe is treated as a boundary character. Suffixes (Jr., Sr., II, III, IV) require a comma before them (e.g., 'Smith, Jr.').
| Data Type | Cardholder / Account Holder Name |
| Keywords | name account holder account name ac name |
| Proximity | Must be within 70 bytes |
| Format | Allowed structure (must be bounded by non-alphanumeric characters): Optional salutation: Mr/Ms/Mrs/Miss/Dr/Sir (with or without ".") First name: capitalized, up to 15 letters, optional hyphenated segment Optional middle name or initial Last name: capitalized; may include hyphen or apostrophe (e.g., O'Connor, Smith-Jones) Optional suffix: Jr., Sr., II, III, IV |
Examples: What It Will Match
| Variation & Description | Example(s) |
|---|---|
| Simple first + last & with salutation | name John Smith, ac name Alice Brown. account holder Mr. Rahul Verma: account name Dr Anita Menon, |
| Middle initial / middle name | Name Daniel K Gupta. name Priya M Iyer, |
| Hyphenated last name & apostrophe in last name | ac name Maria Lopez-Garcia. name Sean O'Connor. account name Darcy Patel, |
| Suffix present & compact spacing | name Robert Singh, Jr. account name Neeraj Kumar, II. ac name R Sharma. |
CVV
- Matches only 3- or 4-digit numeric values.
- CVV alone (without keywords) will not trigger a match.
| Data Type | Card Verification Value (CVV/CVV2/CVC2/CSC/Security Code) |
| Keywords | cvv cvv2 cvc2 csc security code security number card identification number card verification issue number pin block |
| Proximity | Must be within 30 bytes |
| Format | A 3- or 4-digit numeric code: 3-digit: Visa/Mastercard (typically printed on the back of the card) 4-digit: American Express (typically printed on the front of the card) |
Examples: What It Will Match
| Variation & Description | Example(s) |
|---|---|
| Standard 3-digit CVV | cvv 123 security code 456 csc 321 |
| 4-digit CVV (e.g., Amex) | cvv2 9876 cvc2 number 1234 card identification number 678 |
| Alternate labels & symbol boundaries | security number 432 issue no 789 cvv# 654 |
PIN_NUMBER
Matches strictly numeric PINs only, no alphanumeric.
| Data Type | Personal Identification Number (PIN) |
| Keywords | PIN |
| Proximity | Must be within 30 bytes |
| Format | Strictly numeric PIN values: 4-digit PINs (e.g., ATM PINs) 6-digit PINs (banking/2FA applications) |
Examples: What It Will Match
| Variation & Description | Example(s) |
|---|---|
| 4-digit PIN | PIN 1234 PIN: 9876 |
| 6-digit PIN with delimiter | PIN 456789 PIN=7410 PIN -- 556677 |
EXPIRY_DATE
- Matches both short (MM/YY) and long (MM/YYYY) formats.
- Accepts month names (short or full) in English followed by year.
- Works in combination with CREDIT_CARD_NUMBER and CVV patterns to identify full payment card data exposure.
| Data Type | Payment Card Expiry Date |
| Keywords | expiration expire expires expiry validity valid thru valid expiry date date expiry |
| Proximity | Must be within 30 bytes |
| Format | Supported expiry date formats: Numeric: MM/YY, MM-YY, MM YYYY, MM-YYYY Textual: Jan 25, December/2029, Oct-24 |
Examples: What It Will Match
| Variation & Description | Example(s) |
|---|---|
| Standard 2-digit year (MM/YY) | expiry 05/25 valid thru 12-24 |
| 4-digit year (MM/YYYY) | expiration date 07/2028 validity 03-2026 |
| Textual month + 2-digit year | expires Sep-25 expiry date October 29 |
| Textual month + 4-digit year | valid Dec 2029 expiry Jan 2031 |
| Mixed delimiters (/, -, space) | expire 04 25 valid thru 06-2027 |
SERVICE_CODE
| Data Type | Payment Card Service Code (ISO/IEC 7813) |
| Keywords | service code |
| Proximity | Must be within 30 bytes |
| Format | Service code formats: 3- or 4-digit numeric code Often found in payment card Track 1/Track 2 data alongside card number, expiry, and CVV |
Examples: What It Will Match
| Variation & Description | Example(s) |
|---|---|
| 3-digit service code | service code 201 service code 601 |
| 4-digit with punctuation | service code 2200 service code: 250 service code - 7210 |
SWIFT_CODE
- Valid formats are 8 or 11 characters. The country code is validated against ISO 3166-1 alpha-2 country codes.
- Regex ensures format compliance but does not validate if the SWIFT code is actively registered.
| Data Type | SWIFT / BIC Bank Identifier Code (ISO 9362) |
| Keywords | swift# swiftcode swiftnumber swiftroutingnumber swift code swift number # swift routing number bic number bic code bic# bic # bank identifier code iso9362 iso 9362 international organization for standardization 9362 Organisation internationale de normalisation 9362 rapide # code SWIFT |
| Proximity | Must be within 70 bytes |
| Format | SWIFT/BIC structure: Length: 8 or 11 alphanumeric characters Bank Code: 4 letters Country Code: 2 letters (ISO 3166-1) Location Code: 2 characters Optional Branch Code: 3 characters Examples: HSBCGB2LXXX, DEUTDEFF500 Must be bounded by non-alphabetic characters |
Examples: What It Will Match
| Variation & Description | Example(s) |
|---|---|
| 8-character SWIFT (no branch code) | swift code HDFCINBB bic number BOFAUS3N |
| 11-character SWIFT (with branch code) | swift# SBININBBXXX bic code CHASUS33NYC |
| Other keyword variants & punctuation | bank identifier code ICICINBB swift code: HSBCGB2L, iso9362 DEUTDEFF500 |
Banking Patterns
Banking patterns help identify common banking identifiers used in payment and funds-transfer workflows, including bank account numbers and routing/clearing codes (for example, US ABA routing numbers, UK sort codes, and AU BSB codes). These patterns are country-specific where applicable and are typically used together (e.g., account number + routing/sort/BSB) to reduce false positives and improve detection confidence.
US_BANK_ACCOUNT_NUMBER
- Not intended for routing numbers, IBAN, or SWIFT — those have separate patterns.
- Combine with routing number and account holder name patterns in a DLP rule for stricter detection.
| Data Type | US Bank Account Number (generic DDA/Savings/Checking) |
| Keywords | Acct (No./Number/#) Account (#/No.) Checking Account Checking Acct Number/#No. Bank Account Number/#/No. Bank Acct Number/#/No. Savings Account Number/No./# Savings Acct Number/#No. Debit Account Number/#/No. Debit Acct Number/#/No. |
| Proximity | Must be within 50 bytes |
| Format | A standalone numeric string (word-boundary delimited): Length: 4–17 digits Digits only (no spaces or hyphens inside the number) Format-based and issuer-agnostic — does not validate against a specific bank or checksum |
Examples: What It Will Match
| Variation & Description | Example(s) |
|---|---|
| Short account (4–6 digits) | Acct No. 4729 | Account # 120345 |
| Typical 8–12 digits | Checking Account 00349872105 | Savings Acct Number 78543210987 |
| Long form up to 17 digits | Bank Account No. 12004567890123456 |
| Debit account phrasing | Debit Acct # 9988776655 |
| With punctuation/spacing | Account Number: 004567891234, Savings Account No. 5500123499. |
US_BANK_ROUTING_NUMBER
| Data Type | US Bank Routing Number (ABA / RTN / MICR) |
| Keywords | aba number aba# aba no. abarouting number/#/no. americanbankassociationrouting number/#/no. bankrouting number/#/no. routing number routing no. routing # routing transit number/no./# RTN MICR number/#/no. |
| Proximity | Must be within 30 bytes |
| Format | 9-digit ABA routing number with optional hyphens. First digit constrained to 0–3 or 6–8 per ABA specifications. Example: 021000021 1234-5678-9 |
Examples: What It Will Match
| Variation & Description | Example(s) |
|---|---|
| 9 digits, with hyphens & MICR phrasing | Routing Number 123456789 ABA No. 1234-5678-9 MICR # 021200025 |
| Routing transit & RTN acronym | Routing Transit Number: 2110-0000-3 RTN 021000021 |
UK_BANK_ACCOUNT_NUMBER
| Data Type | UK Bank Account Number |
| Keywords | Acct (No./Number/#) Account (#/No.) Checking Account Bank Account (Number/#/No.) Savings Account (Number/No./#) Debit Account (Number/#/No.) |
| Proximity | Must be within 50 bytes |
| Format | 7–8 digit numeric string, word-boundary delimited. |
AUSTRALIA_BANK_NUMBER
| Data Type | Australian Bank Account Number |
| Keywords | Bank Account account number banking information correspondent bank bank details information account |
| Proximity | Must be within 60 bytes |
| Format | Supported formats: 6 digits (base value) Optional grouping: NNN-NNN Optional 4-digit extension is also supported (NNN-NNN-NNNN) Examples: 123-456, 123-456-7890 |
AUSTRALIA_BANK_STATE_BRANCH
| Data Type | Australian Bank State Branch (BSB) Code |
| Keywords | Bank state branch bsb bsd: bsb- |
| Proximity | Must be within 60 |
| Format | 6-digit code in NNN-NNN format. Example: 062-000. |
SORT_CODE
| Data Type | UK Bank Sort Code |
| Keywords | sort code sort_code |
| Proximity | Must be within 50 bytes |
| Format | Exactly 6 digits, in one of the following formats (word boundaries enforced): Continuous: NNNNNN Spaced: NN NN NN Hyphenated: NN-NN-NN |
Examples: What It Will Match
| Variation & Description | Example(s) |
|---|---|
| Continuous, spaced & hyphenated formats | sort code 123456 sortcode 12 34 56 sort_code 12-34-56 Sort Code: 12-34-56 |
Government & National ID Patterns - India
India government and national ID patterns help detect commonly used Indian identifiers such as Aadhaar, PAN, and GSTIN. These patterns are designed for high-confidence detection by combining country-specific formats with mandatory keyword context and strict boundary rules to reduce false positives in general text.
INDIA_AADHAAR_INDIVIDUAL
- First digit must be 2–9 (Aadhaar numbers starting with 0 or 1 are invalid).
- Uses the Verhoeff algorithm to detect valid numbers
| Data Type | India National ID (Aadhaar) |
| Keywords | aadhaar aadhaar card |
| Proximity | Must be within 30 bytes |
| Format | Aadhaar number formats: Valid 12-digit number starting with 2–9 Separators: may be continuous, or grouped with spaces or hyphens Grouping: XXXX XXXX XXXX or XXXX-XXXX-XXXX Must have a non-alphanumeric boundary before and after the number (e.g., space, :, #, =, ., comma, ), newline) |
Examples: What It Will Match
| Variation & Description | Example(s) |
|---|---|
| Continuous 12 digits | aadhaar 745167489123. (aadhaar) 293412345678, |
| Grouped with spaces (NNNN NNNN NNNN) | aadhaar 2161 6729 3627, aadhaar card 2008 8057 0558: |
| Grouped with hyphens (NNNN-NNNN-NNNN) | aadhaar card 2934-1234-5678. aadhaar 8384-2795-9970, |
| Surrounded by non-alphanumeric boundaries | aadhaar: 875465433456 aadhaar#7451 2345 6789 aadhaar card = 9283 5567 4421 |
| Keywords before number within 30-byte limit | aadhaar (valid) 2008 8057 0558 |
INDIA_PAN_INDIVIDUAL
- Ensure a non-alphanumeric character after the PAN (e.g., . , : or space) to satisfy the boundary requirement.
- The 4th character is constrained to specific letters (A/B/C/F/G/H/L/J/P/T); other letters there will not match.
| Data Type | India Tax ID (PAN — Permanent Account Number) |
| Keywords | pan# pancard number pan number permanent account number |
| Proximity | Must be within 40 bytes |
| Format | 10 characters total (AAAAA9999A): Positions 1–3: letters A–Z. Position 4: one of A, B, C, F, G, H, L, J, P, T (entity type per PAN rules). Position 5: letter A–Z. Positions 6–9: digits 0–9. Position 10: letter A–Z. Must have non-alphanumeric boundary before and after. |
Examples: What It Will Match
| Variation & Description | Example(s) |
|---|---|
| Standard continuous PAN (5 letters + 4 digits + 1 letter) | pan number ABCPE1234F. |
| Different 4th-char class (from allowed set) | pancard number AABTF1234K, pan# CCHPL6789Q: |
| With punctuation/spacing boundaries | permanent account number: AAAPL1234C, |
| Short phrase before PAN (keyword within 40 bytes) | pan number (applicant) AAAAA9999Z. |
| Alternate keyword forms | pan# ABCPQ1234M, pancard number ZZZFG4321P. |
INDIA_GST_INDIVIDUAL
- Regex enforces state code (01–37), PAN-based core, entity code, Z, and check digit.
- Non-alphanumeric characters before and after GSTIN are required for a match.
| Data Type | India GSTIN (Goods and Services Tax Identification Number) |
| Keywords | gst# gst number |
| Proximity | Must be within 70 bytes |
| Format | GSTIN structure (15-character alphanumeric): State code: 2 digits (01–37) PAN-based core: 10 characters (5 letters + 4 digits + 1 letter) The embedded PAN follows standard PAN rules, the 4th letter must be one of A, B, C, F, G, H, J, L, P, T (entity type). Entity code: 1 character (alphanumeric 1–9 or A–Z) Fixed value: 1 character always ‘Z’ Check digit: 1 alphanumeric character Example: 22AAAAA0000A1Z4 Must have a non-alphanumeric boundary before and after |
Examples: What It Will Match
| Variation & Description | Example(s) |
|---|---|
| Continuous GSTIN (15 chars) | gst number 22AAAAA0000A1Z5 |
| With punctuation before/after | gst# 27ABCFE1234F2Z7, gst number: 29AAACB2894G1ZJ |
| Different state codes (01–37) | gst number 07ABCFE1234F1Z2 gst# 19ABCFE1234F1Z3 |
| Entity code variations (13th char varies) | gst number 22AAAAA0000A9Z1 gst# 27ABCFE1234FAZ6 |
| Both keyword variants | gst number 21AAAAA0000A1Z5 gst# 09ABCFE1234F1Z7 |
Government & National ID Patterns - United States
United States government and national ID patterns help detect widely used US identifiers such as Social Security Numbers (SSN), passports, and tax identifiers (EIN/ITIN), as well as regulated healthcare identifiers (e.g., MBI) and state-issued driver’s license numbers. These patterns rely on US-specific formats and mandatory keyword context (with defined proximity windows) to improve match precision and reduce false positives in general text.
US_SOCIAL_SECURITY_NUMBER
- Regex enforces validity of SSN ranges, rejects impossible values like 000-12-3456 or 666-99-9999.
- Hyphens/spaces are optional; detector accepts either.
| Data Type | US Social Security Number (SSN) |
| Keywords | social security ssn social security SSA Number social security number social security # social security# social security no soc sec ss no ssid ssn# SS# SSNS |
| Proximity | Must be within 50 bytes |
| Format | AAA-GG-SSSS structure with the following validity constraints (hyphens/spaces between groups are optional): Area Number (AAA): 001–665 (excludes 000 and 666) and 700–799 Group Number (GG): 01–99 Serial Number (SSSS): 0001–9999 (excludes 0000) |
Examples: What It Will Match
| Variation & Description | Example(s) |
|---|---|
| Standard with hyphens | SSN 123-45-6789 social security # 068-05-1120 |
| Continuous digits (no separators) | ssn# 123456789 social security 457889123 |
| With spaces instead of hyphens | social sec 123 45 6789 |
| Keyword variants | SSA Number 345-67-8901 ssid 223-45-9999 |
| Non-alphanumeric boundaries | social security number: 321-54-6789. |
US_PASSPORT
- A non-alphanumeric character must appear before and after the number to satisfy boundary requirements.
| Data Type | United States Passport Number |
| Keywords | passport passport# passport # passportid passports passportno passport no passport number passportnumbers passport numbers |
| Proximity | Must be within 200 bytes |
| Format | Either 1 letter + 8 digits (e.g., Z12345678) or 9 digits (e.g., 123456789). Must be a distinct token with non-alphanumeric boundaries before and after. |
Examples: What It Will Match
| Variation & Description | Example(s) |
|---|---|
| Letter + 8 digits (current format) | passport number Z12345678, passportid A98765432. |
| 9 digits only (older format) & keyword variants | passport 123456789. passport# B10293847; passports C12345678, |
US_EMPLOYER_IDENTIFICATION_NUMBER (EIN)
Standalone 9-digit numbers without a hyphen are not valid EINs under this pattern.
| Data Type | US Federal Tax Identifier (Employer Identification Number) |
| Keywords | ein employer identification number employer identification# ein# |
| Proximity | Must be within 60 bytes |
| Format | 9-digit EIN in NN-NNNNNNN format (always with a hyphen): Prefix NN must fall within valid IRS-assigned ranges: 01–16, 20–27, 30–59, 60–68, 71–77, 80–89, 90–92, 94–99. Standalone 9-digit numbers without a hyphen are NOT valid EINs under this pattern. |
Examples: What It Will Match
| Variation & Description | Example(s) |
|---|---|
| Standard NN-NNNNNNN format with keyword variants | ein 12-3456789 employer identification number 94-1234567 ein# 27-9876543 |
US_INDIVIDUAL_TAXPAYER_IDENTIFICATION_NUMBER (ITIN)
- ITINs always begin with 9, distinguishing them from SSNs.
| Data Type | US Individual Taxpayer Identification Number (ITIN) |
| Keywords | taxpayer tax id tax identification itin i.t.i.n ssn tin social security tax payer itins taxid individual taxid ITIN# individual tax id# individual tax id No |
| Proximity | Must be within 50 bytes |
| Format | 9-digit number beginning with 9. The 4th digit must be 7 or 8 (issued in ranges 70–88, 90–92, 94–99). Typically formatted as 9XX-7X-XXXX or 9XX-8X-XXXX. May appear with hyphens, spaces, or no separators. Must have non-alphanumeric boundary before and after. |
Examples: What It Will Match
| Variation & Description | Example(s) |
|---|---|
| Hyphenated format (XXX-XX-XXXX) | ITIN# 912-78-3456 individual tax id 987 83 4567 |
| Continuous digits & boundary variants | itin 912783456 taxpayer: 901-78-4567, tax id=987 84 1234 |
US_MOBILE_NUMBER
- Captures standard US 10-digit mobile numbers only.
- Often paired with PII like name, address, or SSN, making it sensitive under HIPAA, PCI DSS, and GDPR.
| Data Type | United States Mobile/Cell Phone Number |
| Keywords | cell number Mobile Number mobile phone number phone contact number |
| Proximity | Must be within 40 bytes |
| Format | Valid US mobile numbers: (XXX) XXX-XXXX, XXX-XXX-XXXX, XXX.XXX.XXXX, or XXX XXX XXXX Area codes and exchanges must begin with digits 2–9 Total length = 10 digits |
Examples: What It Will Match
| Variation & Description | Example(s) |
|---|---|
| Hyphen & dot separated | Mobile Number: 415-678-9021 Phone 650.234.7788 |
| Space separated & parentheses area code | Contact number 213 555 7890 cell number (408) 333-4455 |
US_DEA_NUMBER
- The detector enforces format only (2 letters + 7 digits). The internal DEA check-digit rule is NOT validated by this predefined pattern.
- Numbers appear without spaces/hyphens; word boundaries ensure the token is not embedded in other text.
| Data Type | US DEA Registration Number (Drug Enforcement Administration) |
| Keywords | DEA DEA Registration Number DEA Number |
| Proximity | Must be within 60 bytes |
| Format | 2 letters + 7 digits. 1st letter = registrant type (A, B, F, G, M, P, R). 2nd character = registrant's last-name initial (A–Z). Final digit is a check digit. Matched as a separate token (word boundaries). Example: AB1234563. |
Examples: What It Will Match
| Variation & Description | Example(s) |
|---|---|
| Standard 2-letter + 7-digit format | DEA AB1234563 DEA Number MZ2468028 |
| Keyword variants & punctuation | DEA: BF1112226 DEA Registration Number AZ9301456 DEA Number -- GM3336669 |
US_MEDICARE_BENEFICIARY_IDENTIFIER (MBI)
- Both with and without dashes are accepted. Consistent with official CMS rules for MBI format.
| Data Type | US Medicare Beneficiary Identifier (MBI) |
| Keywords | mbi mbi# medicare beneficiary # medicare beneficiary identifier medicare beneficiary no medicare beneficiary number medicare beneficiary# |
| Proximity | Must be within 60 bytes |
| Format | Always 11 characters long (letters + digits). Uses only approved letters, excludes S, L, O, I, B, Z. Supports hyphenated (e.g., 3AD4-TR7-MC91) and continuous (e.g., 1EG4TE5MK73) formats. Must be bounded by non-alphanumeric characters. |
Examples: What It Will Match
| Variation & Description | Example(s) |
|---|---|
| Continuous & hyphenated formats | mbi# 1EG4TE5MK73 Medicare Beneficiary Identifier: 3AD4-TR7-MC91 |
| With punctuation & keyword after number | (MBI) 9CN7-PT4-MD82, Number 4EG7TN2MC91 is Medicare Beneficiary Number |
US_DRIVERS_LICENSE_NUMBER (All States)
The US Drivers License patterns cover all 50 states plus Washington DC, each with the specific format used by that state's DMV. All state patterns share the same keywords and proximity setting:
| Keywords (all states) | DL Driving License driving license |
| Proximity | Must be within 200 bytes |
| Boundary Rule | Most patterns require a non-alphanumeric (or word boundary) before/after the DL token. |
US Drivers License - State-by-State Format Reference
Shared settings: Proximity = 200 bytes. Boundary = non-alphanumeric (or word boundary) before/after the DL token.
| State | Keywords | Number Format | Example(s) |
|---|---|---|---|
| Alabama | DL, driving, license, driving license, Alabama | 1–8 digits | license 12345678, |
| Alaska | DL, driving, license, driver license, Alaska | 1–7 digits | DL 7654321. |
| Arizona | DL, driving, license, driving license | Letter + 8 digits or 9 digits | driving license A12345678, license 123456789. |
| Arkansas | DL, driving, license, driving license | 9 digits starting with 9 | driving license 912345678. |
| California | DL, driving, license, driving license | 1 letter + 7 digits | license A1234567, |
| Colorado | DL, driving, license, driving license | 1 letter + 3–6 digits, or 2 letters + 2–5 digits, or 2-3-4 numeric with hyphens | DL C123456, license 12-345-6789. |
| Connecticut | DL, driving, license, driving license | 9 digits | license 123456789. |
| Delaware | DL, driving, license, driving license | 1–7 digits | DL 1234567, |
| District of Columbia | DL, driving, license, driving license | 7 or 9 digits | license 1234567, driving 123456789. |
| Florida | DL, driving, license, driving license | 1 letter + 12 digits, grouped as L-###-###-##-###-# | driving license F123-456-78-901-2 |
| Georgia | DL, driving, license, driver’s license | 7–9 digits | license 987654321. |
| Hawaii | DL, driving, license, driver license | 1 letter + 8 digits or ##-##-##### | driver license H12345678, |
| Idaho | DL, driving, license, driver’s license | 2 letters + 6 digits + 1 letter or 9 digits | DL AB123456C, license 123456789. |
| Illinois | DL, driving, license, driver’s license | Letter + 11 digits or L-###-####-#### | driving S123-4567-8901, license T12345678901. |
| Indiana | DL, driving, license, driver’s license | Letter + 9 digits or 9 digits or ####-##-#### | DL I123456789, license 1234-56-7890. |
| Iowa | DL, driving, license, driver license | ###LL#### or 9 digits | driving 123AB4567, license 123456789. |
| Kansas | DL, driving, license, driver license | A9A9A or A##-##-#### or 9 digits | license K1K1K, DL 123456789. |
| Kentucky | DL, driving, license, driver’s license | A##-###-### or 9 digits | license K12-345-678, DL 123456789. |
| Louisiana | DL, driving, license | 1–9 digits | driving 123456789. |
| Maine | DL, driving, license, driver’s license | 7–8 digits or 7 digits + 1 letter | license 12345678, DL 1234567A. |
| Maryland | DL, driving, license | 1 letter + four groups of 3 digits | driving M 123 456 789 012. |
| Massachusetts | DL, driving, license | 1 letter + 8–9 digits | DL M12345678, license M123456789. |
| Michigan | DL, driving, license | 1 letter + 12 digits or 1 letter + four groups of 3 digits | license A123456789012. |
| Minnesota | DL, driving, license | 1 letter + four groups of 3 digits | DL M 123 456 789 012. |
| Mississippi | DL, driving, license | ###-##-#### (spaces/hyphens optional) | driving 123-45-6789. |
| Missouri | DL, driving, license | Letter + 5–9 digits or ########AA or #########A | license M123456, DL A123456r. |
| Montana | DL, driving, license | Letter + 8 digits or 9/13/14 digits | license M12345678, DL 123456789. |
| Nebraska | DL, driving, license | 1 letter + 6–8 digits | DL N123456, license N12345678. |
| Nevada | DL, driving, license | X######## or 9/10/12 digits | driving X12345678, license 1234567890. |
| New Hampshire | DL, driving, license | MM(01–12) + LLL + YY + DD(01–31) + D (10 chars) | license 08ABC25071 |
| New Jersey | DL, driving, license | A + 14 digits or A####-#####-##### | driving A12345678901234. |
| New Mexico | DL, driving, license | 8 or 9 digits | DL 12345678, license 123456789. |
| New York | DL, driving, license | A####### or 9 digits or ###-###-### or 16 digits or 8 letters | license A1234567, DL 123-456-789. |
| North Carolina | DL, driving, license | 1–12 digits | DL 123456789012. |
| North Dakota | DL, driving, license | AAA##-#### or 9 digits | license ABC-12-3456, driving 123456789. |
| Ohio | DL, driving, license | Letter + 4–8 digits or 2 letters + 3–7 digits or 8 digits | DL A1234567, license AB12345. |
| Oklahoma | DL, driving, license | Letter + 9 digits or 9 digits | license A123456789, driving 123456789. |
| Oregon | DL, driving, license | 1–9 digits | DL 123456789. |
| Pennsylvania | DL, driving, license | ##-###-### (spaces/hyphens optional) | license 12-345-678. |
| Rhode Island | DL, driving, license | 7 digits or Letter + 6 digits | driving 1234567, license R123456. |
| South Carolina | DL, driving, license | 5–11 digits | license 12345, DL 12345678901. |
| South Dakota | DL, driving, license | 6–10 digits or 12 digits | driving 123456, license 123456789012. |
| Tennessee | DL, driving, license | 7–9 digits | DL 1234567, license 123456789. |
| Texas | DL, driving, license | 7–8 digits | driving 1234567, license 12345678. |
| Utah | DL, driving, license | 4–10 digits | license 1234, DL 1234567890. |
| Vermont | DL, driving, license | 8 digits or 7 digits + letter (A/a) | driving 12345678, license 1234567A. |
| Virginia | DL, driving, license | Letter + 2 digits + hyphen + 2 digits + hyphen + 4 digits | DL V12-34-5678. |
| Washington | DL, driving, license | 3 letters + 2 chars + 2 letters + 3 digits + 2 alphanumeric (12 chars) | license ABCXYEF123AB |
| West Virginia | DL, driving, license | 7 digits or 1–2 letters + 5–6 digits | DL 1234567, license WV123456. |
| Wisconsin | DL, driving, license | 1 letter + 3 digits + two groups of 4 digits + 2 digits | driving W123-4567-8901-23. |
| Wyoming | DL, driving, license | 9–10 digits or 6 digits + hyphen/space + 3 digits | DL 123456789, license 123456-789. |
Government & National ID Patterns - United Kingdom
United Kingdom government and national ID patterns help detect key UK identifiers such as National Insurance Numbers (NINO), NHS numbers, UK driving licence numbers, UK passports, and taxpayer references. These patterns use UK-specific formats along with mandatory keyword context and proximity windows to improve match precision and reduce false positives in general text.
UK_NATIONAL_INSURANCE_NUMBER (NINO)
- Valid pattern: AA 99 99 99 A, with optional spaces/hyphens. Final letter is A/B/C/D/F/M/P.
- Non-alphanumeric boundary required before and after the NINO to avoid matching inside longer tokens.
| Data Type | UK National Insurance Number (NINO) |
| Keywords | national insurance number national insurance contributions protection act insurance insurance # insurance no insurance number insurance application medical insurance application medical application social insurance medical attention social security NI Number NI No NI # NI# insurance# insurancenumber nationalinsurance# national insurance # nationalinsurancenumber |
| Proximity | Must be within 200 bytes |
| Format | Two letters (A–Z, with exclusions: first letter cannot be D/F/I/Q/U/V; second cannot be O) + six digits grouped as 2-2-2 + final letter (A/B/C/D/F/M/P). Separators: optional spaces or hyphens between digit pairs; optional space before final letter. Must have non-alphanumeric boundary before and after. Example: AB 12 34 56 A. |
Examples: What It Will Match
| Variation & Description | Example(s) |
|---|---|
| Standard spacing (AA 12 34 56 A) | NI Number AB 12 34 56 A. |
| Continuous digits with final letter | national insurance number JN123456C, |
| Hyphen separators (AA-12-34-56-A) | insurance no JP-12-34-56 A: |
| Keyword variants & mixed case | ni no GH 01 23 45 B. Insurance # EQ 66 55 44 P, social security CD 98 76 54 M; |
| With filler text within 200 bytes | national insurance contributions (employee) TK 22 33 44 F. |
UK_NATIONAL_HEALTH_SERVICE_NUMBER
- Regex enforces structure but does not check whether the NHS number passes the official check digit validation.
| Data Type | UK NHS Patient Identifier |
| Keywords | national health service nhs health services authority health authority patient id patient identification patient no patient number GP |
| Proximity | Must be within 200 bytes |
| Format | 10 digits total, usually grouped as 3-3-4 (e.g., 943 476 5919). Groups may be separated by spaces. Continuous 10-digit form is also valid. Must have non-digit boundary before and after the number. |
Examples: What It Will Match
| Variation & Description | Example(s) |
|---|---|
| Standard 3-3-4 spacing | nhs 943 476 5919 patient number: 123 456 7890 |
| Continuous 10-digit format | health authority 9434765919, patient id 9876543210. |
| Keyword variants | national health service 123 456 7890 GP: 321 654 0987; |
UK_DRIVERS_LICENSE_NUMBER
| Data Type | United Kingdom Driving Licence Number (DVLA format) |
| Keywords | DL driving license driving license |
| Proximity | Must be within 200 bytes |
| Format | Surname code: 5 characters (letters, with 9 used as padding) Birth date/sex block: 6 digits (year decade, birth month 01-12 for male / 51-62 for female, birth day, year unit) Initials: 2 characters (9 may be used as padding) Check digit: 1 digit Licence type suffix: 2 letters Must have non-alphanumeric boundary before and after |
Examples: What It Will Match
| Variation & Description | Example(s) |
|---|---|
| Male, standard month (01–12) | driving license SMITH807153AB1XY, |
| Female, month +50 (51–62) | license JONES962034CD2ZZ. |
| Surname / initials padded with '9' | driving LEE99011237AB4GH: DL MILLR159286A93QW, |
| With punctuation & alternate keywords | driving license: BROWN012318DJ7KT; driving TAYLO605012ZZ5AA. |
UK_PASSPORT
| Data Type | United Kingdom Passport Number |
| Keywords | passport# passport # passportid passports passportno passport no passport number passportnumbers passport numbers |
| Proximity | Must be within 200 bytes |
| Format | Either 1 letter followed by 8 digits (e.g., A12345678) Or 9 digits (e.g., 123456789) Must be bounded by non-alphanumeric characters before and after |
Examples: What It Will Match
| Variation & Description | Example(s) |
|---|---|
| Letter + 8 digits (current UK format) | passport number B12345678 passport# Z98765432, |
| 9 digits (older format) & keyword variants | passport no 123456789. passportnumbers C23456789 Passport Number X12345678. |
UK_TAXPAYER_REFERENCE
| Data Type | UK Taxpayer Reference (Self Assessment / TIN) |
| Keywords | tax number tax file tax id tax identification no tax identification number tax no# tax no tax registration number taxid# taxidno# taxidnumber# taxno# taxnumber# tax number # taxnumber tin id tin no tin# |
| Proximity | Must be within 200 bytes |
| Format | 10-digit numeric identifier No letters or special symbols inside the number Must be bounded by non-alphanumeric characters before and after |
Examples: What It Will Match
| Variation & Description | Example(s) |
|---|---|
| 10-digit format with keyword variants | tax id 1234567890 tin no 9876543210 taxid# 8765432109 |
| With punctuation & filler text | tax number: 1029384756, national tax file reference: 1122334455 was submitted. |
Government & National ID Patterns - Australia
Australia government and national ID patterns help detect commonly used Australian identifiers such as Tax File Numbers (TFN), driver licence numbers, Medicare numbers, and Australian passport numbers. These patterns apply Australia-specific formats together with mandatory keyword context and proximity windows to improve match precision and reduce false positives in general text.
AUSTRALIA_TAX_FILE_NUMBER
- Although the keyword list includes 'australian business number', this pattern expects TFN-length numbers (8–9 digits). Use a dedicated ABN pattern for 11-digit ABNs.
- Keep the keyword before the number and within 100 bytes to satisfy detection.
| Data Type | Australia Tax File Number (TFN) |
| Keywords | australian business number tax file number tfn number tin number |
| Proximity | Must be within 100 bytes |
| Format | 8 or 9 digits total. May be written as continuous digits, or grouped as 3-3-3 or 3-3-2 with spaces or hyphens. Must have non-alphanumeric boundary before and after. Example: 123 456 782 or 864-203-791. |
Examples: What It Will Match
| Variation & Description | Example(s) |
|---|---|
| Continuous 9 digits | tax file number 123456782. |
| Grouped with spaces (3-3-3) & hyphens (3-3-3) | tfn number 579 246 813, tin number 864-203-791: |
| Grouped 3-3-2 (8-digit style) & keyword variants | tax file number 246 813 57; tax file number: 345-678-129. tfn number -- 482 913 705, |
AUSTRALIA_DRIVERS_LICENSE_NUMBER
| Data Type | Australia Driver Licence Number |
| Keywords | Australia DriverLicence Australia Driver Licence Australia Driver'Licence Australia Driver' Licence Australia Driver'sLicence Australia Driver's Licences Australia Driver Lic Australia DriversLic Australia driving permit Australia DriverLic# Australia Driverlicence# Australia Driver Lic# |
| Proximity | Must be within 100 bytes |
| Format | Three formats accepted: NNN[-]NNN[-]NNN (9 digits, optional spaces or hyphens, 3-3-3 groups); AA?NNNN.. (1–2 letters followed by 4–9 digits); or NNNNNNN.. (7–10 continuous digits). Number must be a distinct token (not embedded in a longer word). |
Examples: What It Will Match
| Variation & Description | Example(s) |
|---|---|
| 9 digits grouped with spaces & hyphens (3-3-3) | Australia Driver Licence 123 456 789 Australia Driver's Licence 123-456-789 |
| Continuous 7–10 digits & letter prefix formats | Australia driving permit 9876543210 Australia Driver Licence Q1234567 Australia DriversLic NS12345678 |
| Keyword variants & hash-sign forms | Australia Driver' Licence 456-123-789 Australia DriverLic# 321 654 987 Australia Driver Lic# 12345678 |
AUSTRALIA_MEDICARE_NUMBER
| Data Type | Australia Medicare Number |
| Keywords | medical account medicare account |
| Proximity | Must be within 100 bytes |
| Format | Starts with digits 2–6. Can appear as 9 or 10 digits (optional check/issue digit at end). May be grouped with spaces or hyphens as 4+5+optional 1 digit. Must be a separate token (not embedded in text). |
Examples: What It Will Match
| Variation & Description | Example(s) |
|---|---|
| Continuous 9 & 10 digits | medicare account 246812345 medical account 3456789012 |
| Grouped with spaces & hyphens (4+5+optional 1) | medicare account 4567 89012 3 medicare account 5678-90123-4 medical account 2345-67890 |
AUSTRALIA_PASSPORT
| Data Type | Australia Passport Number |
| Keywords | passport# passport # passportid passports passportno passport no passportnumber passport number passport details |
| Proximity | Must be within 150 bytes |
| Format | Either 1 letter from N, E, D, F, A, C, U, X followed by 7 digits Or 2 letters starting with P (second letter from A, B, C, D, E, F, U, W, X, Z) followed by 7 digits Always a distinct token (not embedded in another word) |
Examples: What It Will Match
| Variation & Description | Example(s) |
|---|---|
| Single-letter prefix + 7 digits | passport number N1234567 passportid E7654321 |
| Two-letter prefix (starts with P) & keyword variants | passport details PA1234567 passportno PW9876543 passport# D2345678 |
Healthcare & Medical Patterns
Healthcare and medical patterns help detect regulated health-related identifiers and clinical codes commonly found in claims, billing, and patient records. These patterns cover classification systems (for example, ICD codes) and other healthcare identifiers, using strict formats and mandatory keyword context to improve precision and reduce false positives in general text.
ICD10_CODE
- Pattern accepts major codes (3 characters) as well as detailed subcodes (up to 7 characters with decimals).
- Valid leading letters are A–T, V–Z (U is reserved and excluded).
- Regex enforces format compliance but does not validate if a code is clinically meaningful.
| Data Type | ICD-10 Medical Classification Code |
| Keywords | icd10 icd10_code icd10 number icd10 code disease code disease id |
| Proximity | Must be within 70 bytes |
| Format | A letter (A-Z, excluding U) followed by two digits (e.g., A10, C34) Optional letter A or B in the third position (e.g., A01B) Optional decimal point followed by up to 4 additional alphanumeric characters (e.g., E11.9, F32.1A) Must be a distinct token (word boundaries required) |
Examples: What It Will Match
| Variation & Description | Example(s) |
|---|---|
| Basic 3-char code & with A/B suffix | icd10 A10 | disease code C34 | icd10_code A01B |
| With decimal subcodes & keyword variants | icd10 number E11.9 | disease id F32.1A | icd10 G30.A | disease code B20 |
ICD9_CODE
- Accepts ICD-9 numeric categories (3 digits), subcategories (with decimals), and special V/E codes.
- Regex enforces format but does not validate whether the code is still in use (ICD-9 is largely retired but included for legacy data coverage).
| Data Type | ICD-9 Medical Classification Code |
| Keywords | icd9_code icd9 disease code disease id icd9 number icd9 code |
| Proximity | Must be within 70 bytes |
| Format | V-codes: V + 2 digits, optional decimal + up to 2 digits (e.g., V01, V01.2, V12.34). Numeric codes: 3 digits, optional decimal + up to 2 digits (e.g., 250, 250.0, 250.12). E-codes (external cause): E + 3 digits, optional decimal + 1 digit (e.g., E800, E800.1). Matched as standalone tokens. |
Examples: What It Will Match
| Variation & Description | Example(s) |
|---|---|
| Numeric code (plain & with decimal) | icd9 250 | disease code 401 | icd9 code 250.0 | icd9 number 401.9 |
| V-code & E-code formats | disease id V01 | icd9_code V12.34 | icd9 E800 | icd9 code E800.1 |
Personal Information (PII) Patterns
Personal Information (PII) patterns help detect commonly shared identifiers about an individual—such as names, dates of birth, email addresses, phone numbers, and addresses—as well as certain sensitive attributes. Because many PII elements can appear in everyday business text, these patterns rely on keyword context and proximity windows to improve precision and reduce false positives.
FULL_NAME
- Keywords are mandatory and must precede the value (≤70 bytes).
- Enforces capitalised segments and allows only letters plus - or ' inside names.
- Non-Latin scripts or all-lowercase names will not match this predefined pattern.
| Data Type | Person's Full Name (Given / Middle / Surname with optional salutation & suffix) |
| Keywords | full name name fullname |
| Proximity | Must be within 70 bytes |
| Format | Optional salutation: Mr, Ms, Miss, Mrs, Dr, Sir (with or without period) First name: capitalized, up to 15 letters, may include a hyphenated segment Optional middle name or initial Last name: capitalized, up to 15 letters, may include hyphen (-) or apostrophe (') Optional suffix: Jr., Sr., II, III, IV Must be bounded by non-alphanumeric characters before and after |
Examples: What It Will Match
| Variation & Description | Example(s) |
|---|---|
| First + Last (plain & with salutation) | full name John Smith, | name Asha Menon. | fullname Mr. Rahul Verma: | full name Dr Anita Joseph, |
| Middle initial / middle name | name Daniel K. Gupta. | full name Priya Meera Iyer, |
| Hyphenated last name & apostrophe in last name | full name Maria Lopez-Garcia. | fullname Seán O'Connor. | name D'Arcy Patel, |
| With suffix & compact initials | name Robert Singh, Jr. | full name Neeraj Kumar II. | fullname R Sharma. |
LAST_NAME
- Regex enforces capitalization at the start of each segment, reducing false positives.
- Will not match all possible global last name formats (e.g., non-Latin alphabets).
| Data Type | Last Name / Surname |
| Keywords | lastname last name |
| Proximity | Must be within 200 bytes |
| Format | Begins with a capital letter, followed by up to 10 alphabetic characters May contain an optional hyphen (-) or apostrophe (') for compound names Optional second capitalised segment of up to 15 characters Examples: Smith, O'Connor, Johnson-Smith |
Examples: What It Will Match
| Variation & Description | Example(s) |
|---|---|
| Simple surname & shorter forms | last name Smith lastname Li last name Kim |
| Hyphenated, apostrophe & two-part surnames | last name Johnson-Smith lastname O'Connor last name McDonald |
DATE
- Keyword is mandatory and must precede the date (≤30 bytes).
- Leading zeros in day/month are optional (9/3/2024 and 09/03/2024 both match).
- This pattern focuses on format; impossible dates (e.g., 31/02/2024) may still match the format unless you add extra validation upstream.
| Data Type | Calendar Date |
| Keywords | Date |
| Proximity | Must be within 30 bytes |
| Format | ISO style: YYYY-MM-DD / YYYY/MM/DD US style: MM/DD/YYYY / MM-DD-YYYY EU style: DD/MM/YYYY / DD-MM-YYYY Month name first: Mon DD, YYYY / Month DD YYYY Day first with month name: DD Mon YYYY Year first with month name: YYYY Mon DD 2-digit year support: MM/DD/YY / DD/MM/YY Separators: hyphen (-), slash (/), spaces, and optional commas Matched as a word-boundary token |
Examples: What It Will Match
| Variation & Description | Example(s) |
|---|---|
| ISO (YYYY-MM-DD / YYYY/MM/DD) | Date 2024-09-30 Date: 2024/09/30 |
| US (MM/DD/YYYY) & EU (DD/MM/YYYY) | Date 09/30/2024 Date 30/09/2024 |
| Month name first & day first with month name | Date Sep 30, 2024 Date September 30 2024 Date 30 Sep 2024 |
| Year first & 2-digit year | Date 2024 Sep 30 Date 09/30/24 Date 30/09/24 |
DATE_OF_BIRTH
The keyword is mandatory and must appear before the date (≤30 bytes). Leading zeros in day/month are optional.
| Data Type | Date of Birth |
| Keywords | DOB date of birth birthdate Birth Date |
| Proximity | Must be within 30 bytes |
| Format | Same date format variations as the DATE pattern ISO style: YYYY-MM-DD / YYYY/MM/DD US style: MM/DD/YYYY / MM-DD-YYYY EU style: DD/MM/YYYY / DD-MM-YYYY Month name first: Mon DD, YYYY / Month DD YYYY Day first with month name: DD Mon YYYY Year first with month name: YYYY Mon DD 2-digit year support: MM/DD/YY / DD/MM/YY Separators: hyphen (-), slash (/), spaces, and optional commas |
Examples: What It Will Match
| Variation & Description | Example(s) |
|---|---|
| ISO (YYYY-MM-DD / YYYY/MM/DD) | DOB 1985-07-15 Birth Date 2000/12/01 |
| US (MM/DD/YYYY) & EU (DD/MM/YYYY) | date of birth 07/15/1985 DOB 15/07/1985 |
| Month name first & day first with month name | Birth Date Jul 15, 1985 birthdate 01 December 2000 |
| Year first & 2-digit year | DOB 1985 Jul 15 Birth Date 15/07/85 |
EMAIL_ADDRESS
Validates common email formats with TLDs of 2–3 letters. Must start with a letter and include @ and valid domain.
| Data Type | Email Address |
| Keywords | email mail-id mail address |
| Proximity | Must be within 100 bytes |
| Format | Username: starts with a letter, up to 26 characters (letters, digits, period, underscore, hyphen) Mandatory @ symbol Domain: 1-25 characters (letters, digits, period, underscore, hyphen) TLD: 2-3 letters Must be bounded by non-alphanumeric characters before and after |
Examples: What It Will Match
| Variation & Description | Example(s) |
|---|---|
| Standard format & special chars in username | email john.doe@example.com mail-id a_b-test@domain.in e-mail user123@corp.org |
| Subdomain & different TLDs | mail address jane@marketing.example.co email raj@company.in mail-id: a.brown@university.edu |
GENDER
Accepted values are limited to Male, Female, M, F, others. Regex enforces non-alphanumeric boundaries so partial matches inside words are excluded.
| Data Type | Gender / Sex Identifier |
| Keywords | gender sex |
| Proximity | Must be within 50 bytes |
| Format | Values: Male, Female, M, F, and other gender identifiers Must have non-alphanumeric characters before and after (space, colon, comma, etc.) Ensures the value is not part of a larger word |
Examples: What It Will Match
| Variation & Description | Example(s) |
|---|---|
| Full word values & single-letter abbreviation | gender: Male sex = Female gender M sex: F |
| Other category & punctuation boundaries | gender: others sex - Male gender, Female |
IP_ADDRESS
- Only IPv4 addresses are matched (IPv6 is not included in this pattern).
- Regex ensures octets are within valid range (0–255), avoiding false matches like 999.999.999.999.
| Data Type | IPv4 Address |
| Keywords | ip ip addr ip address internet protocol ipv4 ip-address IP number |
| Proximity | Must be within 70 bytes |
| Format | Four octets separated by dots. Each octet: 0–255. First octet must be 1–255 (not 0). Must have non-numeric character before and after. Example: 192.168.1.1, 10.0.0.255. |
Examples: What It Will Match
| Variation & Description | Example(s) |
|---|---|
| Private, loopback & broadcast ranges | ip address 192.168.0.1 ip 127.0.0.1 ipv4 255.255.255.255 |
| Keyword variants & punctuation | internet protocol: 172.16.254.1, ip-address 8.8.8.8 IP number 10.0.0.255 |
STREET_ADDRESS
- The detector expects a US ZIP or ZIP+4 at the end; non-US postcodes will not match this predefined pattern.
- Hyphens/apostrophes within street names and house-number qualifiers are supported.
| Data Type | Street / Postal Address (US format) |
| Keywords | Address Street street name residential address resident at apartment location home location home address house number complex number postal address |
| Proximity | Must be within 100 bytes |
| Format | House number: 1–5 digits, with optional sub-number (e.g., 12-3A) Optional door/side/unit letter qualifier (e.g., B in 221B) Street name: letters, hyphens, and apostrophes supported (e.g., O'Connell) Street type indicator: Boulevard, Drive, Lane, Ave, Dr, Rd, Blvd, Plaza, Road, Strasse, Street, Walk, Way City name followed by US ZIP code: 5-digit (e.g., 01103) or ZIP+4 (e.g., 02110-1234) Ordinal street names supported when spelled out (e.g., Fifth Avenue), numeric form (e.g., 5th) is not matched PO Box format (e.g., P.O. Box 123) is not reliably matched, requires street-style structure after the box number |
Examples: What It Will Match
| Variation & Description | Example(s) |
|---|---|
| Ordinal street name (spelled out) | residential address 100 Fifth Avenue, New York, 10011 |
| Standard number + street + city + ZIP | home address 55 Market Street, Springfield, 01103 |
| House sub-number & street with apostrophe/hyphen | house number 12-3A Elm Road, Denver, 80203 home location 221B O'Connell Avenue, Austin, 78701 |
| Street type abbreviations | Address 100 Maple Dr, Raleigh, 27601 |
| ZIP+4 format | home address 123 Main Road, Boston, 02110-1234 |
EYE_COLOR
| Data Type | Eye Colour / Eye Color |
| Keywords | eye colour eye color eye col |
| Proximity | Must be within 50 bytes |
| Format | Valid values: Amber Black Blue Brown Gray Green Hazel. Case-insensitive. Must be surrounded by non-word characters to avoid false matches inside larger strings. |
Examples: What It Will Match
| Variation & Description | Example(s) |
|---|---|
| Standard names & abbreviated keyword | Eye Colour: Blue eye color Brown eye col Green |
| Different cases & punctuation | EYE COLOUR Amber (eye color: gray) eye colour -- black |
HAIR_COLOR
| Data Type | Hair Colour / Hair Color |
| Keywords | hair colour hair col |
| Proximity | Must be within 50 bytes |
| Format | Case-insensitive. Must be surrounded by non-word characters to avoid false positives inside unrelated words. Valid values: Black Blonde Brown Red Gray Bald Other |
Examples: What It Will Match
| Variation & Description | Example(s) |
|---|---|
| Standard names & abbreviated keyword | Hair Colour: Brown hair color Black hair col Blonde |
| Different cases & punctuation | HAIR COLOUR Red (hair colour: bald) hair col -- other |
HEIGHT
Boundary conditions ensure valid heights only — not mismatched as random decimal numbers.
| Data Type | Physical Attribute Height |
| Keywords | height high body length bodylength |
| Proximity | Must be within 30 bytes |
| Format | Metric (meters): X.Y where X = 1–8, Y = 00–11. Examples: 1.75, 2.05, 1.9. Imperial (feet & inches): X' Y" or X'Y" formats. Examples: 5'11", 6' 0", 5'09". Surrounded by non-alphanumeric characters to avoid partial matches. |
POLITICS
Sensitive PII category. Recommended at Medium or High severity to avoid excessive alerts in normal business communications.
| Data Type | Political Affiliation / Views |
| Keywords | political views political affiliation political politics |
| Proximity | Must be within 50 bytes |
| Format | Matches known political ideologies and their variants: republican, democrat, liberal, liberalism, conservative, conservatism, populism, populist, libertarianism, libertarian, socialist, socialism, communist, communism, fascist, fascism, statist, statism. |
Source Code Detection Patterns
Versa DLP includes multiple specialized patterns to detect source code files and snippets across a wide range of programming languages. These are critical for preventing intellectual property leakage and protecting proprietary algorithms.
C_CODES
| Data Type | C / C++ Source Code |
| Keywords | #include #define #ifndef def unsigned |
| Proximity | Must be within 100 bytes |
| Pattern | Detects C data types and constructs: const, double, union, struct, short, bool, int, uint, void, for, while, switch, do, return statements. |
JAVA_CODES
| Data Type | Java Source Code Snippet |
| Keywords | import package |
| Proximity | Must be within 100 bytes |
| Pattern | Detects public class declarations, public or private method definitions, main or static method references, and Java data types (boolean, string, int, float, char). |
Examples: What It Will Match
| Variation & Description | Example(s) |
|---|---|
| Import + public class + main + int | import java.util.*; public class Demo { public static void main(String[] args) { int x = 10; } } |
| Package declaration + private static + String | package myapp.core; public class HelloWorld { private static String msg = "Hi"; public static void main(String[] args) { System.out.println(msg); } } |
| Different data types (boolean, char, float) | public class Flags { public static boolean isActive() { return true; } } public class Letters { private char grade = 'A'; private float rate = 3.14f; } |
PHP_CODES
| Data Type | PHP Source Code Snippet |
| Keywords | <?php (mandatory opening tag) |
| Proximity | Must be within 100 bytes |
| Pattern | Detects presence of common PHP constructs: namespace, use statements, print, echo, loops (for, while, do, foreach), control structures (switch), function/class declarations (function, class, extends), access modifiers (protected). |
Examples: What It Will Match
| Variation & Description | Example(s) |
|---|---|
| Basic script & namespace/imports | <?php echo "Hello, World!"; ?> <?php namespace MyApp; use Library\Module; ?> |
| Function, class & loops | <?php function add($a, $b) { return $a + $b; } ?> <?php class Dog extends Animal { protected $breed; } ?> <?php for ($i=0; $i<10; $i++) { echo $i; } ?> |
JAVASCRIPT_CODES
- The text/javascript anchor is required — without it, the predefined pattern will not fire.
- Ensures closing tags and recognizable JS constructs are present to reduce false positives.
| Data Type | JavaScript / HTML Source Snippet |
| Keywords | text/javascript (MIME type hint — mandatory) |
| Proximity | Must be within 100 bytes |
| Pattern | Detects HTML document fragments containing <html><head>...</head><body>...</body></html> scaffolding with at least one of <title>, <script>, \ and a JavaScript construct such as document, function(...), for(...), while(...), switch(...), do...while, or foreach. |
Examples: What It Will Match
| Variation & Description | Example(s) |
|---|---|
| Inline script with function | text/javascript ... <html><head><script> function greet(){ document.write('hi'); } </script></head><body></body></html> |
| Loop & switch constructs | text/javascript ... <html><head><script> for (let i=0;i<3;i++){ } </script></head><body></body></html> text/javascript ... <html><head><script> switch(x){case 1:break;} while(x<5){x++;} </script></head>...</html> |
ASM_CODES
- Designed to catch source code leakage in tickets, emails, and chats — not compiled binaries.
- Supports labels (name:), identifiers with underscores, hex immediates (0x...), and optional trailing comments starting with ;
| Data Type | Assembly-language Source Snippet (generic x86/x64-style tokens) |
| Keywords | section '.text section '.bss section.data _start: start: add mov inc msg len (case-sensitive as written) |
| Proximity | Must be within 50 bytes |
| Pattern | A single assembly line or label/instruction line containing a valid label/identifier, optional operands (registers, symbols, hex immediates like 0xNN), optional comma-separated arguments, and optional end-of-line comment starting with ;. |
Examples: What It Will Match
| Variation & Description | Example(s) |
|---|---|
| Section declarations & entry labels | section '.text section '.bss section.data _start: start: |
| Instructions with registers, operands & comments | mov eax, 0x1 mov rdx, msg ; load addr add ebx, [len] inc rcx |
AWK_CODES
| Data Type | AWK Source Snippets (inline / command-line) |
| Keywords | awk gawk AWK (detected even when preceded by non-alphanumeric boundaries) |
| Proximity | Must be within 40 bytes |
| Pattern | Detects AWK command-line options (-f, -F, -v, -b, -c, etc.) and script constructs: BEGIN { ... }, END { ... }, field and record variables (NF, NR, FS, RS, OFS, ORS), and print/printf commands. |
Examples: What It Will Match
| Variation & Description | Example(s) |
|---|---|
| Simple inline & with option flags | awk '{print $1, $2}' file.txt gawk -F: '{print NF, $1}' /etc/passwd |
| BEGIN/END blocks & field/record variables | awk 'BEGIN { FS=","; OFS="|" } {print $1, $2}' awk '{total+=$1} END {print total}' sales.txt awk '{printf "%s has %d fields\n", $0, NF}' input.txt |
AWK_CODES_FILE
This detector is for script files, not one-liners. The shebang is mandatory and acts as the anchor.
| Data Type | AWK Script File (shebang-style) |
| Keywords | #!/bin/awk -f or #!/usr/bin/awk -f (shebang — mandatory) |
| Proximity | Must be within 40 bytes |
| Pattern | A real AWK script file starting with the AWK shebang, containing a BEGIN { } or END { } program block AND at least one AWK construct: find, searchterm, replaceterm, if(...), print, function <name>(...), or break. |
Examples: What It Will Match
| Variation & Description | Example(s) |
|---|---|
| Shebang + BEGIN block + print / if() | #!/usr/bin/awk -f BEGIN { FS=","; print "start" } #!/bin/awk -f BEGIN { if (NF > 2) print $1 } |
| Shebang + function & END block + search keywords |
#!/usr/bin/awk -f function clean(x) { gsub(/+/, "", x); print x } #!/bin/awk -f BEGIN { find="foo"; replaceterm="bar"; print find } |
NAWK_CODES
| Data Type | NAWK Script File (shebang-based) |
| Keywords | #!/bin/nawk -f or #!/usr/bin/nawk -f (shebang — mandatory) |
| Proximity | Must be within 40 bytes |
| Pattern | A valid NAWK script file with the NAWK shebang containing BEGIN { } or END { } block plus at least one AWK construct: find, searchterm, replaceterm, if(...), print, function <name>(...), or break. |
Examples: What It Will Match
| Variation & Description | Example(s) |
|---|---|
| Shebang + BEGIN block + print / conditional | #!/usr/bin/nawk -f BEGIN { print "NAWK start" } #!/bin/nawk -f BEGIN { if (NF > 3) print $2 } |
| Shebang + function & keyword references | #!/usr/bin/nawk -f function clean(val) { gsub(/[ ]+/, "", val); print val } #!/usr/bin/nawk -f BEGIN { searchterm="foo"; replaceterm="bar"; print searchterm } |
GIT_DIFF_CODE
Keyword 'diff --git' is non-optional — ensures context-specific accuracy and prevents false positives from unrelated 'diff' text.
| Data Type | Git Diff / Patch Snippet |
| Keywords | diff --git (mandatory — indicates the start of a Git patch or diff block) |
| Proximity | Must be within 40 bytes |
| Pattern | Matches Git diff headers including diff --git a/file b/file, index <hash>..<hash>, context markers with --- (original file) and +++ (new file). Ensures presence of commit hashes, index references, and file path markers. |
Examples: What It Will Match
| Variation & Description | Example(s) |
|---|---|
| Standard file change & multiple file diff | diff --git a/file1.txt b/file1.txt index 6dcb09b..d9e2647 100644 --- a/file1.txt +++ b/file1.txt diff --git a/app/main.py b/app/main.py index 91f2ab4..cdbe873 100644 |
| Binary file diff & renamed file | diff --git a/image.png b/image.png index e69de29..c22f8b7 diff --git a/old_name.c b/new_name.c index 123abcd..456efgh 100644 |
DIFF_CODE
- Complements GIT_DIFF_CODE by catching non-Git diffs (legacy or standalone diff outputs).
- Often used in UNIX/Linux patches where version control is not Git. Helps prevent leaks of patch data where sensitive lines (e.g., config, credentials) may be exposed.
| Data Type | Classic UNIX diff / patch snippet (non-Git unified diff format) |
| Keywords | Line-based diff markers using patterns like 1 23c45 / 2a7 / 10d12 (traditional UNIX diff command outputs — not Git diffs) |
| Proximity | Must be within 40 bytes |
| Pattern | Identifies change indicators (a=addition, c=change, d=deletion) between line ranges. Validates context structure including < and > markers used in patch blocks. Looks for --- separator lines and lines starting with < (removals) or > (additions). Non-alphanumeric boundaries prevent false positives from random numbers. |
Examples: What It Will Match
| Variation & Description | Example(s) |
|---|---|
| Change (c) & addition (a) | 1,4c1,4 < text line old --- > text line new 7a8,10 > new line 1 > new line 2 |
| Deletion (d) & mixed edit | 10,12d7 < old line 1 < old line 2 3c3 < old content --- > new content |
LUA_CODES_1 - Module Loading
| Data Type | Lua Source Code Snippet (module-style) |
| Keywords | #!/usr/bin/lua __lua__ .Class:new():register |
| Proximity | Must be within 100 bytes |
| Pattern | A Lua snippet containing two local assignments, with a require in the first: local <name> = require ... followed (nearby) by local <name> = ... |
Examples: What It Will Match
| Variation & Description | Example(s) |
|---|---|
| Shebang + require + second local | #!/usr/bin/lua local json = require "cjson" local data = { id = 1 } |
| __lua__ & .Class anchors with require + local | -- __lua__ local http = require("socket.http") local body = "" -- .Class:new():register local app = require "myapp.core" local router = app:router() |
LUA_CODES_2 - Functions & Control Flow
| Data Type | Lua Source Code Snippet (functions / control flow) |
| Keywords | #!/usr/bin/lua __lua__ .Class:new():register |
| Proximity | Must be within 100 bytes |
| Pattern | Function definitions (_init(), _something()), general function name(...) ... end blocks, control flow (if ... then ... end), and for <var> ... do ... end loops. |
Examples: What It Will Match
| Variation & Description | Example(s) |
|---|---|
| Shebang + _init() & anchor + private function | #!/usr/bin/lua function _init() print("boot") end -- __lua__ function _refresh() -- refresh cache end |
| Generic function, if/then & for loop | function process(data) return data.id end if enabled then print("on") end for i=1,3 do print(i) end |
LUA_CODES_3 - Initialization & Return Pattern
- LUA_CODES_1 covers module loading (require, locals).
- LUA_CODES_2 covers function/control flow (_init, if, for).
- LUA_CODES_3 covers initialization methods / return true...end blocks.
- All three together provide comprehensive Lua detection coverage.
| Data Type | Lua Source Code Snippet (specialized initialization / return pattern) |
| Keywords | #!/usr/bin/lua __lua__ .Class:new():register |
| Proximity | Must be within 100 bytes |
| Pattern | Function definitions containing :initialize( (OOP-style method initialization), or code blocks ending with return true ... end. |
Examples: What It Will Match
| Variation & Description | Example(s) |
|---|---|
| Shebang + :initialize() & class-style anchor | #!/usr/bin/lua function Player:initialize(name) self.name = name end .Class:new():register function Engine:initialize(config) self.cfg = config end |
| Return true...end block | -- __lua__ if ready then return true end |
PYTHON_CODE_1
Targets the canonical Python entry-point guard. Does not match variations like __name__ == '__main__' with extra whitespace or formatting differences beyond quote style.
| Data Type | Python Source Code Snippet (entry-point detection) |
| Keywords | import, def |
| Proximity | 100 bytes |
| Pattern | Detects the standard Python entry-point idiom: if __name__ == "__main__": (with single or double quotes). Fires when this construct appears near an import or def keyword. |
Examples: What It Will Match
| Variation & Description | Example(s) |
|---|---|
| import + main guard (double quotes) | import sys def run(): pass if __name__ == "__main__": run() |
| def + main guard (single quotes) | def main(): pass if __name__ == '__main__': main() |
PYTHON_CODE_2
- Broadest of the three Python detectors, covering multiple shebang styles, error handling (try/except), and branching (if/elif/else).
- Complements PYTHON_CODE_1 (entry point) and PYTHON_CODE_3 (minimal). Together, the three patterns provide layered Python detection.
| Data Type | Python Source Code Snippet (control flow and error handling) |
| Keywords | import, if, try:, #!/usr/bin/python, #!/usr/local/bin/python, #!/usr/bin/env python |
| Proximity | 100 bytes |
| Pattern | Detects Python control-flow and error-handling constructs: elif <condition>:, else:, except, import, and try: blocks. Fires when any of these appear near a keyword such as import, if, try:, or a Python shebang line. |
Examples: What It Will Match
| Variation & Description | Example(s) |
|---|---|
| Shebang + try/except block | #!/usr/bin/python try: import os except ImportError: pass |
| if/elif/else chain | import sys if x > 0: print(x) elif x == 0: pass else: exit() |
| env-style shebang + import + except | #!/usr/bin/env python import json try: data = json.load(f) except ValueError: pass |
PYTHON_CODE_3
A lighter variant of PYTHON_CODE_2, detecting only elif or try: near import/def. Use when minimal Python detection is sufficient.
| Data Type | Python Source Code Snippet (minimal control flow) |
| Keywords | import, def |
| Proximity | 100 bytes |
| Pattern | Detects minimal Python control-flow constructs: elif <condition>: or try: blocks. Fires when either appears near an import or def keyword. A lighter variant of PYTHON_CODE_2. |
Examples: What It Will Match
| Variation & Description | Example(s) |
|---|---|
| import + elif | import math if n < 0: return None elif n == 0: return 1 |
| def + try block | def load(path): try: open(path) except IOError: pass |
PERL_CODES
- The Perl shebang (#!/usr/bin/perl or #!/usr/local/bin/perl) is mandatory. Without it, the pattern will not fire.
- Covers both procedural Perl (sub, print, for) and module-style Perl (package, use, =head1 POD documentation).
| Data Type | Perl Source Code Snippet |
| Keywords | #!/usr/bin/perl, #!/usr/local/bin/perl |
| Proximity | 100 bytes |
| Pattern | Detects common Perl constructs when preceded by a Perl shebang: =head1 (POD documentation), use/package declarations, =encoding, subroutine definitions (sub), print statements, and control-flow keywords (if, for, while, foreach). |
Examples: What It Will Match
| Variation & Description | Example(s) |
|---|---|
| Shebang + use + subroutine + print | #!/usr/bin/perl use strict; use warnings; sub greet { print "Hello\n"; } |
| Shebang + package + control flow | #!/usr/local/bin/perl package MyApp; if ($ready) { for my $i (1..10) { print $i; } } |
| Shebang + POD + foreach | #!/usr/bin/perl =head1 NAME MyModule =cut foreach my $item (@list) { process($item); } |
PASCAL_CODES
- The keyword list is broad (program, unit, function, var, begin, etc.) but the value regex is narrow (end;), so both must be present to fire.
- Designed for Object Pascal / Delphi as well as standard Pascal.
| Data Type | Pascal Source Code Snippet |
| Keywords | program, unit, function, uses type, for, if, var, while, else, interface, class, Procedure, implementation, begin (each followed by identifiers and Pascal punctuation: ; , : ( ' { ) |
| Proximity | 100 bytes |
| Pattern | Detects the Pascal end; terminator. Fires when end; appears near any of the Pascal keywords listed above. The keyword list covers program structure (program, unit, uses, interface, implementation, begin), declarations (var, function, Procedure, class), and control flow (for, if, while, else). |
Examples: What It Will Match
| Variation & Description | Example(s) |
|---|---|
| program + begin/end | program Hello; begin writeln('Hello'); end; |
| unit + interface + implementation | unit Utils; interface function Add(a, b: Integer): Integer; implementation function Add(a, b: Integer): Integer; begin Result := a + b; end; end; |
| Procedure + var + control flow | Procedure Process; var i: Integer; begin for i := 1 to 10 do if i mod 2 = 0 then writeln(i); end; |
TCL_CODES
Requires a Tcl anchor keyword (package require Tcl or #!/usr/bin/tclsh). Without it, common words like set, if, for will not trigger a match on their own.
| Data Type | Tcl Source Code Snippet |
| Keywords | package require Tcl, #!/usr/bin/tclsh |
| Proximity | 100 bytes |
| Pattern | Detects common Tcl constructs: namespace, proc (procedure definitions), set (variable assignment), put (output), if, foreach, while, and for. Fires when any of these appear near a Tcl keyword anchor (package require Tcl or tclsh shebang). |
Examples: What It Will Match
| Variation & Description | Example(s) |
|---|---|
| Shebang + proc + set + if | #!/usr/bin/tclsh proc greet {name} { set msg "Hello $name"; if {$name eq ""} { return } ; puts $msg } |
| package require + namespace + foreach | package require Tcl 8.6 namespace eval myapp { foreach item $list { puts $item } } |
| Shebang + while + set | #!/usr/bin/tclsh set i 0 while {$i < 10} { puts $i; set i [expr {$i + 1}] } |
Usage Guidance and Best Practices
This section provides practical guidance for using predefined patterns effectively in DLP Content Analysis policies. It covers recommended ways to combine patterns, tune severity thresholds, and validate detections to minimize false positives while maintaining strong protection for sensitive data across different industries and use cases.
Applying Predefined Patterns in Policies
| Step | Action |
|---|---|
| Step 1 | Navigate to Security Policies > DLP > Content Analysis in the Versa Director. |
| Step 2 | Create or edit a DLP rule and select the predefined pattern(s) to match against. |
| Step 3 | Assign a Severity Level (Low / Medium / High / Critical) for each pattern. |
| Step 4 | Optionally override the threshold using Severity Value if the default is too permissive or strict. |
| Step 5 | Set the Action (Alert, Block, Quarantine, Redact, etc.) appropriate for your use case. |
| Step 6 | Test in Alert-only (monitor) mode before switching to blocking actions. |
Industry-Specific Pattern Recommendations
| Industry | Recommended Patterns |
|---|---|
| Financial Services | CREDIT_CARD_NUMBER, CVV, PIN_NUMBER, SWIFT_CODE, US_BANK_ACCOUNT_NUMBER, US_BANK_ROUTING_NUMBER, US_SOCIAL_SECURITY_NUMBER, US_EMPLOYER_IDENTIFICATION_NUMBER, IBAN_CODE |
| Healthcare / Hospitals | ICD9_CODE, ICD10_CODE, US_MEDICARE_BENEFICIARY_IDENTIFIER, UK_NATIONAL_HEALTH_SERVICE_NUMBER, US_DEA_NUMBER, DATE_OF_BIRTH, HEALTH_CONDITION, US_MEDICAL_ACCOUNT_NUMBER |
| Technology / Software | C_CODES, JAVA_CODES, PYTHON_CODE_1/2/3, GIT_DIFF_CODE, AWS_CREDENTIALS, PASSWORD, AUTH_TOKEN, JSON_WEB_TOKEN, GCP_API_KEY, AZURE_AUTH_TOKEN |
| Retail / eCommerce | CREDIT_CARD_NUMBER, CARD_HOLDER_NAME, EXPIRY_DATE, CVV, EMAIL_ADDRESS, STREET_ADDRESS, US_MOBILE_NUMBER, US_BANK_ACCOUNT_NUMBER |
| Government / Public Sector | US_SOCIAL_SECURITY_NUMBER, US_PASSPORT, INDIA_AADHAAR_INDIVIDUAL, INDIA_PAN_INDIVIDUAL, UK_NATIONAL_INSURANCE_NUMBER, DOCUMENT_TYPE/LEGAL/* |
| Global Enterprise (Multi-region) | Country-specific passport and national ID patterns + IBAN_CODE, SWIFT_CODE, EMAIL_ADDRESS, IP_ADDRESS, FULL_NAME, DATE_OF_BIRTH, SORT_CODE |
Detection Context and Scope
This section explains where DLP inspection and predefined pattern matching occur, and what content is in scope for detection. It also outlines the primary threat types these patterns are designed to identify (for example, accidental data leakage, policy violations, or data exfiltration attempts) so you can align rules to your deployment and risk model.
Where DLP Patterns Are Applied
| Context | Description |
|---|---|
| Header | Scans HTTP/email headers for sensitive metadata (e.g., Subject lines, custom headers). |
| Body | Inspects the body of HTTP traffic, emails, chat messages, and web forms. |
| Attachment | Analyzes file attachments in emails and uploads for sensitive content within files. |
| App ID | Matches against application identifiers , enables app-specific DLP policies. |
| Device ID | Ties detection to specific device identifiers for endpoint-aware policies. |
Complete Pattern Reference
All predefined DLP patterns available in the Versa Security Pack, organized by category. Patterns marked '(ML-based)' use machine learning or file-type heuristics and do not require a keyword.
Payment & Financial
| Pattern Name | Category |
|---|---|
| CREDIT_CARD_NUMBER | Payment Cards |
| CARD_HOLDER_NAME | Payment Cards |
| CVV | Payment Cards |
| PIN_NUMBER | Payment Cards |
| EXPIRY_DATE | Payment Cards |
| PRIMARY_ACCOUNT_NUMBER | Payment Cards |
| SERVICE_CODE | Payment Cards |
| CREDIT_CARD_TRACK_NUMBER | Payment Cards |
| IBAN_CODE | Banking |
| SWIFT_CODE | Banking |
| US_BANK_ACCOUNT_NUMBER | Banking - USA |
| US_BANK_ROUTING_NUMBER | Banking - USA |
| US_BANK_ROUTING_FRACTION | Banking - USA |
| US_BANK_NAME | Banking - USA |
| UK_BANK_ACCOUNT_NUMBER | Banking - UK |
| UK_BANK_NAME | Banking - UK |
| AUSTRALIA_BANK_NUMBER | Banking - Australia |
| AUSTRALIA_BANK_STATE_BRANCH | Banking - Australia |
| SORT_CODE | Banking - UK |
| AMERICAN_BANKERS_CUSIP_ID | Finance |
Government & National ID
| Pattern Name | Category |
|---|---|
| INDIA_AADHAAR_INDIVIDUAL | India |
| INDIA_PAN_INDIVIDUAL | India |
| INDIA_GST_INDIVIDUAL | India |
| US_SOCIAL_SECURITY_NUMBER | USA |
| US_PASSPORT | USA |
| US_DRIVERS_LICENSE_NUMBER_ALL_STATES | USA |
| US_EMPLOYER_IDENTIFICATION_NUMBER | USA |
| US_INDIVIDUAL_TAXPAYER_IDENTIFICATION_NUMBER | USA |
| US_ADOPTION_TAXPAYER_IDENTIFICATION_NUMBER | USA |
| UK_NATIONAL_INSURANCE_NUMBER | UK |
| UK_NATIONAL_HEALTH_SERVICE_NUMBER | UK |
| UK_DRIVERS_LICENSE_NUMBER | UK |
| UK_TAXPAYER_REFERENCE | UK |
| UK_PASSPORT | UK |
| AUSTRALIA_DRIVERS_LICENSE_NUMBER | Australia |
| AUSTRALIA_MEDICARE_NUMBER | Australia |
| AUSTRALIA_PASSPORT | Australia |
| AUSTRALIA_TAX_FILE_NUMBER | Australia |
| BRAZIL_CPF_NUMBER | Brazil |
| BRAZIL_TIN_PERSONAL | Brazil |
| BRAZIL_TIN_CORPORATE | Brazil |
| CANADA_SOCIAL_INSURANCE_NUMBER | Canada |
| CANADA_PASSPORT | Canada |
| FRANCE_PASSPORT | France |
| FRANCE_NIR | France |
| FRANCE_CNI | France |
| GERMANY_IDENTITY_CARD_NUMBER | Germany |
| GERMANY_PASSPORT | Germany |
| GERMANY_DRIVERS_LICENSE_NUMBER | Germany |
| GERMANY_TAXPAYER_IDENTIFICATION_NUMBER | Germany |
| KOREA_RRN | South Korea |
| KOREA_PASSPORT | South Korea |
| CHINA_RESIDENT_ID_NUMBER | China |
| CHINA_PASSPORT | China |
| JAPAN_INDIVIDUAL_NUMBER | Japan |
| JAPAN_PASSPORT | Japan |
| SINGAPORE_NATIONAL_REGISTRATION_ID_NUMBER | Singapore |
| SINGAPORE_PASSPORT | Singapore |
| IRELAND_PPSN | Ireland |
| IRELAND_PASSPORT | Ireland |
| SPAIN_DNI_NUMBER | Spain |
| SPAIN_PASSPORT | Spain |
| PASSPORT | Global |
Healthcare & Medical
| Pattern Name | Category |
|---|---|
| ICD10_CODE | Healthcare |
| ICD9_CODE | Healthcare |
| US_MEDICARE_BENEFICIARY_IDENTIFIER | USA - Healthcare |
| US_DEA_NUMBER | USA - Healthcare |
| US_MEDICAL_ACCOUNT_NUMBER | USA - Healthcare |
| UK_NATIONAL_HEALTH_SERVICE_NUMBER | UK - Healthcare |
| AUSTRALIA_MEDICARE_NUMBER | Australia - Healthcare |
| HEALTH_CONDITION | Healthcare |
| MEDICAL_TERM | Healthcare |
| FDA_CODE | Healthcare |
Personal Information (PII)
| Pattern Name | Category |
|---|---|
| FULL_NAME | PII - General |
| CARD_HOLDER_NAME | PII - General |
| LAST_NAME | PII - General |
| DATE_OF_BIRTH | PII - General |
| DATE | PII - General |
| AGE | PII - General |
| GENDER | PII - General |
| EMAIL_ADDRESS | PII - General |
| PHONE_NUMBER | PII - General |
| US_MOBILE_NUMBER | PII - USA |
| STREET_ADDRESS | PII - General |
| EYE_COLOUR | PII - Physical |
| HAIR_COLOUR | PII - Physical |
| HEIGHT | PII - Physical |
| WEIGHT | PII - Physical |
| RACE | PII - Sensitive |
| RELIGION | PII - Sensitive |
| POLITICAL | PII - Sensitive |
| BIOMETRIC | PII - Sensitive |
| ETHNIC_GROUP | PII - Sensitive |
| CRIMINAL_RECORD | PII - Sensitive |
| IP_ADDRESS | Network |
| MAC_ADDRESS | Network |
| DOMAIN_NAME | Network |
| URL | Network |
Credentials & Security
| Pattern Name | Category |
|---|---|
| PASSWORD | Credentials |
| AUTH_TOKEN | Credentials |
| BASIC_AUTH_HEADER | Credentials |
| ENCRYPTION_KEY | Credentials |
| JSON_WEB_TOKEN | Credentials |
| AWS_CREDENTIALS | Credentials |
| AZURE_AUTH_TOKEN | Credentials |
| GCP_API_KEY | Credentials |
| GCP_CREDENTIALS | Credentials |
| STORAGE_SIGNED_URL | Credentials |
| STORAGE_SIGNED_POLICY_DOCUMENT | Credentials |
| WEAK_PASSWORD_HASH | Credentials |
| HTTP_COOKIE | Credentials |
| BAD_FQDN | Network / Threat |
| BAD_IP_ADDRESS | Network / Threat |
Source Code Detection
| Pattern Name | Category |
|---|---|
| C_CODES | Source Code - C/C++ |
| JAVA_CODES | Source Code - Java |
| PHP_CODES | Source Code - PHP |
| JAVASCRIPT_CODES | Source Code - JavaScript |
| PYTHON_CODE_1 | Source Code - Python |
| PYTHON_CODE_2 | Source Code - Python |
| PYTHON_CODE_3 | Source Code - Python |
| PERL_CODES | Source Code - Perl |
| PASCAL_CODES | Source Code - Pascal |
| AWK_CODES | Source Code - AWK |
| AWK_CODES_FILE | Source Code - AWK |
| NAWK_CODES | Source Code - NAWK |
| LUA_CODES_1 | Source Code - Lua |
| LUA_CODES_2 | Source Code - Lua |
| LUA_CODES_3 | Source Code - Lua |
| TCL_CODES | Source Code - TCL |
| ASM_CODES | Source Code - Assembly |
| GIT_DIFF_CODE | Source Code - Diff |
| DIFF_CODE | Source Code - Diff |
| DOCUMENT_TYPE/FINANCE/REGULATORY | Document Type |
| DOCUMENT_TYPE/FINANCE/SEC_FILING | Document Type |
| DOCUMENT_TYPE/HR/RESUME | Document Type |
| DOCUMENT_TYPE/LEGAL/BRIEF | Document Type |
| DOCUMENT_TYPE/LEGAL/COURT_ORDER | Document Type |
| DOCUMENT_TYPE/LEGAL/LAW | Document Type |
| DOCUMENT_TYPE/R&D/SOURCE_CODE | Document Type |
| DOCUMENT_TYPE/R&D/PATENT | Document Type |
| DOCUMENT_TYPE/R&D/DATABASE_BACKUP | Document Type |
| GLOBAL_BAD_WORDS | Content Policy |
| ALL_BASIC | Catch-All |
| GENERIC_ID | Catch-All |
| ADVERTISING_ID | Device |
| IMEI_HARDWARE_ID | Device |
Glossary of Terms
| Term | Definition |
|---|---|
| DLP | Data Loss Prevention. Technology to detect and prevent unauthorized transmission of sensitive data. |
| Predefined Pattern | A pre-built detection rule combining keyword, regex, and proximity. Ready to use without custom configuration. |
| Spack | Versa Security Pack. The bundle that delivers DLP patterns, signatures, and updates to Versa devices. |
| Regex | Regular Expression. A pattern used to match character sequences in text. |
| Proximity | Maximum byte distance between a keyword and a regex match for a detection to fire. |
| Severity Level | Controls the minimum number of pattern matches required before a DLP rule triggers. |
| Severity Value | An optional override that replaces the severity level's default match threshold. |
| PAN | Primary Account Number. The main card number on a payment card. |
| CVV | Card Verification Value. The 3 or 4 digit security code on a payment card. |
| SSN | Social Security Number. A 9-digit US government ID number. |
| ITIN | Individual Taxpayer Identification Number. A US tax processing number starting with 9. |
| EIN | Employer Identification Number. A US business tax ID. |
| NINO | National Insurance Number. UK social security equivalent. |
| NHS | National Health Service — UK public health system; NHS numbers are patient identifiers. |
| TFN | Tax File Number — Australia's individual tax identification number. |
| ICD | International Classification of Diseases — standard system for coding medical diagnoses. |
| DEA | Drug Enforcement Administration - DEA numbers identify US licensed prescribers. |
| MBI | Medicare Beneficiary Identifier - US Medicare patient ID replacing SSN-based HIC. |
| BIC / SWIFT | Bank Identifier Code - international standard identifying specific banks globally. |
| ML | Machine Learning - some patterns use trained models instead of regex for detection. |
| PII | Personally Identifiable Information - data that can identify a specific individual. |
| PHI | Protected Health Information - health data covered by HIPAA and similar regulations. |
| MIP | Microsoft Information Protection - Microsoft's sensitivity label and document classification system. |
| GSTIN | Goods and Services Tax Identification Number - India's business tax registration number. |
| BSB | Bank State Branch. Australian code identifying bank branches. |
