Regular expression examples

Regular expressions can be used to help locate sensitive information including contact information, credit card numbers, and personal identification numbers. Examples of regular expressions that can be used to create rules are provided below.

Note: The provided examples are meant to be a starting point for your Redact projects. They are not guaranteed to find and apply markups to every piece of sensitive information. We recommend double-checking documents before production to ensure that all sensitive information is properly redacted.

Before you get started

Image markup considerations

If you are using regular expressions for an automated image markup project, you may get better results by guarding against common OCR mistakes. Each digit match can be replaced with a character class that also accepts the letters OCR commonly substitutes for digits.

To do so, replace the \d pattern with [\dOIlZEASB]. This type of defensive matching is not typically needed for spreadsheets.
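The substitution described above can be sketched in Python. This is an illustration only: the helper name harden_digits and the sample rule are hypothetical, and the plain text replacement assumes the pattern uses bare \d tokens rather than bracketed forms such as [\d].

```python
import re

# Character class from the text above: digits plus letters OCR often confuses with digits.
OCR_DIGIT = r"[\dOIlZEASB]"

def harden_digits(pattern: str) -> str:
    """Replace each bare \\d token with the OCR-tolerant class.

    Simplification: this is a plain text substitution, so it should not be
    used on patterns that already wrap \\d in brackets (for example [\\d]).
    """
    return pattern.replace(r"\d", OCR_DIGIT)

strict = r"\b\d{3}-\d{2}-\d{4}\b"    # hypothetical digits-only rule
tolerant = harden_digits(strict)

ocr_text = "ID l23-45-67B9"           # OCR read 1 as l and 8 as B
print(re.search(strict, ocr_text))             # None
print(re.search(tolerant, ocr_text).group())   # l23-45-67B9
```

The tolerant rule also accepts more non-numeric text, so sampling its matches is still worthwhile.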

Named groups

To specify that only part of a regular expression match should be redacted, use the <redact> named group. Only one named group can be used per rule.

For example, the regular expression TFN(:|:\s|\s|)(?<redact>\d{8,9}) will match TFN: 12345678 but will only apply a markup to the 12345678.
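To see what "only part of the match" means, the following Python sketch masks just the span captured by the named group. It is not how Redact applies markups internally; redact_named_group is a hypothetical helper, and Python writes named groups as (?P<redact>...) rather than (?<redact>...).

```python
import re

# Python spells named groups (?P<name>...); the rule syntax above uses (?<name>...).
TFN_RULE = re.compile(r"TFN(:|:\s|\s|)(?P<redact>\d{8,9})")

def redact_named_group(text: str, rule: re.Pattern, mask: str = "X") -> str:
    """Mask only the span captured by the 'redact' group of the first match."""
    match = rule.search(text)
    if match is None:
        return text
    start, end = match.span("redact")
    return text[:start] + mask * (end - start) + text[end:]

print(redact_named_group("TFN: 12345678", TFN_RULE))  # TFN: XXXXXXXX
```

The TFN: prefix takes part in the match but stays visible because it lies outside the named group.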

Email addresses and phone numbers regular expressions

Name Description Example
France Phone Numbers This regular expression can be used to redact French phone numbers that include the country code but do not have delimiters. \b([0O]?[1lI][1lI])?[3E][3E][0O]?[\dOIlZEASB]{9}\b
Germany Phone Numbers This regular expression can be used to redact German phone numbers. \b[\d\w]\d{2}[\d\w]{6}\d[\d\w]\b
UK Phone Numbers This regular expression can be used to redact UK phone numbers that include the country code but do not have delimiters. \b([0O]?[1lI][1lI])?[4A][4A][\dOIlZEASB]{10,11}\b
US Phone Numbers This regular expression can be used to redact US phone numbers. Before running it in a project, test it on a site such as regex101 against the phone numbers that appear in your document set to confirm that it matches. It may be overly aggressive, so sampling the results is recommended. \b((\+|\b)[1l][\-\. ])?\(?\b[\dOlZSB]{3,5}([\-\. ]|\) ?)[\dOlZSB]{3}[\-\. ][\dOlZSB]{4}\b
US Street Address This regular expression uses a multi-state conditional hint to redact street addresses. Similar to phone numbers, this regular expression should be sampled for desired results before running against the full document set. \b\d{1,8}\b[\s\S]{10,100}?\b(AK|AL|AR|AZ|CA|CO|CT|DC|DE|FL|GA|HI|IA|ID|IL|IN|KS|KY|LA|MA|MD|ME|MI|MN|MO|MS|MT|NC|ND|NE|NH|NJ|NM|NV|NY|OH|OK|OR|PA|RI|SC|SD|TN|TX|UT|VA|VT|WA|WI|WV|WY)\b\s\d{5}\b
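Sampling a rule before a full run can be as simple as the Python sketch below, which applies the US phone number expression to a handful of strings and prints what it matched. The sample strings and the sample_matches helper are hypothetical; substitute text from your own document set.

```python
import re

US_PHONE = re.compile(
    r"\b((\+|\b)[1l][\-\. ])?\(?\b[\dOlZSB]{3,5}([\-\. ]|\) ?)"
    r"[\dOlZSB]{3}[\-\. ][\dOlZSB]{4}\b"
)

def sample_matches(rule, samples):
    """Return {sample: matched text or None} so over- and under-matching is visible."""
    results = {}
    for sample in samples:
        match = rule.search(sample)
        results[sample] = match.group() if match else None
    return results

# Hypothetical samples; replace with strings from your own documents.
samples = ["555-123-4567", "1-800-555-0199", "Invoice 2021-004", "Part 12345-678-9012"]
for sample, hit in sample_matches(US_PHONE, samples).items():
    print(f"{sample!r:25} -> {hit!r}")
```

In this run the part number 12345-678-9012 is reported as a match, which is the kind of over-aggressive hit that sampling is meant to surface.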

Universal regular expressions

Name Description Example
Dates This regular expression will match dates. \b([0-3]?\d(st)?(th)?|jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec)[/\- ]([0-3]?\d(st)?(th)?|jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec)[/\- ]\d{2,4}\b
Email Addresses This regular expression will match and redact full email addresses. \b[a-z0-9._%\+\-—|]+@[a-z0-9.\-—|]+\.[a-z|]{2,6}\b
Redact specific email domains   \b[a-z0-9._%\+\-—|]+@(gmail|Relativity)\.[a-z|]{2,6}\b
Exclude specific email domains   \b[a-z0-9._%\+\-—|]+@[a-z0-9.\-—|]+\.[a-z|]{2,6}(?<!Relativity.com|gmail.com)\b
Partial text (redact local email)   (?<redact>[a-z0-9._%\+\-—|]+)@[a-z0-9.\-—|]+\.[a-z|]{2,6}
Birth Dates

This regular expression uses contextual hints to locate and redact dates that are in proximity to words that typically denote birth dates. It's important to understand that while dates are regular patterns, birth dates are not.

If a non-birth date appears in close proximity to one of the contextual hints, it will also be redacted. The contextual hint words in this regular expression are:

  • birth

  • birth date

  • birthday

  • date of birth

  • born

\b(birth|birthdate|birthday|dob|born)\W+(?:\w+\W+){0,5}?(?<REDACT>(\d{4}|\d{1,2})[\/\-]\d{1,2}[\/\-](\d{4}|\d{1,2}))\b
IPv4 This regular expression will match general IPv4 addresses. \b\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}\b
IPv6 This regular expression will match general IPv6 addresses. \b([\d\w]{4}|0)(\:([\d\w]{4}|0)){7}\b
Pre-OCR or HOCR images without any redaction rules This regular expression does not find matches or apply markups. Instead, you can use this regular expression to begin running OCR for a large project without needing to create other rules first. You can then add the rules for matching terms while the OCR is running at your convenience. \A
Redact text after key term   \bTerm\s?(?<redact>[\d]{9})
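The contextual-hint behavior of the Birth Dates expression in the table above can be illustrated with a short Python sketch. Two assumptions are made here: the named group is rewritten in Python's (?P<REDACT>...) spelling, and matching is case-insensitive. The sample sentences and the birth_date_hit helper are hypothetical.

```python
import re

# Python needs (?P<REDACT>...) where the rule syntax above uses (?<REDACT>...).
BIRTH_DATE = re.compile(
    r"\b(birth|birthdate|birthday|dob|born)\W+(?:\w+\W+){0,5}?"
    r"(?P<REDACT>(\d{4}|\d{1,2})[\/\-]\d{1,2}[\/\-](\d{4}|\d{1,2}))\b",
    re.IGNORECASE,  # assumption: the rule is applied case-insensitively
)

def birth_date_hit(text):
    """Return the date that would be redacted, or None if no hint word is nearby."""
    match = BIRTH_DATE.search(text)
    return match.group("REDACT") if match else None

print(birth_date_hit("Date of birth: 04/12/1985"))        # 04/12/1985
print(birth_date_hit("Invoice dated 01/02/2010"))         # None
print(birth_date_hit("Born in Paris; hired 01/02/2010"))  # 01/02/2010 (not a birth date)
```

The last sample shows the caveat from the Birth Dates description: any date within a few words of a hint is captured, whether or not it is a birth date.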

Financial accounts regular expressions

Credit cards

If you are using the regular expressions below for an automated image markup project, you may get better results by guarding against common OCR mistakes. Each digit match can be replaced with a character class that also accepts the letters OCR commonly substitutes for digits.

To do so, replace the \d pattern with [\dOIlZEASB]. This type of defensive matching is not typically needed for spreadsheets.

Visa, MasterCard, AmEx

\b((4\d{3}|5[1-5]\d{2}|2\d{3}|3[47]\d{1,2})[\s\-]?\d{4,6}[\s\-]?\d{4,6}?([\s\-]\d{3,4})?(\d{3})?)\b

This regular expression matches the following samples.

Visa

  • 4532613257548007

  • 4716563756075937

  • 4929038415234561233

  • 4718 4123 4142 4124

  • 4716-5637-5607-5937

MasterCard

  • 2720-9928-3988-7281

  • 2720992839887281

  • 5461718001676921

  • 5489790994470834

  • 5489 7909 9447 0834

  • 5489-7909-9447-0834

American Express

  • 372714876128394

  • 346781676352683

  • 376506566639896

  • 3765 065666 39896

  • 3467 816763 52683

  • 3400 0000 0000 009
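A quick way to confirm that a rule still covers its documented samples after you edit it is to run them through the expression. The Python sketch below does this for a subset of the samples listed above; the unmatched helper is hypothetical, and the last assertion uses a made-up number that starts with a digit no listed brand uses.

```python
import re

CARD = re.compile(
    r"\b((4\d{3}|5[1-5]\d{2}|2\d{3}|3[47]\d{1,2})[\s\-]?\d{4,6}[\s\-]?\d{4,6}?"
    r"([\s\-]\d{3,4})?(\d{3})?)\b"
)

# A subset of the samples listed above.
samples = [
    "4532613257548007", "4718 4123 4142 4124", "4716-5637-5607-5937",
    "2720-9928-3988-7281", "5489 7909 9447 0834",
    "372714876128394", "3765 065666 39896",
]

def unmatched(rule, values):
    """Return the values that the rule does not match in full."""
    return [v for v in values if rule.fullmatch(v) is None]

print(unmatched(CARD, samples))  # []
```

An empty list means every sample was matched from its first character to its last.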

Individual brand credit cards

Card name Regular Expressions
American Express card \b3[47][0-9]{13}\b
BCGlobal \b(6541|6556)[0-9]{12}\b
Carte Blanche card \b389[0-9]{11}\b
Diners Club card \b3(?:0[0-5]|[68][0-9])[0-9]{11}\b
Discover card \b65[4-9][0-9]{13}|64[4-9][0-9]{13}|6011[0-9]{12}|(622(?:12[6-9]|1[3-9][0-9]|[2-8][0-9][0-9]|9[01][0-9]|92[0-5])[0-9]{10})\b
Insta Payment card \b63[7-9][0-9]{13}\b
JCB card \b(?:2131|1800|35\d{3})\d{11}\b
Korean Local card \b9[0-9]{15}\b
Laser card \b(6304|6706|6709|6771)[0-9]{12,15}\b
Maestro card \b(5018|5020|5038|6304|6759|6761|6763)[0-9]{8,15}\b
Mastercard \b(?:4[0-9]{12}(?:[0-9]{3})?|[25][1-7][0-9]{14}|6(?:011|5[0-9][0-9])[0-9]{12}|3[47][0-9]{13}|3(?:0[0-5]|[68][0-9])[0-9]{11}|(?:2131|1800|35\d{3})\d{11})\b
Solo card \b(6334|6767)[0-9]{12}|(6334|6767)[0-9]{14}|(6334|6767)[0-9]{15}\b
Switch card \b(4903|4905|4911|4936|6333|6759)[0-9]{12}|(4903|4905|4911|4936|6333|6759)[0-9]{14}|(4903|4905|4911|4936|6333|6759)[0-9]{15}|564182[0-9]{10}|564182[0-9]{12}|564182[0-9]{13}|633110[0-9]{10}|633110[0-9]{12}|633110[0-9]{13}\b
Union Pay card \b(62[0-9]{14,17})\b
Visa card \b4[0-9]{12}(?:[0-9]{3})?\b
Visa Mastercard \b(?:4[0-9]{12}(?:[0-9]{3})?|5[1-5][0-9]{14})\b
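When several single-brand rules are used together, it can help to check which one fires for a given number. The Python sketch below copies four rows from the table above into a dictionary; BRAND_RULES and brands_for are hypothetical names, and the test numbers are samples from earlier in this section.

```python
import re

# A few rows from the table above; extend with the brands you need.
BRAND_RULES = {
    "American Express": r"\b3[47][0-9]{13}\b",
    "Diners Club": r"\b3(?:0[0-5]|[68][0-9])[0-9]{11}\b",
    "Visa": r"\b4[0-9]{12}(?:[0-9]{3})?\b",
    "Union Pay": r"\b(62[0-9]{14,17})\b",
}

def brands_for(number):
    """Return the names of every rule that matches the whole number."""
    return [name for name, rule in BRAND_RULES.items() if re.fullmatch(rule, number)]

print(brands_for("346781676352683"))   # ['American Express']
print(brands_for("4532613257548007"))  # ['Visa']
```

These single-brand rules expect unbroken digits, so numbers written with spaces or hyphens need the combined expression shown earlier.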

Other financial accounts

Name Description Example
American Bankers Association (ABA) transit routing numbers

This regular expression matches the following ABA routing numbers:

  • 011103093

  • Florida 067014822

  • Maine 211274450

  • Massachusetts/Rhode Island 211370545

  • Metro DC/Maryland/Virginia 054001725

  • New Hampshire 011400071

  • New Jersey/Delaware 031201360

  • New York – Metro NYC or former Commerce customers 026013673

  • New York – Upstate NY or former Banknorth customers 021302567

  • North Carolina/South Carolina 05390219

  • Pennsylvania 036001808

  • Vermont 011600033

\b((0[0-9])|(1[0-2])|(2[1-9])|(3[0-2])|(6[1-9])|(7[0-2])|80)([0-9]{7})\b
US ABA Routing Transit Number This regular expression can be used to apply markups to an ABA routing number. \b(0[0-9]|1[0-2]|2[1-9]|3[0-2]|6[1-9]|7[0-2]|80)\d{7}\b
SWIFT Code This regular expression can be used to redact SWIFT codes for payment instruction information. \b[A-Z]{6}[A-Z0-9]{2}([A-Z0-9]{3})?\b
IBAN Codes This regular expression can be used to redact IBAN codes for payment instruction information. (?:(?:IT|SM)\d{2}[\w]\d{22}|CY\d{2}[\w]\d{23}|NL\d{2}[\w]{4}\d{10}|LV\d{2}[\w]{4}\d{13}|(?:BG|BH|GB|IE)\d{2}[\w]{4}\d{14}|GI\d{2}[\w]{4}\d{15}|RO\d{2}[\w]{4}\d{16}|KW\d{2}[\w]{4}\d{22}|MT\d{2}[\w]{4}\d{23}|NO\d{13}|(?:DK|FI|GL|FO)\d{16}|MK\d{17}|(?:AT|EE|KZ|LU|XK)\d{18}|(?:BA|HR|LI|CH|CR)\d{19}|(?:GE|DE|LT|ME|RS)\d{20}|IL\d{21}|(?:AD|CZ|ES|MD|SA)\d{22}|PT\d{23}|(?:BE|IS)\d{24}|(?:FR|MR|MC)\d{25}|(?:AL|DO|LB|PL)\d{26}|(?:AZ|HU)\d{27}|(?:GR|MU)\d{28})
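The ABA routing rule accepts nine-digit values only when the first two digits fall in the listed ranges. The Python sketch below applies it to a hypothetical sentence that mixes two routing numbers from the list above with other numbers.

```python
import re

ABA = re.compile(r"\b(0[0-9]|1[0-2]|2[1-9]|3[0-2]|6[1-9]|7[0-2]|80)\d{7}\b")

# Hypothetical sentence: two routing numbers plus unrelated numbers.
text = "Wire to routing 011103093 or 211274450; reference 991234567, order 5550112."

hits = [m.group() for m in ABA.finditer(text)]
print(hits)  # ['011103093', '211274450']
```

The value 991234567 is skipped because 99 is not an allowed prefix, and 5550112 is skipped because it is too short. Any other nine-digit number with an allowed prefix would still match, so review the results.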

Government identification numbers regular expressions

These regular expressions can be used to help you apply markups to personally identifiable information related to government records such as tax IDs, government issued IDs, and more.

Name Description Example
Argentina National Identity (DNI) Number

This regular expression matches the following examples:

  • 34.960.099

  • 63.889.141

  • 40.571.278

  • 45.855.200

  • 80.933.831

\d{2}\.\d{3}\.\d{3}
Australia Tax File Number

The Australian tax file number is typically an 8- or 9-digit number preceded by TFN. Government websites may also use indicators such as payee's tax file number or tax file number.

Typically, you can use these indicators to confirm that the 8 or 9 digits are a TFN. To apply markups, you will need to select the Character option for the Markup Scope field as well.

For example, when the TFN appears like TFN: 12345678, you can search for TFN: and then use the <redact> named group to redact only the numbers.

TFN(:|:\s|\s|)(?<redact>\d{8,9})

This regular expression will match TFN: 12345678 but will only place a redaction on the 12345678.

 
Canada Passport ID This regular expression will match Canadian passport IDs. \b[\w]{2}[\d]{6}\b
Canada Postal code This regular expression will match Canadian postal codes. \b[a-z]\d[a-z][ -]?\d[a-z]\d\b
Canada Social Insurance Number This regular expression will match Canadian Social Insurance Numbers. \b(\d{3}[\—\-_]\d{3}[\—\-_]\d{3})|(\d{9})\b
Croatia VAT ID card number This regular expression will match Croatian VAT ID card numbers. \bHR\d{11}\b
Czech Republic VAT ID card number This regular expression will match Czech Republic VAT ID card numbers. \bCZ\d{8,10}\b
Denmark Personal ID number This regular expression will match Denmark Personal ID number. \b(?:\d{10}|\d{6}[-\s]\d{4})\b
France National ID card (CNI) This regular expression will match France's National ID card (CNI). \b\d{12}\b
France Social Security Number (INSEE) This regular expression will match France's Social Security Number (INSEE), including the two-digit key when it follows the 13 digits after a space. \b\d{13}(?:\s\d{2}\b)?
France Driver's License ID This regular expression will match France's Driver's license ID. \b\d{12}\b
France Passport ID This regular expression will match France's Passport ID. \b\d{2}11\d{5}\b
Germany ID card number This regular expression will match Germany's ID card number. \bl\d{8}\b
Germany Passport ID This regular expression will match Germany's Passport ID. \b[cfghjk]\d{3}\w{5}\d\b
Germany Driver's License ID This regular expression will match Germany's Driver's License ID. \b[\d\w]\d{2}[\d\w]{6}\d[\d\w]\b
Ireland Personal Public Service (PPS) Number This regular expression will match Personal Public Service (PPS) Number. \b\d{7}\w{1,2}\b
Netherlands Citizen's Service (BSN) number This regular expression will match Citizen's Service (BSN) number. \b\d{8}|\d{3}[-\.\s]\d{3}[-\.\s]\d{3}\b
Poland National ID (PESEL) This regular expression will match Poland's National ID (PESEL). \b\d{11}\b
Portugal Citizen Card Number This regular expression will match Portugal's Citizen Card Number. \d{9}[\w\d]{2}|\d{8}-\d[\d\w]{2}\d
Spain Social Security Number (SSN) This regular expression will match Spain's Social Security Number. \b\d{2}\/?\d{8}\/?\d{2}\b
Sweden Passport ID This regular expression will match Sweden's Passport ID. \b\d{8}\b
United Kingdom Passport ID This regular expression will match United Kingdom's Passport ID. \b\d{9}\b
United Kingdom Driver's License ID This regular expression will match United Kingdom's Driver's License ID. \b[\w9]{5}\d{6}[\w9]{2}\d{5}\b
United Kingdom National Health Service (NHS) number This regular expression will match United Kingdom's National Health Service (NHS) number. \b\d{3}\s\d{3}\s\d{4}\b
United States Social Security Number (SSN) These regular expressions are optimized for image and spreadsheet documents respectively. The image version of the SSN Regex is specifically created to be defensive against common OCR mistakes such as 1 being read as l, i, or I.

Image Projects

\b[\dlZEASBO]{3} [\dlZEASBO]{2} [\dlZEASBO]{4}|([\dlZEASBO] ?){3}[\—\-_] ?([\dlZEASBO] ?){2}[\—\-_] ?([\dlZEASBO] ?){4}\b

 

Spreadsheet Projects

\b[\d]{3} [\d]{2} [\d]{4}|([\d] ?){3}[\—\-_] ?([\d] ?){2}[\—\-_] ?([\d] ?){4}\b

 

Spreadsheet and PDF projects: redact all but the last four digits of SSNs written with spaces

(?<redact>\b[\d]{3} [\d]{2}) [\d]{4}

 

Spreadsheet and PDF projects: redact all but the last four digits of SSNs written with hyphens

(?<redact>([\d] ?){3}[\—\-_] ?([\d] ?){2}[\—\-_]) ?([\d] ?){4}\b

US Federal Employer Identification Number This regular expression will match the Federal Employer ID number. \b[0-9]{2}[—\-][0-9]{7}\b
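The two partial SSN rules above can be tried out with the Python sketch below, which masks only the digits inside the redact group and leaves the last four visible. This is an illustration, not Redact's own markup logic: mask_leading_digits is a hypothetical helper, the named group uses Python's (?P<redact>...) spelling, and the dash character in the hyphen rule is written as the escape \u2014.

```python
import re

# Python spells the named group (?P<redact>...); \u2014 stands in for the dash
# character that the rule above lists next to the hyphen and underscore.
SSN_SPACES = re.compile(r"(?P<redact>\b[\d]{3} [\d]{2}) [\d]{4}")
SSN_HYPHENS = re.compile(
    r"(?P<redact>([\d] ?){3}[\u2014\-_] ?([\d] ?){2}[\u2014\-_]) ?([\d] ?){4}\b"
)

def mask_leading_digits(text, rule):
    """Replace digits inside each 'redact' span with X, leaving the rest untouched."""
    def _mask(match):
        whole = match.group()
        start = match.start("redact") - match.start()
        end = match.end("redact") - match.start()
        masked = re.sub(r"\d", "X", whole[start:end])
        return whole[:start] + masked + whole[end:]
    return rule.sub(_mask, text)

print(mask_leading_digits("SSN 123 45 6789", SSN_SPACES))    # SSN XXX XX 6789
print(mask_leading_digits("SSN: 123-45-6789", SSN_HYPHENS))  # SSN: XXX-XX-6789
```

Only the text captured by the named group is altered; the final four digits are matched but remain readable.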