Regular expression examples
Regular expressions can be used to help locate sensitive information including contact information, credit card numbers, and personal identification numbers. Examples of regular expressions that can be used to create rules are provided below.
The provided examples are meant to be a starting point for your Redact projects. They are not guaranteed to find and apply markups to every piece of sensitive information. We recommend double-checking documents before production to ensure that all sensitive information is properly redacted.
Before you begin
Image markup considerations
If you are using regular expressions for an automated image markup project, results may be improved by being defensive against common OCR mistakes. Each instance of any digit match can be replaced with a character replacement pattern that will evaluate numbers as common alphanumeric characters.
To do so, replace the \d pattern with [\dOIlZEASB]. This type of defensive matching is not typically needed for spreadsheets.
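The substitution described above can be sketched in Python. This is only an illustration and not part of Redact: the helper name make_ocr_defensive is hypothetical, and the plain string replacement assumes that \d appears in the pattern only as a standalone digit token (not already inside a character class, and not preceded by an escaped backslash).

```python
import re

# Character class suggested above for digits that OCR may misread as letters.
OCR_DIGIT_CLASS = r"[\dOIlZEASB]"


def make_ocr_defensive(pattern: str) -> str:
    """Swap every digit token in a pattern for the OCR-tolerant class.

    Hypothetical helper: a plain string replacement, so it assumes the
    digit token never appears inside an existing [...] class.
    """
    return pattern.replace(r"\d", OCR_DIGIT_CLASS)


strict = r"\b\d{9}\b"
defensive = make_ocr_defensive(strict)
print(defensive)  # \b[\dOIlZEASB]{9}\b

# In "12345678O" the last character is a letter O that OCR produced for a zero.
print(bool(re.search(strict, "ID 12345678O")))     # False
print(bool(re.search(defensive, "ID 12345678O")))  # True
```

The defensive form also matches more ordinary text (for example, runs of the letters O, I, and S), so sampling the results remains important.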
Named groups
To specify that only part of a regular expression match should be redacted, use the <redact> named group. Only one named group can be used per rule.
For example, the regular expression TFN(:|:\s|\s|)(?<redact>\d{8,9})
will match TFN: 12345678 but will only apply a markup on the 12345678.
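To see which part of a match the named group covers, the TFN example can be tried in Python. This is a rough sketch under one stated assumption: Python's re module spells named groups as (?P<redact>...), so the pattern is adapted to that syntax here. In a Redact rule, the (?<redact>...) form shown in the text is the one to use.

```python
import re

# TFN pattern with balanced parentheses, adapted to Python's (?P<name>...) syntax.
TFN = re.compile(r"TFN(:|:\s|\s|)(?P<redact>\d{8,9})")

text = "Payee TFN: 12345678 on file"
match = TFN.search(text)

full_match = match.group(0)            # everything the rule matched
redact_only = match.group("redact")    # the part that would receive the markup
start, end = match.span("redact")

# Simulate the markup by masking only the characters in the named group.
masked = text[:start] + "X" * (end - start) + text[end:]

print(full_match)   # TFN: 12345678
print(redact_only)  # 12345678
print(masked)       # Payee TFN: XXXXXXXX on file
```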
Email addresses and phone numbers regular expressions
Name | Description | Example |
---|
France Phone Numbers | This regular expression can be used to redact French phone numbers that include the country code but do not have delimiters. | \b([0O]?[1lI][1lI])?[3E][3E][0O]?[\dOIlZEASB]{9}\b |
Germany Phone Numbers | This regular expression can be used to redact German phone numbers. | \b[\d\w]\d{2}[\d\w]{6}\d[\d\w]\b |
UK Phone Numbers | This regular expression can be used to redact UK phone numbers that include the country code but do not have delimiters. | \b([0O]?[1lI][1lI])?[4A][4A][\dOIlZEASB]{10,11}\b |
US Phone Numbers | This regular expression can be used to redact US phone numbers. Before running it in a project, test it on a site such as regex101 against the phone numbers that appear in your document set to validate that it matches them. This regular expression may be overly aggressive, so sampling is recommended. | \b((\+|\b)[1l][\-\. ])?\(?\b[\dOlZSB]{3,5}([\-\. ]|\) ?)[\dOlZSB]{3}[\-\. ][\dOlZSB]{4}\b |
US Street Address | This regular expression uses a multi-state conditional hint to redact street addresses. As with phone numbers, this regular expression should be sampled for desired results before running against the full document set. | \b\d{1,8}\b[\s\S]{10,100}?\b(AK|AL|AR|AZ|CA|CO|CT|DC|DE|FL|GA|HI|IA|ID|IL|IN|KS|KY|LA|MA|MD|ME|MI|MN|MO|MS|MT|NC|ND|NE|NH|NJ|NM|NV|NY|OH|OK|OR|PA|RI|SC|SD|TN|TX|UT|VA|VT|WA|WI|WV|WY)\b\s\d{5}\b |
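The sampling recommended for the US phone number expression can also be done offline with a short script. The sketch below is one possible approach, not a Redact feature: the helper name sample_matches and the sample lines are made up for illustration, and Python's re engine may differ in small ways from the engine Redact uses.

```python
import re

# US phone number pattern copied from the table above.
US_PHONE = re.compile(
    r"\b((\+|\b)[1l][\-\. ])?\(?\b[\dOlZSB]{3,5}([\-\. ]|\) ?)"
    r"[\dOlZSB]{3}[\-\. ][\dOlZSB]{4}\b"
)


def sample_matches(pattern, lines):
    """Pair each sample line with the exact text the pattern matched in it."""
    return {line: [m.group(0) for m in pattern.finditer(line)] for line in lines}


# Hypothetical sample lines; replace them with text from your own document set.
samples = [
    "Call 555-123-4567 today",
    "Fax: 555.123.4567",
    "Order 12345-678-9012",  # not a phone number, but shaped like one
    "Extension 4567",
]

results = sample_matches(US_PHONE, samples)
for line, hits in results.items():
    print(f"{line!r}: {hits}")
```

In this sample the order number is matched as well, which illustrates why the expression is described as potentially overly aggressive.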
Universal regular expressions
Name | Description | Example |
---|
Dates | This regular expression will match dates. | \b([0-3]?\d(st)?(th)?|jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec)[/\- ]([0-3]?\d(st)?(th)?|jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec)[/\- ]\d{2,4}\b |
Email Addresses | This regular expression will match and redact full email addresses. | \b[a-z0-9._%\+\-—|]+@[a-z0-9.\-—|]+\.[a-z|]{2,6}\b |
Redact specific email domains | | \b[a-z0-9._%\+\-—|]+@(gmail|Relativity)\.[a-z|]{2,6}\b |
Exclude specific email domains | | \b[a-z0-9._%\+\-—|]+@[a-z0-9.\-—|]+\.[a-z|]{2,6}(?<!Relativity.com|gmail.com)\b |
Partial text (redact local email) | | (?<redact>[a-z0-9._%\+\-—|]+)@[a-z0-9.\-—|]+\.[a-z|]{2,6} |
Birth Dates | This regular expression uses contextual hints to locate and redact dates that are in proximity to words that typically denote birth dates. It's important to understand that while dates are regular patterns, birth dates are not.
If a non-birth date exists within close proximity of the contextual hints, it will be redacted. The contextual hint words in this regular expression are:
- birth
- birth date
- birthday
- date of birth
- born
| \b(birth|birthdate|birthday|dob|born)\W+(?:\w+\W+){0,5}?(?<REDACT>(\d{4}|\d{1,2})[\/\-]\d{1,2}[\/\-](\d{4}|\d{1,2}))\b |
IPv4 | This regular expression will match general IPv4 addresses. | \b\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}\b |
IPv6 | This regular expression will match general IPv6 addresses. | \b([\d\w]{4}|0)(\:([\d\w]{4}|0)){7}\b |
Pre-OCR or HOCR images without any redaction rules | This regular expression does not find matches or apply markups. Instead, you can use this regular expression to begin running OCR for a large project without needing to create other rules first. You can then add the rules for matching terms while the OCR is running at your convenience. | \A |
Redact text after key term | | \bTerm\s?(?<redact>[\d]{9}) |
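The contextual-hint behavior of the Birth Dates expression in the table above can be explored with a small Python sketch. Two assumptions are made here: the named group is written in Python's (?P<redact>...) syntax, and matching is case-insensitive (re.IGNORECASE). The helper name birth_dates and the sample sentences are hypothetical.

```python
import re

# Birth Dates pattern from the table, adapted to Python's named-group syntax.
BIRTH_DATE = re.compile(
    r"\b(birth|birthdate|birthday|dob|born)\W+(?:\w+\W+){0,5}?"
    r"(?P<redact>(\d{4}|\d{1,2})[\/\-]\d{1,2}[\/\-](\d{4}|\d{1,2}))\b",
    re.IGNORECASE,  # assumption: the rule is evaluated without regard to case
)


def birth_dates(text):
    """Return only the date portions that the named group would mark up."""
    return [m.group("redact") for m in BIRTH_DATE.finditer(text)]


print(birth_dates("Date of birth: 01/02/1990"))            # ['01/02/1990']
print(birth_dates("Born on or about 1990-02-01 in Ohio"))  # ['1990-02-01']
print(birth_dates("Invoice dated 03/04/2020"))             # []
```

The third line has no hint word, so its date is left alone. A non-birth date that happens to follow a hint word within five words would still be picked up.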
Financial accounts regular expressions
Credit cards
If using the regular expressions below for an automated image markup project, results may be improved by being defensive against common OCR mistakes. Each instance of any digit match can be replaced with a character replacement pattern that will evaluate numbers as common alphanumeric characters.
To do so, replace the \d pattern with [\dOIlZEASB]. This type of defensive matching is not typically needed for spreadsheets.
Visa, MasterCard, AmEx
\b((4\d{3}|5[1-5]\d{2}|2\d{3}|3[47]\d{1,2})[\s\-]?\d{4,6}[\s\-]?\d{4,6}?([\s\-]\d{3,4})?(\d{3})?)\b
This regular expression matches the following samples.
Visa
- 4532613257548007
- 4716563756075937
- 4929038415234561233
- 4718 4123 4142 4124
- 4716-5637-5607-5937
MasterCard
- 2720-9928-3988-7281
- 2720992839887281
- 5461718001676921
- 5489790994470834
- 5489 7909 9447 0834
- 5489-7909-9447-0834
American Express
- 372714876128394
- 346781676352683
- 376506566639896
- 3765 065666 39896
- 3467 816763 52683
- 3400 0000 0000 009
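The claim that the combined expression covers these samples can be checked with a short Python script. This is a minimal sketch for verification outside of Redact. It uses re.fullmatch so that a sample counts only when the whole string is matched, and the variable names are illustrative.

```python
import re

# Combined Visa, MasterCard, AmEx pattern copied from above.
CARD = re.compile(
    r"\b((4\d{3}|5[1-5]\d{2}|2\d{3}|3[47]\d{1,2})[\s\-]?\d{4,6}[\s\-]?\d{4,6}?"
    r"([\s\-]\d{3,4})?(\d{3})?)\b"
)

# The sample numbers listed above, grouped by brand.
SAMPLES = {
    "Visa": [
        "4532613257548007", "4716563756075937", "4929038415234561233",
        "4718 4123 4142 4124", "4716-5637-5607-5937",
    ],
    "MasterCard": [
        "2720-9928-3988-7281", "2720992839887281", "5461718001676921",
        "5489790994470834", "5489 7909 9447 0834", "5489-7909-9447-0834",
    ],
    "American Express": [
        "372714876128394", "346781676352683", "376506566639896",
        "3765 065666 39896", "3467 816763 52683", "3400 0000 0000 009",
    ],
}

# Collect any sample that the pattern does not cover completely.
unmatched = [
    number
    for numbers in SAMPLES.values()
    for number in numbers
    if not CARD.fullmatch(number)
]
print(unmatched)  # []
```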
Individual brand credit cards
Card name | Regular Expressions |
---|
American Express card | \b3[47][0-9]{13}\b |
BCGlobal | \b(6541|6556)[0-9]{12}\b |
Carte Blanche card | \b389[0-9]{11}\b |
Diners Club card | \b3(?:0[0-5]|[68][0-9])[0-9]{11}\b |
Discover card | \b(?:65[4-9][0-9]{13}|64[4-9][0-9]{13}|6011[0-9]{12}|(622(?:12[6-9]|1[3-9][0-9]|[2-8][0-9][0-9]|9[01][0-9]|92[0-5])[0-9]{10}))\b |
Insta Payment card | \b63[7-9][0-9]{13}\b |
JCB card | \b(?:2131|1800|35\d{3})\d{11}\b |
Korean Local card | \b9[0-9]{15}\b |
Laser card | \b(6304|6706|6709|6771)[0-9]{12,15}\b |
Maestro card | \b(5018|5020|5038|6304|6759|6761|6763)[0-9]{8,15}\b |
Mastercard | \b(?:4[0-9]{12}(?:[0-9]{3})?|[25][1-7][0-9]{14}|6(?:011|5[0-9][0-9])[0-9]{12}|3[47][0-9]{13}|3(?:0[0-5]|[68][0-9])[0-9]{11}|(?:2131|1800|35\d{3})\d{11})\b |
Solo card | \b(?:(6334|6767)[0-9]{12}|(6334|6767)[0-9]{14}|(6334|6767)[0-9]{15})\b |
Switch card | \b(?:(4903|4905|4911|4936|6333|6759)[0-9]{12}|(4903|4905|4911|4936|6333|6759)[0-9]{14}|(4903|4905|4911|4936|6333|6759)[0-9]{15}|564182[0-9]{10}|564182[0-9]{12}|564182[0-9]{13}|633110[0-9]{10}|633110[0-9]{12}|633110[0-9]{13})\b |
Union Pay card | \b(62[0-9]{14,17})\b |
Visa card | \b4[0-9]{12}(?:[0-9]{3})?\b |
Visa Mastercard | \b(?:4[0-9]{12}(?:[0-9]{3})?|5[1-5][0-9]{14})\b |
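The brand-specific expressions expect digits with no delimiters. A brief Python sketch makes that visible by running a few of the table's patterns against sample numbers. The helper name brands_for is hypothetical, and only four rows of the table are included to keep the example short.

```python
import re

# A few brand patterns copied from the table above.
BRAND_PATTERNS = {
    "American Express": r"\b3[47][0-9]{13}\b",
    "JCB": r"\b(?:2131|1800|35\d{3})\d{11}\b",
    "Union Pay": r"\b(62[0-9]{14,17})\b",
    "Visa": r"\b4[0-9]{12}(?:[0-9]{3})?\b",
}


def brands_for(number):
    """Return the names of all listed brand patterns that match the whole string."""
    return [
        name
        for name, pattern in BRAND_PATTERNS.items()
        if re.fullmatch(pattern, number)
    ]


print(brands_for("4532613257548007"))     # ['Visa']
print(brands_for("372714876128394"))      # ['American Express']
print(brands_for("4716-5637-5607-5937"))  # []
```

The hyphenated number is not matched by any brand pattern, so documents that print card numbers with spaces or hyphens are better served by the combined expression that allows delimiters.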
Other financial accounts
Name | Description | Example |
---|
American Bankers Association (ABA) transit routing numbers | This regular expression matches the following ABA routing numbers:
- 011103093
- Florida 067014822
- Maine 211274450
- Massachusetts/Rhode Island 211370545
- Metro DC/Maryland/Virginia 054001725
- New Hampshire 011400071
- New Jersey/Delaware 031201360
- New York – Metro NYC or former Commerce customers 026013673
- New York – Upstate NY or former Banknorth customers 021302567
- North Carolina/South Carolina 05390219
- Pennsylvania 036001808
- Vermont 011600033
| \b((0[0-9])|(1[0-2])|(2[1-9])|(3[0-2])|(6[1-9])|(7[0-2])|80)([0-9]{7})\b |
US ABA Routing Transit Number | This regular expression can be used to apply markups to an ABA routing number. | \b(0[0-9]|1[0-2]|2[1-9]|3[0-2]|6[1-9]|7[0-2]|80)\d{7}\b |
SWIFT Code | This regular expression can be used to redact SWIFT codes for payment instruction information. | \b[A-Z]{6}[A-Z0-9]{2}([A-Z0-9]{3})?\b |
IBAN Codes | This regular expression can be used to redact IBAN codes for payment instruction information. | (?:(?:IT|SM)\d{2}[\w]\d{22}|CY\d{2}[\w]\d{23}|NL\d{2}[\w]{4}\d{10}|LV\d{2}[\w]{4}\d{13}|(?:BG|BH|GB|IE)\d{2}[\w]{4}\d{14}|GI\d{2}[\w]{4}\d{15}|RO\d{2}[\w]{4}\d{16}|KW\d{2}[\w]{4}\d{22}|MT\d{2}[\w]{4}\d{23}|NO\d{13}|(?:DK|FI|GL|FO)\d{16}|MK\d{17}|(?:AT|EE|KZ|LU|XK)\d{18}|(?:BA|HR|LI|CH|CR)\d{19}|(?:GE|DE|LT|ME|RS)\d{20}|IL\d{21}|(?:AD|CZ|ES|MD|SA)\d{22}|PT\d{23}|(?:BE|IS)\d{24}|(?:FR|MR|MC)\d{25}|(?:AL|DO|LB|PL)\d{26}|(?:AZ|HU)\d{27}|(?:GR|MU)\d{28}) |
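The ABA routing number expression can be checked against the routing numbers listed in the table with a few lines of Python. This is an illustrative sketch, and the variable names are made up. The numbers are copied exactly as printed in the list.

```python
import re

# US ABA Routing Transit Number pattern copied from the table above.
ABA = re.compile(r"\b(0[0-9]|1[0-2]|2[1-9]|3[0-2]|6[1-9]|7[0-2]|80)\d{7}\b")

# Routing numbers exactly as they appear in the list above.
LISTED = [
    "011103093", "067014822", "211274450", "211370545", "054001725",
    "011400071", "031201360", "026013673", "021302567", "05390219",
    "036001808", "011600033",
]

# The pattern requires two prefix digits plus seven more, nine digits in total.
unmatched = [number for number in LISTED if not ABA.fullmatch(number)]
print(unmatched)  # ['05390219']
```

The one entry that does not match is printed with eight digits, while the expression requires nine, so it is worth confirming that value against the source document before relying on it as a test sample.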
Government identification numbers regular expressions
These regular expressions can be used to help you apply markups to personally identifiable information related to government records such as tax IDs, government issued IDs, and more.
Name | Description | Example |
---|
Argentina National Identity (DNI) Number | This regular expression matches the following examples:
- 34.960.099
- 63.889.141
- 40.571.278
- 45.855.200
- 80.933.831
| \d{2}\.\d{3}\.\d{3} |
Australia Tax File Number | The Australian tax file number (TFN) is typically an 8- or 9-digit number preceded by TFN. Other indicators on the government website include phrases such as payee's tax file number or tax file number.
Typically, you can use these indicators to confirm that the 8 or 9 digits are a TFN. To apply markups, you will also need to select the Character option for the Markup Scope field.
For example, when the TFN appears as TFN: 12345678, you can search for TFN: and then use the redact named group to redact only the numbers.
TFN(:|:\s|\s|)(?<redact>\d{8,9})
This regular expression will match TFN: 12345678 but will only place a redaction on the 12345678. | |
Canada Passport ID | This regular expression will match Canadian passport IDs. | \b[\w]{2}[\d]{6}\b |
Canada Postal code | This regular expression will match Canadian postal codes. | \b[a-z]\d[a-z][ -]?\d[a-z]\d\b |
Canada Social Insurance Number | This regular expression will match Canadian Social Insurance Numbers. | \b(?:(\d{3}[\—\-_]\d{3}[\—\-_]\d{3})|(\d{9}))\b |
Croatia VAT ID card number | This regular expression will match Croatian VAT ID card numbers. | \bHR\d{11}\b |
Czech Republic VAT ID card number | This regular expression will match Czech Republic VAT ID card numbers. | \bCZ\d{8,10}\b |
Denmark Personal ID number | This regular expression will match Denmark's Personal ID number. | \b(?:\d{10}|\d{6}[-\s]\d{4})\b |
France National ID card (CNI) | This regular expression will match France's National ID card (CNI). | \b\d{12}\b |
France Social Security Number (INSEE) | This regular expression will match France's Social Security Number (INSEE). | \b\d{13}|\d{13}\s\d{2}\b |
France Driver's License ID | This regular expression will match France's Driver's license ID. | \b\d{12}\b |
France Passport ID | This regular expression will match France's Passport ID. | \b\d{2}11\d{5}\b |
Germany ID card number | This regular expression will match Germany's ID card number. | \bl\d{8}\b |
Germany Passport ID | This regular expression will match Germany's Passport ID. | \b[cfghjk]\d{3}\w{5}\d\b |
Germany Driver's License ID | This regular expression will match Germany's Driver's License ID. | \b[\d\w]\d{2}[\d\w]{6}\d[\d\w]\b |
Ireland Personal Public Service (PPS) Number | This regular expression will match Ireland's Personal Public Service (PPS) Number. | \b\d{7}\w{1,2}\b |
Netherlands Citizen's Service (BSN) number | This regular expression will match the Netherlands' Citizen's Service (BSN) number. | \b\d{8}|\d{3}[-\.\s]\d{3}[-\.\s]\d{3}\b |
Poland National ID (PESEL) | This regular expression will match Poland's National ID (PESEL). | \b\d{11}\b |
Portugal Citizen Card Number | This regular expression will match Portugal's Citizen Card Number. | \d{9}[\w\d]{2}|\d{8}-\d[\d\w]{2}\d |
Spain Social Security Number (SSN) | This regular expression will match Spain's Social Security Number. | \b\d{2}\/?\d{8}\/?\d{2}\b |
Sweden Passport ID | This regular expression will match Sweden's Passport ID. | \b\d{8}\b |
United Kingdom Passport ID | This regular expression will match United Kingdom's Passport ID. | \b\d{9}\b |
United Kingdom Driver's License ID | This regular expression will match United Kingdom's Driver's License ID. | \b[\w9]{5}\d{6}[\w9]{2}\d{5}\b |
United Kingdom National Health Service (NHS) number | This regular expression will match United Kingdom's National Health Service (NHS) number. | \b\d{3}\s\d{3}\s\d{4}\b |
United States Social Security Number (SSN) | These regular expressions are optimized for image and spreadsheet documents respectively. The image version of the SSN Regex is specifically created to be defensive against common OCR mistakes such as 1 being read as l, i, or I. | Image Projects
\b[\dlZEASBO]{3} [\dlZEASBO]{2} [\dlZEASBO]{4}|([\dlZEASBO] ?){3}[\—\-_] ?([\dlZEASBO] ?){2}[\—\-_] ?([\dlZEASBO] ?){4}\b
Spreadsheet Projects
\b[\d]{3} [\d]{2} [\d]{4}|([\d] ?){3}[\—\-_] ?([\d] ?){2}[\—\-_] ?([\d] ?){4}\b
Spreadsheet and PDF projects: redact all but the last four digits of SSN with spaces
(?<redact>\b[\d]{3} [\d]{2}) [\d]{4}
Spreadsheet and PDF projects: redact all but the last four digits of SSN with hyphens
(?<redact>([\d] ?){3}[\—\-_] ?([\d] ?){2}[\—\-_]) ?([\d] ?){4}\b |
US Federal Employer Identification Number | This regular expression will match the Federal Employer ID number. | \b[0-9]{2}[—\-][0-9]{7}\b |
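The partial-redaction SSN expression (all but the last four digits, with spaces) can be simulated in Python to preview what would remain visible. This is a sketch under stated assumptions: the named group is rewritten in Python's (?P<redact>...) syntax, and the helper name mask_redact_group and the X fill character are hypothetical stand-ins for the markup Redact would apply.

```python
import re

# "All but the last four digits, with spaces" pattern from the table,
# adapted to Python's named-group syntax.
SSN_KEEP_LAST4 = re.compile(r"(?P<redact>\b[\d]{3} [\d]{2}) [\d]{4}")


def mask_redact_group(text, pattern, fill="X"):
    """Mask only the characters captured by the 'redact' named group."""

    def _mask(match):
        whole = match.group(0)
        # Convert the group's absolute span to offsets within this match.
        start = match.start("redact") - match.start()
        end = match.end("redact") - match.start()
        return whole[:start] + fill * (end - start) + whole[end:]

    return pattern.sub(_mask, text)


masked = mask_redact_group("SSN 123 45 6789", SSN_KEEP_LAST4)
print(masked)  # SSN XXXXXX 6789
```

The space between the first two digit groups falls inside the named group, so it is masked along with the digits, while the last four digits stay readable.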