Regular expression examples
Regular expressions can be used to help locate sensitive information including contact information, credit card numbers, and personal identification numbers. Examples of regular expressions that can be used to create rules are provided below.
Before you begin
Image markup considerations
If you are using regular expressions for an automated image markup project, you may get better results by being defensive against common OCR mistakes. Each digit match can be replaced with a character class that also accepts the letters OCR commonly produces in place of digits, such as O for 0 or l for 1.
To do so, replace the \d pattern with [\dOIlZEASB]. This type of defensive matching is not typically needed for spreadsheets.
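The replacement described above can be sketched in Python. This is a minimal illustration, not part of the product: the helper name `ocr_defensive` is hypothetical, and it assumes every \d in the pattern sits outside an existing character class.

```python
import re

# Character class from the text above: digits plus the letters OCR often
# produces in place of digits.
OCR_DIGIT = r"[\dOIlZEASB]"

def ocr_defensive(pattern):
    """Replace each \\d token in pattern with the OCR-tolerant class.

    Limitation of this sketch: a \\d that already sits inside a character
    class (for example [\\d\\w]) is not handled and should be edited by hand.
    """
    return pattern.replace(r"\d", OCR_DIGIT)

strict = r"\b\d{3}-\d{4}\b"        # hypothetical rule used only for illustration
tolerant = ocr_defensive(strict)

print(tolerant)
print(bool(re.search(strict, "5S5-l2O4")))    # strict rule misses the OCR-damaged text
print(bool(re.search(tolerant, "5S5-l2O4")))  # tolerant rule finds it
```

The tolerant pattern also matches more non-numeric text, so it is worth sampling the results before applying it broadly.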
Named groups
To specify that only part of a regular expression match should be redacted, use the <redact> named group. Only one named group can be used per rule.
For example, the regular expression TFN(:|:\s|\s|)(?<redact>\d{8,9}) will match TFN: 12345678 but will only apply a markup to the 12345678.
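To see how a named group narrows what gets marked up, here is a small Python sketch. It rests on two assumptions: Python's `re` module spells named groups `(?P<redact>...)` rather than `(?<redact>...)`, and the `apply_markup` helper is hypothetical, standing in for whatever the product does when it applies a markup.

```python
import re

# Same rule as in the text, with the named group in Python's spelling.
TFN_RULE = r"TFN(:|:\s|\s|)(?P<redact>\d{8,9})"

def apply_markup(text, pattern, mask="#"):
    """Mask the 'redact' group if the pattern defines one, else the whole match."""
    regex = re.compile(pattern)
    has_group = "redact" in regex.groupindex

    def _mask(match):
        whole = match.group(0)
        if not has_group or match.group("redact") is None:
            return mask * len(whole)
        # Convert the group's absolute span to offsets inside the match.
        start = match.start("redact") - match.start()
        end = match.end("redact") - match.start()
        return whole[:start] + mask * (end - start) + whole[end:]

    return regex.sub(_mask, text)

print(apply_markup("TFN: 12345678 on file", TFN_RULE))
```

The label "TFN: " stays visible and only the digits are masked. A pattern without the named group falls back to masking the full match.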
Email addresses and phone numbers regular expressions
Name | Description | Example |
---|---|---|
France Phone Numbers | This regular expression can be used to redact French phone numbers that include the country code but do not have delimiters. | \b([0O]?[1lI][1lI])?[3E][3E][0O]?[\dOIlZEASB]{9}\b |
Germany Phone Numbers | This regular expression can be used to redact German phone numbers. | \b[\d\w]\d{2}[\d\w]{6}\d[\d\w]\b |
UK Phone Numbers | This regular expression can be used to redact UK phone numbers that include the country code but do not have delimiters. | \b([0O]?[1lI][1lI])?[4A][4A][\dOIlZEASB]{10,11}\b |
US Phone Numbers | This regular expression can be used to redact US phone numbers. Before running it in a project, test it on a site such as regex101 against phone numbers that appear in your document set to confirm that it matches them. This regular expression may be overly aggressive, so sampling is recommended. | \b((\+|\b)[1l][\-\. ])?\(?\b[\dOlZSB]{3,5}([\-\. ]|\) ?)[\dOlZSB]{3}[\-\. ][\dOlZSB]{4}\b |
US Street Address | This regular expression uses a multi-state conditional hint to redact street addresses. Similar to phone numbers, this regular expression should be sampled for desired results before running against the full document set. | \b\d{1,8}\b[\s\S]{10,100}?\b(AK|AL|AR|AZ|CA|CO|CT|DC|DE|FL|GA|HI|IA|ID|IL|IN|KS|KY|LA|MA|MD|ME|MI|MN|MO|MS|MT|NC|ND|NE|NH|NJ|NM|NV|NY|OH|OK|OR|PA|RI|SC|SD|TN|TX|UT|VA|VT|WA|WI|WV|WY)\b\s\d{5}\b |
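The sampling recommended for the US phone number expression can also be done locally. The sketch below is one possible approach: the `sample_rule` helper and the sample strings are hypothetical, and Python's regex engine may differ in small ways from the one your project uses.

```python
import re

# US phone number rule copied from the table above.
US_PHONE = (
    r"\b((\+|\b)[1l][\-\. ])?\(?\b[\dOlZSB]{3,5}"
    r"([\-\. ]|\) ?)[\dOlZSB]{3}[\-\. ][\dOlZSB]{4}\b"
)

def sample_rule(pattern, samples):
    """Return {sample: matched text or None} for a quick manual review."""
    regex = re.compile(pattern)
    results = {}
    for sample in samples:
        match = regex.search(sample)
        results[sample] = match.group(0) if match else None
    return results

# Hypothetical sample strings; replace them with text from your documents.
samples = [
    "Call 555-123-4567 today",
    "1-800-555-0199",
    "Invoice 12345",
    "lOl-SOS-BOSS",  # not a phone number, but built only from OCR look-alikes
]

for text, hit in sample_rule(US_PHONE, samples).items():
    print(f"{text!r:30} -> {hit!r}")
```

The last sample shows why the expression is described as potentially aggressive: the OCR-tolerant character classes accept strings made entirely of letters.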
Universal regular expressions
Name | Description | Example |
---|---|---|
Dates | This regular expression will match dates. | \b([0-3]?\d(st)?(th)?|jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec)[/\- ]([0-3]?\d(st)?(th)?|jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec)[/\- ]\d{2,4}\b |
Email Addresses | This regular expression will match and redact full email addresses. | \b[a-z0-9._%\+\-—|]+@[a-z0-9.\-—|]+\.[a-z|]{2,6}\b |
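Redact specific email domains | This regular expression will match only email addresses at the listed domains (gmail and Relativity in this example). | \b[a-z0-9._%\+\-—|]+@(gmail|Relativity)\.[a-z|]{2,6}\b |
Exclude specific email domains | This regular expression will match email addresses except those ending in the listed domains (Relativity.com and gmail.com in this example). | \b[a-z0-9._%\+\-—|]+@[a-z0-9.\-—|]+\.[a-z|]{2,6}(?<!Relativity.com|gmail.com)\b |
Partial text (redact local email) | This regular expression uses the redact named group to apply a markup only to the local part of an email address, the text before the @. | (?<redact>[a-z0-9._%\+\-—|]+)@[a-z0-9.\-—|]+\.[a-z|]{2,6} |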
Birth Dates | This regular expression uses contextual hints to locate and redact dates that are in proximity to words that typically denote birth dates. It's important to understand that while dates are regular patterns, birth dates are not. If a non-birth date exists within close proximity of the contextual hints, it will be redacted. The contextual hint words in this regular expression are birth, birthdate, birthday, dob, and born. | \b(birth|birthdate|birthday|dob|born)\W+(?:\w+\W+){0,5}?(?<REDACT>(\d{4}|\d{1,2})[\/\-]\d{1,2}[\/\-](\d{4}|\d{1,2}))\b |
IPv4 | This regular expression will match general IPv4 addresses. | \b\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}\b |
IPv6 | This regular expression will match general IPv6 addresses. | \b([\d\w]{4}|0)(\:([\d\w]{4}|0)){7}\b |
Pre-OCR or HOCR images without any redaction rules | This regular expression does not find matches or apply markups. Instead, you can use this regular expression to begin running OCR for a large project without needing to create other rules first. You can then add the rules for matching terms while the OCR is running at your convenience. | \A |
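Redact text after key term | This regular expression uses the redact named group to apply a markup only to the nine digits that follow a key term (Term in this example). | \bTerm\s?(?<redact>[\d]{9}) |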
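The behavior of the Birth Dates expression in the table above, including its tendency to catch non-birth dates near a hint word, can be explored with a short Python sketch. Two assumptions apply: the named group is rewritten in Python's `(?P<REDACT>...)` spelling, and case-insensitive matching is enabled. The helper `dates_to_redact` and the sample sentences are hypothetical.

```python
import re

# Birth date rule from the table above, adapted to Python's named-group
# syntax. IGNORECASE is an assumption of this sketch.
BIRTH_DATE = re.compile(
    r"\b(birth|birthdate|birthday|dob|born)\W+(?:\w+\W+){0,5}?"
    r"(?P<REDACT>(\d{4}|\d{1,2})[\/\-]\d{1,2}[\/\-](\d{4}|\d{1,2}))\b",
    re.IGNORECASE,
)

def dates_to_redact(text):
    """Return the date strings the rule would mark up in text."""
    return [m.group("REDACT") for m in BIRTH_DATE.finditer(text)]

# A date directly after a hint word is found.
print(dates_to_redact("Date of birth: 04/12/1987"))

# A hire date within five words of "Born" is also found (a false positive).
print(dates_to_redact("Born in Ohio. Hired 01/02/2015."))

# A date more than five words after the hint word is not found.
print(dates_to_redact(
    "Born and raised in a small town near Columbus, moved 01/02/2015"
))
```

The second sample is the case the description warns about: the rule keys on proximity to a hint word, not on whether the date is actually a birth date.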
Financial accounts regular expressions
Credit cards
If you are using the regular expressions below for an automated image markup project, you may get better results by being defensive against common OCR mistakes. Each digit match can be replaced with a character class that also accepts the letters OCR commonly produces in place of digits, such as O for 0 or l for 1.
To do so, replace the \d pattern with [\dOIlZEASB]. This type of defensive matching is not typically needed for spreadsheets.
Visa, MasterCard, AmEx
\b((4\d{3}|5[1-5]\d{2}|2\d{3}|3[47]\d{1,2})[\s\-]?\d{4,6}[\s\-]?\d{4,6}?([\s\-]\d{3,4})?(\d{3})?)\b
This regular expression matches the following samples.
Visa
- 4532613257548007
- 4716563756075937
- 4929038415234561233
- 4718 4123 4142 4124
- 4716-5637-5607-5937
MasterCard
- 2720-9928-3988-7281
- 2720992839887281
- 5461718001676921
- 5489790994470834
- 5489 7909 9447 0834
- 5489-7909-9447-0834
American Express
- 372714876128394
- 346781676352683
- 376506566639896
- 3765 065666 39896
- 3467 816763 52683
- 3400 0000 0000 009
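The samples listed above can be checked against the combined expression with a few lines of Python. This is a sketch for local validation only: the `unmatched` helper is hypothetical, and Python's `re` engine is assumed to behave like the project's engine for this pattern.

```python
import re

# Combined Visa, MasterCard, AmEx rule copied from the text above.
CARD_RULE = re.compile(
    r"\b((4\d{3}|5[1-5]\d{2}|2\d{3}|3[47]\d{1,2})[\s\-]?\d{4,6}[\s\-]?"
    r"\d{4,6}?([\s\-]\d{3,4})?(\d{3})?)\b"
)

# Sample numbers listed in this section.
SAMPLES = [
    "4532613257548007", "4716563756075937", "4929038415234561233",
    "4718 4123 4142 4124", "4716-5637-5607-5937",
    "2720-9928-3988-7281", "2720992839887281", "5461718001676921",
    "5489790994470834", "5489 7909 9447 0834", "5489-7909-9447-0834",
    "372714876128394", "346781676352683", "376506566639896",
    "3765 065666 39896", "3467 816763 52683", "3400 0000 0000 009",
]

def unmatched(samples):
    """Return the samples that the rule does not match in full."""
    return [s for s in samples if CARD_RULE.fullmatch(s) is None]

print(unmatched(SAMPLES))                  # samples the rule misses, if any
print(unmatched(["1234 5678 9012 3456"]))  # a number with an unsupported prefix
```

Using `fullmatch` rather than `search` confirms that the whole number would be covered by the markup, not just part of it.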
Individual brand credit cards
Card name | Regular Expressions |
---|---|
American Express card | \b3[47][0-9]{13}\b |
BCGlobal | \b(6541|6556)[0-9]{12}\b |
Carte Blanche card | \b389[0-9]{11}\b |
Diners Club card | \b3(?:0[0-5]|[68][0-9])[0-9]{11}\b |
Discover card | \b(?:65[4-9][0-9]{13}|64[4-9][0-9]{13}|6011[0-9]{12}|(622(?:12[6-9]|1[3-9][0-9]|[2-8][0-9][0-9]|9[01][0-9]|92[0-5])[0-9]{10}))\b |
Insta Payment card | \b63[7-9][0-9]{13}\b |
JCB card | \b(?:2131|1800|35\d{3})\d{11}\b |
Korean Local card | \b9[0-9]{15}\b |
Laser card | \b(6304|6706|6709|6771)[0-9]{12,15}\b |
Maestro card | \b(5018|5020|5038|6304|6759|6761|6763)[0-9]{8,15}\b |
Mastercard | \b(?:4[0-9]{12}(?:[0-9]{3})?|[25][1-7][0-9]{14}|6(?:011|5[0-9][0-9])[0-9]{12}|3[47][0-9]{13}|3(?:0[0-5]|[68][0-9])[0-9]{11}|(?:2131|1800|35\d{3})\d{11})\b |
Solo card | \b(?:(6334|6767)[0-9]{12}|(6334|6767)[0-9]{14}|(6334|6767)[0-9]{15})\b |
Switch card | \b(?:(4903|4905|4911|4936|6333|6759)[0-9]{12}|(4903|4905|4911|4936|6333|6759)[0-9]{14}|(4903|4905|4911|4936|6333|6759)[0-9]{15}|564182[0-9]{10}|564182[0-9]{12}|564182[0-9]{13}|633110[0-9]{10}|633110[0-9]{12}|633110[0-9]{13})\b |
Union Pay card | \b(62[0-9]{14,17})\b |
Visa card | \b4[0-9]{12}(?:[0-9]{3})?\b |
Visa Mastercard | \b(?:4[0-9]{12}(?:[0-9]{3})?|5[1-5][0-9]{14})\b |
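To illustrate how the brand-specific expressions can be used together, the Python sketch below reports which of a few rules from the table match a given number. The `matching_brands` helper is hypothetical, only a subset of the table is included, and stripping spaces and hyphens first is an assumption, since the brand expressions expect undelimited digits.

```python
import re

# A subset of the rules from the table above, copied as written.
BRAND_RULES = {
    "American Express card": r"\b3[47][0-9]{13}\b",
    "Diners Club card": r"\b3(?:0[0-5]|[68][0-9])[0-9]{11}\b",
    "JCB card": r"\b(?:2131|1800|35\d{3})\d{11}\b",
    "Union Pay card": r"\b(62[0-9]{14,17})\b",
    "Visa card": r"\b4[0-9]{12}(?:[0-9]{3})?\b",
}

def matching_brands(number):
    """Return the names of all brand rules that match the whole number."""
    digits = re.sub(r"[\s\-]", "", number)  # drop space and hyphen delimiters
    return [
        name for name, rule in BRAND_RULES.items()
        if re.fullmatch(rule, digits)
    ]

print(matching_brands("3467 816763 52683"))
print(matching_brands("4716-5637-5607-5937"))
print(matching_brands("1234567890123456"))
```

Because the brand expressions do not allow delimiters, numbers written with spaces or hyphens in a document would need the combined expression from the previous section or a delimiter-tolerant variant.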
Other financial accounts
Name | Description | Example |
---|---|---|
American Bankers Association (ABA) transit routing numbers | This regular expression matches nine-digit ABA routing numbers whose first two digits are 00-12, 21-32, 61-72, or 80. | \b((0[0-9])|(1[0-2])|(2[1-9])|(3[0-2])|(6[1-9])|(7[0-2])|80)([0-9]{7})\b |
US ABA Routing Transit Number | This regular expression can be used to apply markups to an ABA routing number. | \b(0[0-9]|1[0-2]|2[1-9]|3[0-2]|6[1-9]|7[0-2]|80)\d{7}\b |
SWIFT Code | This regular expression can be used to redact SWIFT codes for payment instruction information. | \b[A-Z]{6}[A-Z0-9]{2}([A-Z0-9]{3})?\b |
IBAN Codes | This regular expression can be used to redact IBAN codes for payment instruction information. | (?:(?:IT|SM)\d{2}[\w]\d{22}|CY\d{2}[\w]\d{23}|NL\d{2}[\w]{4}\d{10}|LV\d{2}[\w]{4}\d{13}|(?:BG|BH|GB|IE)\d{2}[\w]{4}\d{14}|GI\d{2}[\w]{4}\d{15}|RO\d{2}[\w]{4}\d{16}|KW\d{2}[\w]{4}\d{22}|MT\d{2}[\w]{4}\d{23}|NO\d{13}|(?:DK|FI|GL|FO)\d{16}|MK\d{17}|(?:AT|EE|KZ|LU|XK)\d{18}|(?:BA|HR|LI|CH|CR)\d{19}|(?:GE|DE|LT|ME|RS)\d{20}|IL\d{21}|(?:AD|CZ|ES|MD|SA)\d{22}|PT\d{23}|(?:BE|IS)\d{24}|(?:FR|MR|MC)\d{25}|(?:AL|DO|LB|PL)\d{26}|(?:AZ|HU)\d{27}|(?:GR|MU)\d{28}) |
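The routing number and SWIFT expressions in the table above can be tried on sample text with a short Python sketch. The `find_matches` helper and all sample values are hypothetical and synthetic. The example also shows that the SWIFT expression matches any eight-letter uppercase word, which is one reason to sample results before running a rule broadly.

```python
import re

# Rules copied from the table above.
ABA_RULE = r"\b(0[0-9]|1[0-2]|2[1-9]|3[0-2]|6[1-9]|7[0-2]|80)\d{7}\b"
SWIFT_RULE = r"\b[A-Z]{6}[A-Z0-9]{2}([A-Z0-9]{3})?\b"

def find_matches(pattern, text):
    """Return the full text of every match of pattern in text."""
    return [m.group(0) for m in re.finditer(pattern, text)]

# Only the nine-digit number with an accepted two-digit prefix is matched.
print(find_matches(ABA_RULE, "Routing 123456789, ref 991234567"))

# Both the SWIFT-shaped code and an ordinary uppercase word are matched.
print(find_matches(SWIFT_RULE, "Send via ABCDUS33 per CONTRACT terms"))
```

Case-sensitive matching is assumed here. With case-insensitive matching enabled, the SWIFT expression would match ordinary lowercase words of the same length as well.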
Government identification numbers regular expressions
These regular expressions can be used to help you apply markups to personally identifiable information related to government records such as tax IDs, government issued IDs, and more.