

Regular expressions can be used to help locate sensitive information including contact information, credit card numbers, and personal identification numbers. Examples of regular expressions that can be used to create rules are provided below.
If you are using regular expressions for an automated image markup project, you may get better results by making them tolerant of common OCR mistakes. Replace each digit match with a character class that also accepts the letters OCR commonly misreads as digits, such as O for 0, l for 1, and B for 8.
To do so, replace the \d pattern with [\dOIlZEASB]. This type of defensive matching is not typically needed for spreadsheets.
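The substitution can be automated rather than done by hand. The following is a minimal Python sketch; `make_ocr_tolerant` is a hypothetical helper, not part of any product. It rewrites every unescaped `\d` in a pattern. Outside a character class, `\d` becomes `[\dOIlZEASB]`. Inside one, as in `[\d\w]`, only the extra letters are added, so the class stays valid. It assumes simple classes and does not handle a literal `]` placed first in a class.

```python
import re

# Letters that OCR commonly confuses with digits, appended to \d.
OCR_DIGIT_CLASS = r"\dOIlZEASB"

def make_ocr_tolerant(pattern: str) -> str:
    """Return pattern with each unescaped \\d widened to accept OCR look-alikes."""
    out = []
    i = 0
    in_class = False  # True while inside [...]
    while i < len(pattern):
        ch = pattern[i]
        if ch == "\\" and i + 1 < len(pattern):
            esc = pattern[i:i + 2]  # keep escapes such as \b or \\ intact
            if esc == r"\d":
                # Inside a class, splice the letters in; outside, wrap them in a class.
                out.append(OCR_DIGIT_CLASS if in_class else "[" + OCR_DIGIT_CLASS + "]")
            else:
                out.append(esc)
            i += 2
            continue
        if ch == "[" and not in_class:
            in_class = True
        elif ch == "]" and in_class:
            in_class = False
        out.append(ch)
        i += 1
    return "".join(out)

tolerant = make_ocr_tolerant(r"\b\d{3}-\d{4}\b")
print(tolerant)  # \b[\dOIlZEASB]{3}-[\dOIlZEASB]{4}\b
print(re.search(tolerant, "Call 555-I234").group())  # 555-I234
```

Working on the pattern text keeps your original rules readable. The tolerant version is generated only for image-based projects.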
To redact only part of a match, wrap that part in a named group called redact, written (?<redact>...). Only one named group can be used per rule.
For example, the regular expression TFN(:|:\s|\s|)(?<redact>\d{8,9}) matches TFN: 12345678 but applies a markup only to 12345678.
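To check which characters a redact group covers, you can test the TFN example outside the product. The following minimal Python sketch assumes that a rule marks up the span of the redact group when the group is present, and the whole match otherwise. `markup_spans` is a hypothetical helper. Python's `re` module writes named groups as `(?P<name>...)` rather than `(?<name>...)`, so the group is renamed accordingly.

```python
import re

# Python spells named groups (?P<name>...); (?<name>...) is .NET/PCRE syntax.
TFN_RULE = re.compile(r"TFN(:|:\s|\s|)(?P<redact>\d{8,9})")

def markup_spans(rule, text):
    """Return (start, end) offsets to mark up: the redact group if the rule
    defines one and it participated, otherwise the whole match."""
    spans = []
    for m in rule.finditer(text):
        if "redact" in rule.groupindex and m.group("redact") is not None:
            spans.append(m.span("redact"))
        else:
            spans.append(m.span())
    return spans

text = "Employee TFN: 12345678 on file"
for start, end in markup_spans(TFN_RULE, text):
    print(text[start:end])  # 12345678
```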
Name | Description | Example |
---|---|---|
France Phone Numbers | This regular expression can be used to redact French phone numbers that include the country code (33) but do not have delimiters. | \b([0O]?[1lI][1lI])?[3E][3E][0O]?[\dOIlZEASB]{9}\b |
Germany Phone Numbers | This regular expression can be used to redact German phone numbers. | \b[\d\w]\d{2}[\d\w]{6}\d[\d\w]\b |
UK Phone Numbers | This regular expression can be used to redact UK phone numbers that include the country code (44) but do not have delimiters. | \b([0O]?[1lI][1lI])?[4A][4A][\dOIlZEASB]{10,11}\b |
US Phone Numbers | This regular expression can be used to redact US phone numbers. Before running it in a project, test it on a site such as regex101 against the phone numbers that appear in your document set to confirm that it matches them. This regular expression may be overly aggressive, so sampling the results is recommended. | \b((\+|\b)[1l][\-\. ])?\(?\b[\dOlZSB]{3,5}([\-\. ]|\) ?)[\dOlZSB]{3}[\-\. ][\dOlZSB]{4}\b |
US Street Address | This regular expression uses a two-letter state abbreviation followed by a five-digit ZIP code as a contextual hint to redact street addresses. As with phone numbers, sample the results of this regular expression before running it against the full document set. | \b\d{1,8}\b[\s\S]{10,100}?\b(AK|AL|AR|AZ|CA|CO|CT|DC|DE|FL|GA|HI|IA|ID|IL|IN|KS|KY|LA|MA|MD|ME|MI|MN|MO|MS|MT|NC|ND|NE|NH|NJ|NM|NV|NY|OH|OK|OR|PA|RI|SC|SD|TN|TX|UT|VA|VT|WA|WI|WV|WY)\b\s\d{5}\b |
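Sampling can be as simple as running a pattern over representative lines and reviewing every hit. The following minimal Python sketch does this for the US phone number pattern. The sample lines are hypothetical, so replace them with text from your own document set. It also assumes that Python's `re` behaves like the rule engine for this pattern. The third line shows the kind of over-matching the description warns about.

```python
import re

US_PHONE = re.compile(
    r"\b((\+|\b)[1l][\-\. ])?\(?\b[\dOlZSB]{3,5}([\-\. ]|\) ?)[\dOlZSB]{3}[\-\. ][\dOlZSB]{4}\b"
)

# Hypothetical sample lines; use text drawn from your own documents.
samples = [
    "Call 555-123-4567 today",
    "Fax: 555.123.4567",
    "Invoice 2023.456.7890",  # not a phone number, but it still matches
    "Order #12345",
]

def sample_matches(pattern, lines):
    """Return (line, matched_text) pairs for manual review."""
    return [(line, m.group()) for line in lines for m in pattern.finditer(line)]

for line, hit in sample_matches(US_PHONE, samples):
    print(f"{hit!r:20} <- {line}")
```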
Name | Description | Example |
---|---|---|
Dates | This regular expression will match dates. | \b([0-3]?\d(st)?(th)?|jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec)[/\- ]([0-3]?\d(st)?(th)?|jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec)[/\- ]\d{2,4}\b |
Email Addresses | This regular expression will match and redact full email addresses. | \b[a-z0-9._%\+\-—|]+@[a-z0-9.\-—|]+\.[a-z|]{2,6}\b |
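Redact specific email domains | This regular expression redacts only email addresses at the listed domains (gmail and Relativity in this example). | \b[a-z0-9._%\+\-—|]+@(gmail|Relativity)\.[a-z|]{2,6}\b |
Exclude specific email domains | This regular expression redacts email addresses except those at the listed domains (Relativity.com and gmail.com in this example). | \b[a-z0-9._%\+\-—|]+@[a-z0-9.\-—|]+\.[a-z|]{2,6}(?<!Relativity.com|gmail.com)\b |
Partial text (redact local part) | This regular expression redacts only the local part of an email address, which is the text before the @. | (?<redact>[a-z0-9._%\+\-—|]+)@[a-z0-9.\-—|]+\.[a-z|]{2,6} |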
Birth Dates | This regular expression uses contextual hints to locate and redact dates that appear near words that typically denote birth dates. Dates follow a regular pattern, but a birth date cannot be distinguished from any other date by its format alone. Any date within five words after a hint word is redacted, even if it is not a birth date. The contextual hint words in this regular expression are birth, birthdate, birthday, dob, and born. | \b(birth|birthdate|birthday|dob|born)\W+(?:\w+\W+){0,5}?(?<REDACT>(\d{4}|\d{1,2})[\/\-]\d{1,2}[\/\-](\d{4}|\d{1,2}))\b |
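IPv4 | This regular expression matches IPv4 addresses written as four dot-separated groups of one to three digits. It does not check that each group is 255 or less. | \b\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}\b |
IPv6 | This regular expression matches full IPv6 addresses written as eight colon-separated groups, where each group is four characters or a single 0. It does not match abbreviated forms, such as groups with leading zeros omitted or addresses compressed with ::. | \b([\d\w]{4}|0)(\:([\d\w]{4}|0)){7}\b |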
Pre-OCR or HOCR images without any redaction rules | This regular expression (\A, which matches only the start of the text) finds no text to mark up. Use it to begin running OCR on a large project without first creating other rules, and then add rules for matching terms at your convenience while OCR is running. | \A |
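Redact text after key term | This regular expression redacts the nine digits that follow a key term. Replace Term with your own key term. | \bTerm\s?(?<redact>\d{9}) |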
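The proximity behavior of the Birth Dates rule is easiest to understand by testing it. The following minimal Python sketch rewrites the `(?<REDACT>...)` group as `(?P<REDACT>...)` for Python's `re` module. It assumes that matching behaves the same in Python as in the rule engine. Python matches case-sensitively by default, so DOB in capitals would need `re.IGNORECASE`. `redacted_dates` is a hypothetical helper.

```python
import re

# Birth Dates rule with the named group rewritten in Python's (?P<...>) syntax.
BIRTH_DATE = re.compile(
    r"\b(birth|birthdate|birthday|dob|born)\W+(?:\w+\W+){0,5}?"
    r"(?P<REDACT>(\d{4}|\d{1,2})[\/\-]\d{1,2}[\/\-](\d{4}|\d{1,2}))\b"
)

def redacted_dates(text):
    """Return the text each match would redact (the REDACT group only)."""
    return [m.group("REDACT") for m in BIRTH_DATE.finditer(text)]

print(redacted_dates("Date of birth: 04/12/1985"))                # a real birth date
print(redacted_dates("born in Ohio and moved on 06/01/2010"))     # 5 words away: still redacted
print(redacted_dates("born in the state of Ohio on 06/01/2010"))  # 6 words away: not matched
```

The second line redacts a date that is not a birth date because it falls within the five-word window. This is the false positive the description warns about.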
If you are using the card number regular expressions below for an automated image markup project, you may get better results by making them tolerant of common OCR mistakes. Replace each digit match with a character class that also accepts the letters OCR commonly misreads as digits.
To do so, replace the \d pattern with [\dOIlZEASB]. This type of defensive matching is not typically needed for spreadsheets.
\b((4\d{3}|5[1-5]\d{2}|2\d{3}|3[47]\d{1,2})[\s\-]?\d{4,6}[\s\-]?\d{4,6}?([\s\-]\d{3,4})?(\d{3})?)\b
This regular expression matches Visa, Mastercard, and American Express card numbers, with or without space or hyphen separators between digit groups.
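Before you use the general card pattern in a rule, you can sample it against numbers in each format. The following minimal Python sketch uses commonly published test card numbers, not real accounts. It assumes that Python's `re` treats this pattern the same way as the rule engine.

```python
import re

CARD = re.compile(
    r"\b((4\d{3}|5[1-5]\d{2}|2\d{3}|3[47]\d{1,2})[\s\-]?\d{4,6}[\s\-]?\d{4,6}?([\s\-]\d{3,4})?(\d{3})?)\b"
)

# Widely published test numbers in each layout, plus a non-card control.
samples = {
    "Visa": "4111 1111 1111 1111",
    "Mastercard": "5555-5555-5555-4444",
    "American Express": "3782 822463 10005",
    "Not a card": "1234 5678 9012 3456",
}

for label, number in samples.items():
    m = CARD.search(number)
    print(f"{label:18} {m.group() if m else 'no match'}")
```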
Card name | Regular expression |
---|---|
American Express card | \b3[47][0-9]{13}\b |
BCGlobal | \b(6541|6556)[0-9]{12}\b |
Carte Blanche card | \b389[0-9]{11}\b |
Diners Club card | \b3(?:0[0-5]|[68][0-9])[0-9]{11}\b |
Discover card | \b(?:65[4-9][0-9]{13}|64[4-9][0-9]{13}|6011[0-9]{12}|622(?:12[6-9]|1[3-9][0-9]|[2-8][0-9][0-9]|9[01][0-9]|92[0-5])[0-9]{10})\b |
Insta Payment card | \b63[7-9][0-9]{13}\b |
JCB card | \b(?:2131|1800|35\d{3})\d{11}\b |
Korean Local card | \b9[0-9]{15}\b |
Laser card | \b(6304|6706|6709|6771)[0-9]{12,15}\b |
Maestro card | \b(5018|5020|5038|6304|6759|6761|6763)[0-9]{8,15}\b |
Major card brands (Visa, Mastercard, Discover, American Express, Diners Club, JCB) | \b(?:4[0-9]{12}(?:[0-9]{3})?|[25][1-7][0-9]{14}|6(?:011|5[0-9][0-9])[0-9]{12}|3[47][0-9]{13}|3(?:0[0-5]|[68][0-9])[0-9]{11}|(?:2131|1800|35\d{3})\d{11})\b |
Solo card | \b(6334|6767)(?:[0-9]{12}|[0-9]{14,15})\b |
Switch card | \b(?:(?:4903|4905|4911|4936|6333|6759)(?:[0-9]{12}|[0-9]{14,15})|(?:564182|633110)(?:[0-9]{10}|[0-9]{12,13}))\b |
UnionPay card | \b(62[0-9]{14,17})\b |
Visa card | \b4[0-9]{12}(?:[0-9]{3})?\b |
Visa and Mastercard | \b(?:4[0-9]{12}(?:[0-9]{3})?|5[1-5][0-9]{14})\b |
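When a pattern is written as `\bA|B|C\b`, alternation has the lowest precedence. The leading `\b` binds only to the first alternative and the trailing `\b` only to the last, so a middle alternative can match the first digits of a longer number. Wrapping the alternatives in a non-capturing group, `\b(?:A|B|C)\b`, applies both boundaries to every branch. The following minimal Python sketch shows the difference on a Discover-style pattern. The 19-digit input is hypothetical.

```python
import re

# Ungrouped: \b applies only to the first and last alternatives.
UNGROUPED = re.compile(
    r"\b65[4-9][0-9]{13}|64[4-9][0-9]{13}|6011[0-9]{12}"
    r"|(622(?:12[6-9]|1[3-9][0-9]|[2-8][0-9][0-9]|9[01][0-9]|92[0-5])[0-9]{10})\b"
)
# Grouped: both boundaries apply to every alternative.
GROUPED = re.compile(
    r"\b(?:65[4-9][0-9]{13}|64[4-9][0-9]{13}|6011[0-9]{12}"
    r"|622(?:12[6-9]|1[3-9][0-9]|[2-8][0-9][0-9]|9[01][0-9]|92[0-5])[0-9]{10})\b"
)

nineteen_digits = "6011000000000000123"
print(UNGROUPED.search(nineteen_digits))  # matches the first 16 digits
print(GROUPED.search(nineteen_digits))    # None
```

The same precedence rule applies to any row whose alternatives are not enclosed in a group.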
Name | Description | Example |
---|---|---|
American Bankers Association (ABA) transit routing numbers | This regular expression matches nine-digit ABA routing numbers whose first two digits are 00 to 12, 21 to 32, 61 to 72, or 80. | \b((0[0-9])|(1[0-2])|(2[1-9])|(3[0-2])|(6[1-9])|(7[0-2])|80)([0-9]{7})\b |
US ABA Routing Transit Number | This regular expression can be used to apply markups to an ABA routing number. | \b(0[0-9]|1[0-2]|2[1-9]|3[0-2]|6[1-9]|7[0-2]|80)\d{7}\b |
SWIFT Code | This regular expression can be used to redact SWIFT codes in payment instructions. It can also match ordinary words written in capitals, such as REDACTED, so sample the results. | \b[A-Z]{6}[A-Z0-9]{2}([A-Z0-9]{3})?\b |
IBAN Codes | This regular expression can be used to redact International Bank Account Numbers (IBANs) in payment instructions. | (?:(?:IT|SM)\d{2}[\w]\d{22}|CY\d{2}[\w]\d{23}|NL\d{2}[\w]{4}\d{10}|LV\d{2}[\w]{4}\d{13}|(?:BG|BH|GB|IE)\d{2}[\w]{4}\d{14}|GI\d{2}[\w]{4}\d{15}|RO\d{2}[\w]{4}\d{16}|KW\d{2}[\w]{4}\d{22}|MT\d{2}[\w]{4}\d{23}|NO\d{13}|(?:DK|FI|GL|FO)\d{16}|MK\d{17}|(?:AT|EE|KZ|LU|XK)\d{18}|(?:BA|HR|LI|CH|CR)\d{19}|(?:GE|DE|LT|ME|RS)\d{20}|IL\d{21}|(?:AD|CZ|ES|MD|SA)\d{22}|PT\d{23}|(?:BE|IS)\d{24}|(?:FR|MR|MC)\d{25}|(?:AL|DO|LB|PL)\d{26}|(?:AZ|HU)\d{27}|(?:GR|MU)\d{28}) |
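A quick check of the banking patterns on sample text shows what each one accepts. The following minimal Python sketch uses hypothetical sample strings and a hypothetical `hits` helper. The ABA pattern rejects a nine-digit number whose prefix falls outside the allowed ranges. The SWIFT pattern accepts any eight- or eleven-character uppercase token of the right shape, so it also catches an ordinary capitalized word.

```python
import re

ABA = re.compile(r"\b(0[0-9]|1[0-2]|2[1-9]|3[0-2]|6[1-9]|7[0-2]|80)\d{7}\b")
SWIFT = re.compile(r"\b[A-Z]{6}[A-Z0-9]{2}([A-Z0-9]{3})?\b")

def hits(pattern, text):
    """Return every full match of pattern in text."""
    return [m.group() for m in pattern.finditer(text)]

print(hits(ABA, "Routing 011000015, reference 131234567"))  # prefix 13 is rejected
print(hits(SWIFT, "Pay via DEUTDEFF. Marked REDACTED."))    # REDACTED also matches
```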
These regular expressions can help you apply markups to personally identifiable information in government records, such as tax IDs and government-issued IDs.