A regular expression (RegEx) is a string of characters that represents a pattern. You can use a RegEx to search for text that matches the pattern. For example, to detect Employee IDs that consist of 2 capital letters followed by 5 digits, you could create a custom detector using the expression: \b([A-Z]{2}[\d]{5})\b
Where:
- \b represents a word boundary
- [A-Z]{2} represents two capital letters in the range A to Z
- [\d]{5} represents 5 digits
- The parentheses ( ) form a capture group around the token we want to capture as the ID
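The breakdown above can be tried outside the product with a minimal Java sketch. The class name EmployeeIdFinder, the method findIds, and the sample sentence are made up for illustration and are not part of Data Breach Response. Note that each backslash is doubled inside a Java string literal.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class for illustration; not part of Data Breach Response.
final class EmployeeIdFinder {
    // Same expression as in the text; backslashes are doubled in a Java string literal.
    private static final Pattern EMPLOYEE_ID =
            Pattern.compile("\\b([A-Z]{2}[\\d]{5})\\b");

    // Returns the text captured by group 1 for every match found in the input.
    static List<String> findIds(String text) {
        List<String> ids = new ArrayList<>();
        Matcher matcher = EMPLOYEE_ID.matcher(text);
        while (matcher.find()) {
            ids.add(matcher.group(1));
        }
        return ids;
    }

    public static void main(String[] args) {
        // Made-up sample text: only the first token has the 2-letter, 5-digit shape.
        // XAB12345 and AB123456 are rejected because of the word boundaries.
        System.out.println(
                findIds("Badge AB12345 was issued; XAB12345 and AB123456 were not."));
    }
}
```

The word boundaries are what keep the expression from matching inside longer tokens such as XAB12345 or AB123456.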
Example scenario
For a particular project, it may be important to identify Employee IDs. Employee ID is not an out-of-the-box detector, so you need to build a custom detector.
Employee IDs look like the following:
In other words, they all take the form of one capital letter followed by 5 digits, followed by two more capital letters.
The corresponding regex is: [A-Z]\d{5}[A-Z]{2}
Following the steps described in “Testing RegExes and Keywords,” you can use the interface to test whether this regex works:
In the box in the bottom right-hand corner, the text says:
This indicates that the regex successfully recognizes Employee IDs.
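If you want to sanity-check the scenario regex before entering it in the interface, a short Java sketch along the following lines may help. The class ScenarioRegexCheck, its method names, and the sample values (A12345BC, Q00731ZX) are invented for illustration and are not real Employee IDs or product APIs.

```java
import java.util.regex.Pattern;

// Hypothetical helper class for illustration; not part of Data Breach Response.
final class ScenarioRegexCheck {
    // One capital letter, 5 digits, then two capital letters.
    private static final Pattern EMPLOYEE_ID = Pattern.compile("[A-Z]\\d{5}[A-Z]{2}");

    // True only when the whole candidate string has the Employee ID shape.
    static boolean isEmployeeId(String candidate) {
        return EMPLOYEE_ID.matcher(candidate).matches();
    }

    // True when the pattern occurs anywhere in the text.
    static boolean containsEmployeeId(String text) {
        return EMPLOYEE_ID.matcher(text).find();
    }

    public static void main(String[] args) {
        System.out.println(isEmployeeId("A12345BC"));                     // true
        System.out.println(isEmployeeId("AB12345"));                      // false
        System.out.println(containsEmployeeId("ID: Q00731ZX on file"));   // true
        // No word boundaries in this regex, so it also matches inside a longer token.
        System.out.println(containsEmployeeId("XA12345BCD"));             // true
    }
}
```

As the last call shows, this expression has no \b word boundaries, so it can match inside a longer token. If that is not the behavior you want, consider wrapping it as \b[A-Z]\d{5}[A-Z]{2}\b.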
RegEx recommendations
- Avoid the * character when creating regexes, as it can result in performance issues.
- Data Breach Response uses the Java 8 regex syntax.
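One way to follow the first recommendation is to replace * with a bounded quantifier. The sketch below assumes a hypothetical ID format in which up to three whitespace characters may separate the two letters from the five digits. That format, the limit of 3, and the class BoundedQuantifierDemo are assumptions made for illustration only.

```java
import java.util.regex.Pattern;

// Hypothetical illustration; the separator rule and the limit of 3 are assumptions.
final class BoundedQuantifierDemo {
    // Unbounded: \s* accepts any number of whitespace characters.
    private static final Pattern UNBOUNDED = Pattern.compile("[A-Z]{2}\\s*\\d{5}");
    // Bounded: \s{0,3} accepts at most three whitespace characters.
    private static final Pattern BOUNDED = Pattern.compile("[A-Z]{2}\\s{0,3}\\d{5}");

    static boolean matchesUnbounded(String candidate) {
        return UNBOUNDED.matcher(candidate).matches();
    }

    static boolean matchesBounded(String candidate) {
        return BOUNDED.matcher(candidate).matches();
    }

    public static void main(String[] args) {
        String tight = "AB 12345";           // one space
        String loose = "AB\t\t\t\t12345";    // four tab characters

        System.out.println(matchesUnbounded(tight)); // true
        System.out.println(matchesBounded(tight));   // true
        System.out.println(matchesUnbounded(loose)); // true
        System.out.println(matchesBounded(loose));   // false
    }
}
```

A bounded quantifier states how much repetition you actually expect, which limits how far the regex engine has to scan and backtrack on long inputs.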