Create a new detector

Follow the instructions below to create a new detector or tag, create a new document-level detector, and test regular expressions and keywords.

Create a new detector or tag

To create a new detector or tag:

  1. From Project Settings, click the New button at the top left corner of the page.
  2. In the settings menu on the right, enter a name, description, and category for the detector and configure the PII Type.
    • PII Type: Specifies whether the detected information is treated as personal information that belongs to an entity, or as the entity itself.
    An image of the settings menu.
  3. Click Save.
  4. If you're only creating a tag, click Save again and move on.
    Otherwise, proceed to Add regular expressions.
Note: Tags can be identified in the detector list on the Detectors and Tags tab using the Built column. Because tags do not contain regular expressions, you do not need to click Save and Build when you create one. Tags always have a value of No in the Built column.

An image of a list of tags and their descriptions.

Add regular expressions

Note: PI Detect uses the Java 8 version of regular expressions.

To add regular expressions:

  1. After saving your new detector, open the Regular Expressions tab.
    An image of the Regular Expressions tab.
  2. Click New Regex.
  3. Enter a regular expression in the Regular Expression text box.
  4. Select a Type for the regular expression.
    • Normal: Normal regex patterns match PI within text. The machine learning model then filters these matches, and only the matches that pass the filter are returned as PI.
    • Bypass model: If a regex is targeted enough that its matches are almost certainly PI, you can bypass the machine learning model. For example, the text following “BIC code” is very likely a BIC code, so the following regex pattern can be specified to capture it:
      \bbic\s*(code\s*){0,1}:*\s+\b([A-Z]{6}[\dA-Z]{2}(?:[\dA-Z]{3})?)\b
      This regex matches PI like “bic code: DEUTDEFF”, and its matches are not passed to the machine learning model for further filtering.
    • Blocklist: A blocklist regex specifies negative patterns for PI. For example, for an SSN detector, it may be beneficial to specify that placeholder social security numbers like 000-00-0000 should not be returned.
  5. Specify a Match Group for the regular expression, if necessary.
    • The match group indicates which capturing group contains the PI. For example, take the following regular expression:
      (ssn|social security number)\s*+:\s*+(\d{3}-\d{2}-\d{4})
      This regular expression contains two capturing groups, (ssn|social security number) and (\d{3}-\d{2}-\d{4}), but only group 2 contains the personal information to be captured. Therefore, set the match group to 2.
  6. Test the regular expression. For more information on how to test regExes and keywords, see Testing RegExes and keywords.
  7. Click Save.
  8. Repeat steps 2-7 as necessary.
    Note: There is no limit on the number of regExes, of any type, that you can specify for a detector.
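
The match group setting described in step 5 can be tried outside the product with Java's own regex classes. The sketch below is only an illustration: the class name MatchGroupDemo, the helper firstMatch, and the sample strings are hypothetical and are not part of PI Detect. It uses the two patterns quoted in the steps above. In the BIC pattern, the optional (code\s*) part is capturing group 1, so the code itself is in group 2.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; not part of PI Detect.
class MatchGroupDemo {
    // SSN pattern from step 5: group 1 is the label, group 2 is the number.
    static final Pattern SSN = Pattern.compile(
        "(ssn|social security number)\\s*+:\\s*+(\\d{3}-\\d{2}-\\d{4})");

    // BIC pattern from the Bypass model example: group 2 is the 8- or 11-character code.
    static final Pattern BIC = Pattern.compile(
        "\\bbic\\s*(code\\s*){0,1}:*\\s+\\b([A-Z]{6}[\\dA-Z]{2}(?:[\\dA-Z]{3})?)\\b");

    // Returns the text of the requested group for the first match, or null if nothing matches.
    static String firstMatch(Pattern pattern, int group, String text) {
        Matcher m = pattern.matcher(text);
        return m.find() ? m.group(group) : null;
    }

    public static void main(String[] args) {
        System.out.println(firstMatch(SSN, 2, "ssn: 123-45-6789"));   // 123-45-6789
        System.out.println(firstMatch(SSN, 1, "ssn: 123-45-6789"));   // ssn
        System.out.println(firstMatch(BIC, 2, "bic code: DEUTDEFF")); // DEUTDEFF
        System.out.println(firstMatch(BIC, 2, "bic code: ABCDEF"));   // null (only six characters)
    }
}
```

Choosing group 1 for the SSN pattern would return the label rather than the number, which is why the match group in that example is 2.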

Add keywords

To add keywords:

  1. Navigate to the Keywords tab and click New Keyword.
  2. Enter a keyword in the Keyword text box.
    An image of the Keyword text box.
  3. Specify a Type for the keyword:
    • Global Keyword: A global keyword is a term that must appear somewhere in the body of the document. If a global keyword is not found in the document, the detector will not return any PI matches.
    • Global Blocklist Keyword: A global blocklist keyword is a term that must not appear anywhere in the body of the document. If a global blocklist keyword appears in the document, the detector will not return any PI matches.
    • Local Keyword: A local keyword is a term that must appear near a PI match found by a regex pattern. You can specify a maximum distance, in characters, within which the term must appear on either side of the PI match. If the term is not found within the specified distance, the detector will not return that PI match.
    • Local Blocklist Keyword: A local blocklist keyword is a term that must not appear in the vicinity of a PI match. You can specify a maximum distance, in characters. If the local blocklist keyword appears within that distance of a PI match, the PI match will not be returned.
  4. If you select a Local Keyword or a Local Blocklist Keyword, specify a Max Keyword Distance.
    • The Max Keyword Distance sets the maximum number of characters allowed between a keyword and the information found by a regular expression, on either side of that information.
    • The default value is 40 characters.
  5. Test the keyword. For more information on how to test regExes and keywords, see Testing RegExes and keywords.
  6. Click Save.
  7. Repeat steps 1-6 as needed.
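
The effect of a Local Keyword, a Local Blocklist Keyword, and the Max Keyword Distance can be approximated with a short sketch. This is not PI Detect's implementation. It assumes that the distance is counted in characters from the edges of the regex match, that the whole keyword must fall inside that window, and that comparison is case-sensitive. The class LocalKeywordDemo, its methods, and the SSN-shaped pattern are hypothetical names chosen for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; not part of PI Detect.
class LocalKeywordDemo {
    // Default Max Keyword Distance stated in the steps above.
    static final int DEFAULT_MAX_DISTANCE = 40;

    // Illustrative SSN-shaped pattern (an assumption, not a built-in detector).
    static final Pattern SSN_LIKE = Pattern.compile("\\d{3}-\\d{2}-\\d{4}");

    // True if term occurs within maxDistance characters on either side of the match [start, end).
    static boolean hasNearby(String text, int start, int end, String term, int maxDistance) {
        int from = Math.max(0, start - maxDistance);
        int to = Math.min(text.length(), end + maxDistance);
        return text.substring(from, to).contains(term);
    }

    // Keeps a match only if the local keyword (when given) is nearby
    // and the local blocklist keyword (when given) is not. Pass null to skip either check.
    static List<String> detect(String text, String localKeyword, String localBlocklist, int maxDistance) {
        List<String> kept = new ArrayList<>();
        Matcher m = SSN_LIKE.matcher(text);
        while (m.find()) {
            boolean keywordOk = localKeyword == null
                || hasNearby(text, m.start(), m.end(), localKeyword, maxDistance);
            boolean blocked = localBlocklist != null
                && hasNearby(text, m.start(), m.end(), localBlocklist, maxDistance);
            if (keywordOk && !blocked) {
                kept.add(m.group());
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        String text = "ssn 123-45-6789";
        System.out.println(detect(text, "ssn", null, DEFAULT_MAX_DISTANCE)); // [123-45-6789]
        System.out.println(detect(text, "ssn", null, 2));                    // [] (keyword too far away)
        System.out.println(detect("sample ssn 123-45-6789", "ssn", "sample", DEFAULT_MAX_DISTANCE)); // []
    }
}
```

With a distance of 2, the window around the match no longer contains the whole keyword, so the match is dropped. This is the behavior the Max Keyword Distance setting is described as controlling.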

Save the detector

To save a detector:

  1. Navigate to the Settings tab for the detector. Click Save.
    Note: For updates to a detector to be incorporated the next time you run Incorporate Feedback, the entire detector must be saved.
  2. If the detector was just created, click Save and Build.
    An image of the Settings tab for detectors.

Create a new document-level detector

Document-level detectors apply to the document as a whole and do not locally identify or highlight individual pieces of PI. Because they are looking for whole documents rather than individual pieces of PI, document-level detectors do not accept regular expressions or local keywords. Only global keywords and global blocklist keywords are used to configure document-level detectors. While document-level detectors appear in the detector list on the Detectors and Tags tab, you must create them from the Document Categories tab.

To create a new document-level detector:

  1. From Project Settings, navigate to the Document Categories tab.
    An image of the Document Categories tab.
  2. Click the New button at the top left corner of the page.
  3. Give the detector a Name and Description.
  4. Click Save.
  5. Click on the Keywords tab.
    An image of the Keywords tab.
  6. Click New Keyword.
  7. Add a new Keyword in the Keyword Term text box.
    An image of the Keyword Term text box.
  8. Specify a Type for the keyword.
    • Global: A global keyword is a term that must appear somewhere in the body of the document. If a global keyword does not appear in the document, the classifier will not be applied to the document.
    • Global Blocklist: A global blocklist keyword is a term that must not appear anywhere in the body of the document. If a global blocklist keyword appears in the document, the classifier will not be applied to the document.
  9. Test the keyword. For more information on how to test RegExes and keywords, see Testing RegExes and keywords.
  10. Click Save.
  11. Repeat steps 6-10 as necessary.
  12. Navigate to the Settings tab for the detector. Click Save.
    Note: For updates to a detector to be incorporated the next time you run Incorporate Feedback, you must save the entire detector.
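
The way global and global blocklist keywords decide whether a document-level detector applies can be sketched as a simple check over the document body. This is an illustration under assumptions, not the product's logic. The text above does not say how several global keywords combine, so the sketch assumes every global keyword must appear. It also assumes case-sensitive comparison. The class DocumentLevelDemo, the method applies, and the sample keywords are hypothetical.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical demo class; not part of PI Detect.
class DocumentLevelDemo {
    // Assumption: the detector applies only if every global keyword appears
    // in the body and no global blocklist keyword appears in it.
    static boolean applies(String body, List<String> globalKeywords, List<String> globalBlocklist) {
        for (String keyword : globalKeywords) {
            if (!body.contains(keyword)) {
                return false; // a required term is missing
            }
        }
        for (String blocked : globalBlocklist) {
            if (body.contains(blocked)) {
                return false; // a forbidden term is present
            }
        }
        return true;
    }

    public static void main(String[] args) {
        List<String> keywords = Arrays.asList("W-2");
        List<String> blocklist = Arrays.asList("SAMPLE");
        System.out.println(applies("Form W-2 Wage and Tax Statement", keywords, blocklist)); // true
        System.out.println(applies("SAMPLE Form W-2", keywords, blocklist));                 // false
        System.out.println(applies("Invoice 1001", keywords, blocklist));                    // false
    }
}
```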

Testing RegExes and keywords

Before adding a new regular expression or keyword to a detector, or while editing an existing one, you can test the regex or keyword using a test string.

To test a regEx or keyword:

  1. After adding or editing a detector, select Test String from the Testing Options drop-down list.
  2. Paste a test string that contains the information to capture.
    • Also test information that should not be captured, to be sure that the regEx or keyword captures only the intended information.
      An image of the regular expressions tab with a regular expression and test string populated.
  3. Review the result in the Match Information box, which displays one of the following:
    • No matches found, if the regEx or keyword did not pick up the information.
    • Detected PI 0: [information that has been detected], if there is a match.
      An image of the Match Information box.
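
A regular expression can also be checked offline before it is pasted into the Regular Expression text box. The sketch below imitates the Match Information output described above using java.util.regex. It is an approximation only. The class TestStringDemo and the method matchInformation are hypothetical, and numbering further matches as Detected PI 1, Detected PI 2, and so on is an assumption based on the zero-based label shown above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; not part of PI Detect.
class TestStringDemo {
    // SSN pattern from the Add regular expressions steps; the PI is in group 2.
    static final String SSN_REGEX =
        "(ssn|social security number)\\s*+:\\s*+(\\d{3}-\\d{2}-\\d{4})";

    // Returns one line per match in the style of the Match Information box,
    // or a single "No matches found" line when the regex finds nothing.
    static List<String> matchInformation(String regex, int matchGroup, String testString) {
        List<String> lines = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(testString);
        while (m.find()) {
            lines.add("Detected PI " + lines.size() + ": " + m.group(matchGroup));
        }
        if (lines.isEmpty()) {
            lines.add("No matches found");
        }
        return lines;
    }

    public static void main(String[] args) {
        // A string that should be captured.
        System.out.println(matchInformation(SSN_REGEX, 2, "ssn: 123-45-6789"));
        // A string that should not be captured, because the label is missing.
        System.out.println(matchInformation(SSN_REGEX, 2, "order no: 123-45-6789"));
    }
}
```

Running both a positive and a negative test string mirrors the advice in step 2: the first call reports the number and the second reports that no matches were found.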

Frequently asked questions