Personal Information detectors

From the Project Settings page, you can view all detectors and tags, add new detectors and tags, and turn detectors and tags on and off.

Definitions

The following terms are found throughout Personal Information detectors documentation:

  • Detector (PI): A combination of AI models, regular expressions (regexes), and keywords that detects a string of text and classifies it as a form of Personal Information.
    PI detectors must have at least one regular expression.
  • Detector (Document): A combination of keywords that detects a string of text and tags the document as containing that text.
  • Tag: A label that reviewers can apply manually to a document for classification purposes.
    Tags do not contain regexes or keywords.
  • Flag: A system-generated or user-generated label indicating that a document may require special treatment.
  • Primary Detector: A detector where the expected ratio between a person and the PI type is 1:1, meaning a single person can have only one value of that PI type and that value can apply to only one person. Examples include Social Security Number and Passport Number.
  • Secondary Detector: A detector where the expected ratio between a person and the PI type is 1:many. Either a single person can have many values of this PI type but a given value applies to only one person, or a single person has only one value but the same value could appear on different entities. Examples include email address and date of birth.
  • Tertiary Detector: A detector where the expected ratio between a person and the PI type is many:many, meaning a single person can have many values of this PI type and a given value can apply to many people. Examples include address and phone number.
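
To make these definitions more concrete, the following Python sketch models a PI detector as a set of regular expressions plus optional keywords, and classifies a PI type as primary, secondary, or tertiary from its person-to-value ratios. It is a hypothetical illustration, not Data Breach Response's implementation. The `PIDetector` class, the `detector_tier` function, the rule that a keyword must appear in the text before regex matches count, and the sample SSN pattern are all assumptions. The AI-model component of real PI detectors is omitted.

```python
import re
from dataclasses import dataclass, field


@dataclass
class PIDetector:
    """Hypothetical PI detector: at least one regex plus optional keywords (no AI model)."""
    name: str
    patterns: list[str]
    keywords: list[str] = field(default_factory=list)

    def __post_init__(self) -> None:
        # Mirrors the documented rule that PI detectors need at least one regex.
        if not self.patterns:
            raise ValueError(f"PI detector {self.name!r} needs at least one regex")
        self._compiled = [re.compile(p) for p in self.patterns]

    def detect(self, text: str) -> list[str]:
        """Return regex matches, but only if one of the keywords (when any are defined) appears.

        The keyword gate is an assumption made for this sketch.
        """
        lowered = text.lower()
        if self.keywords and not any(k.lower() in lowered for k in self.keywords):
            return []
        return [m.group(0) for rx in self._compiled for m in rx.finditer(text)]


def detector_tier(max_values_per_person: int, max_people_per_value: int) -> str:
    """Classify a PI type using the person-to-value ratios from the definitions."""
    many_values = max_values_per_person > 1
    shared_value = max_people_per_value > 1
    if not many_values and not shared_value:
        return "primary"    # 1:1, e.g. Social Security Number
    if many_values and shared_value:
        return "tertiary"   # many:many, e.g. phone number
    return "secondary"      # 1:many in one direction, e.g. email address


# Hypothetical SSN detector used only for illustration.
ssn = PIDetector(
    name="Social Security Number",
    patterns=[r"\b\d{3}-\d{2}-\d{4}\b"],
    keywords=["SSN", "Social Security"],
)

print(ssn.detect("Employee SSN: 123-45-6789"))  # ['123-45-6789']
print(ssn.detect("Invoice 123-45-6789"))        # [] because no keyword is present
print(detector_tier(1, 1), detector_tier(3, 1), detector_tier(4, 4))  # primary secondary tertiary
```

Requiring a keyword is only one possible design choice: it trades some recall for precision by ignoring bare number strings that appear without supporting context.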

Permissions

Settings are available for users assigned the role of Lead.

Viewing detectors

On the Detectors and Tags tab of the Project Settings page, each detector is listed with the following information:

  • Enabled status
    • Enabled: the checkbox is checked.
    • Disabled: the checkbox is unchecked.
  • Name: the name of the detector.

You can also search for specific detectors on the Detectors and Tags tab.

Turning detectors or tags on and off

To turn on a detector, ensure the checkbox next to the corresponding detector is checked.

To turn off a detector, ensure the checkbox next to the corresponding detector is unchecked.

Limitations

The quality of PI detections may be impacted by the quality of unstructured documents.

Data Breach Response uses extracted text to create PI detections on unstructured documents, so the formatting of that text affects how well PI detections perform. The following factors can affect detection performance:

  • Optical character recognition (OCR) quality. If your source data consists of images and you use OCR to generate the text, errors in the generated text can reduce detector performance.
  • Text that lacks standard punctuation or casing.
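
The following Python sketch illustrates why text quality matters. It uses a deliberately simple, hypothetical detector (a strict SSN regex plus a case-sensitive keyword check) assumed for illustration only. Data Breach Response's detectors also use AI models and may handle some of these cases differently. The same Social Security Number is found in clean text but missed when OCR misreads characters, when punctuation is missing, or when casing is nonstandard.

```python
import re

# Hypothetical pattern and keyword; not the product's built-in detector.
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")
KEYWORD = "Social Security"


def naive_detect(text: str) -> list[str]:
    """Case-sensitive keyword check plus a strict regex, to show how text quality matters."""
    if KEYWORD not in text:
        return []
    return SSN_PATTERN.findall(text)


clean = "Social Security Number: 123-45-6789"
ocr_garbled = "Social Security Number: 123-4S-67B9"  # OCR misread 5 as S and 8 as B
no_punct = "Social Security Number 123 45 6789"     # hyphens missing
all_caps = "SOCIAL SECURITY NUMBER: 123-45-6789"    # nonstandard casing

print(naive_detect(clean))        # ['123-45-6789']
print(naive_detect(ocr_garbled))  # []
print(naive_detect(no_punct))     # []
print(naive_detect(all_caps))     # []
```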

Note: Data Breach Response’s PI detectors support only single-language text. If a document includes multiple languages, the detection results may not be accurate. For documents that do not have English as the primary language, run structured analytics language identification against your document set. Data Breach Response will use the primary language identified when running the Data Analysis