Personal Information detectors

From the Project Settings page, you can view all detectors and tags, add new detectors and tags, and turn detectors and tags on and off.

Definitions

  • Detector (PI): A combination of AI models, regular expressions (regexes), and keywords that detect a string of text and classify it as a form of Personal Information.
    PI detectors must have at least one regular expression.
  • Detector (Document): A combination of keywords that detect a string of text and tag the document as containing the text. Also referred to as a document category.
  • Tag: A label reviewers can apply manually to a document for classification purposes.
    Tags do not contain regexes or keywords.
  • Flag: A system or user-generated label indicating a document may require special treatment.
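
To make the PI detector definition above more concrete, the following is a minimal Python sketch of a detector that pairs one regular expression with context keywords. It is an illustration only, not Data Breach Response's implementation: the `Detector` class, the `detect` function, the SSN-style pattern, the keyword list, and the 40-character context window are all hypothetical, and the AI model component is left out.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Detector:
    """Hypothetical PI detector: one regex plus optional context keywords."""
    name: str
    pattern: str
    keywords: list[str] = field(default_factory=list)
    enabled: bool = True


def detect(detector: Detector, text: str, window: int = 40) -> list[str]:
    """Return regex matches that have a keyword within `window` characters.

    If the detector has no keywords, every regex match is returned.
    A disabled detector returns nothing.
    """
    if not detector.enabled:
        return []
    hits = []
    for match in re.finditer(detector.pattern, text):
        start = max(0, match.start() - window)
        context = text[start:match.end() + window].lower()
        if not detector.keywords or any(k.lower() in context for k in detector.keywords):
            hits.append(match.group())
    return hits


# Assumed example pattern: a US Social Security number written as 3-2-4 digits.
ssn = Detector(
    name="US SSN (example)",
    pattern=r"\b\d{3}-\d{2}-\d{4}\b",
    keywords=["ssn", "social security"],
)

print(detect(ssn, "Employee SSN: 123-45-6789"))      # keyword nearby: match kept
print(detect(ssn, "Order ref 123-45-6789 shipped"))  # no keyword nearby: match dropped
```

In this sketch the regex finds candidate strings and the keywords act as a context filter, which is one plausible way to combine the two; the product may combine them differently.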

Permissions

Settings are available to users assigned the Lead role.

Viewing detectors

On the Detectors and Tags tab of the Project Settings page, each detector is listed with the following:

  • Enabled status
    • Enabled: checked box
    • Disabled: unchecked box
  • Name: the name of the detector

You can also search for specific detectors on the Detectors and Tags tab.

Turning detectors or tags on and off

To turn on a detector or tag, select the checkbox next to it.

To turn off a detector or tag, clear the checkbox next to it.

Limitations

The quality of PI detections may be impacted by the quality of unstructured documents.

Data Breach Response uses extracted text to create PI detections on unstructured documents. The formatting of the incoming text will affect the performance of PI detections. The following are examples of what will impact detection performance:

  • Optical character recognition (OCR) quality. If your source data consists of images and you use OCR to generate the text, incorrectly recognized characters may reduce detector performance.
  • Lack of standard punctuation or casing.
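
The two examples above can be illustrated with a short Python sketch. It assumes a hypothetical, case-sensitive date-of-birth pattern (`DOB_PATTERN`) and a hypothetical helper `find_dob`; neither is taken from Data Breach Response. The sketch only shows why pattern-based detection is sensitive to how the extracted text looks.

```python
import re

# Hypothetical pattern for illustration: a "Date of Birth:" label followed by
# a date in MM/DD/YYYY form. The pattern is deliberately case-sensitive.
DOB_PATTERN = re.compile(r"Date of Birth:\s*(\d{2}/\d{2}/\d{4})")


def find_dob(text: str) -> list[str]:
    """Return every date that follows a 'Date of Birth:' label in the text."""
    return DOB_PATTERN.findall(text)


samples = {
    "clean text": "Date of Birth: 04/12/1985",
    # OCR misread the digit 0 as the letter O and the digit 1 as the letter l.
    "OCR errors": "Date of Birth: O4/l2/1985",
    # No capital letters and no colon after the label.
    "no casing or punctuation": "date of birth 04/12/1985",
}

for label, text in samples.items():
    print(f"{label}: {find_dob(text)}")
```

Only the clean sample produces a match. The other two contain the same information for a human reader, but the misread characters and the missing casing and punctuation prevent the pattern from matching.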

Note: Data Breach Response’s PI detectors only support single-language text. If a document includes multiple languages, the output may not be accurate. For documents that do not have English as the primary language, run structured analytics language identification against your document set. Data Breach Response will use the primary language identified when running the AI Incorporate Feedback Process.