Last date modified: 2026-May-01

Personal Information detectors

Personal Information (PI) detectors help you identify and classify sensitive data—such as Social Security numbers, email addresses, or passport numbers—during Data Analysis in aiR for Data Breach Response. You configure which detectors to use from the Settings tab before running Data Analysis. Detector selections affect what PI is identified, how entities are deduplicated, and how individuals appear in Entity Analysis and reporting.

Definitions

The following terms appear throughout the Personal Information detectors documentation:

  • Detector (PI) - A combination of AI models, regular expressions (RegEx), and keywords that identifies strings of text and classifies them as a type of Personal Information.
  • Out of the Box (OOTB) Detectors - Pre-built detectors created for aiR for Data Breach Response.
  • Custom Detectors - Detectors created by users to meet project-specific needs that the OOTB detectors do not cover.
  • Deduplication Identifiers - Settings that determine how entities are normalized during Entity Analysis.

    Deduplication Identifier | Expected Relationship | Description | Examples
    Primary | 1:1 | One PI value belongs to one individual, and one individual has only one value of this type. | Social Security Number, Passport Number
    Secondary | 1:many or many:1 | An individual may have multiple values, or the same value may appear across individuals. | Email Address, Date of Birth
    Tertiary | many:many | Multiple individuals can share multiple values of this PI type. | Address, Phone Number
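The three relationship types in the table can be illustrated with a short Python sketch. This is only an illustration of the 1:1, 1:many or many:1, and many:many categories; it is not how aiR for Data Breach Response performs entity normalization. The function name `classify_relationship` and the sample data are hypothetical.

```python
from collections import defaultdict


def classify_relationship(pairs):
    """Classify (individual, value) pairs by the relationship they exhibit.

    Hypothetical helper for illustration only; it is not part of the product.
    """
    values_per_individual = defaultdict(set)
    individuals_per_value = defaultdict(set)
    for individual, value in pairs:
        values_per_individual[individual].add(value)
        individuals_per_value[value].add(individual)

    # Does any individual hold more than one value of this PI type?
    multi_value = any(len(v) > 1 for v in values_per_individual.values())
    # Does any value appear for more than one individual?
    shared_value = any(len(i) > 1 for i in individuals_per_value.values())

    if multi_value and shared_value:
        return "many:many"
    if multi_value or shared_value:
        return "1:many or many:1"
    return "1:1"


# Sample data invented for this sketch.
ssns = [("alice", "123-45-6789"), ("bob", "987-65-4321")]
emails = [("alice", "a@example.com"), ("alice", "a2@example.com"), ("bob", "b@example.com")]
addresses = [("alice", "1 Main St"), ("bob", "1 Main St"),
             ("alice", "2 Oak Ave"), ("bob", "2 Oak Ave")]

print(classify_relationship(ssns))       # 1:1
print(classify_relationship(emails))     # 1:many or many:1
print(classify_relationship(addresses))  # many:many
```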

Permissions

Settings are available for users assigned the role of Lead.

Viewing detectors

The Detectors table shows the description, status, and other information about each detector.

To view the Detectors table:

  1. Open the Settings tab in aiR for Data Breach Response.
  2. Select the Detectors subtab.
  3. Review the Detectors table to view enabled status, categories, and deduplication identifiers.
  4. Select a detector to view or modify its details.


An image of the Detectors table

Detectors fields

The following fields appear on the Detectors table:

Field | Description
Detector | Names of Out of the Box Detectors and Custom Detectors
Description | A description of each detector
Enabled | Yes/No status. Detectors set to No will not be run during Data Analysis. You can disable detectors to limit PI identification to only the data types relevant to your incident. Disabling unnecessary detectors can reduce noise, improve review efficiency, and simplify entity normalization results.
Created By | Lists whether the detector was created by the system or a user.
Category | Lists the detector category.
Deduplication Identifier | The settings that will be used in entity normalization.

Click on any detector to see the details for that detector. You can sort or filter the list using the filter icon in the top right corner of the table.

Supported Personal Information detectors

aiR for Data Breach Response supports the following out of the box detectors:

Detector Name | Description | Default Status | Category
ABA Routing Number | A nine-digit code used to identify U.S. banks for financial transactions | Off | Financial
Account Number | A number used to identify a specific financial account | On | Financial
Address | A location where a person lives or receives mail | On | Contact
Age | All age terms and phrases | Off | Demographic
Australia Tax File Number (TFN) | A unique number assigned to individuals and organizations in Australia for tax identification purposes | Off | Asia Pacific
Australian Individual Healthcare Identifier (IHI) | A unique number assigned to a medical account in Australia for identification and billing purposes | Off | Asia Pacific
Australian Medicare Provider Number | Identifiers and details for healthcare providers registered with Medicare in Australia | Off | Patient
Credit/Debit Card Expiration | The month and year when a credit card expires | On | Financial
Credit/Debit Card Number | A unique number used to identify a credit card account | On | Financial
Credit/Debit Card Security Code | A short numeric code used to verify credit card transactions | On | Financial
Date of Birth | The full calendar date when a person was born | On | Demographic
Date of Death | The calendar date when a person passed away | On | Patient
Driver License Number | A unique number assigned to a licensed driver | On | North America
Email Address | A person's email address | On | Contact
EU VAT (value added tax) Number | A unique identifier assigned to businesses for Value Added Tax purposes within the European Union. Each EU country has its own format | Off | Financial
Full Name | A person’s complete name, including first and last names | On | Contact
Health insurance number | Someone's health insurance identification number | Off | Patient
International Bank Account Number | A globally recognized number used to identify a bank account for international transactions | Off | Financial
Medical dates of service | Someone's medical appointment or service dates | Off | Patient
Medical provider name | Someone's medical provider or healthcare facility name | Off | Patient
Medical record number | Someone's medical record number | Off | Patient
National ID Number | A government-issued unique identifier used to verify an individual's identity within a country | Off | Identification
Other | Any personal information not covered by standard categories | Off | Other
Partial Credit/Debit Card Number | A portion of a credit card number that does not include the full sequence | Off | Financial
Partial Date of Birth | The year a person was born, without the full birthdate | Off | Demographic
Partial Social Security Number | A portion of a US Social Security Number, not the full sequence | On | Financial
Passport Number | A unique number printed on a passport issued by a government or governing agency | On | Identification
Password | A secret word or phrase used to access a secure account or system | On | Security
Patient account number | A number used to identify an individual patient across multiple records or health systems | Off | Patient
Personal email address | An email address used for personal communication outside of work | Off | Contact
Personal Phone Number | A phone number used for personal communication | On | Contact
PIN | A short numeric code used to verify identity or authorize transactions | Off | Security
Prescription information | Someone's prescription medication information | Off | Patient
US Social Security Number | A government-issued number used for identity and tax purposes in the US | On | North America
UK Electoral Roll Number | A unique identifier assigned to individuals registered to vote in UK elections | Off | Europe, Middle East and Africa
UK National Health Service Number | A unique identifier assigned to patients in the UK's healthcare system | On | Europe, Middle East and Africa
UK National Insurance Number | A unique identifier assigned to UK taxpayers for social security purposes | On | Europe, Middle East and Africa
UK Unique Taxpayer Reference | A unique identifier assigned to UK taxpayers | On | Europe, Middle East and Africa
US Individual Taxpayer ID Number | A unique identifier assigned to foreign individuals who need a US taxpayer identification number | On | North America

Out of the box detectors are pre‑configured to identify common PI types. Detection coverage and accuracy can vary based on document structure, formatting, language, and extracted text quality.

Creating custom detectors

In addition to Out of the Box (OOTB) detectors, you can create custom PI detectors to identify PI types that the OOTB detectors do not cover.

Consider the following before creating custom detectors:

  • Custom detectors must include at least one regular expression.
  • You are responsible for validating the accuracy and scope of custom detectors.
  • Custom detectors participate in entity normalization based on their assigned Deduplication Identifier.

To create a Custom PI Detector:

  1. Click Add Detector in the top right corner of the Detectors table.
  2. Fill out the fields about the new Detector. Name and Category are required.
    If you do not select a value for Deduplication Identifier, it will default to Binary.

    An image of the Add New Detector window
  3. Click Next.
  4. Add Regexes for your Custom Detector.
    1. For more information on regular expressions, see Frequently asked questions.
    2. Specify a Match Group for the regular expression, if necessary.
      • Match group indicates which matching group contains the PI.
        For example, take the following regular expression: (ssn|social security number)\s*+:\s*+(\d{3}-\d{2}-\d{4})
        This regular expression contains two capture groups, (ssn|social security number) and (\d{3}-\d{2}-\d{4}), but only group 2 contains the personal information to be captured. Therefore, the match group would be set to 2.
    3. Click Add.
    4. Repeat as necessary. You can include several regexes for a single Custom Detector.
  5. Add Keywords for your Custom Detector.
    1. Specify a Type for the keyword:
      • Global Keyword - A term that must appear somewhere in the body of the document. If a global keyword is not found in the document, the detector will not return any PI matches.
      • Global Blocklist Keyword - A term that must not appear anywhere in the body of the document. If a global blocklist term is found in the document, the detector will not return any PI matches.
      • Local Keyword - A term that must appear near a PI matched via a regex pattern. You can specify a maximum distance in characters to indicate how far away the term can be on either side of the PI found. If the term is not found within the specified distance, the detector will not return that PI match.
      • Local Blocklist Keyword - A term that must not appear in the vicinity of a PI match. You can specify a maximum distance in characters. If the local blocklist term appears within that distance of a PI match, the PI match will not be returned.
    2. If you select a Local Keyword or a Local Blocklist Keyword, specify a Max Keyword Distance.
      • The Max Keyword Distance dictates how far away a keyword is permitted to be on either side of information found by a regular expression.
      • The default value is 40 characters.
  6. When complete, click Save.

The Custom Detector will now appear in your Detectors list.
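To check a regular expression and its Match Group before adding them to a detector, you can try them against sample text outside the product. The following is a minimal Python sketch of the match group example from step 4. It assumes Python's `re` module rather than the product's own regex engine, and it replaces the possessive quantifier `\s*+` with plain `\s*` because Python supports possessive quantifiers only from version 3.11. The function name `extract_pi` and the sample text are hypothetical.

```python
import re

# The example pattern from step 4, with `\s*+` written as `\s*` for
# compatibility with Python versions before 3.11 (an adaptation for this sketch).
PATTERN = re.compile(r"(ssn|social security number)\s*:\s*(\d{3}-\d{2}-\d{4})")

# Group 2 holds the PI; group 1 holds only the label that precedes it.
MATCH_GROUP = 2


def extract_pi(text, pattern=PATTERN, match_group=MATCH_GROUP):
    """Return the text captured by the chosen match group for every match."""
    return [m.group(match_group) for m in pattern.finditer(text)]


sample = "Employee ssn: 123-45-6789 on file"  # invented sample text
print(extract_pi(sample))                 # ['123-45-6789']
print(extract_pi(sample, match_group=1))  # ['ssn']
print(extract_pi(sample, match_group=0))  # ['ssn: 123-45-6789']
```

With the match group set to 2, only the number is captured. Groups 0 and 1 would include or return the label text instead.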

Custom detectors (regex and keyword-based) currently run only on unstructured documents. They do not run on structured documents such as spreadsheets. If your project requires detecting custom PI in structured documents, plan for manual review or QC of those documents.
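The four keyword types described in step 5 can be sketched in Python as follows. This is a rough illustration under stated assumptions, not the product's implementation: it assumes one keyword per type, case-insensitive keyword comparison, and a distance counted in characters from the start and end of the matched PI. The function name `filter_matches` and the sample text are hypothetical; the 40-character default comes from the documentation above.

```python
import re

DEFAULT_MAX_KEYWORD_DISTANCE = 40  # default value stated in the documentation


def filter_matches(text, pattern, match_group=0,
                   global_keyword=None, global_blocklist=None,
                   local_keyword=None, local_blocklist=None,
                   max_distance=DEFAULT_MAX_KEYWORD_DISTANCE):
    """Apply global and local keyword rules to regex matches (illustrative only)."""
    lowered = text.lower()  # assumption: keywords are compared case-insensitively

    # Global rules apply to the whole document and suppress every match.
    if global_keyword and global_keyword.lower() not in lowered:
        return []
    if global_blocklist and global_blocklist.lower() in lowered:
        return []

    results = []
    for m in re.finditer(pattern, text):
        start, end = m.span(match_group)
        # Text within max_distance characters on either side of the PI.
        before = lowered[max(0, start - max_distance):start]
        after = lowered[end:end + max_distance]

        def is_near(term):
            term = term.lower()
            return term in before or term in after

        # Local rules are evaluated separately for each match.
        if local_keyword and not is_near(local_keyword):
            continue
        if local_blocklist and is_near(local_blocklist):
            continue
        results.append(m.group(match_group))
    return results


# Invented sample text: the second number sits far from "Member ID".
text = "Member ID: 12345678 was issued. " + "x" * 60 + " Ref 87654321"
id_pattern = r"\b\d{8}\b"

print(filter_matches(text, id_pattern))                               # ['12345678', '87654321']
print(filter_matches(text, id_pattern, local_keyword="member id"))    # ['12345678']
print(filter_matches(text, id_pattern, local_blocklist="ref"))        # ['12345678']
print(filter_matches(text, id_pattern, global_blocklist="issued"))    # []
```

Testing candidate keywords and distances against representative sample text in this way can help you validate the scope of a custom detector before you run Data Analysis.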

Editing detectors

To enable or disable a Detector:

  1. Click on the detector you would like to modify.
  2. From the detector detail view, adjust the Enabled toggle.
  3. Click Save.
    An image of the Edit Detector window

Limitations

The quality of PI detections may be impacted by the quality of unstructured documents.

aiR for Data Breach Response uses extracted text to create PI detections on unstructured documents. The formatting of the incoming text will affect the performance of PI detections. The following are examples of what will impact detection performance:

  • Optical character recognition (OCR) quality may affect the quality of detections. If your source data is images and you use OCR technology to generate the text, incorrectly generated text may affect the detector performance.
  • Lack of standard punctuation or casing

aiR for Data Breach Response’s PI detectors support only single-language text. If a document includes multiple languages, the output may not be accurate. For documents whose primary language is not English, run structured analytics language identification against your document set. When running Data Analysis, aiR for Data Breach Response uses the primary language identified.

Custom PI detectors run only on unstructured documents. Structured documents, such as spreadsheets, are not evaluated by custom regex or keyword detectors. Any project-specific PI in structured documents requires manual review or QC.

Frequently asked questions
