Last date modified: 2026-Apr-01

Personal Information detectors

From the Settings tab, you can view all Personal Information (PI) detectors and tags, add new detectors and tags, and turn detector tags on and off.

Definitions

The following terms are used throughout the Personal Information detectors documentation:

  • PI Detector - A combination of AI models, regular expressions (regexes), and keywords that detects a string of text and classifies it as a form of Personal Information.
    Custom PI detectors must have at least one regular expression.
  • Primary Deduplication Identifier - Detectors where the expected ratio between a person and the PI type is 1:1, meaning a single person can have only one value of that PI type and that value can apply to only one person. Examples include Social Security Number and Passport Number.
  • Secondary Deduplication Identifier - Detectors where the expected relationship between a person and the PI type is one-to-many in one direction only: either a single person can have many values of this PI type but a given value applies to only one person, or a single person has only one value of this type but the same value can be shared by different people. Examples include email address and date of birth.
  • Tertiary Deduplication Identifier - Detectors where the expected ratio between a person and the PI type is many:many, meaning a single person can have many values of this PI type and a given value can apply to many people. Examples include address and phone number.
  • Out of the Box Detectors - Out of the Box (OOTB) detectors are pre-built detectors created for aiR for Data Breach Response.
  • Custom Detectors - Custom detectors are created by users to meet the particular needs of a project that the OOTB detectors do not cover.
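To make the three deduplication tiers concrete, the following Python sketch classifies a PI type from a sample of observed (person, value) pairs by checking whether any person has more than one value and whether any value is shared by more than one person. It is a hypothetical illustration of the definitions above, not how aiR for Data Breach Response assigns identifiers; the `classify_dedup_tier` function and the sample data are assumptions.

```python
from collections import defaultdict


def classify_dedup_tier(pairs):
    """Map observed (person, value) pairs onto the three deduplication tiers.

    Hypothetical helper for illustration only; it is not part of
    aiR for Data Breach Response.
    """
    values_per_person = defaultdict(set)
    people_per_value = defaultdict(set)
    for person, value in pairs:
        values_per_person[person].add(value)
        people_per_value[value].add(person)

    # Does any person have more than one value of this PI type?
    person_has_many = any(len(v) > 1 for v in values_per_person.values())
    # Is any value shared by more than one person?
    value_is_shared = any(len(p) > 1 for p in people_per_value.values())

    if not person_has_many and not value_is_shared:
        return "Primary"    # 1:1
    if person_has_many and value_is_shared:
        return "Tertiary"   # many:many
    return "Secondary"      # one-to-many in one direction only


# Hypothetical sample data for the examples named in the definitions.
ssn_pairs = [("alice", "123-45-6789"), ("bob", "987-65-4321")]
email_pairs = [("alice", "alice@home.example"), ("alice", "alice@work.example"),
               ("bob", "bob@home.example")]
dob_pairs = [("alice", "1990-01-01"), ("bob", "1990-01-01")]
address_pairs = [("alice", "1 Main St"), ("alice", "2 Oak Ave"), ("bob", "1 Main St")]

for label, pairs in [("SSN", ssn_pairs), ("Email", email_pairs),
                     ("Date of birth", dob_pairs), ("Address", address_pairs)]:
    print(label, classify_dedup_tier(pairs))
```

With these samples the SSN pairs come out Primary, email and date of birth Secondary, and address Tertiary. An observed sample only approximates the expected ratio: a small set of email addresses can easily look 1:1, which is why the tier is a property of the PI type rather than of any one data set.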

Permissions

Settings are available for users assigned the role of Lead.

Viewing detectors

To view the Detectors table, open the Settings tab and click the Detectors subtab.

Detectors fields

The following fields appear on the Detectors table:

  • Detector - The name of the detector, either an Out of the Box detector or a Custom detector.
  • Description - A description of the detector.
  • Enabled - Whether the detector is enabled (Yes/No).
    • Detectors set to No are not run during Data Analysis.
  • Created By - Whether the detector was created by the system or by a user.
  • Category - The detector category.
  • Deduplication Identifier - The setting used in entity normalization.

Click any detector to see its details.

You can sort or filter the list using the filter icon in the top-right corner of the table.
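The Detectors table can be thought of as a list of records whose fields mirror the table columns. The following Python sketch uses a hypothetical `Detector` dataclass (not a product API) to show the rule stated for the Enabled field, that only detectors set to Yes are run during Data Analysis, plus a simple category filter and name sort. The first three rows reuse names, descriptions, and categories from the table below and the deduplication examples from the definitions; the Employee ID row is invented for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Detector:
    """Hypothetical record mirroring the Detectors table columns."""
    name: str
    description: str
    enabled: bool          # Enabled column (Yes/No)
    created_by: str        # "System" or "User"
    category: str
    dedup_identifier: str  # Deduplication Identifier used in entity normalization


def detectors_to_run(detectors: List[Detector]) -> List[Detector]:
    """Keep only enabled detectors; detectors set to No are skipped in Data Analysis."""
    return [d for d in detectors if d.enabled]


def by_category(detectors: List[Detector], category: str) -> List[Detector]:
    """Filter the list by Category and sort the result by detector name."""
    return sorted((d for d in detectors if d.category == category), key=lambda d: d.name)


detectors = [
    Detector("US Social Security Number",
             "A government-issued number used for identity and tax purposes in the US",
             True, "System", "North America", "Primary"),
    Detector("Email Address", "A person's email address",
             True, "System", "Contact", "Secondary"),
    Detector("Address", "A location where a person lives or receives mail",
             True, "System", "Contact", "Tertiary"),
    # Hypothetical custom detector, left at the default Binary identifier and turned off.
    Detector("Employee ID", "Internal staff identifier", False, "User", "Other", "Binary"),
]

print([d.name for d in detectors_to_run(detectors)])
print([d.name for d in by_category(detectors, "Contact")])
```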

Supported Personal Information detectors

aiR for Data Breach Response supports the following out of the box detectors:

Detector Name | Description | Default Status | Category
ABA Routing Number | A nine-digit code used to identify U.S. banks for financial transactions | Off | Financial
Account Number | A number used to identify a specific financial account | On | Financial
Address | A location where a person lives or receives mail | On | Contact
Age | All age terms and phrases | Off | Demographic
Australia Tax File Number (TFN) | A unique number assigned to individuals and organizations in Australia for tax identification purposes | Off | Asia Pacific
Australian Individual Healthcare Identifier (IHI) | A unique number assigned to a medical account in Australia for identification and billing purposes | Off | Asia Pacific
Australian Medicare Provider Number | Identifiers and details for healthcare providers registered with Medicare in Australia | Off | Patient
Credit/Debit Card Expiration | The month and year when a credit card expires | On | Financial
Credit/Debit Card Number | A unique number used to identify a credit card account | On | Financial
Credit/Debit Card Security Code | A short numeric code used to verify credit card transactions | On | Financial
Date of Birth | The full calendar date when a person was born | On | Demographic
Date of Death | The calendar date when a person passed away | On | Patient
Driver License Number | A unique number assigned to a licensed driver | On | North America
Email Address | A person's email address | On | Contact
EU VAT (value added tax) Number | A unique identifier assigned to businesses for Value Added Tax purposes within the European Union. Each EU country has its own format. | Off | Financial
Full Name | A person’s complete name, including first and last names | On | Contact
Health insurance number | Someone's health insurance identification number | Off | Patient
International Bank Account Number | A globally recognized number used to identify a bank account for international transactions | Off | Financial
Medical dates of service | Someone's medical appointment or service dates | Off | Patient
Medical provider name | Someone's medical provider or healthcare facility name | Off | Patient
Medical record number | Someone's medical record number | Off | Patient
National ID Number | A government-issued unique identifier used to verify an individual's identity within a country | Off | Identification
Other | Any personal information not covered by standard categories | Off | Other
Partial Credit/Debit Card Number | A portion of a credit card number that does not include the full sequence | Off | Financial
Partial Date of Birth | The year a person was born, without the full birthdate | Off | Demographic
Partial Social Security Number | A portion of a US Social Security Number, not the full sequence | On | Financial
Passport Number | A unique number printed on a passport issued by a government or governing agency | On | Identification
Password | A secret word or phrase used to access a secure account or system | On | Security
Patient account number | A number used to identify an individual patient across multiple records or health systems | Off | Patient
Personal email address | An email address used for personal communication outside of work | Off | Contact
Personal Phone Number | A phone number used for personal communication | On | Contact
PIN | A short numeric code used to verify identity or authorize transactions | Off | Security
Prescription information | Someone's prescription medication information | Off | Patient
US Social Security Number | A government-issued number used for identity and tax purposes in the US | On | North America
UK Electoral Roll Number | A unique identifier assigned to individuals registered to vote in UK elections | Off | Europe, Middle East and Africa
UK National Health Service Number | A unique identifier assigned to patients in the UK's healthcare system | On | Europe, Middle East and Africa
UK National Insurance Number | A unique identifier assigned to UK taxpayers for social security purposes | On | Europe, Middle East and Africa
UK Unique Taxpayer Reference | A unique identifier assigned to UK taxpayers | On | Europe, Middle East and Africa
US Individual Taxpayer ID Number | A unique identifier assigned to foreign individuals who need a US taxpayer identification number | On | North America

Creating custom detectors

In addition to OOTB detectors, you can create and use custom detectors.

To create a Custom PI Detector:

  1. Click Add Detector in the top right corner of the Detectors table.
  2. Fill out the fields for the new detector. Name and Category are required.
    If you do not select a value for Deduplication Identifier, it defaults to Binary.

  3. Click Next.
  4. Add Regexes for your Custom Detector.
    1. For more information on regular expressions, see Frequently asked questions.
    2. Specify a Match Group for the regular expression, if necessary.
      • Match Group indicates which capturing group contains the PI.
        For example, consider the following regular expression:
        (ssn|social security number)\s*+:\s*+(\d{3}-\d{2}-\d{4})
        This regular expression contains two capturing groups, (ssn|social security number) and (\d{3}-\d{2}-\d{4}), but only group 2 contains the personal information to capture. Therefore, set the Match Group to 2.
    3. Click Add.
    4. Repeat as necessary. You can include several regexes for a single Custom Detector.
  5. Add Keywords for your Custom Detector.
    1. Specify a Type for the keyword:
      • Global Keyword - A term that must appear somewhere in the body of the document. If a global keyword is not found in the document, the detector does not return any PI matches.
      • Global Blocklist Keyword - A term that must not appear anywhere in the body of the document. If a global blocklist term is found in the document, the detector does not return any PI matches.
      • Local Keyword - A term that must appear near a PI matched by a regex pattern. You can specify a maximum distance, in characters, that the term can be on either side of the PI. If the term is not found within that distance, the detector does not return that PI match.
      • Local Blocklist Keyword - A term that must not appear near a PI match. You can specify a maximum distance. If the local blocklist term appears within that distance of a PI match, that PI match is not returned.
    2. If you select Local Keyword or Local Blocklist Keyword, specify a Max Keyword Distance.
      • The Max Keyword Distance is the maximum number of characters a keyword can be from the information found by a regular expression, on either side.
      • The default value is 40 characters.
  6. When complete, click Save.

The Custom Detector will now appear in your Detectors list.
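To show how the regex, Match Group, and keyword settings described in the steps above fit together, the following Python sketch applies a custom detector's rules to a document's extracted text. It is a hypothetical model, not the product's matching engine: the `CustomDetector` and `find_pi` names are invented for illustration, keyword matching is assumed to be a case-insensitive substring test, and when several global or local keywords are listed the sketch assumes any one of them is sufficient. The example regex uses `\s*` rather than the possessive `\s*+` shown above, because Python's `re` module accepts possessive quantifiers only from Python 3.11.

```python
import re
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class CustomDetector:
    """Hypothetical model of the settings entered in the Add Detector window."""
    patterns: List[Tuple[str, int]]  # (regular expression, Match Group)
    global_keywords: List[str] = field(default_factory=list)
    global_blocklist: List[str] = field(default_factory=list)
    local_keywords: List[str] = field(default_factory=list)
    local_blocklist: List[str] = field(default_factory=list)
    max_keyword_distance: int = 40  # documented default, in characters


def _contains(text: str, term: str) -> bool:
    # Assumption: keywords match as case-insensitive substrings.
    return term.lower() in text.lower()


def find_pi(detector: CustomDetector, text: str) -> List[str]:
    """Return the PI strings the detector would report for one document."""
    # Global keyword: assumed that at least one listed term must appear.
    if detector.global_keywords and not any(
            _contains(text, k) for k in detector.global_keywords):
        return []
    # Global blocklist keyword: any listed term suppresses every match.
    if any(_contains(text, k) for k in detector.global_blocklist):
        return []

    found = []
    dist = detector.max_keyword_distance
    for pattern, group in detector.patterns:
        for m in re.finditer(pattern, text):
            start, end = m.span(group)  # position of the Match Group only
            if start == -1:
                continue  # the group did not take part in this match
            # Up to `dist` characters on either side of the captured PI.
            window = text[max(0, start - dist):end + dist]
            if detector.local_keywords and not any(
                    _contains(window, k) for k in detector.local_keywords):
                continue  # no local keyword nearby: drop this match
            if any(_contains(window, k) for k in detector.local_blocklist):
                continue  # local blocklist term nearby: drop this match
            found.append(m.group(group))
    return found


# The example regex from step 4, with Match Group 2.
labeled = CustomDetector(
    patterns=[(r"(ssn|social security number)\s*:\s*(\d{3}-\d{2}-\d{4})", 2)])
print(find_pi(labeled, "Employee ssn: 123-45-6789"))

# A bare number pattern that only counts with "ssn" within 10 characters,
# and never in documents containing "sample data".
bare = CustomDetector(
    patterns=[(r"\b(\d{3}-\d{2}-\d{4})\b", 1)],
    global_blocklist=["sample data"],
    local_keywords=["ssn"],
    max_keyword_distance=10)
print(find_pi(bare, "SSN 111-22-3333. Order number 444-55-6666 follows later."))
```

In the second example only 111-22-3333 is returned: "SSN" lies within 10 characters of it, while no local keyword lies within 10 characters of 444-55-6666. Because the window is measured from both ends of the captured group, lowering Max Keyword Distance narrows which nearby terms count.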

Editing detectors

To enable or disable a Detector:

  1. Click the detector you want to modify.
  2. In the detector detail view, adjust the Enabled toggle.
  3. Click Save.

Limitations

The quality of PI detections may be impacted by the quality of unstructured documents.

aiR for Data Breach Response uses extracted text to create PI detections on unstructured documents, so the formatting of the incoming text affects detection performance. The following factors can affect detection performance:

  • Optical character recognition (OCR) quality. If your source data consists of images and you use OCR technology to generate the text, incorrectly recognized text can reduce detector performance.
  • Lack of standard punctuation or casing.
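The effect of these factors on a purely pattern-based rule can be sketched in Python with the example regex from Creating custom detectors. The sketch only shows why such text can be missed by a regular expression; it makes no claim about how the AI models in aiR for Data Breach Response handle the same input. The sample strings and the `extract_ssn` helper are hypothetical, and the regex uses `\s*` in place of the possessive `\s*+`, which Python's `re` module supports only from Python 3.11.

```python
import re

# Example regex from "Creating custom detectors", with greedy \s* instead of
# the possessive \s*+ (Python's re accepts possessive quantifiers only from 3.11).
SSN_PATTERN = r"(ssn|social security number)\s*:\s*(\d{3}-\d{2}-\d{4})"


def extract_ssn(text, flags=0):
    """Return the captured number (Match Group 2), or None if the pattern misses."""
    match = re.search(SSN_PATTERN, text, flags)
    return match.group(2) if match else None


samples = {
    "clean text": "ssn: 123-05-6789",
    "OCR misread (letter O for zero)": "ssn: 123-O5-6789",
    "nonstandard casing": "SSN: 123-05-6789",
    "missing punctuation": "ssn 123 05 6789",
}
for label, text in samples.items():
    print(f"{label}: {extract_ssn(text)}")

# Case-insensitive matching recovers the casing example, but not the others.
print(extract_ssn(samples["nonstandard casing"], re.IGNORECASE))
```

Only the clean sample matches with the default flags. Making the pattern case-insensitive fixes the casing problem, but an OCR substitution or missing separators still defeats a pattern that expects exact digits and hyphens.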

aiR for Data Breach Response’s PI detectors support only single-language text. If a document includes multiple languages, the output may not be accurate. For documents whose primary language is not English, run structured analytics language identification against your document set. aiR for Data Breach Response uses the identified primary language when it runs Data Analysis.

Frequently asked questions
