Last date modified: 2026-Apr-01
Personal Information detectors
From the Settings tab, you can view all Personal Information (PI) detectors and tags, add new detectors and tags, and turn detector tags on and off.
Definitions
The following terms are found throughout Personal Information detectors documentation:
- Detector (PI)— A combination of AI models, regular expressions (RegExes) and keywords that detect a string of text and classify it as a form of Personal Information.
Custom PI detectors must have at least one regular expression.
- Primary Deduplication Identifier—detectors where the expected ratio between a person and the PI Type is 1:1, meaning a single person can only have one of that PI value and that PI value can only apply to one person. Examples of this include Social Security Number and Passport Number.
- Secondary Deduplication Identifier—detectors where the expected ratio between a person and the PI Type is 1:many, meaning a single person can have many of this PI value but a given PI value only applies to one person OR a single person only has one PI value but the PI value could be the same on different entities. Examples of this include email address and date of birth.
- Tertiary Deduplication Identifier—detectors where the expected ratio between a person and the PI Type is many:many, meaning a single person can have many PI values of this type and a given PI value can apply to many people. Examples of this include address and phone number.
- Out of the Box Detectors - Out of the Box (OOTB) Detectors are pre-built detectors created for aiR for Data Breach Response.
- Custom Detectors - Custom Detectors are created by users to meet the particular needs of a given project that the OOTB offerings do not cover.
Permissions
Settings are available for users assigned the role of Lead.
Viewing detectors
To view the Detectors table, open the Settings tab and click the Detectors subtab.
Detectors fields
The following fields appear on the Detectors table:
- Detector—this includes names of Out of the Box Detectors and Custom Detectors
- Description— a description of each detector
- Enabled—status (Yes/No)
- Detectors set to No will not be run during Data Analysis.
- Created By—lists whether the detector was created by the system or a user.
- Category—lists the detector category
- Deduplication Identifier—the settings that will be used in entity normalization.
Click on any detector to see the details for that detector.
You can sort or filter the list using the filter icon in the top right corner of the table.
Supported Personal Information detectors
aiR for Data Breach Response supports the following out of the box detectors:
| Detector Name | Description | Default Status | Category |
|---|---|---|---|
| ABA Routing Number | A nine-digit code used to identify U.S. banks for financial transactions | Off | Financial |
| Account Number | A number used to identify a specific financial account | On | Financial |
| Address | A location where a person lives or receives mail | On | Contact |
| Age | All age terms and phrases | Off | Demographic |
| Australia Tax File Number (TFN) | A unique number assigned to individuals and organizations in Australia for tax identification purposes | Off | Asia Pacific |
| Australian Individual Healthcare Identifier (IHI) | A unique number assigned to a medical account in Australia for identification and billing purposes | Off | Asia Pacific |
| Australian Medicare Provider Number | Identifiers and details for healthcare providers registered with Medicare in Australia | Off | Patient |
| Credit/Debit Card Expiration | The month and year when a credit card expires | On | Financial |
| Credit/Debit Card Number | A unique number used to identify a credit card account | On | Financial |
| Credit/Debit Card Security Code | A short numeric code used to verify credit card transactions | On | Financial |
| Date of Birth | The full calendar date when a person was born | On | Demographic |
| Date of Death | The calendar date when a person passed away | On | Patient |
| Driver License Number | A unique number assigned to a licensed driver | On | North America |
| Email Address | A person's email address | On | Contact |
| EU VAT (value added tax) Number | A unique identifier assigned to businesses for Value Added Tax purposes within the European Union. Each EU country has its own format | Off | Financial |
| Full Name | A person’s complete name, including first and last names | On | Contact |
| Health insurance number | Someone's health insurance identification number | Off | Patient |
| International Bank Account Number | A globally recognized number used to identify a bank account for international transactions | Off | Financial |
| Medical dates of service | Someone's medical appointment or service dates | Off | Patient |
| Medical provider name | Someone's medical provider or healthcare facility name | Off | Patient |
| Medical record number | Someone's medical record number | Off | Patient |
| National ID Number | A government-issued unique identifier used to verify an individual's identity within a country | Off | Identification |
| Other | Any personal information not covered by standard categories | Off | Other |
| Partial Credit/Debit Card Number | A portion of a credit card number that does not include the full sequence | Off | Financial |
| Partial Date of Birth | The year a person was born, without the full birthdate | Off | Demographic |
| Partial Social Security Number | A portion of a US Social Security Number, not the full sequence | On | Financial |
| Passport Number | A unique number printed on a passport issued by a government or governing agency | On | Identification |
| Password | A secret word or phrase used to access a secure account or system | On | Security |
| Patient account number | A number used to identify an individual patient across multiple records or health systems | Off | Patient |
| Personal email address | An email address used for personal communication outside of work | Off | Contact |
| Personal Phone Number | A phone number used for personal communication | On | Contact |
| PIN | A short numeric code used to verify identity or authorize transactions | Off | Security |
| Prescription information | Someone's prescription medication information | Off | Patient |
| US Social Security Number | A government-issued number used for identity and tax purposes in the US | On | North America |
| UK Electoral Roll Number | A unique identifier assigned to individuals registered to vote in UK elections | Off | Europe, Middle East and Africa |
| UK National Health Service Number | A unique identifier assigned to patients in the UK's healthcare system | On | Europe, Middle East and Africa |
| UK National Insurance Number | A unique identifier assigned to UK taxpayers for social security purposes | On | Europe, Middle East and Africa |
| UK Unique Taxpayer Reference | A unique identifier assigned to UK taxpayers | On | Europe, Middle East and Africa |
| US Individual Taxpayer ID Number | A unique identifier assigned to foreign individuals who need a US taxpayer identification number | On | North America |
Creating custom detectors
In addition to OOTB detectors, you can also create and use custom detectors.
To create a Custom PI Detector:
- Click Add Detector in the top right corner of the Detectors table.
- Fill out the fields about the new Detector. Name and Category are required.
If you do not select a value for Deduplication Identifier, it will default to Tertiary.

- Click Next.
- Add Regexes for your Custom Detector.
- For more information on regular expressions, see Frequently asked questions.
- Specify a Match Group for the regular expression, if necessary.
- Match group indicates which matching group contains the PI.
For example, take the following regular expression: (ssn|social security number)\s*+:\s*+(\d{3}-\d{2}-\d{4})
This regular expression matches two groups, (ssn|social security number), and (\d{3}-\d{2}-\d{4}), but only group 2 contains the personal information to be captured. Therefore, the match group would be set to 2.
- Click Add.
- Repeat as necessary. You can include several regexes for a single Custom Detector.
- Add Keywords for your Custom Detector.
- Specify a Type for the keyword:
- Global Keyword— A global keyword term is a term that must appear somewhere in the body of the document. If a global keyword is not found in the document, the detector will not return PI matches.
- Global Blocklist Keyword— A global blocklist term is a term that must not appear anywhere in the body of the document. If a global blocklist term is found in the document, the detector will not return any PI matches.
- Local Keyword— A local keyword term is a term that must appear near a PI matched via a regex pattern. You can specify a maximum distance in characters to indicate how far away the term should be on either side of the PI found. If the term is not found within the specified distance, the detector will not return that PI match.
- Local Blocklist Keyword— A local blocklist term is a term that must not appear in the vicinity of a PI match. You can specify a maximum distance. If the local blocklist term appears within that distance of a PI match, the PI match will not be returned.
- If you select a Local Keyword or a Local Blocklist Keyword, specify a Max Keyword Distance.
- The Max Keyword Distance dictates how far away a keyword is permitted to be on either side of information found by a regular expression.
- The default value is 40 characters.
- When complete, click Save.
The Custom Detector will now appear in your Detectors list.
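The way a regex, its match group, and keywords combine can be sketched in Java. This is a minimal illustration, not the product's implementation: the class name `CustomDetectorSketch`, the method `detect`, the sample text, and the case-sensitive keyword comparison are all assumptions made for the example. Only the regular expression, the match group value of 2, and the default distance of 40 characters come from the steps above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch of regex + match group + keyword filtering.
class CustomDetectorSketch {
    // Regex from the match group example; group 2 holds the PI.
    static final Pattern SSN_PATTERN = Pattern.compile(
        "(ssn|social security number)\\s*+:\\s*+(\\d{3}-\\d{2}-\\d{4})");
    static final int MATCH_GROUP = 2;

    // globalBlocklist and localKeyword may be null, meaning "not configured".
    static List<String> detect(String text, String globalBlocklist,
                               String localKeyword, int maxDistance) {
        List<String> hits = new ArrayList<>();
        // Global Blocklist Keyword: if it appears anywhere, return no matches.
        if (globalBlocklist != null && text.contains(globalBlocklist)) {
            return hits;
        }
        Matcher m = SSN_PATTERN.matcher(text);
        while (m.find()) {
            int start = m.start(MATCH_GROUP);
            int end = m.end(MATCH_GROUP);
            // Local Keyword: look up to maxDistance characters on either side of the PI.
            String before = text.substring(Math.max(0, start - maxDistance), start);
            String after = text.substring(end, Math.min(text.length(), end + maxDistance));
            if (localKeyword == null
                    || before.contains(localKeyword)
                    || after.contains(localKeyword)) {
                hits.add(m.group(MATCH_GROUP)); // keep only the PI, not the label
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        String text = "Payroll record, ssn: 123-45-6789";
        System.out.println(detect(text, null, "record", 40)); // [123-45-6789]
        System.out.println(detect(text, null, "record", 3));  // []
    }
}
```

In this sketch, a Global Keyword and a Local Blocklist Keyword would be the mirror images of the two checks shown: the first requires the term to be present in the document, and the second rejects a match when the term falls inside the distance window.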
Editing detectors
To enable or disable a Detector:
- Click on the detector you would like to modify.
- From the detector detail view adjust the Enabled toggle.
- Click Save.

Limitations
The quality of PI detections may be impacted by the quality of unstructured documents.
aiR for Data Breach Response uses extracted text to create PI detections on unstructured documents. The formatting of the incoming text will affect the performance of PI detections. The following are examples of what will impact detection performance:
- Optical character recognition (OCR) quality may affect the quality of detections. If your source data is images and you use OCR technology to generate the text, incorrectly generated text may affect the detector performance.
- Lack of standard punctuation or casing
aiR for Data Breach Response’s PI detectors only support single-language text. If a document includes multiple languages, the output may not be accurate. For documents that do not have English as the primary language, run structured analytics language identification against your document set. aiR for Data Breach Response will use the primary language identified when running Data Analysis.
Frequently asked questions
RegEx is a string of characters that represents a pattern. You can use RegEx to search for text that matches these patterns. For example, to detect Employee IDs that consist of 2 capital letters followed by 5 digits, you could create a custom detector using the following expression: \b([A-Z]{2}[\d]{5})\b
Where:
- \b represents a word boundary
- [A-Z]{2} represents two capital letters in the range A to Z
- [\d]{5} represents 5 digits
- The parentheses ( ) are put around the token we want to capture as the ID
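As a rough illustration of how the expression above behaves, the following Java sketch runs it against a sample sentence. The class name `EmployeeIdCapture`, the method `findIds`, and the sample IDs are hypothetical and used only for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch: apply the FAQ expression and read capture group 1.
class EmployeeIdCapture {
    // Backslashes are doubled because this is a Java string literal.
    static final Pattern EMPLOYEE_ID = Pattern.compile("\\b([A-Z]{2}[\\d]{5})\\b");

    static List<String> findIds(String text) {
        List<String> ids = new ArrayList<>();
        Matcher m = EMPLOYEE_ID.matcher(text);
        while (m.find()) {
            ids.add(m.group(1)); // group 1 is the token inside the parentheses
        }
        return ids;
    }

    public static void main(String[] args) {
        // The second token is longer than the pattern, so the word
        // boundaries prevent a match inside it.
        System.out.println(findIds("Badge AB12345 issued; ref XAB123456 ignored."));
        // prints [AB12345]
    }
}
```

The word boundaries are what keep the expression from matching a seven-character slice of a longer token.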
Example scenario
For a particular project, it may be important to identify Employee IDs. Employee ID is not an out of the box detector, so building a custom detector is required.
Employee IDs look like the following:
- N68020KL
- E93400PE
In other words, they are all in the form of one capital letter followed by 5 digits, followed by two more capital letters.
Then, the corresponding regex would be: [A-Z]\d{5}[A-Z]{2}
Following the steps described in “Testing RegExes and Keywords,” you can use the interface to test whether this regex works:
In the box in the bottom right-hand corner, the text says:
- Detected PI 0:
- N68020KL
This indicates that the regex successfully recognizes Employee ID’s.
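The same check can be approximated outside the interface with a few lines of Java. This is only a sketch: the class name `EmployeeIdScenario`, the method `detect`, and the sample sentences are assumptions, while the regex and the two sample IDs come from the scenario above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch: test the scenario regex against sample text.
class EmployeeIdScenario {
    static final Pattern EMPLOYEE_ID = Pattern.compile("[A-Z]\\d{5}[A-Z]{2}");

    static List<String> detect(String text) {
        List<String> found = new ArrayList<>();
        Matcher m = EMPLOYEE_ID.matcher(text);
        while (m.find()) {
            found.add(m.group()); // no capture group, so take the whole match
        }
        return found;
    }

    public static void main(String[] args) {
        System.out.println(detect("Employee N68020KL reported to E93400PE."));
        // prints [N68020KL, E93400PE]
        System.out.println(detect("Token XN68020KLM"));
        // prints [N68020KL]
    }
}
```

The second call shows that, as written, the regex also matches inside a longer token. If that is not wanted for a given project, wrapping the pattern in \b word boundaries, as in the earlier Employee ID expression, is one possible way to tighten it.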
RegEx recommendations
- Avoid the * character when creating regexes, as it can result in performance issues.
- aiR for Data Breach Response uses the Java 8 version of RegEx.
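One possible way to follow the first recommendation is to replace * with a bounded quantifier such as {0,5}. The Java sketch below compares the two forms using the Social Security Number expression from the match group example. The class name `BoundedQuantifierSketch`, the method `extract`, and the upper limit of 5 are assumptions chosen for illustration, not product guidance.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch: the same expression with and without the * character.
class BoundedQuantifierSketch {
    // Original form, using * (here as the possessive quantifier *+).
    static final Pattern UNBOUNDED = Pattern.compile(
        "(ssn|social security number)\\s*+:\\s*+(\\d{3}-\\d{2}-\\d{4})");
    // Bounded form: at most 5 whitespace characters on each side of the colon.
    static final Pattern BOUNDED = Pattern.compile(
        "(ssn|social security number)\\s{0,5}:\\s{0,5}(\\d{3}-\\d{2}-\\d{4})");

    // Returns the PI in group 2, or null when the pattern does not match.
    static String extract(Pattern pattern, String text) {
        Matcher m = pattern.matcher(text);
        return m.find() ? m.group(2) : null;
    }

    public static void main(String[] args) {
        System.out.println(extract(BOUNDED, "ssn : 123-45-6789"));            // 123-45-6789
        System.out.println(extract(BOUNDED, "ssn:          123-45-6789"));   // null
        System.out.println(extract(UNBOUNDED, "ssn:          123-45-6789")); // 123-45-6789
    }
}
```

The trade-off is that a bounded quantifier stops matching once the gap exceeds the limit, so the limit should reflect how the data is actually formatted.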