Errors and flags
PI Detect uses document flags in a number of ways to denote information about a document. There are three types of document flags:
- System Flags
- TIQ Flags
- User Flags
Many flag types are available by default. Lead users can also create new User Flag types in a project.
System flags
The application logic creates system flags during an Ingestion or Incorporate Feedback (IF) run, and users cannot remove them. System flags mark documents that the application had issues processing. The presence of a system flag may prevent that document from processing in certain pipeline stages.
System default flag types
The following table describes the default system flag types and how each one affects the document.
Flag name | Description |
---|---|
Exceeds Excel Size Limit | The Excel file exceeded the maximum file size supported by PI Detect and was not processed. This document will not have PI predictions and will not be available for review. |
System Technical Issue | The file was impacted by a technical issue that prevented the document from processing for review and reporting. |
The document failed to convert to PDF | A non-spreadsheet native was not converted to PDF and will not be viewable in the PDFTron viewer. PI predictions are still made, and the Document Report will contain information on the document. |
File Extension is not currently supported | The file extension is not supported. |
Exceeds Native File Size Limit | The file is too large. |
Empty Text | The associated text file is empty. |
TIQ flags
The application logic creates TIQ flags during an Ingestion or IF run, and users can remove them. They draw attention to issues encountered when processing a document that may require user action. Each flag is generated by a stage in the pipeline.
TIQ default flag types
Flag name | Stage | Description | Resolution |
---|---|---|---|
Document deduplication could not run on this document | Deduplication | Application runs into an error during document deduplication. | No immediate resolution is available for this error. If it impedes the output of the project, please contact Relativity Support. |
Document text could not be extracted | Extraction | Application runs into an error extracting the raw text from a document. | No immediate resolution is available for this error. If it impedes the output of the project, please contact Relativity Support. |
Document splitting could not be performed on this document | Document Splitting | Application runs into an error splitting the document into sub-documents. This may prevent PI detections from running on the document. | Review the document manually to make sure PI detections are complete. If it impedes the output of the project, please contact Relativity Support. |
Excel header detection did not complete on this document | Excel Header Detection | Application runs into an error performing header detection. This may prevent PI predictions from completing on the document. | Manually review the document to identify personal information. If it impedes the output of the project, please contact Relativity Support. |
Sheet contains conflicting name headers | Excel Header Detection | Multiple columns were tagged as the same name type in a single table. Documents with these results are blocked from the Entity Report. For example, a table cannot have two columns that are "Full Name." This can occur for Full Name, First Name, Middle Name, and Last Name fields. | Review the documents or use Spreadsheet QC to remove duplicate labels; for example, two columns labeled as "Full Name." A table can only contain a single entity per row, and the entity's name is generated from the labels. Multiple labels will create conflicting information and will be excluded from the Entity Report. If all name fields need to be captured correctly, try to create multiple tables so there is only one instance of a label per table. If two tables cannot be created to capture the entity's information, remove the duplicate labels and upload another version of the document in PI Detect to capture the other entity's information. Alternatively, the documents can be reviewed in Relativity. |
Sheet contains conflicting PI Types within single column | Excel Header Detection | A single column has been tagged as multiple conflicting PI types. | Review the documents or use Spreadsheet QC to remove duplicate labels; for example, two columns labeled as "Address 1." A table can only contain a single instance of a label per row. Multiple labels will create conflicting information and will be excluded from the Entity Report. If all PI fields need to be captured correctly, try to create multiple tables so there is only one instance of a label per table. If two tables cannot be created to capture the PI, remove the duplicate labels and upload another version of the document in PI Detect to capture the other PI. Alternatively, the documents can be reviewed in Relativity. |
Excel detection process not complete due to timeout | Excel Header Detection | Excel document takes too long to process. | Manually review the document and turn off any unnecessary detectors. If it impedes the output of the project, please contact Relativity Support. |
Excel header assignment did not complete on this document. | Excel Header Detection | Application runs into an error applying header assignments. | Manually review the document and turn off any unnecessary detectors. If it impedes the output of the project, please contact Relativity Support. |
Spreadsheet tag statistics could not be fully generated | Excel Tag Stats Processing | An error occurred while calculating the spreadsheet statistics used for the Spreadsheet QC experience. | If it impedes the output of the project, please contact Relativity Support. |
Spreadsheet content extraction error | Excel Content Detection | An error occurred while extracting information from the spreadsheet to generate entity records. Individuals from this document may not be available in the Entity Report. | Manually review the document to ensure that the tables and columns are labeled correctly. If the issue persists, please contact Relativity Support. |
Cell values were excluded from the Entity report | Excel Content Detection | Cell values were excluded from the Normalization process and the Entity Report because they failed to match the format of the assigned PI type. For example, a Social Security Number has 5 digits instead of 9. | Use the Spreadsheet Blocklisting tool to decide if some terms should be allowed. Rerun the normalization process to show these results in the Entity Report. If it impedes the output of the project, please contact Relativity Support. |
Detector regex timed out | Keyword and Regex Detection | Keyword or regex takes too long to run. | Review documents manually and update the detector that is timing out. Turn off PI types that are not necessary for the project. |
Detection process not complete due to timeout | Keyword and Regex Detection | Processing a document takes too long. | Change the detector regex to be less computationally expensive. |
Contains errors from the NER Detector | Machine Learning Detection | Application runs into an error during Name Detection. | Manually review the impacted documents. |
Error applying document classifications | Machine Learning Detection | An error occurred while applying document classifications, which may impact the reported accuracy of predictions. | Manually review the impacted documents. |
Personal Information counts could not be calculated | Document Statistics | An error occurred that may cause the PI and name counts to be inaccurately reported. | Manually review the impacted documents and rerun the Incorporate Feedback Pipeline. If it impedes the output of the project, please contact Relativity Support. |
The document could not be scored | Document Scoring | Application runs into an error scoring the document. | Manually review the impacted documents and rerun the Incorporate Feedback Pipeline. If it impedes the output of the project, please contact Relativity Support. |
Overlapped annotation removal could not complete | Overlap Removal | An error occurred while removing overlapping annotations from machine predictions. | Manually review the impacted documents and rerun the Incorporate Feedback Pipeline. If it impedes the output of the project, please contact Relativity Support. |
PDF annotations could not be generated for document | PDF Annotation Generation | Application runs into an error trying to generate annotations on the PDF view of the document. | Contact Relativity Support. |
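The two conflicting-label flags in the table above come down to simple counting checks. The Python sketch below illustrates the idea under stated assumptions. The table representation (a dict mapping each column to the labels assigned to it), the function names, and the reading of "conflicting" as "more than one distinct label" are hypothetical and are not taken from PI Detect.

```python
from collections import Counter

# Name types listed in the "Sheet contains conflicting name headers" row.
NAME_TYPES = {"Full Name", "First Name", "Middle Name", "Last Name"}


def conflicting_name_headers(table):
    """Return name types that are assigned to more than one column.

    `table` is a hypothetical structure: {column_id: [labels on that column]}.
    """
    counts = Counter(
        label
        for labels in table.values()
        for label in set(labels)  # count each column at most once per label
        if label in NAME_TYPES
    )
    return sorted(label for label, n in counts.items() if n > 1)


def columns_with_multiple_pi_types(table):
    """Return columns that carry more than one distinct label.

    Assumption: any two different labels on one column count as a conflict.
    """
    return sorted(col for col, labels in table.items() if len(set(labels)) > 1)


# Illustrative table; "Phone Number" is a made-up label for the example.
example = {
    "A": ["Full Name"],
    "B": ["Full Name"],
    "C": ["Address 1", "Phone Number"],
}
name_conflicts = conflicting_name_headers(example)
column_conflicts = columns_with_multiple_pi_types(example)
```

In this example, `name_conflicts` lists "Full Name" because columns A and B share it, and `column_conflicts` lists column C because it carries two different labels. Removing one label from each conflict, as the Resolution column suggests, would clear both results.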
User flags
Users can create and delete User Flags. They help manage document sets, but the presence of User Flags has no bearing on application logic.
User default flag types
Flag name | Description | Intended usage |
---|---|---|
Needs Further Review | A user has identified a document as needing extra review by a project lead. | Project lead batches out these documents for review, or reviews one by one. |
Technical Issue | A user has identified a document as having some technical issue that prevented them from completing their review. | Documents get reviewed outside of PI Detect. |
Illegible | A user has identified a document as not readable and, therefore, not reviewable. | Leads should make sure that all unreviewed documents are flagged to stakeholders. |
Duplicate | A user has identified a document as a duplicate and may not have completed the review of the document. | Leads should make sure all documents flagged as duplicates, and likely not reviewed, are brought to the attention of project stakeholders. |
Bulk Document | A user has identified a document as a bulk document with lots of PI. It should be reviewed during an alternative review process. | Leads should decide how to handle bulk documents based on the project requirements and either batch out the bulk documents for review or review them outside of PI Detect. |
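The three flag categories differ in who creates the flag and whether a user can remove it. The Python sketch below restates those rules as a small lookup. The class, field, and function names are illustrative assumptions and do not reflect how PI Detect stores flags.

```python
from dataclasses import dataclass
from enum import Enum


class FlagCategory(Enum):
    SYSTEM = "System Flag"
    TIQ = "TIQ Flag"
    USER = "User Flag"


@dataclass(frozen=True)
class FlagRules:
    created_by: str        # "application" or "user"
    user_can_remove: bool  # whether users can remove the flag


# Rules as described in the System flags, TIQ flags, and User flags sections.
RULES = {
    FlagCategory.SYSTEM: FlagRules(created_by="application", user_can_remove=False),
    FlagCategory.TIQ: FlagRules(created_by="application", user_can_remove=True),
    FlagCategory.USER: FlagRules(created_by="user", user_can_remove=True),
}


def can_user_remove(category: FlagCategory) -> bool:
    """Hypothetical helper: look up whether a user may remove a flag."""
    return RULES[category].user_can_remove
```

For example, `can_user_remove(FlagCategory.SYSTEM)` returns `False`, while the same call for a TIQ or User flag returns `True`.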
Stage errors
Stage errors appear in the error message column in the Document or Non-Document based error tabs on the Incorporate Feedback page.
Constant name | Type | Readable error message | Stage | Item (for non-document errors) | Error flag | Description |
---|---|---|---|---|---|---|
METADATA_GENERATION_ERROR | Document Error | Document meta data generation error. | Document Generator | | UNSUPPORTED_EXTENSION, EMPTY_TEXT, EXCEEDS_SIZE_LIMIT | Failed to generate document statistics about the document's PI. |
KEYWORD_REGEX_ERROR | Document Error | Unable to process initial keyword/regexes. | PI Text Detection | | DETECTION_PROCESS_TIMEOUT_FAILURE, JAVA_DETECTORS_FAILURE | An exception was caught during the process, which may cause missing PI. |
PROCESS_SETUP_ERROR | Document Error | An error occurred during the process setup. | PI Text Detection | | DETECTION_PROCESS_TIMEOUT_FAILURE, JAVA_DETECTORS_FAILURE | Before running the global keyword filter process, the application clears all the cache and reloads detector regexes and keywords. This error is logged if something goes wrong during the setup. |
DETECTION_GENERATION_ERROR | Document Error | Unable to generate detections. | Spreadsheet Regex Processing, Excel PI Contents Detector | | EXCEL_ANNOTATION_GENERATION_FAILURE, EXCEL_HEADER_DETECTOR_FAILURE, SPREADSHEET_REGEX_FAILURE | |
DETECTION_PROCESS_TIMEOUT | Document Error | The detection process timed out. | PI Text Detection | | DETECTION_PROCESS_TIMEOUT_FAILURE, JAVA_DETECTORS_FAILURE | |
DUPLICATE_DOC_CHECK_ERROR | Document Error | Unable to determine if the document is a duplicate. | Deduplication | | DEDUPLICATION_FAILURE | |
DOC_HAS_DUPLICATES_CHECK_ERROR | Document Error | Unable to check the document for duplicates. | Deduplication | | DEDUPLICATION_FAILURE | |
DAT_TEXT_PATH_ERROR | Document Error | DAT entry does not include a text path. | Minimal Extractor | | EXTRACTION_FAILURE | |
DAT_TEXT_EXTRACTION_ERROR | Document Error | Unable to extract text from the DAT record. | Minimal Extractor | | EXTRACTION_FAILURE | |
DOC_SIZE_LIMIT_EXCEEDED | Document Error | Document size exceeds the maximum processing limit. | Minimal Extractor | | EXTRACTION_FAILURE | |
NATIVE_FILE_NOT_FOUND | Document Error | Native file not found. | PDF Conversion | | PDF_CONVERSION_FAILURE | |
NATIVE_FILE_INACCESSIBLE | Document Error | The native file at the given path is inaccessible. | PDF Conversion | | PDF_CONVERSION_FAILURE | |
NATIVE_TO_PDF_ERROR | Document Error | The native file is not available in the Viewer. Please use the extracted text. | PDF Conversion | | PDF_CONVERSION_FAILURE | |
NATIVE_PDF_CORRUPT_ERROR | Document Error | The native file is corrupted and not available in the Viewer. Please use the extracted text. | PDF Conversion | | PDF_CONVERSION_FAILURE | |
NATIVE_PDF_CORRUPT_OR_ENCRYPTED | Document Error | The native file is either corrupted or encrypted. Please check if the file can be opened before re-ingesting the data. | PDF Conversion | | PDF_CONVERSION_FAILURE | |
NATIVE_PDF_ENCRYPTED_ERROR | Document Error | The file is encrypted. Please decrypt the file and re-ingest the data or use the extracted text. | PDF Conversion | | PDF_CONVERSION_FAILURE | |
NATIVE_PDF_INVALID_FILE_ERROR | Document Error | The file is not in a valid PDF format. Please use the extracted text. | PDF Conversion | | PDF_CONVERSION_FAILURE | |
NATIVE_PDF_CANNOT_CONVERT_FORMAT | Document Error | The format is not directly supported on this platform. Please manually convert and re-ingest the data. | PDF Conversion | | PDF_CONVERSION_FAILURE | |
BLOCKLISTING_ERROR | Non-Document Error | Unable to process blocklisted terms. | Blocklisting | blacklist_id (from PiiBlacklistedData) | | An interruption occurred during the blocklisting stage, and the term has failed to be blocklisted. |
DETECTION_CLASSIFICATION_ERROR | Non-Document Error | Unable to classify the detections for the document. | Machine Learning Detection | classification_id (from PiiClassificationRows) | | The process for verifying the PI and creating annotations was interrupted, so the detection entry will not show up. |
DETECTION_MODEL_RUN_ERROR | Non-Document Error | Prediction models failed to run for document: | Machine Learning Detection | classification_id (from PiiClassificationRows) | | An exception occurred when trying to load the prediction model. Please contact Relativity Support. |
DETECTION_MODEL_LOAD_ERROR | Non-Document Error | Detection models failed to load. | Machine Learning Detection | classification_id (from PiiClassificationRows), detectorType | | An error occurred when trying to load the detector model. Please contact Relativity Support. |
DETECTION_MODEL_TRAINING_ERROR | Non-Document Error | Detectors failed to train. | Detector Training | detectorType | | An interruption occurred during detector training. A model cannot be generated for the detector. |
APPLY_ANNOTATIONS_ERROR | Document Error | Unable to apply document-level annotations. | Machine Learning Detection | | JAVA_DETECTORS_FAILURE | |
DOCUMENT_SCORER_ERROR | Document Error | Unable to score the document. | Document Scoring | | DOCUMENT_SCORER_FAILURE | |
PII_DOCUMENT_NOT_FOUND | Document Error | Document not found in the database. | Validation, Overlap Removal | | OVERLAP_DETECTOR_FAILURE | |
OVERLAP_REMOVAL_ERROR | Document Error | Overlapping annotations could not be removed. | Overlap Removal | | OVERLAP_DETECTOR_FAILURE | |
DOCUMENT_STAT_ERROR | Document Error | Unable to update document statistics. | Document Statistics | | DOCUMENT_STATS_FAILURE | |
DAT_ENTRY_NOT_FOUND | Document Error | DAT entry not found. | Validation | | | |
TEXT_NOT_FOUND | Document Error | Text file for the document could not be found. | Validation, PDF Conversion | | PDF_CONVERSION_FAILURE | |
MISSING_EXCEL_NATIVE | Document Error | Native file for Excel document could not be found. | Validation | | | |
PDF_ANNOTATIONS_ERROR | Document Error | PDF coordinates were not added to annotations. | PDF Annotation generation | | PDF_ANNOTATION_FAILURE | |
REPORT_GENERATION_ERROR | Non-Document Error | Unable to generate reports. | Report Generation | Report type name | | An interruption occurred during report generation. The specific report failed to generate. |
EXCEL_HEADER_STATS_ERROR | Document Error | Unable to generate header statistics. | Excel Tag Stats Processing | | | |
EXCEL_HEADER_STAT_WRITER_ERROR | Document Error | Unable to write header statistics to the database. | Excel Tag Stats Processing | | | |
EXCEL_HEADER_ASSIGNMENTS_ERROR | Non-Document Error | Unable to process header assignments. | Excel Header Detection | headerValue from PiiExcelHeaderAssignment | | An interruption occurred while inserting or deleting a spreadsheet annotation, and the annotation was not added or deleted. |
EXCEL_STATS_UPDATE_ERROR | Document Error | Unable to update spreadsheet statistics. | Excel Header Detection | | EXCEL_HEADER_ASSIGNMENT_FAILURE | |
PI_MAPPING_ERROR | Document Error | Unable to retrieve PI types and detector data. | Excel PI Contents Detector | | EXCEL_ANNOTATION_GENERATION_FAILURE | |
NATIVE_PII_DOCUMENT_NOT_FOUND | Document Error | Unable to find the native file or document in the database. | Excel PI Contents Detector | | EXCEL_ANNOTATION_GENERATION_FAILURE | |
FILE_READ_ERROR | Document Error | Failed to read the file. | Excel PI Contents Detector | | EXCEL_ANNOTATION_GENERATION_FAILURE | |
UNSUPPORTED_FIL_EXTENSION | Document Error | Unsupported file extension provided. | Excel PI Contents Detector | | EXCEL_ANNOTATION_GENERATION_FAILURE | |
EXCEL_SEARCH_TERMS_ERROR | Non-Document Error | Unable to process search terms. | Excel Header Detection | SearchTermToProcess | | An interruption occurred when processing search term annotations on the spreadsheet. |
EXCEL_SEARCH_TERM_STATS_ERROR | Document Error | Unable to update search term statistics. | Excel Header Detection | | | |
CHANGE_RELATIVITY_STATE_FAILURE | Non-Document Error | Failed to change Relativity state. | Report Generation | "" | | Unable to reset Relativity state for report generation. |
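When triaging many stage errors, it can help to keep the table in a small lookup keyed by constant name. The Python sketch below copies four rows from the table above. The `StageError` structure and the helper functions are hypothetical, PI Detect does not expose them, and the tab names returned are assumed from the "Document or Non-Document" wording above.

```python
from typing import NamedTuple, Tuple


class StageError(NamedTuple):
    error_type: str               # "Document Error" or "Non-Document Error"
    stage: str                    # pipeline stage that raised the error
    error_flags: Tuple[str, ...]  # error flags listed for the row, if any


# A few rows copied from the table above; extend as needed.
STAGE_ERRORS = {
    "DUPLICATE_DOC_CHECK_ERROR": StageError(
        "Document Error", "Deduplication", ("DEDUPLICATION_FAILURE",)),
    "DOCUMENT_SCORER_ERROR": StageError(
        "Document Error", "Document Scoring", ("DOCUMENT_SCORER_FAILURE",)),
    "BLOCKLISTING_ERROR": StageError(
        "Non-Document Error", "Blocklisting", ()),
    "REPORT_GENERATION_ERROR": StageError(
        "Non-Document Error", "Report Generation", ()),
}


def error_tab(constant_name: str) -> str:
    """Return which error tab (assumed naming) should list the error."""
    error = STAGE_ERRORS[constant_name]
    return "Document" if error.error_type == "Document Error" else "Non-Document"


def errors_for_stage(stage: str) -> list:
    """Return the constant names recorded for a given pipeline stage."""
    return sorted(name for name, e in STAGE_ERRORS.items() if e.stage == stage)
```

For instance, `error_tab("BLOCKLISTING_ERROR")` returns `"Non-Document"`, and `errors_for_stage("Deduplication")` returns the deduplication constant recorded in the lookup.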