Data Analysis
Note: Incorporate Feedback has been moved to the Data Analysis tab. The Incorporate Feedback page in Privacy Workflow will be deprecated on March 17, 2025. Please begin using the Data Analysis page prior to this date.
Data Analysis is a combination of machine stages that make predictions, perform calculations, curate machine output, and generate reports.
There are two types of data, and Data Breach Response treats each one differently when finding personal information (PI):
- Structured data—data that is organized in a specific, predefined way, typically in a table with columns and rows, where each data point has a specific data type. Data Breach Response identifies table boundaries and detects header and column content to predict PI.
- Unstructured data—unlabeled or otherwise unorganized data. Detection for unstructured data is currently text based and covers sources such as email and text documents. Additional unstructured sources, such as photos and audio files, are planned for a future state. Data Breach Response uses the context of the document to differentiate between types of PI.
Data Analysis stages
Stage | Description | Can documents be reviewed while the stage is running? | When to run the stage |
---|---|---|---|
Run Blocklisting | The blocklist consists of terms, added with the Blocklisting tool, that will not be detected as PI. PI detectors ignore new detections that match blocklisted terms, and prior detections that match blocklisted terms are marked as Blocklisted and have their links broken. Manually added detections that match blocklisted terms are not removed. If blocklisting is run when the blocklist has not changed, the stage status shows as Skipped in the Progress section. | No | |
Run Unstructured Detectors | Identifies PI by running all enabled PI detectors on unlocked unstructured documents. As soon as the unstructured detectors finish processing a document, that document becomes available for review. | Yes | |
Run Structured Detectors | Identifies PI by running all enabled PI detectors on unlocked structured documents. In addition, all names and PI from structured documents are automatically linked. As soon as the structured detectors finish processing a document, that document becomes available for review. | Yes | |
Run Normalization | Standardizes names and PI into consolidated entities and generates an updated entity report. | No | |
Compile Insights | Calculates and consolidates PI and entity statistics for reporting. | No | |
Running Data Analysis
This section provides instructions for running Data Analysis and lists common use cases when running Data Analysis.
To run Data Analysis:
- Select Run Data Analysis in the console.
- Select the processes to run.
- Click Run to start Data Analysis.
Choosing stages to run
On each run you can configure Data Analysis to run all, or some, stages.
Depending on your goal, it may help to select only some stages. Common use cases when running Data Analysis include:
Case 1: Running Data Analysis for the first time
- Run Blocklisting
- Run Unstructured Detectors
- Run Structured Detectors
- Run Normalization
- Compile Insights
Note: The initial Entity Centric Report is generated from spreadsheet entities only. Entities from unstructured documents will appear on the Entity Centric Report when they are linked in the document viewer.
Case 2: Running Data Analysis during QC review
QC review primarily focuses on refining detectors and potentially blocklisting false hits. At this stage, having an up-to-date Entity Centric Report is not the priority. For this reason and to reduce runtime, run the following stages only:
- Run Blocklisting
- Run Unstructured Detectors
- Run Structured Detectors
- Compile Insights
Case 3: Running Data Analysis during review
Just as in the QC process, detectors may be refined during review. If you want to make detector or blocklist updates during review, run the same stages as in Case 2.
If you only want to generate updated versions of the Reviewer Progress and/or Document Report, run only the following stage:
- Compile Insights
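The stage selections for the use cases above can be summarized as a small lookup table. This is a hypothetical illustration, not a Relativity API. The preset names and `plan_run` are made up, and the sketch assumes stages execute in the order they appear in the Data Analysis stages table, whatever order you select them in.

```python
# Hypothetical illustration: stage names mirror the Data Analysis UI labels.
# This is not a Relativity API; it only models which stages to select.
STAGE_ORDER = [
    "Run Blocklisting",
    "Run Unstructured Detectors",
    "Run Structured Detectors",
    "Run Normalization",
    "Compile Insights",
]

# Stage selections per use case (preset names are assumptions).
PRESETS = {
    "first_run": set(STAGE_ORDER),  # Case 1: all stages
    "qc_review": {  # Case 2: Run Normalization is left out to reduce runtime
        "Run Blocklisting",
        "Run Unstructured Detectors",
        "Run Structured Detectors",
        "Compile Insights",
    },
    "review_reports_only": {"Compile Insights"},  # Case 3: refresh reports only
}
# Case 3 with detector or blocklist updates reuses the Case 2 selection.
PRESETS["review_detector_updates"] = PRESETS["qc_review"]


def plan_run(use_case: str) -> list[str]:
    """Return the stages selected for a use case, in assumed execution order."""
    selected = PRESETS[use_case]
    return [stage for stage in STAGE_ORDER if stage in selected]


if __name__ == "__main__":
    for case in PRESETS:
        print(f"{case}: {plan_run(case)}")
```

Keeping the order in one list means a preset only records *which* stages run. The sequence always comes from the table.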
Case 4: Running Data Analysis during normalization
During the deduplication process, entities may be merged or unmerged, or Deduplication Settings may be updated. Detectors are not typically updated at this stage.
If structured documents have changed since the entity report was last generated (for example, PI was added or removed), include Run Structured Detectors. This stage automatically links names and PI in structured documents to create entities, so running it ensures those links are up to date for the entity report.
Monitoring Data Analysis status
You can monitor a run’s progress on the Data Analysis page, which breaks down each stage into sections that include dashboard summaries, sub-job details, and counts.
Overall progress
Overall progress can be monitored using the Progress section. Statuses can be:
- Not Started—the stage has not begun.
- Still Running—the stage is in the middle of processing.
- Completed—the stage has finished processing successfully.
- Completed with Failures—the stage has finished processing and some items encountered failures during processing.
- Failed—the stage has finished processing and many items encountered failures during processing, so the stage is considered to have failed.
- Skipped—the stage was not run.
- Interrupted—the stage was stopped in the middle of processing.
Blocklisting details
Dashboard numbers
- Stage—the current status of the stage.
- Errors—the number of errors that occurred during blocklisting. You can retry these errors; see Canceling and retrying Data Analysis for details.
If Blocklisting is run but there have been no changes to the blocklist, the status section displays a Skipped status.
PI Detection details
Dashboard numbers
- Ready for Review—the number of documents that have finished processing through the PI Detection stage and can be reviewed. Select View Documents to view these documents on the Project Dashboard. See Data Analysis and document review for more information.
- Structured Documents Completed—the number of structured documents that have finished processing through the PI Detection stage. Structured and unstructured detection run in parallel.
- Unstructured Documents Completed—the number of unstructured documents that have finished processing through the PI Detection stage. Structured and unstructured detection run in parallel.
- Errors—the number of errors that occurred during PI Detection. View Errors will take you to the Project Dashboard to view these errors.
You can retry these errors; see Canceling and retrying Data Analysis for details.
Note: When running Data Analysis, you can choose to run only unstructured or only structured detection. If one is not run, its Documents Completed count remains zero. For example, if only unstructured detection is run, Structured Documents Completed displays zero documents because the structured detectors were not run.
Entity Normalization and Deduplication details
- Address Standardization—standardizes addresses into a consistent format.
- Normalizer—consolidates annotation links and records into entities. Merges entities with the same PI.
Dashboard numbers
- Stage—the substage currently being run
- Completion—percent completion of the stage
- Errors—the number of errors that occurred during Entity Normalization & Deduplication. View Errors will take you to the Project Dashboard to view these errors.
You can retry these errors; see Canceling and retrying Data Analysis for details.
Compile Insights details
- Document Report Generation—creates the Document Report by aggregating PI and entity information on a document level.
- Document Indexing—indexes the database for PI and entity search.
- Table Header Analysis—identifies the review status, the number of instances of a header, and the PI assignment of that header for reporting purposes.
- Precision and Recall—calculates precision and recall. Precision is used to evaluate how accurate a detector’s PI predictions are. Recall is used to evaluate how well a detector is retrieving PI.
- Training—PI detector models are retrained based on user additions, edits, and deletions of PI on unstructured documents.
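For reference, these are the standard definitions of precision and recall. TP counts PI instances a detector found correctly, FP counts detections that are not actually PI (false hits), and FN counts PI the detector missed. The exact matching rules Compile Insights uses are not described here, so treat this as an illustration. The counts in the second line are made up for the example.

```latex
\text{Precision} = \frac{TP}{TP + FP}, \qquad \text{Recall} = \frac{TP}{TP + FN}

% Illustrative counts only: TP = 90, FP = 10, FN = 30
\text{Precision} = \frac{90}{90 + 10} = 0.90, \qquad \text{Recall} = \frac{90}{90 + 30} = 0.75
```

In this example the detector is usually right when it flags something (high precision), but it misses a quarter of the PI that is present (lower recall).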
Dashboard numbers
- Stage—the substage currently being run
- Completion—percent completion of the stage
- Errors—the number of errors that occurred during Compile Insights. View Errors will take you to the Project Dashboard to view these errors.
You can retry these errors; see Canceling and retrying Data Analysis for details.
Data Analysis and document review
While Data Analysis is running, reviewers cannot add, edit, or delete entities or PI on documents. However, to reduce the time before review can begin, the Unstructured and Structured PI Detection stages use a document streaming approach: as each document finishes the PI Detection stage, it becomes available for review. Blocklisting, Entity Normalization & Deduplication, and Compile Insights do not use this approach; all documents must finish processing these stages before they become available for review. Take this into account when selecting which stages to run.
All documents are assigned a Data Analysis Status that indicates their availability for review:
- Data Analysis run required—the initial status after ingestion. Indicates Data Analysis has not yet been run on the document.
- Not ready for review—the document is currently being processed through Blocklisting, PI Detection, or Compile Insights. The status changes to Not ready for review when Data Analysis is run, and reviewers cannot edit the document.
- Running normalizer—the document is currently being processed through Entity Normalization & Deduplication. The status changes to Running normalizer when Entity Normalization & Deduplication is running, and reviewers cannot edit the document.
- Ready for review—the document has finished processing through PI Detection and/or the Data Analysis run is complete. Reviewers are able to edit the document.
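The status rules above can be sketched as a small function. This is a hypothetical illustration, not a Relativity API. The function name and its parameters are assumptions, and the stage names come from the Data Analysis stages table. The sketch also assumes that every document passes through each selected stage and that an idle project's last run finished.

```python
from typing import Optional

# Hypothetical sketch, not a Relativity API. Stage names come from the
# Data Analysis stages table; status strings match the list above.
DETECTION_STAGES = {"Run Unstructured Detectors", "Run Structured Detectors"}


def data_analysis_status(
    has_run_before: bool,
    current_stage: Optional[str],
    detection_done_for_doc: bool = False,
) -> str:
    """Return the Data Analysis Status for one document.

    current_stage is None when no Data Analysis run is in progress.
    """
    if current_stage is None:
        # Idle: either nothing has run yet, or the last run finished.
        if has_run_before:
            return "Ready for review"
        return "Data Analysis run required"
    if current_stage == "Run Normalization":
        return "Running normalizer"
    if current_stage in DETECTION_STAGES and detection_done_for_doc:
        # Document streaming: the document is released as soon as its own
        # PI detection finishes, even while other documents are still running.
        return "Ready for review"
    # Blocklisting, Compile Insights, or detection not yet finished for this doc.
    return "Not ready for review"
```

The only stage-dependent branch that can return Ready for review mid-run is the detection branch. This mirrors the streaming behavior described in this section.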
You can view a document’s Data Analysis Status on the Project Dashboard Document List, and you can search on the field using PI and Entity Search.
Canceling and retrying Data Analysis
You can stop Data Analysis at any time while it is in progress. To stop it, select the Cancel Project button in the Project Actions console.
If a stage fails or Data Analysis is manually stopped, it can be restarted using the Retry button in the Progress card. Data Analysis will restart from the failed or interrupted stage when retrying.
To start a new run, select Run Data Analysis in the Project Actions console.
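The retry behavior described above can be sketched as follows. This is a hypothetical illustration, not a Relativity API. It assumes that a retry resumes at the first stage whose status is Failed or Interrupted and then continues through the later stages of the same run, leaving out stages marked Skipped. `stages_to_retry` and the input format are made up.

```python
# Hypothetical sketch, not a Relativity API. Assumes a retry resumes at the
# first stage whose status is Failed or Interrupted and then continues through
# the later stages of the same run; stages marked Skipped are left out.
RESUMABLE_STATUSES = {"Failed", "Interrupted"}


def stages_to_retry(run: list[tuple[str, str]]) -> list[str]:
    """Given (stage, status) pairs in execution order, return the stages a
    retry would process, or an empty list if nothing needs resuming."""
    for index, (_, status) in enumerate(run):
        if status in RESUMABLE_STATUSES:
            return [stage for stage, later in run[index:] if later != "Skipped"]
    return []


if __name__ == "__main__":
    last_run = [
        ("Run Blocklisting", "Completed"),
        ("Run Unstructured Detectors", "Completed"),
        ("Run Structured Detectors", "Interrupted"),
        ("Run Normalization", "Not Started"),
        ("Compile Insights", "Not Started"),
    ]
    print(stages_to_retry(last_run))
```

Stages that completed before the failure are not repeated, which is why retrying is usually faster than starting a new run.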
Document errors
The Errors field will indicate the number of documents that have errored or encountered an issue during that stage. Select View Errors or the View Errored Documents button in the console to view these documents and their specific errors on the Project Dashboard. For more information on how to address these, see Document Flags.
Data Analysis history
Like other complex features in Relativity, Data Analysis provides a View Run History modal that you can use to audit individual runs.
The following information is available in View Run History:
Run details
- Status—the status of the Data Analysis run
- Duration—the run time
- Start Time—the date and time the run was started
- End Time—the date and time the run ended
Stage history
- Stage—the name of the stage
- Status—the status of the stage
- Start Time—the date and time the stage was started
- End Time—the date and time the stage ended
- Duration—the run time of the stage
Incorporate Feedback
The Incorporate Feedback pipeline is a combination of machine stages that make predictions, perform calculations, curate machine output, and generate reports.
Permissions
Incorporate Feedback is only available for users assigned the role of Lead.
Incorporate feedback pipeline stages
Process | Description | Sub Stages |
---|---|---|
Run Detectors | This process incorporates user feedback to train the machine learning models and then identifies personal information across non-reviewed documents by running all the PI detectors. User annotations on non-reviewed documents will be removed during this step and replaced by machine predictions. | |
Run Excel Detectors | This process incorporates user feedback to train the machine learning models and then identifies personal information across non-reviewed spreadsheet documents by running all PI detectors. | |
Process Excel Detections | Collects all names and PI from spreadsheet documents and creates linkages between them. This must be done at least once to merge individuals and create the Notification Report. | |
Deduplicate Individuals | Merges duplicate individuals based on unique names and personal information. This process reduces the number of rows in the Notification Report. | |
Generate Reports | Generates all the reports on the ‘Report Generation’ page, including the Document Centric and Entity Centric Reports. | |
Navigating to incorporate feedback
Navigate to the Incorporate Feedback tab on the left-side dashboard. Incorporate Feedback is used exclusively by the Project Lead to provide feedback to the model.
Overview
When Incorporate Feedback is running, the Overview tab shows information about in-progress stages of the pipeline. When Incorporate Feedback is not running, it displays information about the last run. To view information about previous runs of the Incorporate Feedback pipeline, use the Select Round to View drop-down.
The Overview tab contains the following information:
- Status—the progress of the Incorporate Feedback pipeline as a whole. The pipeline status can be:
  - In Progress—Incorporate Feedback is running.
  - Completed—Incorporate Feedback has completed successfully.
  - Completed with Failures—Incorporate Feedback has completed with errors.
- Stage—the name of the pipeline stage.
- Start Time—the time a stage started running.
- End Time—the time a stage stopped running.
- Duration—the run time of a stage.
- Progress—an indicator of stage progress when Incorporate Feedback is running.
- Status—the status of a stage. Statuses can be:
  - Not Started—the stage has not begun.
  - Still Running—the stage is in the middle of processing.
  - Completed—the stage has finished processing successfully.
  - Completed with Failures—the stage has finished processing and some items encountered failures during processing.
  - Failed—the stage has finished processing and many items encountered failures during processing, so the stage is considered to have failed.
  - Skipped—the stage was not run.
  - Interrupted—the stage was stopped in the middle of processing.
In progress details
The In Progress Details tab is populated only while the pipeline is running. The information that appears depends on which substages are running.
Document based substages
The following Incorporate Feedback stages process information on a document level:
- SPREADSHEET_REGEX
- Excel Tag Status Processing
- Generate Spreadsheet Linkages
- Overlap Removal
- PDF Annotation Generation
- Document Statistics
- Document Scoring
- Precision and Recall
- Document Indexing
While these stages are running, the In Progress Details tab contains the following information:
- Document ID—the ID of the document being processed
- Duration—how long the document has been processing
Detector based substages
The following Incorporate Feedback stages process information on a detector level:
- Detector Training
- Machine Learning Detection
While the detector based substages are running, the In Progress Details tab will contain the following information:
- Detector—the detector that was run/is being run
- Duration—how long the detector has been running
Blocklisting
The Blocklisting substage processes individual items to blocklist. The In Progress Details tab will contain the following information when Blocklisting is running:
- Blocklist Items—the item to be blocklisted
- Duration—how long the item has been processing
Report based substages
The Report Generation stage is responsible for generating the following reports:
- Document Report
- Entity Centric Report
- Reviewer Progress
- Unlinked PI Log
- Merge Reason
The In Progress Details tab contains the following information while Report Generation is running:
- Report—the name of the report
- Duration—how long the report has been processing
Entity normalizer
While the Normalizer substage is running, the In Progress Details tab will contain the following information:
- Stage—the stage of Normalizer
- Duration—the amount of time the stage was running
Document errors and non-document errors
Document Errors
The Document Errors tab shows documents that encountered errors while Incorporate Feedback was running, and what those errors are. For a detailed description of possible errors and flags and their resolutions, see the Documentation on Errors and Flags. The following information is displayed on this tab:
- Document ID—the ID of the impacted document.
- Document Flags—the error flag applied to the document during Incorporate Feedback.
- Error Message—the error message.
- Detection Stage—the stage in which the document encountered the error.
Non-document errors
The Non-Document Errors tab shows errors that occurred during non-document-based processes. For a detailed description of possible errors and flags and their resolutions, see the Documentation on Errors and Flags. The following information is displayed on this tab:
- Error Type—the type of error that occurred.
- Error Message—the error message.
- Errored Item Name—the item that caused the error.
Running the pipeline
This section provides instructions for running the Incorporate Feedback pipeline and lists common use cases when running the pipeline.
To run Incorporate Feedback:
- Click the Run Incorporate Feedback button.
- Select the processes to run.
- Click Yes to start the pipeline.
Choosing stages to run
On each run you can configure the pipeline to run all, or some, stages.
Depending on your goal, it may help to select only some stages. Common use cases when running the pipeline include:
Case 1: Running the pipeline for the first time
- Run Detectors
- Run Excel Detectors
- Generate Reports
Case 2: Running the pipeline during QC review
QC review primarily focuses on refining detectors and potentially blocklisting false hits. At this stage, having an up-to-date Entity Centric Report is not the priority. For this reason and to reduce the pipeline runtime, run the following stages only:
- Run Detectors
- Run Excel Detectors
- Generate Reports
Case 3: Running the pipeline during review
Just as in the QC process, detectors may be refined during review. If you want to make detector or blocklist updates during review, run the same stages as in Case 2.
If you only want to generate updated versions of the Reviewer Progress and/or Document Report, run only the following stage:
- Generate Reports
Case 4: Running the pipeline during normalization
During the deduplication process, entities may be merged using the Deduplicate and Normalize Entities tool, entities may be unmerged, or Deduplication Settings may be updated. Detectors are not typically updated at this stage. When making changes related to entity normalization, run:
- Process Excel Detections
- Consolidate Individuals to provide the most up-to-date merges between individuals based on manual merges and unmerges and deduplication setting changes.
- Generate Reports to generate an up-to-date version of reports.
Stopping the pipeline
You can stop Incorporate Feedback at any time while it is in progress. To stop the pipeline:
- Select the Stop Pipeline button.
- A modal will appear. Click Yes.
- To restart the pipeline, follow the instructions in Restarting the Pipeline.
Restarting the pipeline
If a stage fails at any time while Incorporate Feedback is running or a user stops the pipeline, you can restart it. There are two ways to restart Incorporate Feedback:
Using Retry Stage
Using the Retry Stage functionality will only restart Incorporate Feedback from the selected stage. To use Retry Stage:
- In the Actions column, open the ellipsis menu for the stage that will be retried.
- Click Retry Stage.
Using Retry
The option to Retry the entire pipeline appears when the whole pipeline has stopped due to failure. Clicking Retry here will restart the pipeline from the failed step.
- Click the Retry button.
- A modal will appear. Click Yes.
Troubleshooting
The Progress indicator for a stage is not updating
The progress bar may appear to stall at a certain percentage, or near completion, for a while. To investigate further:
- Navigate to the In Progress Details tab.
- Observe the list of in-progress documents:
  - If the list is being updated with new documents, the stage is still in progress and needs more time to finish.
  - If the pipeline appears to be stuck on one document, continue with the next step.
- Stop the pipeline. See Stopping the pipeline for instructions.
- Retry the stage. See Using Retry for instructions. If it fails again, contact your project manager.
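The "is the list still updating?" check in the troubleshooting steps can be expressed as a simple heuristic. This is a hypothetical sketch, not a product feature. It assumes you have recorded the document IDs shown on the In Progress Details tab at several successive points in time. The function name, the polling approach and the `min_polls` threshold are all assumptions.

```python
# Hypothetical heuristic, not a product feature. `snapshots` holds the document
# IDs listed on the In Progress Details tab, captured at successive checks;
# how often to check and `min_polls` are assumptions.
def is_stage_stuck(snapshots: list[list[str]], min_polls: int = 3) -> bool:
    """Return True if no new document appeared in the last `min_polls`
    snapshots while at least one document is still listed."""
    if len(snapshots) < min_polls:
        return False  # not enough observations to judge yet
    recent = snapshots[-min_polls:]
    for previous, current in zip(recent, recent[1:]):
        if set(current) - set(previous):
            return False  # new documents appeared, so the stage is progressing
    return bool(recent[-1])  # same documents throughout and still in progress
```

For example, three checks that all show only the same document ID would return `True`. That matches the "stuck on one document" case, where stopping and retrying the stage is the next step.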