Spreadsheet QC tool

Spreadsheet QC allows a Project Lead to review the spreadsheet column headers detected as PI, change the PI types assigned to them, and see which search terms the machine is tagging in spreadsheets.

With Spreadsheet QC you can:

  • View all column header values and search terms.
  • Change the PI types associated with a header value.
  • Blocklist or whitelist search terms.

User permissions

The Settings page is available to users assigned the Lead role.

To navigate to Spreadsheet QC:

  1. From the Privacy Workflow tab, select the Project settings icon.
  2. Click on Spreadsheet QC.

Headers

Select the Headers tab to view the headers table. The headers table contains the following columns:

  • Check box—use to select multiple header values at a time to make a bulk change.

  • Header Value—the column header value found in the dataset (i.e., the text in the header cell).

  • PI Type Assigned—the Header Value’s predicted or currently assigned PI type (or designation of No PI).

  • # Unreviewed Instances—the number of occurrences of the specific header value in documents that have not yet been reviewed by an annotator.

  • # Reviewed Instances—the number of occurrences of the specific header value in documents that were already annotated.

    Changes to PI types made through Spreadsheet QC do not apply to reviewed instances.

  • Most Common Tag of Reviewed Instances—the PI type that annotators most often added or approved for that header value. Once review has started, the percentage of reviewed instances tagged with that PI type is shown in orange.

  • Change PI Type for Next Round of IF—if the value in the PI Type Assigned column does not seem appropriate for the Header Value, use this dropdown to select the correct PI type. As the column name indicates, changes do not take effect until a round of Incorporate Feedback (IF) has run.

  • See Details—opens a breakdown of all PI types that reviewers have assigned to the Header Value, along with the documents that contain it. The breakdown shows:

    • Status—shows whether a header value was reviewed

    • piTagged—shows the PI type selected for the header value

    • # instances—how many times the header value occurs

    • % of instances—the percentage of the header value's instances that were tagged with that PI type

    • View in Doc List—initially appears as a closed envelope icon. Clicking the icon changes it to an open envelope and opens a sidebar listing the IDs of all documents that contain the header value, with links for reviewing them in the document viewer.
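The Most Common Tag percentage and the % of instances breakdown are simple counts over reviewed instances. How Spreadsheet QC computes them internally is not documented here, so the following is only a minimal Python sketch of that arithmetic, assuming a hypothetical `HeaderInstance` record for each occurrence of a header value. The `tag_breakdown` and `most_common_tag` functions are illustrative names, not part of the tool.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class HeaderInstance:
    # Hypothetical record: one occurrence of a header value in one document.
    document_id: str
    reviewed: bool  # True once an annotator has reviewed the document
    pi_tag: str     # PI type added or approved by the annotator, or "No PI"


def tag_breakdown(instances):
    """Return (pi_type, count, percent) tuples over reviewed instances,
    most common first. Unreviewed instances are ignored."""
    reviewed = [i for i in instances if i.reviewed]
    if not reviewed:
        return []  # review has not started, so there is no percentage to show
    counts = Counter(i.pi_tag for i in reviewed)
    total = len(reviewed)
    return [(tag, n, 100.0 * n / total) for tag, n in counts.most_common()]


def most_common_tag(instances):
    """Return the top (pi_type, count, percent) tuple, or None if nothing is reviewed."""
    breakdown = tag_breakdown(instances)
    return breakdown[0] if breakdown else None
```

For example, a header value with three reviewed instances tagged SSN, one reviewed instance tagged No PI, and one unreviewed instance yields `("SSN", 3, 75.0)`; the unreviewed instance does not count toward the percentage.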

Reviewing header values

You can use Spreadsheet QC features in many ways. The following workflow is one approach to improving PI detection in spreadsheets:

Identify headers that may need to be adjusted

To identify headers that should be adjusted:

  1. Click the # Unreviewed Instances column header to sort the table by that column. Sorting in descending order puts the headers with the most unreviewed occurrences at the top, so changes to these headers have the greatest impact.

  2. Choose a review focus. There are two approaches:

    • Check Header Values that have a predicted PI type, to catch overbroad spreadsheet predictions.

    • Review the No PI results, to find large groups of headers that may need to be associated with a PI type.

  3. If you added custom detectors, you can perform Spreadsheet QC of Header Values for each custom detector:

    • To catch overbroad spreadsheet predictions, check any Header Values already assigned to the custom PI type.

    • To find Header Values that may need to be associated with the custom PI type, review all PI types:

      1. Leave the PI Type filter clear. This returns Header Values that were not predicted as any PI type as well as Header Values that may need to be reassigned from another PI type to the custom PI type.

      2. Filter for Header Values containing words associated with the custom detector.

  4. To filter to a specific PI type, click the filter icon in the top right corner of the table and select the type from the PI Type Assigned dropdown.

  5. To show only Header Values that contain specific text, type the text in the Header Value text box.

  6. To investigate a header further, click the arrow in the See Details column. This lets you look at the specific documents that contain the header value.

  7. Review those documents to decide whether a sweeping change to the header's PI type is justified.
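The triage steps above amount to filtering and sorting the headers table. As a hypothetical sketch in Python, the `HeaderRow` type and `triage` function below are illustrative only and not part of Spreadsheet QC, and the case-insensitive text match is an assumption about how the Header Value filter behaves.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class HeaderRow:
    # Hypothetical row of the headers table.
    header_value: str
    pi_type_assigned: str  # e.g. "SSN" or "No PI"
    unreviewed_instances: int
    reviewed_instances: int


def triage(rows: List[HeaderRow],
           pi_type: Optional[str] = None,
           contains: Optional[str] = None) -> List[HeaderRow]:
    """Apply the optional PI type and header-text filters, then sort by
    unreviewed instances, highest first (the most impactful headers on top)."""
    selected = [
        r for r in rows
        if (pi_type is None or r.pi_type_assigned == pi_type)
        # Assumption: the text filter is a case-insensitive substring match.
        and (contains is None or contains.lower() in r.header_value.lower())
    ]
    return sorted(selected, key=lambda r: r.unreviewed_instances, reverse=True)
```

For example, `triage(rows, pi_type="No PI")` lists the No PI headers with the largest unreviewed counts first, and `triage(rows, contains="emp")` narrows the table to headers mentioning "emp" when checking a hypothetical employee-ID custom detector.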

Adjust PI type assignments for headers

You can adjust PI type assignments for headers in bulk or one header at a time.

To change several Header Values to the same PI type at once (mass editing):

  1. Select the check boxes to the left of the Header Values you want to change.

  2. Select the PI type from the dropdown in the top left corner of the table.

  3. Click Apply to populate the Change PI Type for Next Round of IF column with the desired PI Type.

  4. Click Save at the bottom of the table.

To adjust Header Values individually:

  1. In the Change PI Type for Next Round of IF column, select the appropriate PI type for each header.

  2. Click Save at the bottom of the table.

    Note: After you refresh the page, your saved changes no longer appear on the screen. They remain queued and will be applied during the next Incorporate Feedback run.

After adjusting header values:

  1. Run Incorporate Feedback.
  2. After Incorporate Feedback finishes, the PI Type Assigned column should reflect the changes made in the Change PI Type for Next Round of IF column.
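The save-then-apply behavior described in this section can be modeled as a small queue. This is a hypothetical Python sketch, not Spreadsheet QC's implementation: the `SpreadsheetQCModel` class and its `save_changes` and `incorporate_feedback` methods are illustrative names. It captures three documented rules: Save only queues changes, Incorporate Feedback applies them to PI Type Assigned, and reviewed instances keep their annotators' tags.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Instance:
    # Hypothetical occurrence of a header value in one document.
    document_id: str
    reviewed: bool
    pi_type: str


@dataclass
class SpreadsheetQCModel:
    # Hypothetical model of the headers table and its change queue.
    assigned: Dict[str, str]              # header value -> PI Type Assigned
    instances: Dict[str, List[Instance]]  # header value -> its occurrences
    pending: Dict[str, str] = field(default_factory=dict)  # queued by Save

    def save_changes(self, changes: Dict[str, str]) -> None:
        """Queue bulk or individual changes. Nothing is applied yet."""
        self.pending.update(changes)

    def incorporate_feedback(self) -> None:
        """Apply queued changes to PI Type Assigned and to unreviewed instances."""
        for header, new_type in self.pending.items():
            self.assigned[header] = new_type
            for inst in self.instances.get(header, []):
                if not inst.reviewed:  # reviewed instances keep annotator tags
                    inst.pi_type = new_type
        self.pending.clear()
```

In this model, saving a change for a header leaves its PI Type Assigned value untouched until `incorporate_feedback()` runs. After that, only the header's unreviewed instances pick up the new PI type.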