Auto-populate fields by training custom ML models
Custom ML Extraction is a way to teach a machine learning model how to auto-populate fields by using previously coded examples as training data.
As an alternative to using a regular expression to auto-populate a field, Custom ML Extraction relies on the send-to-field feature in the Contracts viewer, which you use to manually populate fields for many example documents. Those examples then serve as training data for a new machine learning model that you build and train within the same workspace.
Contracts auto-populates fixed-length and long text fields.
Create and train a custom model
To auto-populate fixed-length text and long text field types:
- Open the Contracts Models tab.
- Click New Contracts Model.
- Name your model.
- For Model Type, select ML Extraction.
- Add a description.
- On the Training Notifications card, enter email addresses, one per line, for the users who should receive a notification when model training completes.
- On the Populate Existing Field card, select the fixed-length text or long text field you want to populate.
- On the Model Training card, select the Field To Train On. This is the field you coded as examples.
  Select a Training source. This is the saved search that holds all the documents for which you used send to field.
- For Existing Value Options, if there is already a value in the field you are auto-populating, choose whether Contracts should:
  - Prepend: put the new result before that text.
  - Append: put the new result after that text.
  - Replace: replace that text.
  - Do nothing: leave that text.
  For date fields, you can only replace or do nothing.
- On the Store Additional Text Around Result card, you can select a long text field to store the text of the section, if segmented, in which the auto-populated result is found.
  Note: If the document does not have sections, there are some advanced options for storing text around the extracted result.
  - You can tell Contracts to leave the field blank if there is no section.
  - You can tell Contracts to default to a certain number of words before/after the result if there is no section.
  - Even if the document does have sections, you can tell Contracts to ignore them and only look for a certain number of words before or after the result.
- Click Save.
- Click Train Model.
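The Existing Value Options and Store Additional Text Around Result settings described in the steps above can be pictured with a small sketch. This is only an illustration of the described behavior, not Contracts' actual implementation: the function names, the option strings, the space separator, the word-index representation of a section, and the single `window` parameter are all assumptions made for the example.

```python
from typing import List, Optional, Tuple


def merge_existing_value(existing: str, new: str, option: str, sep: str = " ") -> str:
    """Combine a field's current text with a newly extracted result.

    `option` mirrors the four Existing Value Options. The separator is an
    assumption; the source does not say how the two values are joined.
    """
    if not existing:
        # The options only matter when the field already holds a value.
        return new
    if option == "prepend":
        return new + sep + existing
    if option == "append":
        return existing + sep + new
    if option == "replace":
        return new
    if option == "do_nothing":
        return existing
    raise ValueError(f"unknown option: {option}")


def text_around_result(
    words: List[str],
    start: int,
    end: int,
    section: Optional[Tuple[int, int]] = None,
    ignore_sections: bool = False,
    window: Optional[int] = None,
) -> str:
    """Return the text to store around a result located at words[start:end].

    `section` is a hypothetical (first_word, last_word_exclusive) span for the
    section containing the result, or None if the document is not segmented.
    `window` is the number of words to keep on each side when no section is
    used; None means "leave the field blank".
    """
    if section is not None and not ignore_sections:
        sec_start, sec_end = section
        return " ".join(words[sec_start:sec_end])
    if window is None:
        return ""
    lo = max(0, start - window)
    hi = min(len(words), end + window)
    return " ".join(words[lo:hi])
```

For instance, with an existing value of "old" and a new result of "new", the prepend option in this sketch yields "new old", and the do-nothing option leaves "old" unchanged.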
Run a custom model
To run a custom model:
- Open the Contracts Analysis Profiles tab.
- Click New Contracts Analysis Profile.
- Name your analysis profile.
- Add a description.
- On the Extraction tab, add the model you created.
- Click Save.
To use your analysis profile:
- Go to the Contracts Analysis Sets tab.
- Click New Contracts Analysis Set.
- Name your analysis set.
- Choose the analysis profile you created.
- Select a saved search to run the analysis on.
- Click Analyze.
Frequently asked questions
The best way to assess the quality of your custom extraction model is to run it on a set of documents and evaluate the results. You can also gauge, at a high level, how your model did by looking at the Precision/Recall/F1 scores auto-populated for each model on the training page. Note that these numbers are a guide and should not be a replacement for running the model to evaluate the results on a document set.
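As a reminder of what the Precision/Recall/F1 scores mean, the sketch below uses the standard textbook definitions. How Contracts counts a correct or missed extraction internally is not stated here, so the function name and the count arguments are assumptions for illustration only.

```python
from typing import Tuple


def precision_recall_f1(
    true_positives: int, false_positives: int, false_negatives: int
) -> Tuple[float, float, float]:
    """Compute precision, recall, and F1 from raw counts.

    precision = correct extractions / all extractions the model made
    recall    = correct extractions / all values that should have been found
    F1        = harmonic mean of precision and recall
    A ratio with a zero denominator is reported as 0.0.
    """
    predicted = true_positives + false_positives
    actual = true_positives + false_negatives
    precision = true_positives / predicted if predicted else 0.0
    recall = true_positives / actual if actual else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1
```

A model that extracts 50 values, 40 of them correct, while missing another 40 values it should have found, has high precision (0.8) but modest recall (0.5). That pattern suggests adding more varied training examples of the missed cases.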
More diverse training data will help improve the model. The best place to start is by fixing issues where the model missed or made mistakes. While models will not find data with 100% accuracy, the more diverse the data, the better your model's accuracy.
You can create, train, and run as many models as you want. You are only limited by the number of Contracts Training Agents, as this is the agent that does the training work.
There is no way to differentiate between the two aside from manually coding a separate field for tracking purposes.
Fixed-length text and long text fields are supported. Support for more field types is coming in future releases.
You can only use models in the same workspace in which you trained them. You can use those models as many times as you want.
It depends on the complexity of the data point you are trying to extract. Some models can be trained with as few as 50 examples, while others could require 200 or more. The best way to know is to use the Precision/Recall/F1 scores together with your own evaluation of the results after running the model on a document set.
Yes, Contracts supports populating the same field that was used for training.
Many users can gather training data and manage model administration.
You can use whatever documents you want to train your model. Using client data on the project you are working on will be best, as it represents the data set you’d be using to run your model.
Any time you need to populate a text field with a key data point from a contract across a data set, you have a use case for building a custom extraction model to auto-populate that field.