Auto-populate fields by training custom ML models
Custom ML Extraction is a way to teach a machine learning model how to auto-populate fields by using previously coded examples as training data.
As an alternative to using a regular expression to auto-populate a field, Custom ML Extraction relies on examples that you code manually with the Contracts send-to-field feature in the viewer. Those examples then serve as training data for a new machine learning model that you build and train from within the same workspace.
Contracts auto-populates fixed-length and long text fields.
Create and train a custom model
To auto-populate fixed-length text and long text field types:
- Open the Contracts Models tab.
- Click New Contracts Model.
- Name your model.
- For Model Type, select ML Extraction.
- Add a description.
- On the Training Notifications card, enter the email addresses, one per line, of the users who should receive a notification when model training completes.
- On the Populate Existing Field card, select the fixed-length text or long text field you want to populate.
- On the Model Training card, select the Field To Train On. This is the field you coded as examples.
  Select a Training source. This is the saved search that holds all the documents you coded with send-to-field.
- For Existing Value Options, if the field you are auto-populating already contains a value, you can control whether Contracts should:
  - Prepend: put the new result before that text.
  - Append: put the new result after that text.
  - Replace: replace that text.
  - Do nothing: leave that text.
  For date fields, you can only replace or do nothing.
- On the Store Additional Text Around Result card, you can select a long text field to store the text of the section in which the auto-populated result is found, if the document is segmented.
  Note: If the document does not have sections, there are some advanced options for storing text around your result.
  - You can tell Contracts to leave the field blank if there is no section.
  - You can tell Contracts to default to a certain number of words before/after the result if there is no section.
  - Even if the document does have sections, you can tell Contracts to ignore them and only look for a certain number of words before or after the result.
- Click Save.
- Click Train Model.
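The four Existing Value Options described above can be pictured with a short Python sketch. This is only an illustration of the described behavior, not how Contracts implements it: the function name, the option strings, the single-space separator, and the handling of an empty field are all assumptions made for the example.

```python
# Hypothetical illustration only; not the actual Contracts implementation.
VALID_OPTIONS = {"prepend", "append", "replace", "do_nothing"}


def apply_existing_value_option(existing: str, result: str, option: str,
                                separator: str = " ") -> str:
    """Return the field value after applying one Existing Value Option.

    `separator` is an assumption; the documentation does not say how the
    new result and the existing text are joined.
    """
    if option not in VALID_OPTIONS:
        raise ValueError(f"Unknown option: {option!r}")
    if not existing:
        # Assumption: the options only matter when the field already has a
        # value, so an empty field simply receives the new result.
        return result
    if option == "prepend":
        return f"{result}{separator}{existing}"
    if option == "append":
        return f"{existing}{separator}{result}"
    if option == "replace":
        return result
    return existing  # "do_nothing" leaves the existing text untouched


if __name__ == "__main__":
    for option in ("prepend", "append", "replace", "do_nothing"):
        value = apply_existing_value_option("Acme Corp", "Globex LLC", option)
        print(f"{option}: {value}")
```

Thinking through the outcome this way before training can help you decide which option suits a field that some reviewers have already coded by hand.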
Run a custom model
To run a custom model:
- Open the Contracts Analysis Profiles tab.
- Click New Contracts Analysis Profile.
- Name your analysis profile.
- Add a description.
- On the Extraction tab, add the model you created.
- Click Save.
To use your analysis profile:
- Go to the Contracts Analysis Sets tab.
- Click New Contracts Analysis Set.
- Name your analysis set.
- Choose the analysis profile you created.
- Select a saved search to run the analysis on.
- Click Analyze.
Frequently asked questions
How do I know if my model is good or not?
The best way to assess the quality of your custom extraction model is to run it on a set of documents and evaluate the results. For a high-level view of how your model performed, you can also look at the Precision/Recall/F1 scores that are auto-populated for each model on the training page. Treat these numbers as a guide, not as a replacement for running the model and evaluating the results on a document set.
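If you want to spot-check a model yourself after running it, the standard definitions of precision, recall, and F1 can be computed from a manual review. The Python sketch below uses those textbook formulas; how Contracts calculates the scores shown on the training page is not described here, and the function name and the review counts are hypothetical.

```python
def precision_recall_f1(true_pos: int, false_pos: int, false_neg: int):
    """Compute precision, recall, and F1 from manual review counts.

    true_pos:  documents where the model populated the correct value
    false_pos: documents where the model populated a wrong value
    false_neg: documents where a value existed but the model missed it
    """
    predicted = true_pos + false_pos
    actual = true_pos + false_neg
    precision = true_pos / predicted if predicted else 0.0
    recall = true_pos / actual if actual else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    # Hypothetical review: 90 correct, 10 wrong, 30 missed.
    p, r, f1 = precision_recall_f1(true_pos=90, false_pos=10, false_neg=30)
    print(f"precision={p:.2f} recall={r:.2f} f1={f1:.2f}")
    # prints: precision=0.90 recall=0.75 f1=0.82
```

Low precision suggests the model is populating wrong values, while low recall suggests it is missing values that are present, which can point you to the kinds of examples to add to the training set.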
If my model isn’t good, how can I improve it?
More diverse training data will help improve the model. The best place to start is with the documents where the model missed a value or populated the wrong one: code those documents correctly and add them to the training set. No model will find data with 100% accuracy, but the more diverse the training data, the better your model's accuracy.
How many training examples do I need to create?
It depends on the complexity of the data point you are trying to extract. Some models can be trained with as few as 50 examples, while others could require 200 or more. The best way to know is to use the Precision/Recall/F1 scores together with your own evaluation of the results after running the model on a document set.
What are some use cases?
Any time you need to populate a text field with a key data point from a contract across a data set, you have a use case for building a custom extraction model to auto-populate that field.