

Unlike traditional search methods such as dtSearch, Analytics is an entirely mathematical approach to indexing documents. It does not use any outside word lists, such as dictionaries or thesauri, and it is not limited to a specific set of languages. Unlike textual indexing, it does not take word order into account.
The basis of conceptual analytics is a conceptual index. This uses Latent Semantic Indexing (LSI) to discover concepts between documents. This indexing process is based solely on term co-occurrence. The language, concepts, and relationships are defined entirely by the contents of your documents and learned by the index. For more information, see Analytics and Latent Semantic Indexing (LSI).
Note: Classification indexes and Active Learning have been replaced by Review Center. The information on this page applies to conceptual indexes.
You can run the following Analytics operations on documents indexed by a conceptual index:
LSI is a wholly mathematical approach to indexing documents. Instead of using any outside word lists, such as a dictionary or thesaurus, LSI leverages sophisticated mathematics to discover term correlations and conceptuality within documents. LSI is language-agnostic, meaning that you can index any language and it learns that language. LSI enables Relativity Analytics to learn the language and, ultimately, the conceptuality of each document by first processing a set of data called a training data source. The training data source may be the same as the set of documents that you want to index or categorize. Alternatively, it may be a subset of these documents, or it could be a completely different set of documents. This training data source is used to build a concept space in the Analytics index.
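As a rough illustration of what learning from term co-occurrence means, the sketch below counts how often pairs of terms appear in the same document. It is not LSI itself (LSI additionally applies a singular value decomposition to a term-document matrix) and it is not Relativity's implementation. The sample documents and the function name `cooccurrence_counts` are hypothetical.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(documents):
    """Count how many documents each unordered pair of terms shares."""
    counts = Counter()
    for text in documents:
        # Word order is ignored: each document is reduced to a set of terms.
        terms = sorted(set(text.lower().split()))
        for pair in combinations(terms, 2):
            counts[pair] += 1
    return counts


# Hypothetical training documents, for illustration only.
training_docs = [
    "contract breach damages",
    "contract damages settlement",
    "football game score",
]

pairs = cooccurrence_counts(training_docs)
print(pairs[("contract", "damages")])   # pair seen together in two documents
print(pairs[("contract", "football")])  # pair never seen together
```

Terms that repeatedly appear together end up with high counts, which is the kind of signal a co-occurrence-based index can learn from without any dictionary.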
Using LSI, Analytics inspects all the meaningful terms within a document and uses this holistic inspection to give the document a position within a spatial index. The benefits of this approach include the following:
When you create an Analytics index, Relativity uses the training data source to build a mathematical model called a concept space. The documents you are indexing or categorizing can be mapped into this concept space. While this mathematical concept space is many-dimensional, you can think of it in terms of a three-dimensional space. The training data source enables the system to size the concept space and create the algorithm to map searchable documents into the concept space. In the concept space, documents that are closer together are more conceptually similar than documents that are further from each other.
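To make the three-dimensional picture concrete, the sketch below places a few documents at made-up coordinates and compares them with cosine similarity. The coordinates, the document names, and the choice of cosine similarity are assumptions for illustration; this page does not state which measure the Analytics engine uses.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical positions in a three-dimensional concept space.
concept_space = {
    "lease_agreement": (0.9, 0.1, 0.0),
    "rental_contract": (0.8, 0.2, 0.1),
    "match_report": (0.0, 0.1, 0.9),
}

close = cosine_similarity(
    concept_space["lease_agreement"], concept_space["rental_contract"]
)
far = cosine_similarity(
    concept_space["lease_agreement"], concept_space["match_report"]
)
print(close > far)
```

In this toy space the two contract-like documents point in nearly the same direction, so they score as more similar than the lease and the match report.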
Throughout Analytics, item similarity is measured using a rank value. Depending on the feature, this value may be called a coherence score, a rank, or a threshold. In every case, the number has the same meaning.
Because the Analytics engine builds a spatial index, every document has a spatial relationship to every other document. Additionally, every term has a spatial relationship to every other term.
The concept rank is an indication of distance between two items. In the Categorization feature, it indicates the distance between the example document and the resulting document. In Keyword Expansion, it indicates the distance between the two words. The rank does not indicate a percentage of relevance.
For example, when running a concept search, the document that is closest to the query is returned with the highest conceptual score. A higher score means the document is more conceptually related to the query. Remember that this number is not a confidence score or a percentage of shared terms; it is a measurement of distance.
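The ordering behavior described above can be sketched as follows. The scores are invented, and `rank_results` and `minimum_rank` are hypothetical names. The sketch only shows results being ordered by score and optionally cut off at a threshold, not how the engine computes the scores.

```python
def rank_results(scores, minimum_rank=0.0):
    """Return (document, score) pairs at or above minimum_rank, best first."""
    kept = [(doc, score) for doc, score in scores.items() if score >= minimum_rank]
    return sorted(kept, key=lambda item: item[1], reverse=True)


# Hypothetical scores between one query and three documents.
query_scores = {"DOC-001": 0.62, "DOC-002": 0.91, "DOC-003": 0.35}

results = rank_results(query_scores, minimum_rank=0.5)
print(results)
```

Note that a score of 0.91 here does not mean "91% relevant"; it only places DOC-002 closer to the query than DOC-001.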
Analytics uses only the documents you provide to build a search index. Because no outside word lists are used, you must create saved searches that determine which documents are used to build the index. If you want to limit search results to certain document groups, or if the document set contains more than one language, multiple indexes might give you better results.
Note: Permissions for the Search Index object must be kept in sync with permissions on the Analytics Index object.
To create an Analytics conceptual index:
When you save the new index, the Analytics Index console becomes available. See Analytics index console operations.
The Analytics Index Information form contains the following fields:
SELECT LCID, Name FROM sys.syslanguages
SELECT * FROM sys.fulltext_system_stopwords WHERE language_id = ####
If you want to apply item-level or workspace-level security to an Analytics index, you must secure both the Analytics Index object and the Search Index object for that particular index.
Restricting a group from viewing an Analytics Index does not restrict them from searching on the index unless access to the corresponding Search Index is also restricted.
Note: If you are applying item-level security from the Search Indexes tab, you may need to create a new view and add the security field to the view.
A training data source is a set of documents that the system uses to learn the language, the correlation between terms, and the conceptual value of documents. This data source formulates the mapping scheme of all documents into the concept space. Because the system uses this data source to learn, include only the highest quality documents when creating the training data source. The system needs authored content with conceptually relevant text to learn the language and concepts.
Use the following settings when creating a saved search to use as a training data source:
When you select the Optimize training set feature on an Analytics index, you improve the quality of that index by excluding documents that could result in inaccurate term correlations due to their low conceptual value, such as:
To perform this automatic removal of bad documents from the training data source, the Analytics engine evaluates documents based on:
If the optimization excludes a document, the following results are possible:
The data source is the collection of documents to be clustered, categorized, or returned in a concept query. The data source is typically larger than the training data source because fewer documents are culled from it.
Use the following settings when creating a saved search to use as a data source:
When you populate the index, the Conceptual Index multi-choice field on the Document object lists whether a document is included in the data source, training data source, or both. This field is populated every time the index is populated with a full or incremental population. You can use this field as a condition in a saved search to return only training or data source documents.
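The membership logic behind such a field can be sketched with two sets of document IDs. The choice labels and the function name below are placeholders chosen for illustration; they are not the actual choice names in Relativity.

```python
def conceptual_index_choices(doc_id, data_source_ids, training_source_ids):
    """Return the illustrative choices that apply to one document."""
    choices = []
    if doc_id in data_source_ids:
        choices.append("Data source")
    if doc_id in training_source_ids:
        choices.append("Training data source")
    return choices


# Hypothetical document IDs returned by the two saved searches.
data_source = {1, 2, 3, 4}
training_source = {2, 3}

for doc in (1, 2, 5):
    print(doc, conceptual_index_choices(doc, data_source, training_source))
```

A document can carry both choices, which is why the field is described as multi-choice.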
You can also find data source and training data source documents in the field tree, as well as those which were excluded from training when you enabled the Optimize training set field on the index.
Index sizes are limited by default, and some large documents are excluded from indexing as follows.
By default, indexes are limited by the following parameters:
These limits can be changed by the administrator. For help with adjusting index size limits, contact Relativity Support.
Analytics indexes automatically suppress documents larger than 30 MB before sending them to the Analytics engine. Suppressed large documents appear in the Document Exceptions panel. You can also view suppressed documents from the Document list by using the Excluded from Training and Excluded from Searchable Set choices on the Analytics Index Document field.
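A minimal sketch of this kind of size check is shown below. It assumes that 1 MB means 1,048,576 bytes and that the limit applies to a per-document size value; neither detail is confirmed on this page, and `split_by_size` is a hypothetical helper.

```python
# Assumption: 30 MB is interpreted as 30 * 1024 * 1024 bytes.
MAX_DOCUMENT_BYTES = 30 * 1024 * 1024


def split_by_size(doc_sizes, limit=MAX_DOCUMENT_BYTES):
    """Partition document IDs into (sent_to_engine, suppressed) by size."""
    sent, suppressed = [], []
    for doc_id, size in doc_sizes.items():
        # "Larger than" the limit is suppressed; exactly at the limit is kept.
        if size > limit:
            suppressed.append(doc_id)
        else:
            sent.append(doc_id)
    return sent, suppressed


# Hypothetical sizes in bytes.
sizes = {"DOC-001": 12_000, "DOC-002": 45 * 1024 * 1024, "DOC-003": MAX_DOCUMENT_BYTES}
sent, suppressed = split_by_size(sizes)
print(sent, suppressed)
```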
Once you save the Analytics index, the Analytics index console appears. From the Analytics index console, you can perform the following operations:
To populate the Analytics index on the full set of documents, click Run on the Analytics Index console, then choose Full from the modal that appears. This adds all documents from the data source and training data source to the ready-to-index list. Document “preprocessing” also occurs to clean up text. This includes the following:
Once population is complete, the index builds.
While the index is populating, the following console option becomes available:
After population is complete, you have the option to populate incrementally to account for new or removed documents from the data source and training data source on the ready-to-index list. To perform an incremental build, click Run on the console, then choose Incremental from the modal that appears. See Incremental population considerations for more information.
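Conceptually, an incremental population compares what is already indexed with what the saved searches return now. The sketch below models that comparison with plain sets; `incremental_changes` is a hypothetical helper, not a Relativity API.

```python
def incremental_changes(indexed_ids, current_source_ids):
    """Return (to_add, to_remove) that bring the index in line with the source."""
    indexed = set(indexed_ids)
    current = set(current_source_ids)
    to_add = sorted(current - indexed)      # new documents in the saved search
    to_remove = sorted(indexed - current)   # documents no longer returned
    return to_add, to_remove


# Hypothetical document IDs before and after the saved search changed.
to_add, to_remove = incremental_changes([1, 2, 3], [2, 3, 4, 5])
print(to_add, to_remove)
```

Only the differences are processed, which is why an incremental run touches fewer documents than a full population.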
Note: After building your index, if you want to add any documents that were previously excluded from training back into the training data source document pool, you must disable the Optimize training set field on the index and perform another full population. An incremental population does not re-introduce these previously excluded documents.
After population is complete, the index will build automatically. During this phase, training data source documents and Latent Semantic Indexing (LSI) are used to build the concept space based on the relationships between words and documents. Data source documents are mapped into the concept space, and words on the "stop words" or "noise words" list are filtered from the index to improve quality.
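Stop-word filtering can be pictured as a simple token filter. The word list below is a small hypothetical example, not the list used by any particular index, and the whitespace tokenization is a simplification.

```python
# Hypothetical stop-word list, for illustration only.
STOP_WORDS = {"the", "and", "of", "to"}


def remove_stop_words(text, stop_words=STOP_WORDS):
    """Lowercase the text, split on whitespace, and drop stop words."""
    return [term for term in text.lower().split() if term not in stop_words]


print(remove_stop_words("The terms of the lease"))
```

Removing very common words keeps them from dominating the term relationships that the concept space is built from.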
Please note that the index is unavailable for searching during this phase.
You can monitor the progress of any Analytics index process with the progress panel at the top of the layout.
Population and index building occur in the following stages, which appear within the progress panel:
The following fields appear in the Document Breakdown section:
Building an index automatically activates it. This makes the index available for users by adding the index to the search drop-down menu on the Documents tab and to the right-click menu in the viewer. All active indexes are searchable.
Once an index is activated, you have the option of deactivating it.
You may need to deactivate an index for the following reasons:
To deactivate an index, click Deactivate Index on the console. A yellow banner will appear at the top of the console.
To reactivate the index, click Reactivate Index on the banner.
Note: If you deactivate an index, you can't run concept searches against the index and keyword expansion becomes unavailable on the index.
If exceptions occur while populating or building an index, the system will retry them automatically.
Retrying exceptions attempts to populate the index again.
Note: You can only populate one index at a time. If you submit more than one index for population, they'll be processed in order of submission by default.
When errored documents are removed from population in an index, they appear on the index console in the Document Exceptions panel. This panel only appears when exceptions exist.
The panel includes the following fields:
To see a list of population statistics, click Show Population Statistics.
This option is available immediately after you save the index, but all rows in this window display a value of 0 until population is started.
This displays a list of population statistics that includes the following fields:
To see an in-depth set of index details, click Show Index Statistics. This information can be helpful when investigating issues with your index.
Clicking this displays a view with the following fields:
There may be times when you need to update your index. Depending on the update you’re making, you can save time by running an incremental population or only running a build. The following table outlines various workflows for different index updates.
Workflow | Index update |
---|---|
Adding new documents that: | |
Adding new documents that: | |
Removing documents from the data source or training data source | |
Updating stop/noise words | |
Updating extracted text. For example, updating poor quality OCR text. | |
Updating filters. For example, email header or repeated content filters. | |
Incremental populations do not necessarily force Analytics to go through every stage of an index build.
When managing or updating indexes with new documents, consider the following guidelines:
You can run an incremental population to add or remove documents from your data source and training data source. The index then takes substantially less time to build, which means less downtime.
To perform an incremental population, click Run on the console, then choose Incremental from the modal that appears. This checks for changes in both the data source and training data source and updates the index to match.
If extracted text has changed, you have updated the stop/noise words, or you have applied different filters, you must run a full population.
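The guidance in the two paragraphs above can be condensed into a small decision helper. The change labels and the function name are hypothetical, and the helper covers only the cases named in those paragraphs.

```python
# Hypothetical labels for the kinds of change described above.
FULL_POPULATION_CHANGES = {"extracted_text", "stop_words", "filters"}
INCREMENTAL_CHANGES = {"documents_added", "documents_removed"}


def required_population(changes):
    """Return 'full', 'incremental', or 'none' for a set of change labels."""
    changes = set(changes)
    unknown = changes - FULL_POPULATION_CHANGES - INCREMENTAL_CHANGES
    if unknown:
        raise ValueError(f"Unrecognized change types: {sorted(unknown)}")
    # Any change that alters how text is read or filtered needs a full run.
    if changes & FULL_POPULATION_CHANGES:
        return "full"
    if changes & INCREMENTAL_CHANGES:
        return "incremental"
    return "none"


print(required_population({"documents_added"}))
print(required_population({"documents_added", "stop_words"}))
```

When several changes are pending, the most demanding one decides the population type.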
Repeated content filters can be linked to an Analytics index either automatically, using the top filters chosen by the system, or by manually selecting individual filters. These linked filters will only apply to the currently open Analytics conceptual index; they will not be applied to structured analytics sets.
The maximum number of linked repeated content filters per index is 1,000. This includes both manually and automatically linked filters.
By default, when an index runs, it automatically links the top 200 repeated content filters to the index. These are chosen by multiplying each filter's number of occurrences by its word count, sorting the results in descending order, and selecting the top 200.
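The selection rule can be sketched in a few lines. The filter names and numbers are invented, and `top_filters` is a hypothetical helper; only the scoring rule (occurrences multiplied by word count, descending, top 200) comes from the text above.

```python
def top_filters(filters, limit=200):
    """Pick filter names with the highest occurrences * word_count score.

    `filters` is an iterable of (name, occurrences, word_count) tuples.
    """
    ranked = sorted(filters, key=lambda f: f[1] * f[2], reverse=True)
    return [name for name, _, _ in ranked[:limit]]


# Hypothetical repeated content filters.
candidates = [
    ("confidentiality disclaimer", 500, 40),  # score 20,000
    ("email signature", 2000, 5),             # score 10,000
    ("system banner", 100, 10),               # score 1,000
]

print(top_filters(candidates, limit=2))
```

A long passage that repeats moderately often can outrank a short one that repeats very often, because both factors contribute to the score.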
The following settings apply when automatically linking repeated content filters:
If an index has both manually and automatically linked filters attached, the manually linked filters remain linked and are not changed when the index re-runs. Manually linked filters do not count towards the number in the Repeated content filters to link field, but they do count towards the 1,000 filter maximum.
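The two limits interact as sketched below. This assumes that automatic linking is reduced to whatever room is left under the 1,000 filter maximum; the page does not describe how the cap is enforced, so treat that behavior and the helper name `automatic_link_count` as assumptions.

```python
MAX_LINKED_FILTERS = 1000  # overall maximum stated above


def automatic_link_count(manual_count, filters_to_link=200):
    """Estimate how many filters can be auto-linked alongside manual ones.

    Assumption: automatic links are trimmed to fit under the overall maximum.
    """
    if manual_count < 0 or filters_to_link < 0:
        raise ValueError("counts must be non-negative")
    if manual_count > MAX_LINKED_FILTERS:
        raise ValueError("manual links already exceed the maximum")
    remaining = MAX_LINKED_FILTERS - manual_count
    return min(filters_to_link, remaining)


print(automatic_link_count(50))   # manual links do not reduce the setting
print(automatic_link_count(900))  # only the remaining room is available
```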
Use the Repeated Content Filters section on an Analytics index layout to manually link repeated content filters when the Analytics index is not open in Edit mode.
To manually link one or more existing repeated content filters to an Analytics index, perform the following steps:
For more information on repeated content and regular expression filters, see Repeated content filters.