Analytics engine heap memory exceeds 95% on at least one host

Description

Monitors heap memory usage of the Analytics engine. The alert fires when heap usage exceeds the 95% threshold on at least one host while a job is actively running, which can indicate performance degradation or a memory leak.

Resolution Guidance

Impact When Active

When this alert is active, heap memory usage on the Analytics engine has exceeded the 95% threshold. This can result in:

  • Slower processing of large data sets
  • Job failures or unexpected behavior due to insufficient memory
  • Potential memory leaks causing long-term stability issues

How To Resolve

  1. Identify Active Analytics Jobs
    • Check which jobs are currently running on the Analytics engine. Focus on:
      • Large index builds
      • Structured analytics jobs
      • Concurrent operations
    • Use Relativity Job Monitor or database queries to identify active workloads.
  2. Review Heap Usage
    • Query the memory_used_pct field in the metrics-* index to confirm actual heap usage.
    • If memory usage stays above 90–95%, continue with the steps below.
  3. Evaluate JVM Heap Size Configuration
    • Check the current JVM settings (-Xmx and -Xms) in the env.cmd file: \CAAT\bin\env.cmd

General Sizing Guidelines (from Relativity):

| Server Role | Recommended -Xmx |
|---|---|
| Structured Analytics only | ~85% of total RAM (leave 10 GB for DB) |
| Indexing only | ~85% of total RAM (leave 10 GB for DB) |
| Combined (Indexing + Structured) | ~85% of total RAM (leave 10 GB for DB) |
Adjust -Xmx accordingly and restart the CAAT service for changes to apply.

Follow the instructions provided in the [Relativity documentation](https://help.relativity.com/Server2024/Content/System_Guides/Environment_Optimization_Guide/Configuring_the_Analytics_server.htm#JavaheapsizeJVM) for configuring the CAAT environment.
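
As a rough illustration of the sizing guideline above, the following sketch derives a candidate -Xmx value from total server RAM. It assumes one particular reading of the table: the heap is capped at 85% of total RAM and must also leave at least 10 GB free. The function names and the round-down-to-whole-GB behavior are hypothetical choices for this example, not part of Relativity's tooling. Confirm the final value against the Relativity documentation before editing env.cmd.

```python
def recommended_xmx_gb(total_ram_gb: float,
                       ram_fraction: float = 0.85,
                       reserve_gb: float = 10.0) -> int:
    """Suggest a whole-GB heap size under the assumed reading of the guideline.

    Assumption: the heap must satisfy BOTH limits, so the smaller one wins.
    """
    if total_ram_gb <= reserve_gb:
        raise ValueError("total RAM must be larger than the reserved amount")
    by_fraction = total_ram_gb * ram_fraction   # ~85% of total RAM
    by_reserve = total_ram_gb - reserve_gb      # leave 10 GB free
    return int(min(by_fraction, by_reserve))    # round down to whole GB


def xmx_flag(total_ram_gb: float) -> str:
    """Format the suggestion as a JVM flag string, e.g. for review before editing env.cmd."""
    return f"-Xmx{recommended_xmx_gb(total_ram_gb)}g"


if __name__ == "__main__":
    for ram in (64, 128, 256):
        print(f"{ram} GB RAM -> {xmx_flag(ram)}")
```

On smaller servers the 10 GB reserve is the tighter limit; on larger servers the 85% cap is.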
  4. Optimize or Restructure Workloads
    • Break up large analytics jobs into smaller batches.
    • Optimize training sets and reduce number of documents per job.
    • Avoid concurrent resource-intensive jobs on the same server.
  5. Disable Unused Analytics Indexes
    • Navigate to any unused Analytics indexes and click “Disable Queries” to free RAM.
    • Use the MaxAnalyticsIndexIdleDays setting to automate this.
  6. Restart CAAT if Memory Is Fully Consumed
    • If the Analytics engine becomes unresponsive, restart the Relativity Analytics Engine (CAAT) Windows service.
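
The advice to break large jobs into smaller batches can be sketched generically as follows. This is only an illustration of the batching idea: the document ID list and the batch size are hypothetical, and how each batch is actually submitted as an Analytics job depends on your environment and is not shown.

```python
from typing import Iterator, List, Sequence


def batches(document_ids: Sequence[int], batch_size: int) -> Iterator[List[int]]:
    """Yield consecutive slices of document_ids, each at most batch_size long."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(document_ids), batch_size):
        yield list(document_ids[start:start + batch_size])


if __name__ == "__main__":
    ids = list(range(1, 11))  # hypothetical document IDs
    for number, batch in enumerate(batches(ids, 4), start=1):
        print(f"batch {number}: {len(batch)} documents")
```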

Long-Term Recommendations

  • Monitor heap usage trends using telemetry or APM tools.
  • Increase physical memory if usage consistently trends high.
  • Scale horizontally by adding dedicated servers for indexing or structured analytics.
  • Follow Relativity's memory formula: Documents × 6000 = JVM bytes required. For example, 1M documents ≈ 6 GB of heap.
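
The memory formula in the last recommendation can be worked through in a few lines. The sketch below applies the 6000 bytes-per-document figure from that formula; the function names are hypothetical. It also converts the result to GiB, because the g suffix on -Xmx is interpreted as 1024³ bytes, so the decimal "6 GB" is slightly less than 6g.

```python
BYTES_PER_DOCUMENT = 6000  # figure taken from the memory formula above


def required_heap_bytes(document_count: int) -> int:
    """Documents x 6000 = JVM bytes required."""
    return document_count * BYTES_PER_DOCUMENT


def required_heap_gb(document_count: int) -> float:
    """Heap requirement in decimal gigabytes (10^9 bytes)."""
    return required_heap_bytes(document_count) / 1_000_000_000


def required_heap_gib(document_count: int) -> float:
    """Heap requirement in GiB (2^30 bytes), the unit used by the -Xmx 'g' suffix."""
    return required_heap_bytes(document_count) / 2**30


if __name__ == "__main__":
    docs = 1_000_000
    print(f"{docs} documents -> {required_heap_gb(docs):.1f} GB "
          f"({required_heap_gib(docs):.2f} GiB)")
```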

Alert Details

Alert Condition Details

| Name | Value | Description |
|---|---|---|
| Rule Type | Elasticsearch query | |
| Data View | metrics-* | |
| Filter Query | See the query below | Fetches the data points where Analytics engine heap memory usage exceeds 95% |
| Threshold | > 95% | The alert triggers when Analytics engine heap memory usage exceeds 95% |
| Time Window | 5 minutes | Evaluates data from the last 5 minutes |
| Rule Schedule | 1 minute | Runs the check every minute |

Filter Query:

    FROM metrics-*
    | EVAL memory_used_pct = (jvm.memory.used / jvm.memory.limit) * 100
    | WHERE memory_used_pct > 95
    | KEEP jvm.memory.used, jvm.memory.limit, memory_used_pct, *
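
To make the rule's behavior concrete, the following sketch simulates a single rule check offline: it looks back over a 5-minute window, computes the used/limit percentage for each sample, and returns the hosts that exceed 95%. It is not how the alerting engine is implemented. The Sample structure, the host names, and the choice to skip samples with a non-positive limit are assumptions made for this example.

```python
from datetime import datetime, timedelta
from typing import Iterable, NamedTuple, Set


class Sample(NamedTuple):
    """Hypothetical shape of one metrics data point."""
    host: str
    timestamp: datetime
    used_bytes: int    # corresponds to jvm.memory.used
    limit_bytes: int   # corresponds to jvm.memory.limit


def hosts_over_threshold(samples: Iterable[Sample],
                         now: datetime,
                         window: timedelta = timedelta(minutes=5),
                         threshold_pct: float = 95.0) -> Set[str]:
    """Return hosts with at least one in-window sample above the threshold."""
    window_start = now - window
    breaching: Set[str] = set()
    for sample in samples:
        if not (window_start <= sample.timestamp <= now):
            continue  # outside the evaluated time window
        if sample.limit_bytes <= 0:
            continue  # assumption: skip samples without a usable limit
        used_pct = sample.used_bytes / sample.limit_bytes * 100
        if used_pct > threshold_pct:
            breaching.add(sample.host)
    return breaching


if __name__ == "__main__":
    now = datetime(2024, 1, 1, 12, 0)
    data = [
        Sample("host-a", now - timedelta(minutes=2), 96, 100),
        Sample("host-b", now - timedelta(minutes=2), 90, 100),
        Sample("host-c", now - timedelta(minutes=10), 99, 100),
    ]
    print(sorted(hosts_over_threshold(data, now)))
```

In this example only host-a is reported: host-b is below the threshold and host-c's sample is older than the time window.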

Alert Metric Details

Metric Name: memory_used_pct

Metric Description: Calculates the percentage of memory used by dividing jvm.memory.used by jvm.memory.limit and multiplying by 100. An alert is triggered when memory_used_pct exceeds 95%, indicating that the Analytics engine heap memory usage is critically high.

Metric Attributes:

| Attribute Name | Description | Value |
|---|---|---|
| jvm.memory.limit | Indicates the maximum memory allocation for the JVM in bytes | Amount of memory available to the JVM |
| jvm.memory.used | Indicates the amount of memory used by the JVM in bytes | Amount of memory used by the JVM |