One or more agent servers have not been responding for [n] minutes

Description

This alert is triggered when the Server Manager agent is unable to contact one or more Agent Servers for a specified period of time. It indicates that an Agent Server may be unresponsive or has failed to check in.

Resolution Guidance

Impact When Active

  • The Agent Server is likely not functioning correctly.

  • All Agents on the server may stop responding and will not perform their assigned work.

  • Agent check-in times will stop updating and jobs may remain unprocessed.

  • If left unresolved, this may delay workflows or cause job queues to grow.

How To Resolve

  • Log in to the host referenced by the alert.

  • Use the Relativity Service account credentials to access the back end of the Agent Server.

  • Open the Services control panel on the server.

  • Locate the kCura EDDS Agent Manager service and check its status:

    • If the service is not running, right-click the service name and choose Start.

    • If the service is running but not responding, right-click it and select Stop, then Start to restart it.

  • If the issue persists after restarting the service:

    • Open Event Viewer on the Agent Server.

    • Review the Windows Logs → Application and System logs for error messages related to the failing service.

    • Use this information to identify the underlying issue preventing the service from starting or functioning properly.
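
The service check in the steps above can also be scripted. The following is a minimal Python sketch, not part of Relativity: it parses the text printed by the Windows `sc query` command and suggests the matching step from the list above. The function names `service_state` and `suggest_action` and the sample output are hypothetical. The internal service name to pass to `sc query` may differ from the display name kCura EDDS Agent Manager, so confirm it in the Services control panel first.

```python
import re


def service_state(sc_output: str) -> str:
    """Return the state word (e.g. RUNNING, STOPPED) from `sc query` output."""
    match = re.search(r"STATE\s*:\s*\d+\s+(\w+)", sc_output)
    if match is None:
        raise ValueError("no STATE line found in sc query output")
    return match.group(1).upper()


def suggest_action(sc_output: str) -> str:
    """Map the service state to the resolution step described above."""
    state = service_state(sc_output)
    if state == "STOPPED":
        return "start"
    if state == "RUNNING":
        # The alert fired although the service is running,
        # so treat the service as not responding.
        return "restart"
    # Transitional states such as START_PENDING or STOP_PENDING.
    return "wait and re-check"


# Illustrative sample only; the service name here is a placeholder.
SAMPLE_STOPPED = """
SERVICE_NAME: ExampleService
        TYPE               : 10  WIN32_OWN_PROCESS
        STATE              : 1  STOPPED
        WIN32_EXIT_CODE    : 0  (0x0)
"""

SAMPLE_RUNNING = SAMPLE_STOPPED.replace("1  STOPPED", "4  RUNNING")

print(suggest_action(SAMPLE_STOPPED))  # start
print(suggest_action(SAMPLE_RUNNING))  # restart
```

On the Agent Server itself, the text passed to `suggest_action` would come from running `sc query` against the service, for example through Python's `subprocess.run` with `capture_output=True`.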

Alert Details

Alert Condition Details

Name | Value | Description
--- | --- | ---
Rule Type | Elastic Query | Tracks check-in status of Agent Servers
Data View | metrics-* |
Filter Query | relsvr.agent.disabled : 0 and labels.relsvr_agent_status : "not responding" | Agent not responding
Group | Count | Number of Agents not responding
Threshold | >= 1 | Triggered when one or more Agents are not responsive
Time Window | 1 min | Verified data for last 1 minute
Frequency | 30 sec | Alert condition checked every 30 seconds
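
To make the condition above concrete, here is a small Python sketch of how the filter, count, threshold, and time window combine. It is an illustration only, not how the alerting engine is implemented. It assumes each metric document is a flat dictionary keyed by the dotted field names and carries an `@timestamp` value; the function names and sample documents are hypothetical.

```python
from datetime import datetime, timedelta, timezone

TIME_WINDOW = timedelta(minutes=1)  # Time Window: 1 min
THRESHOLD = 1                       # Threshold: >= 1


def matches_filter(doc: dict) -> bool:
    """Mirror the Filter Query: agent enabled and status "not responding"."""
    return (
        doc.get("relsvr.agent.disabled") == 0
        and doc.get("labels.relsvr_agent_status") == "not responding"
    )


def alert_fires(docs: list, now: datetime) -> bool:
    """Count matching documents inside the time window and apply the threshold."""
    window_start = now - TIME_WINDOW
    count = sum(
        1
        for doc in docs
        if window_start <= doc["@timestamp"] <= now and matches_filter(doc)
    )
    return count >= THRESHOLD


now = datetime(2024, 1, 1, 12, 0, 0, tzinfo=timezone.utc)
docs = [
    # Enabled agent, not responding, inside the window: counts.
    {"@timestamp": now - timedelta(seconds=20),
     "relsvr.agent.disabled": 0,
     "labels.relsvr_agent_status": "not responding"},
    # Disabled agent: excluded by the filter.
    {"@timestamp": now - timedelta(seconds=10),
     "relsvr.agent.disabled": 1,
     "labels.relsvr_agent_status": "not responding"},
    # Older than one minute: outside the time window.
    {"@timestamp": now - timedelta(minutes=5),
     "relsvr.agent.disabled": 0,
     "labels.relsvr_agent_status": "not responding"},
]

print(alert_fires(docs, now))      # True
print(alert_fires(docs[1:], now))  # False
```

Under this reading, a disabled agent never contributes to the count, so intentionally disabled agents should not trigger the alert.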

Alert Metric Details

Metric Name: labels.relsvr_agent_status

Metric Description: Indicates the agent status. Possible values are Running, Inactive, and Not Responding.

Metric Attributes:

Attribute Name | Description | Value
--- | --- | ---
host.name | Hostname of the affected Agent Server |
labels.agent_name | Relativity Agent Name |
labels.agent_type_name | Relativity Agent Type |
labels.application_name | Application Name | Environment Administration & Operations
labels.exception_message | Any exception message on Agent |
labels.message | Message describes the issue | Agent Manager is not responding.
labels.name | Name of metric | Agent Disabled
labels.relsvr_artifact_id | Relativity Agent Artifact Id |
labels.relsvr_subsystem | Agent Name |
labels.relsvr_system | System name | Agents
labels.relsvr_agent_status | The current status of the agent | not responding
labels.relsvr_agent_type | The name of the agent type of the stale agent |
labels.relsvr_resource_server_name | The name of the server |
relsvr.agent.disabled | Indicates whether the agent is disabled | If 1, the agent is disabled
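
As an illustration of how these attributes can be used during triage, the sketch below groups not-responding agents by `host.name` so that the affected Agent Servers stand out. It assumes the alert documents are available as flat dictionaries keyed by the attribute names above; the function name `affected_servers`, the host names, and the agent names are hypothetical.

```python
from collections import defaultdict


def affected_servers(docs: list) -> dict:
    """Group the names of not-responding agents by the host.name attribute."""
    by_host = defaultdict(list)
    for doc in docs:
        if doc.get("labels.relsvr_agent_status") == "not responding":
            host = doc.get("host.name", "unknown")
            by_host[host].append(doc.get("labels.agent_name", "unknown"))
    return dict(by_host)


# Hypothetical sample documents.
sample_docs = [
    {"host.name": "agent-01.example.local",
     "labels.agent_name": "Example Agent A",
     "labels.relsvr_agent_status": "not responding"},
    {"host.name": "agent-01.example.local",
     "labels.agent_name": "Example Agent B",
     "labels.relsvr_agent_status": "not responding"},
    {"host.name": "agent-02.example.local",
     "labels.agent_name": "Example Agent C",
     "labels.relsvr_agent_status": "running"},
]

for host, agents in affected_servers(sample_docs).items():
    print(f"{host}: {len(agents)} agent(s) not responding: {', '.join(agents)}")
```

When every agent on one host appears in the result, the Agent Server itself is the likely cause rather than an individual agent.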