HTTP Health Check - at least one application endpoint failed

Description

This alert triggers when a health check, an HTTP request designed to assess application health, returns a status code other than 200.
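When investigating this alert, it can help to repeat the health check by hand and see the status code directly. The following is a minimal Python sketch that uses only the standard library. The helper names `probe` and `is_healthy`, the default `GET` method, and the 5-second timeout are assumptions for illustration. The real health check endpoints may require a different method, headers, or authentication.

```python
import urllib.error
import urllib.request
from typing import Optional


def probe(url: str, method: str = "GET", timeout: float = 5.0) -> Optional[int]:
    """Send one HTTP request and return its status code (hypothetical helper).

    Returns None when no HTTP response arrives at all, for example
    DNS failure, connection refused, or timeout.
    """
    request = urllib.request.Request(url, method=method)
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # urlopen raises for 4xx/5xx responses; the status code is on the error.
        return err.code
    except OSError:
        # URLError and socket timeouts are OSError subclasses: no response received.
        return None


def is_healthy(status_code: Optional[int]) -> bool:
    # Mirrors the alert rule: any status other than 200 counts as a failure.
    return status_code == 200
```

Treating a missing response (`None`) as unhealthy keeps the check strict: an endpoint that cannot be reached at all is at least as serious as one that returns an error code.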

Resolution Guidance

Impact When Active

While this alert is active, actions that involve the application failing its HTTP health check may return errors, depending on the nature of the endpoint failure. This can cause failures for end users and for any Relativity Application they are using.

How To Resolve

  • Review logs by host name and application to identify the endpoint that is failing.
  • Restart the service that hosts the failing HTTP endpoint.
    • Before restarting a service, confirm that no jobs are actively running for that service.
    • Restarting services on a web server disrupts users. Notify them of the disruption in advance.
  • Alternatively, you can try the following:
    • The service process may be running while its HTTP health check returns an error. In that case, you can restart only the specific service that the health check alert comes from. This option is available only while the service process is running.
    • For instructions, see Server - How to view Kepler micro-services on an Agent or Web Server.
    • You can also simply restart the service.
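The resolution steps above can be summarized as a short decision checklist. The following Python sketch encodes that order: wait for active jobs, warn web server users, then restart either the specific service or the whole service. The `ServiceState` fields and the step wording are hypothetical. An operator would gather these facts by hand, and the sketch does not restart anything itself.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ServiceState:
    # Hypothetical inputs an operator would gather manually.
    process_running: bool   # is the service process up?
    health_check_ok: bool   # did the HTTP health check return 200?
    active_jobs: int        # jobs currently running for this service
    on_web_server: bool     # does the service run on a web server?


def remediation_steps(state: ServiceState) -> List[str]:
    """Return the runbook's resolution steps as an ordered checklist."""
    if state.health_check_ok:
        return ["No action: the health check returns 200."]
    steps: List[str] = []
    if state.active_jobs > 0:
        # Restarting with active jobs is not allowed by the runbook.
        steps.append(f"Wait until the {state.active_jobs} active job(s) for this service finish.")
    if state.on_web_server:
        # Web server restarts disrupt users, so warn them first.
        steps.append("Notify users of the upcoming disruption.")
    if state.process_running:
        # Targeted restart is only possible while the process is running.
        steps.append("Restart only the specific service the health check alert comes from.")
    else:
        steps.append("Restart the service.")
    return steps
```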

Alert Details

The alert is active for the following conditions in Relativity:

  • An HTTP health check request fails and returns a status code other than 200.

Alert Condition Details

| Name | Value | Description |
|---|---|---|
| Rule Type | Elastic Query | |
| Data View | metrics-* | |
| Filter Query | NOT numeric_labels.http_status_code : 200 AND labels.http_url : imaging%20health%20check%20service AND httpcheck.status: * | Matches HTTP requests to the application that are inaccessible |
| Threshold | > 0 | The alert triggers when the count of matching documents is greater than 0 |
| Time Window | 5 min | Evaluates data from the last 5 minutes |
| Frequency | 1 min | Runs the check every minute |
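To illustrate how these settings combine, the following Python sketch runs one evaluation of the rule over in-memory documents. It counts documents in the 5-minute window that match the filter and fires when the count exceeds 0. The `MetricDoc` type is hypothetical. The `labels.http_url` clause is approximated here as a substring match, which is an assumption about how the stored query behaves. The 1-minute frequency would correspond to calling `rule_fires` once per minute.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Iterable, Optional

URL_FRAGMENT = "imaging%20health%20check%20service"
TIME_WINDOW = timedelta(minutes=5)
THRESHOLD = 0


@dataclass
class MetricDoc:
    # Hypothetical in-memory stand-in for a metrics-* document.
    timestamp: datetime
    http_url: Optional[str]          # labels.http_url
    http_status_code: Optional[int]  # numeric_labels.http_status_code
    httpcheck_status: Optional[int]  # httpcheck.status


def matches_filter(doc: MetricDoc) -> bool:
    # NOT numeric_labels.http_status_code : 200 (also true when the field is missing)
    not_200 = doc.http_status_code != 200
    # labels.http_url clause, approximated as a substring match (assumption)
    url_ok = doc.http_url is not None and URL_FRAGMENT in doc.http_url
    # httpcheck.status: * requires the field to exist
    has_status = doc.httpcheck_status is not None
    return not_200 and url_ok and has_status


def rule_fires(docs: Iterable[MetricDoc], now: datetime) -> bool:
    """One rule run: count matching docs in the time window, compare to the threshold."""
    start = now - TIME_WINDOW
    count = sum(1 for d in docs if start <= d.timestamp <= now and matches_filter(d))
    return count > THRESHOLD
```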

Alert Metric Details

Metric Name:

  • NOT numeric_labels.http_status_code : 200 AND labels.http_url : imaging%20health%20check%20service AND httpcheck.status: *

Metric Description:

  • This metric monitors HTTP response status codes for an application endpoint. The alert triggers when the status code is anything other than 200, indicating a potential service failure or unavailability.
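To check the metric outside Kibana, you can hand-translate the filter into Elasticsearch Query DSL and send it to the `_count` API on `metrics-*`. The following Python sketch builds that request body. It is a translation, not the rule's stored definition. The leading and trailing wildcards on `labels.http_url`, the `@timestamp` field name, and the helper name `build_count_query` are all assumptions.

```python
from typing import Any, Dict


def build_count_query(url_fragment: str = "imaging%20health%20check%20service",
                      window: str = "5m") -> Dict[str, Any]:
    """Build a Query DSL body approximating the alert filter (hypothetical helper)."""
    return {
        "query": {
            "bool": {
                "filter": [
                    # labels.http_url contains the health check path (wildcards assumed)
                    {"wildcard": {"labels.http_url": f"*{url_fragment}*"}},
                    # httpcheck.status: * -> the field must exist
                    {"exists": {"field": "httpcheck.status"}},
                    # restrict to the rule's time window (field name assumed)
                    {"range": {"@timestamp": {"gte": f"now-{window}"}}},
                ],
                "must_not": [
                    # NOT numeric_labels.http_status_code : 200
                    {"term": {"numeric_labels.http_status_code": 200}},
                ],
            }
        }
    }
```

Serialized with `json.dumps` and sent to `metrics-*/_count`, a returned `count` above 0 would correspond to the alert's threshold being exceeded.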

Metric Attributes:

| Attribute Name | Description | Value |
|---|---|---|
| labels.http_url | HTTP request endpoint | Example: https://emttest:8990/Kepler/relativity.imaging.services.interfaces.private.healthcheck.iimagingHealthCheckModule1/imaging%20health%20check%20service/getenvironmentstatusasync |
| httpcheck.status | HTTP check status | 0/1 |
| numeric_labels.http_status_code | HTTP response status code | Examples: 200, 404, 500 |
| labels.http_status_class | HTTP response status class | 1XX/2XX/3XX/4XX/5XX |
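When reviewing a failing document, it can help to relate these attributes to each other. The status class is the first digit of the status code (standard HTTP grouping), and the URL identifies the host, port, and service path. The following Python sketch shows both. The helper names are hypothetical. The uppercase `XX` format follows the table above, but the case actually stored in `labels.http_status_class` may differ.

```python
from urllib.parse import unquote, urlsplit


def status_class(status_code: int) -> str:
    """Map a status code to the class format shown for labels.http_status_class."""
    if not 100 <= status_code <= 599:
        raise ValueError(f"not an HTTP status code: {status_code}")
    return f"{status_code // 100}XX"


def describe_endpoint(http_url: str) -> dict:
    """Split a labels.http_url value into host, port, and URL-decoded path segments."""
    parts = urlsplit(http_url)
    segments = [unquote(s) for s in parts.path.split("/") if s]
    return {"host": parts.hostname, "port": parts.port, "segments": segments}
```

For the example URL in the table, `describe_endpoint` yields host `emttest`, port `8990`, and a decoded segment `imaging health check service`, which points to the host and service to investigate.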