bfec0546-118b-453e-a058-e64038639084
HTTP Health Check - at least one application endpoint failed
Description
This alert is triggered when a health check, a special HTTP request designed to assess application health, returns a status code other than 200.
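The pass/fail rule described above can be sketched in a few lines of Python. This is only an illustration, not the monitoring agent's actual implementation: the function names `probe_status` and `is_healthy` are hypothetical, and the HTTP method, headers, and authentication that the real health check uses are not described in this article, so a plain request with a configurable method is assumed.

```python
import urllib.error
import urllib.request


def probe_status(url: str, method: str = "GET", timeout: float = 5.0):
    """Send one HTTP request and return its status code, or None if no response arrived.

    Assumption: the endpoint needs no authentication or special headers.
    """
    request = urllib.request.Request(url, method=method)
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # The server answered, but with an error status such as 404 or 500.
        return err.code
    except (urllib.error.URLError, OSError):
        # Connection refused, DNS failure, TLS error, or timeout: no status code at all.
        return None


def is_healthy(status_code) -> bool:
    """Only a 200 response counts as healthy; any other code, or no response, is a failure."""
    return status_code == 200
```

For example, `is_healthy(probe_status(url))` would be `False` for an endpoint that answers with 500 or that cannot be reached at all.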
Resolution Guidance
Impact When Active
While this alert is active, actions that involve the application failing its HTTP health check may return errors, depending on the nature of the endpoint failure. End users may see issues and failures in any Relativity application that they are using.
How To Resolve
- Review the logs by host name and application to identify the failing endpoint.
- Restart the service that hosts the failing HTTP endpoint.
  - Before you restart services, make sure that no active jobs are running for that service.
  - Restarting services on a web server disrupts your users. Notify them in advance that the application will be disrupted.
- If the service is running but the HTTP health check is returning an error, you can try the following:
  - Restart only the specific service that the health check alert is coming from. You can do this only if the service process is running.
  - For instructions, see this page: Server - How to view Kepler micro-services on an Agent or Web Server
  - Alternatively, you may just want to restart the service.
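When you are identifying the failing endpoint, it can help to break the URL reported in the alert into its host, port, and path segments. The sketch below does this with Python's standard library, using the example URL from the Metric Attributes table. The helper name `describe_endpoint` is hypothetical, and the meaning of each path segment (module, service, method) is an assumption based on that single example rather than a documented URL scheme.

```python
from urllib.parse import unquote, urlsplit


def describe_endpoint(http_url: str) -> dict:
    """Split a health check URL into host, port, and decoded path segments."""
    parts = urlsplit(http_url)
    # Decode percent-encoding such as %20 so that names read naturally.
    segments = [unquote(segment) for segment in parts.path.split("/") if segment]
    return {"host": parts.hostname, "port": parts.port, "segments": segments}


example = (
    "https://emttest:8990/Kepler/"
    "relativity.imaging.services.interfaces.private.healthcheck.iimagingHealthCheckModule1/"
    "imaging%20health%20check%20service/getenvironmentstatusasync"
)
info = describe_endpoint(example)
print(info["host"], info["port"], info["segments"][2])
```

The host name tells you which server's logs to review, and the decoded segments point to the service that may need a restart.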
Alert Details
The alert is active for the following conditions in Relativity:
- When an HTTP health check request fails and returns a status code other than 200.
Alert Condition Details
| Name | Value | Description |
| --- | --- | --- |
| Rule Type | Elastic Query | |
| Data View | metrics-* | |
| Filter Query | NOT numeric_labels.http_status_code : 200 AND labels.http_url : imaging%20health%20check%20service AND httpcheck.status: * | The application's HTTP endpoint is inaccessible |
| Threshold | > 0 | The alert triggers when the count is greater than 0 |
| Time Window | 5 min | Evaluates data from the last 5 minutes |
| Frequency | 1 min | Runs the check every 1 minute |
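To make the rule concrete, here is a minimal Python sketch of how the filter query, threshold, and time window combine. It is not how Elastic evaluates the rule internally. It assumes each metric document is a flat dictionary keyed by the field names above, that the timestamp lives in an `@timestamp` field, and that the URL condition can be approximated by a substring match. The function names are hypothetical.

```python
from datetime import datetime, timedelta

URL_FRAGMENT = "imaging%20health%20check%20service"


def matches_filter(doc: dict) -> bool:
    """Approximate the filter query: NOT status 200 AND URL matches AND httpcheck.status exists."""
    status_is_not_200 = doc.get("numeric_labels.http_status_code") != 200
    url_matches = URL_FRAGMENT in doc.get("labels.http_url", "")
    has_check_status = doc.get("httpcheck.status") is not None
    return status_is_not_200 and url_matches and has_check_status


def should_alert(docs, now, window=timedelta(minutes=5), threshold=0) -> bool:
    """Count matching documents inside the time window and compare with the threshold."""
    window_start = now - window
    count = sum(
        1
        for doc in docs
        if window_start <= doc["@timestamp"] <= now and matches_filter(doc)
    )
    return count > threshold
```

With a threshold of 0, a single non-200 result within the last 5 minutes is enough to trigger the alert. Because the rule runs every minute, one failed check can keep the alert active for up to 5 minutes.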
Alert Metric Details
Metric Name:
- NOT numeric_labels.http_status_code : 200 AND labels.http_url : imaging%20health%20check%20service AND httpcheck.status: *
Metric Description:
- This metric monitors HTTP response status codes for an application endpoint. The alert triggers when the status code is anything other than 200, indicating a potential service failure or unavailability.
Metric Attributes:
| Attribute Name | Description | Value |
| --- | --- | --- |
| labels.http_url | HTTP request endpoint | Example: https://emttest:8990/Kepler/relativity.imaging.services.interfaces.private.healthcheck.iimagingHealthCheckModule1/imaging%20health%20check%20service/getenvironmentstatusasync |
| httpcheck.status | HTTP request status | 0/1 |
| numeric_labels.http_status_code | Status code of the HTTP response | 200/404/500 |
| labels.http_status_class | Class of the HTTP response status code | 1XX/2XX/3XX/4XX/5XX |
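The relationship between `numeric_labels.http_status_code` and `labels.http_status_class` can be illustrated with a short Python sketch. It assumes the class is simply the first digit of the status code followed by `XX`, which matches the values listed above. The function name `status_class` is hypothetical and is not part of the monitoring stack.

```python
def status_class(status_code: int) -> str:
    """Map an HTTP status code to its class label, for example 404 -> '4XX'."""
    if not 100 <= status_code <= 599:
        raise ValueError(f"not a valid HTTP status code: {status_code}")
    return f"{status_code // 100}XX"


print(status_class(200), status_class(404), status_class(500))
```

Only a code of exactly 200 passes the health check, so other codes in the 2XX class, such as 204, would still trigger this alert.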