Elasticsearch system requirements

Server specifications and recommendations for the Elasticsearch cluster vary depending on your infrastructure tier. Elasticsearch is built on a distributed architecture made up of many servers, or nodes. A node is a running instance of Elasticsearch (a single instance running in the JVM). Every node in an Elasticsearch cluster can serve one of three roles:

  • Master nodes are responsible for managing the cluster.
  • Data nodes are responsible for indexing and searching the stored data.
  • Client nodes are load balancers that redirect operations to the node that holds the relevant data, while offloading other tasks.
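
To confirm which roles the nodes in a running cluster actually hold, the _cat/nodes API reports them. Below is a minimal sketch using Python with the requests library, assuming an unsecured cluster reachable at http://localhost:9200 (adjust the URL and add credentials for a Shield-protected cluster).

```python
# Minimal sketch: list each node's roles via the _cat/nodes API.
# Assumes an unsecured cluster at http://localhost:9200.
import requests

resp = requests.get(
    "http://localhost:9200/_cat/nodes",
    # "node.role" abbreviates roles (m = master-eligible, d = data);
    # "master" marks the currently elected master with "*".
    params={"v": "true", "h": "name,node.role,master"},
    timeout=10,
)
resp.raise_for_status()
print(resp.text)
```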

Set up an entirely separate cluster to monitor Elasticsearch with one node that serves all three roles: master, data, and client. While this setup doesn’t take advantage of the distributed architecture, it acts as an isolated logging system that won’t affect the main cluster.

Infrastructure considerations

Consider the following factors when determining the infrastructure requirements for creating an Elasticsearch environment:

  • Infrastructure tier – When you build out your initial Relativity environment, it is assigned a tier level of 1, 2, or 3. The tier level takes into account the number of users, SQL database sizes, and the amount of data and activity in your system.
  • Virtual versus physical servers – Although Elastic recommends physical servers, our implementation doesn't require them; virtual servers can be used for all nodes.
  • Storage type – Elasticsearch is a distributed system and you should run it on storage local to each server. SSDs are not required.
  • Network connectivity – Because of the distributed architecture, network connectivity can impact performance, especially during peak activity. Consider 10 Gb networking as you move up to the higher tiers.
  • Client nodes – Larger clusters that do not perform heavy aggregations (searches against your data) may perform better without client nodes. Instead, use a master and data node configuration with a load balancer to distribute requests across the cluster.

Note: Elasticsearch won't allocate new shards to a node once more than 85% of its disk is used.
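
The 85% threshold is Elasticsearch's default low disk watermark (cluster.routing.allocation.disk.watermark.low). On recent Elasticsearch versions you can inspect the effective watermark settings over the REST API; here is a minimal sketch in Python with requests, again assuming an unsecured cluster at http://localhost:9200.

```python
# Minimal sketch: print the disk watermark settings that govern when
# Elasticsearch stops allocating new shards to a node.
# Assumes an unsecured cluster at http://localhost:9200.
import requests

resp = requests.get(
    "http://localhost:9200/_cluster/settings",
    # include_defaults exposes built-in defaults such as the 85% low watermark
    params={"include_defaults": "true", "flat_settings": "true"},
    timeout=10,
)
resp.raise_for_status()
for key, value in resp.json().get("defaults", {}).items():
    if "watermark" in key:
        print(f"{key} = {value}")
```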

Other considerations

  • Shield is one of many plugins available for Elasticsearch. Shield provides username and password authentication for REST interaction and JWKS authentication to Relativity. JWKS is already running on your Relativity web server.
  • The Elasticsearch cluster uses the certificate from a Relativity web server or a load balanced site for authentication to Relativity.
  • You can set up the nodes for node-to-node TLS communication. TLS requires a wildcard certificate for the nodes that contains a valid chain and SAN names. This is highly recommended for clusters that are in any way exposed to the internet. If the full chain is not readily available, you can request a script that runs against an installation of OpenSSL to create it. All of the certificates are contained within a Java keystore, which the script sets up during installation. To request this script, contact Relativity Support.

    If you have a chain of certificates with a wildcard certificate and private key that contains SAN names of the servers, you can use those certificates to build the Java keystore for TLS.
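
Once the keystore and certificates are in place, you can verify both the TLS chain and the Shield credentials from any client machine. A minimal sketch in Python with requests follows; the host name, credentials, and CA bundle path are placeholders for your own values.

```python
# Minimal sketch: authenticated request to a TLS-secured cluster.
# Host, credentials, and CA bundle path are placeholders.
import requests

resp = requests.get(
    "https://es-node1.example.com:9200/_cluster/health",
    auth=("es_admin", "changeme"),    # Shield username/password (placeholder)
    verify="/path/to/ca_chain.pem",   # CA chain matching the wildcard cert
    timeout=10,
)
resp.raise_for_status()
print(resp.json()["status"])  # green, yellow, or red
```

If the certificate chain or SAN entries are wrong, requests raises an SSLError before any authentication happens, which makes it easy to tell TLS problems apart from credential problems.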

Elasticsearch cluster system requirements

The number of nodes required and the specifications for the nodes change depending on both your infrastructure tier and the amount of data that you plan to store in Elasticsearch.

    Notes:
  • These recommendations are for audit only.
  • Disk specs for data nodes reflect the maximum size allowed per node. Smaller disks can be used for the initial setup, with plans to expand on demand.

Test (500 GB)

Node type      # of nodes needed   CPU (cores)   RAM (GB)   Disk (GB)
Primary/Data   1                   4             32         500

Tier 1 (1 TB)

Node type      # of nodes needed   CPU (cores)   RAM (GB)   Disk (GB)
Primary/Data   1                   4             32         1000
Data           1                   4             32         1000

Tier 2 (3 TB)

Node type      # of nodes needed   CPU (cores)   RAM (GB)   Disk (GB)
Primary/Data   3                   4             32         2000

Tier 3 (4-15 TB)

Node type      # of nodes needed        CPU (cores)   RAM (GB)   Disk (GB)
Data           1-15 (scale on demand)   4             32         2000
Primary/Data   3                        4             8          2000
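
For rough capacity planning, you can combine the per-node disk figures above with the 85% allocation watermark to estimate how many data nodes a given data volume needs. The sketch below is an illustrative back-of-the-envelope calculation, not an official sizing formula; the 2,000 GB per node and the 85% usable ceiling come from the tables and note above.

```python
# Illustrative sizing sketch: estimate data nodes needed for a data volume,
# keeping each node under the 85% disk watermark. Not an official formula.
import math

def data_nodes_needed(total_gb: float,
                      disk_per_node_gb: float = 2000,
                      watermark: float = 0.85) -> int:
    usable_per_node_gb = disk_per_node_gb * watermark  # space before allocation stops
    return math.ceil(total_gb / usable_per_node_gb)

# Example: 12 TB of audit data in a Tier 3 cluster.
print(data_nodes_needed(12_000))  # -> 8 data nodes
```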

To assess the sizes of a workspace’s activity data and extracted text, contact Relativity Support and request the AuditRecord and ExtractedText Size Gatherer script.

If you have further questions after running the script, our team can review the amount of activity and monitoring data you want to store in Elasticsearch and provide a personalized recommendation for the number of monitoring nodes required.