Backing up Relativity Data Grid

Data Grid stores text data in your file share. There is no automatic redundant storage of document text. Because Data Grid text data is stored in your file share, you may need to adjust your file system data backup frequency to meet the requirements of your Service Level Agreements (SLAs) and disaster recovery plans.

Backing up Elasticsearch

We recommend routinely backing up your data. Elasticsearch replicas provide high availability at run time, so the cluster can tolerate sporadic node loss without interrupting service, but replicas don't protect against catastrophic failures. Create a complete backup of the entire cluster to protect your data if something goes wrong.

You can use the snapshot API to create a backup of the cluster. The snapshot API saves the current state of all data in your cluster to a shared repository. The first snapshot you create is a complete copy of all data on the cluster.

Each subsequent snapshot compares the current state of the data in the cluster to the data stored in the repository and writes only the differences between the two.

Because the snapshot API updates the repository incrementally each time you create a new snapshot, subsequent backups transmit less data and are significantly faster. This page explains the steps necessary to back up and restore Elasticsearch.
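
The incremental behavior described above can be illustrated with a toy model. This is only a sketch of the idea, not how Elasticsearch implements snapshots: the repository and the cluster are modeled as plain dictionaries, and the names take_snapshot, seg_1, seg_2, and seg_3 are hypothetical.

```python
def take_snapshot(repository, cluster_files):
    """Copy only the files the repository does not already hold.

    repository:    dict of file name -> content already stored in the repository
    cluster_files: dict of file name -> content currently on the cluster
    Returns the sorted list of file names that had to be copied.
    """
    copied = []
    for name, content in cluster_files.items():
        # A file is transferred only if it is new or has changed.
        if repository.get(name) != content:
            repository[name] = content
            copied.append(name)
    return sorted(copied)


repo = {}
# The first snapshot copies everything.
first = take_snapshot(repo, {"seg_1": "a", "seg_2": "b"})
# The second snapshot copies only the file that was added since.
second = take_snapshot(repo, {"seg_1": "a", "seg_2": "b", "seg_3": "c"})
print(first)   # ['seg_1', 'seg_2']
print(second)  # ['seg_3']
```

In this model, the second call transfers one file instead of three, which is why later snapshots tend to finish faster than the first.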

Note: Data Grid supports Windows servers only.

Creating a repository

Before implementing this backup method, you must create a repository that can store snapshots. You can use any of the following four repository types:

  • Shared file system, such as a NAS
  • Amazon S3
  • Hadoop Distributed File System (HDFS)
  • Azure/Entra Cloud

Use the following steps to create and share a folder:

  1. Create a folder named ElasticBackup to store snapshots (for example, //COMPUTER_NAME.business.corp/ElasticBackup).
  2. Right-click on the folder, and then click Properties.
  3. Select the Sharing tab, and then click Share.
  4. Enter the user that runs the Elasticsearch Windows service (domain\account), and then click Add.

  5. Select the user on the share list and set the Permission Level to Co-owner.
  6. Click Share.
  7. When the share completes, click Done.
  8. On the Document Properties dialog, select the Security tab.
  9. Verify that the user that runs the Elasticsearch Windows service has Full Control security permissions to the folder.

Use the following steps to link Elasticsearch to the repository folder:

  1. Launch Marvel from within a browser to connect to one of the nodes in your cluster.
  2. Launch Sense from the Dashboards drop-down near the top right.
  3. Edit the location value and run the following to set up a shared file system repository:

    PUT /_snapshot/my_backup
    {
        "type": "fs",
        "settings": {
          "location": "//COMPUTER_NAME.business.corp/Shared/ElasticBackup",
          "compress": true
        }
    }

  4. Verify that the repository settings exist by running the following call:

    GET /_snapshot/
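
If you prefer to script the repository registration rather than run it in Sense, the same request can be sketched in Python with only the standard library. This is a minimal sketch under assumptions: the function name build_repository_request is hypothetical, and http://localhost:9200 is assumed to be a node in your cluster. The request is built but not sent, so you can inspect it first.

```python
import json
import urllib.request


def build_repository_request(base_url, repo_name, location, compress=True):
    """Build (but do not send) the PUT request that registers an fs repository."""
    body = {
        "type": "fs",
        "settings": {"location": location, "compress": compress},
    }
    return urllib.request.Request(
        url=f"{base_url}/_snapshot/{repo_name}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


req = build_repository_request(
    "http://localhost:9200",  # assumed node address
    "my_backup",
    "//COMPUTER_NAME.business.corp/Shared/ElasticBackup",
)
print(req.get_method(), req.full_url)
# PUT http://localhost:9200/_snapshot/my_backup

# To send it against a reachable cluster:
# urllib.request.urlopen(req)
```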

Creating snapshots

There are two ways to create snapshots:

  • Creating snapshots manually from within Sense
  • Scheduling a Windows task using Curator

Creating snapshots manually from within Sense

Run the following to back up all open indexes into a snapshot named "snapshot_1".

PUT /_snapshot/my_backup/snapshot_1

Note: Increment the name of the snapshot for best results (for example, snapshot_1, snapshot_2, snapshot_3). All alphabetical characters in the snapshot name must be lowercase.

Verify that this process created a backup by navigating to the following location:

//COMPUTER_NAME.business.corp/Shared/ElasticBackup

Your backup should look similar to the following image:

For more information on snapshot commands, including the ability to snapshot specific indexes, see Backing up your cluster on the Elasticsearch website for your version of Elasticsearch.
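
The naming advice for manual snapshots (increment the number, keep the name lowercase) can be sketched as a small helper. This is an illustration only: the function names next_snapshot_name and snapshot_path are hypothetical, and the list of existing names is assumed to come from wherever you track your snapshots.

```python
import re


def next_snapshot_name(existing_names, prefix="snapshot_"):
    """Return the next incremented snapshot name, for example snapshot_3."""
    pattern = re.compile(re.escape(prefix) + r"(\d+)$")
    numbers = [
        int(match.group(1))
        for match in map(pattern.match, existing_names)
        if match
    ]
    name = f"{prefix}{max(numbers, default=0) + 1}"
    # Alphabetical characters in a snapshot name must be lowercase.
    if name != name.lower():
        raise ValueError("snapshot names must be lowercase")
    return name


def snapshot_path(repo_name, snapshot_name):
    """Return the request path used to create the snapshot with PUT."""
    return f"/_snapshot/{repo_name}/{snapshot_name}"


name = next_snapshot_name(["snapshot_1", "snapshot_2"])
print(snapshot_path("my_backup", name))
# /_snapshot/my_backup/snapshot_3
```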

Scheduling a Windows task using Curator

The best way to schedule automatic backups of your data is to use Curator, which you can combine with scheduled tasks to automatically invoke the desired behavior.

The Curator Python API can be used to manage indexes and snapshots with the following features:

  • Iterative methods - allow you to retrieve data across the cluster within specified parameters.
  • Non-iterative methods - allow you to retrieve data within a single index or snapshot.
  • Helper methods - allow you to retrieve values required to complete iterative and non-iterative methods.

For more information on Curator and snapshot capabilities, see Snapshot on GitHub.

Installing Curator 4

Before setting up Curator, you must complete the following: 

  1. Download and run the Curator installer from Elastic.
  2. Download and install the Microsoft Visual C++ Redistributable.

Running Curator

Once you install Curator, you can use it to run “actions” that are defined in the action.yml file. Use the following command to run an operation in the action.yml file:

curator [--config CONFIG.YML] [--dry-run] ACTION_FILE.YML

Note: With the --dry-run flag, Curator simulates the action(s) in the file without making any changes. To actually run the operation, omit the --dry-run flag.

This command references both a configuration and action file. You can run PowerShell scripts to automatically create the files needed to run Curator.
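
If you call Curator from a script, it can help to assemble the command line in one place so that the dry run stays the default. The following is a minimal sketch, assuming curator is on the PATH; the function names and the file names under C:\Script are hypothetical.

```python
import subprocess


def build_curator_command(action_file, config_file=None, dry_run=True):
    """Assemble: curator [--config CONFIG.YML] [--dry-run] ACTION_FILE.YML"""
    command = ["curator"]
    if config_file:
        command += ["--config", config_file]
    if dry_run:
        # Simulate the actions without changing anything.
        command.append("--dry-run")
    command.append(action_file)
    return command


def run_curator(action_file, config_file=None, dry_run=True):
    """Run Curator and raise CalledProcessError if it exits with an error."""
    command = build_curator_command(action_file, config_file, dry_run)
    return subprocess.run(command, check=True)


cmd = build_curator_command(r"C:\Script\action.yml", r"C:\Script\config.yml")
print(" ".join(cmd))
# curator --config C:\Script\config.yml --dry-run C:\Script\action.yml
```

Pass dry_run=False only after the simulated run reports the actions you expect.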

Running Curator from PowerShell

The following PowerShell scripts automatically create the PS1 and YML files needed to run Curator. They also set up three Windows tasks: backup, backup cleanup, and Marvel cleanup.

Running Curator manually

The following sections contain examples of the different action files you can create manually to run in Curator. You can save these YML files anywhere. Ensure you use a full path to the files when executing the command.

Sample configuration file

Sample backup action file

Sample restore action file

Backup script with email service

You can also use Curator to send an email when an action fails or succeeds.

The example script reads its credentials from a global variable. If credentials are required, the script fails when you run it outside the PowerShell ISE session that defines that variable. To create the global variable for your ISE session, run the following:

$global:cred = Get-Credential

Setting the script as a scheduled task

Use the following steps to set the script as a scheduled task:

  1. Click Start > Administrative Tools > Windows Task Scheduler on the system that runs scheduled tasks.
  2. In the Task Scheduler, click Create Task under Actions on the right.
  3. Enter a name and description for the task. (Entering a description is optional.)
  4. Navigate to the General tab, and then select Security Options.
  5. Specify the user account that runs scheduled tasks. The account can be the same one that runs the Elasticsearch Windows service.
  6. Edit the settings to run tasks regardless of whether or not the user is logged in.

  7. Navigate to the Triggers tab, and then click New to add a new trigger for the scheduled task.
  8. Verify that the Begin the Task field is set to On a schedule, and then set the start date to your preferred time.
  9. If you're unsure of your recovery point objective (RPO), set the task to repeat every hour.

    Note: Relativity stores the last 90 days of audits for each workspace in SQL Server. Long text fields, like extracted text, are rarely edited after import.

  10. Set the duration of the task to run indefinitely.
  11. Click OK.
  12. The following example has the task running every hour indefinitely:

  13. Navigate to the Actions tab, and then click New.
  14. Set the Action to Start a program.
  15. In the Program/script field, enter "Powershell".
  16. In the Add arguments (optional) field, enter the following value:
    .\[Your PowerShell Script Name]

    For example, if your PowerShell script is named "Migration1.ps1" then you would enter ".\Migration1.ps1" as the value.

  17. In the Start in (optional) field, add the location of the folder that contains your PowerShell script. In this example, the script directory is C:\Script.

    The location entered in the Start in box also stores the scheduled task run times, the job history for the copies, and any additional logging that may occur.

  18. Click OK after configuring your preferred settings.
  19. Set any other preferred settings in the Conditions and Settings tabs. You can also set up an additional action to email a system admin each time the script runs.
  20. Click OK.

When you complete these steps, the task runs according to your settings.

Restoring a snapshot

There are multiple methods for restoring snapshots. Restoring snapshots from the Elasticsearch head console is the recommended procedure, but you can also use cURL to restore snapshots. The Elasticsearch website's documentation relies heavily on cURL commands for snapshot restoration. Brief descriptions of both methods are provided here.

Restoring snapshots from the Elasticsearch head console

You can restore a snapshot from the Elasticsearch head console. Use the following steps to restore a snapshot with this method:

  1. Navigate to the Elasticsearch head console URL (http://localhost:9200/_plugin/head).
  2. Expand the Query tab.
  3. Enter the following URL in the first field: http://localhost:9200/
  4. Enter _search in the second field, and use the drop-down menu to select GET.
  5. Enter the following code to retrieve your snapshot: 
    {
        "type": "fs",
        "settings": {
            "location": "/mount/backups/my_backup",
            "compress": true
        }
    }

Note: You can restore a snapshot on a functioning cluster, but any existing index that the snapshot restores must be closed first. The restore updates only closed indexes and creates a new index for any index that doesn't already exist on the cluster.

Restoring snapshots with cURL

Most of the documentation on the Elasticsearch website relies on cURL commands to restore snapshots. You must install cURL for Windows in order to have access to cURL commands in a Windows environment. You can download cURL for Windows here.

Note: You can paste cURL commands into Marvel Sense (excluding the $ character), and Marvel automatically converts the cURL command into JSON. The cURL command doesn't convert if you type it in manually; you must paste it from your clipboard.

Once you install cURL for Windows, you can restore the cluster state and all indexes in a snapshot with the following cURL command:

$ curl -XPOST "localhost:9200/_snapshot/my_backup/snapshot_1/_restore"

Note: You can restore a snapshot on a functioning cluster, but any existing index that the snapshot restores must be closed first. The restore updates only closed indexes and creates a new index for any index that doesn't already exist on the cluster.
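
The same restore request can also be sketched in Python, which may be convenient where cURL is not installed. This is a minimal sketch under assumptions: the function name build_restore_request and the index names are hypothetical, and the optional "indices" body field limits the restore to the listed indexes. The request is built but not sent.

```python
import json
import urllib.request


def build_restore_request(base_url, repo_name, snapshot_name, indices=None):
    """Build (but do not send) the POST request that restores a snapshot."""
    # With no indices given, the whole snapshot is restored.
    body = {"indices": ",".join(indices)} if indices else {}
    return urllib.request.Request(
        url=f"{base_url}/_snapshot/{repo_name}/{snapshot_name}/_restore",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


restore_all = build_restore_request("http://localhost:9200", "my_backup", "snapshot_1")
restore_some = build_restore_request(
    "http://localhost:9200", "my_backup", "snapshot_1",
    indices=["index_a", "index_b"],  # hypothetical index names
)
print(restore_all.get_method(), restore_all.full_url)
# POST http://localhost:9200/_snapshot/my_backup/snapshot_1/_restore

# To send it against a reachable cluster:
# urllib.request.urlopen(restore_all)
```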

For more information on creating or restoring snapshots, see Snapshot modules on the Elasticsearch website.