Creating elasticsearch backups with snapshot/restore

Posted on 2014-12-15 by

In the beginning elasticsearch did not really have a backup option. Creating a backup meant copying files yourself. Then, some time ago, snapshot/restore was introduced. Using snapshots you can create incremental backups, and you can restore complete backups or just a few indices. In this blog post I am going to explain what you can do with snapshots and how they work. Of course I’ll present you with code: I’ll show you how to write java code to interact with snapshots, and on top of that I’ll show you a way to do it using my elasticsearch gui plugin.

Some background

An index, or better a Lucene index, is a set of segments. Each segment is the result of a commit or a merge. Each segment is immutable, which makes it perfect to copy. In the end that is what creating a snapshot does. From a functional perspective, a snapshot is a backup of the state and the data at the moment the snapshot was created, or better, the moment the creation of the snapshot started. After some time more segments are added and maybe some segments have been removed due to a merge. Creating a new snapshot then means that only the segments that have not yet been copied need to be copied. That is what we call incremental snapshotting.
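To make the incremental idea concrete, here is a minimal sketch in plain java. It is a toy model and not elasticsearch code: the class name, the method name and the segment names are all made up for illustration. It only shows that the work for a new snapshot is the set of current segments minus the segments the repository already holds.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Toy model of incremental snapshotting; hypothetical names, not Elasticsearch code. */
final class IncrementalSnapshotSketch {

    /** Returns the segments that are in the index now but not yet stored in the repository. */
    static Set<String> segmentsToCopy(Set<String> alreadyInRepository, List<String> currentSegments) {
        Set<String> toCopy = new LinkedHashSet<>(currentSegments);
        toCopy.removeAll(alreadyInRepository);
        return toCopy;
    }

    public static void main(String[] args) {
        // The first snapshot copied the segments _0, _1 and _2 (made-up names).
        Set<String> repository = new LinkedHashSet<>(List.of("_0", "_1", "_2"));
        // Since then _0 and _1 were merged into _3 and a commit added _4.
        List<String> current = List.of("_2", "_3", "_4");
        // Only the segments the repository does not have yet need to be copied.
        System.out.println(segmentsToCopy(repository, current));
    }
}
```

Because segments never change after they are written, comparing by segment is enough in this model; there is no need to look inside a segment to see whether it was modified.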

You store a snapshot in a repository. Each snapshot must have a unique name within its repository. Multiple repository implementations are available: the shared file system, Amazon S3, HDFS and Azure Cloud. I will only look at the file system implementation. Make sure the location of a file system repository is available to all nodes.

The functionalities

The functionality we want is to create a repository and to delete a repository. With a repository available we can start creating snapshots, delete snapshots and obtain the status of a running snapshot creation. We can also stop the creation of a snapshot. Of course we can also list all repositories and the snapshots per repository. The following sections show you how this can be done using java.

Using java

Before we can do things with elasticsearch we first need a connection. Of course we can use the rest interface, but in java we usually do it differently. There are two clients to connect to a cluster: the node client and the transport client. Explaining them is not really in the scope of this blog post; we use the transport client. Check the references for a link to the complete project. For our code samples we start with an available client connection.

List repositories

If you are not sure which repositories are already configured, you can use the following function. Notice that we do not use the client directly, but the ClusterAdminClient. Since all java APIs are asynchronous we need to work with the future interface. This pattern will come back in the other samples: create a builder, execute the command and do something with the response.

public List<Repository> listRepositories() {
    GetRepositoriesRequestBuilder getRepo =
            new GetRepositoriesRequestBuilder(client.admin().cluster());
    GetRepositoriesResponse repositoryMetaDatas = getRepo.execute().actionGet();
    return repositoryMetaDatas.repositories()
            .stream()
            .map(Repository::mapFrom)
            .collect(Collectors.toList());
}
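The execute-then-wait pattern is easier to see without a cluster. The sketch below uses the standard CompletableFuture as a stand-in for the future that elasticsearch returns, so nothing in it is elasticsearch API: execute() and blockingGet() are hypothetical names that mimic builder.execute() and actionGet(). It shows the blocking style used throughout this post and, for comparison, a callback style.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;

/** Stand-in for the execute()/actionGet() pattern using only java.util.concurrent. */
final class FuturePatternSketch {

    /** Mimics builder.execute(): starts the work and returns a future immediately. */
    static CompletableFuture<List<String>> execute() {
        return CompletableFuture.supplyAsync(() -> List.of("newrepo"));
    }

    /** Mimics actionGet(): blocks the caller until the response is there, bounded by a timeout. */
    static List<String> blockingGet(long timeoutSeconds) {
        return execute().orTimeout(timeoutSeconds, TimeUnit.SECONDS).join();
    }

    public static void main(String[] args) {
        System.out.println("blocking: " + blockingGet(5));
        // Callback style: react when the response arrives.
        // The final join() only keeps this demo alive until the callback has run.
        execute().thenAccept(names -> System.out.println("callback: " + names)).join();
    }
}
```

Blocking keeps the samples short, which is why it is used here. In an application that must not block the calling thread, the callback style is usually the better fit.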

Create and remove repositories

If you do not have a repository yet, or need another one, you have to create it. For a filesystem repository you need to provide the name of the repository and the location where the files can be stored. Remember that this location has to be available to all nodes. The code then looks as follows.

private void createSnapshotRepository() {
    Settings settings = ImmutableSettings.builder()
            .put("location", "/just/a/location/testrepo")
            .build();

    PutRepositoryRequestBuilder putRepo =
            new PutRepositoryRequestBuilder(client.admin().cluster());
    putRepo.setName("newrepo")
            .setType("fs")
            .setSettings(settings)
            .execute().actionGet();
}

Notice that we first create the settings object. The request consists of the required properties name and type together with a settings object. That way we can put the type specific configuration, such as the location, in the settings object. In these cases it helps to look at the rest api requirements. You can find the reference below. For this sample I also show you the request you can run in Sense to create a repository.

PUT /_snapshot/newrepo
{
  "type":"fs",
  "settings": {
    "location":"/just/a/location/testrepo"
  }
}
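If you prefer the rest interface from java, the same request can be built with the JDK's own java.net.http classes (Java 11 or later). This is only a sketch under assumptions: the base url http://localhost:9200 is a guess at a local node, the class and method names are made up, and the JSON is put together by plain string concatenation without any escaping. The request is built but not sent, so it runs without a cluster.

```java
import java.net.URI;
import java.net.http.HttpRequest;

/** Builds the PUT /_snapshot/{name} request with the JDK http classes; hypothetical helper names. */
final class CreateRepositoryRequestSketch {

    /** Request body with the repository type and its settings; values are NOT escaped. */
    static String body(String type, String location) {
        return "{\"type\":\"" + type + "\",\"settings\":{\"location\":\"" + location + "\"}}";
    }

    /** Builds (but does not send) the request that registers a file system repository. */
    static HttpRequest build(String baseUrl, String repositoryName, String location) {
        return HttpRequest.newBuilder(URI.create(baseUrl + "/_snapshot/" + repositoryName))
                .header("Content-Type", "application/json")
                .PUT(HttpRequest.BodyPublishers.ofString(body("fs", location)))
                .build();
    }

    public static void main(String[] args) {
        // Assumed address of a local node.
        HttpRequest request = build("http://localhost:9200", "newrepo", "/just/a/location/testrepo");
        System.out.println(request.method() + " " + request.uri());
        System.out.println(body("fs", "/just/a/location/testrepo"));
    }
}
```

To actually send it you would pass the request to HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString()) and check the status code of the response.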

Deleting a snapshot repository is about the same as creating one. Create a builder and execute it.

public void deleteRepository(String repositoryName) {
    DeleteRepositoryRequestBuilder builder = 
            new DeleteRepositoryRequestBuilder(client.admin().cluster());
    builder.setName(repositoryName);
    builder.execute().actionGet();
}

List snapshots

Showing all snapshots for a repository of course requires the name of the repository to obtain the snapshots from. Other than that the code looks almost the same as for the list of repositories.

public List<String> showSnapshots(String repositoryName) {
    GetSnapshotsRequestBuilder builder = 
            new GetSnapshotsRequestBuilder(client.admin().cluster());
    builder.setRepository(repositoryName);
    GetSnapshotsResponse getSnapshotsResponse = builder.execute().actionGet();
    return getSnapshotsResponse.getSnapshots().stream()
            .map(SnapshotInfo::name)
            .collect(Collectors.toList());
}

Create snapshot and show running snapshots

The moment you have all been waiting for: creating a snapshot. This is what we are going to do most often. An important aspect is that the name of the snapshot must be unique within the repository. You can take care of this yourself; in my code I add a timestamp to a prefix to form the name of the snapshot. When creating a snapshot, you can pass multiple index name patterns. When you do not provide an index, all indices are added to the snapshot. Again the code itself is not really exciting.

public void createSnapshot(String repositoryName, String snapshotPrefix, String patternToSnapshot) {
    CreateSnapshotRequestBuilder builder = new CreateSnapshotRequestBuilder(client.admin().cluster());
    String snapshot = snapshotPrefix + "-" + LocalDateTime.now().format(dateTimeFormatter);
    builder.setRepository(repositoryName)
            .setIndices(patternToSnapshot)
            .setSnapshot(snapshot);
    builder.execute().actionGet();
}
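The dateTimeFormatter field used above is not shown in this post. A possible definition, together with the name building, could look like the sketch below. The pattern yyyyMMddHHmmss and the class and method names are assumptions for illustration. The lower-casing is a defensive choice, since elasticsearch, to my knowledge, rejects snapshot names that contain upper-case characters.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.Locale;

/** Builds snapshot names as prefix + "-" + timestamp; pattern and names are assumptions. */
final class SnapshotNameSketch {

    // Assumed pattern with a resolution of one second.
    static final DateTimeFormatter FORMATTER = DateTimeFormatter.ofPattern("yyyyMMddHHmmss");

    /** Returns a lower-case snapshot name for the given prefix and moment. */
    static String snapshotName(String prefix, LocalDateTime moment) {
        return (prefix + "-" + moment.format(FORMATTER)).toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        System.out.println(snapshotName("logstash", LocalDateTime.now()));
    }
}
```

With a resolution of one second, two snapshots with the same prefix started within the same second would get the same name. For a manually triggered backup that is rarely a problem, but a scheduled job may want a finer pattern.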

While a snapshot is running, it can consume a lot of CPU. Usually this is limited. When requesting the list of available snapshots you can also see the state of each snapshot. A running snapshot has the state STARTED. If the snapshot takes a lot of time, chances are that you do not get a fast response to the list of snapshots. That is because of the CPU limitation for the thread that does the snapshotting. Therefore there is a lightweight alternative called the status api. With this endpoint you can ask for the status of running snapshots. If there is no running snapshot you get back an empty array. This is what it looks like. This time I have also included the function for mapping the response to a SnapshotState object.

public List<SnapshotState> showRunningSnapshots() {
    SnapshotsStatusRequestBuilder builder =
            new SnapshotsStatusRequestBuilder(client.admin().cluster());
    SnapshotsStatusResponse snapshotsStatusResponse = builder.execute().actionGet();
    return snapshotsStatusResponse.getSnapshots().stream()
            .map(SnapshotState::from)
            .collect(Collectors.toList());
}

public static SnapshotState from(SnapshotStatus status) {
    SnapshotState state = new SnapshotState();
    state.setRepository(status.getSnapshotId().getRepository());
    state.setSnapshot(status.getSnapshotId().getSnapshot());
    state.setState(status.getState().name());
    state.setNumFilesDone(status.getStats().getProcessedFiles());
    state.setNumFilesTotal(status.getStats().getNumberOfFiles());
    return state;
}

In this case I just return all running snapshots. You can also add the repository and even the name of the snapshot to request the status for.
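The two file counters in the status are enough to show progress to a user. A small sketch of that calculation follows; the class and method names are made up, and the numbers in main are invented example values, not output of a real cluster.

```java
/** Turns the processed/total file counters of a running snapshot into a percentage. */
final class SnapshotProgressSketch {

    /** Returns the progress as a whole percentage between 0 and 100, rounded down. */
    static int percentDone(int numFilesDone, int numFilesTotal) {
        if (numFilesTotal <= 0) {
            return 0; // nothing known yet, so avoid dividing by zero
        }
        // Use long arithmetic so large file counts cannot overflow.
        return (int) Math.min(100L, (100L * numFilesDone) / numFilesTotal);
    }

    public static void main(String[] args) {
        // Invented example values: 25 of 200 files copied so far.
        System.out.println(percentDone(25, 200) + "%");
    }
}
```

Counting files is a rough measure because files differ a lot in size, so treat the percentage as an indication only.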

Delete snapshots and stop running snapshot

Deleting a snapshot is straightforward, as you can see in the next code block. What is interesting is that you can also delete a snapshot that is still running. So if you made a mistake and you have a long running snapshot, you can stop it using the delete functionality. The files that have already been backed up will be removed.

public void deleteSnapshot(String repositoryName, String snapshot) {
    DeleteSnapshotRequestBuilder builder = new DeleteSnapshotRequestBuilder(client.admin().cluster());
    builder.setRepository(repositoryName).setSnapshot(snapshot);
    builder.execute().actionGet();
}

Restore snapshots

The final piece of code is about restoring a snapshot. Of course you need a snapshot name and a repository to restore a backup. There are some advanced features that I am not going to cover here: you can restore just a part of a snapshot, and you can change the name of an index you restore. Why is this interesting? Because you cannot restore an index that is available and open. In the next code block I solve that by closing the existing indices before doing the actual restore.

public void restoreSnapshot(String repositoryName, String snapshot) {
    // Obtain the snapshot and check the indices that are in the snapshot
    GetSnapshotsRequestBuilder builder = new GetSnapshotsRequestBuilder(client.admin().cluster());
    builder.setRepository(repositoryName);
    builder.setSnapshots(snapshot);
    GetSnapshotsResponse getSnapshotsResponse = builder.execute().actionGet();

    // Close the indices contained in the snapshot before we restore them.
    // Note: there is no existence check here, so the indices must already exist in the cluster.
    ImmutableList<String> indices = getSnapshotsResponse.getSnapshots().get(0).indices();
    CloseIndexRequestBuilder closeIndexRequestBuilder =
            new CloseIndexRequestBuilder(client.admin().indices());
    closeIndexRequestBuilder.setIndices(indices.toArray(new String[indices.size()]));
    closeIndexRequestBuilder.execute().actionGet();

    // Now execute the actual restore action
    RestoreSnapshotRequestBuilder restoreBuilder = new RestoreSnapshotRequestBuilder(client.admin().cluster());
    restoreBuilder.setRepository(repositoryName).setSnapshot(snapshot);
    restoreBuilder.execute().actionGet();
}
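Renaming on restore is the alternative to closing the existing index. The restore request accepts a rename pattern and a rename replacement, and as far as I know they are applied to each index name as a regular expression replacement. The sketch below only previews what such a pair would do to an index name, using plain java regex. The class and method names, the pattern and the replacement are examples I made up.

```java
import java.util.regex.Pattern;

/** Previews the effect of a rename pattern and replacement on an index name. */
final class RestoreRenameSketch {

    /** Applies the regular expression replacement to one index name. */
    static String renamed(String indexName, String renamePattern, String renameReplacement) {
        return Pattern.compile(renamePattern).matcher(indexName).replaceAll(renameReplacement);
    }

    public static void main(String[] args) {
        // Example pattern: keep everything after "logstash-" and add a prefix.
        String pattern = "logstash-(.+)";
        String replacement = "restored-logstash-$1";
        System.out.println(renamed("logstash-2014.12.15", pattern, replacement));
        // A name that does not match the pattern is left untouched.
        System.out.println(renamed("products", pattern, replacement));
    }
}
```

Checking the resulting names up front helps to avoid a restore that collides with an index that is still open.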

That is it for the coding with java part. Next I am going to show you some screens from my plugin that is working with snapshots. This plugin uses the javascript driver, so there is no java code in my plugin.

Elasticsearch gui

In the plugin go to the tools tab and open the snapshot tool. The first screen shows the available repositories. Click on one of the repositories and the snapshots in that repository are presented.

Snapshot show repositories

Beware, if there is a running snapshot you only get back the status of the snapshots that are running. We will see that later on.

Snapshot show clicked repo

Now we push the create snapshot button and we get a screen where we can enter the name of a snapshot or a prefix that gets a timestamp appended.

Create a new snapshot

In my case I started a snapshot for all available logstash indices. This is a big snapshot, which makes it easy to show what the status window looks like. The indices that have not yet finished are yellow.

Snapshot show status

Final thoughts

By now you should have found the pattern in the java code. It is not hard. Of course when creating queries things can become a lot more complicated. But using these methods it is not that hard to create an admin tool in your application that can help users to look at the snapshots that are available and even restore snapshots. If you do not want to include this code in your project, but you do want to create and restore snapshots you can also use my plugin.

References

About Jettro Coenradie

I am a Software Developer / Architect with a lot of hands-on experience in Java, AngularJS, Elasticsearch and lots of other tools. I like to use these technologies to help customers with their business challenges. On top of that I like to gather and share knowledge related to data analytics. I have experience with importing and transforming data as well as presenting and visualising the data. Currently I am working with tools like elasticsearch, logstash and Kibana but also D3 and C3 for graphics and other presentations.


4 Comments

  • bhavesh shah

    Hi…snapshots are how different from copying ,
    so why anyone sud use snapshot for backup thn copying the data folder ?

  • Jettro Coenradie

    The advantage is that a snapshot understands what happens in the cluster. With copying files you could run into locked files. The snapshot is also incremental, you can choose what to restore, use other index name, etc. So a lot more functionality than just copying files.

    • Manirul

      do you have the code into GIt repo. I would like to see the whole project instead of code snaps
