


Elasticsearch instances for integration testing

In my latest project, I have implemented all communication with my Elasticsearch cluster using the high level REST client. My next step was to automatically set up and tear down an Elasticsearch instance in order to facilitate proper integration testing. This article describes three different ways of doing so and discusses some of their pros and cons. Please refer to this repository for implementations of all three methods.

docker-maven-plugin

This generic Docker plugin allows you to bind the starting and stopping of Docker containers to Maven lifecycle phases. You specify two blocks within the plugin: configuration and executions. In the configuration block, you choose the image that you want to run (Elasticsearch 6.3.0 in this case), the ports that you want to expose, a health check and any environment variables. See the snippet below for a complete example:

<plugin>
    <groupId>io.fabric8</groupId>
    <artifactId>docker-maven-plugin</artifactId>
    <version>${version.io.fabric8.docker-maven-plugin}</version>
    <configuration>
        <imagePullPolicy>always</imagePullPolicy>
        <images>
            <image>
                <alias>docker-elasticsearch-integration-test</alias>
                <name>docker.elastic.co/elasticsearch/elasticsearch:6.3.0</name>
                <run>
                    <namingStrategy>alias</namingStrategy>
                    <ports>
                        <port>9299:9200</port>
                        <port>9399:9300</port>
                    </ports>
                    <env>
                        <cluster.name>integration-test-cluster</cluster.name>
                    </env>
                    <wait>
                        <http>
                            <url>http://localhost:9299</url>
                            <method>GET</method>
                            <status>200</status>
                        </http>
                        <time>60000</time>
                    </wait>
                </run>
            </image>
        </images>
    </configuration>
    <executions>
        <execution>
            <id>docker:start</id>
            <phase>pre-integration-test</phase>
            <goals>
                <goal>start</goal>
            </goals>
        </execution>
        <execution>
            <id>docker:stop</id>
            <phase>post-integration-test</phase>
            <goals>
                <goal>stop</goal>
            </goals>
        </execution>
    </executions>
</plugin>

You can see that I’ve bound the plugin to the pre- and post-integration-test lifecycle phases. By doing so, the Elasticsearch container will be started just before any integration tests are run and will be stopped after the integration tests have finished. I’ve used the maven-failsafe-plugin to trigger the execution of tests ending with *IT.java in the integration-test lifecycle phase.
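
For reference, a minimal failsafe configuration could look like the snippet below. This is only a sketch: the version property is a placeholder in the style of the rest of the pom, and I rely on the default includes, which already pick up classes ending in *IT.java.

<plugin>
    <groupId>org.apache.maven.plugins</groupId>
    <artifactId>maven-failsafe-plugin</artifactId>
    <version>${version.org.apache.maven.plugins.maven-failsafe-plugin}</version>
    <executions>
        <execution>
            <goals>
                <goal>integration-test</goal>
                <goal>verify</goal>
            </goals>
        </execution>
    </executions>
</plugin>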

Since this is a generic Docker plugin, there is no special functionality to easily install Elasticsearch plugins that may be needed during your integration tests. You could however create your own image with the required plugins and pull that image during your integration tests.
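
As a sketch, such a custom image could be built from a Dockerfile like the one below (analysis-icu is only an example plugin; install whatever your tests require):

FROM docker.elastic.co/elasticsearch/elasticsearch:6.3.0
RUN bin/elasticsearch-plugin install --batch analysis-icu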

The integration with IntelliJ is also not optimal. When running an *IT.java class, IntelliJ will not trigger the correct lifecycle phases and will attempt to run your integration test without creating the required Docker container. Before running an integration test from IntelliJ, you need to manually start the container from the “Maven Projects” view by running the docker:start command:

Maven Projects view in IntelliJ

After running your tests, you will also need to run the docker:stop command to kill the container that is still running. If you forget to kill the running container and later want to run mvn clean install, the build will fail, since it will attempt to create a container on the same port; as far as I know, the plugin does not allow random ports to be chosen.
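
If you prefer the command line over the “Maven Projects” view, the same goals can be invoked directly (assuming the plugin is configured in your pom):

mvn docker:start
mvn docker:stop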

Pros:

  • Little setup, only requires configuration of one Maven plugin

Cons:

  • No out of the box functionality to start the Elasticsearch instance on a random port
  • No out of the box functionality to install extra Elasticsearch plugins
  • Extra dependency in your build pipeline (Docker)
  • IntelliJ does not trigger the correct lifecycle phases

elasticsearch-maven-plugin

This second plugin does not require Docker and only needs some Maven configuration to get started. See the snippet below for a complete example:

<plugin>
    <groupId>com.github.alexcojocaru</groupId>
    <artifactId>elasticsearch-maven-plugin</artifactId>
    <version>${version.com.github.alexcojocaru.elasticsearch-maven-plugin}</version>
    <configuration>
        <version>${version.org.elastic}</version>
        <clusterName>integration-test-cluster</clusterName>
        <transportPort>9399</transportPort>
        <httpPort>9299</httpPort>
    </configuration>
    <executions>
        <execution>
            <id>start-elasticsearch</id>
            <phase>pre-integration-test</phase>
            <goals>
                <goal>runforked</goal>
            </goals>
        </execution>
        <execution>
            <id>stop-elasticsearch</id>
            <phase>post-integration-test</phase>
            <goals>
                <goal>stop</goal>
            </goals>
        </execution>
    </executions>
</plugin>

Again, I’ve bound the plugin to the pre- and post-integration-test lifecycle phases in combination with the maven-failsafe-plugin.

This plugin provides a way of starting the Elasticsearch instance from IntelliJ in much the same way as the docker-maven-plugin: you can run the elasticsearch:runforked command from the “Maven Projects” view. However, in my case this started the instance and then immediately exited. There is also no out-of-the-box way to set a random port for your instance. Of course, there are solutions to this, at the expense of a somewhat more complex Maven configuration.

Overall, this is a plugin that seems to provide almost everything we need with a lot of configuration options. You can automatically install Elasticsearch plugins or even bootstrap your instance with data.

In practice, I did have some problems using the plugin in my build pipeline. The build would sometimes fail while downloading the Elasticsearch zip, or in other cases while attempting to download a plugin. Your mileage may vary, but this was reason enough for me to keep looking for another solution, which brings me to plugin number three.

Pros:

  • Little setup, only requires configuration of one Maven plugin
  • No extra external dependencies
  • Highly configurable

Cons:

  • No out of the box functionality to start the Elasticsearch instance on a random port
  • Poor integration with IntelliJ
  • Seems unstable

testcontainers-elasticsearch

This third option is different from the other two: it uses a Java Testcontainers container that you configure entirely through Java code. This gives you a lot of flexibility and requires no Maven configuration. Since there is no Maven configuration, however, it does require some work to make sure the Elasticsearch container is started and stopped at the correct moments.

In order to realize this, I have extended the standard SpringJUnit4ClassRunner class with my own ElasticsearchSpringRunner. In this runner, I have added a new JUnit RunListener named JUnitExecutionListener. This listener defines two methods testRunStarted and testRunFinished that enable me to start and stop the Elasticsearch container at the same points in time that the pre- and post-integration-test Maven lifecycle phases would. See the snippet below for the implementation of the listener:

public class JUnitExecutionListener extends RunListener {

    private static final Logger LOGGER = LoggerFactory.getLogger(JUnitExecutionListener.class);
    private static final String ELASTICSEARCH_IMAGE = "docker.elastic.co/elasticsearch/elasticsearch";
    private static final String ELASTICSEARCH_VERSION = "6.3.0";
    private static final String ELASTICSEARCH_HOST_PROPERTY = "nl.luminis.articles.maven.elasticsearch.host";
    private static final int ELASTICSEARCH_PORT = 9200;

    private ElasticsearchContainer container;

    @Override
    public void testRunStarted(Description description) {
        // Create a Docker Elasticsearch container when there is no existing host defined in default-test.properties.
        // Spring will use this property to configure the application when it starts.
        if (System.getProperty(ELASTICSEARCH_HOST_PROPERTY) == null) {
            LOGGER.debug("Create Elasticsearch container");
            int mappedPort = createContainer();
            System.setProperty(ELASTICSEARCH_HOST_PROPERTY, "localhost:" + mappedPort);
            String host = System.getProperty(ELASTICSEARCH_HOST_PROPERTY);
            RestAssured.basePath = "";
            RestAssured.baseURI = "http://" + host.split(":")[0];
            RestAssured.port = Integer.parseInt(host.split(":")[1]);
            LOGGER.debug("Created Elasticsearch container at {}", host);
        }
    }

    @Override
    public void testRunFinished(Result result) {
        if (container != null) {
            String host = System.getProperty(ELASTICSEARCH_HOST_PROPERTY);
            LOGGER.debug("Removing Elasticsearch container at {}", host);
            container.stop();
        }
    }

    private int createContainer() {
        container = new ElasticsearchContainer();
        container.withBaseUrl(ELASTICSEARCH_IMAGE);
        container.withVersion(ELASTICSEARCH_VERSION);
        container.withEnv("cluster.name", "integration-test-cluster");
        container.start();
        return container.getMappedPort(ELASTICSEARCH_PORT);
    }
}

The listener will create an Elasticsearch Docker container on a random port for use by the integration tests. The best thing about having this runner is that it works perfectly fine in IntelliJ: simply right-click and run your *IT.java classes annotated with @RunWith(ElasticsearchSpringRunner.class), and IntelliJ will use the listener to set up the Elasticsearch container. This allows you to automate your build pipeline while still keeping developers happy.
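
To illustrate, an integration test using the runner could look something like the sketch below. The class name, the @SpringBootTest annotation and the health check request are only illustrative; the tests in the repository are more elaborate.

import io.restassured.RestAssured;
import org.junit.Test;
import org.junit.runner.RunWith;
import org.springframework.boot.test.context.SpringBootTest;

@RunWith(ElasticsearchSpringRunner.class)
@SpringBootTest
public class ExampleSearchIT {

    @Test
    public void clusterShouldBeReachable() {
        // RestAssured has already been pointed at the container by JUnitExecutionListener.
        RestAssured.given()
            .when().get("/_cluster/health")
            .then().statusCode(200);
    }
}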

Pros:

  • Neat integration with Java and therefore with your IDE
  • Sufficient configuration options out of the box

Cons:

  • More complex initial setup
  • Extra dependency in your build pipeline (Docker)

In summary, all three of the above plugins are able to realize the goal of starting an Elasticsearch instance for your integration testing. For me personally, I will be using the testcontainers-elasticsearch plugin going forward. The extra Docker dependency is not a problem since I use Docker in most of my build pipelines anyway. Furthermore, the integration with Java allows me to configure things in such a way that it works perfectly fine from both the command line and the IDE.

Feel free to check out the code behind this article, play around with the integration tests that I’ve set up there and decide for yourself which plugin suits your needs best. Please note that the project has a special Maven profile that separates unit tests from integration tests. Build the project using mvn clean install -P integration-test to run both.


Providing developers and testers with a representative database using Docker

When developing or testing, having a database that comes pre-filled with relevant data can be a big help with implementation or scenario walkthroughs. However, often there is only a structure dump of the production database available, or less. This article outlines the process of creating a Docker image that starts a database and automatically restores a dump containing a representative set of data. We use the PostgreSQL database in our examples, but the process outlined here can easily be applied to other databases such as MySQL or Oracle SQL.

Dumping the database

I will assume that you have access to a database that contains a copy of production or an equivalent representative set of data. You can dump your database using Postgres’ pg_dump utility:

pg_dump --dbname=postgresql://user@localhost:5432/mydatabase \
        --format=custom --file=database.dmp

We use the custom format option to create the dump file. This gives us a file that can easily be restored with the pg_restore utility later on and ensures the file is compressed. In the case of larger databases, you may also wish to exclude certain database elements from your dump. In order to do so, you have the following options:

  • The --exclude-table option, which takes a string pattern and ensures any tables matching the pattern will not be included in the dump file.
  • The --schema option, which restricts our dump to particular schemas in the database. It may be a good idea to leave out pg_catalog; among other things, this schema contains the table pg_largeobject, which holds all of your database’s large binary objects.

See the Postgres documentation for more available options.
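
For example, a dump that is limited to the public schema and skips audit tables could be created as follows (the audit_* pattern is purely illustrative):

pg_dump --dbname=postgresql://user@localhost:5432/mydatabase \
        --format=custom --schema=public --exclude-table='audit_*' \
        --file=database.dmp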

Distributing the dump among users

For the distribution of the dump, we will be using Docker. Postgres, MySQL and even Oracle provide you with prebuilt Docker images of their databases.

Example 1: A first attempt
In order to start an instance of a Postgres database, you can use the following docker run command to start a container based on Postgres:

docker run -p 5432:5432 --name database-dump-container \
           -e POSTGRES_USER=user -e POSTGRES_PASSWORD=password \
           -e POSTGRES_DB=mydatabase -d postgres:9.5.10-alpine

This starts a container named database-dump-container that can be reached at port 5432 with user:password as the login. Note the usage of the 9.5.10-alpine tag. This ensures that the Linux distribution that we use inside the Docker container is Alpine Linux, a distribution with a small footprint. The whole Docker image will take up about 14 MB, while the regular 9.5.10 tag would require 104 MB. We are pulling the image from Docker Hub, a public Docker registry where various open source projects host their Docker images.

Having started our Docker container, we can now copy our dump file into it. We first use docker exec to execute a command against the container we just made. In this case, we create a directory inside the Docker container:

docker exec -i database-dump-container mkdir -p /var/lib/postgresql/dumps/

Following that, we use docker cp to copy the dump file from our host into the container:

docker cp database.dmp database-dump-container:/var/lib/postgresql/dumps/

After this, we can restore our dump:

docker exec -i database-dump-container pg_restore --username=user --verbose \
                                                  --exit-on-error --format=custom \
                                                  --dbname=mydatabase /var/lib/postgresql/dumps/database.dmp
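
To check that the restore succeeded, you can, for example, list the tables in the database:

docker exec -i database-dump-container psql --username=user --dbname=mydatabase --command='\dt'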

We now have a Docker container with a running Postgres instance containing our data dump. In order to actually distribute this, you will need to get it into a Docker repository. If you register with Docker Hub, you can create public repositories for free. After creating your account, you can log in to the registry that hosts your repositories with the following command:

docker login docker.io

Enter the username and password for your Docker Hub account when prompted.

Having done this, we are able to publish our data dump container as an image, using the following commands:

docker commit database-dump-container my-repository/database-dump-image
docker push my-repository/database-dump-image

Note that you are able to push different versions of an image by using Docker image tags.
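
For example, to publish a specific version of the image next to the default latest tag (the 1.0 tag is arbitrary):

docker commit database-dump-container my-repository/database-dump-image:1.0
docker push my-repository/database-dump-image:1.0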

The image is now available to other developers. It can be pulled and run on another machine using the following commands:

docker pull my-repository/database-dump-image
docker run -p 5432:5432 --name database-dump-container \
           -e POSTGRES_USER=user -e POSTGRES_PASSWORD=password \
           -e POSTGRES_DB=mydatabase -d my-repository/database-dump-image

All done! Or are we? After we run the container based on the image, we still have an empty database. How did this happen?

Example 2: Creating your own Dockerfile
It turns out that the Postgres Docker image uses Docker volumes. This separates the actual image from data and ensures that the size of the image remains reasonable. We can view what volumes Docker has made for us by using docker volume ls. These volumes can be associated with more than one Docker container and will remain, even after you have removed the container that initially spawned the volume. If you would like to remove a Docker container, including its volumes, make sure to use the -v option:

docker rm -v database-dump-container

Go ahead and execute the command; we will be recreating the container in the following steps.

So how can we use this knowledge to distribute our database including our dump? Luckily, the Postgres image provides for exactly this situation. Any scripts that are present in the Docker container under the directory /docker-entrypoint-initdb.d/ will be executed automatically upon starting a new container. This allows us to add data to the Docker volume upon starting the container. In order to make use of this functionality, we are going to have to create our own image using a Dockerfile that extends the postgres:9.5.10-alpine image we used earlier:

FROM postgres:9.5.10-alpine

RUN mkdir -p /var/lib/postgresql/dumps/
ADD database.dmp /var/lib/postgresql/dumps/
ADD initialize.sh /docker-entrypoint-initdb.d/

The contents of initialize.sh are as follows:

#!/bin/sh
# Restore the dump that was added to the image into the freshly started Postgres instance.
pg_restore --username=user --verbose --exit-on-error --format=custom \
           --dbname=mydatabase /var/lib/postgresql/dumps/database.dmp

We can build and run this Dockerfile by navigating to its directory and then executing:

docker build --rm=true -t database-dump-image .
docker run -p 5432:5432 --name database-dump-container \
           -e POSTGRES_USER=user -e POSTGRES_PASSWORD=password \
           -e POSTGRES_DB=mydatabase -d database-dump-image

After starting the container, inspect its progress using docker logs -f database-dump-container. You can see that upon starting the container, our database dump is being restored into the Postgres instance.

We can now publish the image again using the steps described earlier, after which it is available for use.

Conclusions and further reading

While working through this article, you have used a lot of important Docker concepts. The first example demonstrated the usage of images and containers, combined with the exec and cp commands that let you interact with running containers. We then demonstrated how you can publish a Docker image using Docker Hub, after which we showed you how to build and run your own custom-made image. We have also touched upon some more complex topics, such as Docker volumes.

After this you may wish to consult the Docker documentation to further familiarize yourself with the other commands that Docker offers.

This setup still leaves room for improvement: the current process involves quite a lot of manual work, and we have coupled our container to one particular database dump. Please refer to the GitHub project for automated examples of this process.


Elasticsearch on the web week 8

This is the first overview post about what I found on the web this week related to Elasticsearch. This is not a generated post, but a post describing the most interesting articles I read this week. Some of them are older than this week, but they did help me with something this week. From now on I am regularly going to write these overview posts. If you want to stay up to date, feel free to follow us on Twitter. In this post: news about Kibana 4, ELK and Docker, an update to the Shield plugin, and the latest and greatest version of Elasticsearch.
