
Looking back on AWS Summit Benelux 2018

Last week I visited AWS Summit Benelux together with Sander. AWS Summit is all about cloud computing and the topics that surround it. This being my first AWS conference, I can say it was a really nice experience. Sure, there was room for improvement (no coffee or tea after the opening keynote being one), but other than that it was a very good day. Getting inside was a breeze thanks to all the different check-in points, and after entering you were directly on the exhibitor floor, where a lot of Amazon partners showed their products.

Opening keynote

The day started with an introduction by Kamini Aisola, Head of Amazon Benelux. With this being my first AWS Summit, it was great to see Kamini share some numbers about the conference: 2000 attendees and 28 technical sessions. She also showed us the growth pattern of AWS, with year-over-year growth of 49%. That’s really impressive!

Who are builders?

Shortly after, Amazon.com CTO Werner Vogels started his opening keynote. Werner showed how AWS evolved from being ‘just’ an IaaS company to now offering more than 125 different services. More than 90% of those services were developed based on customer feedback from the last couple of years. That’s probably one of the reasons why AWS is growing so rapidly and customers are adopting the platform.

What I noticed throughout the entire keynote is that AWS is constantly thinking about what builders want to build (in the cloud) and what kind of tools those builders need to be successful. These tools come in different forms and sizes, but I noticed a certain pattern in how services evolve or are grown at AWS. The overall trend during the talks is that engineers and builders should spend less time on lower-level infrastructure so they can really focus on delivering business value by leveraging the services that AWS has to offer.

During the keynote Werner ran through a couple of different focus areas and showed what AWS is currently offering for each. In this post I won’t go through all of them, because I expect you can probably watch a recording of the keynote on YouTube soon, but I’ll highlight a few.

Let’s first start with the state of Machine Learning and analytics. Werner looked back at how machine learning evolved at Amazon.com and how services were developed to make machine learning more accessible for teams within the organisation. Out of this came a really nice mission statement:

AWS wants to put machine learning in the hands of every developer and data scientist.

To achieve this mission, AWS currently offers a layered ML stack to engineers looking into using ML on the AWS platform.

The layers go from low-level libraries to pre-built functionality based on these lower-level layers. I really liked the fact that these services are built in such a way that engineers can decide at which level of complexity they want to start using the ML services offered by AWS. Most of the time data engineers and data scientists will start from SageMaker or even lower, but most application developers might just want to use pre-built functionality like image recognition, text processing or speech recognition. See for instance this really awesome post on using facial recognition by my colleague Roberto.

Another example of this layered approach concerned container support on AWS. A few years back Amazon added container support to their offering with Amazon Elastic Container Service (Amazon ECS). ECS helped customers run containers on AWS without having to manage servers or run their own container orchestration software. Fast forward a few years, and Amazon now also offers Amazon EKS (managed Kubernetes on AWS), after noticing that about 63% of managed Kubernetes clusters ran on AWS. Kubernetes has become the industry standard when it comes to container orchestration, so this makes a lot of sense. In addition, Amazon now offers Amazon Fargate. With Fargate they take the next step: Fargate allows you as the developer to focus on running containers ‘without having to think about managing servers or clusters’.

During his keynote, Werner also mentioned the Well-Architected Framework. The Well-Architected Framework has been developed to help cloud architects run their applications in the cloud based on AWS best practices. When implemented correctly, it allows you to fully focus on your functional requirements and deliver business value to your customers. The framework is based on the following five pillars:

  1. Operational Excellence
  2. Security
  3. Reliability
  4. Performance Efficiency
  5. Cost Optimization

I had not heard about the framework before, so during the weekend I read through some of its documentation. Some of the items are pretty straightforward, but others might give you some insights into what it means to run applications in the cloud. One aspect of the Well-Architected Framework, Security, recurred throughout the entire keynote.

Werner emphasised a very important point during his presentation:

Security is EVERYONE’s job

With all the data breaches happening lately I think this is a really good point to make. Security should be everybody’s number one priority these days.

During the keynote, a couple of customers showed how AWS had helped them achieve a certain goal. Bastiaan Terhorst, CPO at WeTransfer, explained that being a cloud-scale company comes with certain problems, and how they moved from a brittle situation towards a more scalable solution. They could not modify the schema of their database anymore without breaking the application, which is horrible once you reach a certain scale and customer base. They had to rearchitect the way they handled incoming data and used historic data for reporting. I really liked the fact that he shared some hard-learned lessons about database scalability issues that can occur when you reach a certain scale.

Tim Bogaert, CTO at de Persgroep, also showed how they moved from being a siloed organization with its own datacenters and long-running waterfall projects to being all-in on AWS, with an agile approach and teams following the “You Build It, You Run It” mantra. It was an interesting story, because I see a lot of larger enterprises still struggling with these transitions.

After the morning keynote, the breakout sessions started. There were 7 parallel tracks and all with different topics, so plenty to choose from. During the day I attended only a few, so here goes.

Improve Productivity with Continuous Integration & Delivery

This really nice talk by Clara Ligouri (software engineer for AWS Developer Tools) and Jamie van Brunschot (Cloud engineer at Coolblue) gave a good insight into all the different tools provided by AWS to support the full development and deployment lifecycle of an application.

Clara modified some code in Cloud9 (the online IDE), debugged it, ran CI jobs, tests and deployments all from within her browser, and pushed a new change to production within a matter of minutes. It shows how far the current state of being a cloud-native developer has really come. I looked at Cloud9 years ago, way before they were acquired by Amazon, and I’ve always been a bit skeptical when it comes to using an online IDE. I remember having some good discussions with the CTO at my former company about whether this would really be the next step for IDEs and software development in general. I’m just so comfortable with IntelliJ for Java development, and it always works (even if I do not have any internet ;-)). I do wonder if anybody reading this is already using Cloud9 (or any other web IDE) and doing their development fully in the cloud. If you do, please leave a comment; I would love to learn from your experiences. The other tools, like CodePipeline and CodeDeploy, definitely looked interesting, so I need to find some time to play around with them.

GDPR

Next up was a talk on GDPR. The room was quite packed. I didn’t expect that, because everybody should be GDPR compliant by now, right? 🙂 Well, not really. Companies are still implementing changes to become compliant with GDPR. The talk by Christian Hesse looked at different aspects of GDPR, like:

  • The right to data portability
  • The right to be forgotten
  • Privacy by design
  • Data breach notification

He also talked about the shared responsibility model when it comes to being GDPR compliant. AWS as the processor of personal data and the company using AWS being the controller are both responsible for making sure data stays safe. GDPR is a hot topic and I guess it will stay so for the rest of the year at least. It’s something that we as engineers will always need to keep in the back of our minds while developing new applications or features.

Serverless

In the afternoon I also attended a talk on Serverless by Prakash Palanisamy (Solutions Architect, Amazon Web Services) and Joachim den Hertog (Solutions Architect, ReSnap / Albelli). This presentation gave a nice overview of Serverless and Step Functions, but also showed new improvements like the Serverless Application Repository, safe Serverless deployments and incremental deployments. Joachim gave some insights into how Albelli was using Serverless and Machine Learning on the AWS platform for their online photo book creator application called ReSnap.

Unfortunately I had to leave early, so I missed the end of the Serverless talk and the last breakout session. All in all, AWS Summit Benelux was a very nice experience, with some interesting customer cases and architectures. For a ‘free’ event it was amazingly well organized; I learned some new things and had a chance to speak with people about how they use AWS. It has motivated me to spend some more time with AWS and its services. Let’s see what interesting things I can do at the next Luminis TechDay.

Build On!


Tracing APIs: Combining Spring’s Sleuth, Zipkin & ELK

Tracing bugs, errors and the cause of lagging performance can be cumbersome, especially when functionality is distributed over multiple microservices.
To keep track of these issues, using an ELK stack (or any similar system) is already a big step forward in creating a clear overview of all the processes of a service and finding these issues.
Bugs can often be traced far more easily using ELK than with a plain log file, if one is even available.
This approach can be optimized further; for example, you may want to see the trace logging for one specific event only.


Providing developers and testers with a representative database using Docker

When developing or testing, having a database that comes pre-filled with relevant data can be a big help with implementation or scenario walkthroughs. However, often there is only a structure dump of the production database available, or even less. This article outlines the process of creating a Docker image that starts a database and automatically restores a dump containing a representative set of data. We use a PostgreSQL database in our examples, but the process outlined here can easily be applied to other databases such as MySQL or Oracle.

Dumping the database

I will assume that you have access to a database that contains a copy of production or an equivalent representative set of data. You can dump your database using Postgres’ pg_dump utility:

pg_dump --dbname=postgresql://user@localhost:5432/mydatabase \
        --format=custom --file=database.dmp

We use the custom format option to create the dump file. This gives us a file that can easily be restored with the pg_restore utility later on and ensures the file is compressed. In the case of larger databases, you may also wish to exclude certain database elements from your dump. To do so, you have the following options:

  • The --exclude-table option, which takes a string pattern and ensures any tables matching the pattern will not be included in the dump file.
  • The --schema option, which restricts our dump to particular schemas in the database. It may be a good idea to exclude pg_catalog – this schema contains, among other things, the table pg_largeobject, which contains all of your database’s binaries.

See the Postgres documentation for more available options.
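As a sketch, a dump restricted to the public schema (thereby leaving out pg_catalog) that also skips audit tables could look like this — the audit_* pattern and connection string are placeholders for your own setup:

```shell
# Dump only the 'public' schema, skipping any tables matching 'audit_*'.
# The connection string and table pattern are examples; adjust as needed.
pg_dump --dbname=postgresql://user@localhost:5432/mydatabase \
        --format=custom \
        --schema=public \
        --exclude-table='audit_*' \
        --file=database.dmp
```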

Distributing the dump among users

For the distribution of the dump, we will be using Docker. Postgres, MySQL and even Oracle provide you with prebuilt Docker images of their databases.

Example 1: A first attempt
In order to start an instance of a Postgres database, you can use the following docker run command to start a container based on Postgres:

docker run -p 5432:5432 --name database-dump-container \
           -e POSTGRES_USER=user -e POSTGRES_PASSWORD=password \
           -e POSTGRES_DB=mydatabase -d postgres:9.5.10-alpine

This starts a container named database-dump-container that can be reached at port 5432 with user:password as the login. Note the usage of the 9.5.10-alpine tag. This ensures that the Linux distribution that we use inside the Docker container is Alpine Linux, a distribution with a small footprint. The whole Docker image will take up about 14 MB, while the regular 9.5.10 tag would require 104 MB. We are pulling the image from Docker Hub, a public Docker registry where various open source projects host their Docker images.
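To check that the instance is actually accepting connections, you can run psql inside the container itself (this assumes the container name and credentials from the docker run command above):

```shell
# Run a trivial query inside the container; a '1' in the output means
# Postgres is up and the 'mydatabase' database exists.
docker exec -i database-dump-container \
    psql --username=user --dbname=mydatabase --command='SELECT 1;'
```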

Having started our Docker container, we can now copy our dump file into it. We first use docker exec to execute a command against the container we just made. In this case, we create a directory inside the Docker container:

docker exec -i database-dump-container mkdir -p /var/lib/postgresql/dumps/

Following that, we use docker cp to copy the dump file from our host into the container:

docker cp database.dmp database-dump-container:/var/lib/postgresql/dumps/

After this, we can restore our dump:

docker exec -i database-dump-container pg_restore --username=user --verbose \
                                                  --exit-on-error --format=custom \
                                                  --dbname=mydatabase /var/lib/postgresql/dumps/database.dmp

We now have a Docker container with a running Postgres instance containing our data dump. In order to actually distribute this, you will need to get it into a Docker repository. If you register with Docker Hub you can create public repositories for free. After creating your account you can login to the registry that hosts your repositories with the following command:

docker login docker.io

Enter the username and password for your Docker Hub account when prompted.

Having done this, we are able to publish our data dump container as an image, using the following commands:

docker commit database-dump-container my-repository/database-dump-image
docker push my-repository/database-dump-image

Note that you are able to push different versions of an image by using Docker image tags.
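For example, a versioned push could look like this (the 1.0.0 version and the my-repository name are placeholders):

```shell
# Tag the image with an explicit version in addition to 'latest',
# then push that specific version to the registry.
docker tag my-repository/database-dump-image \
           my-repository/database-dump-image:1.0.0
docker push my-repository/database-dump-image:1.0.0
```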

The image is now available to other developers. It can be pulled and run on another machine using the following commands:

docker pull my-repository/database-dump-image
docker run -p 5432:5432 --name database-dump-container \
           -e POSTGRES_USER=user -e POSTGRES_PASSWORD=password \
           -e POSTGRES_DB=mydatabase -d my-repository/database-dump-image

All done! Or are we? After we run the container based on the image, we still have an empty database. How did this happen?

Example 2: Creating your own Dockerfile
It turns out that the Postgres Docker image uses Docker volumes. This separates the actual image from data and ensures that the size of the image remains reasonable. We can view what volumes Docker has made for us by using docker volume ls. These volumes can be associated with more than one Docker container and will remain, even after you have removed the container that initially spawned the volume. If you would like to remove a Docker container, including its volumes, make sure to use the -v option:

docker rm -v database-dump-container

Go ahead and execute the command, we will be recreating the container in the following steps.
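If you are curious where a volume's data actually lives on the host, you can inspect it — the volume name below is a placeholder for one of the generated names shown by docker volume ls:

```shell
# List all volumes Docker has created.
docker volume ls

# Show the host directory backing a specific volume
# (replace 'my-volume-name' with a name from the listing above).
docker volume inspect --format '{{ .Mountpoint }}' my-volume-name
```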

So how can we use this knowledge to distribute our database including our dump? Luckily, the Postgres image provides for exactly this situation. Any scripts that are present in the Docker container under the directory /docker-entrypoint-initdb.d/ will be executed automatically upon starting a new container. This allows us to add data to the Docker volume upon starting the container. In order to make use of this functionality, we are going to have to create our own image using a Dockerfile that extends the postgres:9.5.10-alpine image we used earlier:

FROM postgres:9.5.10-alpine

RUN mkdir -p /var/lib/postgresql/dumps/
ADD database.dmp /var/lib/postgresql/dumps/
ADD initialize.sh /docker-entrypoint-initdb.d/

The contents of initialize.sh are as follows:

pg_restore --username=user --verbose --exit-on-error --format=custom \
           --dbname=mydatabase /var/lib/postgresql/dumps/database.dmp

We can build and run this Dockerfile by navigating to its directory and then executing:

docker build --rm=true -t database-dump-image .
docker run -p 5432:5432 --name database-dump-container \
           -e POSTGRES_USER=user -e POSTGRES_PASSWORD=password \
           -e POSTGRES_DB=mydatabase -d database-dump-image

After starting the container, inspect its progress using docker logs -f database-dump-container. You can see that upon starting the container, our database dump is being restored into the Postgres instance.

We can now again publish the image using the earlier steps, and the image is available for usage.

Conclusions and further reading

While working through this article, you have used a lot of important concepts within Docker. The first example demonstrated the usage of images and containers, combined with the exec and cp commands that interact with running containers. We then demonstrated how you can publish a Docker image using Docker Hub, after which we showed you how to build and run your own custom-made image. We have also touched upon some more complex topics, such as Docker volumes.

After this you may wish to consult the Docker documentation to further familiarize yourself with the other commands that Docker offers.

This setup still leaves room for improvement – the current process involves quite a lot of handwork, and we’ve coupled our container with one particular database dump. Please refer to the Github project for automated examples of this process.