Monitoring Spring Boot applications with Prometheus and Grafana

At my current project, we’ve been building three different applications. All three are based on Spring Boot, but they have very different workloads. They’ve all made their way to the production environment and have been running steadily for quite some time now. We do regular (weekly) deployments of our applications to production with bug fixes, new features, and technical improvements. The organisation has a traditional infrastructure workflow in the sense that deployments to the VM instances on acceptance and production happen via the (remote) hosting provider.

The hosting provider is responsible for the uptime of the applications and therefore keeps an eye on system metrics through their own monitoring system. As a team, we are able to look into that system, but it doesn’t say much about the internals of our applications. In the past, we’ve asked to add some additional metrics to their system, but it isn’t that easy to configure with extra metrics. To us as a team, runtime statistics about our applications and the effect our changes have on overall health are crucial for understanding the impact of our work. The rest of this post gives a short description of our journey and the reasons why we chose the resulting setup.

Spring Boot Actuator and Micrometer

If you’ve used Spring Boot before, you’ve probably heard of Spring Boot Actuator. Actuator is a set of features that help you monitor and manage your application when it moves away from your local development environment and onto a test, staging or production environment. It helps expose operational information about the running application – health, metrics, audit entries, scheduled tasks, environment settings, etc. You can query the information via HTTP endpoints or JMX beans. Being able to view the information is useful, but it’s hard to spot trends or see the behaviour over a period of time.

When we recently upgraded our projects to Spring Boot 2, my team was pretty excited that we were able to start using Micrometer, a (new) instrumentation library powering the delivery of application metrics. Micrometer is now the default metrics library in Spring Boot 2 and it doesn’t just give you metrics from your Spring application; it can also deliver JVM metrics (garbage collection, memory pools, etc.) and metrics from the application container. Micrometer has several different registry libraries that can be included to ship metrics to different backends and has support for Prometheus, Netflix Atlas, CloudWatch, Datadog, Graphite, Ganglia, JMX, Influx/Telegraf, New Relic, StatsD, SignalFx, and Wavefront.
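
Besides the metrics you get out of the box, Micrometer also lets you register your own application-specific meters through the MeterRegistry bean that Spring Boot auto-configures. A minimal sketch (the meter name and tag below are made up for illustration; Counter and MeterRegistry come from io.micrometer.core.instrument):

@Component
public class OrderMetrics {

    private final Counter ordersCreated;

    public OrderMetrics(MeterRegistry registry) {
        // Registered as a counter; the Prometheus endpoint exposes it
        // as orders_created_total with a 'channel' label
        this.ordersCreated = Counter.builder("orders.created")
                .description("Number of orders created")
                .tag("channel", "web")
                .register(registry);
    }

    public void onOrderCreated() {
        ordersCreated.increment();
    }
}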

Because we didn’t have a lot of control over the way our applications were deployed, we looked at the different backends supported by Micrometer. Most of the above backends work by pushing data out to a remote (cloud) service. Since the organisation we work for doesn’t allow us to push this ‘sensitive’ data to a remote party, we looked at self-hosted solutions. We did a quick scan, started looking into Prometheus (and Grafana), and soon learned that it was really easy to get a monitoring system up: we had a running setup within an hour.

To be able to use Spring Boot Actuator and Prometheus together you need to add two dependencies to your project:

<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-actuator</artifactId>
</dependency>

<dependency>
    <groupId>io.micrometer</groupId>
    <artifactId>micrometer-registry-prometheus</artifactId>
</dependency>

Actuator has an endpoint available for Prometheus to scrape, but it’s not exposed by default, so you will need to enable the endpoint through configuration. In this case, I’ll do so via application.properties.

management.endpoint.prometheus.enabled=true
management.endpoints.web.exposure.include=prometheus,info,health

Now if you browse to http(s)://host(:8080)/actuator/prometheus you will see the output that Prometheus will scrape to get the information from your application. A small snippet of the information provided by the endpoint is shown below, but the Prometheus endpoint exposes a lot more.

# HELP tomcat_global_sent_bytes_total  
# TYPE tomcat_global_sent_bytes_total counter
tomcat_global_sent_bytes_total{name="http-nio-8080",} 75776.0
tomcat_global_sent_bytes_total{name="http-nio-8443",} 1.0182049E8
# HELP tomcat_servlet_request_max_seconds  
# TYPE tomcat_servlet_request_max_seconds gauge
tomcat_servlet_request_max_seconds{name="default",} 0.0
tomcat_servlet_request_max_seconds{name="jsp",} 0.0
# HELP process_files_open The open file descriptor count
# TYPE process_files_open gauge
process_files_open 91.0
# HELP system_cpu_usage The "recent cpu usage" for the whole system
# TYPE system_cpu_usage gauge
system_cpu_usage 0.00427715996578272
# HELP jvm_memory_max_bytes The maximum amount of memory in bytes that can be used for memory management
# TYPE jvm_memory_max_bytes gauge
jvm_memory_max_bytes{area="nonheap",id="Code Cache",} 2.5165824E8
jvm_memory_max_bytes{area="nonheap",id="Metaspace",} -1.0
jvm_memory_max_bytes{area="nonheap",id="Compressed Class Space",} 1.073741824E9
jvm_memory_max_bytes{area="heap",id="PS Eden Space",} 1.77733632E8
jvm_memory_max_bytes{area="heap",id="PS Survivor Space",} 524288.0
jvm_memory_max_bytes{area="heap",id="PS Old Gen",} 3.58088704E8

Now that everything is configured from the application perspective, let’s move on to Prometheus itself.

Prometheus

Prometheus is an open-source systems monitoring and alerting toolkit originally built at SoundCloud and now part of the Cloud Native Computing Foundation. To get a better understanding of what Prometheus really is, let’s take a look at an architectural diagram.

(Source: https://prometheus.io/docs/introduction/overview/)

The Prometheus server consists of three main components:

  • A time series database
  • A retrieval component which scrapes its targets for information
  • An HTTP server which you can use to query information stored inside the time series database

To make it even more powerful there are some additional components which you can use if you want:

  • An alert manager, which you can use to send alerts via Pagerduty, Slack, etc.
  • A push gateway in case you need to push information to prometheus instead of using the default pull mechanism
  • Grafana for visualizing data and creating dashboards

When looking at Prometheus the most appealing features for us were:

  • no reliance on distributed storage; single server nodes are autonomous
  • time series collection happens via a pull model over HTTP
  • targets are discovered via service discovery or static configuration
  • multiple modes of graphing and dashboarding support

To get up and running quickly you can configure Prometheus to scrape some (existing) Spring Boot applications. Scrape targets need to be specified in the Prometheus configuration. Prometheus uses a file called prometheus.yml as its main configuration file; in it you specify where Prometheus can find the targets it needs to monitor, as well as recording rules and alerting rules.

The following example shows a configuration with a set of static targets for both Prometheus itself and our Spring Boot application.

global:
  scrape_interval:   15s # By default, scrape targets every 15 seconds.

  # Attach these labels to any time series or alerts when communicating with
  # external systems (federation, remote storage, Alertmanager).
  external_labels:
    monitor: 'bootifull-monitoring'

scrape_configs:
- job_name:       'monitoring-demo'

  # Override the global default and scrape targets from this job every 10 seconds.
  scrape_interval: 10s
  metrics_path: '/actuator/prometheus'

  static_configs:
  - targets: ['monitoring-demo:8080']
    labels:
      application: 'monitoring-demo'

- job_name: 'prometheus'

  scrape_interval: 5s

  static_configs:
  - targets: ['localhost:9090']

As you can see, the configuration is pretty simple. You can add specific labels to the targets, which can later be used for querying, filtering and creating dashboards based on the information stored within Prometheus.
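
If you prefer to attach such labels from the application side instead of in the scrape configuration, Micrometer can also add common tags to every meter. A small sketch, assuming Spring Boot 2’s auto-configured metrics support (the tag value is just an example):

@Configuration
public class MetricsConfiguration {

    // Adds an 'application' tag to every meter, which shows up as a label
    // on all metrics exposed via /actuator/prometheus
    @Bean
    public MeterRegistryCustomizer<MeterRegistry> commonTags() {
        return registry -> registry.config().commonTags("application", "monitoring-demo");
    }
}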

If you want to get started quickly with Prometheus and have Docker available on your environment, you can use the official Prometheus Docker image and provide a custom configuration from your host machine by running:

docker run -p 9090:9090 -v /tmp/prometheus.yml:/etc/prometheus/prometheus.yml \
       prom/prometheus:v2.4.3

In the above example we bind-mount the main Prometheus configuration file from the host system, so you can, for instance, use the configuration shown earlier. Prometheus itself has some basic graphing capabilities, but they are mostly meant for ad-hoc queries.

For creating an application monitoring dashboard, Grafana is much better suited.

Grafana

So what is Grafana and what role does it play in our monitoring stack?

Grafana allows you to query, visualize, alert on and understand your metrics no matter where they are stored. Create, explore, and share dashboards with your team and foster a data driven culture.

The cool thing about Grafana is (next to the beautiful UI) that it’s not tied to Prometheus as its single data source, like for instance Kibana is tied to Elasticsearch. Grafana can have many different data sources like AWS CloudWatch, Elasticsearch, InfluxDB, Prometheus, etc. This makes it a very good option for creating a monitoring dashboard. Grafana talks to Prometheus by using the PromQL query language.

For Grafana there is also an official Docker image available for you to use. You can get Grafana up and running with a simple command.

docker run -p 3000:3000 grafana/grafana:5.2.4

Now if we connect Grafana with Prometheus as the data source and install this excellent JVM Micrometer dashboard into Grafana, we can instantly start monitoring our Spring Boot application. You will end up with a pretty mature dashboard that lets you switch between different instances of your application.

If you want to start everything all at once you can easily use docker-compose.

version: "3"
services:
  app:
    image: monitoring-demo:latest
    container_name: 'monitoring-demo'
    build:
      context: ./
      dockerfile: Dockerfile
    ports:
    - '8080:8080'
  prometheus:
    image: prom/prometheus:v2.4.3
    container_name: 'prometheus'
    volumes:
    - ./monitoring/prometheus/:/etc/prometheus/
    ports:
    - '9090:9090'
  grafana:
    image: grafana/grafana:5.2.4
    container_name: 'grafana'
    ports:
    - '3000:3000'
    volumes:
    - ./monitoring/grafana/provisioning/:/etc/grafana/provisioning/
    env_file:
    - ./monitoring/grafana/config.monitoring
    depends_on:
    - prometheus

I’ve put together a small demo project, containing a simple Spring Boot application and the above Prometheus configuration, in a GitHub repository for demo and experimentation purposes. If you want to generate some statistics, run a small load test with JMeter or Apache Bench. Feel free to use/fork it!


Documenting Hypermedia REST APIs with Spring REST Docs

Last year, at the end of summer, the project I was working on required a public REST API. During the requirements gathering phase we discussed the ‘level’ of our future REST API. In case you’re unfamiliar with Leonard Richardson’s REST maturity model I would highly recommend reading this article written by Martin Fowler about the model.

In my opinion a public API requires really good documentation. The documentation helps to explain how to use the API and what the resources represent (explain your domain model), and can help to increase adoption of the API. If I have to consume an API myself, I’m always relieved if there is some well-written API documentation available.

After the design phase we chose to build a Level 3 REST API. Documenting a Level 3 REST API is not that easy. We looked at Swagger / OpenAPI, but in the 2.0 version of the spec, which was available at the time, it was not possible to design and/or document link relations, which are part of the third level. After some research we learned there was a Spring project called Spring REST Docs, which allows you to document any type of API. It works by writing tests for your API endpoints and acts as a proxy which captures the requests and responses and turns them into documentation. It does not only look at the request and response cycle, but actually inspects and validates whether you’ve documented certain request or response fields. If you haven’t specified and documented them, your test will fail. This is a really neat feature! It makes sure that your documentation is always in sync with your API.

Using Spring REST Docs is pretty straightforward. You can start by just adding a dependency to your Maven or Gradle based project.

<dependency>
  <groupId>org.springframework.restdocs</groupId>
  <artifactId>spring-restdocs-mockmvc</artifactId>
  <version>${spring.restdoc.version}</version>
  <scope>test</scope>
</dependency>

Now if you use, for instance, Spring MockMvc, you can test an API resource with the following code:

@Test
public void testGetAllPlanets() throws Exception {
    mockMvc.perform(get("/planets").accept(MediaType.APPLICATION_JSON))
        .andExpect(status().isOk())
        .andExpect(jsonPath("$.length()", is(2)));
}

All the test does is perform a GET request on the /planets resource. To document this API resource, all you need to do is add a document() call with an identifier, which will result in documentation for the /planets resource.

@Test
public void testGetAllPlanets() throws Exception {
    mockMvc.perform(get("/planets").accept(MediaType.APPLICATION_JSON))
        .andExpect(status().isOk())
        .andExpect(jsonPath("$.length()",is(2)))
        .andDo(document("planet-list"));
}

Now when you run this test, Spring REST Docs will generate several AsciiDoc snippets for this API resource.
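
For the snippets to be generated, the MockMvc instance does need to be configured with the Spring REST Docs extension first. Below is a minimal JUnit 4 sketch of that setup (the demo project may wire this up differently; documentationConfiguration is a static import from MockMvcRestDocumentation):

@RunWith(SpringRunner.class)
@SpringBootTest
public class PlanetsDocumentationTest {

    // Manages the snippet output directory (target/generated-snippets for a Maven build)
    @Rule
    public JUnitRestDocumentation restDocumentation = new JUnitRestDocumentation();

    @Autowired
    private WebApplicationContext context;

    private MockMvc mockMvc;

    @Before
    public void setUp() {
        // Hook Spring REST Docs into MockMvc so every documented request/response
        // pair is written out as AsciiDoc snippets
        this.mockMvc = MockMvcBuilders.webAppContextSetup(context)
                .apply(documentationConfiguration(restDocumentation))
                .build();
    }
}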

Let’s inspect one of these AsciiDoc snippets.

[source,bash]
----
$ curl 'https://api.mydomain.com/v1/planets' -i -X GET \
    -H 'Accept: application/hal+json'
----

Looks pretty neat, right? It generates a nice example of how to perform a request against the API using curl. It shows which headers are required and, in case you want to send a payload, how to pass it along with the request.

Documenting how to perform an API call is nice, but it gets even better when we start documenting fields. By documenting fields in the request or response, we immediately start validating the documentation for missing fields or parameters. For documenting fields in the JSON response body we can use the responseFields snippet instruction.

@Test
public void testGetPerson() throws Exception {
    mockMvc.perform(get("/people/{id}", personFixture.getId())
            .accept(MediaTypes.HAL_JSON_VALUE))
            .andExpect(status().isOk())
            .andDo(document("people-get-example",
                    pathParameters(
                            parameterWithName("id").description("Person's id")
                    ),
                    links(halLinks(),
                            linkWithRel("self").ignored()
                    ),
                    responseFields(
                            fieldWithPath("id").description("Person's id"),
                            fieldWithPath("name").description("Person's name"),
                            subsectionWithPath("_links").ignored()
                    )));
}

In the above example we have documented two fields: id and name. We can add a description and a type, specify whether they are optional, or even ignore specific sections like I did above. Ignoring a section is useful when you want to document it only once, since it will be available across multiple resources. Now, if you are very strict with writing Javadoc, you might also want to consider using Spring Auto REST Docs. Spring Auto REST Docs uses introspection of your Java classes and POJOs to generate the field descriptions for you. It’s pretty neat, but I found some corner cases when you use a hypermedia API: you can’t really create specific documentation for Link objects, because their documentation comes from the Spring Javadoc itself, so we chose to leave Auto REST Docs out.
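
Request payloads can be documented in the same way with requestFields. A hedged sketch of how the planet creation request might be documented (the endpoint, response status and field names are assumptions, and the usual static imports from MockMvcRequestBuilders, MockMvcRestDocumentation and PayloadDocumentation are implied):

@Test
public void testCreatePlanet() throws Exception {
    mockMvc.perform(post("/planets")
            .contentType(MediaType.APPLICATION_JSON)
            .content("{\"name\": \"Saturn\"}"))
            .andExpect(status().isCreated())
            .andDo(document("planets-create-example",
                    requestFields(
                            // Every documented field is checked against the actual payload
                            fieldWithPath("name").description("The name of the planet")
                    )));
}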

Having a bunch of AsciiDoc snippets is nice, but it’s better to have a human-readable format like HTML. This is where the Asciidoctor Maven plugin comes in: it processes the AsciiDoc files and turns them into a publishable format like HTML or PDF. To get HTML output (also known as a backend), all you need to do is add the Maven plugin with the correct configuration.

<build>
  <plugins>
    ....
    <plugin> 
      <groupId>org.asciidoctor</groupId>
      <artifactId>asciidoctor-maven-plugin</artifactId>
      <version>1.5.3</version>
      <executions>
        <execution>
          <id>generate-docs</id>
          <phase>prepare-package</phase> 
          <goals>
            <goal>process-asciidoc</goal>
          </goals>
          <configuration>
            <backend>html</backend>
            <doctype>book</doctype>
          </configuration>
        </execution>
      </executions>
      <dependencies>
        <dependency> 
          <groupId>org.springframework.restdocs</groupId>
          <artifactId>spring-restdocs-asciidoctor</artifactId>
          <version>2.0.1.RELEASE</version>
        </dependency>
      </dependencies>
    </plugin>
  </plugins>
</build>

To turn all the different AsciiDoc snippets into one single documentation page, you can create an index.adoc file that aggregates the generated snippets. Let’s take a look at an example:

= DevCon REST TDD Demo
Jeroen Reijn;
:doctype: book
:icons: font
:source-highlighter: highlightjs
:toc: left
:toclevels: 4
:sectlinks:
:operation-curl-request-title: Example request
:operation-http-response-title: Example response

[[resources-planets]]
== Planets

The Planets resources is used to create and list planets

[[resources-planets-list]]
=== Listing planets

A `GET` request will list all of the service's planets.

operation::planets-list-example[snippets='response-fields,curl-request,http-response']

[[resources-planets-create]]
=== Creating a planet

A `POST` request is used to create a planet.

operation::planets-create-example[snippets='request-fields,curl-request,http-response']

The above AsciiDoc file shows how to write the surrounding documentation, how to include certain operations, and how to selectively pick the snippets you want to include. You can see the result in the GitHub Pages version.

Splitting the snippet generation from the actual HTML production has several benefits. One that I found appealing is that by documenting the API in two steps (code and documentation) you can have multiple people working on the documentation. At my previous company we had a dedicated technical writer who wrote the documentation for our product. An API is also a product, so you can have engineers create the API, test it, and document the resources by generating the documentation snippets, while the technical writer can then apply their own touch when it comes to writing good, readable and consumable content. Writing documentation is a trade in itself, and I have always liked the Mailchimp content style guide for some clear guidelines on writing technical documentation.

Now if we take a look at the overall process, we see that it integrates nicely into our CI/CD pipeline. All documentation is managed in version control and is part of the same release cycle as the API itself.

If you want to take a look at a working example, you can check out my DevCon REST TDD demo repository on GitHub or see me use Spring REST Docs to live code and document an API during my talk at DevCon.


Looking back on AWS Summit Benelux 2018

Last week I visited AWS Summit Benelux together with Sander. AWS Summit is all about cloud computing and the topics that surround it. This being my first AWS conference, I can say it was a really nice experience. Sure, there was room for improvement (no coffee or tea after the opening keynote being one), but other than that it was very well organised. Getting inside was a breeze with all the different check-in points, and after you entered you were directly on the exhibitor floor, where a lot of Amazon partners showed their products.

Opening keynote

The day started with an introduction by Kamini Aisola, Head of Amazon Benelux. With this being my first AWS Summit, it was great to see Kamini share some numbers about the conference: 2000 attendees and 28 technical sessions. She also showed us the growth pattern of AWS, with growth of 49% compared to last year. That’s really impressive!

Who are builders?

Shortly after, Amazon.com CTO Werner Vogels started his opening keynote. Werner showed how AWS evolved from being ‘just’ an IaaS company to now offering more than 125 different services. More than 90% of the developed services were based on customer feedback from the last couple of years. That’s probably one of the reasons why AWS is growing so rapidly and customers are adopting the AWS platform.

What I noticed throughout the entire keynote is that AWS is constantly thinking about what builders want to build (in the cloud) and what kind of tools those builders need to be successful. These tools come in different forms and sizes, but I noticed a certain pattern in how services evolve or grow at AWS. The overall trend during the talks is that engineers or builders should spend less time focussing on lower-level infrastructure and can start to really focus on delivering business value by leveraging the services that AWS has to offer.

During the keynote Werner ran through a couple of different focus areas and showed what AWS is currently offering for each. In this post I won’t go through all of them, because I expect you can probably watch a recording of the keynote on YouTube soon, but I’ll highlight a few.

Let’s first start with the state of Machine Learning and analytics. Werner looked back at how machine learning evolved at Amazon.com and how services were developed to make machine learning more accessible for teams within the organisation. Out of this came a really nice mission statement:

AWS wants to put machine learning in the hands of every developer and data scientist.

To achieve this mission AWS is currently offering a layered ML stack to engineers looking into using ML on the AWS platform.

The layers go from low-level libraries to pre-built functionality based on these lower-level layers. I really liked the fact that these services are built in such a way that engineers can decide at which level of complexity they want to start using the ML services offered by AWS. Most of the time data engineers and data scientists will start from SageMaker or even lower, but most application developers might just want to use pre-built functionality like image recognition, text processing or speech recognition. See for instance this really awesome post on using facial recognition by my colleague Roberto.

Another example of this layered approach was container support on AWS. A few years back Amazon added container support to their offering with Amazon Elastic Container Service (Amazon ECS). ECS helped customers run containers on AWS without having to manage their own servers or container orchestration software. Fast forward a few years, and Amazon now also offers Amazon EKS (managed Kubernetes on AWS), after they noticed that about 63% of managed Kubernetes clusters ran on AWS. Kubernetes has become the current industry standard when it comes to container orchestration, so this makes a lot of sense. In addition, Amazon now also offers Amazon Fargate. With Fargate they take the next step: Fargate allows you as the developer to focus on running containers ‘without having to think about managing servers or clusters’.

During his keynote, Werner also mentioned the Well-Architected Framework. The Well-Architected Framework has been developed to help cloud architects run their applications in the cloud based on AWS best practices. When implemented correctly it allows you to fully focus on your functional requirements to deliver business value to your customers. The framework is based on the following five pillars:

  1. Operational Excellence
  2. Security
  3. Reliability
  4. Performance Efficiency
  5. Cost Optimization

I had not heard about the framework before, so during the weekend I read through some of its documentation. Some of the items are pretty straightforward, but others might give you some insights into what it means to run applications in the cloud. One aspect of the Well-Architected Framework, security, recurred throughout the entire keynote.

Werner emphasised a very important point during his presentation:

Security is EVERYONE’s job

With all the data breaches happening lately I think this is a really good point to make. Security should be everybody’s number one priority these days.

During the keynote, a couple of customers showed how AWS had helped them achieve a certain goal. Bastiaan Terhorst, CPO at WeTransfer, explained that being a cloud-scale company comes with certain problems, and how they moved from a brittle situation towards a more scalable solution. They could not modify the schema of their DB anymore without breaking the application, which is horrible once you reach a certain scale and customer base. They had to rearchitect the way they worked with incoming data and used historic data for reporting. I really liked the fact that he shared some hard-learned lessons about database scalability issues that can occur when you reach a certain scale.

Tim Bogaert, CTO at de Persgroep, also showed how they moved from being a siloed organization with their own datacenters and long-running waterfall projects towards going all-in on AWS with an agile approach and teams following the “You Build It, You Run It” mantra. It was an interesting story, because I see a lot of larger enterprises still struggling with these transitions.

After the morning keynote, the breakout sessions started. There were 7 parallel tracks, all with different topics, so plenty to choose from. During the day I attended only a few, so here goes.

Improve Productivity with Continuous Integration & Delivery

This really nice talk by Clara Ligouri (software engineer for AWS Developer Tools) and Jamie van Brunschot (Cloud engineer at Coolblue) gave a good insight into all the different tools provided by AWS to support the full development and deployment lifecycle of an application.

Clara modified some code in Cloud9 (the online IDE), debugged it, ran CI jobs, tests and deployments all from within her browser, and pushed a new change to production within a matter of minutes. It shows how far the current state of being a cloud-native developer has really come. I looked at Cloud9 years ago, way before they were acquired by Amazon. I’ve always been a bit skeptical when it comes to using an online IDE. I remember having some good discussions with the CTO at my former company about whether this would really be the next step for IDEs and software development in general. I’m just so comfortable with IntelliJ for Java development and it always works (even if I do not have any internet ;-)). I do wonder if anybody reading this is already using Cloud9 (or any other web IDE) and is doing their development fully in the cloud. If you do, please leave a comment, I would love to learn from your experiences. The other tools like CodePipeline and CodeDeploy definitely looked interesting, so I need to find some time to play around with them.

GDPR

Next up was a talk on GDPR. The room was quite packed, which I didn’t expect, because everybody should be GDPR compliant by now, right? 🙂 Well, not really. Companies are still implementing changes to become GDPR compliant. The talk by Christian Hesse looked at different aspects of GDPR, like:

  • The right to data portability
  • The right to be forgotten
  • Privacy by design
  • Data breach notification

He also talked about the shared responsibility model when it comes to being GDPR compliant. AWS, as the processor of personal data, and the company using AWS, as the controller, are both responsible for making sure data stays safe. GDPR is a hot topic and I guess it will stay that way for the rest of the year at least. It’s something that we as engineers will always need to keep in the back of our minds while developing new applications or features.

Serverless

In the afternoon I also attended a talk on Serverless by Prakash Palanisamy (Solutions Architect, Amazon Web Services) and Joachim den Hertog (Solutions Architect, ReSnap / Albelli). This presentation gave a nice overview of Serverless and Step Functions, but also showed new improvements like the Serverless Application Repository, safe Serverless deployments and incremental deployments. Joachim gave some insights into how Albelli was using Serverless and Machine Learning on the AWS platform for their online photo book creator application called ReSnap.

Unfortunately I had to leave early, so I missed the end of the Serverless talk and the last breakout session, but all in all AWS Summit Benelux was a very nice experience with some interesting customer cases and architectures. For a ‘free’ event it was amazingly well organized; I learned some new things and had a chance to speak with some people about how they use AWS. It has triggered me to spend some more time with AWS and its services. Let’s see what interesting things I can do on the next Luminis TechDay.

Build On!


Apigee

At one of my former projects at a client of Luminis I got the opportunity to work with Apigee. Apigee is a platform for APIs. In this blog I’ll give an overview of its features.

In Apigee, APIs can easily be created with shared modules, called policies, and flows. Because of this it is easy to arrange authentication and authorization, but also other things like protection against traffic spikes. Apigee is still a work in progress, so not all specifications are well documented and the version control system can still be improved a lot. That’s why we stored our versions in a Git repository on Stash and deployed them to Apigee with Jenkins. The versions of the APIs themselves can be exposed through a segment in the path.

A great advantage of basing the architecture on this platform is uniformity across APIs and their maintainability. Analytics and statistics are part of the standard functionality. Consumers and API proxies can be linked, so filtering is possible on the basis of both.

There are a lot of standard policies with functionality available to compose a flow for an API. These cover traffic management, security, mediation and extensions. The mediation policies are for conversion, extraction, and changing the request/response message. The extension policies can be used for embedding Java, JavaScript and Python code, for doing call-outs to other APIs inside or outside Apigee, and for statistics and logging. Because of this, the work for simple APIs and APIs with underlying applications can be limited to configuration in Apigee. On the other hand, it is also possible to put in some more code for fine-tuning or advanced capabilities. This can be done neatly in a management console, but can also be done directly in XML files.

By working with API proxies in Apigee, several security measures can be taken for all passing traffic, like authentication, authorization, validations to prevent code injection, and protection against traffic spikes. It’s also possible to share variables within flows and environments. Besides that, there are also caching possibilities: by caching key-value pairs and responses, higher performance can be achieved.

Adding this layer and exposing data like this offers many advantages, like re-usability of APIs and a central place to locate them. This is ideal for omnichannel, where companies want to expose data through all channels, like websites, social media, apps, etc. Not only did we have front-end components on the website as consumers of our APIs, but also several apps. What exactly is done in this layer can vary from a simple proxy to complete applications. The proxy can handle analytics and statistics, authentication, authorization, (first) validations, security measures, and/or caching, and this can vary per API or API proxy. In the context of microservices it can have advantages in terms of security and governance to expose them through this platform. All in all, Apigee is a nice platform if you want to work with APIs.

Side note: last year Apigee was acquired by Google.


A beginners guide to the SMACK stack – Part 2: Mesos

In the first part of this series, we’ve briefly looked at the technologies that make up the SMACK stack. In this post, we’ll take a closer look at one of the fundamental layers of the stack: Apache Mesos.

What is Apache Mesos?

Apache Mesos is an open source cluster manager that provides efficient resource isolation and sharing across distributed applications or frameworks. Mesos manages a cluster of machines (virtual or physical) and can also be seen as a distributed systems kernel, because the Mesos kernel runs on every machine in the cluster and provides applications (e.g., Hadoop, Spark, Kafka, etc.) with APIs for resource management and scheduling across an entire data center or cloud environment. One of the ideas behind Mesos is to deploy multiple distributed systems within the same shared pool of cluster nodes in order to increase resource utilization. Let’s get to know Mesos a little better by looking at some of the core Mesos concepts.

Architecture

By examining the Mesos architecture we will get to know the most important concepts to understand when dealing with Mesos.

 

(Figure: Mesos architecture)

 

As you can see in the above figure, a Mesos architecture consists of a master daemon that manages agent daemons running on nodes in the cluster. Apache Mesos also uses Apache ZooKeeper to operate. ZooKeeper acts as the master election service in the Mesos architecture and stores state for the Mesos nodes.

Frameworks

Next to the masters and agents, Mesos has the concept of frameworks. Frameworks within Mesos are responsible for running tasks on the Mesos agent nodes.

A Mesos framework consists of two major components:

  1. a scheduler that registers with the Mesos master to be offered resources
  2. an executor process that is launched on agent nodes to run the framework’s tasks

Mesos agents will notify the Mesos master about their available resources. Based on that information, the Mesos master determines how many resources to offer to each framework. After the Mesos master has decided which resources to offer, the scheduler then selects which of the offered resources to use. When a framework accepts offered resources, it passes Mesos a description of the tasks it wants to launch.

Now that we’ve gone over the concepts of frameworks, schedulers, and executors, it’s interesting to point out that most of the components that make up the SMACK stack are available as a Mesos framework. Spark, Kafka, and Cassandra are all available as frameworks for Mesos. Mesos has support for a lot of different frameworks; you can find a more extensive list on the Mesos frameworks documentation page.

Jobs

Almost all data processing platforms will have the need for two different kinds of jobs:

  1. scheduled/periodic jobs – you can think of periodic batch aggregations or reporting jobs
  2. long-running jobs – stream processing or long running application jobs

For the first kind of job, you can use the Chronos framework. Chronos is a distributed and fault-tolerant scheduler that runs on top of Apache Mesos and can be used for job orchestration. If you’re familiar with Linux, you can compare it to a distributed version of cron. However, compared to regular cron, Chronos has a number of advantages. For instance, it allows you to schedule your jobs using ISO 8601 repeating interval notation, which enables more flexibility in job scheduling. Next to that, Chronos also supports arbitrarily long dependency chains: jobs can be triggered by the completion of other jobs. This can be very useful at times.

For the long-running jobs, Mesos has the Marathon framework. Marathon is a fault-tolerant and distributed ‘init’ system and can be used to deploy and run applications across a Mesos cluster. Marathon has many features that simplify running applications in a clustered environment, such as high availability, node constraints, application health checks, an API for scriptability, and service discovery. When it comes to application deployment, Marathon has built-in support for blue-green deployments, and it also adds scaling and self-healing to the already big feature set. In case a machine in the cluster dies, Marathon will make sure that the application is automatically spawned elsewhere in the cluster, so the application is always on and meets the preconfigured number of instances.

Marathon can be used to run any kind of executable process and is also often used as a PaaS solution for running containers. Other Mesos frameworks can also be launched from Marathon. In combination with the self-healing ability, Mesos and Marathon make a very robust platform for running any kind of application.

Getting some hands-on experience

You can build Mesos from source, but probably the easiest way to get some hands-on experience with Mesos in the context of the SMACK stack is by installing DC/OS. DC/OS stands for DataCenter Operating System, an open-source system developed by Mesosphere (the company that is also actively working on Apache Mesos) and built on top of Apache Mesos. When experimenting with new technologies I always like to use Vagrant or Docker. Luckily Mesosphere has released a DC/OS Vagrant project which we can easily use, but before we get started, make sure you have Git, Vagrant and VirtualBox installed.

The first step is installing the vagrant-hostmanager plugin, which will alter our /etc/hosts file with some new hostnames, so we can easily connect to the instances.

$ vagrant plugin install vagrant-hostmanager

Now let’s clone the dcos vagrant project:

$ git clone https://github.com/dcos/dcos-vagrant

Let’s configure a 3 node installation by copying the correct Vagrant configuration:

$ cd dcos-vagrant
$ cp VagrantConfig-1m-1a-1p.yaml VagrantConfig.yaml

Now that we’ve configured everything, we can spin up our new DC/OS VMs.

$ vagrant up

During the installation, it might ask you for a password. You will need to enter your local machine’s password, because Vagrant is trying to alter your /etc/hosts file. It might take a little while before everything is up and running, but once it’s done you can navigate to http://m1.dcos/.

If everything is setup alright you should be able to log in and see the DC/OS dashboard.

At this point we have no tasks or services running yet, but now we’re all set to continue with our deep dive into the SMACK stack. Let’s install Marathon in DC/OS, so we can get a sense of what it takes to install a Mesos framework.

Go to Universe -> Packages and install Marathon as a Mesos framework. Once Marathon is installed, you should see that you now have running tasks on the dashboard.

And by going to Services you should be able to see the Marathon deployment happening within the Mesos cluster.

Now if you want you can also open the service and go to the Marathon UI to start creating groups or applications, but we’ll leave that for next time.

Marathon allows you to deploy processes and application containers and through the UI you can also easily scale these applications.

Mesos frameworks like Marathon, Kafka, Cassandra and Spark can also be easily scaled from within DC/OS. It’s just a matter of a few clicks.

In the above example, we installed Mesos via DC/OS in Vagrant, but you can also run DC/OS in the cloud or on premise. See the DC/OS get started page for more setup options.

Summary

Mesos is the fundamental layer of the SMACK stack that allows the applications to run efficiently. It has all the features needed for proper auto recovery and can meet the scaling requirements you face when you hit high traffic loads on your architecture. Installing other parts of the stack is almost trivial with DC/OS and the Mesos frameworks for Kafka, Spark and Cassandra.

In the next part of this series, we will dive deeper into Cassandra and start adding Cassandra to our local SMACK stack for storage of our data at hand.



A beginners guide to the SMACK stack – Part 1: Introduction

This is the introduction to a new series of blog posts about data analysis with the SMACK stack. Follow along in the coming weeks to learn more!

Over the last 20 years, the architectures of (big) data platforms have changed, and companies processing large amounts of data have moved from big enterprise data warehouse solutions to on-premise / cloud-based solutions, with Apache Hadoop (Hadoop Distributed File System, MapReduce, and YARN) as a fundamental building block. Architectures based on Hadoop tend to be focussed on (long-running) batch or offline jobs, where data is captured to storage and then processed periodically. For usage in an online environment, this batch-based processing was becoming too slow and business expectations were changing. Over the last 5-10 years, the demand for (near) real-time analysis has been pushing the industry to find new solutions and architectural patterns to achieve these new goals. This has led to several ‘new’ architectural patterns like the Lambda architecture and the Kappa architecture. Both architectural patterns focus on processing data at speed (stream processing), where the Kappa architecture is purely focussed on a streaming (speed) layer and completely removes batch-oriented processing.

When designing a data platform there are many aspects that need to be taken into consideration:

  • the type of analysis – batch, (near) real-time, or both
  • the processing methodology – predictive, analytical, ad-hoc queries or reporting
  • data frequency and size – how much data is expected and at what frequency it arrives at the platform
  • the type of data – transactional, historical, etc
  • the format of incoming data — structured, unstructured or semi-structured
  • the data consumers – who will be using the results

This list is by no means exhaustive, but it’s a starting point.

Organisations processing high volumes of data used to always pick a (single) vendor-backed product stack, but these days there are so many great, open source, reliable and proven solutions out there that you can easily take a best-of-breed approach and build your own stack. There is a wide variety of components to select from, so always choose based on your specific requirements. One of the more popular, general purpose and best-of-breed big data stacks I’ve seen lately is the SMACK stack.

The SMACK stack

The SMACK Stack: Spark, Mesos, Akka, Cassandra and Kafka.

The SMACK stack consists of the following technologies:

  • Spark – Apache Spark™ is a fast and general engine for large-scale data processing. Spark allows you to combine SQL, streaming, and complex analytics. It provides high-level APIs in Java, Scala, Python and R, and an optimized engine that supports general execution graphs. Spark has support for both real-time (Spark Streaming with µ batches) as well as batch (MapReduce) processing.
  • Mesos – Apache Mesos™ abstracts CPU, memory, storage, and other compute resources away from machines (physical or virtual), enabling fault-tolerant and elastic distributed systems to easily be built and run effectively. Mesos runs applications within its cluster and makes sure they are highly available and in the case of a machine failure will relocate applications to different nodes in the cluster.
  • Akka – Akka is a toolkit and runtime for building highly concurrent, distributed, and resilient message-driven applications on the JVM. Akka uses the Actor Model to raise the abstraction level and provide a better platform to build scalable, resilient and responsive applications. Everything in Akka is designed to work in a distributed environment: all interactions of actors use pure message passing and everything is asynchronous (a small code sketch follows after this list).
  • Cassandra – Apache Cassandra™ is a proven, high performant, durable and fault tolerant NoSQL database. Cassandra can easily manage large amounts of data and offers robust support for clusters spanning multiple datacenters and geographical locations.
  • Kafka – Apache Kafka™ is a distributed, partitioned, replicated commit log service. It provides the functionality of a messaging system, but with a unique design to allow a single cluster to serve as the central data backbone for a large organization.
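
To give a feel for the actor model mentioned in the Akka bullet above, here is a minimal sketch of an actor using the classic Akka Java API; the actor and the message it handles are made up purely for illustration:

public class GreetingActor extends AbstractActor {

    // React to plain String messages; all communication is asynchronous message passing
    @Override
    public Receive createReceive() {
        return receiveBuilder()
                .match(String.class, name -> System.out.println("Hello, " + name))
                .build();
    }

    public static void main(String[] args) {
        // ActorSystem, Props and ActorRef come from akka.actor in the classic Java API
        ActorSystem system = ActorSystem.create("smack-demo");
        ActorRef greeter = system.actorOf(Props.create(GreetingActor.class), "greeter");
        greeter.tell("SMACK", ActorRef.noSender()); // fire-and-forget message
    }
}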

The tools are very easy to integrate with each other and each serves its own purpose within a modern platform for Big Data applications. The ease of integration between them has probably helped a lot in making it a popular solution, but it’s not the only reason. I think the most important reasons are that:

  1. it’s a concise toolbox that can deal with a wide variety of data processing scenarios
  2. it’s composed of proven, battle tested and widely used software components. The individual components are open source and backed by a large open-source community
  3. the stack is easily scalable and replication of data happens while still preserving low latencies
  4. the stack can run on a single cluster managed platform that can handle heterogeneous loads and any kind of applications

Over the next couple of weeks, we’ll be doing a deep dive into each individual technology, so we can explain why these technologies combined are extremely powerful and give you a wide variety of options when designing your (big) data architecture.

Feel free to continue reading in the second part of this series, which covers Apache Mesos, the foundation of the SMACK stack.