Categories for DevOps



Axon Metrics with Spring Boot 2 + Prometheus + Grafana

Included in the Axon framework is a module that will collect metrics on the different Axon message types (commands, events and queries). In this blog, I will explain how to use the Axon metrics module in combination with Prometheus and visualize the metrics in Grafana.

Full disclaimer, I’m contributing to the Axon Metrics module myself, so I’m not completely impartial 😉

Intro Axon

Axon is a Java framework for writing applications using the CQRS paradigm. One of the things CQRS focuses on more than the traditional CRUD/Active Record style of writing applications is business interactions. In a CRUD based system the focus lies more on the Creating/Read/Update/Deletion (hence CRUD) of records while CQRS focuses more the business interactions by modeling the explicit actions (Commands).

For example, instead of an update on the customer information record when the customer moves to a different address, CQRS also models the intent of the change. There will be an explicit MoveCustomer command. When using CQRS with event sourcing this command will also result in an explicit event that indicates that the customer has moved. Probably something like a CustomerMoved event. This explicit modeling of business interactions and state changes has great benefits in applications with complex business processes.

In a traditional CRUD application this information is usually lost because it’s often impossible to deduce why, in this example, the customer’s address was updated (was it a misspelling or did the customer actually move to a new address). When correctly applying event sourcing we can record and react to this information.

Another advantage of having the business interactions modeled in explicit commands and events is that with Axon metrics we can measure the actual business itself. Visualizing metrics collected on these messages holds great value for businesses. Instead of measuring that certain (obscure) technical calls and updates are made to a system. Why not measure the amount of OrderPlaced events and put a thermometer in the core business process itself.

When the number of OrderPlaced events suddenly go down there is a problem that affects the bottom line of the business, not just a technical problem of some obscure REST API call or internal service call that doesn’t work anymore where it’s unclear what the impact is on the business.

Example application

I’ve created an example application to demonstrate the capabilities of using Axon with the metrics module. The code can be found here: https://github.com/luminis-ams/axon-metrics-example The example application contains a flight ticket booking domain. There is a component that simulates user interaction by generating commands that automatically create flights and books seats on flights.

I’ve instrumented the message processors in such a way that is possible to monitor individual message payload types (like BookSeatCommand and SeatBookedEvent for example) to individual message handlers (like the SlowEventListener component for example). I’ve tried to put in some interesting problems in the application so that I can demonstrate how you can detect problems in your Axon application by monitoring it.

The Axon Metric configuration code can be found here.

Small side note, the current metrics module uses Dropwizard metrics. In order to expose the metrics using Prometheus I’ve configured an exporter for the Dropwizard metrics to Prometheus with this line:

collectorRegistry.register(new DropwizardExports(metricRegistry));

(see MetricConfig line 33)

Micrometer Metrics in Axon

I’ve sent in a pull request to Axon to add support for Micrometer metrics. Micrometer is the new default metrics library which is used in Spring Boot 2.0. If you check out the axon-micrometer_beta branch on the GitHub repository you can see how to use the new module (for now you only can use it if you maven install my pull request branch)

In the configuration class on the Micrometer branch, you can see it’s not necessary to export the Dropwizard metrics to Prometheus anymore. Micrometer provides an abstraction of Prometheus which we now use directly. Micrometer can be used with a lot of different metric implementations. For an overview see https://micrometer.io/docs

Dashboard with Grafana

Now that the metrics are exposed via Prometheus we can use Grafana to visualize them in a dashboard. (For more information about Prometheus and Grafana see the excellent blog of my colleague Jeroen Reijn about “Monitoring Spring Boot applications with Prometheus and Grafana”)

Here are some screenshots from Grafana.

CapacityMonitor on events

This one displays the capacity of the event listeners

As you can see in the above image there is 1 event listener that is close to full capacity. If you have a single thread processor this means that almost 100% of the time the event listener is busy. Probably events are queuing up for this event processor. The capacity is not always 1 because the capacity doesn’t measure the overhead of the Axon Framework itself, Axon has to retrieve event messages from a database and store them once their processed.

MessageTimerMonitor on events

This one displays the timings of the different event processors.

You can see that two of the four event processors show a latency of sometimes more than 4 seconds for half of the requests. This is a problem and should be looked at.

MessageCountingMonitor on BookSeatCommand

This graph displays the success, failure and ingested counter of the BookSeatCommand.

As you can see sometimes the command handler of the BookSeatCommand throws an Exception which will be counted as a failure in the graph.

Run the example application

I’ve created a docker setup if you want to run the full stack yourself. You can check out the code of example application here:

https://github.com/luminis-ams/axon-metrics-example

After that first do a build of the app project to create the application docker image. From the project root run:

./mvnw clean build 

After that run:

docker-compose -f docker/docker-compose.yml up

As told the metrics are exposed using Prometheus. On the Prometheus endpoint http://localhost:8080/actuator/prometheus you can see the metrics in raw form.

In Prometheus, you can see the metrics under http://localhost:9090

You can import the example metrics dashboard in Grafana in the following way:

  1. Login http://localhost:3000 (login admin:password)
  2. On the left side of the screen click on the + icon and select import
  3. Copy the contents of grafana/dashboard.json into the bottom text field
  4. Select the Prometheus data source in the dropdown

Conclusion

Because of Axon’s strong focus on modeling business interactions, metrics on those business interactions contain a lot more useful information. Useful not only for troubleshooting but also for measuring the performance of the business itself.


Monitoring Spring Boot applications with Prometheus and Grafana

At my current project, we’ve been building three different applications. All three applications are based on Spring Boot, but have very different workloads. They’ve all reached their way to the production environment and have been running steadily for quite some time now. We do regular (weekly basis) deployments of our applications to production with bug fixes, new features, and technical improvements. The organisation has a traditional infrastructure workflow in the sense that deployments to the VM instances on acceptance and production happen via the (remote) hosting provider.

The hosting provider is responsible for the uptime of the applications and therefore they keep an eye on system metrics through the usage of their own monitoring system. As a team, we are able to look in the system, but it doesn’t say much about the internals of our application. In the past, we’ve asked to add some additional metrics to their system, but the system isn’t that easy to configure with additional metrics. To us as a team runtime statistics about our applications and the impact our changes have on the overall health are crucial to understanding the impact of our work. The rest of this post will give a short description of our journey and the reasons why we chose the resulting setup.

Spring Boot Actuator and Micrometer

If you’ve used Spring Boot before you’ve probably heard of Spring Boot Actuator. Actuator is a set of features that help you monitor and manage your application when it moves away from your local development environment and onto a test, staging or production environment. It helps expose operational information about the running application – health, metrics, audit entries, scheduled task, env settings, etc. You can query the information via either several HTTP endpoints or JMX beans. Being able to view the information is useful, but it’s hard to spot trends or see the behaviour over a period of time.

When we recently upgraded our projects to Spring Boot 2 my team was pretty excited that we were able to start using micrometer a (new) instrumentation library powering the delivery of application metrics. Micrometer is now the default metrics library in Spring Boot 2 and it doesn’t just give you metrics from your Spring application, but can also deliver JVM metrics (garbage collection and memory pools, etc) and also metrics from the application container. Micrometer has several different libraries that can be included to ship metrics to different backends and has support for Prometheus, Netflix Atlas, CloudWatch, Datadog, Graphite, Ganglia, JMX, Influx/Telegraf, New Relic, StatsD, SignalFx, and Wavefront.

Because we didn’t have a lot of control over the way our applications were deployed we looked at the several different backends supported by micrometer. Most of the above backends work by pushing data out to a remote (cloud) service. Since the organisation we work for doesn’t allow us to push this ‘sensitive’ data to a remote party we looked at self-hosted solutions. We did a quick scan and started with looking into Prometheus (and Grafana) and soon learned that it was really easy to get a monitoring system up and we had a running system within an hour.

To be able to use Spring Boot Actuator and Prometheus together you need to add two dependencies to your project:

<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-actuator</artifactId>
</dependency>

<dependency>
    <groupId>io.micrometer</groupId>
    <artifactId>micrometer-registry-prometheus</artifactId>
</dependency>

Actuator has an endpoint available for prometheus to scrape but it’s not exposed by default, so you will need to enable the endpoint by means of configuration. In this case, I’ll do so via the application.properties.

management.endpoint.prometheus.enabled=true
management.endpoints.web.exposure.include=prometheus,info,health

Now if you browse to http(s)://host(:8080)/actuator/prometheus you will see the output that prometheus will scrape to get the information from your application. A small snippet of the information provided by the endpoint is shown below, but there is a lot more information that the prometheus endpoint will expose.

# HELP tomcat_global_sent_bytes_total  
# TYPE tomcat_global_sent_bytes_total counter
tomcat_global_sent_bytes_total{name="http-nio-8080",} 75776.0
tomcat_global_sent_bytes_total{name="http-nio-8443",} 1.0182049E8
# HELP tomcat_servlet_request_max_seconds  
# TYPE tomcat_servlet_request_max_seconds gauge
tomcat_servlet_request_max_seconds{name="default",} 0.0
tomcat_servlet_request_max_seconds{name="jsp",} 0.0
# HELP process_files_open The open file descriptor count
# TYPE process_files_open gauge
process_files_open 91.0
# HELP system_cpu_usage The "recent cpu usage" for the whole system
# TYPE system_cpu_usage gauge
system_cpu_usage 0.00427715996578272
# HELP jvm_memory_max_bytes The maximum amount of memory in bytes that can be used for memory management
# TYPE jvm_memory_max_bytes gauge
jvm_memory_max_bytes{area="nonheap",id="Code Cache",} 2.5165824E8
jvm_memory_max_bytes{area="nonheap",id="Metaspace",} -1.0
jvm_memory_max_bytes{area="nonheap",id="Compressed Class Space",} 1.073741824E9
jvm_memory_max_bytes{area="heap",id="PS Eden Space",} 1.77733632E8
jvm_memory_max_bytes{area="heap",id="PS Survivor Space",} 524288.0
jvm_memory_max_bytes{area="heap",id="PS Old Gen",} 3.58088704E8

Now that everything is configured from the application perspective, let’s move on to Prometheus itself.

Prometheus

Prometheus is an open-source systems monitoring and alerting toolkit originally built at SoundCloud and now part of the Cloud Native Computing Foundation. To get a better understanding of what prometheus really is let us take a look at an architectural diagram.

(Source: https://prometheus.io/docs/introduction/overview/)

The prometheus server contains of a set of 3 features:

  • A time series database
  • A retrieval component which scrapes its targets for information
  • An HTTP server which you can use to query information stored inside the time series database

To make it even more powerful there are some additional components which you can use if you want:

  • An alert manager, which you can use to send alerts via Pagerduty, Slack, etc.
  • A push gateway in case you need to push information to prometheus instead of using the default pull mechanism
  • Grafana for visualizing data and creating dashboards

When looking at Prometheus the most appealing features for us were:

  • no reliance on distributed storage; single server nodes are autonomous
  • time series collection happens via a pull model over HTTP
  • targets are discovered via service discovery or static configuration
  • multiple modes of graphing and dashboarding support

To get up and running quickly you can configure prometheus to scrape some (existing) Spring Boot applications. For scraping targets, you will need to specify them within the prometheus configuration. Prometheus uses a file called prometheus.yml as its main configuration file. Within the configuration file, you can specify where it can find the targets it needs to monitor, specify recording rules and alerting rules.

The following example shows a configuration with a set of static targets for both prometheus itself and our spring boot application.

global:
  scrape_interval:   15s # By default, scrape targets every 15 seconds.

  # Attach these labels to any time series or alerts when communicating with
  # external systems (federation, remote storage, Alertmanager).
  external_labels:
    monitor: 'bootifull-monitoring'

scrape_configs:
- job_name:       'monitoring-demo'

  # Override the global default and scrape targets from this job every 10 seconds.
  scrape_interval: 10s
  metrics_path: '/actuator/prometheus'

  static_configs:
  - targets: ['monitoring-demo:8080']
    labels:
      application: 'monitoring-demo'

- job_name: 'prometheus'

  scrape_interval: 5s

  static_configs:
  - targets: ['localhost:9090']

As you can see the configuration is pretty simple. You can add specific labels to the targets which can, later on, be used for querying, filtering and creating a dashboard based upon the information stored within prometheus.

If you want to get started quickly with Prometheus and have docker on your environment you can use the official docker prometheus image by running the following command and provide a custom configuration from your host machine by running:

docker run -p 9090:9090 -v /tmp/prometheus.yml:/etc/prometheus/prometheus.yml \
       prom/prometheus:v2.4.3

In the above example we bind-mount the main prometheus configuration file from the host system, so you can, for instance, use the above configuration. Prometheus itself has some basic graphing capabilities (as you can see in the following image), but they are more meant to be used when doing some ad-hoc queries.

For creating an application monitoring dashboard Grafana is much more suited.

Grafana

So what is Grafana and what role does it play in our monitoring stack?

Grafana allows you to query, visualize, alert on and understand your metrics no matter where they are stored. Create, explore, and share dashboards with your team and foster a data driven culture.

The cool thing about Grafana is (next to the beautiful UI) that it’s not tied to Prometheus as its single data source like for instance Kibana is tied to Elasticsearch. Grafana can have many different data sources like AWS Cloudwatch, Elasticsearch, InfluxDB, Prometheus, etc. This makes it a very good option for creating a monitoring dashboard. Grafana talks to prometheus by using the PromQL query language.

For Grafana there is also an official Docker image available for you to use. You can get Grafana up and running with a simple command.

docker run -p 3000:3000 grafana/grafana:5.2.4

Now if we connect Grafana with Prometheus as the datasource and install this excellent JVM Micrometer dashboard into Grafana we can instantly start monitoring our Spring Boot application. You will end up with a pretty mature dashboard that lets you switch between different instances of your application.

If you want to start everything all at once you can easily use docker-compose.

version: "3"
services:
  app:
    image: monitoring-demo:latest
    container_name: 'monitoring-demo'
    build:
      context: ./
      dockerfile: Dockerfile
    ports:
    - '8080:8080'
  prometheus:
    image: prom/prometheus:v2.4.3
    container_name: 'prometheus'
    volumes:
    - ./monitoring/prometheus/:/etc/prometheus/
    ports:
    - '9090:9090'
  grafana:
    image: grafana/grafana:5.2.4
    container_name: 'grafana'
    ports:
    - '3000:3000'
    volumes:
    - ./monitoring/grafana/provisioning/:/etc/grafana/provisioning/
    env_file:
    - ./monitoring/grafana/config.monitoring
    depends_on:
    - prometheus

I’ve put together a small demo project, containing a simple Spring Boot application and the above prometheus configuration, in a github repository for demo and experimentation purposes. Now if you want to generate some statistics run a small load test with JMeter or Apache Bench. Feel free to use/fork it!


Setting up data analytics pipeline: the best practices

The picture is courtesy of https://bit.ly/2K44Nk5 1 Datapipeline Architect Example

In a data science analogy with the automotive industry, the data plays the role of the raw-oil which is not yet ready for combustion. The data modeling phase is comparable with combustion in the engines and data preparation is the refinery process turning raw-oil to the fuel i.e., ready for combustion. In this analogy data analytics pipeline includes all the steps from extracting the oil up to combustion, driving and reaching to the destination (analogous to reach the business goals). As you can imagine, the data (or oil in this analogy) goes through a various transformation and goes from one stage of the process to another. But the question is what is the best practice in terms of data format and tooling? Although there are many tools that make the best practice sometimes very use-case specific but generally JSON is the best practice for the data-format of communication or the lingua franca and Python is the best practice for orchestration, data preparation, analytics and live production.

What is the common inefficiency and why it happens?

The current inefficiency is overusing of tabular (csv-like) data-formats for communication or lingua franca. I believe data scientists still overuse the structured data types for communication within data analytics pipeline because of standard data-frame-like data formats offered by major analytic tools such as Python and R. Data scientists start getting used to data-frame mentality forgetting the fact that tabular storage of the data is a low scale solution, not optimized for communication and when it comes to bigger sets of data or flexibility to add new fields to the data, data-frames and their tabular form are non-efficient.

DataOps Pipeline and Data Analytics

A very important aspect for analytics being ignored in some circumstances is going live and getting integrated with other systems. DataOps is about setting up a set of tools from capturing data, storing them up to analytics and integration, falling into an interdisciplinary realm of the DevOps, Data Engineering, Analytics and Software Engineering (Hereinafter I use data analytics pipeline and DataOps pipeline interchangeably.) The modeling part and probably some parts in data prep phases need a data-frame like data format but the rest of the pipeline is more efficient and robust if is JSON native. It allows adding/removing features easier and is a compact form for communication between modules.

The picture is courtesy of https://zalando-jobsite.cdn.prismic.io/zalando-jobsite/2ed778169b702ca83c2505ceb65424d748351109_image_5-0d8e25c02668e476dd491d457f605d89.jpg 2

The role of Python

Python is a great programming language used not only by the scientific community but also the application developers. It is ready to be used as back-end and by combining it with Django you can build up full-stack web applications. Python has almost everything you need to set up a DataOps pipeline and is ready for integration and live production.

Python Example: transforming CSV to JSON and storing it in MongoDB

To show some capabilities of Python in combination with JSON, I have brought a simple example. In this example, a dataframe is converted to JSON (Python dictionaries) and is stored in MongoDB. MongoDB is an important database in today’s data storage as it is JSON native storing data in a document format bringing high flexibility .

<br />### Loading packages

from pymongo import MongoClient import pandas as pd

# Connecting to the database

client = MongoClient('localhost', 27017)

# Creating database and schema

db = client.pymongo_test posts = db.posts

# Defining a dummy dataframe

df = pd.DataFrame({'col1': [1, 2], 'col2': [0.5, 0.75]}, index=['a', 'b'])

# Transforming dataframe to a dictionary (JSON)

dic=df.to_dict()

# Writing to the database

result = posts.insert_one(dic) print('One post: {0}'.format(result.inserted_id))

The above example shows the ability of python in data transformation from dataframe to JSON and its ability to connect to various tooling (MongoDB in this example) in DataOps pipeline.

Recap

This article is an extension to my previous article on future of data science (https://bit.ly/2sz8EdM). In my earlier article, I have sketched the future of data science and have recommended data scientists to go towards full-stack. Once you have a full stack and various layers for DataOps / data analytics JSON is the lingua franca between modules bringing robustness and flexibility for this communication and Python is the orchestrator of various tools and techniques in this pipeline.


1: The picture is courtesy of https://cdn-images-1.medium.com/max/1600/1*8-NNHZhRVb5EPHK5iin92Q.png 2: The picture is courtesy of https://zalando-jobsite.cdn.prismic.io/zalando-jobsite/2ed778169b702ca83c2505ceb65424d748351109_image_5-0d8e25c02668e476dd491d457f605d89.jpg