Monitoring Spring Boot applications with Prometheus and Grafana

At my current project, we’ve been building three different applications. All three are based on Spring Boot, but they have very different workloads. They’ve all made their way to the production environment and have been running steadily for quite some time now. We do regular (weekly) deployments of our applications to production with bug fixes, new features, and technical improvements. The organisation has a traditional infrastructure workflow in the sense that deployments to the VM instances on acceptance and production happen via the (remote) hosting provider.

The hosting provider is responsible for the uptime of the applications and therefore keeps an eye on system metrics through their own monitoring system. As a team, we are able to look into that system, but it doesn’t say much about the internals of our applications. In the past, we’ve asked to add some additional metrics to their system, but it isn’t that easy to configure with extra metrics. To us as a team, runtime statistics about our applications and the effect our changes have on their overall health are crucial to understanding the impact of our work. The rest of this post gives a short description of our journey and the reasons why we chose the resulting setup.

Spring Boot Actuator and Micrometer

If you’ve used Spring Boot before, you’ve probably heard of Spring Boot Actuator. Actuator is a set of features that help you monitor and manage your application when it moves away from your local development environment and onto a test, staging or production environment. It helps expose operational information about the running application – health, metrics, audit entries, scheduled tasks, environment settings, etc. You can query the information via several HTTP endpoints or JMX beans. Being able to view the information is useful, but it’s hard to spot trends or see the behaviour over a period of time.
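As a quick illustration of those HTTP endpoints: assuming the application runs on port 8080 with the default settings, the health endpoint can be queried directly and returns a minimal status document.

curl http://localhost:8080/actuator/health
{"status":"UP"}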

When we recently upgraded our projects to Spring Boot 2, my team was pretty excited that we could start using Micrometer, a (new) instrumentation library powering the delivery of application metrics. Micrometer is now the default metrics library in Spring Boot 2 and it doesn’t just give you metrics from your Spring application; it can also deliver JVM metrics (garbage collection, memory pools, etc.) and metrics from the application container. Micrometer has several different libraries that can be included to ship metrics to different backends, with support for Prometheus, Netflix Atlas, CloudWatch, Datadog, Graphite, Ganglia, JMX, Influx/Telegraf, New Relic, StatsD, SignalFx, and Wavefront.
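Besides the metrics you get out of the box, Micrometer also lets you register your own. The sketch below is just an illustration (the class name and metric name are made up): it injects the MeterRegistry that Spring Boot auto-configures and registers a simple counter.

import io.micrometer.core.instrument.Counter;
import io.micrometer.core.instrument.MeterRegistry;
import org.springframework.stereotype.Component;

@Component
public class OrderMetrics {

    private final Counter ordersPlaced;

    public OrderMetrics(MeterRegistry registry) {
        // Spring Boot 2 auto-configures a MeterRegistry, so it can simply be injected
        this.ordersPlaced = Counter.builder("orders.placed")
                .description("Number of orders placed")
                .register(registry);
    }

    public void orderPlaced() {
        ordersPlaced.increment();
    }
}

Every meter registered this way automatically shows up in the Prometheus scrape output described below.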

Because we didn’t have a lot of control over the way our applications were deployed, we looked at the different backends supported by Micrometer. Most of the backends above work by pushing data out to a remote (cloud) service. Since the organisation we work for doesn’t allow us to push this ‘sensitive’ data to a remote party, we looked at self-hosted solutions. We did a quick scan, started looking into Prometheus (and Grafana), and soon learned that it was really easy to get a monitoring system up: we had a running setup within an hour.

To be able to use Spring Boot Actuator and Prometheus together you need to add two dependencies to your project:

<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-actuator</artifactId>
</dependency>

<dependency>
    <groupId>io.micrometer</groupId>
    <artifactId>micrometer-registry-prometheus</artifactId>
</dependency>
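If your build uses Gradle instead of Maven, the equivalent dependencies would look roughly like this (assuming the Spring Boot Gradle plugin manages the versions):

implementation 'org.springframework.boot:spring-boot-starter-actuator'
implementation 'io.micrometer:micrometer-registry-prometheus'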

Actuator has an endpoint available for Prometheus to scrape, but it’s not exposed by default, so you will need to expose the endpoint by means of configuration. In this case, I’ll do so via application.properties.

management.endpoint.prometheus.enabled=true
management.endpoints.web.exposure.include=prometheus,info,health

Now if you browse to http(s)://host(:8080)/actuator/prometheus you will see the output that Prometheus will scrape to get the information from your application. A small snippet of the information provided by the endpoint is shown below; in reality the Prometheus endpoint exposes a lot more.

# HELP tomcat_global_sent_bytes_total  
# TYPE tomcat_global_sent_bytes_total counter
tomcat_global_sent_bytes_total{name="http-nio-8080",} 75776.0
tomcat_global_sent_bytes_total{name="http-nio-8443",} 1.0182049E8
# HELP tomcat_servlet_request_max_seconds  
# TYPE tomcat_servlet_request_max_seconds gauge
tomcat_servlet_request_max_seconds{name="default",} 0.0
tomcat_servlet_request_max_seconds{name="jsp",} 0.0
# HELP process_files_open The open file descriptor count
# TYPE process_files_open gauge
process_files_open 91.0
# HELP system_cpu_usage The "recent cpu usage" for the whole system
# TYPE system_cpu_usage gauge
system_cpu_usage 0.00427715996578272
# HELP jvm_memory_max_bytes The maximum amount of memory in bytes that can be used for memory management
# TYPE jvm_memory_max_bytes gauge
jvm_memory_max_bytes{area="nonheap",id="Code Cache",} 2.5165824E8
jvm_memory_max_bytes{area="nonheap",id="Metaspace",} -1.0
jvm_memory_max_bytes{area="nonheap",id="Compressed Class Space",} 1.073741824E9
jvm_memory_max_bytes{area="heap",id="PS Eden Space",} 1.77733632E8
jvm_memory_max_bytes{area="heap",id="PS Survivor Space",} 524288.0
jvm_memory_max_bytes{area="heap",id="PS Old Gen",} 3.58088704E8

Now that everything is configured from the application perspective, let’s move on to Prometheus itself.

Prometheus

Prometheus is an open-source systems monitoring and alerting toolkit originally built at SoundCloud and now part of the Cloud Native Computing Foundation. To get a better understanding of what Prometheus really is, let’s take a look at an architectural diagram.

(Source: https://prometheus.io/docs/introduction/overview/)

The Prometheus server consists of three main components:

  • A time series database
  • A retrieval component which scrapes its targets for information
  • An HTTP server which you can use to query information stored inside the time series database

To make it even more powerful, there are some optional components you can add:

  • An alert manager, which you can use to send alerts via Pagerduty, Slack, etc.
  • A push gateway in case you need to push information to prometheus instead of using the default pull mechanism
  • Grafana for visualizing data and creating dashboards

When looking at Prometheus the most appealing features for us were:

  • no reliance on distributed storage; single server nodes are autonomous
  • time series collection happens via a pull model over HTTP
  • targets are discovered via service discovery or static configuration
  • multiple modes of graphing and dashboarding support

To get up and running quickly, you can configure Prometheus to scrape some (existing) Spring Boot applications. Prometheus uses a file called prometheus.yml as its main configuration file; in it you specify the targets it needs to monitor as well as recording rules and alerting rules.

The following example shows a configuration with a set of static targets for both Prometheus itself and our Spring Boot application.

global:
  scrape_interval:   15s # By default, scrape targets every 15 seconds.

  # Attach these labels to any time series or alerts when communicating with
  # external systems (federation, remote storage, Alertmanager).
  external_labels:
    monitor: 'bootifull-monitoring'

scrape_configs:
- job_name:       'monitoring-demo'

  # Override the global default and scrape targets from this job every 10 seconds.
  scrape_interval: 10s
  metrics_path: '/actuator/prometheus'

  static_configs:
  - targets: ['monitoring-demo:8080']
    labels:
      application: 'monitoring-demo'

- job_name: 'prometheus'

  scrape_interval: 5s

  static_configs:
  - targets: ['localhost:9090']

As you can see the configuration is pretty simple. You can add specific labels to the targets, which can later be used for querying, filtering, and creating dashboards based on the information stored within Prometheus.
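Once Prometheus has scraped a few samples, those labels can be used in PromQL. For example, the following queries (using the metrics shown earlier and the application label from the configuration above) select the heap limits of our application and the rate at which Tomcat sends bytes:

jvm_memory_max_bytes{application="monitoring-demo", area="heap"}
rate(tomcat_global_sent_bytes_total{application="monitoring-demo"}[1m])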

If you want to get started quickly with Prometheus and have Docker available, you can use the official Prometheus Docker image and provide a custom configuration from your host machine by running:

docker run -p 9090:9090 -v /tmp/prometheus.yml:/etc/prometheus/prometheus.yml \
       prom/prometheus:v2.4.3

In the above example we bind-mount the main Prometheus configuration file from the host system, so you can, for instance, use the configuration shown earlier. Prometheus itself has some basic graphing capabilities, but they are meant more for ad-hoc queries.

For creating an application monitoring dashboard, Grafana is much better suited.

Grafana

So what is Grafana and what role does it play in our monitoring stack?

Grafana allows you to query, visualize, alert on and understand your metrics no matter where they are stored. Create, explore, and share dashboards with your team and foster a data driven culture.

The cool thing about Grafana (next to the beautiful UI) is that it isn’t tied to a single data source the way, for instance, Kibana is tied to Elasticsearch. Grafana can have many different data sources like AWS CloudWatch, Elasticsearch, InfluxDB, Prometheus, etc. This makes it a very good option for creating a monitoring dashboard. Grafana talks to Prometheus using the PromQL query language.

For Grafana there is also an official Docker image available for you to use. You can get Grafana up and running with a simple command.

docker run -p 3000:3000 grafana/grafana:5.2.4

Now if we connect Grafana with Prometheus as the data source and install this excellent JVM Micrometer dashboard into Grafana, we can instantly start monitoring our Spring Boot application. You will end up with a pretty mature dashboard that lets you switch between different instances of your application.
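Adding the data source can be done through the Grafana UI, but since we run Grafana in Docker it’s convenient to provision it from a file instead. A minimal sketch of such a provisioning file is shown below; the prometheus hostname assumes the docker-compose setup further down, and the exact location (a YAML file under the provisioning/datasources directory that Grafana reads at startup) is how I’ve set it up in the demo.

apiVersion: 1

datasources:
- name: Prometheus
  type: prometheus
  access: proxy
  url: http://prometheus:9090
  isDefault: true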

If you want to start everything all at once you can easily use docker-compose.

version: "3"
services:
  app:
    image: monitoring-demo:latest
    container_name: 'monitoring-demo'
    build:
      context: ./
      dockerfile: Dockerfile
    ports:
    - '8080:8080'
  prometheus:
    image: prom/prometheus:v2.4.3
    container_name: 'prometheus'
    volumes:
    - ./monitoring/prometheus/:/etc/prometheus/
    ports:
    - '9090:9090'
  grafana:
    image: grafana/grafana:5.2.4
    container_name: 'grafana'
    ports:
    - '3000:3000'
    volumes:
    - ./monitoring/grafana/provisioning/:/etc/grafana/provisioning/
    env_file:
    - ./monitoring/grafana/config.monitoring
    depends_on:
    - prometheus

I’ve put together a small demo project, containing a simple Spring Boot application and the above Prometheus configuration, in a GitHub repository for demo and experimentation purposes. Now if you want to generate some statistics, run a small load test with JMeter or Apache Bench. Feel free to use/fork it!
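A minimal Apache Bench run could look like this (assuming the demo application listens on port 8080 and serves something on the root path):

ab -n 10000 -c 10 http://localhost:8080/

This fires 10,000 requests with a concurrency of 10, which is enough to make the request and JVM graphs move.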


Setting up a data analytics pipeline: best practices

Data pipeline architecture example (the picture is courtesy of https://bit.ly/2K44Nk5) 1

In a data science analogy with the automotive industry, the data plays the role of the raw oil, which is not yet ready for combustion. The data modeling phase is comparable to combustion in the engine, and data preparation is the refinery process turning raw oil into fuel, i.e., making it ready for combustion. In this analogy, the data analytics pipeline includes all the steps from extracting the oil up to combustion, driving, and reaching the destination (analogous to reaching the business goals). As you can imagine, the data (or oil in this analogy) goes through various transformations as it moves from one stage of the process to another. But what is the best practice in terms of data format and tooling? Although there are many tools, and the best practice is sometimes very use-case specific, in general JSON is the best practice for the data format of communication, the lingua franca, and Python is the best practice for orchestration, data preparation, analytics, and live production.

What is the common inefficiency and why does it happen?

The common inefficiency is the overuse of tabular (CSV-like) data formats for communication, i.e., as the lingua franca. I believe data scientists still overuse structured data types for communication within the data analytics pipeline because of the standard data-frame-like formats offered by major analytics tools such as Python and R. Data scientists get used to the data-frame mentality and forget that tabular storage of data is a low-scale solution that is not optimized for communication; when it comes to bigger data sets or the flexibility to add new fields, data frames and their tabular form are inefficient.
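To make the point concrete: a JSON record can pick up a new field without touching the rest of the pipeline, whereas a tabular representation forces every row into the same fixed set of columns. A (made-up) example of two records, where the second one carries an extra field:

{"user_id": 42, "amount": 19.95}
{"user_id": 43, "amount": 5.00, "coupon": "WELCOME10"}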

DataOps Pipeline and Data Analytics

A very important aspect of analytics that is ignored in some circumstances is going live and getting integrated with other systems. DataOps is about setting up a set of tools covering everything from capturing and storing data up to analytics and integration, falling into the interdisciplinary realm of DevOps, data engineering, analytics, and software engineering (hereinafter I use data analytics pipeline and DataOps pipeline interchangeably). The modeling part, and probably some parts of the data preparation phase, need a data-frame-like format, but the rest of the pipeline is more efficient and robust if it is JSON native. JSON makes it easier to add or remove fields and is a compact form for communication between modules.

The picture is courtesy of https://zalando-jobsite.cdn.prismic.io/zalando-jobsite/2ed778169b702ca83c2505ceb65424d748351109_image_5-0d8e25c02668e476dd491d457f605d89.jpg 2

The role of Python

Python is a great programming language used not only by the scientific community but also by application developers. It is ready to be used as a back-end, and by combining it with Django you can build full-stack web applications. Python has almost everything you need to set up a DataOps pipeline and is ready for integration and live production.

Python Example: transforming CSV to JSON and storing it in MongoDB

To show some of the capabilities of Python in combination with JSON, here is a simple example. In this example, a dataframe is converted to JSON (a Python dictionary) and stored in MongoDB. MongoDB is an important database in today’s data landscape as it is JSON native, storing data in a document format, which brings high flexibility.

# Loading packages
from pymongo import MongoClient
import pandas as pd

# Connecting to the database
client = MongoClient('localhost', 27017)

# Creating the database and collection
db = client.pymongo_test
posts = db.posts

# Defining a dummy dataframe
df = pd.DataFrame({'col1': [1, 2], 'col2': [0.5, 0.75]}, index=['a', 'b'])

# Transforming the dataframe to a dictionary (JSON)
dic = df.to_dict()

# Writing to the database
result = posts.insert_one(dic)
print('One post: {0}'.format(result.inserted_id))

The above example shows Python’s ability to transform data from a dataframe to JSON and to connect to various tooling (MongoDB in this example) within the DataOps pipeline.

Recap

This article is an extension of my previous article on the future of data science (https://bit.ly/2sz8EdM), in which I sketched the future of data science and recommended that data scientists move towards full-stack. Once you have a full stack with various layers for DataOps / data analytics, JSON is the lingua franca between modules, bringing robustness and flexibility to that communication, and Python is the orchestrator of the various tools and techniques in the pipeline.


1: The picture is courtesy of https://cdn-images-1.medium.com/max/1600/1*8-NNHZhRVb5EPHK5iin92Q.png
2: The picture is courtesy of https://zalando-jobsite.cdn.prismic.io/zalando-jobsite/2ed778169b702ca83c2505ceb65424d748351109_image_5-0d8e25c02668e476dd491d457f605d89.jpg