Tag Archive: kibana



A fresh look at Logstash

Soon after the release of Elasticsearch it became clear that it was good at more than providing search: it turned out to be very effective at storing logs as well. That is why Logstash adopted Elasticsearch as its storage backend. Logstash contained standard parsers for Apache httpd logs, file monitoring plugins to obtain the logs, plugins to enrich and filter the content, and plugins to send the content to Elasticsearch. That is Logstash in a nutshell back in the day. Of course the logs had to be shown as well, therefore a tool called Kibana was created. Kibana was a nice tool to create highly interactive dashboards to show and analyse your data. Together they became the famous ELK stack. Nowadays we have a lot more options in all of these tools. We have the Ingest node in Elasticsearch to pre-process documents before they are indexed, we have Beats to monitor files, databases, machines, etc. And we have very nice new Kibana dashboards. Time to re-investigate what the combination of Logstash, Elasticsearch and Kibana can do. In this blog post I’ll focus on Logstash.

X-Pack

As the company Elastic has to make some money as well, they have created a product called X-Pack. X-Pack bundles a lot of features that sometimes span multiple products. There is a security component with which you can require users to log in to Kibana and secure your content. Other interesting parts of X-Pack are machine learning, graph and monitoring. Parts of X-Pack can be used free of charge, although you do need a license; for other parts you need a paid license. I personally like the monitoring part, so I regularly install X-Pack. In this blog post I’ll also investigate the X-Pack features for Logstash. I’ll focus on out-of-the-box functionality and mostly on what all these nice new things like monitoring and pipeline viewing bring us.

Using the version 6 release candidate

As Elastic has already given us an RC1 of their complete stack, I’ll use that for the evaluation. Beware though: this is still a release candidate, so it is not production ready.

What does Logstash do

If you have never really heard about Logstash, let me give you a very short introduction. Logstash can be used to obtain data from a multitude of different sources, then filter, transform and enrich that data, and finally store it in, again, a multitude of destinations. Example data sources are relational databases, files, queues and websockets. Logstash ships with a large number of filter plugins, with which we can process data and exclude certain fields. We can also enrich data, look up information about IP addresses, or look up records belonging to an id in, for instance, Elasticsearch or a database. After the lookup we can add data to the document or event that we are handling before sending it to one or more outputs. Outputs can be Elasticsearch or a database, but also queues like Kafka or RabbitMQ.
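To make this a bit more concrete, here is a minimal sketch of a pipeline with all three stages. The access log path is just an assumption, and the grok and geoip filter plugins used here ship with Logstash by default.

input {
	file {
		# hypothetical location of an Apache access log
		path => "/var/log/httpd/access_log"
		start_position => beginning
	}
}
filter {
	grok {
		# parse the raw line into fields like clientip, verb and response
		match => { "message" => "%{COMBINEDAPACHELOG}" }
	}
	geoip {
		# enrich the event with geo information for the client ip
		source => "clientip"
	}
}
output {
	elasticsearch {
		hosts => ["http://localhost:9200"]
		index => "accesslogs"
	}
}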

In later releases Logstash started to add more features that a tool handling large amounts of data over longer periods needs. Things like monitoring and clustering of nodes were introduced, as well as persisting incoming data to disk. By now Logstash, in combination with Kibana and Elasticsearch, is used by very large companies but also by a lot of start-ups to monitor their servers and handle all sorts of interesting data streams.

Enough of this talk, let us get our hands dirty. The first step is to install everything on our developer machines.

Installation

I’ll focus on the developer machine; if you want to install it on a server, please refer to the extensive Logstash documentation.

First download the zip or tar.gz file and extract it to a convenient location. Now create a folder where you can store the configuration files. To keep the files small and to show you that you can split them, I create three different files in this folder: input.conf, filters.conf and output.conf. The most basic configuration is one with stdin for input, no filters and stdout for output. Below are the contents of the input and output files (filters.conf stays empty for now).

input {
	stdin{}
}
output { 
	stdout { 
		codec => rubydebug
	}
}

Time to start logstash. Step into the downloaded and extracted folder with the logstash binaries and execute the following command.

bin/logstash -r -f ../logstashblog/

The -r flag can be used during development to reload the configuration on change. Beware, this does not work with the stdin plugin. With -f we tell Logstash to load a configuration file or directory, in our case the directory containing the three mentioned files. When Logstash is ready it will print something like this:

[2017-10-28T19:00:19,511][INFO ][logstash.pipeline        ] Pipeline started {"pipeline.id"=>"main"}
The stdin plugin is now waiting for input:
[2017-10-28T19:00:19,526][INFO ][logstash.agent           ] Pipelines running {:count=>1, :pipelines=>["main"]}

Now you can type something and the result is the created document or event that went through the almost empty pipeline. The thing to notice is that we now have a field called message containing the text we entered.

Just some text for input
{
      "@version" => "1",
          "host" => "Jettros-MBP.fritz.box",
    "@timestamp" => 2017-10-28T17:02:18.185Z,
       "message" => "Just some text for input"
}

Now that we know it is working, I want you to have a look at the monitoring options that are available through the REST endpoint.

http://localhost:9600/

{
"host": "Jettros-MBP.fritz.box",
"version": "6.0.0-rc1",
"http_address": "127.0.0.1:9600",
"id": "20290d5e-1303-4fbd-9e15-03f549886af1",
"name": "Jettros-MBP.fritz.box",
"build_date": "2017-09-25T20:32:16Z",
"build_sha": "c13a253bb733452031913c186892523d03967857",
"build_snapshot": false
}

You can use the same URL with different endpoints to get information about the node, the plugins, stats and hot threads:
http://localhost:9600/_node
http://localhost:9600/_node/plugins
http://localhost:9600/_node/stats
http://localhost:9600/_node/hot_threads
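For example, from the command line you can pull the node statistics like this (the pretty parameter only asks for formatted JSON):

curl 'http://localhost:9600/_node/stats?pretty'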

It becomes a lot more fun if we have a UI, so let us install X-Pack into Logstash. Before we can run Logstash with monitoring enabled, we need to install Elasticsearch and Kibana with X-Pack installed into those as well. Refer to the X-Pack documentation on how to do that.
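For a developer setup the install commands are essentially one-liners per product; a sketch, please check the X-Pack documentation for the exact commands for your version:

bin/elasticsearch-plugin install x-pack
bin/kibana-plugin install x-pack
bin/logstash-plugin install x-pack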

As you can see, the basic commands to install X-Pack into Elasticsearch and Kibana are very easy. For now I disable security by adding the following line to both kibana.yml and elasticsearch.yml: xpack.security.enabled: false. After installing X-Pack into Logstash, we have to add the following lines to the logstash.yml file in the config folder:

xpack.monitoring.elasticsearch.url: ["http://localhost:9200"] 
xpack.monitoring.elasticsearch.username:
xpack.monitoring.elasticsearch.password:

Notice the empty username and password; this is required when security is disabled. Now move over to Kibana, check the monitoring tab (the heart-shaped icon) and click on Logstash. In the first screen you can see the events; they could be zero, so please enter some events. Now move to the pipeline tab. Of course with our basic pipeline this is a bit silly, but imagine what it will show later on.

[Screenshot: the Logstash pipeline viewer in the Kibana monitoring UI]

Time to get some real input.

Import the Signalmedia dataset

Signalmedia has provided a dataset that you can use for research. More information about the dataset and how to obtain it can be found here. The dataset contains exactly one million news documents, delivered as a single file with one JSON document per line. The JSON documents have the following format:

{
   "id": "a080f99a-07d9-47d1-8244-26a540017b7a",
   "content": "KUALA LUMPUR, Sept 15 (MySinchew) -- The Kuala Lumpur City Hall today issued ...",
   "title": "Pay up or face legal action: DBKL",
   "media-type": "News",
   "source": "My Sinchew",
   "published": "2015-09-15T10:17:53Z"
}

We want to import this big file, with all the JSON documents as separate documents, into Elasticsearch using Logstash. The first step is to create a Logstash input. We can use the Logstash file plugin to load the file, point the path option to the file, tell it to start at the beginning and mark each line as a JSON document. The file plugin has more options you can use; it can also handle rolling files, which are used a lot in logging.

input {
	file {
        path => "/Volumes/Transcend/signalmedia-1m.jsonl"
        codec => "json"
        start_position => beginning 
    }
}

That is it. With the stdout plugin and the rubydebug codec this would give the following output.

{
          "path" => "/Volumes/Transcend/signalmedia-1m.jsonl",
    "@timestamp" => 2017-10-30T18:49:45.948Z,
      "@version" => "1",
          "host" => "Jettros-MBP.fritz.box",
            "id" => "a080f99a-07d9-47d1-8244-26a540017b7a",
        "source" => "My Sinchew",
     "published" => "2015-09-15T10:17:53Z",
         "title" => "Pay up or face legal action: DBKL",
    "media-type" => "News",
       "content" => "KUALA LUMPUR, Sept 15 (MySinchew) -- The Kuala Lumpur City Hall today issued ..."
}

Notice that besides the fields we expected (id, content, title, media-type, source and published) we also got some additional fields. Before sending this to Elasticsearch we want to clean it up: we do not need path, host, @timestamp and @version. There is also something special about the id field. We want to use it as the id of the document in Elasticsearch, but we do not want to add it to the document itself. If we need the value of a field in the output plugin later on, but do not want it as a field in the document, we can move it to the @metadata object. That is exactly what the first part of the filter below does; the second part removes the fields we do not need.

filter {
	mutate {
		copy => {"id" => "[@metadata][id]"}
	}
	mutate {
		remove_field => ["@timestamp", "@version", "host", "path", "id"]
	}
}

With these filters in place the output of the same document would become:

{
        "source" => "My Sinchew",
     "published" => "2015-09-15T10:17:53Z",
         "title" => "Pay up or face legal action: DBKL",
    "media-type" => "News",
       "content" => "KUALA LUMPUR, Sept 15 (MySinchew) -- The Kuala Lumpur City Hall today issued ..."
}

Now the content is ready to be sent to Elasticsearch, so we need to configure the elasticsearch output plugin. When sending data to Elasticsearch you first need to think about creating the index and the mapping that goes with it. In this example I am going to use an index template. I am not going to explain a lot about the mappings, as this is not an Elasticsearch blog, but with the following configuration we install the mapping template when connecting to Elasticsearch and we can insert all documents. Do look at the way the document_id is created. Remember we talked about @metadata and how we copied the id field into it? This is why we did it: now we use that value as the id of the document when inserting it into Elasticsearch.

output {
	elasticsearch {
		index => "signalmedia"
		document_id => "%{[@metadata][id]}"
		document_type => "doc"
		manage_template => "true"
		template => "./signalmedia-template.json"
		template_name => "signalmediatemplate"
	}
	stdout { codec => dots }
}

Notice there are two outputs configured: the elasticsearch output of course, but also a stdout. This time not with the rubydebug codec, as that would be way too verbose, but with the dots codec. This codec prints a dot for each document it processes.

For completeness I also want to show the mapping template. In this case I placed it in the root folder of the Logstash installation; usually this would of course be an absolute path somewhere else.

{
  "index_patterns": ["signalmedia"],
  "settings": {
    "number_of_replicas": 0,
    "number_of_shards": 3
  },
  "mappings": {
    "doc": {
      "properties": {
        "source": {
          "type": "keyword"
        },
        "published": {
          "type": "date"
        },
        "title": {
          "type": "text"
        },
        "media-type": {
          "type": "keyword"
        },
        "content": {
          "type": "text"
        }
      }
    }
  }
}

Now we want to import all one million documents and have a look at the monitoring along the way. Let’s do it.

[Screenshots: the Logstash overview and pipeline view in the Kibana monitoring UI during the import]

Running a query

Of course we have to prove that the documents are now available in Elasticsearch. So let’s execute one of my favourite queries, which makes use of the new significant text aggregation. First the request, then parts of the response.

GET signalmedia/_search
{
  "query": {
    "match": {
      "content": "netherlands"
    }
  },
  "aggs": {
    "my_sampler": {
      "sampler": {
        "shard_size": 200
      },
      "aggs": {
        "keywords": {
          "significant_text": {
            "field": "content",
            "filter_duplicate_text": true
          }
        }
      }
    }
  },
  "size": 0
}

This is just a very small part of the response; I stripped out a lot of the elements to make it easier to read. Good to see that we get dutch as a significant word when searching for the netherlands, and of course geenstijl.

"buckets": [
  {"key": "netherlands","doc_count": 527},
  {"key": "dutch","doc_count": 196},
  {"key": "mmsi","doc_count": 7},
  {"key": "herikerbergweg","doc_count": 4},
  {"key": "konya","doc_count": 14},
  {"key": "geenstijl","doc_count": 3}
]

Concluding

It is good to see the nice UI options in Kibana; the pipeline viewer is very useful. In a next blog post I’ll be looking at Kibana and all the new and interesting things in there.


Developing Kibana Plugins

For quite some time I’ve been meaning to rewrite an old Elasticsearch plugin as a new Kibana plugin. It is quite different from what you were used to. Kibana plugins are quite new; they were introduced in version 4.2.0. There are quite a few warnings in the Kibana GitHub issues about there not being a public API yet, and even about not making plugins at all. Essentially this means you need to keep up with the Kibana releases if you’d like to proceed anyway.

Starting

First you need to set up a Kibana development environment. To do this you can follow these steps: https://github.com/elastic/kibana/blob/master/CONTRIBUTING.md#development-environment-setup. I would recommend not picking the master branch.

Kibana Plugin Yeoman Generator

Next up is generating the boilerplate for your Kibana plugin using the Kibana Plugin Yeoman Generator. When following the instructions in the readme, note that it expects the Kibana development environment in a folder named kibana at the same level as your plugin.
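As a sketch, and assuming the generator is published on npm as generator-kibana-plugin (the readme has the exact package name), generating the boilerplate boils down to:

npm install -g yo
npm install -g generator-kibana-plugin
yo kibana-plugin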

What did you get?

After generating the project there is a file called index.js in the root of the project, listed below, which ties the whole project together. In this file we see the configuration of a Kibana plugin. The uiExports object configures the frontend of the plugin. There are more variants than ‘app’, which can be found here. The properties an app can have are listed in the source code here. The ‘main’ property lists the path to your Angular code, which lives in the public folder of your project.

import exampleRoute from './server/routes/example';

export default function (kibana) {
  return new kibana.Plugin({
    require: ['elasticsearch'],
    
    // Your frontend app
    uiExports: {
      app: {
        title: 'Example',
        description: 'An awesome Kibana plugin',
        main: 'plugins/example/app'
      }
    },

    config(Joi) {
      return Joi.object({
        enabled: Joi.boolean().default(true),
      }).default();
    },

    init(server, options) {
      // Add server routes and initialize the plugin here
      exampleRoute(server);
    }

  });
};

The latter part of the sample above contains configuration for a hapijs server. With this you’re able to write your custom backend API. You can find more of this in the server/routes folder.
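To give an idea of what such a route looks like, here is a minimal sketch of server/routes/example.js; the path matches the $http.get call in the frontend code below, but the exact shape may differ per generator version.

export default function (server) {
  // Register a hapi route that the frontend calls via ../api/example/example
  server.route({
    path: '/api/example/example',
    method: 'GET',
    handler(req, reply) {
      // Reply with the current server time; the frontend reads resp.data.time
      reply({ time: (new Date()).toISOString() });
    }
  });
}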

The frontend

In the public/app.js file we can find two important pieces of code. One of them is the part with uiRoutes. This object allows you to embed your routing within Kibana. The routing syntax follows Angular’s ngRoute module. If you don’t need routing, you can remove it.

uiRoutes.enable();
uiRoutes
    .when('/', {
        template,
        resolve: {
            currentTime($http) {
                return $http.get('../api/example/example').then(function (resp) {
                    return resp.data.time;
                });
            }
        }
    });

The latter part contains the uiModules object, which is in charge of generating dynamic modules on the fly and wiring them together. You can see the uiModules.get() function as a replacement for angular.module().

uiModules
    .get('app/example', [])
    .controller('exampleHelloWorld', function ($scope, $route, $interval) {
        $scope.title = 'Example';
        $scope.description = 'An awesome Kibana plugin';

        var currentTime = moment($route.current.locals.currentTime);
        $scope.currentTime = currentTime.format('HH:mm:ss');
        var unsubscribe = $interval(function () {
            $scope.currentTime = currentTime.add(1, 'second').format('HH:mm:ss');
        }, 1000);
        $scope.$watch('$destroy', unsubscribe);
    });

When writing the templates and styles you should keep in mind that Kibana uses Twitter Bootstrap. Another note: the previously mentioned file also contains a chrome object, which you can ignore, as it will be deprecated in Kibana 5.0.0. It was used to control the navbar.

Result

If all went well, this should be the result.
[Screenshot: the example Kibana plugin showing the current time]

Useful links

  • Source of Sense
  • Source of Timelion
  • Kibana styleguide
  • The plugin I’m converting
  • More information about plugins


Just ‘tailing a file’: exploring Elasticsearch’s new Ingest

With the upcoming release of Elasticsearch 5.0, Elastic will introduce a nice new feature: Ingest.

When Elastic first presented the Ingest API, their introduction to the subject was: “I just want to tail a file”. What they meant by that is that right now, with the current Elastic Stack, you need quite a bit of setup to get logs into Elasticsearch. A possible setup could be the following:

[Diagram: the log shipping lifecycle, from an application server with Beats, via a queue and Logstash, into Elasticsearch and Kibana]

First, you have some logs sitting on your application server: maybe some application logs, maybe some access logs. Let’s take access logs as the example for this blog post. In order to get those logs into Elasticsearch, we can ship them to a queue (for example Redis) using Beats. We then have Logstash pulling the logs out of the queue so that we can process these raw logs and turn them into JSON documents. One of the great things about Logstash is that we can enrich our logs as well as make them ‘Elasticsearch-ready’. After processing our logs we can tell Logstash to output the processed logs to Elasticsearch and voila, we can now search our logs in Elasticsearch or visualise them using Kibana.
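To make the Logstash part of that picture concrete, a minimal sketch of such a pipeline could look like the configuration below; the Redis list key is just an assumption and has to match the Beats configuration.

input {
	redis {
		host => "localhost"
		data_type => "list"
		# assumed list key, must match what Beats pushes to Redis
		key => "filebeat"
	}
}
filter {
	grok {
		# turn the raw access log line into structured fields
		match => { "message" => "%{COMBINEDAPACHELOG}" }
	}
}
output {
	elasticsearch {
		hosts => ["http://localhost:9200"]
	}
}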

Now if we take a closer look at the above setup, we can see that quite a few components are needed in order to ‘just tail a file’. With the new Ingest feature, Elasticsearch has taken over the ‘filter’ part of Logstash, so that we can do our processing and enrichment of raw logs within Elasticsearch.
At the moment of writing this post, the Ingest feature has been merged into the master branch of Elasticsearch, which means that we can already play around with it. In this blog post, I will show you some of the nifty new features of Elasticsearch Ingest.

To make use of the Ingest features, you need to define ‘pipelines’. Elastic defines a pipeline as: “A pipeline is a definition of a series of processors that are to be executed in the same order as they are declared. A pipeline consists of two main fields: a description and a list of processors”. The description speaks for itself, but the processors are where all the magic happens. Elasticsearch comes with a list of processors, which Logstash users will recognise as ‘filters’ in Logstash. Let’s create a new pipeline:

PUT _ingest/pipeline/log-parsing
{
  "description": "Our first log-parsing pipeline",
  "processors": [
    {
      "grok": {
        "field": "message",
        "pattern": "%{IP:client} %{WORD:method} %{URIPATHPARAM:request} %{NUMBER:bytes}"
      }
    },
    {
      "remove": {
        "field": "message"
      }
    }
  ]
}

At the top we defined our pipeline to be named ‘log-parsing’. During indexing we can now refer to this pipeline, so that Elasticsearch will try to process the document according to our defined processors. Ingest comes with a nice ‘_simulate’ endpoint, so that we can see what would happen to our documents if we were to ingest them through our newly made pipeline. This makes testing your pipeline a lot easier:

POST _ingest/pipeline/_simulate
{
  "pipeline": {
    "description": "Our first log-parsing pipeline",
    "processors": [
      {
        "grok": {
          "field": "message",
          "pattern": "%{IP:client} %{WORD:method} %{URIPATHPARAM:request} %{NUMBER:bytes}"
        }
      },
      {
        "remove": {
          "field": "message"
        }
      }
    ]
  },
  "docs": [
    {
      "_source": {
        "message": "212.78.169.6 GET /index.html 26224"
      }
    }
  ]
}

Within the docs part of the JSON request, we can define the documents that we want to test against the pipeline. Our document contains one field with a fictional log line. If we run this request, the simulate API will show us the data transformation:

{
  "docs": [
    {
      "doc": {
        "_id": "_id",
        "_index": "_index",
        "_type": "_type",
        "_source": {
          "request": "/index.html",
          "method": "GET",
          "bytes": "26224",
          "client": "212.78.169.6"
        },
        "_ingest": {
          "timestamp": "2016-04-01T13:37:11.716+0000"
        }
      }
    }
  ]
}

As you can see our grok pattern matched and Ingest was able to process the log line and split up our message field into separate fields as defined in the grok pattern. Since we’re not interested in the message field anymore, it has been thrown out by the ‘remove’ processor. Next to having our fields available in ‘_source’, Ingest also adds an extra ingest metadata block to our documents. This metadata is also accessible in a processor.
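For example, a set processor could copy the ingest timestamp into a regular field of the document; the received_at field name below is just an illustration:

{
  "set": {
    "field": "received_at",
    "value": "{{_ingest.timestamp}}"
  }
}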

As mentioned before, Elasticsearch comes with a list of processors you can use. These are the processors currently available: Append, Convert, Date, Fail, ForEach, Gsub, Join, Lowercase, Remove, Rename, Set, Split, Trim and Uppercase.
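As an illustration of one of these, a convert processor could be appended to the log-parsing pipeline to turn the bytes field, which grok extracted as a string, into a number:

{
  "convert": {
    "field": "bytes",
    "type": "integer"
  }
}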

In our first example, we added a pipeline to Elasticsearch. Now instead of using the ‘_simulate’ API, let’s index some documents using our pipeline!

POST logs/log/1?pipeline=log-parsing
{
  "message": "212.78.169.6 GET /index.html 26224"
}

Now, if there were no errors during processing, we can retrieve our document and check its contents.

GET logs/log/1

Which results in:

{
  "_index": "logs",
  "_type": "log",
  "_id": "1",
  "_version": 1,
  "found": true,
  "_source": {
    "request": "/index.html",
    "method": "GET",
    "bytes": "26224",
    "client": "212.78.169.6"
  }
}

It worked!

Next to creating pipelines through JSON, Elasticsearch has also extended Kibana with the possibility to chain processors and create your own pipelines through Kibana’s interface!

[Screenshot: the Ingest pipeline setup interface in Kibana]

At the moment of writing this post, not all processors have been added to Kibana yet, so we’ll save that part for a later post.

These are just the basics of the new Ingest feature. In my next blog post we will combine Ingest with the new Reindex API, use Kibana to set up Ingest, and explore more advanced Ingest features.

If you’re excited about Ingest, you can read more on Elasticsearch’s GitHub page here.

Stay tuned!

If you are interested in Ingest you can also come to our conference, where Luca Cavanna will do a deep dive into Ingest:
Ingest node: reindexing and enriching documents within elasticsearch
Contact us if you are interested in attending.


Infrastructure metrics with Elasticsearch stack

For the operations team of any IT organisation it is of the utmost importance to have an overview of its infrastructure at any given point in time, whether it is the response time of various customer-facing systems or the memory consumption of processes running across servers. Traditionally, these stats were only explored after a problem had occurred, which can be anything from slow response times to certain processes taking over the CPU. It is much better to have real-time insight into the infrastructure stats, which enables the DevOps team to quickly find and fix the cause rather than waiting for the problem to occur. In this blog post we will explore the Elasticsearch product stack and propose a solution for effective monitoring of the infrastructure using Elasticsearch, Kibana and Beats.


Videos of Devcon Luminis are available

It has been almost three weeks since Devcon Luminis. Initially it was exciting to think about the number of people we could get to come to our conference; in the end it was more exciting to find out whether all the people who came would fit into the room. Around 400 people joined us for the conference: a big group of people who came to listen to Luminis employees and learn about new technologies. After the event an evaluation was performed, and we learned from it that most of the attendees were really satisfied with the conference and that the quality of most of the presentations was good.

All presentations were taped and are now available through the Devcon website as well as our YouTube channel. So if you want to learn about microservices, modular Java applications, Docker or the ELK stack, feel free to watch the videos.

Our Fellow Jettro also did a presentation. He talked about real-time data analysis using the ELK stack: Logstash for importing, Elasticsearch for storing and Kibana for showing the data. If you are interested in the ELK stack and in visualising your data, be sure to check out the presentation.


Elasticsearch on the web week 10

Every week there are a lot of posts on the web about Elasticsearch and the other products Elastic supplies, like Logstash and Kibana. I read a lot of these posts, and some of them are interesting enough to share with you. In this week’s overview there are posts about Kibana, aggregations and sizing your servers/shards, plus a few older ones that were still on my to-read list.



Finding your blog abusers using Kibana 4 and logstash 1.5

A lot of us have a blog, just like the one you are reading right now. I believe around 60% of the web is running WordPress. Nice to know if you are a hacker trying to break into websites; not so good if you are among those 60%. There are a lot of plugins available to help you secure your WordPress installation. Still, it makes you curious which people are trying to get access to your blog. In this blog post I am using Logstash to parse your access logs. Then I use Kibana 4 to answer the following questions:

  • Which urls are called that smell like abuse?
  • Who is abusing my blog?

Beware: this post is meant to show some capabilities of Logstash and Kibana, not to be the most rigorous guide to securing your WordPress blog.



Elasticsearch on the web week 8

This is the first overview post about what I found on the world wide web this week related to Elasticsearch. This is not a generated post, but a post describing the most interesting articles I read this week. Some of them are older than this week, but they helped me with something this week. From now on I am regularly going to write these overview posts; if you want to stay up to date, feel free to follow us on Twitter. In this post: news about Kibana 4, ELK and Docker, an update to the Shield plugin, and the latest and greatest version of Elasticsearch.



Experiment with the Kibana 4 beta

A few weeks ago, the guys from Elasticsearch made the public beta of Kibana 4 available. In this blog post I am going to generate some data and then play around with this data and the Kibana beta. If you have never heard about these tools, read the introduction; if you do know them, you can safely skip the introduction.
