


Just ‘tailing a file’: exploring Elasticsearch’s new Ingest

With the upcoming release of Elasticsearch 5.0, Elastic will introduce a nice new feature: Ingest.

When Elastic first presented the Ingest API, their introduction to the subject was: “I just want to tail a file”. What they meant by that is that, with the current Elastic Stack, you need quite a bit of setup in order to get logs into Elasticsearch. A possible setup could be the following:

[Image: log ingest lifecycle]

First, you have some logs sitting on your application server. Maybe some application logs, maybe some access logs. Let’s take access logs as an example for this blog post. In order to get those logs into Elasticsearch, we can ship them to a queue (for example Redis) using Beats. We then have Logstash pulling logs out of the queue so that we can process these raw logs and turn them into JSON documents. One of the great things about Logstash is that we can enrich our logs as well as make them ‘Elasticsearch-ready’. After processing our logs, we can tell Logstash to output them to Elasticsearch and voilà: we can now search our logs in Elasticsearch or visualise them in Kibana.

Now if we take a closer look at the above setup, we can see that quite a few components are needed in order to ‘just tail a file’. With the new Ingest feature, Elasticsearch has taken over the ‘filter’ part of Logstash, so that we can process and enrich raw logs within Elasticsearch itself.
At the time of writing this post, the Ingest feature has been merged into the master branch of Elasticsearch, which means that we can already play around with it. In this blog post, I will show you some of the nifty new features of Elasticsearch Ingest.

To make use of the Ingest features, you need to define ‘pipelines’. Elastic defines pipelines as: “A pipeline is a definition of a series of processors that are to be executed in the same order as they are declared. A pipeline consists of two main fields: a description and a list of processors”. The description speaks for itself, but the processors are where all the magic happens. Elasticsearch comes with a list of processors, which Logstash users will recognise as ‘filters’. Let’s create a new pipeline:

PUT _ingest/pipeline/log-parsing
{
  "description": "Our first log-parsing pipeline",
  "processors": [
    {
      "grok": {
        "field": "message",
        "pattern": "%{IP:client} %{WORD:method} %{URIPATHPARAM:request} %{NUMBER:bytes}"
      }
    },
    {
      "remove": {
        "field": "message"
      }
    }
  ]
}

In the request above, we named our pipeline ‘log-parsing’. During indexing, we can refer to this pipeline so that Elasticsearch will process the document according to our defined processors. Ingest also comes with a nice ‘_simulate’ endpoint, so that we can see what would happen to our documents if we were to ingest them through our newly created pipeline. This makes testing your pipeline a lot easier:

POST _ingest/pipeline/_simulate
{
  "pipeline": {
    "description": "Our first log-parsing pipeline",
    "processors": [
      {
        "grok": {
          "field": "message",
          "pattern": "%{IP:client} %{WORD:method} %{URIPATHPARAM:request} %{NUMBER:bytes}"
        }
      },
      {
        "remove": {
          "field": "message"
        }
      }
    ]
  },
  "docs": [
    {
      "_source": {
        "message": "212.78.169.6 GET /index.html 26224"
      }
    }
  ]
}

Within the ‘docs’ part of the JSON request, we can define the documents that we want to test against the pipeline. Our document contains one field with a fictional log line. If we run this request, the simulate API will show us the data transformation:

{
  "docs": [
    {
      "doc": {
        "_id": "_id",
        "_index": "_index",
        "_type": "_type",
        "_source": {
          "request": "/index.html",
          "method": "GET",
          "bytes": "26224",
          "client": "212.78.169.6"
        },
        "_ingest": {
          "timestamp": "2016-04-01T13:37:11.716+0000"
        }
      }
    }
  ]
}

As you can see, our grok pattern matched and Ingest was able to process the log line, splitting our message field into separate fields as defined in our grok pattern. Since we’re not interested in the message field anymore, it has been thrown out by the ‘remove’ processor. In addition to having our fields available in ‘_source’, Ingest also adds an extra Ingest metadata block to our documents. This metadata is also accessible in a processor, as the sketch below shows.
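For example, we could copy the ingest timestamp into a regular field with a ‘set’ processor. This is a minimal sketch, assuming the ‘{{_ingest.timestamp}}’ template syntax; the pipeline name and the ‘received_at’ field are just examples of my own, not part of the original pipeline:

PUT _ingest/pipeline/log-parsing-timestamped
{
  "description": "Log-parsing pipeline that also stores the ingest timestamp",
  "processors": [
    {
      "grok": {
        "field": "message",
        "pattern": "%{IP:client} %{WORD:method} %{URIPATHPARAM:request} %{NUMBER:bytes}"
      }
    },
    {
      "set": {
        "field": "received_at",
        "value": "{{_ingest.timestamp}}"
      }
    },
    {
      "remove": {
        "field": "message"
      }
    }
  ]
}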

As mentioned before, Elasticsearch comes with a list of processors which you can use; for instance, the sketch after this list uses the convert processor to turn the ‘bytes’ field into a number. These are the currently available processors:

Append, Convert, Date, Fail, ForEach, Grok, Gsub, Join, Lowercase, Remove, Rename, Set, Split, Trim and Uppercase.
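In the simulate output above, the ‘bytes’ field came back as a string. As a small illustration of the convert processor, here is a sketch that adds a conversion step to our pipeline and runs it through ‘_simulate’ again; the extra processor is my own addition, not part of the original example:

POST _ingest/pipeline/_simulate
{
  "pipeline": {
    "description": "Parse the log line and convert bytes to an integer",
    "processors": [
      {
        "grok": {
          "field": "message",
          "pattern": "%{IP:client} %{WORD:method} %{URIPATHPARAM:request} %{NUMBER:bytes}"
        }
      },
      {
        "convert": {
          "field": "bytes",
          "type": "integer"
        }
      },
      {
        "remove": {
          "field": "message"
        }
      }
    ]
  },
  "docs": [
    {
      "_source": {
        "message": "212.78.169.6 GET /index.html 26224"
      }
    }
  ]
}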

In our first example, we added a pipeline to Elasticsearch. Now instead of using the ‘_simulate’ API, let’s index some documents using our pipeline!

POST logs/log/1?pipeline=log-parsing
{
  "message": "212.78.169.6 GET /index.html 26224"
}
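The pipeline parameter is not limited to single documents either; a hedged sketch of using it with the bulk API, assuming the bulk endpoint accepts the same ‘pipeline’ URL parameter (the document IDs and log lines are made up for illustration):

POST logs/log/_bulk?pipeline=log-parsing
{ "index": { "_id": "2" } }
{ "message": "212.78.169.7 GET /about.html 1042" }
{ "index": { "_id": "3" } }
{ "message": "212.78.169.8 POST /login.html 512" }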

Now if there were no errors during processing, we can retrieve our document and check its contents.

GET logs/log/1

Which results in:

{
  "_index": "logs",
  "_type": "log",
  "_id": "1",
  "_version": 1,
  "found": true,
  "_source": {
    "request": "/index.html",
    "method": "GET",
    "bytes": "26224",
    "client": "212.78.169.6"
  }
}

It worked!
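We did say ‘if there were no errors during processing’. When a processor does fail, for example on a log line that doesn’t match the grok pattern, a processor can define what happens next through an ‘on_failure’ block. A minimal sketch, assuming the ‘on_failure’ syntax from the Ingest documentation; the pipeline name and the ‘error’ field are examples of my own:

PUT _ingest/pipeline/log-parsing-safe
{
  "description": "Log-parsing pipeline that tags documents it cannot parse",
  "processors": [
    {
      "grok": {
        "field": "message",
        "pattern": "%{IP:client} %{WORD:method} %{URIPATHPARAM:request} %{NUMBER:bytes}",
        "on_failure": [
          {
            "set": {
              "field": "error",
              "value": "failed to parse log line"
            }
          }
        ]
      }
    }
  ]
}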

In addition to creating pipelines through JSON, Elastic has also extended Kibana with the possibility to chain processors and create your own pipelines through Kibana’s interface!

[Image: the Kibana Ingest pipeline interface]

At the time of writing this post, not all processors have been added to Kibana yet, so we’ll save that topic for when they have.

These are just the basics of the new Ingest feature. In my next blog post, we will combine Ingest with the new Reindex API, use Kibana to set up Ingest, and explore more advanced Ingest features.

If you’re excited about Ingest, you can read more on Elasticsearch’s GitHub page.

Stay tuned!

If you are interested in Ingest, you can also come to our conference, where Luca Cavanna will do a deep dive into Ingest:
Ingest node: reindexing and enriching documents within Elasticsearch
Contact us if you are interested in attending.