Recap of Elastic{ON} – Part 2

Elastic released its 5.0 stack just a couple of months ago. Starting with version 5.0, all Elastic product versions are aligned – major, minor, and patch releases now ship at the same time – resolving the version confusion of the previous line-up (ElasticSearch v2.x, Kibana v4.x, Beats v1.x, Logstash v2.x, X-Pack). While the stable version at the time was 5.2.x, the Elastic{ON} conference was well timed to recap the major changes from previous versions of each Elastic product and to get acquainted with features expected in upcoming releases. Meanwhile, the current stable release is 5.3.0.

This year I had the opportunity to be one of the Anchormen team members who travelled to San Francisco to attend the Elastic{ON} conference. Anchormen, as an established partner of Elastic, is always on the lookout for new developments and innovation within the Elastic ecosystem. Directly after our return from the trip, Martin Schepers provided a first recap of our visit, highlighting the new SQL interface for ElasticSearch, the X-Pack additions, and a use case from Blizzard, the entertainment-software company. In this post, I will follow up with a few highlights of some additional innovations.

1- Streaming Capabilities

Elastic streaming capabilities demonstrated by directly connecting sensors to a ballerina dancing on stage

One of the most eye-catching features was the streaming capability of ElasticSearch and Kibana. It was demonstrated by a ballerina dancing on stage with sensors attached to various parts of her body, streaming a range of measurements into ElasticSearch and visualizing them in Kibana in real time.

ElasticSearch can scale up to one million writes per second, with indexed data becoming searchable within about a second. ElasticSearch also has a JSON data model and a flexible schema, which makes it very suitable – together with Kibana for visualization – in various architectural models. At Anchormen, we have been using this combination in various deployments and architectures. In a separate post, my colleague Corne Versloot presents the high-level architecture of one of our Consumer 360 platforms, utilizing Kafka, Spark, ElasticSearch, Redis, and Kibana. We have been working together on that platform since the beginning of 2016, and plans to extend it with the latest Elastic 5.x features have already been drafted.

2- ES-Hadoop – Improved streaming support

While Hadoop allows processing petabytes of data in long-running jobs, it lacks random reads and fast real-time aggregations. That makes ElasticSearch and Kibana a perfect complement for fast, real-time analytics and visualization. The ES-Hadoop connector is a two-way connector to most Hadoop ecosystem components, including MapReduce, Cascading, Pig, Hive, Storm, and Spark. Through the connector, data can be indexed into ElasticSearch in a batch or streaming fashion. Most of the features highlighted during the conference talks already existed in the 2.x versions of the connector; still, it was great to learn how certain optimization details were handled and to see the new Spark Streaming optimizations added in 5.0.

ES-Hadoop with Spark – improved streaming support: previous versions of the connector did not provide native support for DStreams. Instead, one would write something like stream.foreachRDD(EsSpark.saveToEs(_, "index/type")), which is executed for each underlying RDD of the DStream, i.e. every x seconds, where x is the batch interval of your streaming application. On the executor level, each call opens a connection to ElasticSearch and then tears it down; too many of these short-lived connections can exhaust the operating system's connection resources. Starting with v5.0, ES-Hadoop includes first-class support for Spark DStreams: with a call like stream.saveToEs("index/type"), the connector takes care of setting up connection pools and keeping connections alive for up to two minutes. Further details on the connector and its usage with Spark can be found in the official documentation.
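To make the difference concrete, here is a minimal, hedged sketch. It assumes the elasticsearch-spark 5.x dependency on the classpath, a local ElasticSearch node, and a hypothetical socket source emitting "sensor,value" lines; the index and field names are made up for illustration.

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.elasticsearch.spark.rdd.EsSpark
import org.elasticsearch.spark.streaming._ // adds saveToEs to DStreams (ES-Hadoop 5.0+)

object StreamingToEsSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("es-hadoop-streaming-sketch")
      .setMaster("local[2]")               // local run for illustration only
      .set("es.nodes", "localhost:9200")   // assumed ElasticSearch endpoint
    val ssc = new StreamingContext(conf, Seconds(5))

    // Hypothetical input: "sensor,value" lines arriving on a socket,
    // mapped to simple documents.
    val docs = ssc.socketTextStream("localhost", 9999).map { line =>
      val Array(sensor, value) = line.split(",")
      Map("sensor" -> sensor, "value" -> value.toDouble)
    }

    // Pre-5.0 pattern: save each micro-batch RDD explicitly, which opens and
    // closes connections to ElasticSearch on every batch interval.
    // docs.foreachRDD(rdd => EsSpark.saveToEs(rdd, "sensors/reading"))

    // 5.0+ pattern: first-class DStream support; the connector pools and
    // reuses connections across micro-batches.
    docs.saveToEs("sensors/reading")

    ssc.start()
    ssc.awaitTermination()
  }
}
```

The commented-out foreachRDD line corresponds to the pre-5.0 pattern described above; the single saveToEs call on the DStream is the 5.0+ equivalent.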

3- Beats

My favourite takeaway from the conference was the chance to take a closer look at Beats: how easy they are to use, extend, or even create. Beats are a set of data shippers installed as agents on your servers to deliver different kinds of operational data to ElasticSearch. Different types of Beats exist, including Filebeat (shipping data from log files), Metricbeat (shipping metrics from running services), and many more, each concerned with a different data aspect.

One of the most appealing additions, specifically to Filebeat and Metricbeat, was the concept of modules and how easy it is to get started with any of these out-of-the-box setups. For example, it is possible to monitor NGINX server logs via Filebeat, or to visualize system logs, with a single command. On a fresh installation of ElasticSearch, Kibana, and NGINX, a command like filebeat -e -setup -modules=nginx will monitor the NGINX logs, apply the appropriate parsing patterns, index the data into ElasticSearch, and create searches, visualizations, and dashboards in Kibana tailored specifically for NGINX. Changing the module to, for example, -modules=system creates a similar out-of-the-box pipeline for various system logs and files. The out-of-the-box experience can hardly get better.
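For reference, a minimal sketch of those commands, assuming Filebeat 5.3 running on the same host as ElasticSearch and Kibana, both on their default ports:

```sh
# Ship and parse NGINX access and error logs, and load the matching
# Kibana searches, visualizations and dashboards (local defaults assumed).
filebeat -e -setup -modules=nginx

# The same one-liner for system logs (syslog, authentication logs, ...).
filebeat -e -setup -modules=system
```

Depending on the module, a couple of ingest plugins (for example ingest-geoip for the NGINX module) may also need to be installed on the ElasticSearch nodes.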

At Anchormen, we have been using ElasticSearch and Kibana in one of our platforms under development, in which various events are indexed in real time into ElasticSearch and visualized in Kibana. Other services in the pipeline include Node.js, Kafka, and Spark Streaming. In a matter of minutes, we set up Filebeat and Metricbeat to experiment with their various modules, and we now have out-of-the-box monitoring and visualization of our nodes' system logs and metrics (using the system modules of both Filebeat and Metricbeat) as well as various Kafka metrics (using the Kafka module in Metricbeat). The following is a screenshot of one of the system-metrics visualizations, CPU usage per process, clearly showing CPU under-utilization, with Java dominating the process chart.

[Screenshot: Kibana visualization of CPU usage per process]

Several modules are supported by Filebeat and Metricbeat, including system, NGINX, MySQL, and Apache for the former, and Kafka, ZooKeeper, and system for the latter. Each module basically knows which kinds of statistics to expect from its target, and which kinds of visualizations are most suitable for those statistics and aggregations. Most of the features mentioned in this section are available as of the 5.3.0 release of Beats.

4- Use Cases

While all Elastic components are designed to integrate well with each other, they are also designed to integrate with external systems wherever possible. Beats output, for example, can be shipped to Kafka, and ElasticSearch has a well-optimized connector for Hadoop. Uber, Tinder, and Blizzard spoke about their experiences with ElasticSearch and how they solved their problems and achieved additional optimizations by efficiently applying one or more Elastic components in their stacks. Elastic is very commonly deployed in Lambda and real-time architectures, and has other common use cases such as monitoring Hadoop. A very common use case in the opposite direction is to use HDFS as a backup store for ElasticSearch, which can be achieved via ElasticSearch's snapshot API.
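As a hedged sketch of that backup path – assuming a local ElasticSearch node with the repository-hdfs plugin installed, and made-up HDFS addresses and repository names – registering an HDFS repository and taking a snapshot via the snapshot API could look like this:

```scala
import java.net.{HttpURLConnection, URL}

object HdfsSnapshotSketch {
  // Assumed local ElasticSearch endpoint; adjust to your cluster.
  val es = "http://localhost:9200"

  // Minimal helper that PUTs a JSON body to an ElasticSearch endpoint.
  def put(path: String, json: String): Int = {
    val conn = new URL(es + path).openConnection().asInstanceOf[HttpURLConnection]
    conn.setRequestMethod("PUT")
    conn.setRequestProperty("Content-Type", "application/json")
    conn.setDoOutput(true)
    val out = conn.getOutputStream
    out.write(json.getBytes("UTF-8"))
    out.close()
    conn.getResponseCode
  }

  def main(args: Array[String]): Unit = {
    // Register an HDFS snapshot repository (requires the repository-hdfs plugin).
    put("/_snapshot/hdfs_backup",
      """{"type":"hdfs","settings":{"uri":"hdfs://namenode:8020","path":"/backups/elasticsearch"}}""")

    // Take a snapshot of all indices into that repository.
    put("/_snapshot/hdfs_backup/snapshot_1?wait_for_completion=true", "{}")
  }
}
```

Restoring works through the same API (the _restore endpoint); in practice such snapshots would typically be scheduled rather than triggered ad hoc.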

5- Machine Learning

The last topic I'd like to cover is the machine learning features added to ElasticSearch. Elastic has acquired Prelert, bringing in seven years of work on behavioural analytics and anomaly detection. The new module analyses log data residing in ElasticSearch, finds anomalies within the data, links them together, and creates visualizations of the findings. The stated direction is indeed to extend that module with further machine learning features. With the Elastic 5.4 release, machine learning support will be provided natively: through the Kibana UI, it is possible to create a machine learning job (a unit of work that performs some analysis), specify its parameters, and wait for the results and visualizations.

Together with my colleague Corne Versloot, I have been using ElasticSearch "purely" as a form of recommendation engine by utilizing multi-term queries and their scoring model; the resulting query scores were used as criteria for real-time recommendations. I was glad to see another aspect of machine learning provided natively on top of ElasticSearch, with a direction set for yet more to come.

This year's Elastic{ON} conference was a great showcase of the maturity, current state, and challenges of the Elastic products. All the video sessions are accessible via this link.
