Real-time visualisation of Hadoop resources

At CERN we run multiple Hadoop clusters to satisfy demanding requirements from our experiments and accelerator communities. The usage and criticality of the clusters are increasing dramatically as more users are looking at Hadoop to process and archive the vast amounts of data coming out of LHC.

Sometimes, we as Hadoop administrators are faced with questions like 1) Do we have enough capacity (cpu & memory) to add new workloads/users 2) Why are some user applications getting killed? 3) When is preemption kicking in? and 4) Do we have sufficient resources to satisfy a particular user community’s SLA under peak conditions? Having this information available as in ‘viewing after the fact’ would greatly help in right-sizing the cluster, capacity planning, workload patterns and resource consumption.

Hadoop Metrics

YARN, the resource manager for Hadoop 2.0 has REST API’s that allow us to get information about the cluster such as status of the cluster, information about nodes in the cluster, metrics of the cluster, scheduler information and information about applications on the cluster. This information if we can collect over a continuous time interval would answer most of our questions and possibly provide more insights into usage of Hadoop clusters. Since we want this monitoring solution to be flexible, extensible and zero-configuration to the Hadoop cluster, we have decided to use logstash – to collect, parse and transform the metrics, elasticsearch – to store the metrics and kibana – for data visualisation.

The rest of the post explains how we built the solution and will enable you to implement the same if you are faced with similar challenges

Collect, Parse and Transform

We use logstash to collect, process and forward events. We collect the metrics from cluster, nodes, applications and scheduler API, the polling interval can be amended as per needs

logstash_input

The following transformation and enrichment is done on the events that come out of REST API

– split the JSON array
– remove the outer elements of JSON
– data type conversion
– data transformation (MB -> bytes) so that kibana can format the number as bytes

logstash_filter

Finally we forward the events to elasticsearch for indexing.

Store

The events are stored in elasticsearch and events from each endpoint is stored in separate index. Each index has mapping type to define format, datatype and string fields that should be not_analyzed. Following is an example of an index mapping


curl -XPUT myelastic:9200/apps-d/_mapping/apps -d '
{
"properties": {
"[apps][app][startedTime]": {
"type": "date",
"format": "strict_date_optional_time||epoch_millis"
},
"[apps][app][finishedTime]": {
"type": "date",
"format": "strict_date_optional_time||epoch_millis"
},
"[apps][app][queue]": {
"type": "string"
},
"[apps][app][amContainerLogs]": {
"type": "string",
"index": "not_analyzed"
},
"[apps][app][name]": {
"type": "string",
"index": "not_analyzed"
}
}
}'

Visualise

This is the interesting part where you can build visualisations on top of the data you index to elasticsearch. You can build your own or load the dashboards and visualisations from the github repository as below


curl -XPOST 'http://myelastic:9200/.kibana/visualization/MyFirstVisual' -d @filename

They look as below

kibana_visual4kibana_visual3

and you can identify some interesting stuff

kibana_visual5

All the code is available in the github repository. By changing the Hadoop namenode and elasticsearch endpoint you should be able to get this up and running at your organisation. If you are interested, please create pull request to contribute and collaborate on this work.

Summary

The metrics exposed through YARN and SPARK REST APIs can be successfully exploited to provide more information to Hadoop administrators and management to aid in capacity planning and right-sizing the Hadoop Cluster.

Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s