Introduction to ELK

In the summer of 2018, I took an internship that allowed me to continue learning hands-on in the field of Digital Forensics while also exploring the work of a Security Operations Center (SOC) Analyst. For those who, like me at the time, are unaware of what this position entails, the Center for Internet Security (CIS) defines it as someone who helps “coordinate and report on cyber incidents impacting State, Local, Tribal and Territorial (SLTT) governments” — in other words, a position in which you monitor client endpoint logs for any suspicious or malicious activity. An analyst takes machine data and analyzes it for real-time insight into client networks and infrastructure.

The organization I worked for that summer had recently started taking on clients and was testing its ability to provide these SOC services to the public, which is why it brought me and three other students on board. To perform these tasks, we needed to implement and use “data shippers” to take the logs we wished to ingest from client machines and “ship” them to our own infrastructure to monitor and assess. The industry standard, Splunk, was an effective but costly way of handling this task. However, there was another system for ingesting and filtering logs that was becoming more and more popular: the Elastic Stack, otherwise known as ELK.

What is ELK?


Implementing an ELK stack allows an organization to take data from any endpoint, in any format, and catalog, search, and visualize that data in real-time. If you host it on your own resources, it is entirely free; Elastic also offers paid services that run on their resources and pre-designed stacks. The stack also supports SSL, which allows you to securely and reliably transfer data between your endpoints and the stack you have built. ELK is often referred to as a “stack” due to the architecture of the service, and it gets its name from the services that make it up: Elasticsearch, Logstash, and Kibana. Elastic’s website defines these services as follows:

Elasticsearch – As the heart of the Elastic Stack, it centrally stores your data so you can discover the expected and uncover the unexpected.

Logstash – An open source, server-side data processing pipeline that ingests data from a multitude of sources simultaneously, transforms it, and then sends it to your favorite “stash.”

Kibana – Lets you visualize your Elasticsearch data and navigate the Elastic Stack.

In short, you would install one or a combination of Elastic’s Beats family on your endpoints, which would then ship the information they gather to your Logstash node. There, the incoming data is “filtered” through pipeline configuration files on the Logstash node, which can be tailored to your specific needs. The filtered events are then sent on to your Elasticsearch node(s), where they are indexed into indices. The nodes in this cluster share data with each other in the form of “shards,” constantly shifting your indices around to prevent data loss and balance the load. You can run as many of these nodes as you want, depending on the volume of data you are ingesting and how redundant you want that data to be. Finally, your Kibana node queries the Elasticsearch cluster for the information you request. Using these queries, a user can peruse the logs in Kibana, create visualizations to show data over time, and build dashboards that combine multiple visualizations tailored to their specific needs.
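To make that flow a little more concrete, here is a minimal sketch of what a Logstash pipeline configuration might look like. The port, file path, filter pattern, host name, and index name are illustrative assumptions on my part, not values from any particular deployment:

    # Hypothetical pipeline file, e.g. /etc/logstash/conf.d/beats-pipeline.conf
    input {
      beats {
        port => 5044                        # Beats agents on client endpoints ship here
      }
    }

    filter {
      grok {
        # Example filter: parse standard web-server log lines into structured fields
        match => { "message" => "%{COMBINEDAPACHELOG}" }
      }
    }

    output {
      elasticsearch {
        hosts => ["es-node-1:9200"]         # your Elasticsearch node(s)
        index => "weblogs-%{+YYYY.MM.dd}"   # daily indices for these events
      }
    }

Each Beat on an endpoint would point its output at the Logstash host on that port, and Kibana would then be pointed at the Elasticsearch node(s) to search and visualize the resulting indices.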

Why Should You Care?

Splunk has long been the “go-to” service for log ingestion and analysis, and from what I have seen it has dominated the industry. They have the publicity, the power, and the experience to remain a serious competitor in this market for a long time, as well as being an outstanding service provider. Before taking this job, I had never even heard of the Elastic Stack, let alone the individual services that make it up. When it came to the command line in Ubuntu, I was an absolute potato, getting lost just on my way to the /etc directory. However, ELK was one of the easiest things I have ever learned from scratch, mostly thanks to the extensive documentation on Elastic’s official website. My experience with these services over the summer helped develop my command-line skills: understanding how the shell works and visualizing the directories in my mind as I navigate through them for configuration files and service installations. On top of the personal benefits, ELK can be free, which can bring significant financial savings to any organization currently paying for Splunk. Organizations have full control over the resources that go into each node and can customize the filtering to their specific needs. If you have the resources, the time, and the patience to implement ELK in your log ingestion system, I would highly recommend giving it a shot.

TL;DR

The Elastic Stack (ELK) is an incredible and highly customizable service for ingesting data from endpoints and turning it into real-time visualizations, and it can serve as a compelling alternative to the paid service Splunk. Elasticsearch, Logstash, and Kibana work seamlessly together to create an easy-to-learn experience that can offer many different benefits to its users. I am hoping to go into further detail about how to create an ELK stack in a later post.

Hopefully this has provided greater insight into the options available when it comes to log ingestion. If you have any questions or additional commentary, please feel free to comment below or contact me!
