All Machine Data is Security Data
by Dave Herrald, on Mar 6, 2013 4:02:30 PM
Over the last month I've had a really cool opportunity to do a deep dive using Splunk to deliver IT security intelligence. Splunk is much more than a security tool. In fact, it's a platform for collecting, analyzing, and reporting on all types of machine-generated data. Splunk has many use cases, including IT operational intelligence, application management, web analytics, and even business intelligence, just to name a few. Keeping Splunk's larger set of capabilities in mind, in this post I'll focus on why Splunk is uniquely suited to deriving security intelligence.
So what makes Splunk such a great security tool? The answer lies in the fact that all machine data is security relevant (thus the title of this post). This is common knowledge among security professionals and, as a result, centralized log management has long been established as both a best practice and a compliance requirement. The problem has been that although security professionals know machine data is the source of record for what's actually happening in their IT systems, they've not had the capability to search, analyze, and alert on this data in a scalable and performant way. Enter Splunk. With it, security teams can not only store all machine-generated data, but also scale the system in a predictable way, search and alert in real time, correlate disparate events, and perform analytics, including predictive analysis.
Splunk offers capabilities not seen before in other log analysis products. Certainly syslog and a number of commercial log collection and security information and event management (SIEM) appliances have been available for many years. Unfortunately, these technologies often suffer from one or more major shortcomings. First off, existing tools generally do not scale well, especially in comparison to the rate at which machine data is growing. Often the lack of scalability is due to a reliance on an underlying relational database. Scaling databases is complicated, and licensing costs can drive up the overall price tag of the solution. Another problem with solutions built on relational databases is that they enforce a rigid schema, requiring detailed knowledge of the data before it can be stored. This forces a decision early on about what data must be saved and what can be discarded. Even worse, some practitioners may decide to simply not store machine data from systems for which there is no “connector”, which can leave organizations with a serious security blind spot. These shortcomings combine to produce a general underuse of the rich insight held in machine data. Indeed, I have personally seen many organizations rely on traditional log management and SIEM solutions only to later learn that they failed to recognize clear signs of attack and indicators of compromise in their own logs.
Contrast this with Splunk, which can index any machine-generated data whether it's structured, semi-structured, or completely unstructured. The machine data is stored in its raw form in full fidelity without the need for an underlying relational database. Splunk applies schema on the fly at search time, so it does not need to know anything about the data in order to index it and make it available for analysis. Another way to think about this is that with Splunk the requirement to know anything at all about the machine data is deferred until search time. In fact, learning about the structure of machine data is actually much easier after it’s been indexed because Splunk’s powerful built-in searching and analytics features can be leveraged. As one learns about the machine data, useful fields can be extracted, searches can be refined and saved, real-time alerts can be configured, and all this can be made available to other members of the team. With Splunk there is no need to re-invent the wheel. If someone else has done this work already, anyone can leverage it by downloading and installing “apps” and “add-ons” created by Splunk, IT vendors, or any members of the Splunk community.
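To make the schema-at-search-time idea concrete, here is a minimal Python sketch. It is not how Splunk is implemented; the sample log lines, the SSH_LOGIN pattern, and the search() helper are all invented for illustration. The point is only that the raw events are kept untouched and a field extraction is applied when someone asks a question.

```python
import re

# Raw events are stored verbatim; nothing is parsed when they are collected.
# These sample lines are made up for illustration.
RAW_EVENTS = [
    "Mar  6 16:02:30 web01 sshd[411]: Failed password for root from 203.0.113.7 port 52114 ssh2",
    "Mar  6 16:02:31 web01 sshd[411]: Accepted password for alice from 198.51.100.4 port 40022 ssh2",
    "2013-03-06T16:02:32Z proxy01 GET http://example.test/index.html 200",
]

# A hypothetical field extraction, written only once an analyst needs it.
SSH_LOGIN = re.compile(
    r"(?P<action>Failed|Accepted) password for (?P<user>\S+) from (?P<src_ip>\S+)"
)


def search(events, pattern, **filters):
    """Apply the extraction at search time and keep events whose fields match."""
    results = []
    for raw in events:
        match = pattern.search(raw)
        if match is None:
            continue  # this event simply has no such fields; it is still stored
        fields = match.groupdict()
        if all(fields.get(name) == value for name, value in filters.items()):
            results.append({"_raw": raw, **fields})
    return results


failed = search(RAW_EVENTS, SSH_LOGIN, action="Failed")
for event in failed:
    print(event["user"], event["src_ip"])
```

Notice that the proxy line was stored even though no extraction existed for it. A different pattern can be written for it later without re-collecting anything.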
Before we get back to security intelligence, I want to say a few things about Splunk and big data. Big data is an umbrella term encompassing a large and growing number of tools, techniques, and products to help business, IT, and security keep up with and derive value from the incredible volumes of unstructured data being generated today. Splunk's architecture is firmly based on big data concepts like distributed storage, horizontal scaling using commodity hardware, and distributed search using the MapReduce paradigm. This allows organizations to scale their Splunk deployment to ensure they can continue to leverage machine data in real time and perform long-term searches with acceptable response times even as machine data volumes increase dramatically. Splunk users get the power of big data in a very easy-to-consume package. In a matter of thirty minutes, Splunk can be downloaded, installed, and put to work indexing data and performing useful searches, which is not something that can be said about many other big data tools.
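For readers new to the MapReduce paradigm, the following Python sketch shows the general shape of a distributed search. It is a toy illustration and not Splunk's actual search pipeline; the SHARDS data and the map_shard() and reduce_counts() functions are hypothetical. Each shard stands in for the events held by one indexer, which computes a partial result locally before the partial results are merged.

```python
from collections import Counter
from functools import reduce

# Hypothetical shards: each list stands in for events held by one indexer.
SHARDS = [
    ["src=10.0.0.5 action=deny", "src=10.0.0.9 action=allow"],
    ["src=10.0.0.5 action=deny", "src=10.0.0.5 action=allow"],
    ["src=10.0.0.9 action=deny"],
]


def map_shard(events):
    """Map step: count denied events per source using only local data."""
    counts = Counter()
    for event in events:
        fields = dict(pair.split("=", 1) for pair in event.split())
        if fields.get("action") == "deny":
            counts[fields["src"]] += 1
    return counts


def reduce_counts(left, right):
    """Reduce step: merge the partial counts from two indexers."""
    return left + right


# In a real deployment the map step would run in parallel on separate machines.
partials = [map_shard(shard) for shard in SHARDS]
denies_by_src = reduce(reduce_counts, partials, Counter())
print(dict(denies_by_src))
```

Because each shard is processed independently, adding more machines adds more capacity, which is the horizontal scaling described above.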
So how does this all tie back to security intelligence? First, with full access to all the machine data, the security analyst can begin exploring correlations between events. The possibilities are limited only by the creativity of the analyst. Perhaps she wants to correlate abnormal system logins with high traffic patterns between the system in question and a particular internet IP address. Perhaps a malware infection on a PC can be correlated to a recent visit to a URL with a questionable reputation in the web proxy. Once an incident has been declared, we can quickly search for other systems displaying the same indicators of compromise. After eradication of an intruder, we can monitor in real time for evidence of his or her return. The possibilities for increased security intelligence based on full visibility of machine data really are endless.
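As a rough illustration of the first correlation mentioned above, here is a small Python sketch that joins abnormal logins with large outbound transfers from the same host shortly afterward. Everything in it is assumed for the example: the event records, the "abnormal" flag, the 30-minute window, and the byte threshold are invented, and in practice this kind of question would be expressed as a Splunk search rather than a script.

```python
from datetime import datetime, timedelta

# Hypothetical, already-extracted events from two different data sources.
LOGINS = [
    {"time": datetime(2013, 3, 6, 2, 14), "host": "db01", "user": "svc_backup", "abnormal": True},
    {"time": datetime(2013, 3, 6, 9, 5), "host": "web01", "user": "alice", "abnormal": False},
]
FLOWS = [
    {"time": datetime(2013, 3, 6, 2, 20), "host": "db01", "dest_ip": "203.0.113.50", "bytes": 750_000_000},
    {"time": datetime(2013, 3, 6, 9, 10), "host": "web01", "dest_ip": "198.51.100.8", "bytes": 12_000},
    {"time": datetime(2013, 3, 6, 14, 0), "host": "db01", "dest_ip": "203.0.113.50", "bytes": 900_000_000},
]


def correlate(logins, flows, window=timedelta(minutes=30), min_bytes=100_000_000):
    """Pair each abnormal login with large flows from the same host soon after."""
    hits = []
    for login in logins:
        if not login["abnormal"]:
            continue
        for flow in flows:
            same_host = flow["host"] == login["host"]
            delay = flow["time"] - login["time"]
            in_window = timedelta(0) <= delay <= window
            if same_host and in_window and flow["bytes"] >= min_bytes:
                hits.append((login["host"], login["user"], flow["dest_ip"]))
    return hits


suspicious = correlate(LOGINS, FLOWS)
for host, user, dest_ip in suspicious:
    print(f"{host}: abnormal login by {user} followed by large transfer to {dest_ip}")
```

The window and threshold are the knobs an analyst would tune. The later transfer from the same host falls outside the window and is not flagged here, which shows why those choices matter.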
In my next post in this series I will talk about the Splunk App for Enterprise Security which turns a Splunk Enterprise deployment into a market-leading SIEM.