Apache Metron Proposal


FINAL

This proposal is now complete and has been submitted for a VOTE.


Abstract

The Metron project is an open source project dedicated to providing an extensible and scalable advanced security analytics tool. It has strong foundations in the Apache Hadoop ecosystem.

Proposal

Metron integrates a variety of open source big data technologies in order to offer a centralized tool for security monitoring and analysis. Metron provides capabilities for log aggregation, full packet capture indexing, storage, advanced behavioral analytics and data enrichment, while applying the most current threat-intelligence information to security telemetry within a single platform.

Metron can be divided into 4 areas:

  1. A mechanism to capture, store, and normalize any type of security telemetry at extremely high rates. Because security telemetry is constantly being generated, it requires a method for ingesting the data at high speeds and pushing it to various processing units for advanced computation and analytics.
  2. Real-time processing and application of enrichments such as threat intelligence, geolocation, and DNS information to the telemetry being collected. The immediate application of this information to incoming telemetry provides context and situational awareness, as well as the “who” and “where” information that is critical for investigation.
  3. Efficient information storage based on how the information will be used:
    1. Logs and telemetry are stored such that they can be efficiently mined and analyzed for concise security visibility
    2. The ability to extract and reconstruct full packets helps an analyst answer questions such as who the true attacker was, what data was leaked, and where that data was sent
    3. Long-term storage not only increases visibility over time, but also enables advanced analytics such as machine learning techniques to be used to create models on the information. Incoming data can then be scored against these stored models for advanced anomaly detection.
  4. An interface that gives a security investigator a centralized view of data and alerts passed through the system. Metron’s interface presents alert summaries with threat intelligence and enrichment data specific to that alert on one single page. Furthermore, advanced search capabilities and full packet extraction tools are presented to the analyst for investigation without the need to pivot into additional tools.
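The enrichment step in area 2 can be sketched in a few lines. This is only an illustration under stated assumptions: the `Enricher` class, its field names, and the in-memory lookup tables are hypothetical stand-ins for a threat-intelligence feed and a geolocation database, not part of the OpenSOC or Metron code base. The IP addresses come from the reserved documentation ranges and the country codes are placeholders.

```python
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass
class Enricher:
    """Toy in-memory enrichment stage (hypothetical; not Metron's API)."""

    # Stand-in for a third-party threat-intelligence feed.
    threat_ips: Set[str] = field(default_factory=set)
    # Stand-in for a geolocation database keyed by IP address.
    geo_by_ip: Dict[str, str] = field(default_factory=dict)

    def enrich(self, message: Dict[str, str]) -> Dict[str, object]:
        """Return a copy of the message with 'who' and 'where' context attached."""
        enriched: Dict[str, object] = dict(message)
        src = message.get("src_ip", "")
        enriched["geo"] = self.geo_by_ip.get(src, "unknown")
        enriched["is_threat"] = src in self.threat_ips
        return enriched


enricher = Enricher(
    threat_ips={"203.0.113.7"},
    geo_by_ip={"203.0.113.7": "ZZ", "198.51.100.4": "YY"},
)
alert = enricher.enrich({"src_ip": "203.0.113.7", "dst_ip": "192.0.2.10"})
benign = enricher.enrich({"src_ip": "198.51.100.4", "dst_ip": "192.0.2.10"})
```

In a real deployment the lookups would run inside a stream processor against external stores rather than Python dictionaries; the sketch only shows the shape of the data before and after enrichment.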

Big data is a natural fit for powerful security analytics. The Metron framework integrates a number of elements from the Hadoop ecosystem to provide a scalable platform for security analytics, incorporating such functionality as full-packet capture, stream processing, batch processing, real-time search, and telemetry aggregation. With Metron, our goal is to tie big data into security analytics and drive towards an extensible centralized platform to effectively enable rapid detection and rapid response for advanced security threats.

Background

OpenSOC was developed by Cisco over the last two years and pushed out to GitHub (https://github.com/OpenSOC/opensoc) under the ALv2. However, the development was mostly closed and has largely stopped. As evidence of the inactivity, users have complained that pull requests go unanswered for long periods: https://groups.google.com/d/msg/opensoc-support/R2W-ZFux8Vk/Y-5tL-EmAAAJ. Finally, no public releases of OpenSOC have been made. From an Apache point of view, the current community is not viable.

However, some of the developers of the project have left Cisco and have found interest from several others who would like to work together to form an active and open community at Apache, starting from the current OpenSOC code base. A message to the current support group proposing a move to Apache got a single positive response: https://groups.google.com/d/msg/opensoc-support/rFlW2uSSvmU/09PIsWL4AAAJ

In general Apache accepts only voluntary contributions and avoids hostile forks. In this case, given that the community is demonstrably dead, it seems fair to fork the existing code at Apache to allow a new community to work on it. Once incubation starts, we will send a message pointing to the new home to the OpenSOC support group.

Because Cisco is not currently interested in being involved, the project expects to change its name. The project would like to use Metron, although we will perform a podling name search to check for conflicts. Metron, meaning measure, is half of the Greek root for the word 'telemetry.' Metron is also a DC Comics character who “... wanders in search of greater knowledge beyond his own”.

Rationale

Metron strives to move the state of the art in security analytics forward. We want to move away from the proprietary nature of legacy security point tools and develop an open platform where people can contribute and share datasets, machine learning models, telemetry parsers, sources of telemetry enrichment, and threat intelligence feeds. Cyber security is too large a problem for a single corporation to tackle on its own, and the current tooling is too fragmented and proprietary for us to be able to rally around a single tool or vendor.

In addition to being open and facilitating advancement in security analytics, Metron has several advantages over a conventional Security Information and Event Management (SIEM) system.

  • Metron uses an all-open-source stack under the hood and runs on commodity hardware. This means Metron is much cheaper to run than the competition. In security, cost plays a major role because the cost of a countermeasure for monitoring and reacting to a threat should not exceed the cost of what is being protected. Driving down the cost of security makes it economical to monitor more assets, which means more secure data centers.
  • Metron, being in the open, allows additional vetting and scrutiny by the open source community for all of its components. This is a better model for a security-oriented tool than doing it closed source. All the problems should be flushed out and fixed in the open. The closed source competition does not have this kind of rigor, is motivated by marketing and sales, and thus, does not inspire confidence when it comes to security.
  • Being Hadoop-based, Metron can process unprecedented volumes of streaming data via Apache Storm. When an organization is hit with malware or malicious behavior, it is most commonly part of a global malware campaign whose signatures are known and available from third-party threat intelligence feeds. Having the ability to take in all the feeds and reference them against every telemetry message processed by Metron in real time not only facilitates detection of such campaigns, it also changes the economics for the “bad guys”. If they have to customize their malware for each target, these global attacks become far more expensive and no longer viable for them.
  • Metron strives to shift conventional SOC workflows away from being rules-driven to a more data-driven approach that incorporates machine learning and a higher degree of automation and autonomous detection. The modern threat landscape is too dynamic to be manageable via static rules alone, which is what conventional SIEMs rely on. Rule bases tend to bloat, and if improperly maintained turn themselves into sources of false positive alerts.
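The contrast between static rules and scoring incoming data against a stored model can be illustrated with a deliberately simple stand-in. The sketch below fits a mean and standard deviation to historical values and scores new values by their distance from that baseline (a z-score). The z-score is swapped in here purely for illustration and is not Metron's detection method; the function names and the sample numbers are assumptions.

```python
import statistics
from typing import List, Tuple


def fit_baseline(history: List[float]) -> Tuple[float, float]:
    """Build a minimal 'model' from data at rest: mean and population std dev."""
    return statistics.fmean(history), statistics.pstdev(history)


def anomaly_score(value: float, baseline: Tuple[float, float]) -> float:
    """Score an incoming value by how many std devs it lies from the mean."""
    mean, stdev = baseline
    if stdev == 0:
        # A flat history gives no spread: any deviation is maximally anomalous.
        return 0.0 if value == mean else float("inf")
    return abs(value - mean) / stdev


# Hypothetical history, e.g. bytes sent per minute by one host.
baseline = fit_baseline([100.0, 110.0, 90.0, 105.0, 95.0])
typical = anomaly_score(100.0, baseline)
unusual = anomaly_score(200.0, baseline)
```

Unlike a fixed threshold written into a rule, the baseline here is recomputed from stored data, so what counts as unusual follows the observed behavior of each host.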

The ability to analyze and model large volumes of data at rest and then being able to push up the output of that into a stream processor is essential in disrupting the

Current Status

As stated in the background section, the current community isn’t healthy, which is why we are proposing moving to Apache Incubator. In this section, we will describe the current state of the OpenSOC project.

Meritocracy

The OpenSOC development is controlled by Cisco and pull requests are being ignored. The development list is private and requests to join are rejected because there is no activity on it. The goal of moving to Apache is to form a meritocracy where a variety of individuals, regardless of their current employer, come together and work together. We understand that diversity, open development, and open governance are critical to being a successful Apache project.

Community

The OpenSOC project is not responding to pull requests or making releases. The easiest solution would be to create a variety of forks of the project on GitHub, but that would further fracture the community and prevent it from reaching critical mass. Our preferred solution is to build a single large, diverse, and open community at Apache.

Core Developers

The core developers of Metron are James Sirota, Charles Porter, and Mark Bittmann. None of them have experience running an open source project, but they are eager to learn.

Alignment

The ASF is a natural host for Metron given that it is already the home of Hadoop, HBase, Hive, Storm, Kafka, Spark and other emerging big data projects. Metron leverages many of Apache's open-source products. We are very interested in a place to develop our community and our integrations with the other Apache big data projects.

Known Risks

Orphaned Products

The current product developers are all salaried developers at a small number of companies and thus there is a risk of becoming an orphaned product. However, the companies view Metron as very important to their product offering and plan to ramp up their work in the space. The project is unique in the product space and thus has strong potential to become a sustainable community.

Inexperience with Open Source

The vast majority of the developers are inexperienced with open source development and the Apache Way. One of the major hurdles to graduation from the Apache Incubator will be demonstrating that they have learned the Apache Way and are applying it to how the project is managed. Vinod Kumar Vavilapalli is an Apache Member and plans on actively working as a committer in the project. The developers also have the other mentors to help them learn as they progress.

Homogenous Developers

The developers are employed by four diverse companies (B23, Hortonworks, Mantech, and Rackspace). They are distributed across the United States. We hope to attract additional diversity as an Apache project.

Reliance on Salaried Developers

Metron is currently being developed exclusively by salaried developers, but the goal of coming to Apache is to form a much more diverse community of users and developers, including non-salaried developers.

Relationships with Other Apache Products

Metron has a strong relationship with, and dependency on, Apache Flume, Hadoop, HBase, Hive, Kafka, Spark, and Storm. Being part of Apache’s Incubation community could help foster closer collaboration among these projects as well as others.

We note that although there is a superficial resemblance to Apache Eagle, which does security analysis of Hadoop audit events, the projects are significantly different. In particular, Metron is focused on analyzing network packet traffic and thus has a very different scope and scale of events than Eagle.

An Excessive Fascination with the Apache Brand

While the Apache brand is important, we are much more interested in finding a home for the project that encourages open development and open governance. We want to form the new community using the Apache Way with its strong focus on meritocracy, organizational independence, and open development.

Documentation

The current information on the OpenSOC project is here: http://opensoc.github.io/
A slide deck presenting background material is here: http://www.slideshare.net/JamesSirota/cisco-opensoc

Initial Source

The initial code is on GitHub: http://opensoc.github.io/

External Dependencies

Metron has the following external dependencies:

  • Apache Flume
  • Apache Hadoop
  • Apache HBase
  • Apache Hive
  • Apache Kafka
  • Apache Spark
  • Apache Storm
  • ElasticSearch
  • MySQL

The project understands that it will need to support alternatives to MySQL that are licensed under an ALv2-compatible license.

Cryptography

Metron will eventually support encryption on the wire, but this is not one of the initial goals, and we do not expect Metron to be a controlled export item due to the use of encryption. Metron supports but does not require the Kerberos authentication mechanism to access secured Hadoop services.

Required Resources

Mailing List

  • metron-private for private PMC discussions
  • metron-dev for developers
  • metron-commits for all commits
  • metron-users for all users

Version Control

Git is the preferred source control system.

Issue Tracking

  • JIRA (METRON)

Other Resources

The existing code already has unit tests so we will make use of existing Apache continuous testing infrastructure. The resulting load should not be very large.

Initial Committers

  • Jim Baker < jim.baker at rackspace dot com >
  • Mark Bittmann < mark at b23 dot io >
  • Sheetal Dolas < sheetal at hortonworks dot com >
  • Discovery Gerdes < discovery.gerdes at rackspace dot com >
  • P. Taylor Goetz < ptgoetz at apache dot org >
  • Andrew Hartnett < andrew.hartnett at rackspace dot com >
  • Dave Hirko < dave at b23 dot io >
  • Paul Kehrer < paul.kehrer at rackspace dot com >
  • Brad Kolarov < brad at b23 dot io >
  • Kiran Komaravolu < kkomaravolu at hortonworks dot com >
  • Larry McCay < lmccay at apache dot org >
  • Ryan Merriman < rmerriman at hortonworks dot com >
  • Michael Perez < michael.perez at hortonworks dot com >
  • Charles Porter < Charles.Porter at mcs dot mantech dot com >
  • Phillip Rhodes < motley.crue.fan at gmail dot com >
  • Sean Schulte < sean.schulte at rackspace dot com >
  • James Sirota < jsirota at hortonworks dot com >
  • Casey Stella < cstella at hortonworks dot com >
  • Bryan Taylor < bryan.taylor at rackspace dot com >
  • Ray Urciuoli < Ray.Urciuoli at mcs dot mantech dot com >
  • Vinod Kumar Vavilapalli < vinodkv at apache dot org >
  • George Vetticaden < gvetticaden at hortonworks dot com >
  • Oskar Zabik < oskar.zabik at rackspace dot com >

Affiliations

The initial committers are employees of:

  • Jim Baker - Rackspace
  • Mark Bittmann - B23
  • Sheetal Dolas - Hortonworks
  • Discovery Gerdes - Rackspace
  • P. Taylor Goetz - Hortonworks
  • Andrew Hartnett - Rackspace
  • Dave Hirko - B23
  • Paul Kehrer - Rackspace
  • Brad Kolarov - B23
  • Kiran Komaravolu - Hortonworks
  • Larry McCay - Hortonworks
  • Ryan Merriman - Hortonworks
  • Michael Perez - Hortonworks
  • Charles Porter - Mantech
  • Phillip Rhodes - Fogbeam Labs
  • Sean Schulte - Rackspace
  • James Sirota - Hortonworks
  • Casey Stella - Hortonworks
  • Bryan Taylor - Rackspace
  • Ray Urciuoli - Mantech
  • Vinod Kumar Vavilapalli - Hortonworks
  • George Vetticaden - Hortonworks
  • Oskar Zabik - Rackspace

Sponsors

Champion

  • Owen O’Malley - Apache IPMC member

Nominated Mentors

  • P. Taylor Goetz < ptgoetz at apache dot org > - Apache IPMC member, Hortonworks
  • Chris Mattmann < mattmann at apache dot org > - Apache IPMC member, NASA
  • Owen O’Malley < omalley at apache dot org > - Apache IPMC member, Hortonworks
  • Billie Rinaldi < billie at apache dot org > - Apache IPMC member, Hortonworks
  • Vinod Kumar Vavilapalli < vinodkv at apache dot org > - Apache IPMC member, Hortonworks

Sponsoring Entity

We are requesting the Incubator to sponsor this project.

Addendum

After the vote on the proposal started, Debo Dutta (dedutta at cisco dot com) in the office of the Cloud CTO at Cisco has commented that his team at Cisco is very interested in joining the Metron community at Apache.
