Scalable hybrid stream and Hadoop network analysis system

Vernon K.C. Bumgardner, Victor W. Marek

Research output: Contribution to conference > Paper > peer-review

21 Scopus citations

Abstract

Collections of network traces have long been used in network traffic analysis. Flow analysis can be used for network anomaly discovery, intrusion detection, and, more generally, the discovery of actionable events on the network. The data collected during processing may also be used for prediction and avoidance of traffic congestion, network capacity planning, and the development of software-defined networking rules. As network flow rates increase and new network technologies are introduced on existing hardware platforms, many organizations find themselves either technically or financially unable to generate, collect, and/or analyze network flow data. The continued rapid growth of network trace data requires new methods of scalable data collection and analysis. We report on our deployment of a system, designed and implemented at the University of Kentucky, that supports analysis of network traffic across the enterprise. Our system addresses problems of scale in existing systems by using distributed computing methodologies, and is based on a combination of stream and batch processing techniques. In addition to collection, stream processing using Storm is utilized to enrich the data stream with ephemeral environment data. Enriched stream data is then used for event detection and near real-time flow analysis by an in-line complex event processor. Batch processing is performed by the Hadoop MapReduce framework on data stored in HBase BigTable storage. In benchmarks on our 10-node cluster, using actual network data, we were able to stream process over 315k flows/sec. In batch analysis, we were able to process over 2.6M flows/sec with a storage compression ratio of 6.7:1.
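
The hybrid design described in the abstract can be illustrated with a minimal, single-process sketch. It is not the authors' implementation and uses neither Storm, Hadoop, nor HBase. Plain Python generators stand in for the stream stages and an in-memory list stands in for stored flows. The `Flow` record, the `leases` lookup table, the burst rule, and all function names are hypothetical, chosen only to show the three roles: enrichment, in-line event detection, and batch aggregation.

```python
from collections import defaultdict, deque
from dataclasses import dataclass, field
from typing import Dict, Iterable, Iterator, Tuple


@dataclass
class Flow:
    """Hypothetical flow record; the field names are assumptions."""
    ts: float      # flow start time, in seconds
    src: str       # source IP address
    dst: str       # destination IP address
    nbytes: int    # bytes transferred
    tags: Dict[str, str] = field(default_factory=dict)


def enrich(flows: Iterable[Flow], leases: Dict[str, str]) -> Iterator[Flow]:
    """Stream step: attach ephemeral context (here, an assumed IP-to-host table)."""
    for f in flows:
        f.tags["src_host"] = leases.get(f.src, "unknown")
        yield f


def detect_bursts(flows: Iterable[Flow], window: float,
                  limit: int) -> Iterator[Tuple[str, float]]:
    """Stream step: emit (src, ts) when a source has more than `limit`
    flows within the last `window` seconds. A toy stand-in for a CEP rule."""
    recent: Dict[str, deque] = defaultdict(deque)
    for f in flows:
        q = recent[f.src]
        q.append(f.ts)
        while q and f.ts - q[0] > window:
            q.popleft()            # drop timestamps that left the window
        if len(q) > limit:
            yield (f.src, f.ts)


def bytes_per_source(stored: Iterable[Flow]) -> Dict[str, int]:
    """Batch step: map each flow to (src, nbytes), then reduce by summing per key."""
    totals: Dict[str, int] = defaultdict(int)
    for key, value in ((f.src, f.nbytes) for f in stored):  # map phase
        totals[key] += value                                  # reduce phase
    return dict(totals)


flows = [
    Flow(0.0, "10.0.0.1", "10.0.0.9", 500),
    Flow(0.2, "10.0.0.1", "10.0.0.8", 700),
    Flow(0.4, "10.0.0.1", "10.0.0.7", 300),
    Flow(5.0, "10.0.0.2", "10.0.0.9", 1000),
]
leases = {"10.0.0.1": "lab-pc-17"}

stored = list(enrich(flows, leases))   # stand-in for persisted, enriched flows
alerts = list(detect_bursts(stored, window=1.0, limit=2))
totals = bytes_per_source(stored)
```

In this toy run the third flow from `10.0.0.1` trips the burst rule, and the batch step sums bytes per source over the same enriched records. In a deployment like the one the abstract describes, the stream stages would run on a stream processor and the batch step would run as a distributed job over stored data.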

Original language: English
Pages: 219-224
Number of pages: 6
State: Published - 2014
Event: 5th ACM/SPEC International Conference on Performance Engineering, ICPE 2014 - Dublin, Ireland
Duration: Mar 22 2014 to Mar 26 2014

Conference

Conference: 5th ACM/SPEC International Conference on Performance Engineering, ICPE 2014
Country/Territory: Ireland
City: Dublin
Period: 3/22/14 to 3/26/14

Funding

Funder: National Science Foundation Arctic Social Science Program
Funder number: OCI-1246332

    Keywords

    • Complex event processing
    • Hadoop
    • NetFlow
    • SDN
    • Stream processing

    ASJC Scopus subject areas

    • Software
