Hyperscaling Internet Graph Analysis with D4M on the MIT SuperCloud

08/25/2018
by Vijay Gadepally, et al.

Detecting anomalous behavior in network traffic is a major challenge because of the volume and velocity of that traffic. For example, a 10 Gigabit Ethernet connection can generate over 50 MB/s of packet headers. For global network providers, this challenge can be amplified by many orders of magnitude. Developing novel computer network traffic analytics requires high-level programming environments, massive amounts of packet capture (PCAP) data, and diverse data products for "at scale" algorithm pipeline development. D4M (Dynamic Distributed Dimensional Data Model) combines the power of sparse linear algebra, associative arrays, parallel processing, and distributed databases (such as SciDB and Apache Accumulo) to provide a scalable data and computation system that addresses the big data problems associated with network analytics development. Combining D4M with the MIT SuperCloud manycore processors and parallel storage system enables network analysts to interactively process massive amounts of data in minutes. To demonstrate these capabilities, we have implemented a representative analytics pipeline in D4M and benchmarked it on 96 hours of Gigabit PCAP data with MIT SuperCloud. The entire pipeline, from uncompressing the raw files to database ingest, was implemented in 135 lines of D4M code and achieved speedups of over 20,000.
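The idea of treating packet headers as a sparse, string-keyed adjacency structure can be illustrated with a minimal sketch. This is plain Python, not D4M code and not the authors' pipeline; the function names (`build_adjacency`, `out_degree`, `in_degree`) and the sample addresses are hypothetical, chosen only for illustration. A nested dictionary stands in for an associative array whose rows are source addresses and whose columns are destination addresses.

```python
from collections import defaultdict


def build_adjacency(packets):
    """Count packets per (source, destination) pair in a sparse nested dict.

    Only pairs that actually occur are stored, which mirrors how a sparse
    associative array keeps nonzero entries only.
    """
    adj = defaultdict(lambda: defaultdict(int))
    for src, dst in packets:
        adj[src][dst] += 1
    return adj


def out_degree(adj):
    """Number of distinct destinations contacted by each source."""
    return {src: len(dsts) for src, dsts in adj.items()}


def in_degree(adj):
    """Number of distinct sources that contacted each destination."""
    counts = defaultdict(int)
    for dsts in adj.values():
        for dst in dsts:
            counts[dst] += 1
    return dict(counts)


# Hypothetical (source, destination) pairs, as might be parsed from headers.
packets = [
    ("10.0.0.1", "10.0.0.2"),
    ("10.0.0.1", "10.0.0.3"),
    ("10.0.0.2", "10.0.0.3"),
    ("10.0.0.1", "10.0.0.2"),
]

adj = build_adjacency(packets)
fan_out = out_degree(adj)
fan_in = in_degree(adj)
```

Degree counts like these are a common starting point for spotting unusual fan-out or fan-in in traffic graphs. A real system at the scale described above would rely on sparse linear algebra and a distributed database rather than in-memory dictionaries.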

research
09/24/2016

Benchmarking SciDB Data Import on HPC Systems

SciDB is a scalable, computational database management system that uses ...
research
05/25/2021

FENXI: Deep-learning Traffic Analytics at the Edge

Live traffic analysis at the first aggregation point in the ISP network ...
research
04/24/2018

Automated Big Traffic Analytics for Cyber Security

Network traffic analytics technology is a cornerstone for cyber security...
research
02/08/2018

System G Distributed Graph Database

Motivated by the need to extract knowledge and value from interconnected...
research
08/14/2016

Julia Implementation of the Dynamic Distributed Dimensional Data Model

Julia is a new language for writing data analysis programs that are easy...
research
08/31/2018

Scalable Manifold Learning for Big Data with Apache Spark

Non-linear spectral dimensionality reduction methods, such as Isomap, re...