Towards an Open Format for Scalable System Telemetry

01/25/2021
by Teryl Taylor, et al.

This paper presents a data representation for system behavior telemetry that supports scalable big data security analytics, giving telemetry consumers comprehensive visibility into workloads at reduced storage and processing overheads. The new abstraction, SysFlow, is a compact open data format that lifts the representation of system activities into a flow-centric, object-relational mapping. It records how applications interact with their environment, relating processes to file accesses, network activities, and runtime information. The format supports both single-event and volumetric flow representations of process control flows, file interactions, and network communications. Evaluation on enterprise-grade benchmarks shows that SysFlow facilitates deeper introspection into attack kill chains while yielding traces orders of magnitude smaller than current state-of-the-art system telemetry approaches. This drastically reduces storage requirements and enables feature-rich system analytics, process-level provenance tracking, and long-term data archival for cyber threat discovery and forensic analysis on historical data.
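
To make the distinction between single events and volumetric flows concrete, the following is a minimal sketch of the idea, not the actual SysFlow schema. All names here (`IOEvent`, `Flow`, `aggregate_flows`, and their fields) are hypothetical and chosen only for illustration. It assumes that a flow is keyed by a process and the resource it touches, and that a flow summarizes many low-level read and write events as counters.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IOEvent:
    """One low-level I/O event (hypothetical record, not the SysFlow schema)."""
    pid: int        # process that performed the operation
    resource: str   # file path or "ip:port" endpoint
    op: str         # "read" or "write"
    nbytes: int     # bytes transferred by this operation
    ts: int         # timestamp of the operation


@dataclass
class Flow:
    """Volumetric summary of all events between one process and one resource."""
    pid: int
    resource: str
    start_ts: int
    end_ts: int
    read_ops: int = 0
    write_ops: int = 0
    bytes_read: int = 0
    bytes_written: int = 0


def aggregate_flows(events):
    """Collapse per-operation events into one Flow per (pid, resource) pair."""
    flows = {}
    for ev in events:
        key = (ev.pid, ev.resource)
        flow = flows.get(key)
        if flow is None:
            flow = flows[key] = Flow(ev.pid, ev.resource, ev.ts, ev.ts)
        # Widen the flow's time window to cover this event.
        flow.start_ts = min(flow.start_ts, ev.ts)
        flow.end_ts = max(flow.end_ts, ev.ts)
        if ev.op == "read":
            flow.read_ops += 1
            flow.bytes_read += ev.nbytes
        elif ev.op == "write":
            flow.write_ops += 1
            flow.bytes_written += ev.nbytes
        else:
            raise ValueError(f"unsupported operation: {ev.op!r}")
    return list(flows.values())
```

Under these assumptions, the number of stored records grows with the number of distinct process-resource pairs rather than with the number of individual operations, which is the intuition behind the trace-size reduction described above.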

