Learning state machines via efficient hashing of future traces

07/04/2022
by Robert Baumgartner et al.

State machines are popular models for representing and visualizing discrete systems such as software systems, and for representing regular grammars. Most algorithms that passively learn state machines from data assume that all data is available from the start and load it into memory. This makes them hard to apply to continuously streaming data and leads to large memory requirements on large datasets. In this paper, we propose a method to learn state machines from data streams that uses the count-min sketch data structure to reduce memory requirements. We apply state merging within the well-known red-blue framework to reduce the search space. We implemented our approach in an established framework for learning state machines and evaluated it experimentally on a well-known dataset, showing its effectiveness with respect to both the quality of the results and run-time.
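The abstract combines two ideas: summarizing the future traces (suffixes) observed after each state in a fixed-size count-min sketch, and comparing those summaries when the red-blue framework decides whether a blue state may be merged into a red one. The Python sketch below illustrates both under stated assumptions. The key encoding (`future_keys`, the first 1..k symbols after a state), the L1 distance, the `threshold` in `can_merge`, and the sketch dimensions are hypothetical choices for illustration, not the paper's actual similarity test.

```python
import hashlib
from typing import Sequence


class CountMinSketch:
    """Count-min sketch: `depth` rows of `width` counters, one hash function per row."""

    def __init__(self, width: int = 64, depth: int = 4) -> None:
        self.width = width
        self.depth = depth
        self.table = [[0] * width for _ in range(depth)]
        self.total = 0  # number of items added, which equals the sum of every row

    def _index(self, item: str, row: int) -> int:
        # Salting the input with the row number gives each row its own hash function.
        digest = hashlib.sha256(f"{row}:{item}".encode()).digest()
        return int.from_bytes(digest[:8], "little") % self.width

    def add(self, item: str, count: int = 1) -> None:
        for row in range(self.depth):
            self.table[row][self._index(item, row)] += count
        self.total += count

    def estimate(self, item: str) -> int:
        # Collisions only inflate counters, so the row minimum never underestimates.
        return min(self.table[row][self._index(item, row)] for row in range(self.depth))


def future_keys(suffix: Sequence[str], k: int) -> list[str]:
    """Hypothetical encoding of a state's future: the first 1..k symbols after it."""
    return [" ".join(suffix[:n]) for n in range(1, min(k, len(suffix)) + 1)]


def build_sketch(suffixes: Sequence[Sequence[str]], k: int = 2) -> CountMinSketch:
    """Summarize all trace suffixes that pass through one state."""
    sketch = CountMinSketch()
    for suffix in suffixes:
        for key in future_keys(suffix, k):
            sketch.add(key)
    return sketch


def sketch_distance(a: CountMinSketch, b: CountMinSketch) -> float:
    """L1 distance between the normalized future-key distributions of two states.

    Both sketches must share width, depth and hash functions. Pooling colliding
    keys into one counter can only shrink an L1 distance, so each row gives a
    lower bound on the true distance and the maximum over rows is the tightest one.
    """
    if a.total == 0 or b.total == 0:
        return 0.0  # no evidence yet against merging
    return max(
        sum(abs(x / a.total - y / b.total) for x, y in zip(row_a, row_b))
        for row_a, row_b in zip(a.table, b.table)
    )


def can_merge(red: CountMinSketch, blue: CountMinSketch, threshold: float = 0.5) -> bool:
    """Hypothetical merge test for a red-blue candidate pair."""
    return sketch_distance(red, blue) <= threshold


# Usage: suffixes observed after a red state and after two blue candidate states.
red = build_sketch([["a", "b"], ["a", "b"], ["a", "c"]])
blue = build_sketch([["a", "b"], ["a", "c"]])
other = build_sketch([["c"], ["c", "c"]])
print(can_merge(red, blue), can_merge(red, other))
```

Because every state's sketch holds a fixed `width × depth` counters, the memory used per state does not grow with the number of traces seen, which is the property that makes sketch-based summaries attractive for streaming data.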

research
02/10/2023

Count-min sketch with variable number of hash functions: an experimental study

Conservative Count-Min, an improved version of Count-Min sketch [Cormode...
research
11/07/2017

Finding Heavily-Weighted Features in Data Streams

We introduce a new sub-linear space data structure---the Weight-Median S...
research
02/07/2021

A Bayesian nonparametric approach to count-min sketch under power-law data streams

The count-min sketch (CMS) is a randomized data structure that provides ...
research
01/30/2023

Streaming Anomaly Detection

Anomaly detection is critical for finding suspicious behavior in innumer...
research
10/28/2019

Extreme Classification in Log Memory using Count-Min Sketch: A Case Study of Amazon Search with 50M Products

In the last decade, it has been shown that many hard AI tasks, especiall...
research
02/24/2021

SALSA: Self-Adjusting Lean Streaming Analytics

Counters are the fundamental building block of many data sketching schem...
research
02/28/2018

Maximum likelihood estimation of a finite mixture of logistic regression models in a continuous data stream

In marketing we are often confronted with a continuous stream of respons...