Finding Heavily-Weighted Features in Data Streams

11/07/2017
by Kai Sheng Tai, et al.

We introduce a new sub-linear space data structure, the Weight-Median Sketch, that captures the most heavily weighted features in linear classifiers trained over data streams. This enables memory-limited execution of several statistical analyses over streams, including online feature selection, streaming data explanation, relative deltoid detection, and streaming estimation of pointwise mutual information. In contrast with related sketches that capture the most commonly occurring features (or items) in a data stream, the Weight-Median Sketch captures the features that are most discriminative of one stream (or class) compared to another. The Weight-Median Sketch adopts the core data structure used in the Count-Sketch but, instead of sketching counts, it captures sketched gradient updates to the model parameters. We provide a theoretical analysis of this approach that establishes recovery guarantees in the online learning setting, and demonstrate substantial empirical improvements in accuracy-memory trade-offs over alternatives, including count-based sketches and feature hashing.
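As a rough illustration of the idea of sketching gradient updates instead of counts, the following minimal Python sketch trains a binary linear classifier whose weights live in a Count-Sketch-shaped table and recovers a per-feature weight estimate by taking a median across rows. It is not the authors' implementation: the class name `WeightSketchSGD`, the BLAKE2-based hashing, the logistic loss, plain SGD without regularization, the `1/sqrt(depth)` scaling, and all parameter values are assumptions made for this example.

```python
import hashlib
import math
import statistics


class WeightSketchSGD:
    """Hypothetical example: linear classifier weights stored in a
    Count-Sketch-style table of shape depth x width."""

    def __init__(self, depth=5, width=256, lr=0.1):
        self.depth = depth
        self.width = width
        self.lr = lr
        self.scale = math.sqrt(depth)  # assumed normalization of the projection
        self.table = [[0.0] * width for _ in range(depth)]

    def _slot(self, row, feature):
        # Derive a bucket index and a +/-1 sign from one hash per (row, feature).
        digest = hashlib.blake2b(f"{row}:{feature}".encode(), digest_size=8).digest()
        h = int.from_bytes(digest, "big")
        bucket = (h >> 1) % self.width
        sign = 1.0 if h & 1 else -1.0
        return bucket, sign

    def margin(self, x):
        # Inner product between the sketched weights and the sketched input.
        total = 0.0
        for feature, value in x.items():
            for row in range(self.depth):
                bucket, sign = self._slot(row, feature)
                total += sign * self.table[row][bucket] * value
        return total / self.scale

    def update(self, x, y):
        # One SGD step on the logistic loss log(1 + exp(-y * margin)), y in {-1, +1}.
        m = self.margin(x)
        g = -y / (1.0 + math.exp(y * m))  # derivative of the loss w.r.t. the margin
        for feature, value in x.items():
            for row in range(self.depth):
                bucket, sign = self._slot(row, feature)
                self.table[row][bucket] -= self.lr * g * sign * value / self.scale

    def estimate(self, feature):
        # Median over rows of the signed, rescaled bucket values.
        signed = []
        for row in range(self.depth):
            bucket, sign = self._slot(row, feature)
            signed.append(sign * self.table[row][bucket])
        return self.scale * statistics.median(signed)


# Toy stream: "shared" occurs in every example, but only "pos_only" and
# "neg_only" separate the two classes.
sketch = WeightSketchSGD()
for _ in range(200):
    sketch.update({"pos_only": 1.0, "shared": 1.0}, +1)
    sketch.update({"neg_only": 1.0, "shared": 1.0}, -1)
```

In this toy stream the most frequent feature is `shared`, yet its estimated weight stays near zero, while `pos_only` and `neg_only` receive large weights of opposite sign. A count-based sketch would instead rank `shared` highest, which is the contrast the abstract draws.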


