Graphical Model Sketch

02/09/2016
by Branislav Kveton, et al.

Structured high-cardinality data arises in many domains and poses a major challenge for both modeling and inference. Graphical models are a popular approach to modeling structured data, but they are unsuitable for high-cardinality variables. The count-min (CM) sketch is a popular approach to estimating probabilities in high-cardinality data, but it does not scale well beyond a few variables. In this work, we bring together the ideas of graphical models and count sketches, and we propose and analyze several approaches to estimating probabilities in structured high-cardinality streams of data. The key idea of our approximations is to use the structure of a graphical model and approximately estimate its factors by "sketches", which hash high-cardinality variables using random projections. Our approximations are computationally efficient, and their space complexity is independent of the cardinality of the variables. Our error bounds are multiplicative and significantly improve upon those of the CM sketch, a state-of-the-art approach to estimating probabilities in streams. We evaluate our approximations on synthetic and real-world problems and report order-of-magnitude improvements over the CM sketch.
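
To make the key idea concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the authors' algorithm: it assumes a chain-structured model P(x) = P(x_1) * prod_k P(x_k | x_{k-1}), and it estimates each factor from count-min sketches of single variables and adjacent pairs. The class names `CountMinSketch` and `ChainSketch`, the salted use of Python's built-in `hash`, and the parameters `width`, `depth`, and `seed` are all hypothetical choices made for this example.

```python
import random


class CountMinSketch:
    """Plain count-min sketch: `depth` hash rows of `width` counters each."""

    def __init__(self, width, depth, seed=0):
        self.width = width
        self.depth = depth
        rng = random.Random(seed)
        # One random salt per row stands in for an independent hash function
        # (an assumption of this example, not a property of the paper).
        self.salts = [rng.getrandbits(64) for _ in range(depth)]
        self.tables = [[0] * width for _ in range(depth)]
        self.total = 0  # number of stream items seen so far

    def _index(self, row, item):
        return hash((self.salts[row], item)) % self.width

    def update(self, item):
        self.total += 1
        for r in range(self.depth):
            self.tables[r][self._index(r, item)] += 1

    def count(self, item):
        # Collisions only add to a counter, so the minimum over rows
        # never underestimates the true count.
        return min(self.tables[r][self._index(r, item)]
                   for r in range(self.depth))

    def prob(self, item):
        return self.count(item) / self.total if self.total else 0.0


class ChainSketch:
    """Hypothetical chain-model estimator built from per-factor sketches."""

    def __init__(self, num_vars, width, depth, seed=0):
        self.num_vars = num_vars
        self.single = [CountMinSketch(width, depth, seed + k)
                       for k in range(num_vars)]
        self.pair = [CountMinSketch(width, depth, seed + num_vars + k)
                     for k in range(num_vars - 1)]

    def update(self, x):
        for k, value in enumerate(x):
            self.single[k].update(value)
        for k in range(self.num_vars - 1):
            self.pair[k].update((x[k], x[k + 1]))

    def prob(self, x):
        # P(x_1) times the estimated conditionals P(x_k | x_{k-1}),
        # each taken as count(x_{k-1}, x_k) / count(x_{k-1}).
        p = self.single[0].prob(x[0])
        for k in range(1, self.num_vars):
            parent = self.single[k - 1].count(x[k - 1])
            if parent == 0:
                return 0.0
            p *= self.pair[k - 1].count((x[k - 1], x[k])) / parent
        return p
```

In this illustration, a model over K variables stores (2K - 1) sketches of `width * depth` counters each, so memory depends on the number of variables and the sketch size but not on how many distinct values each variable takes. Because each ratio divides one overestimated count by another, an estimated conditional can exceed 1; the sketch makes no attempt to correct for that.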

research
10/23/2017

HyperMinHash: Jaccard index sketching in LogLog space

In this extended abstract, we describe and analyse a streaming probabili...
research
10/16/2012

Fast Exact Inference for Recursive Cardinality Models

Cardinality potentials are a generally useful class of high order potent...
research
05/24/2020

HyperLogLog Sketch Acceleration on FPGA

Data sketches are a set of widely used approximated data summarizing tec...
research
08/20/2020

Simple and Efficient Cardinality Estimation in Data Streams

We study sketching schemes for the cardinality estimation problem in dat...
research
03/06/2018

Moment-Based Quantile Sketches for Efficient High Cardinality Aggregation Queries

Interactive analytics increasingly involves querying for quantiles over ...
research
10/23/2017

HyperMinHash: MinHash in LogLog space

In this extended abstract, we describe and analyse a streaming probabili...
research
03/28/2022

A Formal Analysis of the Count-Min Sketch with Conservative Updates

Count-Min Sketch with Conservative Updates (CMS-CU) is a popular algorit...
