Learning Graphical Models from a Distributed Stream

10/05/2017
by   Yu Zhang, et al.
0

A current challenge for data management systems is to support the construction and maintenance of machine learning models over data that is large, multi-dimensional, and evolving. While systems that could support these tasks are emerging, the need to scale to distributed, streaming data requires new models and algorithms. In this setting, as well as computational scalability and model accuracy, we also need to minimize the amount of communication between distributed processors, which is the chief component of latency. We study Bayesian networks, the workhorse of graphical models, and present a communication-efficient method for continuously learning and maintaining a Bayesian network model over data that is arriving as a distributed stream partitioned across multiple processors. We show a strategy for maintaining model parameters that leads to an exponential reduction in communication when compared with baseline approaches to maintain the exact MLE (maximum likelihood estimation). Meanwhile, our strategy provides similar prediction errors for the target distribution and for classification tasks.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
12/29/2016

Communication-efficient Distributed Estimation and Inference for Transelliptical Graphical Models

We propose communication-efficient distributed estimation and inference ...
research
01/30/2023

Structure Learning and Parameter Estimation for Graphical Models via Penalized Maximum Likelihood Methods

Probabilistic graphical models (PGMs) provide a compact and flexible fra...
research
02/10/2019

A Bayesian Approach to Joint Estimation of Multiple Graphical Models

The problem of joint estimation of multiple graphical models from high d...
research
03/11/2017

Learning Large-Scale Bayesian Networks with the sparsebn Package

Learning graphical models from data is an important problem with wide ap...
research
12/21/2018

A new approach to learning in Dynamic Bayesian Networks (DBNs)

In this paper, we revisit the parameter learning problem, namely the est...
research
04/27/2016

Probabilistic Graphical Models on Multi-Core CPUs using Java 8

In this paper, we discuss software design issues related to the developm...
research
03/24/2023

ASTRA-sim2.0: Modeling Hierarchical Networks and Disaggregated Systems for Large-model Training at Scale

As deep learning models and input data are scaling at an unprecedented r...

Please sign up or login with your details

Forgot password? Click here to reset