
Near-Optimal Data Source Selection for Bayesian Learning

by Lintao Ye, et al.

We study a fundamental problem in Bayesian learning, where the goal is to select a set of data sources with minimum cost while achieving a certain learning performance based on the data streams provided by the selected data sources. First, we show that this data source selection problem is NP-hard. We then show that it can be transformed into an instance of the submodular set covering problem studied in the literature, and provide a standard greedy algorithm that solves it with provable performance guarantees. Next, we propose a fast greedy algorithm that improves on the running time of the standard greedy algorithm while achieving comparable performance guarantees. We provide insights into the performance guarantees of the greedy algorithms by analyzing special classes of the problem. Finally, we validate the theoretical results using numerical examples, and show that the greedy algorithms work well in practice.
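
To make the greedy idea concrete, here is a minimal sketch of a generic cost-aware greedy rule for submodular set covering. It is not the paper's algorithm or its Bayesian learning objective: the function name `greedy_submodular_cover`, the toy coverage function `covered`, and the source names and costs are all hypothetical, chosen only to illustrate picking, at each step, the source with the largest truncated marginal gain per unit cost until a target value is reached.

```python
from typing import Callable, Dict, FrozenSet, Set


def greedy_submodular_cover(
    costs: Dict[str, float],
    f: Callable[[FrozenSet[str]], float],
    target: float,
) -> Set[str]:
    """Greedily pick sources until f(selected) >= target.

    Assumes f is monotone submodular and every cost is positive.
    """
    selected: Set[str] = set()
    current = f(frozenset(selected))
    while current < target:
        best, best_ratio = None, 0.0
        for source, cost in costs.items():
            if source in selected:
                continue
            # Marginal gain of the objective truncated at the target.
            new_value = min(f(frozenset(selected | {source})), target)
            ratio = (new_value - min(current, target)) / cost
            if ratio > best_ratio:
                best, best_ratio = source, ratio
        if best is None:
            raise ValueError("target cannot be reached with the given sources")
        selected.add(best)
        current = f(frozenset(selected))
    return selected


# Hypothetical toy instance: each source "covers" some items, and the
# objective is the number of distinct items covered (a monotone
# submodular function standing in for a learning-performance metric).
coverage = {
    "a": {1, 2, 3},
    "b": {3, 4},
    "c": {4, 5, 6},
    "d": {1, 2, 3, 4, 5, 6},
}
costs = {"a": 1.0, "b": 1.0, "c": 1.0, "d": 5.0}


def covered(sel: FrozenSet[str]) -> float:
    return float(len(set().union(*(coverage[s] for s in sel))))


chosen = greedy_submodular_cover(costs, covered, target=6.0)
print(sorted(chosen), sum(costs[s] for s in chosen))
```

In this toy instance the rule selects the two cheap sources `a` and `c` rather than the single expensive source `d`, which reaches the same target at higher cost.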




Scalable Greedy Feature Selection via Weak Submodularity

Greedy algorithms are widely used for problems in machine learning such ...

Distributed Submodular Maximization with Parallel Execution

The submodular maximization problem is widely applicable in many enginee...

Submodular meets Spectral: Greedy Algorithms for Subset Selection, Sparse Approximation and Dictionary Selection

We study the problem of selecting a subset of k random variables from a ...

apricot: Submodular selection for data summarization in Python

We present apricot, an open source Python package for selecting represen...

FedQPL: A Language for Logical Query Plans over Heterogeneous Federations of RDF Data Sources (Extended Version)

Federations of RDF data sources provide great potential when queried for...

Near-optimal irrevocable sample selection for periodic data streams with applications to marine robotics

We consider the task of monitoring spatiotemporal phenomena in real-time...

Improving Screening Processes via Calibrated Subset Selection

Many selection processes such as finding patients qualifying for a medic...