Active Cost-aware Labeling of Streaming Data

04/13/2023
by   Ting Cai, et al.

We study active labeling of streaming data, where an active learner faces a stream of data points and must carefully choose which of these points to label via an expensive experiment. Such problems frequently arise in applications such as healthcare and astronomy. We first study a setting in which the data's inputs belong to one of K discrete distributions, and we formalize this problem via a loss that captures both the labeling cost and the prediction error. When the labeling cost is B, our algorithm, which labels a point if the uncertainty is larger than a time- and cost-dependent threshold, achieves a worst-case upper bound of O(B^{1/3} K^{1/3} T^{2/3}) on the loss after T rounds. We also provide a more nuanced upper bound, which demonstrates that the algorithm adapts to the arrival pattern and achieves better performance when the arrival pattern is more favorable. We complement both upper bounds with matching lower bounds. We next study this problem when the inputs belong to a continuous domain and the output of the experiment is a smooth function with bounded RKHS norm. After T rounds in d dimensions, we show that the loss is bounded by O(B^{1/(d+3)} T^{(d+2)/(d+3)}) in an RKHS with a squared exponential kernel and by O(B^{1/(2d+3)} T^{(2d+2)/(2d+3)}) in an RKHS with a Matérn kernel. Our empirical evaluation demonstrates that our method outperforms other baselines in several synthetic experiments and two real experiments in medicine and astronomy.
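The labeling rule in the discrete setting can be illustrated with a small simulation. The sketch below is not the paper's algorithm: the function name `run_threshold_labeler`, the uncertainty proxy (standard error of the running mean), the absolute-error prediction loss, and in particular the threshold (B K / t)^(1/3) are all assumptions chosen only so that the rule has the "label when uncertainty exceeds a time- and cost-dependent threshold" shape described in the abstract.

```python
import math
import random


def run_threshold_labeler(arrivals, true_means, label_cost, noise_sd=1.0, seed=0):
    """Simulate an uncertainty-threshold labeling rule on a stream.

    arrivals   : sequence of type indices in range(K), one per round
    true_means : hypothetical ground-truth output for each of the K types
    label_cost : the cost B paid each time a point is labeled
    Returns (total_loss, n_labeled, counts_per_type).
    """
    rng = random.Random(seed)
    K = len(true_means)
    counts = [0] * K      # number of labels collected per type
    sums = [0.0] * K      # sum of observed labels per type
    total_loss = 0.0
    n_labeled = 0

    for t, k in enumerate(arrivals, start=1):
        # Uncertainty proxy (assumption): standard error of the running mean,
        # infinite for a type that has never been labeled.
        uncertainty = noise_sd / math.sqrt(counts[k]) if counts[k] else math.inf

        # Time- and cost-dependent threshold (assumption, not from the paper).
        threshold = (label_cost * K / t) ** (1.0 / 3.0)

        if uncertainty > threshold:
            # Pay the labeling cost and run the (noisy) experiment.
            y = true_means[k] + rng.gauss(0.0, noise_sd)
            counts[k] += 1
            sums[k] += y
            total_loss += label_cost
            n_labeled += 1
        else:
            # Predict with the running mean and pay the prediction error.
            prediction = sums[k] / counts[k]
            total_loss += abs(prediction - true_means[k])

    return total_loss, n_labeled, counts


if __name__ == "__main__":
    stream = [t % 3 for t in range(3000)]  # round-robin arrivals over K = 3 types
    loss, n_labeled, counts = run_threshold_labeler(stream, [0.0, 1.0, 2.0], label_cost=5.0)
    print(f"loss={loss:.1f} labeled={n_labeled} per-type={counts}")
```

Under this assumed threshold, a type is labeled only while its label count is below roughly (t / (B K))^(2/3), so raising the cost B reduces how often the learner pays for an experiment.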


