Scalable Variational Gaussian Processes for Crowdsourcing: Glitch Detection in LIGO

11/05/2019
by Pablo Morales-Alvarez, et al.

In recent years, crowdsourcing has been transforming the way classification training sets are obtained. Instead of relying on a single expert annotator, crowdsourcing shares the labelling effort among a large number of collaborators. For instance, it is being applied to the data acquired by the award-winning Laser Interferometer Gravitational-Wave Observatory (LIGO) in order to detect glitches that might hinder the identification of true gravitational waves. The crowdsourcing scenario poses new challenges, as it must reconcile differing opinions from a heterogeneous group of annotators with unknown degrees of expertise. Probabilistic methods, such as Gaussian Processes (GPs), have proven successful in modeling this setting. However, GPs do not scale well to large data sets, which hampers their broad adoption in practice (in particular at LIGO). This has led to the recent introduction of deep learning based crowdsourcing methods, which have become the state of the art, but at the cost of partially sacrificing the accurate uncertainty quantification of GPs. This is an important aspect for astrophysicists at LIGO, since a glitch detection system should provide very accurate probability distributions for its predictions. In this work, we leverage the most popular sparse GP approximation to develop a novel GP based crowdsourcing method whose objective factorizes across data points and can therefore be optimized with mini-batches. This makes it able to cope with previously prohibitive data sets. The approach, which we refer to as Scalable Variational Gaussian Processes for Crowdsourcing (SVGPCR), brings GP based methods back to the state of the art and excels at uncertainty quantification. SVGPCR is shown to outperform deep learning based methods and previous probabilistic approaches when applied to the LIGO data. Moreover, its behavior and main properties are carefully analyzed in a controlled experiment based on the MNIST data set.
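
To make the mini-batch idea concrete, the following is a minimal sketch and not the authors' implementation. It assumes, purely for illustration, that each annotator is described by a confusion matrix and that the data term of the objective is a sum over individual annotations. Under that assumption, a uniformly sampled mini-batch rescaled by N/B gives an unbiased estimate of the full sum. All names (`q_z`, `confusion`, `annotations`, `minibatch_estimate`) are hypothetical.

```python
import math
import random


def expected_log_lik(q_z, confusion, annotations):
    """Sum over annotations (item n, annotator a, label y) of
    sum_k q(z_n = k) * log p(y | z_n = k, annotator a)."""
    total = 0.0
    for n, a, y in annotations:
        total += sum(
            q_z[n][k] * math.log(confusion[a][k][y])
            for k in range(len(q_z[n]))
        )
    return total


def minibatch_estimate(q_z, confusion, annotations, batch_size, rng):
    """Unbiased estimate of the full sum from a uniform mini-batch
    (sampled without replacement), rescaled by N / batch_size."""
    batch = rng.sample(annotations, batch_size)
    scale = len(annotations) / batch_size
    return scale * expected_log_lik(q_z, confusion, batch)


# Toy data (assumed): 3 items, 2 classes, 2 annotators.
# q_z[n][k] is the current belief that item n has true class k.
q_z = [[0.9, 0.1], [0.2, 0.8], [0.5, 0.5]]
# confusion[a][k][y] = p(annotator a reports y | true class k).
confusion = [
    [[0.9, 0.1], [0.2, 0.8]],  # annotator 0: fairly reliable
    [[0.6, 0.4], [0.4, 0.6]],  # annotator 1: close to random
]
annotations = [(0, 0, 0), (0, 1, 0), (1, 0, 1), (2, 1, 1), (2, 0, 0)]

full = expected_log_lik(q_z, confusion, annotations)
rng = random.Random(0)
estimates = [
    minibatch_estimate(q_z, confusion, annotations, 2, rng)
    for _ in range(5000)
]
mean_estimate = sum(estimates) / len(estimates)
print(f"full sum: {full:.4f}, mean of mini-batch estimates: {mean_estimate:.4f}")
```

Because each estimate touches only `batch_size` annotations, the cost of one optimization step no longer grows with the size of the data set. A sparse GP method would additionally need a KL term over inducing points, which this sketch omits.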


