Reducing the Long Tail Losses in Scientific Emulations with Active Learning

11/15/2021
by Yi Heng Lim et al.

Deep-learning-based models are increasingly used to emulate scientific simulations and thereby accelerate scientific research. However, accurate supervised deep learning models require a huge amount of labelled data, which often becomes the bottleneck in employing neural networks. In this work, we leveraged an active learning approach called core-set selection to actively select the data, within a pre-defined budget, to be labelled for training. To further improve model performance and reduce training costs, we also warm-started the training using the shrink-and-perturb trick. We tested the approach on two case studies from different fields, namely galaxy halo occupation distribution modelling in astrophysics and X-ray emission spectroscopy in plasma physics, and the results are promising: we achieved overall performance competitive with a random-sampling baseline and, more importantly, successfully reduced the larger absolute losses, i.e. the long tail of the loss distribution, at virtually no overhead cost.
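Core-set selection is commonly realised as a greedy k-center search over model embeddings: each round, the unlabelled point farthest from the current labelled set is added, until the labelling budget is spent. The paper does not publish its implementation details, so the sketch below is a generic greedy k-center routine in NumPy; the function name, the Euclidean metric, and the random initial point are illustrative assumptions.

```python
import numpy as np

def coreset_select(embeddings, budget, seed=0):
    """Greedy k-center core-set selection (sketch, not the authors' code).

    Picks `budget` indices so that the maximum distance from any pool
    point to its nearest selected point is greedily minimised.
    """
    rng = np.random.default_rng(seed)
    n = len(embeddings)
    selected = [int(rng.integers(n))]  # seed the core set with a random point
    # distance from every pool point to its nearest selected point so far
    dists = np.linalg.norm(embeddings - embeddings[selected[0]], axis=1)
    while len(selected) < budget:
        nxt = int(np.argmax(dists))    # the farthest point joins the core set
        selected.append(nxt)
        dists = np.minimum(
            dists, np.linalg.norm(embeddings - embeddings[nxt], axis=1)
        )
    return selected
```

In an active learning loop, `embeddings` would typically be the current model's penultimate-layer features for the unlabelled pool, and the returned indices are the samples sent for labelling (here, expensive simulation runs).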
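The shrink-and-perturb trick (Ash and Adams, 2020) warm-starts training after each labelling round by scaling every weight toward zero and adding a small amount of Gaussian noise, which preserves some learned structure while restoring trainability. A minimal sketch over a plain dictionary of parameter arrays, with illustrative default coefficients:

```python
import numpy as np

def shrink_and_perturb(params, shrink=0.4, noise_std=0.01, seed=0):
    """Shrink-and-perturb warm start (sketch; coefficients are assumptions).

    Each weight array is shrunk toward zero and jittered with Gaussian
    noise before training resumes on the enlarged labelled set.
    """
    rng = np.random.default_rng(seed)
    return {
        name: shrink * w + noise_std * rng.normal(size=w.shape)
        for name, w in params.items()
    }
```

Compared with training from scratch after every acquisition round, this keeps most of the previous optimisation progress, which is how the approach avoids adding overhead to the active learning loop.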


