Reinforced Data Sampling for Model Diversification

06/24/2020
by Harry Nguyen, et al.

With the rising number of machine learning competitions, the world has witnessed an exciting race for the best algorithms. However, the data selection process involved may fundamentally suffer from evidence ambiguity and concept drift, which can degrade the performance of various models. This paper proposes a new Reinforced Data Sampling (RDS) method that learns how to sample data adequately in the search for useful models and insights. We formulate the optimisation problem of model diversification, δ-div, in data sampling to maximise learning potentials and optimum allocation by injecting model diversity. This work advocates employing diverse base learners, such as neural networks, decision trees, or logistic regressions, as value functions to reinforce the selection process of data subsets with multi-modal belief. We introduce different ensemble reward mechanisms, including soft voting and stochastic choice, to approximate the optimal sampling policy. The evaluation conducted on four datasets highlights the benefits of the RDS method over traditional sampling approaches. Our experimental results suggest that trainable sampling for model diversification is useful for competition organisers, researchers, and even newcomers who want to pursue the full potential of machine learning tasks such as classification and regression. The source code is available at https://github.com/probeu/RDS.
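The abstract names two ensemble reward mechanisms, soft voting and stochastic choice, without giving their definitions. The following is a minimal sketch of one possible reading, not the authors' implementation: it assumes a binary classification task, assumes the reward is plain accuracy on a held-out subset, and uses made-up probabilities for three base learners. The function names `accuracy`, `soft_voting_reward`, and `stochastic_choice_reward` are hypothetical.

```python
import random
from statistics import mean


def accuracy(probs, labels):
    """Fraction of examples whose thresholded probability matches the label."""
    return mean(1.0 if (p >= 0.5) == bool(y) else 0.0 for p, y in zip(probs, labels))


def soft_voting_reward(learner_probs, labels):
    """Average each example's probability across learners, then score the average."""
    averaged = [mean(column) for column in zip(*learner_probs)]
    return accuracy(averaged, labels)


def stochastic_choice_reward(learner_probs, labels, rng):
    """Score a single base learner drawn uniformly at random."""
    return accuracy(rng.choice(learner_probs), labels)


# Hypothetical data: true labels and predicted probabilities of class 1
# from three base learners on the same four held-out examples.
labels = [1, 0, 1, 1]
learner_probs = [
    [0.9, 0.2, 0.4, 0.8],  # stand-in for a neural network
    [0.6, 0.7, 0.7, 0.9],  # stand-in for a decision tree
    [0.8, 0.1, 0.6, 0.3],  # stand-in for a logistic regression
]

soft_reward = soft_voting_reward(learner_probs, labels)
random_reward = stochastic_choice_reward(learner_probs, labels, random.Random(0))
print(soft_reward, random_reward)
```

In this toy data each learner misclassifies a different example, so the averaged prediction scores higher than any single learner. That is the kind of complementarity a diverse set of base learners is meant to provide.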

