Exploiting non-i.i.d. data towards more robust machine learning algorithms

10/07/2020
by Wim Casteels, et al.

In the field of machine learning there is a growing interest in more robust and generalizable algorithms. This is important, for example, to bridge the gap between the environment in which the training data was collected and the environment where the algorithm is deployed. Machine learning algorithms have increasingly been shown to excel at finding patterns and correlations in data. Determining the consistency of these patterns, for example distinguishing causal correlations from nonsensical spurious relations, has proven to be much more difficult. In this paper a regularization scheme is introduced that prefers universal causal correlations. This approach is based on 1) the robustness of causal correlations and 2) the data not being independently and identically distributed (i.i.d.). The scheme is demonstrated on a classification task by clustering the (non-i.i.d.) training set into subpopulations. A non-i.i.d. regularization term is then introduced that penalizes weights that are not invariant over these clusters. The resulting algorithm favours correlations that are universal over the subpopulations, and indeed a better performance is obtained on an out-of-distribution test set compared to a more conventional l_2-regularization.
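The abstract describes the scheme only verbally. As a rough illustration, here is a minimal sketch of the idea for logistic regression, assuming k-means clustering to form the subpopulations and a quadratic penalty on the deviation of each cluster's weight vector from the cluster mean. The function name fit_noniid_logreg, the hyperparameters, and the use of the averaged (consensus) weights at deployment are illustrative assumptions, not details taken from the paper.

```python
import numpy as np
from sklearn.cluster import KMeans

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_noniid_logreg(X, y, n_clusters=3, lam=1.0, lr=0.1, n_iter=500, seed=0):
    """Sketch of the non-i.i.d. regularization idea (hypothetical implementation):
    cluster the training set into subpopulations, fit one weight vector per
    cluster, and penalize weights that differ across clusters."""
    # Form subpopulations from the (non-i.i.d.) training set.
    clusters = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed).fit_predict(X)

    rng = np.random.default_rng(seed)
    d = X.shape[1]
    W = rng.normal(scale=0.01, size=(n_clusters, d))  # one weight vector per cluster

    for _ in range(n_iter):
        # Treat the cluster mean as fixed within each step (an approximation).
        w_bar = W.mean(axis=0)
        for k in range(n_clusters):
            Xk, yk = X[clusters == k], y[clusters == k]
            p = sigmoid(Xk @ W[k])
            grad_ce = Xk.T @ (p - yk) / max(len(yk), 1)  # logistic-loss gradient
            grad_pen = 2 * lam * (W[k] - w_bar)          # non-i.i.d. invariance penalty
            W[k] -= lr * (grad_ce + grad_pen)

    return W.mean(axis=0)  # deploy the consensus weights
```

Setting lam to zero recovers independent per-cluster fits, while a large lam forces all cluster weights toward a single shared solution, which mimics the stated preference for correlations that are universal over the subpopulations.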

Related research
10/03/2021

Enhancing Model Robustness and Fairness with Causality: A Regularization Approach

Recent work has raised concerns on the risk of spurious correlations and...
08/21/2023

Spurious Correlations and Where to Find Them

Spurious correlations occur when a model learns unreliable features from...
06/09/2020

Stable Prediction via Leveraging Seed Variable

In this paper, we focus on the problem of stable prediction across unkno...
06/02/2021

Towards Robust Classification Model by Counterfactual and Invariant Data Generation

Despite the success of machine learning applications in science, industr...
07/28/2022

Diversity Boosted Learning for Domain Generalization with Large Number of Domains

Machine learning algorithms minimizing the average training loss usually...
07/26/2021

Compensation Learning

Weighting strategy prevails in machine learning. For example, a common a...
10/08/2018

Effective Parallelisation for Machine Learning

We present a novel parallelisation scheme that simplifies the adaptation...
