A Mathematical Foundation for Robust Machine Learning based on Bias-Variance Trade-off

by Ou Wu, et al.

A common assumption in machine learning is that samples are independently and identically distributed (i.i.d.). However, the contributions of different samples to training are not identical: some samples are difficult to learn, and some are noisy. These unequal contributions have a considerable effect on training performance. Studies that address unequal sample contributions (e.g., easy, hard, noisy) are usually referred to as robust machine learning (RML). Weighting and regularization are two common techniques in RML. Numerous learning algorithms have been proposed, but their strategies for dealing with easy, hard, and noisy samples differ or even contradict one another. For example, some strategies learn hard samples first, whereas others learn easy samples first. A clear comparison of how existing RML algorithms handle different samples is difficult because RML lacks a unified theoretical framework. This study attempts to construct a mathematical foundation for RML based on bias-variance trade-off theory. A series of definitions and properties are presented and proved. Several classical learning algorithms are also explained and compared. Improvements to existing methods are obtained from the comparison, and a unified method that combines two classical learning strategies is proposed.
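
The contrast between easy-first and hard-first weighting can be illustrated with a minimal sketch. The two schemes below are stand-ins chosen for illustration (a thresholded weight in the style of self-paced learning, and a focal-loss-style modulating factor); they are not taken from the paper, and the function names, threshold, gamma, and loss values are all hypothetical.

```python
import math

def easy_first_weights(losses, threshold):
    """Self-paced-style weighting (illustrative): keep only samples
    whose loss is below the threshold, i.e. the 'easy' ones."""
    return [1.0 if loss < threshold else 0.0 for loss in losses]

def hard_first_weights(losses, gamma=2.0):
    """Focal-style weighting (illustrative): assuming a cross-entropy
    loss, p = exp(-loss) is the predicted probability of the true class,
    and the weight (1 - p) ** gamma grows with the loss."""
    return [(1.0 - math.exp(-loss)) ** gamma for loss in losses]

# Hypothetical per-sample cross-entropy losses: easy, medium, hard.
losses = [0.05, 0.7, 2.3]

easy = easy_first_weights(losses, threshold=1.0)
hard = hard_first_weights(losses)

for loss, w_easy, w_hard in zip(losses, easy, hard):
    print(f"loss={loss:.2f}  easy-first={w_easy:.1f}  hard-first={w_hard:.3f}")
```

On the same three samples, the first scheme discards the highest-loss sample while the second gives it the largest weight, which is the kind of contradiction a unified framework would need to explain.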

