
The Mismatch Principle: Statistical Learning Under Large Model Uncertainties

by Martin Genzel, et al.

We study the learning capacity of empirical risk minimization with respect to the squared loss and a convex hypothesis class consisting of linear functions. While these types of estimators were originally designed for noisy linear regression problems, it has recently turned out that they are in fact capable of handling considerably more complicated situations, involving highly non-linear distortions. This work intends to provide a comprehensive explanation of this somewhat astonishing phenomenon. At the heart of our analysis stands the mismatch principle, which is a simple yet generic recipe for establishing theoretical error bounds for empirical risk minimization. The scope of our results is fairly general, permitting arbitrary sub-Gaussian input-output pairs, possibly with strongly correlated feature variables. Notably, the mismatch principle also generalizes, to a certain extent, the classical orthogonality principle for ordinary least squares. This adaptation allows us to investigate problem setups of recent interest, most importantly, high-dimensional parameter regimes and non-linear observation processes. In particular, our theoretical framework is applied to various scenarios of practical relevance, such as single-index models, variable selection, and strongly correlated designs. We thereby demonstrate the key purpose of the mismatch principle, that is, learning (semi-)parametric output rules under large model uncertainties and misspecifications.
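The phenomenon described in the abstract can be illustrated numerically. The sketch below (not taken from the paper; the names `beta0`, the link function `tanh`, and all dimensions are illustrative assumptions) generates data from a non-linear single-index model and then fits an ordinary linear least-squares estimator, i.e., empirical risk minimization with squared loss over linear functions. Despite the model mismatch, the minimizer tends to align with the direction of the true index vector, consistent with the kind of guarantee the paper studies.

```python
import numpy as np

# Illustrative sketch (assumptions, not the paper's setup): data from a
# single-index model y = f(<x, beta0>) with a non-linear link f, fitted
# by plain linear least squares.

rng = np.random.default_rng(0)
n, d = 2000, 20  # sample size and ambient dimension (arbitrary choices)

# Ground-truth index direction (unit norm).
beta0 = np.zeros(d)
beta0[:3] = [1.0, -2.0, 0.5]
beta0 /= np.linalg.norm(beta0)

# Sub-Gaussian (here: standard Gaussian) inputs and non-linear outputs.
X = rng.standard_normal((n, d))
y = np.tanh(3.0 * X @ beta0)  # highly non-linear observation rule

# Empirical risk minimization with squared loss over linear functions
# reduces to ordinary least squares.
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)

# Measure directional alignment: the estimator recovers beta0 only up to
# an unknown scaling, so we compare normalized directions.
cosine = float(beta_hat @ beta0 / np.linalg.norm(beta_hat))
print(f"cosine similarity with beta0: {cosine:.3f}")
```

In this Gaussian setting the least-squares solution concentrates around a scalar multiple of `beta0`, so the cosine similarity is close to 1 even though the fitted model class cannot represent the `tanh` non-linearity; this is the "mismatch" the paper's error bounds quantify.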



