Learnability, Sample Complexity, and Hypothesis Class Complexity for Regression Models

03/28/2023
by   Soosan Beheshti, et al.

The goal of a learning algorithm is to receive a training data set as input and provide a hypothesis that can generalize to all possible data points from a domain set. The hypothesis is chosen from hypothesis classes with potentially different complexities. Linear regression modeling is an important category of learning algorithms. The practical uncertainty of the target samples affects the generalization performance of the learned model, and failing to choose a proper model or hypothesis class can lead to serious issues such as underfitting or overfitting. These issues have been addressed by altering the cost function or by utilizing cross-validation methods, but such approaches can introduce new hyperparameters, with their own challenges and uncertainties, or increase the computational complexity of the learning algorithm. On the other hand, the theory of probably approximately correct (PAC) learning aims to define learnability in a probabilistic setting. Despite its theoretical value, PAC often fails to address practical learning issues. This work is inspired by the foundations of PAC and is motivated by existing issues in regression learning. The proposed approach, denoted epsilon-Confidence Approximately Correct (epsilon-CoAC), utilizes the Kullback-Leibler divergence (relative entropy) and defines a new related typical set in the set of hyperparameters to tackle the learnability issue. Moreover, it enables the learner to compare hypothesis classes of different complexity orders and to choose among them the one with the minimum epsilon in the epsilon-CoAC framework. Epsilon-CoAC learnability not only overcomes the issues of overfitting and underfitting, but also outperforms the well-known cross-validation method in terms of both accuracy and computation time.
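To make the class-comparison idea concrete, the following is a minimal, hypothetical sketch, not the authors' epsilon-CoAC algorithm: it ranks polynomial hypothesis classes of increasing complexity with a Kullback-Leibler-style criterion. Under Gaussian noise, the KL divergence between the true conditional density and a fitted predictive density equals the held-out average negative log-likelihood up to an additive constant (the entropy of the true density), so minimizing the latter orders the classes the same way. The synthetic data and all names here are illustrative assumptions.

```python
# Hypothetical sketch (not the paper's epsilon-CoAC algorithm): comparing
# polynomial hypothesis classes of different complexity orders with a
# KL-divergence-style criterion.
import numpy as np

rng = np.random.default_rng(0)

# Synthetic ground truth: a cubic polynomial with additive Gaussian noise.
def target(x):
    return 0.5 * x**3 - x + 2.0

sigma = 1.0                                  # true noise std (assumed known here)
x_train = rng.uniform(-2, 2, 60)
y_train = target(x_train) + rng.normal(0, sigma, x_train.size)
x_fresh = rng.uniform(-2, 2, 2000)           # fresh samples standing in for the domain set
y_fresh = target(x_fresh) + rng.normal(0, sigma, x_fresh.size)

def avg_nll(degree):
    """Fit a polynomial of the given degree, then return the average
    negative log-likelihood of fresh data under a Gaussian residual model."""
    coeffs = np.polyfit(x_train, y_train, degree)
    resid = y_fresh - np.polyval(coeffs, x_fresh)
    # The entropy of the true density is a constant offset across degrees,
    # so this average NLL ranks hypothesis classes like the KL divergence.
    return 0.5 * np.log(2 * np.pi * sigma**2) + np.mean(resid**2) / (2 * sigma**2)

scores = {d: avg_nll(d) for d in range(1, 11)}
best = min(scores, key=scores.get)
print(f"selected polynomial degree: {best}")
```

With the seed above, the cubic class is typically selected: lower degrees underfit and higher degrees overfit, mirroring the model-selection tradeoff the abstract describes.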


