Optimal Ratio for Data Splitting

02/07/2022
by V. Roshan Joseph, et al.

It is common to split a dataset into training and testing sets before fitting a statistical or machine learning model. However, there is no clear guidance on how much data should be used for training and testing. In this article, we show that the optimal splitting ratio is √p:1, where p is the number of parameters in a linear regression model that explains the data well.
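
As a rough illustration of the √p:1 rule, the sketch below converts a dataset size and a parameter count into training and testing set sizes. It assumes the ratio is read as training:testing, so the testing fraction is 1/(√p + 1). The function name `split_sizes`, the rounding, and the clamping that keeps both sets non-empty are illustrative choices, not taken from the article.

```python
import math


def split_sizes(n, p):
    """Return (n_train, n_test) for a sqrt(p):1 training:testing ratio.

    n: total number of rows in the dataset.
    p: number of parameters in the linear regression model (assumed known).
    """
    if n < 2:
        raise ValueError("need at least two rows to split")
    if p < 1:
        raise ValueError("p must be at least 1")

    # Testing share implied by a sqrt(p):1 training:testing ratio.
    test_fraction = 1.0 / (math.sqrt(p) + 1.0)

    # Round to whole rows, keeping at least one row on each side
    # (this clamping is an assumption made for the sketch).
    n_test = min(max(round(n * test_fraction), 1), n - 1)
    return n - n_test, n_test


if __name__ == "__main__":
    for p in (1, 9, 100):
        n_train, n_test = split_sizes(1000, p)
        print(f"p={p}: train={n_train}, test={n_test}")
```

Under this reading, a model with p = 9 parameters gives a 3:1 split (75% training, 25% testing), and the training share grows as p increases.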

research
12/20/2020

SPlit: An Optimal Method for Data Splitting

In this article we propose an optimal method referred to as SPlit for sp...
research
07/15/2020

Experimental Design for Bathymetry Editing

We describe an application of machine learning to a real-world computer ...
research
03/19/2020

Homeostasis phenomenon in predictive inference when using a wrong learning model: a tale of random split of data into training and test sets

This note uses a conformal prediction procedure to provide further suppo...
research
02/21/2022

Inflation of test accuracy due to data leakage in deep learning-based classification of OCT images

In the application of deep learning on optical coherence tomography (OCT...
research
10/21/2021

Data splitting improves statistical performance in overparametrized regimes

While large training datasets generally offer improvement in model perfo...
research
10/06/2021

Data Twinning

In this work, we develop a method named Twinning, for partitioning a dat...
research
09/04/2022

Beyond Random Split for Assessing Statistical Model Performance

Even though a train/test split of the dataset randomly performed is a co...
