Data Twinning

10/06/2021
by Akhil Vakayil, et al.

In this work, we develop Twinning, a method for partitioning a dataset into statistically similar twin sets. Twinning is based on SPlit, a recently proposed model-independent method for optimally splitting a dataset into training and testing sets. Twinning is orders of magnitude faster than the SPlit algorithm, which makes it applicable to Big Data problems such as data compression. Twinning can also generate multiple splits of a given dataset to aid divide-and-conquer procedures and k-fold cross-validation.
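
The idea of partitioning a dataset into statistically similar sets can be illustrated with a minimal Python sketch. This is not the Twinning algorithm itself, which the abstract does not spell out; it is an assumed greedy nearest-neighbor heuristic, and the function name `nearest_neighbor_split`, the ratio parameter `r`, and the `start` index are all hypothetical names chosen for illustration. The sketch groups the data into blocks of `r` mutually close points and sends one point from each block to the smaller set, so both sets cover the same regions of the data.

```python
import numpy as np

def nearest_neighbor_split(data, r, start=0):
    """Illustrative heuristic (an assumption, not the published method).

    Returns index arrays (small, large) that partition the rows of `data`,
    with roughly one row in `r` assigned to `small`.
    """
    data = np.asarray(data, dtype=float)
    n = len(data)
    remaining = np.ones(n, dtype=bool)  # rows not yet assigned to a block
    small = []
    current = start
    while remaining.any():
        idx = np.flatnonzero(remaining)
        dist = np.linalg.norm(data[idx] - data[current], axis=1)
        # Block = the r remaining rows closest to the current row
        # (the current row itself is at distance zero).
        block = idx[np.argsort(dist)[:r]]
        small.append(block[0])          # closest row goes to the small set
        remaining[block] = False        # the whole block is now used up
        if remaining.any():
            # Continue from the unassigned row nearest to the block's
            # farthest member, so the walk moves through the data.
            idx = np.flatnonzero(remaining)
            dist = np.linalg.norm(data[idx] - data[block[-1]], axis=1)
            current = idx[np.argmin(dist)]
    small = np.array(small)
    large = np.setdiff1d(np.arange(n), small)
    return small, large

# Usage: split 100 two-dimensional points at a ratio of about 1 in 5.
rng = np.random.default_rng(0)
points = rng.normal(size=(100, 2))
small_idx, large_idx = nearest_neighbor_split(points, r=5)
```

This brute-force version recomputes distances on every step and makes no attempt at speed; it only shows what "statistically similar twin sets" can mean in practice. Applying the same function again to either part would give further splits, in the spirit of the multiple-split use mentioned above.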


