Inflation of test accuracy due to data leakage in deep learning-based classification of OCT images

02/21/2022
by   Iulian Emil Tampu, et al.

When deep learning is applied to optical coherence tomography (OCT) data, it is common to train classification networks on 2D images extracted from volumetric data. Given the micrometer resolution of OCT systems, consecutive images are often very similar in both visible structures and noise. An inappropriate data split can therefore place near-identical images from the same volume in both the training and testing sets, an issue that a large portion of the literature overlooks. In this study, the effect of improper dataset splitting on model evaluation is demonstrated for two classification tasks using two open-access OCT datasets extensively used in the literature: Kermany's ophthalmology dataset and the AIIMS breast tissue dataset. Our results show that classification accuracy is inflated by 3.9 to 26 percentage points for models tested on a dataset with improper splitting, highlighting the considerable effect of dataset handling on model evaluation. This study intends to raise awareness of the importance of dataset splitting for research on deep learning using OCT data and volumetric data in general.
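One way to avoid the overlap described above is to split at the level of the volume rather than the individual 2D image, so that all images from a given volume land on the same side of the split. The following is a minimal sketch of that idea, not the authors' code: it assumes each image can be paired with an identifier of the volume it came from, and the function name `split_by_volume`, the `(image_id, volume_id)` tuple layout, and the toy data are all hypothetical.

```python
import random
from collections import defaultdict


def split_by_volume(samples, test_fraction=0.2, seed=0):
    """Split (image_id, volume_id) pairs so that no volume spans both sets.

    Hypothetical helper for illustration; the sample layout is an assumption.
    """
    # Group image identifiers by the volume they were extracted from.
    by_volume = defaultdict(list)
    for image_id, volume_id in samples:
        by_volume[volume_id].append(image_id)

    # Shuffle the volume identifiers (not the images) reproducibly.
    volume_ids = sorted(by_volume)
    random.Random(seed).shuffle(volume_ids)

    # Reserve a fraction of whole volumes for testing (at least one).
    n_test = max(1, round(len(volume_ids) * test_fraction))
    test_volumes = set(volume_ids[:n_test])

    train = [(i, v) for i, v in samples if v not in test_volumes]
    test = [(i, v) for i, v in samples if v in test_volumes]
    return train, test


# Toy data (assumed): 10 volumes with 5 consecutive slices each.
samples = [(f"vol{v}_slice{s}", f"vol{v}") for v in range(10) for s in range(5)]
train, test = split_by_volume(samples)

train_volumes = {v for _, v in train}
test_volumes = {v for _, v in test}
print(len(train), len(test), train_volumes & test_volumes)
```

Shuffling individual images instead would typically scatter slices of the same volume across both sets, which is the improper split the abstract refers to. Where several volumes come from the same patient or tissue sample, the same grouping could be applied at that level instead.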


