Natural vs Balanced Distribution in Deep Learning on Whole Slide Images for Cancer Detection

12/21/2020
by   Ismat Ara Reshma, et al.

The class distribution of data is one of the factors that governs the performance of machine learning models. However, the literature contains very few investigations into the impact of different distributions, and sometimes none for domain-specific tasks. In this paper, we analyze the impact of natural and balanced distributions of the training set on deep learning (DL) models applied to histological images, also known as whole slide images (WSIs). WSIs are considered the gold standard for cancer diagnosis. In recent years, researchers have turned their attention to DL models to automate and accelerate the diagnosis process. When training such DL models, it is common to filter out the non-regions-of-interest from the WSIs and to adopt an artificial distribution (usually a balanced one). In our analysis, we show that keeping the WSI data in their usual distribution (which we call the natural distribution) for DL training produces fewer false positives (FPs), with a comparable number of false negatives (FNs), than the artificially obtained balanced distribution. We conduct an empirical comparative study with 10 random folds for each distribution and compare the resulting average performance in terms of five different evaluation metrics. Experimental results show that the natural distribution is more effective than the balanced one across all the evaluation metrics.
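To make the two training-set distributions concrete, here is a minimal sketch in Python. It is not the authors' pipeline: the helper names `natural_sample` and `balanced_sample`, the toy labels, and the use of random undersampling to obtain the balanced set are all assumptions made for illustration, since the abstract does not say how the balanced distribution is built.

```python
import random
from collections import Counter


def natural_sample(labels, n, seed=0):
    """Draw n patch indices uniformly at random, so the class proportions
    follow the data's own (natural) distribution in expectation."""
    rng = random.Random(seed)
    return rng.sample(range(len(labels)), n)


def balanced_sample(labels, n_per_class, seed=0):
    """Draw the same number of patch indices from every class
    (random undersampling), giving an artificial balanced distribution."""
    rng = random.Random(seed)
    by_class = {}
    for idx, label in enumerate(labels):
        by_class.setdefault(label, []).append(idx)
    chosen = []
    for label in sorted(by_class):
        chosen.extend(rng.sample(by_class[label], n_per_class))
    return chosen


# Hypothetical toy data: 950 non-cancer patches (0) and 50 cancer patches (1).
labels = [0] * 950 + [1] * 50

natural_idx = natural_sample(labels, 100)
balanced_idx = balanced_sample(labels, 50)

natural_counts = Counter(labels[i] for i in natural_idx)
balanced_counts = Counter(labels[i] for i in balanced_idx)
print("natural:", dict(natural_counts))
print("balanced:", dict(balanced_counts))
```

In this sketch the balanced set always contains exactly 50 patches per class, while the natural set stays dominated by the majority class, roughly mirroring the 95/5 split of the toy data.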


