Cluster-to-Conquer: A Framework for End-to-End Multi-Instance Learning for Whole Slide Image Classification

by   Yash Sharma, et al.

In recent years, the availability of digitized Whole Slide Images (WSIs) has enabled the use of deep learning-based computer vision techniques for automated disease diagnosis. However, WSIs present unique computational and algorithmic challenges. WSIs are gigapixel-sized (∼100K pixels), making them infeasible to be used directly for training deep neural networks. Also, often only slide-level labels are available for training as detailed annotations are tedious and can be time-consuming for experts. Approaches using multiple-instance learning (MIL) frameworks have been shown to overcome these challenges. Current state-of-the-art approaches divide the learning framework into two decoupled parts: a convolutional neural network (CNN) for encoding the patches followed by an independent aggregation approach for slide-level prediction. In this approach, the aggregation step has no bearing on the representations learned by the CNN encoder. We have proposed an end-to-end framework that clusters the patches from a WSI into k-groups, samples k' patches from each group for training, and uses an adaptive attention mechanism for slide level prediction; Cluster-to-Conquer (C2C). We have demonstrated that dividing a WSI into clusters can improve the model training by exposing it to diverse discriminative features extracted from the patches. We regularized the clustering mechanism by introducing a KL-divergence loss between the attention weights of patches in a cluster and the uniform distribution. The framework is optimized end-to-end on slide-level cross-entropy, patch-level cross-entropy, and KL-divergence loss (Implementation:


page 12

page 16


HistoTransfer: Understanding Transfer Learning for Histopathology

Advancement in digital pathology and artificial intelligence has enabled...

PicT: A Slim Weakly Supervised Vision Transformer for Pavement Distress Classification

Automatic pavement distress classification facilitates improving the eff...

Whole Slide Images based Cancer Survival Prediction using Attention Guided Deep Multiple Instance Learning Networks

Traditional image-based survival prediction models rely on discriminativ...

Representation-Aggregation Networks for Segmentation of Multi-Gigapixel Histology Images

Convolutional Neural Network (CNN) models have become the state-of-the-a...

Aggregation Cross-Entropy for Sequence Recognition

In this paper, we propose a novel method, aggregation cross-entropy (ACE...

Magnifying Networks for Images with Billions of Pixels

The shift towards end-to-end deep learning has brought unprecedented adv...