Terabyte-scale Deep Multiple Instance Learning for Classification and Localization in Pathology

by Gabriele Campanella, et al.

In the field of computational pathology, the use of decision support systems powered by state-of-the-art deep learning solutions has been hampered by the lack of large labeled datasets. Until recently, studies relied on datasets on the order of a few hundred slides, which are not enough to train a model that can work at scale in the clinic. Here, we have gathered a dataset of 12,160 slides, two orders of magnitude larger than previous pathology datasets and equivalent to 25 times the pixel count of the entire ImageNet dataset. Given the size of our dataset, we can train a deep learning model under the Multiple Instance Learning (MIL) assumption, where only the overall slide diagnosis is needed for training, avoiding the expensive pixel-wise annotations that are usually part of supervised learning approaches. We test our framework on a complex task: prostate cancer diagnosis on needle biopsies. We performed a thorough evaluation of our MIL pipeline under several conditions, achieving an AUC of 0.98 on a held-out test set of 1,824 slides. These results open the way for training accurate diagnosis prediction models at scale, laying the foundation for deploying decision support systems in the clinic.
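To make the MIL assumption concrete, here is a minimal Python sketch. It is not the authors' pipeline. It assumes that each slide is represented by per-tile tumor probabilities produced by some tile classifier (the numbers below are made-up toy values). It also assumes that a slide counts as positive if at least one of its tiles is positive, so the slide score is the maximum tile probability, and that training proceeds by taking the top-ranked tiles of each slide and labeling them with that slide's diagnosis. The function names `slide_score`, `select_training_tiles`, and `auc` are hypothetical. The AUC helper illustrates how slide-level scores relate to the metric reported in the abstract.

```python
from typing import List, Sequence, Tuple

# Hypothetical representation: (per-tile probabilities, slide-level label 0/1).
Slide = Tuple[Sequence[float], int]


def slide_score(tile_probs: Sequence[float]) -> float:
    """MIL max-pooling: a slide is positive if any tile is positive,
    so the slide-level score is the highest tile probability."""
    if not tile_probs:
        raise ValueError("a slide must contain at least one tile")
    return max(tile_probs)


def select_training_tiles(slides: Sequence[Slide], k: int = 1) -> List[Tuple[int, int, int]]:
    """For each slide, pick the k highest-scoring tiles and pair each one
    with the slide label only (no tile-level annotation is used).
    Returns (slide_index, tile_index, slide_label) triples."""
    selected = []
    for slide_idx, (probs, label) in enumerate(slides):
        # Rank tile indices by probability, highest first (ties keep original order).
        ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
        for tile_idx in ranked[:k]:
            selected.append((slide_idx, tile_idx, label))
    return selected


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """ROC AUC computed as the probability that a random positive slide
    outscores a random negative slide (ties count as one half)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative slide")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Toy, made-up tile probabilities for four slides with slide-level labels.
    toy_slides: List[Slide] = [
        ([0.1, 0.9, 0.2], 1),
        ([0.05, 0.3, 0.1], 0),
        ([0.4, 0.2], 0),
        ([0.7, 0.6], 1),
    ]
    scores = [slide_score(probs) for probs, _ in toy_slides]
    labels = [label for _, label in toy_slides]
    print("slide scores:", scores)
    print("training tiles:", select_training_tiles(toy_slides, k=1))
    print("AUC:", auc(scores, labels))
```

Only the slide-level label enters `select_training_tiles`, which is what lets an MIL setup skip pixel-wise annotation. In an actual training loop, the tile probabilities would be recomputed by the current classifier before each round of tile selection.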




