Giga-SSL: Self-Supervised Learning for Gigapixel Images

12/06/2022
by   Tristan Lazard, et al.
0

Whole slide images (WSI) are microscopy images of stained tissue slides routinely prepared for diagnosis and treatment selection in medical practice. WSI are very large (gigapixel size) and complex (made of up to millions of cells). The current state-of-the-art (SoTA) approach to classify WSI subdivides them into tiles, encodes them by pre-trained networks and applies Multiple Instance Learning (MIL) to train for specific downstream tasks. However, annotated datasets are often small, typically a few hundred to a few thousand WSI, which may cause overfitting and underperforming models. Conversely, the number of unannotated WSI is ever increasing, with datasets of tens of thousands (soon to be millions) of images available. While it has been previously proposed to use these unannotated data to identify suitable tile representations by self-supervised learning (SSL), downstream classification tasks still require full supervision because parts of the MIL architecture is not trained during tile level SSL pre-training. Here, we propose a strategy of slide level SSL to leverage the large number of WSI without annotations to infer powerful slide representations. Applying our method to The Cancer-Genome Atlas, one of the most widely used data resources in cancer research (16 TB image data), we are able to downsize the dataset to 23 MB without any loss in predictive power: we show that a linear classifier trained on top of these embeddings maintains or improves previous SoTA performances on various benchmark WSI classification tasks. Finally, we observe that training a classifier on these representations with tiny datasets (e.g. 50 slides) improved performances over SoTA by an average of +6.3 AUC points over all downstream tasks.

READ FULL TEXT
research
06/13/2022

Virtual embeddings and self-consistency for self-supervised learning

Self-supervised Learning (SSL) has recently gained much attention due to...
research
01/01/2023

MTNeuro: A Benchmark for Evaluating Representations of Brain Structure Across Multiple Levels of Abstraction

There are multiple scales of abstraction from which we can describe the ...
research
09/14/2023

Virchow: A Million-Slide Digital Pathology Foundation Model

Computational pathology uses artificial intelligence to enable precision...
research
02/10/2023

Self-Supervised Learning-Based Cervical Cytology Diagnostics in Low-Data Regime and Low-Resource Setting

Screening Papanicolaou test samples effectively reduces cervical cancer-...
research
02/19/2023

Evaluating Representations with Readout Model Switching

Although much of the success of Deep Learning builds on learning good re...

Please sign up or login with your details

Forgot password? Click here to reset