Compound Figure Separation of Biomedical Images: Mining Large Datasets for Self-supervised Learning

08/30/2022
by   Tianyuan Yao, et al.

With the rapid development of self-supervised learning (e.g., contrastive learning), the importance of large-scale image collections (even without annotations) for training more generalizable AI models has been widely recognized in medical image analysis. However, collecting task-specific unannotated data at scale can be challenging for individual labs. Existing online resources, such as digital books, publications, and search engines, offer a new source of large-scale images. However, published images in healthcare (e.g., radiology and pathology) include a considerable number of compound figures with subplots. To separate compound figures into usable individual images for downstream learning, we propose a simple compound figure separation (SimCFS) framework that does not require the traditionally needed detection bounding box annotations and that introduces a new loss function and a hard case simulation. Our technical contribution is four-fold: (1) we introduce a simulation-based training framework that minimizes the need for resource-intensive bounding box annotations; (2) we propose a new side loss that is optimized for compound figure separation; (3) we propose an intra-class image augmentation method to simulate hard cases; and (4) to the best of our knowledge, this is the first study that evaluates the efficacy of leveraging self-supervised learning with compound image separation. In our experiments, the proposed SimCFS achieved state-of-the-art performance on the ImageCLEF 2016 Compound Figure Separation Database. A self-supervised model pretrained with a contrastive learning algorithm on the large-scale mined figures improved the accuracy of downstream image classification tasks. The source code of SimCFS is publicly available at https://github.com/hrlblab/ImageSeperation.
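To make the idea of simulation-based training concrete, the following is a minimal sketch of how compound figures with known subplot boxes could be synthesized from single images, so that no manual bounding box annotation is needed. It is not the SimCFS implementation: the function name `simulate_compound_figure`, the regular grid layout, the fixed cell size, the white background, and the nearest-neighbour resizing are all assumptions made for illustration.

```python
import numpy as np


def simulate_compound_figure(images, rows, cols, cell=64, gap=4, seed=0):
    """Hypothetical helper: tile single images into a rows x cols grid.

    Returns the synthetic compound figure and one (x0, y0, x1, y1) box
    per subplot. The boxes come for free because we placed the subplots.
    """
    rng = np.random.default_rng(seed)
    height = rows * cell + (rows + 1) * gap
    width = cols * cell + (cols + 1) * gap
    # Assumed white background, as in many published figures.
    canvas = np.full((height, width, 3), 255, dtype=np.uint8)
    boxes = []
    for r in range(rows):
        for c in range(cols):
            src = images[rng.integers(len(images))]
            # Nearest-neighbour resize to cell x cell via index sampling.
            ys = np.arange(cell) * src.shape[0] // cell
            xs = np.arange(cell) * src.shape[1] // cell
            patch = src[ys][:, xs]
            y0 = gap + r * (cell + gap)
            x0 = gap + c * (cell + gap)
            canvas[y0:y0 + cell, x0:x0 + cell] = patch
            boxes.append((x0, y0, x0 + cell, y0 + cell))
    return canvas, boxes


if __name__ == "__main__":
    # Two placeholder "single images" standing in for real radiology
    # or pathology images.
    singles = [
        np.zeros((32, 48, 3), dtype=np.uint8),
        np.full((50, 20, 3), 128, dtype=np.uint8),
    ]
    figure, subplot_boxes = simulate_compound_figure(singles, rows=2, cols=3)
    print(figure.shape, len(subplot_boxes))
```

Each generated pair of figure and boxes could then serve as a training sample for an off-the-shelf object detector. Drawing all subplots of one figure from the same image class would be one plausible way to approximate the hard cases mentioned in contribution (3), though the abstract does not specify how the authors do this.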


