Unsupervised Contrastive Learning of Image Representations from Ultrasound Videos with Hard Negative Mining

07/26/2022
by Soumen Basu, et al.

Rich temporal information and variations in viewpoints make video data an attractive choice for learning image representations using unsupervised contrastive learning (UCL) techniques. State-of-the-art (SOTA) contrastive learning techniques treat frames within a video as positives in the embedding space, whereas frames from other videos are treated as negatives. We observe that, unlike multiple views of an object in natural scene videos, an ultrasound (US) video captures different 2D slices of an organ. Hence, there is almost no similarity between temporally distant frames of even the same US video. In this paper, we propose instead to use such frames as hard negatives. We advocate mining both intra-video and cross-video negatives in a hardness-sensitive negative mining curriculum in a UCL framework to learn rich image representations. We deploy our framework to learn representations of gallbladder (GB) malignancy from US videos. We also construct the first large-scale US video dataset, containing 64 videos and 15,800 frames, for learning GB representations. We show that the standard ResNet50 backbone trained with our framework improves the accuracy of models pretrained with SOTA UCL techniques, as well as supervised pretrained models on ImageNet, for the GB malignancy detection task by 2-6%. We further validate the generalizability of our method on a publicly available lung US image dataset of COVID-19 pathologies and show an improvement of 1.5% compared to SOTA. The source code, dataset, and models are available at https://gbc-iitd.github.io/usucl.
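
The idea of treating temporally distant frames of the same video as hard negatives can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the `pos_window` and `neg_gap` thresholds, and the plain InfoNCE-style loss are assumptions chosen for illustration, and the hardness-sensitive curriculum described in the abstract is omitted.

```python
import math


def cosine(u, v):
    """Cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def split_frames(num_frames, anchor, pos_window, neg_gap):
    """Hypothetical sampling rule (an assumption, not from the paper):
    frames within pos_window of the anchor are positives, and frames
    at least neg_gap away in the same video are intra-video hard negatives."""
    positives = [t for t in range(num_frames)
                 if t != anchor and abs(t - anchor) <= pos_window]
    hard_negatives = [t for t in range(num_frames)
                      if abs(t - anchor) >= neg_gap]
    return positives, hard_negatives


def info_nce(anchor, positive, negatives, temperature=0.1):
    """InfoNCE-style loss for one anchor. `negatives` may mix intra-video
    hard negatives with cross-video negatives."""
    pos = math.exp(cosine(anchor, positive) / temperature)
    neg = sum(math.exp(cosine(anchor, n) / temperature) for n in negatives)
    return -math.log(pos / (pos + neg))


# Usage: frame indices for a 10-frame video with the anchor at frame 2.
positives, hard_negatives = split_frames(10, anchor=2, pos_window=1, neg_gap=5)

# Toy 2-D embeddings: the loss is small when the positive lies close to
# the anchor and the negatives point elsewhere.
anchor_vec = [1.0, 0.0]
loss = info_nce(anchor_vec, [0.9, 0.1], [[0.0, 1.0], [-1.0, 0.0]])
```

In a real training loop the embeddings would come from the backbone (ResNet50 in the abstract), and the negative set for each anchor would combine distant frames of the same video with frames drawn from other videos.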

