Inflated 3D Convolution-Transformer for Weakly-supervised Carotid Stenosis Grading with Ultrasound Videos

06/05/2023
by Xinrui Zhou, et al.

Localizing the narrowest position of the vessel and delineating the corresponding vessel and remnant vessel in carotid ultrasound (US) are essential for carotid stenosis grading (CSG) in clinical practice. However, this pipeline is time-consuming and difficult because plaque boundaries are ambiguous and the anatomy varies over time. Automating the procedure usually requires a large number of manual delineations, which are both laborious to produce and unreliable given the annotation difficulty. In this study, we present the first video classification framework for automatic CSG. Our contribution is three-fold. First, to avoid the need for laborious and unreliable annotation, we propose a novel and effective video classification network for weakly-supervised CSG. Second, to ease model training, we adopt an inflation strategy in which pre-trained 2D convolution weights are adapted into their 3D counterparts in our network. In this way, existing large pre-trained models can serve as an effective warm start for our network. Third, to enhance the feature discrimination of the video, we propose a novel attention-guided multi-dimension fusion (AMDF) transformer encoder that models and integrates global dependencies within and across the spatial and temporal dimensions, for which we design two lightweight cross-dimensional attention mechanisms. Our approach is extensively validated on a large, clinically collected carotid US video dataset, demonstrating state-of-the-art performance compared with strong competitors.
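The abstract does not spell out the inflation rule, so the following is only a minimal sketch of one common scheme, assumed here rather than taken from the paper: each pre-trained 2D kernel is repeated along a new temporal axis and rescaled by the temporal length, so that a clip of identical frames produces the same response as the original 2D filter. The names `inflate_kernel`, `response_2d`, and `response_3d` are hypothetical, and a single-channel kernel stored as nested lists stands in for a full convolution weight tensor.

```python
from typing import List

Kernel2D = List[List[float]]        # indexed as [kh][kw]
Kernel3D = List[List[List[float]]]  # indexed as [t][kh][kw]


def inflate_kernel(kernel_2d: Kernel2D, time_dim: int) -> Kernel3D:
    """Repeat a 2D kernel along a new temporal axis and divide by time_dim.

    Assumption: repeat-and-rescale inflation; the paper may use another rule.
    """
    if time_dim < 1:
        raise ValueError("time_dim must be >= 1")
    scaled = [[w / time_dim for w in row] for row in kernel_2d]
    # Copy each row so that the temporal slices do not share storage.
    return [[row[:] for row in scaled] for _ in range(time_dim)]


def response_2d(kernel: Kernel2D, patch: Kernel2D) -> float:
    """Filter response at one location: elementwise product, then sum."""
    return sum(
        w * x
        for k_row, p_row in zip(kernel, patch)
        for w, x in zip(k_row, p_row)
    )


def response_3d(kernel: Kernel3D, clip: Kernel3D) -> float:
    """Same as response_2d, additionally summed over the temporal axis."""
    return sum(response_2d(k_t, x_t) for k_t, x_t in zip(kernel, clip))


# Toy "pre-trained" 2D kernel and a single image patch.
kernel_2d = [[1.0, -2.0], [0.5, 4.0]]
patch = [[2.0, 1.0], [4.0, 0.5]]

# Inflate to 4 frames and build a clip made of 4 identical frames.
kernel_3d = inflate_kernel(kernel_2d, time_dim=4)
static_clip = [patch for _ in range(4)]

out_2d = response_2d(kernel_2d, patch)
out_3d = response_3d(kernel_3d, static_clip)
```

On a temporally constant clip the two responses agree, which is the sense in which pre-trained 2D weights can act as a warm start: the inflated filter initially behaves like the 2D one and only learns temporal structure during fine-tuning.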


