A Spatial-Temporal Deformable Attention based Framework for Breast Lesion Detection in Videos

09/09/2023
by Chao Qin, et al.

Detecting breast lesions in videos is crucial for computer-aided diagnosis. Existing video-based breast lesion detection approaches typically aggregate deep backbone features across time using the self-attention operation. We argue that such a strategy struggles to perform deep feature aggregation effectively and ignores useful local information. To tackle these issues, we propose a spatial-temporal deformable attention based framework, named STNet. STNet introduces a spatial-temporal deformable attention module to perform local spatial-temporal feature fusion. This module enables deep feature aggregation in each stage of both the encoder and the decoder. To further accelerate detection, we introduce an encoder feature shuffle strategy for multi-frame prediction during inference. In this strategy, we share the backbone and encoder features, and shuffle the encoder features for the decoder to generate the predictions of multiple frames. Experiments on the public breast lesion ultrasound video dataset show that STNet achieves state-of-the-art detection performance while running at twice the inference speed. The code and model are available at https://github.com/AlfredQin/STNet.
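The encoder feature shuffle idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not say how the features are reordered, so the sketch assumes that "shuffling" means moving the target frame's encoder feature to the front before decoding. The names `shuffle_for_target`, `predict_all_frames`, `toy_backbone_encoder`, and `toy_decoder` are hypothetical, and plain Python lists stand in for feature maps.

```python
from typing import Callable, List, Sequence

Feature = List[float]  # stand-in for one frame's encoder feature map


def shuffle_for_target(encoder_feats: Sequence[Feature], target: int) -> List[Feature]:
    """Reorder shared encoder features so frame `target` comes first.

    Assumption: the decoder treats the first feature as the frame to predict
    and the remaining ones as temporal context.
    """
    context = [f for i, f in enumerate(encoder_feats) if i != target]
    return [encoder_feats[target]] + context


def predict_all_frames(
    frames: Sequence[Sequence[float]],
    backbone_encoder: Callable[[Sequence[Sequence[float]]], List[Feature]],
    decoder: Callable[[List[Feature]], float],
) -> List[float]:
    """Run the backbone and encoder once, then decode once per frame."""
    encoder_feats = backbone_encoder(frames)  # shared across all target frames
    return [
        decoder(shuffle_for_target(encoder_feats, t))
        for t in range(len(frames))
    ]


# Toy stand-ins (hypothetical) so the sketch runs end to end.
calls = {"encoder": 0}


def toy_backbone_encoder(frames: Sequence[Sequence[float]]) -> List[Feature]:
    calls["encoder"] += 1  # count how often the expensive stage runs
    return [[float(v) * 2.0 for v in frame] for frame in frames]


def toy_decoder(ordered_feats: List[Feature]) -> float:
    # Pretend the "prediction" depends only on the target (first) feature.
    return sum(ordered_feats[0])


frames = [[1, 2], [3, 4], [5, 6]]
predictions = predict_all_frames(frames, toy_backbone_encoder, toy_decoder)
```

Under these assumptions the expensive backbone and encoder stage runs once per clip rather than once per frame, which is the kind of saving the abstract attributes to sharing features during inference.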

research
07/01/2022

A New Dataset and A Baseline Model for Breast Lesion Detection in Ultrasound Videos

Breast lesion detection in ultrasound is critical for breast cancer diag...
research
07/16/2019

Semi-supervised Breast Lesion Detection in Ultrasound Video Based on Temporal Coherence

Breast lesion detection in ultrasound video is critical for computer-aid...
research
06/22/2022

No Attention is Needed: Grouped Spatial-temporal Shift for Simple and Efficient Video Restorers

Video restoration, aiming at restoring clear frames from degraded videos...
research
07/08/2021

Multi-frame Collaboration for Effective Endoscopic Video Polyp Detection via Spatial-Temporal Feature Transformation

Precise localization of polyp is crucial for early cancer screening in g...
research
11/27/2017

Attention Clusters: Purely Attention Based Local Feature Integration for Video Classification

Recently, substantial research effort has focused on how to apply CNNs o...
research
08/29/2019

Great Ape Detection in Challenging Jungle Camera Trap Footage via Attention-Based Spatial and Temporal Feature Blending

We propose the first multi-frame video object detection framework traine...
research
11/20/2022

MINTIME: Multi-Identity Size-Invariant Video Deepfake Detection

In this paper, we introduce MINTIME, a video deepfake detection approach...
