UGCANet: A Unified Global Context-Aware Transformer-based Network with Feature Alignment for Endoscopic Image Analysis

by Pham Vu Hung, et al.

Gastrointestinal endoscopy is a medical procedure that uses a flexible tube equipped with a camera and other instruments to examine the digestive tract. This minimally invasive technique supports the diagnosis and management of various gastrointestinal conditions, including inflammatory bowel disease, gastrointestinal bleeding, and colon cancer. Early detection of lesions in the upper gastrointestinal tract and identification of polyps that pose a risk of cancer development are critical to its diagnostic and therapeutic applications. Improving the detection rates of gastrointestinal disorders can therefore improve a patient's prognosis by increasing the likelihood of timely medical intervention, which may prolong the patient's lifespan and improve overall health outcomes. This paper presents a Transformer-based deep neural network that performs multiple tasks simultaneously, enabling accurate identification of both upper gastrointestinal tract lesions and colon polyps. Our approach introduces a global context-aware module and combines it with the MiT backbone and a feature alignment block to enhance the network's representation capability. This design leads to a significant improvement in performance across various endoscopic diagnosis tasks. Extensive experiments show that our method outperforms other state-of-the-art approaches.




