Surgical-VQA: Visual Question Answering in Surgical Scenes using Transformer

06/22/2022
by Lalithkumar Seenivasan, et al.

Visual question answering (VQA) in surgery is largely unexplored. Expert surgeons are scarce and often overloaded with clinical and academic work, which limits the time they can spend answering questions from patients, medical students, or junior residents about surgical procedures. Students and junior residents also sometimes refrain from asking too many questions during classes to avoid disruption. While computer-aided simulators and recordings of past surgical procedures are available for them to observe and improve their skills, they still rely heavily on medical experts to answer their questions. A Surgical-VQA system that serves as a reliable 'second opinion' could act as a backup and ease the load on medical experts in answering these questions. The lack of annotated medical data and the presence of domain-specific terms have limited the exploration of VQA for surgical procedures. In this work, we design a Surgical-VQA task that answers questions about a surgical procedure based on the surgical scene. By extending the MICCAI Endoscopic Vision Challenge 2018 dataset and a workflow recognition dataset, we introduce two Surgical-VQA datasets with classification-based and sentence-based answers. To perform Surgical-VQA, we employ vision-text transformer models. We further introduce a residual MLP-based VisualBert encoder that enforces interaction between visual and text tokens, improving performance on classification-based answering. Finally, we study the influence of the number of input image patches and of temporal visual features on model performance in both classification- and sentence-based answering.
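The abstract does not spell out how the residual MLP-based encoder enforces interaction between visual and text tokens. The sketch below is a minimal, hypothetical PyTorch illustration of one way to do so: a token-mixing residual MLP (in the spirit of ResMLP/MLP-Mixer) applied over the joint visual-text token sequence produced by a VisualBert-style encoder. The class name ResidualMLPBlock, the dimensions, and the layer layout are illustrative assumptions, not the authors' exact architecture.

```python
import torch
import torch.nn as nn


class ResidualMLPBlock(nn.Module):
    """Hypothetical residual MLP block over the joint visual-text sequence.

    The token-mixing sublayer mixes information across the token axis,
    so every text token gets a direct path to every visual token; the
    channel-mixing sublayer is a standard per-token feed-forward block.
    Layout and sizes are assumptions for illustration only.
    """

    def __init__(self, num_tokens: int, hidden_dim: int, expansion: int = 2):
        super().__init__()
        self.token_norm = nn.LayerNorm(hidden_dim)
        self.token_mlp = nn.Sequential(
            nn.Linear(num_tokens, num_tokens * expansion),
            nn.GELU(),
            nn.Linear(num_tokens * expansion, num_tokens),
        )
        self.channel_norm = nn.LayerNorm(hidden_dim)
        self.channel_mlp = nn.Sequential(
            nn.Linear(hidden_dim, hidden_dim * expansion),
            nn.GELU(),
            nn.Linear(hidden_dim * expansion, hidden_dim),
        )

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:
        # tokens: (batch, num_tokens, hidden_dim), the concatenated text
        # and visual patch embeddings from the vision-text encoder.
        # Token mixing: transpose so the Linear acts across tokens.
        x = tokens + self.token_mlp(
            self.token_norm(tokens).transpose(1, 2)
        ).transpose(1, 2)
        # Channel mixing: standard residual feed-forward per token.
        x = x + self.channel_mlp(self.channel_norm(x))
        return x


# Illustrative numbers: e.g. 25 text tokens + 4 image-patch tokens, width 768.
block = ResidualMLPBlock(num_tokens=29, hidden_dim=768)
joint_embeddings = torch.randn(8, 29, 768)   # stand-in for encoder output
mixed = block(joint_embeddings)              # shape: (8, 29, 768)
```

A token-mixing sublayer of this kind gives visual and text tokens an attention-free path to exchange information, which is one plausible reading of "enforcing interaction" between the two modalities; the paper's actual design may differ.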
