Towards Answering Health-related Questions from Medical Videos: Datasets and Approaches

09/21/2023
by Deepak Gupta, et al.

The increasing availability of online videos has transformed the way we access information and knowledge. A growing number of people now prefer instructional videos because they present step-by-step procedures for accomplishing particular tasks. Instructional videos from the medical domain may provide the best possible visual answers to questions about first aid, medical emergencies, and medical education. To this end, this paper focuses on answering health-related questions asked by the public by providing visual answers from medical videos. A key challenge that hinders the development of applications that can help the public with their health-related questions is the scarcity of large-scale datasets in the medical domain. To address this issue, we first propose a pipelined approach to create two large-scale datasets: HealthVidQA-CRF and HealthVidQA-Prompt. We then propose monomodal and multimodal approaches that can effectively provide visual answers from medical videos to natural language questions. We conduct a comprehensive analysis of the results, focusing on the impact of the created datasets on model training and on the significance of visual features in improving the performance of both the monomodal and multimodal approaches. Our findings suggest that these datasets have the potential to improve performance on medical visual answer localization, and that pre-trained language-vision models offer a promising direction for further gains.
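The abstract does not say how a localized visual answer is represented or scored. As a minimal sketch, assuming a visual answer is a single (start, end) segment of a video in seconds, and assuming predictions are compared with gold segments by temporal intersection over union (a common choice for video answer localization, not confirmed here for this paper), the scoring could look like the following. `Span`, `temporal_iou`, `recall_at_iou`, and `mean_iou` are hypothetical names introduced for illustration only.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Span:
    """Hypothetical visual answer: one video segment, in seconds (an assumption)."""
    start: float
    end: float

    def __post_init__(self) -> None:
        if self.end < self.start:
            raise ValueError("end must not precede start")


def temporal_iou(pred: Span, gold: Span) -> float:
    """Intersection over union of two time intervals; 0.0 when they do not overlap."""
    intersection = max(0.0, min(pred.end, gold.end) - max(pred.start, gold.start))
    # For overlapping intervals this is exactly the union; for disjoint ones the
    # intersection is 0, so the extra gap in the denominator does not matter.
    union = max(pred.end, gold.end) - min(pred.start, gold.start)
    return intersection / union if union > 0 else 0.0


def recall_at_iou(preds: Sequence[Span], golds: Sequence[Span], threshold: float) -> float:
    """Fraction of questions whose predicted span reaches the IoU threshold."""
    hits = sum(temporal_iou(p, g) >= threshold for p, g in zip(preds, golds))
    return hits / len(golds)


def mean_iou(preds: Sequence[Span], golds: Sequence[Span]) -> float:
    """Average temporal IoU over all question/answer pairs."""
    return sum(temporal_iou(p, g) for p, g in zip(preds, golds)) / len(golds)


if __name__ == "__main__":
    preds = [Span(10, 40), Span(0, 5)]
    golds = [Span(20, 50), Span(30, 60)]
    print(mean_iou(preds, golds), recall_at_iou(preds, golds, 0.5))
```

Here the first prediction overlaps its gold segment for 20 of 40 seconds (IoU 0.5) and the second misses entirely (IoU 0.0), so the mean IoU is 0.25 and half of the questions clear a 0.5 threshold. Reporting recall at several thresholds alongside the mean IoU separates near misses from predictions that land on the wrong part of the video.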
