A Dataset for Medical Instructional Video Classification and Question Answering

01/30/2022
by Deepak Gupta, et al.

This paper introduces a new challenge and datasets to foster research toward designing systems that can understand medical videos and provide visual answers to natural language questions. We believe medical videos may provide the best possible answers to many first aid, medical emergency, and medical education questions. Toward this, we created the MedVidCL and MedVidQA datasets and introduce the tasks of Medical Video Classification (MVC) and Medical Visual Answer Localization (MVAL), two tasks that focus on cross-modal (medical language and medical video) understanding. The proposed tasks and datasets have the potential to support the development of sophisticated downstream applications that can benefit the public and medical practitioners. Our datasets consist of 6,117 annotated videos for the MVC task and 3,010 annotated questions and answer timestamps from 899 videos for the MVAL task. These datasets have been verified and corrected by medical informatics experts. We have also benchmarked each task with the created MedVidCL and MedVidQA datasets and propose multimodal learning methods that set competitive baselines for future research.
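
To make the MVAL setup concrete, the following is a minimal sketch of how one annotated question and its answer timestamps might be represented, together with temporal intersection-over-union (IoU), a common way to compare a predicted video segment against an annotated one. The class names, field names, and the choice of IoU are assumptions made for illustration; the abstract does not specify the dataset schema or the evaluation metric.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AnswerSpan:
    """A temporal segment of a video, in seconds (hypothetical schema)."""
    start: float
    end: float


@dataclass(frozen=True)
class MVALExample:
    """One question paired with its annotated visual answer (hypothetical schema)."""
    video_id: str
    question: str
    answer: AnswerSpan


def temporal_iou(predicted: AnswerSpan, gold: AnswerSpan) -> float:
    """Return the overlap of two segments divided by their combined extent."""
    intersection = max(
        0.0, min(predicted.end, gold.end) - max(predicted.start, gold.start)
    )
    union = (
        (predicted.end - predicted.start)
        + (gold.end - gold.start)
        - intersection
    )
    # Guard against zero-length segments.
    return intersection / union if union > 0 else 0.0


# Illustrative values only; not taken from the dataset.
example = MVALExample(
    video_id="vid_001",
    question="How do I apply a pressure bandage?",
    answer=AnswerSpan(start=20.0, end=40.0),
)
prediction = AnswerSpan(start=10.0, end=30.0)
score = temporal_iou(prediction, example.answer)
```

Here the predicted and annotated segments share 10 seconds out of a combined 30, so `score` is one third. A system that localizes the answer exactly would score 1.0, and one that misses it entirely would score 0.0.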


