Reading Between the Lanes: Text VideoQA on the Road

07/08/2023
by George Tom, et al.

Text and signs around roads provide crucial information for drivers, vital for safe navigation and situational awareness. Recognizing scene text in motion is challenging because textual cues typically appear only for a short time span and must be detected early, at a distance. Systems that exploit such information to assist the driver should not only extract and incorporate visual and textual cues from the video stream but also reason over time. To address this issue, we introduce RoadTextVQA, a new dataset for the task of video question answering (VideoQA) in the context of driver assistance. RoadTextVQA consists of 3,222 driving videos collected from multiple countries, annotated with 10,500 questions, all based on text or road signs present in the driving videos. We assess the performance of state-of-the-art video question answering models on RoadTextVQA. The results highlight the significant potential for improvement in this domain and the usefulness of the dataset in advancing research on in-vehicle support systems and text-aware multimodal question answering. The dataset is available at http://cvit.iiit.ac.in/research/projects/cvit-projects/roadtextvqa


research
05/19/2020

RoadText-1K: Text Detection Recognition Dataset for Driving Videos

Perceiving text is crucial to understand semantics of outdoor scenes and...
research
02/08/2022

NEWSKVQA: Knowledge-Aware News Video Question Answering

Answering questions in the context of videos can be helpful in video ind...
research
11/10/2022

Watching the News: Towards VideoQA Models that can Read

Video Question Answering methods focus on commonsense reasoning and visu...
research
05/31/2019

Scene Text Visual Question Answering

Current visual question answering datasets do not consider the rich sema...
research
09/04/2023

Understanding Video Scenes through Text: Insights from Text-based Video Question Answering

Researchers have extensively studied the field of vision and language, d...
research
09/04/2023

LoRA-like Calibration for Multimodal Deception Detection using ATSFace Data

Recently, deception detection on human videos is an eye-catching techniq...
research
11/30/2021

AssistSR: Affordance-centric Question-driven Video Segment Retrieval

It is still a pipe dream that AI assistants on phone and AR glasses can ...
