CrossA11y: Identifying Video Accessibility Issues via Cross-modal Grounding

08/23/2022
by Xingyu "Bruce" Liu, et al.

Authors make their videos visually accessible by adding audio descriptions (AD), and auditorily accessible by adding closed captions (CC). However, creating AD and CC is challenging and tedious, especially for non-professional describers and captioners, because accessibility problems in videos are difficult to identify. A video author has to watch the entire video and manually check for inaccessible information frame by frame, in both the visual and auditory modalities. In this paper, we present CrossA11y, a system that helps authors efficiently detect and address visual and auditory accessibility issues in videos. Using cross-modal grounding analysis, CrossA11y automatically measures the accessibility of visual and audio segments in a video by checking for modality asymmetries. CrossA11y then displays these segments and surfaces visual and audio accessibility issues in a unified interface, making it intuitive to locate and review issues, script AD/CC in place, and preview the described and captioned video immediately. We demonstrate the effectiveness of CrossA11y through a lab study with 11 participants, comparing it to an existing baseline.
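
The abstract does not say how the cross-modal grounding scores are computed, so the following is only a minimal sketch of the asymmetry check under stated assumptions: each visual segment already has a similarity score in [0, 1] against each audio segment, and a segment counts as potentially inaccessible when its best match in the other modality falls below a cutoff. The function name `flag_asymmetric_segments`, the threshold of 0.5, and the scores are all hypothetical and are not taken from the paper.

```python
from typing import List, Sequence


def flag_asymmetric_segments(
    similarity: Sequence[Sequence[float]], threshold: float = 0.5
) -> List[int]:
    """Return indices of rows whose best cross-modal match is below threshold.

    Each row holds one segment's similarity scores against every segment of
    the other modality. The threshold is an assumed, illustrative value.
    """
    flagged = []
    for index, row in enumerate(similarity):
        best_match = max(row, default=0.0)  # empty row: nothing grounds it
        if best_match < threshold:
            flagged.append(index)
    return flagged


# Made-up scores: rows are visual segments, columns are audio segments.
visual_to_audio = [
    [0.9, 0.1],  # visual segment 0 is well described by audio segment 0
    [0.2, 0.3],  # visual segment 1 has no good audio counterpart
    [0.1, 0.8],  # visual segment 2 is well described by audio segment 1
]

# Visual segments lacking audio grounding are candidates for AD.
needs_audio_description = flag_asymmetric_segments(visual_to_audio)

# Transposing gives audio-to-visual scores; weak rows are candidates for CC.
audio_to_visual = [list(column) for column in zip(*visual_to_audio)]
needs_closed_captions = flag_asymmetric_segments(audio_to_visual)

print(needs_audio_description, needs_closed_captions)
```

Running the same check in both directions reflects the symmetry the abstract describes: visual content that the audio does not convey suggests a missing audio description, and audio content that the visuals do not convey suggests a missing caption.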

research
11/22/2017

CMCGAN: A Uniform Framework for Cross-Modal Visual-Audio Mutual Generation

Visual and audio modalities are two symbiotic modalities underlying vide...
research
10/07/2020

Rescribe: Authoring and Automatically Editing Audio Descriptions

Audio descriptions make videos accessible to those who cannot see them b...
research
07/21/2020

Unified Multisensory Perception: Weakly-Supervised Audio-Visual Video Parsing

In this paper, we introduce a new problem, named audio-visual video pars...
research
03/25/2021

Looking into Your Speech: Learning Cross-modal Affinity for Audio-visual Speech Separation

In this paper, we address the problem of separating individual speech si...
research
01/07/2018

Cross-modal Embeddings for Video and Audio Retrieval

The increasing amount of online videos brings several opportunities for ...
research
10/20/2021

Look at What I'm Doing: Self-Supervised Spatial Grounding of Narrations in Instructional Videos

We introduce the task of spatially localizing narrated interactions in v...
research
06/13/2019

Grounding Object Detections With Transcriptions

A vast amount of audio-visual data is available on the Internet thanks t...
