How2: A Large-scale Dataset for Multimodal Language Understanding

11/01/2018
by   Ramon Sanabria, et al.
0

In this paper, we introduce How2, a multimodal collection of instructional videos with English subtitles and crowdsourced Portuguese translations. We also present integrated sequence-to-sequence baselines for machine translation, automatic speech recognition, spoken language translation, and multimodal summarization. By making available data and code for several multimodal natural language tasks, we hope to stimulate more research on these and similar challenges, to obtain a deeper understanding of multimodality in language processing.

READ FULL TEXT
research
05/26/2023

BIG-C: a Multimodal Multi-Purpose Dataset for Bemba

We present BIG-C (Bemba Image Grounded Conversations), a large multimoda...
research
04/14/2019

UR-FUNNY: A Multimodal Language Dataset for Understanding Humor

Humor is a unique and creative communicative behavior displayed during s...
research
08/17/2022

Multimodal Lecture Presentations Dataset: Understanding Multimodality in Educational Slides

Lecture slide presentations, a sequence of pages that contain text and f...
research
03/27/2023

IRFL: Image Recognition of Figurative Language

Figures of speech such as metaphors, similes, and idioms allow language ...
research
12/30/2020

Accurate Word Representations with Universal Visual Guidance

Word representation is a fundamental component in neural language unders...
research
04/05/2023

Unleashing the Power of ChatGPT for Translation: An Empirical Study

The recently released ChatGPT has demonstrated surprising abilities in n...
research
04/14/2017

ShapeWorld - A new test methodology for multimodal language understanding

We introduce a novel framework for evaluating multimodal deep learning m...

Please sign up or login with your details

Forgot password? Click here to reset