GEM: A General Evaluation Benchmark for Multimodal Tasks

06/18/2021
by Lin Su, et al.

In this paper, we present GEM, a General Evaluation benchmark for Multimodal tasks. Unlike existing benchmarks such as GLUE, SuperGLUE, XGLUE and XTREME, which mainly focus on natural language tasks, GEM is a large-scale vision-language benchmark consisting of GEM-I for image-language tasks and GEM-V for video-language tasks. Compared with existing multimodal datasets such as MSCOCO and Flickr30K for image-language tasks, and YouCook2 and MSR-VTT for video-language tasks, GEM is not only the largest vision-language dataset covering both image-language and video-language tasks, but is also labeled in multiple languages. We also provide two baseline models for this benchmark. We will release the dataset, code and baseline models, aiming to advance the development of multilingual multimodal research.
