WikiWeb2M: A Page-Level Multimodal Wikipedia Dataset

05/09/2023
by Andrea Burns, et al.

Webpages have been a rich resource for language and vision-language tasks. Yet only pieces of webpages are kept: image-caption pairs, long text articles, or raw HTML, never all in one place. As a result, webpage tasks have received little attention and structured image-text data has gone underused. To study multimodal webpage understanding, we introduce the Wikipedia Webpage 2M (WikiWeb2M) suite, the first to retain the full set of images, text, and structure data available in a page. WikiWeb2M can be used for tasks like page description generation, section summarization, and contextual image captioning.
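To make the page-level framing concrete, the sketch below models what a single page record with text, images, and section structure might look like, and how the three tasks named in the abstract would read from it. The class and field names here are illustrative assumptions for exposition, not the released WikiWeb2M schema or format.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Tuple

# Illustrative sketch only: these dataclasses approximate a page-level
# record that keeps text, images, and section structure together.
# Field names are assumptions, not the released WikiWeb2M schema.

@dataclass
class ImageRecord:
    url: str
    alt_text: str  # reference caption text, if available


@dataclass
class SectionRecord:
    title: str
    body_text: str
    images: List[ImageRecord] = field(default_factory=list)


@dataclass
class PageRecord:
    page_url: str
    page_title: str
    sections: List[SectionRecord] = field(default_factory=list)

    def page_description_input(self) -> str:
        # Page description generation: condition on the whole page
        # (all section titles and text) to produce a page-level summary.
        return "\n".join(f"{s.title}\n{s.body_text}" for s in self.sections)

    def section_summarization_input(self, idx: int) -> str:
        # Section summarization: target one section, with the rest of
        # the page available as context.
        target = self.sections[idx]
        context = " ".join(
            s.body_text for i, s in enumerate(self.sections) if i != idx
        )
        return f"TARGET: {target.title}\n{target.body_text}\nCONTEXT: {context}"

    def contextual_captioning_pairs(self) -> Iterator[Tuple[str, str, str]]:
        # Contextual image captioning: each image is paired with its
        # surrounding section text rather than treated as an isolated
        # image-caption pair.
        for section in self.sections:
            for image in section.images:
                yield image.url, section.title, section.body_text


if __name__ == "__main__":
    page = PageRecord(
        page_url="https://en.wikipedia.org/wiki/Example",
        page_title="Example",
        sections=[
            SectionRecord(
                title="History",
                body_text="Text of the article's history section ...",
                images=[
                    ImageRecord(
                        url="https://example.org/img.jpg",
                        alt_text="An example image",
                    )
                ],
            )
        ],
    )
    print(page.page_description_input())
    for url, sec_title, _sec_text in page.contextual_captioning_pairs():
        print(url, sec_title)
```

The point of the sketch is simply that all three tasks draw on one shared, structured page record, which is what distinguishes a page-level dataset from isolated image-caption or text-only corpora.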

