Paraphrase Acquisition from Image Captions

01/26/2023
by Marcel Gohsen, et al.

We propose to use captions from the Web as a previously underutilized resource for paraphrases (i.e., texts with the same "message") and to create and analyze a corresponding dataset. When an image is reused on the Web, it is often assigned a new, original caption. We hypothesize that different captions for the same image naturally form a set of mutual paraphrases. To demonstrate the suitability of this idea, we analyze captions in the English Wikipedia, where editors frequently relabel the same image for different articles. The paper introduces the underlying mining technology and compares known paraphrase corpora to our new resource with respect to their syntactic and semantic paraphrase similarity. In this context, we introduce characteristic maps along the two similarity dimensions to identify the style of paraphrases coming from different sources. An annotation study demonstrates the high reliability of the algorithmically determined characteristic maps.
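
The core hypothesis, that different captions of the same image form a set of mutual paraphrases, can be illustrated with a minimal sketch. This is not the paper's mining pipeline: the function name `paraphrase_candidates`, the `(image_id, caption)` record layout, and the whitespace normalization are assumptions made purely for illustration.

```python
from collections import defaultdict
from itertools import combinations


def paraphrase_candidates(records):
    """Pair up distinct captions that were assigned to the same image.

    `records` is an iterable of (image_id, caption) tuples. This layout is a
    hypothetical stand-in for whatever a real caption-mining step would emit.
    """
    captions_by_image = defaultdict(set)
    for image_id, caption in records:
        text = " ".join(caption.split())  # collapse runs of whitespace
        if text:  # skip empty captions
            captions_by_image[image_id].add(text)

    pairs = []
    for image_id, texts in captions_by_image.items():
        # Every unordered pair of distinct captions is a paraphrase candidate.
        for first, second in combinations(sorted(texts), 2):
            pairs.append((image_id, first, second))
    return pairs


records = [
    ("img1", "A cat on a mat"),
    ("img1", "Cat sitting on a mat"),
    ("img1", "A cat on a  mat"),  # duplicate after whitespace normalization
    ("img2", "Eiffel Tower"),     # only one caption, so no pair
]
pairs = paraphrase_candidates(records)
```

An image with n distinct captions yields n(n-1)/2 candidate pairs. Such pairs are only candidates: scoring them for syntactic and semantic similarity, as the abstract describes, would be a separate step.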


