Transductive Visual Verb Sense Disambiguation

12/20/2020
by Sebastiano Vascon, et al.

Verb Sense Disambiguation is a well-known task in NLP whose aim is to find the correct sense of a verb in a sentence. Recently, this problem has been extended to a multimodal scenario by exploiting both textual and visual features of ambiguous verbs, leading to a new problem, Visual Verb Sense Disambiguation (VVSD). Here, the sense of a verb is assigned by considering the content of an image paired with it rather than a sentence in which the verb appears. Annotating a dataset for this task is more complex than for textual disambiguation, because assigning the correct sense to an <image, verb> pair requires non-trivial linguistic and visual skills. In this work, unlike previous literature, the VVSD task is performed in a transductive semi-supervised learning (SSL) setting, in which only a small amount of labeled information is required, greatly reducing the need for annotated data. The disambiguation process is based on a graph-based label propagation method that takes into account unimodal or multimodal representations of <image, verb> pairs. Experiments were carried out on the recently published VerSe dataset, the only available dataset for this task. The results outperform the current state of the art by a large margin while using only a small fraction of labeled samples per sense. Code available: https://github.com/GiBg1aN/TVVSD.
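
To make the idea of transductive, graph-based label propagation concrete, the following is a minimal sketch of a generic label-spreading iteration over a cosine-similarity graph. It is not the authors' method or code (see the repository linked above for that). The function name `propagate_labels`, the parameters `alpha` and `n_iter`, and the choice of cosine similarity are all assumptions made for illustration. Each row of `features` stands for the embedding of one <image, verb> pair, and a label of -1 marks an unlabeled pair.

```python
import numpy as np


def propagate_labels(features, labels, n_classes, alpha=0.9, n_iter=50):
    """Generic label spreading on a similarity graph (illustrative only).

    features : (n, d) float array, one embedding per <image, verb> pair
    labels   : (n,) int array, sense index for labeled pairs, -1 otherwise
    Returns an (n,) int array with a predicted sense for every pair.
    """
    features = np.asarray(features, dtype=float)
    labels = np.asarray(labels, dtype=int)

    # Cosine similarity as edge weight (an assumption, not from the paper).
    norms = np.linalg.norm(features, axis=1, keepdims=True)
    unit = features / np.maximum(norms, 1e-12)
    weights = np.clip(unit @ unit.T, 0.0, None)
    np.fill_diagonal(weights, 0.0)  # no self-loops

    # Symmetric normalisation: S = D^{-1/2} W D^{-1/2}.
    degree = weights.sum(axis=1)
    inv_sqrt = 1.0 / np.sqrt(np.maximum(degree, 1e-12))
    s = weights * inv_sqrt[:, None] * inv_sqrt[None, :]

    # One-hot rows for labeled pairs, all-zero rows for unlabeled ones.
    y0 = np.zeros((len(labels), n_classes))
    labeled = labels >= 0
    y0[labeled, labels[labeled]] = 1.0

    # Repeatedly mix neighbour scores with the initial label information.
    scores = y0.copy()
    for _ in range(n_iter):
        scores = alpha * (s @ scores) + (1.0 - alpha) * y0

    predicted = scores.argmax(axis=1)
    predicted[labeled] = labels[labeled]  # keep the known senses fixed
    return predicted
```

In this sketch, a single labeled pair per sense is enough to label a whole cluster of similar pairs, which is the sense in which a transductive setting reduces the annotation effort.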

research
03/30/2016

Unsupervised Visual Sense Disambiguation for Verbs using Multimodal Embeddings

We introduce a new task, visual sense disambiguation for verbs: given an...
research
08/17/2022

Semi-supervised Learning with Deterministic Labeling and Large Margin Projection

The centrality and diversity of the labeled data are very influential to...
research
12/01/2019

Integrate Image Representation to Text Model on Sentence Level: a Semi-supervised Framework

Integrating visual features has been proved useful in language represent...
research
11/15/2017

Dual-Path Convolutional Image-Text Embedding

This paper considers the task of matching images and sentences. The chal...
research
10/10/2022

SelfMix: Robust Learning Against Textual Label Noise with Self-Mixup Training

The conventional success of textual classification relies on annotated d...
research
07/06/2023

T-MARS: Improving Visual Representations by Circumventing Text Feature Learning

Large web-sourced multimodal datasets have powered a slew of new methods...
