Unsupervised Visual Sense Disambiguation for Verbs using Multimodal Embeddings

03/30/2016
by   Spandana Gella, et al.
0

We introduce a new task, visual sense disambiguation for verbs: given an image and a verb, assign the correct sense of the verb, i.e., the one that describes the action depicted in the image. Just as textual word sense disambiguation is useful for a wide range of NLP tasks, visual sense disambiguation can be useful for multimodal tasks such as image retrieval, image description, and text illustration. We introduce VerSe, a new dataset that augments existing multimodal datasets (COCO and TUHOI) with sense labels. We propose an unsupervised algorithm based on Lesk which performs visual sense disambiguation using textual, visual, or multimodal embeddings. We find that textual embeddings perform well when gold-standard textual annotations (object labels and image descriptions) are available, while multimodal embeddings perform well on unannotated images. We also verify our findings by using the textual and multimodal embeddings as features in a supervised setting and analyse the performance of visual sense disambiguation task. VerSe is made publicly available and can be downloaded at: https://github.com/spandanagella/verse.

READ FULL TEXT
research
12/20/2020

Transductive Visual Verb Sense Disambiguation

Verb Sense Disambiguation is a well-known task in NLP, the aim is to fin...
research
04/14/2023

OPI at SemEval 2023 Task 1: Image-Text Embeddings and Multimodal Information Retrieval for Visual Word Sense Disambiguation

The goal of visual word sense disambiguation is to find the image that b...
research
06/24/2023

UAlberta at SemEval-2023 Task 1: Context Augmentation and Translation for Multilingual Visual Word Sense Disambiguation

We describe the systems of the University of Alberta team for the SemEva...
research
08/15/2023

MultiSChuBERT: Effective Multimodal Fusion for Scholarly Document Quality Prediction

Automatic assessment of the quality of scholarly documents is a difficul...
research
08/17/2023

FashionLOGO: Prompting Multimodal Large Language Models for Fashion Logo Embeddings

Logo embedding plays a crucial role in various e-commerce applications b...
research
02/21/2015

Don't Just Listen, Use Your Imagination: Leveraging Visual Common Sense for Non-Visual Tasks

Artificial agents today can answer factual questions. But they fall shor...
research
02/15/2019

Deeply Supervised Multimodal Attentional Translation Embeddings for Visual Relationship Detection

Detecting visual relationships, i.e. <Subject, Predicate, Object> triple...

Please sign up or login with your details

Forgot password? Click here to reset