Self-Contained Entity Discovery from Captioned Videos

08/13/2022
by   Melika Ayoughi, et al.

This paper introduces the task of visual named entity discovery in videos without the need for task-specific supervision or task-specific external knowledge sources. Assigning specific names to entities (e.g., faces, scenes, or objects) in video frames is a long-standing challenge. Commonly, this problem is addressed as a supervised learning objective by manually annotating faces with entity labels. To bypass the annotation burden of this setup, several works have investigated the problem by utilizing external knowledge sources such as movie databases. While effective, such approaches do not work when task-specific knowledge sources are not provided, and they can only be applied to movies and TV series. In this work, we take the problem a step further and propose to discover entities in videos using only the videos themselves and their corresponding captions or subtitles. We introduce a three-stage method where we (i) create bipartite entity-name graphs from frame-caption pairs, (ii) find visual entity agreements, and (iii) refine the entity assignment through entity-level prototype construction. To tackle this new problem, we outline two new benchmarks, SC-Friends and SC-BBT, based on the Friends and Big Bang Theory TV series. Experiments on the benchmarks demonstrate the ability of our approach to discover which named entity belongs to which face or scene, with an accuracy close to a supervised oracle, just from the multimodal information present in videos. Additionally, our qualitative examples show the potential challenges of self-contained discovery of any visual entity for future work. The code and the data are available on GitHub.
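To make the three stages above more concrete, here is a minimal toy sketch in Python. It is not the authors' implementation: the function name `discover`, the greedy cosine-threshold clustering, the majority vote, and the hand-made 2-D "face embeddings" are all assumptions chosen for illustration, and the abstract does not specify how each stage is actually realized.

```python
from collections import Counter, defaultdict
from math import sqrt


def cosine(a, b):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b)))


def mean(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def discover(frames, sim_threshold=0.9):
    """Hypothetical three-stage sketch; `frames` is a list of
    (face_embeddings, caption_names) pairs."""
    # Stage (i): bipartite graph. Every face is linked to every name
    # mentioned in the caption of its own frame.
    faces, candidates = [], []
    for embeddings, names in frames:
        for emb in embeddings:
            faces.append(emb)
            candidates.append(set(names))

    # Stage (ii): visual agreement (assumed here to be greedy clustering).
    # A face joins the first cluster whose first member is similar enough.
    clusters = []
    for i, emb in enumerate(faces):
        for cluster in clusters:
            if cosine(emb, faces[cluster[0]]) >= sim_threshold:
                cluster.append(i)
                break
        else:
            clusters.append([i])

    # Each cluster takes the name that co-occurs most often with its faces.
    labels = [None] * len(faces)
    for cluster in clusters:
        votes = Counter(name for i in cluster for name in candidates[i])
        if votes:
            winner = votes.most_common(1)[0][0]
            for i in cluster:
                labels[i] = winner

    # Stage (iii): build one prototype per name, then reassign each face
    # to the nearest prototype among the names in its caption (or among
    # all prototypes when the caption mentions no name).
    by_name = defaultdict(list)
    for emb, name in zip(faces, labels):
        if name is not None:
            by_name[name].append(emb)
    prototypes = {name: mean(embs) for name, embs in by_name.items()}

    refined = []
    for emb, cands in zip(faces, candidates):
        pool = [n for n in cands if n in prototypes] or list(prototypes)
        best = max(pool, key=lambda n: cosine(emb, prototypes[n])) if pool else None
        refined.append(best)
    return refined


# Toy data: two visually distinct identities and ambiguous captions.
frames = [
    ([[1.0, 0.0]], ["Ross"]),
    ([[0.98, 0.05], [0.0, 1.0]], ["Ross", "Rachel"]),
    ([[0.05, 0.97]], ["Rachel"]),
    ([[0.1, 1.0], [0.99, 0.1]], ["Rachel", "Ross"]),
    ([[0.97, 0.08]], []),  # face shown without any name in the caption
]
print(discover(frames))
```

In this toy setting, captions that mention both names are disambiguated by the frames in which a single name appears, and the face with no caption name is labeled through its nearest prototype. Real face embeddings, tie handling, and scene entities would need considerably more care than this sketch provides.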


