Spatio-temporal grounding describes the task of localizing events in spa...
To enable progress towards egocentric agents capable of understanding ev...
Given a long untrimmed video and natural language queries, video groundi...
Identification of brain regions related to the specific neurological dis...
This article aims to summarize recent and ongoing efforts to simulate co...
Multi-modal learning from video data has seen increased attention recent...
Videos are a rich source for self-supervised learning (SSL) of visual re...
The task of multimodal learning has seen growing interest recently as ...
In this paper, we explore self-supervised audio-visual models that learn...
Visual and textual modalities contribute complementary information about...
The variational quantum Monte Carlo (VQMC) method received significant a...
Multimodal self-supervised learning is getting more and more attention a...
An identification is found between meta-learning and the problem of dete...
We formulate a practical yet challenging problem: General Partial Label ...
We address the problem of phrase grounding by learning a multi-level com...