The self-media era provides us with a tremendous number of high-quality videos. Unfortuna...
Current research on cross-modal retrieval is mostly English-oriented, as...
Attribute-specific fashion retrieval (ASFR) is a challenging information...
This paper addresses the temporal sentence grounding (TSG). Although exi...
This paper targets unsupervised skeleton-based action representation lea...
Despite the recent developments in the field of cross-modal retrieval, t...
We address the problem of temporal sentence localization in videos (TSLV...
This paper strives to predict fine-grained fashion similarity. In this s...
This paper addresses the problem of temporal sentence grounding (TSG), w...
This paper aims for the language-based product image retrieval task. The...
This paper targets the task of language-based moment localization. The l...
Temporal language localization in videos aims to ground one video segmen...
Query-based moment localization is a new task that localizes the best ma...
The rapid growth of user-generated videos on the Internet has intensifie...
This paper strives to learn fine-grained fashion similarity. In this sim...
This paper attacks the challenging problem of zero-example video retriev...
Attention mechanisms have been widely applied in the Visual Question Ans...
This paper strives to find amidst a set of sentences the one best descri...
In order to retrieve unlabeled images by textual queries, cross-media si...
Image captioning has so far been explored mostly in English, as most ava...
Unsupervised pre-training was a critical technique for training deep neu...
This paper strives to find the sentence best describing the content of a...