This research aims to improve the accuracy of complex volleyball predict...
This research is intended to accomplish two goals: The first goal is to ...
Multi-view learning is a learning problem that utilizes the various repr...
Multi-view learning accomplishes the task objectives of classification b...
We present a new large-scale multilingual video description dataset, VAT...
This research strives for natural language moment retrieval in long, unt...
Vision-language navigation (VLN) is the task of navigating an embodied a...
Recognizing instances at different scales simultaneously is a fundamenta...
In this paper, we present a novel Single Shot multi-Span Detector for te...
Though impressive results have been achieved in visual captioning, the t...
A major challenge for video captioning is to combine audio and visual cu...
Video captioning is the task of automatically generating a textual descr...
In this paper we introduce a fully end-to-end approach for visual tracki...
Transferring artistic styles onto everyday photographs has become an ext...