With the rise in popularity of video-based social media, new categories ...
The counting task, which plays a fundamental rule in numerous applicatio...
Referring video object segmentation aims to segment the object referred ...
Pre-training has been adopted in a few of recent works for
Vision-and-La...
Existing Voice Cloning (VC) tasks aim to convert a paragraph text to a s...
Vision and Language Navigation (VLN) requires an agent to navigate to a
...
Vision-and-language navigation (VLN) is a multimodal task where an agent...
Accuracy of many visiolinguistic tasks has benefited significantly from ...
Vision-and-Language Navigation (VLN) requires an agent to navigate in a
...
Vision-and-Language Navigation (VLN) is unique in that it requires turni...
One of the long-term challenges of robotics is to enable humans to
commu...
With the advantage of high mobility, Unmanned Aerial Vehicles (UAVs) are...
Conventional video segmentation methods often rely on temporal continuit...