A common assumption when training embodied agents is that the impact of ...
We have observed significant progress in visual navigation for embodied ...
The domain of Embodied AI, in which agents learn to complete tasks throu...
We introduce a language generative model framework for generating a styl...
In this paper we address the problem of visual reaction: the task of int...
Visual place recognition is challenging, especially when only a few pla...
Narrated 360 videos are typically provided in many touring scenarios to ...
For survival, a living agent must have the ability to assess risk (1) by...
We propose a scalable approach to learn video-based question answering (...
A great video title describes the most salient event compactly and captu...