The recent breakthroughs in natural language processing for model pretra...
This paper studies a simple extension of image-based Masked Autoencoders...
State-of-the-art vision and vision-and-language models rely on large-sca...
We present the Habitat-Matterport 3D (HM3D) dataset. HM3D is a large-sca...
We introduce Habitat 2.0 (H2.0), a simulation platform for training virt...
Performance on the most commonly used Visual Question Answering dataset ...
A crucial component for the scene-text-based reasoning required for Text...