We study the problem of synthesizing immersive 3D indoor scenes from one...
We study the automatic generation of navigation instructions from 360-de...
Speech-based image retrieval has been studied as a proxy for joint repre...
Image captioning datasets have proven useful for multimodal representati...
Conventional spoken language understanding systems consist of two main c...