The video-based dialog task is a challenging multimodal learning task that h...
We introduce a Transformer-based 6D Object Pose Estimation framework
Vid...
We introduce the task of scene-aware dialog. Given a follow-up question ...
Dialog systems need to understand dynamic visual scenes in order to have...
Scene-aware dialog systems will be able to have conversations with users...