Outside-knowledge visual question answering is a challenging task that r...
Despite the excellent performance of large-scale vision-language pre-tra...
Visual Question Answering (VQA) models are prone to learn the shortcut s...
Models for Visual Question Answering (VQA) often rely on the spurious co...
Visual dialog has witnessed great progress after introducing various vis...
While sophisticated Visual Question Answering models have achieved remar...
Emotion Recognition in Conversation (ERC) is a more challenging task tha...
Zero-shot intent detection (ZSID) aims to deal with the continuously eme...
Commonsense Reading Comprehension (CRC) is a significantly challenging t...