Employing additional multimodal information to improve automatic speech
...
Large language models (LLMs) have demonstrated remarkable language abili...
Large-scale pre-trained language models (PLMs) with powerful language
mo...
In the past few years, the emergence of vision-language pre-training (VL...
Visual Dialog is a challenging vision-language task since the visual dia...
In the past few years, the emergence of pre-training models has brought
...
Visual dialogue is a challenging task since it needs to answer a series ...
Visual dialog, which aims to hold a meaningful conversation with humans ...
Visual dialog is challenging since it needs to answer a series of cohere...
Visual Dialog is a vision-language task that requires an AI agent to eng...