A big convergence of model architectures across language, vision, speech...
Pretrained general-purpose language models can achieve state-of-the-art
...
There has been an increased interest in multimodal language processing
i...
Humans convey their intentions through the usage of both verbal and nonv...
Multimodal research is an emerging field of artificial intelligence, and...