Vision and language models (VL) are known to exploit unrobust indicators...
We propose VALSE (Vision And Language Structured Evaluation), a novel
be...
Large-scale pretraining is fast becoming the norm in Vision-Language (VL...
Recent years have seen rapid developments in the field of multimodal
...
We investigate the ability of general-purpose pretrained vision and lang...
Different metrics have been proposed to compare Abstract Meaning
Represe...