Measuring Progress in Fine-grained Vision-and-Language Understanding

05/12/2023
by Emanuele Bugliarello, et al.

While pretraining on large-scale image-text data from the Web has facilitated rapid progress on many vision-and-language (V&L) tasks, recent work has demonstrated that pretrained models lack "fine-grained" understanding, such as the ability to recognise relationships, verbs, and numbers in images. This has resulted in an increased interest in the community to either develop new benchmarks or models for such capabilities. To better understand and quantify progress in this direction, we investigate four competitive V&L models on four fine-grained benchmarks. Through our analysis, we find that X-VLM (Zeng et al., 2022) consistently outperforms other baselines, and that modelling innovations can impact performance more than scaling Web data, which even degrades performance sometimes. Through a deeper investigation of X-VLM, we highlight the importance of both novel losses and rich data sources for learning fine-grained skills. Finally, we inspect training dynamics, and discover that for some tasks, performance peaks early in training or significantly fluctuates, never converging.


