GIVL: Improving Geographical Inclusivity of Vision-Language Models with Pre-Training Methods

01/05/2023
by Da Yin, et al.

A key goal for the advancement of AI is to develop technologies that serve the needs not just of one group but of all communities, regardless of their geographical region. In fact, a significant proportion of knowledge is shared locally by people from certain regions but does not apply equally elsewhere because of cultural differences. If a model is unaware of regional characteristics, its performance may vary across regions, resulting in bias against underrepresented groups. We propose GIVL, a Geographically Inclusive Vision-and-Language pre-trained model. Two attributes of geo-diverse visual concepts can help a model learn geo-diverse knowledge: 1) concepts under similar categories have unique knowledge and visual characteristics; 2) concepts with similar visual features may fall into completely different categories. Motivated by these attributes, we design two new pre-training objectives, Image Knowledge Matching (IKM) and Image Edit Checking (IEC), to pre-train GIVL. Compared with similar-sized models pre-trained on a similar scale of data, GIVL achieves state-of-the-art (SOTA) and more balanced performance on geo-diverse V&L tasks.
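The abstract names the two objectives but does not spell out how they are trained. As a rough illustration of the general idea only (predicting whether paired knowledge text matches the image's concept, and whether the image was edited by swapping in a visually similar concept from a different category), here is a minimal PyTorch sketch. The class name `GIVLStyleHeads`, the binary label formulation, and all tensor shapes are hypothetical assumptions, not the authors' actual architecture:

```python
import torch
import torch.nn as nn

class GIVLStyleHeads(nn.Module):
    """Minimal sketch of IKM/IEC-style pre-training objectives, posed
    here as binary classification heads over a fused vision-language
    representation. Illustration only, not the paper's implementation."""

    def __init__(self, hidden_dim: int = 768):
        super().__init__()
        # IKM (Image Knowledge Matching): does the knowledge text paired
        # with the image actually describe the visual concept shown?
        self.ikm_head = nn.Linear(hidden_dim, 2)
        # IEC (Image Edit Checking): was a concept in the image replaced
        # by a visually similar concept from a different category?
        self.iec_head = nn.Linear(hidden_dim, 2)
        self.loss_fn = nn.CrossEntropyLoss()

    def forward(self, fused_cls: torch.Tensor,
                ikm_labels: torch.Tensor,
                iec_labels: torch.Tensor) -> torch.Tensor:
        # fused_cls: (batch, hidden_dim) pooled output of any
        # vision-language encoder; labels: (batch,) int64 in {0, 1}.
        ikm_loss = self.loss_fn(self.ikm_head(fused_cls), ikm_labels)
        iec_loss = self.loss_fn(self.iec_head(fused_cls), iec_labels)
        return ikm_loss + iec_loss

# Toy usage with random tensors standing in for real encoder outputs.
heads = GIVLStyleHeads()
fused = torch.randn(4, 768)
loss = heads(fused, torch.randint(0, 2, (4,)), torch.randint(0, 2, (4,)))
loss.backward()
```

In practice such heads would be trained jointly with the model's other pre-training losses; the sketch above isolates only the two objectives the abstract highlights.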


Related research

08/23/2023 · EVE: Efficient Vision-Language Pre-training with Masked Prediction and Modality-Aware MoE
Building scalable vision-language models to learn from diverse, multimod...

08/10/2022 · Alternating Cross-attention Vision-Language Model for Efficient Learning with Medical Image and Report without Curation
Recent advances in vision-language pre-training have demonstrated astoun...

03/01/2022 · Unsupervised Vision-and-Language Pre-training via Retrieval-based Multi-Granular Alignment
Vision-and-Language (V+L) pre-training models have achieved tremendous s...

03/20/2022 · How does the pre-training objective affect what large language models learn about linguistic properties?
Several pre-training objectives, such as masked language modeling (MLM),...

11/16/2021 · Multi-Grained Vision Language Pre-Training: Aligning Texts with Visual Concepts
Most existing methods in vision language pre-training rely on object-cen...

06/08/2023 · COURIER: Contrastive User Intention Reconstruction for Large-Scale Pre-Train of Image Features
With the development of the multi-media internet, visual characteristics...

03/16/2023 · Patch-Token Aligned Bayesian Prompt Learning for Vision-Language Models
For downstream applications of vision-language pre-trained models, there...
