On the Difference of BERT-style and CLIP-style Text Encoders

06/06/2023
by Zhihong Chen, et al.

Masked language modeling (MLM) has been one of the most popular pretraining recipes in natural language processing, with BERT as one of its representative models. Recently, contrastive language-image pretraining (CLIP) has also attracted attention, especially for its vision models, which achieve excellent performance on a broad range of vision tasks. However, few studies are dedicated to the text encoders learned by CLIP. In this paper, we analyze the difference between BERT-style and CLIP-style text encoders through three experiments: (i) general text understanding, (ii) vision-centric text understanding, and (iii) text-to-image generation. Experimental analyses show that although CLIP-style text encoders underperform BERT-style ones on general text understanding tasks, they possess a unique ability for cross-modal association, i.e., synesthesia, which is closer to how human senses work.
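To make the comparison concrete, the sketch below shows one plausible way to extract sentence embeddings from a BERT-style encoder and a CLIP-style text encoder side by side, using Hugging Face transformers. The model checkpoints, mean-pooling choice, and example sentences are illustrative assumptions and are not taken from the paper's experimental setup.

```python
# Minimal sketch (assumed setup, not the paper's method): compare sentence
# embeddings from a BERT-style encoder and a CLIP-style text encoder.
import torch
from transformers import (
    AutoTokenizer, AutoModel, CLIPTokenizer, CLIPTextModelWithProjection
)

sentences = ["a photo of a red apple on a table",
             "the stock market fell sharply today"]

# BERT-style encoder: mean-pool the last hidden states over real tokens.
bert_tok = AutoTokenizer.from_pretrained("bert-base-uncased")
bert = AutoModel.from_pretrained("bert-base-uncased")
with torch.no_grad():
    enc = bert_tok(sentences, padding=True, return_tensors="pt")
    hidden = bert(**enc).last_hidden_state          # (batch, seq, dim)
    mask = enc.attention_mask.unsqueeze(-1)         # (batch, seq, 1)
    bert_emb = (hidden * mask).sum(1) / mask.sum(1)

# CLIP-style text encoder: use the projected text embedding
# from the contrastive head.
clip_tok = CLIPTokenizer.from_pretrained("openai/clip-vit-base-patch32")
clip_text = CLIPTextModelWithProjection.from_pretrained(
    "openai/clip-vit-base-patch32"
)
with torch.no_grad():
    enc = clip_tok(sentences, padding=True, return_tensors="pt")
    clip_emb = clip_text(**enc).text_embeds         # (batch, proj_dim)

# Cosine similarity between the two sentences under each encoder.
cos = torch.nn.functional.cosine_similarity
print("BERT similarity:", cos(bert_emb[0], bert_emb[1], dim=0).item())
print("CLIP similarity:", cos(clip_emb[0], clip_emb[1], dim=0).item())
```

Probing how these similarity scores differ across visually grounded and purely abstract sentences is one simple way to study the general versus vision-centric text understanding contrast the paper examines.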
