Disentangling visual and written concepts in CLIP

06/15/2022
by Joanna Materzyńska, et al.

The CLIP network measures the similarity between natural text and images; in this work, we investigate how the representations of word images and natural images are entangled in its image encoder. First, we find that the image encoder can match word images with natural images of the scenes those words describe. This is consistent with previous research suggesting that the meaning and the spelling of a word might be entangled deep within the network. On the other hand, we also find that CLIP has a strong ability to match nonsense words, which suggests that the processing of letters is separated from the processing of their meaning. To explicitly determine whether the spelling capability of CLIP is separable, we devise a procedure for identifying representation subspaces that selectively isolate or eliminate spelling capabilities. We benchmark our methods on a range of retrieval tasks, and we also test them by measuring the appearance of text in CLIP-guided generated images. We find that our methods are able to cleanly separate the spelling capabilities of CLIP from the visual processing of natural images.
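
To make the idea of a subspace that "isolates" or "eliminates" a capability concrete, the following is a minimal sketch, not the procedure used in the paper (the abstract does not specify it). It assumes a hypothetical set of "spelling directions" in embedding space, uses random vectors in place of real CLIP embeddings, and swaps in a plain orthogonal projection: one function keeps only the component of each embedding inside the subspace, the other removes it. All names (`spelling_dirs`, `keep_subspace`, `remove_subspace`, `cosine_match`) are illustrative.

```python
import numpy as np


def orthonormal_basis(directions):
    """Return a (d, k) orthonormal basis for the span of k direction vectors."""
    q, _ = np.linalg.qr(directions.T)  # reduced QR: q has shape (d, k)
    return q


def keep_subspace(x, basis):
    """Project each row of x onto the subspace ("isolate")."""
    return x @ basis @ basis.T


def remove_subspace(x, basis):
    """Project each row of x onto the orthogonal complement ("eliminate")."""
    return x - keep_subspace(x, basis)


def cosine_match(queries, gallery):
    """For each query row, return the index of the most similar gallery row."""
    q = queries / np.linalg.norm(queries, axis=1, keepdims=True)
    g = gallery / np.linalg.norm(gallery, axis=1, keepdims=True)
    return np.argmax(q @ g.T, axis=1)


# Synthetic stand-ins: these are NOT real CLIP embeddings.
rng = np.random.default_rng(0)
d, k, n = 64, 4, 10
spelling_dirs = rng.normal(size=(k, d))  # hypothetical "spelling" directions
basis = orthonormal_basis(spelling_dirs)
embeddings = rng.normal(size=(n, d))

isolated = keep_subspace(embeddings, basis)      # only the subspace component
eliminated = remove_subspace(embeddings, basis)  # everything except it
matches = cosine_match(embeddings, embeddings)   # retrieval by cosine similarity
```

In this sketch the two projected parts sum back to the original embedding, and the "eliminated" part has no component along the chosen directions. A retrieval benchmark in this spirit would compare `cosine_match` results before and after each projection.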
