Describing Sets of Images with Textual-PCA

10/21/2022
by Oded Hupert, et al.

We seek to semantically describe a set of images, capturing both the attributes of single images and the variations within the set. Our procedure is analogous to Principal Component Analysis (PCA), with generated phrases taking the role of the projection vectors. First, we generate a centroid phrase: the phrase with the largest average semantic similarity to the images in the set. Both the similarity computation and the generation are based on pretrained vision-language models. Then, using the same models, we generate the phrase that produces the highest variance among the similarity scores. The next phrase maximizes the variance subject to being orthogonal, in the latent space, to the highest-variance phrase, and the process continues in the same manner. Our experiments show that the method captures the essence of image sets and describes the individual elements in a semantically meaningful way within the context of the entire set. Our code is available at: https://github.com/OdedH/textual-pca.
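
To make the analogy with PCA concrete, the following is a minimal sketch of the selection logic only, not the authors' implementation. It makes several simplifying assumptions: phrases are chosen from a fixed candidate pool rather than generated, image and phrase embeddings are supplied as plain NumPy arrays standing in for the outputs of a pretrained vision-language model, similarity is taken to be cosine similarity, and orthogonality is approximated by projecting previously chosen directions out of the remaining candidates. The function name `textual_pca` and its parameters are hypothetical.

```python
import numpy as np


def normalize(x):
    """Scale each row to unit length so dot products are cosine similarities."""
    return x / np.linalg.norm(x, axis=-1, keepdims=True)


def textual_pca(image_emb, phrase_emb, n_components=2):
    """Return (centroid_index, component_indices) over a candidate phrase pool.

    image_emb:  (n_images, d) array, assumed to come from an image encoder.
    phrase_emb: (n_phrases, d) array, assumed to come from a text encoder
                that shares the same latent space.
    """
    images = normalize(np.asarray(image_emb, dtype=float))
    phrases = normalize(np.asarray(phrase_emb, dtype=float))

    # Centroid phrase: largest average similarity to the images in the set.
    sims = images @ phrases.T                      # (n_images, n_phrases)
    centroid = int(np.argmax(sims.mean(axis=0)))

    components = []
    chosen = np.zeros(len(phrases), dtype=bool)
    residual = phrases.copy()                      # candidates after deflation
    for _ in range(min(n_components, len(phrases))):
        # Variance across the image set of each candidate's similarity scores.
        variance = (images @ residual.T).var(axis=0)
        variance[chosen] = -np.inf                 # never pick a phrase twice
        best = int(np.argmax(variance))
        norm = np.linalg.norm(residual[best])
        if norm < 1e-12:                           # nothing left to explain
            break
        components.append(best)
        chosen[best] = True
        # Assumed orthogonality step: remove the chosen direction from every
        # candidate, so later picks are scored on their orthogonal part only.
        direction = residual[best] / norm
        residual = residual - np.outer(residual @ direction, direction)
    return centroid, components


if __name__ == "__main__":
    # Toy 3-d "latent space": candidate phrases are the coordinate axes.
    toy_phrases = np.eye(3)
    toy_images = np.array([
        [1.0, 2.0, 0.1],
        [1.0, -2.0, -0.1],
        [1.0, 1.0, 0.1],
        [1.0, -1.0, -0.1],
    ])
    print(textual_pca(toy_images, toy_phrases, n_components=2))
```

In the toy example every image shares the first axis, so that candidate acts as the centroid, while the second axis separates the images most strongly and is picked as the first high-variance phrase. A generative version would replace the `argmax` over a pool with a search over phrases guided by a language model.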
