Conditional Cross Attention Network for Multi-Space Embedding without Entanglement in Only a SINGLE Network

07/25/2023
by Chull Hwan Song, et al.

Many studies in vision tasks have aimed to create effective embedding spaces for single-label object prediction within an image. In reality, however, most objects possess multiple specific attributes, such as shape, color, and length, and each attribute comprises several classes. To apply models in real-world scenarios, it is essential to distinguish between the granular components of an object. Conventional approaches that embed multiple specific attributes with a single network often suffer from entanglement, where the fine-grained features of each attribute cannot be identified separately. To address this problem, we propose a Conditional Cross-Attention Network that induces disentangled multi-space embeddings for various specific attributes with only a single backbone. First, we employ a cross-attention mechanism to fuse and switch the information of conditions (specific attributes), and we demonstrate its effectiveness through diverse visualization examples. Second, we apply the vision transformer to a fine-grained image retrieval task for the first time and present a framework that is simple yet effective compared to existing methods. Unlike previous studies, whose performance varied depending on the benchmark dataset, our proposed method achieves consistent state-of-the-art performance on the FashionAI, DARN, DeepFashion, and Zappos50K benchmark datasets.
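
The abstract does not spell out the architecture, so the following is only a minimal sketch of the general idea of condition-driven cross-attention, not the authors' implementation. It assumes that a condition (attribute) embedding acts as the query while patch tokens from one shared backbone act as keys and values, so that switching the condition yields a different embedding from the same image features. All names here (`conditional_cross_attention`, `w_q`, `w_k`, `w_v`, the toy dimensions, the random inputs) are hypothetical.

```python
import math
import random


def matvec(matrix, vector):
    """Multiply a (rows x cols) matrix by a vector of length cols."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def conditional_cross_attention(condition_vec, tokens, w_q, w_k, w_v):
    """Single-head cross-attention with the condition as the query.

    Assumption (not confirmed by the abstract): the query comes from the
    attribute condition, while keys and values come from image tokens.
    """
    query = matvec(w_q, condition_vec)
    keys = [matvec(w_k, t) for t in tokens]
    values = [matvec(w_v, t) for t in tokens]
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    weights = softmax(scores)
    # Weighted sum of value vectors gives the condition-specific embedding.
    embedding = [
        sum(w * v[i] for w, v in zip(weights, values))
        for i in range(len(values[0]))
    ]
    return embedding, weights


def random_matrix(rng, rows, cols):
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
dim = 4
tokens = random_matrix(rng, 6, dim)  # toy stand-in for backbone patch tokens
w_q, w_k, w_v = (random_matrix(rng, dim, dim) for _ in range(3))

# Hypothetical condition embeddings, one per attribute.
conditions = {
    "color": random_matrix(rng, 1, dim)[0],
    "shape": random_matrix(rng, 1, dim)[0],
}

results = {}
for name, cond in conditions.items():
    results[name] = conditional_cross_attention(cond, tokens, w_q, w_k, w_v)
    print(name, [round(w, 3) for w in results[name][1]])
```

The image tokens and projection weights are shared across conditions in this sketch; only the query changes, which is one way a single network could produce a separate embedding per attribute.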


research
12/27/2022

Attribute-Guided Multi-Level Attention Network for Fine-Grained Fashion Retrieval

This paper proposes an attribute-guided multi-level attention network (A...
research
02/07/2020

Fine-Grained Fashion Similarity Learning by Attribute-Specific Embedding Network

This paper strives to learn fine-grained fashion similarity. In this sim...
research
06/19/2018

FineTag: Multi-label Retrieval of Attributes at Fine-grained Level in Images

In image retrieval, the features extracted from an item are used to look...
research
04/06/2021

Fine-Grained Fashion Similarity Prediction by Attribute-Specific Embedding Learning

This paper strives to predict fine-grained fashion similarity. In this s...
research
04/17/2023

DETR-based Layered Clothing Segmentation and Fine-Grained Attribute Recognition

Clothing segmentation and fine-grained attribute recognition are challen...
research
05/07/2019

Intentional Attention Mask Transformation for Robust CNN Classification

Convolutional Neural Networks have achieved impressive results in variou...
research
03/25/2019

Predicting Multiple Demographic Attributes with Task Specific Embedding Transformation and Attention Network

Most companies utilize demographic information to develop their strategy...
