VICTOR: Visual Incompatibility Detection with Transformers and Fashion-specific contrastive pre-training

For fashion outfits to be considered aesthetically pleasing, the garments that constitute them need to be compatible in terms of visual aspects such as style, category and color. Previous works have defined visual compatibility as a binary classification task, with the items in an outfit being considered either fully compatible or fully incompatible. However, this is not applicable to Outfit Maker applications, where users create their own outfits and need to know which specific items may be incompatible with the rest of the outfit. To address this, we propose the Visual InCompatibility TransfORmer (VICTOR), which is optimized for two tasks: 1) overall compatibility as regression and 2) the detection of mismatching items. We also utilize fashion-specific contrastive language-image pre-training for fine-tuning computer vision neural networks on fashion imagery. We build upon the Polyvore outfit benchmark to generate partially mismatching outfits, creating a new dataset termed Polyvore-MISFITs, which is used to train VICTOR. A series of ablation and comparative analyses show that the proposed architecture can compete with and even surpass the current state-of-the-art on Polyvore datasets while reducing the instance-wise floating-point operations by 88%. We release our code at https://github.com/stevejpapad/Visual-InCompatibility-Transformer
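
To make the two training targets concrete, the following is a minimal sketch of how a partially mismatching outfit could be derived from a compatible one. It is an illustration under stated assumptions, not the released implementation: the function name `make_misfit`, the item identifiers, the uniform random replacement, and the scoring rule (fraction of items left unchanged) are all hypothetical choices made for this example.

```python
import random


def make_misfit(outfit, replacement_pool, num_swaps, rng):
    """Replace `num_swaps` items of a compatible outfit with random items.

    Returns (new_outfit, mismatch_labels, compatibility_score).
    ASSUMPTION: the score is the fraction of items left untouched; the
    abstract only states that compatibility is treated as regression.
    """
    if not 0 <= num_swaps <= len(outfit):
        raise ValueError("num_swaps must be between 0 and len(outfit)")

    # Only consider replacements that are not already part of the outfit.
    candidates = [item for item in replacement_pool if item not in outfit]
    if len(candidates) < num_swaps:
        raise ValueError("not enough replacement candidates")

    # Pick which positions to corrupt and which items to put there.
    swap_positions = rng.sample(range(len(outfit)), num_swaps)
    replacements = rng.sample(candidates, num_swaps)
    swapped = dict(zip(swap_positions, replacements))

    new_outfit = [swapped.get(i, item) for i, item in enumerate(outfit)]
    # Per-item target for the mismatch-detection task: 1 = replaced item.
    mismatch_labels = [1 if i in swapped else 0 for i in range(len(outfit))]
    # Outfit-level target for the regression task.
    compatibility_score = 1.0 - num_swaps / len(outfit)
    return new_outfit, mismatch_labels, compatibility_score


# Hypothetical item identifiers, used only for illustration.
outfit = ["top_1", "jeans_4", "boots_2", "bag_9"]
pool = ["top_7", "skirt_3", "heels_5"]
new_outfit, labels, score = make_misfit(outfit, pool, 2, random.Random(0))
```

Under these assumptions, each generated sample carries one continuous outfit-level target and one binary label per item, which matches the two tasks named in the abstract.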
