Fashionformer: A Simple, Effective and Unified Baseline for Human Fashion Segmentation and Recognition

04/10/2022
by Shilin Xu, et al.

Human fashion understanding is an important computer vision task because it provides comprehensive information that can be used in real-world applications. In this work, we focus on joint human fashion segmentation and attribute recognition. Unlike previous works that model each task separately as a multi-head prediction problem, our insight is to bridge the two tasks with one unified model via vision transformer modeling so that each task benefits the other. In particular, we introduce an object query for segmentation and an attribute query for attribute prediction. Both queries can be linked to their corresponding features via mask prediction. We then adopt a two-stream query learning framework to learn decoupled query representations. For the attribute stream, we design a novel Multi-Layer Rendering module to explore more fine-grained features. The decoder design shares the same spirit as DETR, so we name the proposed method Fashionformer. Extensive experiments on three human fashion datasets, Fashionpedia, ModaNet and DeepFashion, illustrate the effectiveness of our approach. In particular, with the same backbone, our method achieves a relative improvement of 10% over previous works on a joint metric (AP^{mask}_{IoU+F1}) that covers both segmentation and attribute recognition. To the best of our knowledge, ours is the first unified end-to-end vision transformer framework for human fashion analysis. We hope this simple yet effective method can serve as a new, flexible baseline for fashion analysis. Code will be available at https://github.com/xushilin1/FashionFormer.
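The abstract states that the object query and the attribute query are both linked to image features through mask prediction. The following is a minimal sketch of that idea. It assumes a generic masked-average-pooling step, which is a common choice in query-based segmenters but not necessarily the exact Fashionformer operator. The names `masked_average_pool` and `update_queries` and the additive update are hypothetical illustrations. The Multi-Layer Rendering module is not modeled here.

```python
import math
from typing import List, Tuple

Vector = List[float]
MaskLogits = List[List[float]]          # H x W per-query mask logits
FeatureMap = List[List[List[float]]]    # H x W x C image features


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def masked_average_pool(features: FeatureMap, mask_logits: MaskLogits) -> Vector:
    """Average the C-dim features, weighting each pixel by its soft mask value.

    Assumption: a sigmoid soft mask plus weighted averaging links a query's
    predicted mask to the features it reads.
    """
    channels = len(features[0][0])
    pooled = [0.0] * channels
    total_weight = 0.0
    for row_feats, row_logits in zip(features, mask_logits):
        for feat, logit in zip(row_feats, row_logits):
            w = sigmoid(logit)
            total_weight += w
            for c in range(channels):
                pooled[c] += w * feat[c]
    eps = 1e-6  # avoids division by zero for an all-background mask
    return [v / (total_weight + eps) for v in pooled]


def update_queries(object_q: Vector, attribute_q: Vector,
                   features: FeatureMap,
                   mask_logits: MaskLogits) -> Tuple[Vector, Vector]:
    """Hypothetical two-stream update: both streams read the same masked region.

    The object and attribute queries stay decoupled (separate vectors), but
    each one is updated with features pooled under the shared predicted mask.
    """
    region = masked_average_pool(features, mask_logits)
    new_object_q = [q + r for q, r in zip(object_q, region)]
    new_attribute_q = [q + r for q, r in zip(attribute_q, region)]
    return new_object_q, new_attribute_q


if __name__ == "__main__":
    # 2 x 2 feature map with C = 2 channels.
    feats = [[[1.0, 0.0], [0.0, 1.0]],
             [[1.0, 1.0], [0.0, 0.0]]]
    # A mask that is confidently "on" only at pixel (0, 0).
    logits = [[20.0, -20.0], [-20.0, -20.0]]
    print(masked_average_pool(feats, logits))
```

Because the two streams share the predicted mask, the attribute query attends to the same garment region that the segmentation stream delineates. Keeping two separate query vectors is one simple way to reflect the "decoupled query representations" the abstract describes.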
