
Where to Look and How to Describe: Fashion Image Retrieval with an Attentional Heterogeneous Bilinear Network

by Haibo Su, et al.

Fashion products typically combine a variety of styles across different clothing parts. To distinguish images of different fashion products, we need to extract both appearance (i.e., "how to describe") and localization (i.e., "where to look") information, as well as their interactions. To this end, we propose a biologically inspired framework for image-based fashion product retrieval, which mimics the hypothesized two-stream visual processing system of the human brain. The proposed attentional heterogeneous bilinear network (AHBN) consists of two branches: a deep CNN branch to extract fine-grained appearance attributes and a fully convolutional branch to extract landmark localization information. A joint channel-wise attention mechanism is then applied to the extracted heterogeneous features to focus on important channels, followed by a compact bilinear pooling layer to model the interaction of the two streams. Our proposed framework achieves satisfactory performance on three image-based fashion product retrieval benchmarks.
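The fusion step described in the abstract (joint channel-wise attention over the two streams, then compact bilinear pooling) can be illustrated with a small NumPy sketch. This is not the authors' implementation. The abstract does not specify the attention design or the pooling construction, so the sketch assumes a squeeze-and-excitation style gate and the common Tensor Sketch form of compact bilinear pooling (Count Sketch plus FFT). All function names, shapes, and weight matrices here are hypothetical.

```python
import numpy as np

def make_sketch_params(c, d, rng):
    """Random hash bucket and random sign per input channel (fixed, not learned)."""
    return rng.integers(0, d, size=c), rng.choice([-1.0, 1.0], size=c)

def count_sketch(x, h, s, d):
    """Count Sketch of x with shape (c, n) -> (d, n): out[h[i]] += s[i] * x[i]."""
    out = np.zeros((d, x.shape[1]))
    np.add.at(out, h, s[:, None] * x)  # unbuffered add handles repeated buckets
    return out

def channel_attention(app, loc, w1, w2):
    """Joint channel gates computed from both streams (SE-style; an assumption)."""
    # Squeeze: average each channel over the n spatial locations, then concatenate.
    z = np.concatenate([app.mean(axis=1), loc.mean(axis=1)])
    # Excite: small bottleneck MLP with ReLU, then a sigmoid gate per channel.
    gate = 1.0 / (1.0 + np.exp(-(w2 @ np.maximum(w1 @ z, 0.0))))
    ca = app.shape[0]
    return app * gate[:ca, None], loc * gate[ca:, None]

def compact_bilinear(app, loc, pa, pl, d):
    """Tensor Sketch of the per-location outer product, sum-pooled over locations."""
    fa = np.fft.rfft(count_sketch(app, *pa, d), axis=0)
    fl = np.fft.rfft(count_sketch(loc, *pl, d), axis=0)
    # A product in the frequency domain is a circular convolution of the sketches.
    per_location = np.fft.irfft(fa * fl, n=d, axis=0)
    return per_location.sum(axis=1)

rng = np.random.default_rng(0)
ca, cl, n, d = 6, 4, 5, 16            # toy sizes: channels, locations, output dim
app = rng.normal(size=(ca, n))        # stand-in for appearance-branch features
loc = rng.normal(size=(cl, n))        # stand-in for localization-branch features
w1 = rng.normal(size=(3, ca + cl))    # hypothetical attention weights
w2 = rng.normal(size=(ca + cl, 3))
pa = make_sketch_params(ca, d, rng)
pl = make_sketch_params(cl, d, rng)

app_att, loc_att = channel_attention(app, loc, w1, w2)
embedding = compact_bilinear(app_att, loc_att, pa, pl, d)
print(embedding.shape)
```

The appeal of this style of pooling is that the d-dimensional output approximates the full outer product of the two feature vectors, which would otherwise have ca × cl entries per location. In a retrieval setting, an embedding like this would typically be normalized and compared by distance, though the abstract does not say how the authors do so.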




Where to Focus: Deep Attention-based Spatially Recurrent Bilinear Networks for Fine-Grained Visual Recognition

Fine-grained visual recognition typically depends on modeling subtle dif...

Attribute-Guided Multi-Level Attention Network for Fine-Grained Fashion Retrieval

This paper proposes an attribute-guided multi-level attention network (A...

Fine-Grained Fashion Similarity Prediction by Attribute-Specific Embedding Learning

This paper strives to predict fine-grained fashion similarity. In this s...

Visual Fashion-Product Search at SK Planet

We build a large-scale visual search system which finds similar product ...

Learning Deep Bilinear Transformation for Fine-grained Image Representation

Bilinear feature transformation has shown the state-of-the-art performan...

MoNet: Moments Embedding Network

Bilinear pooling has been recently proposed as a feature encoding layer,...

Breaking Moravec's Paradox: Visual-Based Distribution in Smart Fashion Retail

In this paper, we report an industry-academia collaborative study on the...