Fashion-IQ 2020 Challenge 2nd Place Team's Solution

07/13/2020
by Minchul Shin, et al.

This paper describes team VAA's approach submitted to the Fashion-IQ challenge at CVPR 2020. Given an image-text pair, we present a novel multimodal composition method, RTIC, that effectively combines the text and image modalities into a semantic space. We extract image and text features encoded by CNNs and sequential models (e.g., LSTM or GRU), respectively. To emphasize the residual between the features of the target and the candidate, RTIC is composed of N blocks with channel-wise attention modules. We then add the encoded residual to the feature of the candidate image to obtain a synthesized feature. We also explored an ensemble strategy with model variants and achieved a significant boost in performance compared to the best single model. Finally, our approach achieved 2nd place in the Fashion-IQ 2020 Challenge with a test score of 48.02 on the leaderboard.
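The composition step described above (stacked blocks with channel-wise attention that encode a residual, which is then added to the candidate image feature) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the abstract does not specify the internals of a block, so the sigmoid gating, the ReLU transform, the skip connection inside each block, the random weights, and the names `init_params` and `compose` are all assumptions made for illustration.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def init_params(img_dim, txt_dim, n_blocks, seed=0):
    """Random weights for a hypothetical stack of gated blocks (illustration only)."""
    rng = np.random.default_rng(seed)
    d = img_dim + txt_dim
    scale = 0.1
    return {
        "blocks": [
            {
                "w_feat": rng.normal(0.0, scale, (d, d)),
                "w_gate": rng.normal(0.0, scale, (d, d)),
            }
            for _ in range(n_blocks)
        ],
        # Projects the joint representation back to the image feature size.
        "w_out": rng.normal(0.0, scale, (d, img_dim)),
    }


def compose(img_feat, txt_feat, params):
    """Return candidate image feature + encoded residual (assumed block design)."""
    # Joint representation of the candidate image and the modification text.
    h = np.concatenate([img_feat, txt_feat], axis=-1)
    for block in params["blocks"]:
        feat = np.maximum(h @ block["w_feat"], 0.0)  # ReLU transform
        gate = sigmoid(h @ block["w_gate"])          # per-channel weights in (0, 1)
        h = h + gate * feat                          # channel-wise gated update
    residual = h @ params["w_out"]
    # Synthesized feature: candidate feature shifted by the encoded residual.
    return img_feat + residual


# Usage with made-up sizes: a batch of 2 image features (dim 8) and text features (dim 4).
params = init_params(img_dim=8, txt_dim=4, n_blocks=3)
img = np.ones((2, 8))
txt = np.ones((2, 4))
synthesized = compose(img, txt, params)
print(synthesized.shape)
```

In a retrieval setting, the synthesized feature would be compared against the features of target images, for example by cosine similarity; the training loss and the ensemble procedure are not covered by this sketch.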


