Generating Diverse and Informative Natural Language Fashion Feedback

06/15/2019
by Gil Sadeh, et al.

Recent advances in multi-modal vision and language tasks enable a new set of applications. In this paper, we consider the task of generating natural language fashion feedback on outfit images. We collect a unique dataset, which contains outfit images and corresponding positive and constructive fashion feedback. We treat each feedback type separately, and train deep generative encoder-decoder models with visual attention, similar to the standard image captioning pipeline. With this approach alone, the generated sentences tend to be too generic and uninformative. We propose an alternative decoding technique based on the Maximum Mutual Information objective function, which leads to more diverse and detailed responses. We evaluate our model with common language metrics, and also show human evaluation results. This technology is applied within the "Alexa, how do I look?" feature, publicly available in Echo Look devices.
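
To illustrate the general idea behind Maximum Mutual Information decoding, the sketch below reranks candidate sentences by log p(T | I) - lambda * log p(T), a common form of the MMI objective that penalizes sentences a language model finds likely regardless of the image. This is not the paper's implementation: the function name `mmi_rerank`, the weight `lam`, and the toy log-probabilities are all assumptions made for illustration, and the abstract does not specify the exact variant the authors use.

```python
from typing import Dict, List, Tuple


def mmi_rerank(
    candidates: List[str],
    cond_logprob: Dict[str, float],
    lm_logprob: Dict[str, float],
    lam: float = 0.5,
) -> List[Tuple[str, float]]:
    """Rank candidates by log p(T | I) - lam * log p(T), highest first.

    cond_logprob: hypothetical decoder scores log p(T | I) for each sentence.
    lm_logprob:   hypothetical image-independent language model scores log p(T).
    lam:          assumed penalty weight; lam = 0 recovers likelihood ranking.
    """
    scored = [(c, cond_logprob[c] - lam * lm_logprob[c]) for c in candidates]
    # Sort by MMI score in descending order.
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy numbers (invented): the generic sentence is likely under both models,
# while the specific one is unlikely without seeing the image.
candidates = ["nice outfit", "the red belt complements the dress"]
cond = {"nice outfit": -2.0, "the red belt complements the dress": -3.0}
lm = {"nice outfit": -1.0, "the red belt complements the dress": -6.0}

for sentence, score in mmi_rerank(candidates, cond, lm, lam=0.5):
    print(f"{score:6.2f}  {sentence}")
```

With `lam=0.0` the generic sentence ranks first, because only the decoder likelihood counts. With `lam=0.5` the more specific sentence moves to the top, which matches the kind of effect the abstract describes: less generic, more detailed feedback.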

