VSE++: Improving Visual-Semantic Embeddings with Hard Negatives

07/18/2017 ∙ by Fartash Faghri, et al.

We present a new technique for learning visual-semantic embeddings for cross-modal retrieval. Inspired by the use of hard negatives in structured prediction, and ranking loss functions used in retrieval, we introduce a simple change to common loss functions used to learn multi-modal embeddings. That, combined with fine-tuning and the use of augmented data, yields significant gains in retrieval performance. We showcase our approach, dubbed VSE++, on the MS-COCO and Flickr30K datasets, using ablation studies and comparisons with existing methods. On MS-COCO our approach outperforms state-of-the-art methods by 8.8
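The abstract does not spell out the "simple change" to the loss. As an illustrative sketch only, the code below assumes it means replacing a sum over all negatives in a hinge-based ranking loss with the single hardest negative for each query. The function names, the similarity-matrix layout (`scores[i][j]` is the similarity of image i and caption j, with matched pairs on the diagonal), and the margin of 0.2 are assumptions made for this example; they are not taken from the authors' PyTorch code.

```python
def hinge(margin, pos, neg):
    """Hinge term: positive when the negative scores within `margin` of the positive."""
    return max(0.0, margin - pos + neg)


def sum_of_hinges(scores, margin=0.2):
    """Baseline ranking loss: every negative contributes a hinge term."""
    n = len(scores)
    total = 0.0
    for i in range(n):
        pos = scores[i][i]
        for j in range(n):
            if j == i:
                continue
            total += hinge(margin, pos, scores[i][j])  # image i vs. wrong caption j
            total += hinge(margin, pos, scores[j][i])  # caption i vs. wrong image j
    return total


def max_of_hinges(scores, margin=0.2):
    """Hard-negative variant: only the hardest negative per query contributes."""
    n = len(scores)
    total = 0.0
    for i in range(n):
        pos = scores[i][i]
        others = [j for j in range(n) if j != i]
        if not others:
            continue  # a batch of one has no negatives
        # Hardest wrong caption for image i, hardest wrong image for caption i.
        total += max(hinge(margin, pos, scores[i][j]) for j in others)
        total += max(hinge(margin, pos, scores[j][i]) for j in others)
    return total


# Toy similarity matrix for a mini-batch of three image-caption pairs.
scores = [
    [0.5, 0.6, 0.4],
    [0.1, 0.9, 0.2],
    [0.3, 0.2, 0.8],
]

print("sum of hinges:", sum_of_hinges(scores))
print("max of hinges:", max_of_hinges(scores))
```

In this toy batch, image 0 has two captions that violate the margin. The summed loss counts both, while the hard-negative variant counts only the worse of the two, so the second value is smaller. In practice the negatives would be drawn from each training mini-batch and the loss computed on tensors rather than nested lists.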

Code Repositories

vsepp

PyTorch Code for the paper "VSE++: Improving Visual-Semantic Embeddings with Hard Negatives"

