vsepp
PyTorch Code for the paper "VSE++: Improving Visual-Semantic Embeddings with Hard Negatives"
We present a new technique for learning visual-semantic embeddings for cross-modal retrieval. Inspired by the use of hard negatives in structured prediction and by ranking loss functions used in retrieval, we introduce a simple change to the common loss functions used to learn multi-modal embeddings. That change, combined with fine-tuning and the use of augmented data, yields significant gains in retrieval performance. We showcase our approach, dubbed VSE++, on the MS-COCO and Flickr30K datasets, using ablation studies and comparisons with existing methods. On MS-COCO our approach outperforms state-of-the-art methods by 8.8% in caption retrieval and 11.3% in image retrieval (at R@1).
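The "simple change" the abstract refers to is the paper's max-of-hinges triplet loss: instead of summing the hinge cost over all negatives in a mini-batch, it penalizes only the hardest negative for each query. Below is a minimal PyTorch sketch of that idea; the function name, margin value, and the assumption of L2-normalized embeddings with one matching caption per image in the batch are illustrative, not taken verbatim from the vsepp repo.

```python
import torch

def max_hinge_loss(im, cap, margin=0.2):
    # im:  (B, D) L2-normalized image embeddings
    # cap: (B, D) L2-normalized caption embeddings; row i matches image i
    scores = im @ cap.t()                 # (B, B) cosine similarity matrix
    pos = scores.diag().view(-1, 1)       # score of each positive pair

    # Hinge cost of every in-batch negative against the corresponding positive.
    cost_cap = (margin + scores - pos).clamp(min=0)      # wrong captions per image
    cost_im = (margin + scores - pos.t()).clamp(min=0)   # wrong images per caption

    # Zero the diagonal so positives are never counted as negatives.
    mask = torch.eye(scores.size(0), dtype=torch.bool, device=scores.device)
    cost_cap = cost_cap.masked_fill(mask, 0.0)
    cost_im = cost_im.masked_fill(mask, 0.0)

    # The VSE++ change: keep only the hardest negative per query.
    return cost_cap.max(dim=1)[0].sum() + cost_im.max(dim=0)[0].sum()
```

Replacing each max with a sum over the batch recovers the conventional sum-of-hinges loss, so in practice the hard-negative variant is roughly a one-line change to an existing ranking loss.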