Multitask Text-to-Visual Embedding with Titles and Clickthrough Data

05/30/2019
by   Pranav Aggarwal, et al.
0

Text-visual (or called semantic-visual) embedding is a central problem in vision-language research. It typically involves mapping of an image and a text description to a common feature space through a CNN image encoder and a RNN language encoder. In this paper, we propose a new method for learning text-visual embedding using both image titles and click-through data from an image search engine. We also propose a new triplet loss function by modeling positive awareness of the embedding, and introduce a novel mini-batch-based hard negative sampling approach for better data efficiency in the learning process. Experimental results show that our proposed method outperforms existing methods, and is also effective for real-world text-to-visual retrieval.

READ FULL TEXT
research
10/21/2022

Dissecting Deep Metric Learning Losses for Image-Text Retrieval

Visual-Semantic Embedding (VSE) is a prevalent approach in image-text re...
research
03/10/2023

Semantic-Preserving Augmentation for Robust Image-Text Retrieval

Image text retrieval is a task to search for the proper textual descript...
research
06/23/2016

Picture It In Your Mind: Generating High Level Visual Representations From Textual Descriptions

In this paper we tackle the problem of image search when the query is a ...
research
03/01/2023

The style transformer with common knowledge optimization for image-text retrieval

Image-text retrieval which associates different modalities has drawn bro...
research
10/05/2022

Improving Visual-Semantic Embedding with Adaptive Pooling and Optimization Objective

Visual-Semantic Embedding (VSE) aims to learn an embedding space where r...
research
10/22/2021

Learning Text-Image Joint Embedding for Efficient Cross-Modal Retrieval with Deep Feature Engineering

This paper introduces a two-phase deep feature engineering framework for...
research
05/08/2023

Scene Text Recognition with Image-Text Matching-guided Dictionary

Employing a dictionary can efficiently rectify the deviation between the...

Please sign up or login with your details

Forgot password? Click here to reset