Dividing and Conquering Cross-Modal Recipe Retrieval: from Nearest Neighbours Baselines to SoTA

11/28/2019
by Mikhail Fain, et al.
We propose a novel non-parametric method for cross-modal retrieval which is applied on top of precomputed image and text embeddings. By combining our method with standard approaches for building image and text encoders, trained independently with a self-supervised classification objective, we create a baseline model which outperforms most existing methods on a challenging image-to-recipe task. We also use our method to compare image and text encoders trained using different modern approaches, thus addressing the issues hindering the development of novel methods for cross-modal recipe retrieval. We demonstrate how to use the insights from this model comparison to extend our baseline model with a standard triplet loss, which improves on the SoTA on the Recipe1M dataset by a large margin while using only precomputed features and with much less complexity than existing methods.
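For orientation, the generic setting the abstract refers to (retrieving a recipe for an image from precomputed embeddings) can be sketched as below. This is a minimal illustration, not the paper's non-parametric method, which the abstract does not specify. It assumes that both modalities have already been mapped into a shared vector space, that image i is paired with recipe i, and that retrieval is scored by median rank and recall@k, which is common practice for this task but is not stated in the abstract. All function names and the toy vectors are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def rank_of_match(query, candidates, match_index):
    """1-based rank of the true match among all candidates for this query."""
    target = cosine(query, candidates[match_index])
    # Count candidates that are strictly more similar than the true match.
    return 1 + sum(1 for c in candidates if cosine(query, c) > target)


def evaluate(image_embs, recipe_embs, k=1):
    """Return (median rank, recall@k), assuming image i pairs with recipe i."""
    ranks = sorted(
        rank_of_match(img, recipe_embs, i) for i, img in enumerate(image_embs)
    )
    n = len(ranks)
    if n % 2:
        median = ranks[n // 2]
    else:
        median = (ranks[n // 2 - 1] + ranks[n // 2]) / 2
    recall = sum(r <= k for r in ranks) / n
    return median, recall


# Toy, made-up embeddings (assumption: already in a shared 2-D space).
image_embs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
recipe_embs = [[0.9, 0.1], [0.8, 0.6], [0.2, 0.9]]

median_rank, recall_at_1 = evaluate(image_embs, recipe_embs, k=1)
```

On this toy data only the first image ranks its own recipe first, so recall@1 is one third and the median rank is 2. A real evaluation would use learned encoders and a much larger candidate pool.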


research
04/20/2022

Transformer Decoders with MultiModal Regularization for Cross-Modal Food Retrieval

Cross-modal image-recipe retrieval has gained significant attention in r...
research
03/24/2021

Revamping Cross-Modal Recipe Retrieval with Hierarchical Transformers and Self-supervised Learning

Cross-modal recipe retrieval has recently gained substantial attention d...
research
06/04/2019

A Strong and Robust Baseline for Text-Image Matching

We review the current schemes of text-image matching models and propose ...
research
08/27/2023

Towards Fast and Accurate Image-Text Retrieval with Self-Supervised Fine-Grained Alignment

Image-text retrieval requires the system to bridge the heterogenous gap ...
research
11/30/2022

Improving Cross-Modal Retrieval with Set of Diverse Embeddings

Cross-modal retrieval across image and text modalities is a challenging ...
research
03/22/2021

Retrieve Fast, Rerank Smart: Cooperative and Joint Approaches for Improved Cross-Modal Retrieval

Current state-of-the-art approaches to cross-modal retrieval process tex...
research
03/27/2023

Model Cascades for Efficient Image Search

Modern neural encoders offer unprecedented text-image retrieval (TIR) ac...
