Cross-Modal Food Retrieval: Learning a Joint Embedding of Food Images and Recipes with Semantic Consistency and Attention Mechanism

03/09/2020
by   Hao Wang, et al.

Cross-modal food retrieval is an important task for analyzing food-related information, such as food images and cooking recipes. The goal is to learn an embedding of images and recipes in a common feature space, so that the two can be matched precisely. Relative to existing cross-modal retrieval approaches, this specific problem poses two major challenges: 1) the large intra-class variance across cross-modal food data; and 2) the difficulty of obtaining discriminative recipe representations. To address these problems, we propose Semantic-Consistent and Attention-based Networks (SCAN), which regularize the embeddings of the two modalities by aligning their output semantic probabilities. In addition, we exploit a self-attention mechanism to improve the embedding of recipes. We evaluate the proposed method on the large-scale Recipe1M dataset, and the results show that it outperforms state-of-the-art methods.
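
The idea of aligning output semantic probabilities across the two modalities can be illustrated with a small sketch. The abstract does not say which divergence is used or how the class scores are produced, so everything below is an assumption made for illustration: the symmetric KL divergence, the three-category logits, and the function names are hypothetical and are not taken from the paper.

```python
import math

def softmax(logits):
    """Convert raw class scores into a probability distribution."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions over the same classes."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

def semantic_consistency_loss(image_logits, recipe_logits):
    """Symmetric KL between the two modalities' class probabilities.

    Assumption: the abstract only states that the probabilities are aligned;
    symmetric KL is one illustrative way to penalize disagreement.
    """
    p_img = softmax(image_logits)
    p_rec = softmax(recipe_logits)
    return 0.5 * (kl_divergence(p_img, p_rec) + kl_divergence(p_rec, p_img))

# Hypothetical logits over three food categories for one image-recipe pair.
agree = semantic_consistency_loss([2.0, 0.5, -1.0], [2.0, 0.5, -1.0])
disagree = semantic_consistency_loss([2.0, 0.5, -1.0], [-1.0, 0.5, 2.0])
print(agree, disagree)
```

Under these assumptions the loss is zero when the image and recipe branches predict the same category distribution and grows as their predictions diverge, which is the sense in which such a term would regularize the two embeddings toward semantic agreement.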


