Learning Cross-Modal Embeddings with Adversarial Networks for Cooking Recipes and Food Images

05/03/2019
by   Hao Wang, et al.
34

Food computing is playing an increasingly important role in human daily life, and has found tremendous applications in guiding human behavior towards smart food consumption and healthy lifestyle. An important task under the food-computing umbrella is retrieval, which is particularly helpful for health related applications, where we are interested in retrieving important information about food (e.g., ingredients, nutrition, etc.). In this paper, we investigate an open research task of cross-modal retrieval between cooking recipes and food images, and propose a novel framework Adversarial Cross-Modal Embedding (ACME) to resolve the cross-modal retrieval task in food domains. Specifically, the goal is to learn a common embedding feature space between the two modalities, in which our approach consists of several novel ideas: (i) learning by using a new triplet loss scheme together with an effective sampling strategy, (ii) imposing modality alignment using an adversarial learning strategy, and (iii) imposing cross-modal translation consistency such that the embedding of one modality is able to recover some important information of corresponding instances in the other modality. ACME achieves the state-of-the-art performance on the benchmark Recipe1M dataset, validating the efficacy of the proposed technique.

READ FULL TEXT

page 4

page 7

page 8

research
03/09/2020

Cross-Modal Food Retrieval: Learning a Joint Embedding of Food Images and Recipes with Semantic Consistency and Attention Mechanism

Cross-modal food retrieval is an important task to perform analysis of f...
research
04/02/2020

MCEN: Bridging Cross-Modal Gap between Cooking Recipes and Dish Images with Latent Variable Model

Nowadays, driven by the increasing concern on diet and health, food comp...
research
04/14/2023

Cross-domain Food Image-to-Recipe Retrieval by Weighted Adversarial Learning

Food image-to-recipe aims to learn an embedded space linking the rich se...
research
10/17/2020

Picture-to-Amount (PITA): Predicting Relative Ingredient Amounts from Food Images

Increased awareness of the impact of food consumption on health and life...
research
03/30/2022

Learning Program Representations for Food Images and Cooking Recipes

In this paper, we are interested in modeling a how-to instructional proc...
research
04/20/2022

Transformer Decoders with MultiModal Regularization for Cross-Modal Food Retrieval

Cross-modal image-recipe retrieval has gained significant attention in r...
research
10/04/2021

Learning Structural Representations for Recipe Generation and Food Retrieval

Food is significant to human daily life. In this paper, we are intereste...

Please sign up or login with your details

Forgot password? Click here to reset