Learning Program Representations for Food Images and Cooking Recipes

03/30/2022
by Dim P. Papadopoulos, et al.

In this paper, we are interested in modeling a how-to instructional procedure, such as a cooking recipe, with a meaningful and rich high-level representation. Specifically, we propose to represent cooking recipes and food images as cooking programs. Programs provide a structured representation of the task, capturing cooking semantics and sequential relationships of actions in the form of a graph. This allows them to be easily manipulated by users and executed by agents. To this end, we build a model that is trained to learn a joint embedding between recipes and food images via self-supervision and jointly generate a program from this embedding as a sequence. To validate our idea, we crowdsource programs for cooking recipes and show that: (a) projecting the image-recipe embeddings into programs leads to better cross-modal retrieval results; (b) generating programs from images leads to better recognition results compared to predicting raw cooking instructions; and (c) we can generate food images by manipulating programs via optimizing the latent code of a GAN. Code, data, and models are available online.
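To make the idea of a program that is both a graph and a sequence more concrete, here is a minimal sketch in Python. The `Step` schema, the action names, and the `out<i>` token format are all hypothetical choices made for illustration; they are not the program format, vocabulary, or code released with the paper.

```python
from dataclasses import dataclass, field


@dataclass
class Step:
    """One program step: an action applied to its inputs.

    Hypothetical schema for illustration only. Each input is either an
    ingredient name (str) or the index (int) of an earlier step whose
    output is reused.
    """
    action: str
    inputs: list = field(default_factory=list)


def dependency_edges(program):
    """Return (producer, consumer) index pairs, i.e. the graph view."""
    edges = []
    for i, step in enumerate(program):
        for arg in step.inputs:
            if isinstance(arg, int):
                # A step may only consume outputs of earlier steps.
                if not 0 <= arg < i:
                    raise ValueError(f"step {i} refers to invalid step {arg}")
                edges.append((arg, i))
    return edges


def to_sequence(program):
    """Flatten the program into one line of text per step, i.e. the sequence view."""
    lines = []
    for i, step in enumerate(program):
        args = [f"out{a}" if isinstance(a, int) else a for a in step.inputs]
        lines.append(f"out{i} = {step.action}({', '.join(args)})")
    return lines


# A made-up four-step recipe fragment.
program = [
    Step("chop", ["onion"]),
    Step("heat", ["oil"]),
    Step("saute", [0, 1]),      # combines the outputs of steps 0 and 1
    Step("add", [2, "tomato"]),
]

if __name__ == "__main__":
    print(dependency_edges(program))
    print("\n".join(to_sequence(program)))
```

Under this assumed schema, editing a program amounts to changing a field, for example swapping `"tomato"` for another ingredient in the last step, which is one way to picture the kind of user manipulation the abstract mentions.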

research
10/14/2018

Recipe1M: A Dataset for Learning Cross-Modal Embeddings for Cooking Recipes and Food Images

In this paper, we introduce Recipe1M, a new large-scale, structured corp...
research
02/04/2021

CHEF: Cross-modal Hierarchical Embeddings for Food Domain Retrieval

Despite the abundance of multi-modal data, such as image-text pairs, the...
research
05/03/2019

Learning Cross-Modal Embeddings with Adversarial Networks for Cooking Recipes and Food Images

Food computing is playing an increasingly important role in human daily ...
research
10/04/2021

Learning Structural Representations for Recipe Generation and Food Retrieval

Food is significant to human daily life. In this paper, we are intereste...
research
04/02/2020

MCEN: Bridging Cross-Modal Gap between Cooking Recipes and Dish Images with Latent Variable Model

Nowadays, driven by the increasing concern on diet and health, food comp...
research
10/17/2020

Picture-to-Amount (PITA): Predicting Relative Ingredient Amounts from Food Images

Increased awareness of the impact of food consumption on health and life...
research
06/19/2018

VirtualHome: Simulating Household Activities via Programs

In this paper, we are interested in modeling complex activities that occ...
