CHEF: Cross-modal Hierarchical Embeddings for Food Domain Retrieval

02/04/2021
by Hai X. Pham, et al.

Despite the abundance of multi-modal data, such as image-text pairs, there has been little effort to understand the individual entities and the different roles they play in the construction of these data instances. In this work, we endeavour to automatically discover the entities in cooking recipes and their corresponding importance, treating this as a visual-linguistic association problem. More specifically, we introduce a novel cross-modal learning framework that jointly models the latent representations of images and text in the food image-recipe association and retrieval tasks. This model allows one to discover complex functional and hierarchical relationships between images and text, and among the textual parts of a recipe, including title, ingredients and cooking instructions. Our experiments show that by using an efficient tree-structured Long Short-Term Memory as the text encoder in our computational cross-modal retrieval framework, we are not only able to identify the main ingredients and cooking actions in the recipe descriptions without explicit supervision, but can also learn more meaningful feature representations of food recipes, appropriate for challenging cross-modal retrieval and recipe adaption tasks.
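
To make the retrieval task concrete, the following is a minimal sketch of the ranking step in a shared embedding space: given an image embedding, candidate recipes are ordered by cosine similarity. It is not the authors' implementation. The function names (`cosine`, `retrieve`), the three-dimensional vectors and the recipe titles are hypothetical stand-ins, and in the framework described above the embeddings would come from learned image and text encoders rather than being written by hand.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def retrieve(query, candidates):
    """Return candidate keys ordered from most to least similar to the query."""
    return sorted(candidates, key=lambda k: cosine(query, candidates[k]), reverse=True)


# Hypothetical toy embeddings (assumed for illustration only).
image_embedding = [0.9, 0.1, 0.0]
recipe_embeddings = {
    "tomato soup": [0.8, 0.2, 0.1],
    "chocolate cake": [0.0, 0.1, 0.9],
    "garden salad": [0.3, 0.9, 0.1],
}

ranking = retrieve(image_embedding, recipe_embeddings)
print(ranking)  # ['tomato soup', 'garden salad', 'chocolate cake']
```

The same ranking function works in the opposite direction (recipe query, image candidates), since both modalities are assumed to live in one shared space.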


research
11/17/2017

Look, Imagine and Match: Improving Textual-Visual Cross-Modal Retrieval with Generative Models

Textual-visual cross-modal retrieval has been a hot research topic in bo...
research
03/24/2021

Revamping Cross-Modal Recipe Retrieval with Hierarchical Transformers and Self-supervised Learning

Cross-modal recipe retrieval has recently gained substantial attention d...
research
12/02/2020

Cross-modal Retrieval and Synthesis (X-MRS): Closing the modality gap in shared subspace

Computational food analysis (CFA), a broad set of methods that attempt t...
research
03/30/2022

Learning Program Representations for Food Images and Cooking Recipes

In this paper, we are interested in modeling a how-to instructional proc...
research
08/20/2020

Multi-modal Cooking Workflow Construction for Food Recipes

Understanding food recipe requires anticipating the implicit causal effe...
research
03/31/2022

A Rich Recipe Representation as Plan to Support Expressive Multi Modal Queries on Recipe Content and Preparation Process

Food is not only a basic human necessity but also a key factor driving a...
research
06/01/2017

Grounding Symbols in Multi-Modal Instructions

As robots begin to cohabit with humans in semi-structured environments, ...
