Multi-modal Cooking Workflow Construction for Food Recipes

08/20/2020
by Liangming Pan, et al.

Understanding food recipes requires anticipating the implicit causal effects of cooking actions, so that a recipe can be converted into a graph describing its temporal workflow. This is a non-trivial task that involves common-sense reasoning. However, because large-scale labeled datasets are lacking, existing efforts rely on hand-crafted features to extract the workflow graph from recipes. Moreover, they fail to use the cooking images, which constitute an important part of food recipes. In this paper, we build MM-ReS, the first large-scale dataset for cooking workflow construction, consisting of 9,850 recipes with human-labeled workflow graphs. Cooking steps are multi-modal, featuring both text instructions and cooking images. We then propose a neural encoder-decoder model that uses both visual and textual information to construct the cooking workflow, achieving over 20% performance gain over existing hand-crafted baselines.
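To make the target representation concrete, here is a minimal sketch of a cooking workflow graph. It assumes each step is a node and each directed edge links a step to a later step that consumes its result. The recipe, the `depends_on` mapping, and the helper names are hypothetical and do not reflect the MM-ReS annotation format. The sketch does not implement the paper's neural model; it only shows why a graph captures more than the linear order in which steps are written.

```python
from graphlib import TopologicalSorter

# Hypothetical recipe (not from MM-ReS): step index -> instruction text.
steps = {
    0: "Boil water in a large pot.",
    1: "Dice the onion and garlic.",
    2: "Cook the pasta in the boiling water.",
    3: "Saute the onion and garlic in olive oil.",
    4: "Toss the drained pasta with the sauteed aromatics.",
}

# Hypothetical workflow annotation: step -> set of steps whose output it uses.
# Step 2 depends on step 0, not on the step written just before it (step 1),
# which a purely sequential reading of the recipe would get wrong.
depends_on = {
    0: set(),
    1: set(),
    2: {0},
    3: {1},
    4: {2, 3},
}


def workflow_edges(deps):
    """Return the (cause, effect) edges of the workflow graph, sorted."""
    return sorted((src, dst) for dst, srcs in deps.items() for src in srcs)


def execution_order(deps):
    """Return one step order consistent with every workflow edge."""
    return list(TopologicalSorter(deps).static_order())


def parallel_stages(deps):
    """Group steps into stages; steps within a stage can run concurrently."""
    sorter = TopologicalSorter(deps)
    sorter.prepare()
    stages = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # steps whose inputs are all done
        stages.append(ready)
        sorter.done(*ready)
    return stages


if __name__ == "__main__":
    for stage in parallel_stages(depends_on):
        print([steps[i] for i in stage])
```

For this toy recipe, `parallel_stages` groups the steps as `[[0, 1], [2, 3], [4]]`: boiling water and dicing can happen together, and the final toss must wait for both branches. That branching structure is what a linear list of instructions leaves implicit.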

research
05/24/2022

Recipe2Vec: Multi-modal Recipe Representation Learning with Graph Neural Networks

Learning effective recipe representations is essential in food studies. ...
research
02/04/2021

CHEF: Cross-modal Hierarchical Embeddings for Food Domain Retrieval

Despite the abundance of multi-modal data, such as image-text pairs, the...
research
12/02/2020

Cross-modal Retrieval and Synthesis (X-MRS): Closing the modality gap in shared subspace

Computational food analysis (CFA), a broad set of methods that attempt t...
research
05/01/2021

WfChef: Automated Generation of Accurate Scientific Workflow Generators

Scientific workflow applications have become mainstream and their automa...
research
06/02/2023

Syntax-aware Hybrid prompt model for Few-shot multi-modal sentiment analysis

Multimodal Sentiment Analysis (MSA) has been a popular topic in natural ...
research
01/08/2019

GILT: Generating Images from Long Text

Creating an image reflecting the content of a long text is a complex pro...
research
05/16/2022

Heri-Graphs: A Workflow of Creating Datasets for Multi-modal Machine Learning on Graphs of Heritage Values and Attributes with Social Media

Values (why to conserve) and Attributes (what to conserve) are essential...
