A Rich Recipe Representation as Plan to Support Expressive Multi Modal Queries on Recipe Content and Preparation Process

03/31/2022
by Vishal Pallagani, et al.

Food is not only a basic human necessity but also a key factor driving a society's health and economic well-being. As a result, the cooking domain is a popular use case for demonstrating decision-support (AI) capabilities in service of benefits like precision health, with tools ranging from information-retrieval interfaces to task-oriented chatbots. An AI here should understand concepts in the food domain (e.g., recipes, ingredients), be tolerant of failures encountered while cooking (e.g., browning of butter), handle allergy-based substitutions, and work with multiple data modalities (e.g., text and images). However, recipes today are handled as textual documents, which makes it difficult for machines to read them, reason about them, and resolve their ambiguity. This calls for a better representation of recipes, one that overcomes the ambiguity and sparseness of the current textual documents. In this paper, we discuss the construction of a machine-understandable rich recipe representation (R3), in the form of plans, from recipes available in natural language. R3 is infused with additional knowledge such as information about allergens, images of ingredients, and possible failures and tips for each atomic cooking step. To show the benefits of R3, we also present TREAT, a tool for recipe retrieval that uses R3 to perform multi-modal reasoning on a recipe's content (plan objects: ingredients and cooking tools), food preparation process (plan actions and time), and media type (image, text). R3 leads to improved retrieval efficiency and new capabilities that were hitherto not possible with textual representations.
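The plan-style representation described above can be sketched as a minimal data structure: recipes become sequences of atomic steps (plan actions) over ingredients and tools (plan objects), annotated with allergens, timing, and failure tips, which then support structured queries that a flat text document cannot. The class and field names below are illustrative assumptions for this sketch, not the paper's actual R3 schema or the TREAT interface:

```python
from dataclasses import dataclass, field

@dataclass
class Step:
    # One atomic cooking action in the plan (a "plan action").
    action: str                           # e.g. "melt"
    inputs: list[str]                     # ingredients consumed ("plan objects")
    tools: list[str] = field(default_factory=list)
    minutes: float = 0.0
    failure_tip: str = ""                 # recovery advice, e.g. for browned butter

@dataclass
class RecipePlan:
    name: str
    allergens: set[str] = field(default_factory=set)
    steps: list[Step] = field(default_factory=list)

    def total_minutes(self) -> float:
        return sum(s.minutes for s in self.steps)

    def uses_tool(self, tool: str) -> bool:
        return any(tool in s.tools for s in self.steps)

def retrieve(plans: list[RecipePlan], avoid_allergen: str,
             max_minutes: float) -> list[str]:
    # Example structured query: recipes safe for an allergy
    # and within a preparation-time budget.
    return [p.name for p in plans
            if avoid_allergen not in p.allergens
            and p.total_minutes() <= max_minutes]

pancakes = RecipePlan(
    name="pancakes",
    allergens={"gluten", "egg", "dairy"},
    steps=[
        Step("whisk", ["flour", "egg", "milk"], tools=["bowl", "whisk"], minutes=3),
        Step("melt", ["butter"], tools=["pan"], minutes=2,
             failure_tip="If the butter browns, lower the heat and start over."),
        Step("fry", ["batter"], tools=["pan", "spatula"], minutes=10),
    ],
)
rice = RecipePlan(name="plain rice", allergens=set(),
                  steps=[Step("boil", ["rice", "water"], tools=["pot"], minutes=15)])

print(retrieve([pancakes, rice], avoid_allergen="gluten", max_minutes=20))
# -> ['plain rice']
```

Because each step is an explicit plan action, queries can combine content, process, and time constraints (e.g. "gluten-free and under 20 minutes"), which is the kind of reasoning a plain text recipe would require fragile parsing to answer.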

