Self-Attention and Ingredient-Attention Based Model for Recipe Retrieval from Image Queries

11/05/2019
by Matthias Fontanellaz, et al.

Direct computer-vision-based estimation of nutrient content is a demanding task because ingredients are deformed and occluded, and because meal classes show high intra-class and low inter-class variability. To tackle these issues, we propose a system for recipe retrieval from images. The retrieved recipe information can subsequently be used to estimate the nutrient content of the meal. In this study, we use the multi-modal Recipe1M dataset, which contains over 1 million recipes accompanied by over 13 million images. The proposed model can operate as a first step in an automatic pipeline for nutrient content estimation by providing hints about ingredients and instructions. Through self-attention, our model processes raw recipe text directly, which makes the upstream instruction sentence embedding step redundant and thus reduces training time while still providing desirable retrieval results. Furthermore, we propose an ingredient attention mechanism to gain insight into which instructions, parts of instructions, or single instruction words are important for processing a single ingredient within a given recipe. Attention-based recipe text encoding helps address high intra-class and low inter-class variability by focusing on the preparation steps specific to the meal. The experimental results demonstrate the potential of such a system for recipe retrieval from images. A comparison with two baseline methods is also presented.
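
The abstract does not give the model's equations, so the following is only a minimal illustrative sketch of the two ideas it names, not the authors' implementation. It assumes that ingredient attention can be approximated by scaled dot-product attention, with an ingredient embedding as the query over instruction word embeddings, and that retrieval ranks recipe embeddings by cosine similarity to an image embedding. All function names, vectors, and the toy vocabulary are hypothetical.

```python
import math


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def ingredient_attention(ingredient_vec, word_vecs):
    """Attend from one ingredient (query) over instruction words (keys/values).

    Assumption: scaled dot-product scoring; the paper's scoring may differ.
    Returns the attention weights and the weighted sum of word vectors.
    """
    dim = len(ingredient_vec)
    scores = [dot(ingredient_vec, w) / math.sqrt(dim) for w in word_vecs]
    weights = softmax(scores)
    context = [
        sum(wt * w[i] for wt, w in zip(weights, word_vecs)) for i in range(dim)
    ]
    return weights, context


def cosine(a, b):
    """Cosine similarity between two non-zero vectors."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


def rank_recipes(image_vec, recipe_vecs):
    """Recipe indices sorted from most to least similar to the image."""
    return sorted(
        range(len(recipe_vecs)),
        key=lambda i: cosine(image_vec, recipe_vecs[i]),
        reverse=True,
    )


# Hypothetical toy data: 2-d embeddings for four instruction words.
words = ["chop", "onion", "boil", "pasta"]
word_vecs = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
onion_query = [2.0, 0.0]  # made-up ingredient embedding

weights, context = ingredient_attention(onion_query, word_vecs)
ranking = rank_recipes([1.0, 0.0], [[0.0, 1.0], [1.0, 0.1], [-1.0, 0.0]])
```

In this toy setup, the attention weights indicate which instruction words matter most for the queried ingredient, which is the kind of insight the ingredient attention mechanism is meant to provide; in a trained model the embeddings would be learned rather than hand-picked.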
