An open-source food image embedding model
Nutrient-based meal recommendations have the potential to help individuals prevent or manage conditions such as diabetes and obesity. However, learning people's food preferences and making recommendations that simultaneously appeal to their palate and satisfy nutritional expectations are challenging. Existing approaches either only learn high-level preferences or require a prolonged learning period. We propose Yum-me, a personalized nutrient-based meal recommender system designed to meet individuals' nutritional expectations, dietary restrictions, and fine-grained food preferences. Yum-me enables a simple and accurate food preference profiling procedure via a visual quiz-based user interface, and projects the learned profile into the domain of nutritionally appropriate food options to find ones that will appeal to the user. We present the design and implementation of Yum-me, and further describe and evaluate two innovative contributions. The first contribution is an open-source, state-of-the-art food image analysis model, named FoodDist. We demonstrate FoodDist's superior performance through careful benchmarking and discuss its applicability across a wide array of dietary applications. The second contribution is a novel online learning framework that learns food preference from item-wise and pairwise image comparisons. We evaluate the framework in a field study of 227 anonymous users and demonstrate that it outperforms other baselines by a significant margin. We further conducted an end-to-end validation of the feasibility and effectiveness of Yum-me through a 60-person user study, in which Yum-me improves the recommendation acceptance rate by 42.63%.
Healthy eating plays a critical role in our daily well-being and is indispensable in preventing and managing conditions such as diabetes, high blood pressure, cancer, mental illnesses, and asthma, etc. (Povey and Clark-Carter, 2007; Bodnar and Wisner, 2005). In particular, for children and young people, the adoption of healthy dietary habits has been shown to be beneficial to early cognitive development (Shepherd et al., 2006). Many applications designed to promote healthy behaviors have been proposed and studied (Kadomura et al., 2014; Chang et al., 2014; Kadomura et al., 2013; Consolvo et al., 2008). Among those applications, the studies and products that target healthy meal recommendations have attracted much attention (van Pinxteren et al., 2011; platejoy, 2016). Fundamentally, the goal of these systems is to suggest food alternatives that cater to individuals’ health goals and help users develop healthy eating behavior by following the recommendations (zipongo, 2015). Akin to most recommender systems, learning users’ preferences is a necessary step in recommending healthy meals that users are more likely to find desirable (zipongo, 2015). However, the current food preference elicitation approaches, including 1) on-boarding surveys and 2) food journaling, still suffer from major limitations, as discussed below.
Preferences elicited by surveys are coarse-grained. A typical on-boarding survey asks a number of multi-choice questions about general food preferences. For example, PlateJoy (platejoy, 2016), a daily meal planner app, elicits preferences for healthy goals and dietary restrictions with the following questions:
(1) How do you prefer to eat? No restrictions, dairy free, gluten free, kid friendly, pescatarian, paleo, vegetarian…
(2) Are there any ingredients you prefer to avoid? avocado, eggplant, eggs, seafood, shellfish, lamb, peanuts, tofu….
While the answers to these questions can and should be used to create a rough dietary plan and avoid clearly unacceptable choices, they do not generate meal recommendations that cater to each person’s fine-grained food preferences, and this may contribute to their lower than desired recommendation-acceptance rates, as suggested by our user testing results.
The food journaling approach suffers from the cold-start problem and is hard to maintain. For example, Nutrino (nutrino, 2016), a personal meal recommender, asks users to log their daily food consumption and learns users’ fine-grained food preferences from these logs. As is typical of systems relying on user-generated data, food journaling suffers from the cold-start problem, where recommendations cannot be made or are subject to low accuracy when the user has not yet generated a sufficient amount of data. For example, a previous study showed that an active food-journaling user makes about 3.5 entries per day (Cordeiro et al., 2015). It would take a non-trivial amount of time for the system to acquire sufficient data to make recommendations, and the collected samples may be subject to sampling biases as well (Cordeiro et al., 2015; Klesges et al., 1995). Moreover, photo food journaling of all meals is a habit that is difficult to adopt and maintain, and is therefore not a generally applicable solution for generating complete food inventories (Cordeiro et al., 2015).
To tackle these limitations, we develop Yum-me, a meal recommender that learns fine-grained food preferences without relying on the user’s dietary history. We leverage people’s apparent desire to engage with food photos (collecting, sharing, and appreciating high-quality, delicious-looking food images is a growing fashion in our everyday lives; for example, food photos are immensely popular on Instagram, where #food has over 177M posts and #foodporn has over 91M posts at the time of writing) to create a more user-friendly medium for asking visually based diet-related questions. The recommender learns users’ fine-grained food preferences through a simple quiz-based visual interface (Yang et al., 2015) and then attempts to generate meal recommendations that cater to the user’s health goals, food restrictions, as well as personal appetite for food. It can be used by people who have food restrictions, such as vegetarian, vegan, kosher, or halal. In particular, we focus on health goals in the form of nutritional expectations, e.g., adjusting calories, protein, and fat intake. The mapping from health goals to nutritional expectations can be accomplished by professional nutritionists or personal coaches and is out of the scope of this paper. We leave it as future work. In designing the visual interface (Yang et al., 2015), we propose a novel online learning framework that is suitable for learning users’ potential preferences for a large number of food items while requiring only a modest number of interactions. Our online learning approach balances exploitation and exploration and takes advantage of food similarities through preference propagation among locally connected graphs. To the best of our knowledge, this is the first interface and algorithm that learns users’ food preferences through real-time interactions without requiring specific diet history information.
For such an online learning algorithm to work, one of the most critical components is a robust food image analysis model. Towards that end, as an additional contribution of this work we present a novel, unified food image analysis model, called FoodDist. Based on deep convolutional networks and multi-task learning (Krizhevsky et al., 2012; Bossard et al., 2014), FoodDist is the best-of-its-kind Euclidean distance embedding for food images, in which similar food items have smaller distances while dissimilar food items have larger distances. FoodDist allows the recommender to learn users’ fine-grained food preferences accurately via similarity assessments on food images. Besides preference learning, FoodDist can be applied to other food-image-related tasks, such as food image detection, classification, retrieval, and clustering. We benchmark FoodDist with the Food-101 dataset (Bossard et al., 2014), the largest dataset for food images. The results suggest the superior performance of FoodDist over prior approaches (Yang et al., 2015; Meyers et al., 2015; Bossard et al., 2014). FoodDist will be made available on Github upon publication.
We evaluate our online learning framework in a field study of 227 anonymous users and we show that it is able to predict the food items that a user likes or dislikes with high accuracy. Furthermore, we evaluate the desirability of Yum-me recommendations end-to-end through a 60-person user study, where each user rates the meal recommendations made by Yum-me relative to those made using a traditional survey-based approach. The study results show that, compared to the traditional survey based recommender, our system significantly improves the acceptance rate of the recommended healthy meals by 42.63%. We see Yum-me as a complement to the existing food preference elicitation approaches that further filters the food items selected by a traditional onboarding survey based on users’ fine-grained taste for food, and allows a system to serve tailored recommendations upon the first use of the system. We discuss some potential use cases in section 7.
The rest of the paper is organized as follows. After discussing related work in section 2, we introduce the structure of Yum-me and our backend database in section 3. In section 4, we describe the algorithmic details of the proposed online learning algorithm, followed by the architecture of FoodDist model in section 5. The evaluation results of each component, as well as the recommender are presented in section 6. Finally, we discuss the limitations, potential impact and real world applications in section 7 and conclude in section 8.
Our work benefits from, and is relevant to, multiple research threads: (1) healthy meal recommender system, (2) cold-start problem and preference elicitation, (3) pairwise algorithms for recommendation, and (4) food image analysis, which will be surveyed in detail next.
Traditional food and recipe recommender systems learn users’ dietary preferences from their online activities, including ratings (Forbes and Zhu, 2011; Freyne and Berkovsky, 2010; Harvey et al., 2013; Elsweiler and Harvey, 2015), past recipe choices (Svensson et al., 2005; Geleijnse et al., 2011), and browsing history (Ueda et al., 2014; van Pinxteren et al., 2011; nutrino, 2016). For example, (Svensson et al., 2005) builds a social navigation system that recommends recipes based on the previous choices made by the user; (van Pinxteren et al., 2011) proposes to learn a recipe similarity measure from crowd card-sorting and make recommendations based on the self-reported meals; and (Harvey et al., 2013; Elsweiler and Harvey, 2015) generates healthy meal plans based on user’s ratings towards a set of recipes and the nutritional requirements calculated for the persona. In addition, previous recommenders also seek to incorporate users’ food consumption histories recorded by the food logging and journaling systems (e.g. taking food images (Cordeiro et al., 2015) or writing down ingredients and meta-information (van Pinxteren et al., 2011)).
The above systems, while able to learn users’ detailed food preferences, share a common limitation: they need to wait until a user generates enough data before their recommendations can be effective for this user (i.e., the cold-start problem). Therefore, most commercial applications, for example, Zipongo (zipongo, 2016) and ShopWell (shopwell, 2016), adopt onboarding surveys to more quickly elicit users’ coarse-grained food preferences. For instance, Zipongo’s questionnaires (zipongo, 2016) ask users about their nutrient intake, lifestyle, habits, and food preferences, and then make day-to-day and week-to-week healthy meal recommendations; ShopWell’s survey (shopwell, 2016) is designed to avoid certain food allergens, e.g., gluten, fish, corn, or poultry, and to find meals that match particular lifestyles, e.g., healthy pregnancy or athletic training.
Yum-me fills a vacuum that the prior approaches were not able to achieve, namely a rapid elicitation of users’ fine-grained food preferences for immediate healthy meal recommendations. Based on the online learning framework (Yang et al., 2015), Yum-me infers users’ preferences for each single food item among a large food dataset, and projects these preferences for general food items into the domain that meets each individual user’s health goals.
To alleviate the cold-start problem mentioned above, several models of preference elicitation have been proposed in recent years. The most prevalent method of elicitation is to train decision trees to poll users in a structured fashion(Rashid et al., 2002; Golbandi et al., 2011; Zhou et al., 2011; Das et al., 2013; Sun et al., 2013). These questions are either generated in advance and remain static (Rashid et al., 2002) or change dynamically based on real-time user feedback (Golbandi et al., 2011; Zhou et al., 2011; Das et al., 2013; Sun et al., 2013). Also, another previous work explores the possibility of eliciting item ratings directly from the user (Zhang et al., 2015; Chang et al., 2015). This process can either be carried at item-level (Zhang et al., 2015) or within-category (e.g., movies) (Chang et al., 2015).
The preference elicitation methods we mentioned above largely focus on the domain of movie recommendations (Sun et al., 2013; Rashid et al., 2002; Chang et al., 2015; Zhang et al., 2015) and visual commerce (Das et al., 2013) (e.g., cars, cameras) where items can be categorized based on readily available metadata. When it comes to real dishes, however, categorical data (e.g., cuisines) and other associated information (e.g., cooking time) possess a much weaker connection to a user’s food preferences. Therefore, in this work, we leverage the visual representation of each meal so as to better capture the process through which people make diet decisions.
Pairwise approaches (Rendle and Freudenthaler, 2014; Park and Chu, 2009; Rendle et al., 2009; Hsieh et al., 2017; Yang et al., 2017; Weston et al., 2010, 2013) are widely studied in recommender system literature. For example, Bayesian Personalized Ranking (BPR) (Rendle et al., 2009; Rendle and Freudenthaler, 2014) and Weighted Approximate-Rank Pairwise (WARP) loss (Weston et al., 2010), which learn users’ and items’ representations from user-item pairs, are two representative and popular approaches under this category. Such algorithms have successfully powered many state-of-the-art systems (Hsieh et al., 2017; Weston et al., 2013). In terms of the cold-start scenario, (Park and Chu, 2009) developed a pairwise method to leverage users’ demographic information in recommending new items.
Compared to previous methods, our problem setting is fundamentally different in the sense that Yum-me elicits preferences in an active manner where the input is incremental and contingent on the previous decisions made by the algorithm, while prior work focuses on the static circumstances where the training data is available up-front, and there is no need for the system to actively interact with the user.
The tasks of analyzing food images are very important in many ubiquitous dietary applications that actively or passively collect food images from mobile (Cordeiro et al., 2015) and wearable (Arab et al., 2011; Thomaz et al., 2013; Ng et al., 2015) devices. The estimation of food intake and its nutritional information is helpful to our health (Noronha et al., 2011) as it provides detailed records of our dietary history. Previous work mainly conducted the analysis by leveraging the crowd (Noronha et al., 2011; Turner-McGrievy et al., 2015) and computer vision algorithms (Bossard et al., 2014; Meyers et al., 2015).
Noronha et al. (Noronha et al., 2011) crowdsourced nutritional analysis of food images by leveraging the wisdom of untrained crowds. The study demonstrated the possibility of estimating a meal’s calories, fat, carbohydrates, and protein by aggregating the opinions of a large number of people; (Turner-McGrievy et al., 2015) asked the crowd to rank the healthiness of several food items and validated the results against the ground truth provided by trained observers. Although this approach has been shown to be accurate, it inherently requires human resources, which prevents it from scaling to a large number of users and from providing real-time feedback.
To overcome the limitations of crowds and automate the analysis process, numerous papers have discussed algorithms for food image analysis, including classification (Bossard et al., 2014; Meyers et al., 2015; Kawano and Yanai, 2014; Beijbom et al., 2015), retrieval (Kitamura et al., 2009), and nutrient estimation (Meyers et al., 2015; Sudo et al., 2014; Chae et al., 2011; He et al., 2013). Most of the previous work (Bossard et al., 2014) leveraged hand-crafted image features. However, traditional approaches were only demonstrated in special contexts, such as in a specific restaurant (Beijbom et al., 2015) or for a particular type of cuisine (Kawano and Yanai, 2014), and the performance of the models might degrade when they are applied to food images in the wild.
In this paper, we designed FoodDist using deep convolutional neural network based multitask learning(Caruana, 1997), which has been shown to be successful in improving model generalization power and performance in several applications (Zhang et al., 2014; Dai et al., 2015). The main challenge of multitask learning is to design appropriate network structures and sharing mechanisms across tasks. With our proposed network structure, we show that FoodDist achieves superior performance when applied to the largest available real-world food image dataset (Bossard et al., 2014), and when compared to prior approaches.
Our personalized nutrient-based meal recommendation system, Yum-me, operates over a given inventory of food items and suggests the items that will appeal to the users’ palate and meet their nutritional expectations and dietary restrictions. A high-level overview of Yum-me’s recommendation process is shown in Fig. 1 and briefly described as follows:
Step 1: Users answer a simple survey to specify their dietary restrictions and nutritional expectations. This information is used by Yum-me to filter food items and create an initial set of recommendation candidates.
Step 2: Users then use an adaptive visual interface to express their fine-grained food preferences through simple comparisons of food items. The learned preferences are used to further re-rank the recommendations presented to them.
In the rest of this section, we describe our backend large-scale food database and aforementioned two recommendation steps: 1) a user survey that elicits user’s dietary restrictions and nutritional expectations, and 2) an adaptive visual interface that elicits users’ fine-grained food preferences.
To account for the dietary restrictions in many cultures and religions, or people’s personal choices, we prepare a separate food database for each of the following dietary restrictions:
No restrictions, Vegetarian, Vegan, Kosher, Halal. (Our system is not restricted to these five dietary restrictions, and we will extend the system functionalities to other categories in the future.)
For each diet type, we pulled over 10,000 main dish recipes along with their images and metadata (ingredients, nutrients, tastes, etc.) from the Yummly API (Yummly, 2016). The total number of recipes is around 50,000. In order to customize food recommendations for people with specific dietary restrictions, e.g., vegetarian and vegan, we filter recipes by setting the allowedDiet parameter in the search API. For kosher or halal, we explicitly rule out certain ingredients by setting excludedIngredient parameter. The lists of excluded ingredients are shown as below:
Kosher: pork, rabbit, horse meat, bear, shellfish, shark, eel, octopus, octopuses, moreton bay bugs, frog.
Halal: pork, blood sausage, blood, blood pudding, alcohol, grain alcohol, pure grain alcohol, ethyl alcohol.
One challenge in using a public food image API is that many recipes returned by the API contain non-food images and incomplete nutritional information. Therefore, we further filter the items with the following criteria: the recipe should have 1) nutritional information of calories, protein and fat, and 2) at least one food image. In order to automate this process, we build a binary classifier based on a deep convolutional neural network to filter out non-food images. As suggested by (Meyers et al., 2015), we treat the whole training set of the Food-101 dataset (Bossard et al., 2014) as one generic food category and sample the same number of images (75,750) from the ImageNet dataset (Deng et al., 2009) as our non-food category. We took the pretrained VGG CNN model (Simonyan and Zisserman, 2014) and replaced the final 1000-dimensional softmax with a single logistic node. For validation, we use the Food-101 testing dataset along with the same number of images sampled from ImageNet (25,250). We trained the binary classifier using the Caffe framework (Jia et al., 2014) and it reached 98.7% validation accuracy. We applied the criteria to all the datasets and the final statistics are shown in Table 1.
Fig. 2 shows the visualizations of the collected datasets. We embed each recipe image into a 1000-dimensional feature space using FoodDist (described later in Section 5) and then project all the images onto a 2-D plane using t-Distributed Stochastic Neighbor Embedding (t-SNE) (Van der Maaten and Hinton, 2008). For visibility, we further divide the 2-D plane into several blocks and, from each block, sample a representative food image to present in the figure. Fig. 2 demonstrates the large diversity and coverage of the collected datasets. Also, the embedding results clearly demonstrate the effectiveness of FoodDist in grouping similar food items together while pushing dissimilar items away. This is important to the performance of Yum-me, as discussed in Section 6.3.
|Database||Original size||Final size|
The user survey is designed to elicit the user’s high-level dietary restrictions and nutritional expectations. Users can specify their dietary restrictions among the five categories mentioned above and indicate their nutritional expectations in terms of the desired amount of calories, protein and fat. We choose these nutrients for their high relevance to many common health goals, such as weight control (Epstein et al., 1985) and sports performance (Brotherhood, 1984). We provide three options for each of these nutrients: reduce, maintain, and increase. The user’s diet type is used to select the appropriate food dataset, and the food items in the dataset are further ranked by their suitability to users’ health goals based on the nutritional facts.
To measure the suitability of food items given users’ nutritional expectations, we rank the recipes in terms of different nutrients in both ascending and descending order, such that each recipe is associated with six ranking values, i.e., $r_{\text{calories},a}$, $r_{\text{calories},d}$, $r_{\text{protein},a}$, $r_{\text{protein},d}$, $r_{\text{fat},a}$, and $r_{\text{fat},d}$, where $a$ and $d$ stand for ascending and descending respectively. The final suitability value $v$ for each recipe given the health goal is calculated as follows:
$$v = \sum_{n \in U} \left( \alpha_{n,a}\, r_{n,a} + \alpha_{n,d}\, r_{n,d} \right) \quad (1)$$
where $U = \{\text{calories}, \text{protein}, \text{fat}\}$. The indicator coefficient $\alpha_{n,a} = 1$ if nutrient $n$ is rated as reduce, and $\alpha_{n,d} = 1$ if nutrient $n$ is rated as increase. Otherwise $\alpha_{n,a} = 0$ and $\alpha_{n,d} = 0$. If the user’s goal is to maintain all nutrients, then all recipes are given equal rankings. Eventually, given a user’s responses to the survey, we rank the suitability of all the recipes in the corresponding database and select the top-$M$ items (around the top 10%) as the candidate pool of healthy meals for this user. In our initial prototype, $M$ is set to a fixed constant.
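The ranking step above can be sketched in a few lines of Python. This is a minimal illustration, not the system's implementation: the function names (`rank_positions`, `suitability`, `top_m`) and the dictionary layout of a recipe are hypothetical, and the sketch assumes that rank 1 is the best position and that a lower summed rank means a more suitable recipe, a convention the text does not state.

```python
from typing import Dict, List

NUTRIENTS = ("calories", "protein", "fat")


def rank_positions(values: List[float], descending: bool) -> List[int]:
    """Return the 1-based rank of each value (ties broken by original index)."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=descending)
    ranks = [0] * len(values)
    for position, idx in enumerate(order, start=1):
        ranks[idx] = position
    return ranks


def suitability(recipes: List[Dict[str, float]], goals: Dict[str, str]) -> List[int]:
    """Sum, per recipe, the ranks switched on by the user's goals."""
    scores = [0] * len(recipes)
    for nutrient in NUTRIENTS:
        goal = goals.get(nutrient, "maintain")
        if goal == "maintain":
            continue  # both indicator coefficients are 0 for this nutrient
        values = [recipe[nutrient] for recipe in recipes]
        # "reduce" uses the ascending rank, "increase" the descending rank
        ranks = rank_positions(values, descending=(goal == "increase"))
        for i, rank in enumerate(ranks):
            scores[i] += rank
    return scores


def top_m(recipes: List[Dict[str, float]], goals: Dict[str, str], m: int) -> List[int]:
    """Indices of the m recipes with the lowest summed rank (assumed best)."""
    scores = suitability(recipes, goals)
    return sorted(range(len(recipes)), key=lambda i: scores[i])[:m]
```

With all goals set to maintain, every score is zero, which matches the statement that all recipes then receive equal rankings.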
Based on the food suitability ranking, a candidate pool of healthy meals is created. However, not all the meals in this candidate pool will suit the user’s palate. Therefore, we design an adaptive visual interface to further identify recipes that cater to the user’s taste by eliciting their fine-grained food preferences. We propose to learn users’ fine-grained food preferences by presenting users with food images and asking them to choose the ones that look delicious.
Formally, the food preference learning task can be defined as follows: given a large target set of food items $S$, we represent the user’s preferences as a distribution over all the possible food items, i.e., $p = [p_1, \dots, p_{|S|}]$, where each element $p_i$ denotes how favorably the user regards item $i$. Since the number of items, $|S|$, is usually too large for preferences to be elicited individually from the user (the target set is often the whole food database that different applications use; for example, the Yummly database can contain up to 1 million items (Yummly, 2016)), the approach we take is to adaptively choose a specific and much smaller subset $V \subset S$ to present to the user, and to propagate the user’s preferences for those items to the remaining items based on their visual similarity. Specifically, as Fig. 1 shows, the preference elicitation process can be divided into two phases:
Phase I: In each of the first 2 iterations, we present ten food images and ask users to tap on all the items that look delicious to them.
Phase II: In each of the subsequent iterations, we present a pair of food images and ask users to either compare the food pair and tap on the one that looks delicious to them or tap on “Yuck” if neither of the items appeal to their taste.
We model the interaction between the user and our backend system at iteration $t$ as Fig. 3 shows. The symbols that will be used in our algorithms are defined as follows:
$K_t$: Set of food items that are presented to the user at iteration $t$, with $K_t \subseteq S$;
$L_t$: Set of food items that the user prefers (selects) among $K_t$, with $L_t \subseteq K_t$;
$p^t$: User’s preference distribution over all food items at iteration $t$, where $\sum_i p^t_i = 1$. $p^0$ is initialized as the uniform distribution, $p^0_i = 1/|S|$;
$B_t$: Set of food images that have already been explored up to iteration $t$, with $B_0 = \emptyset$;
Based on the workflow depicted in Fig. 3, for each iteration $t$, the backend system updates vector $p^{t-1}$ to $p^t$ and set $B_{t-1}$ to $B_t$ based on the user’s selections $L_{t-1}$ and the previous image set $K_{t-1}$. After that, it decides the set of images that will be immediately presented to the user (i.e., $K_t$). Our food preference elicitation framework is formalized in Algorithm 1. The core procedures are update and select, which are described in more detail in the following subsections.
Based on the user’s selections $L_{t-1}$ and the image set $K_{t-1}$, the update module renews the user’s state from $\{p^{t-1}, B_{t-1}\}$ to $\{p^t, B_t\}$. The intuition and assumption behind the following algorithm design is that people tend to have similar preferences for similar food items.
Preference vector $p^t$: Our strategy for updating the preference vector $p^t$ is inspired by the Exponentiated Gradient Algorithm in bandit settings (EXP3) (Auer et al., 2002). Specifically, at iteration $t$, each $p^t_i$ in vector $p^t$ is updated by:
$$p^t_i = p^{t-1}_i \, e^{\beta u^{t-1}_i / p^{t-1}_i} \quad (2)$$
where $\beta$ is the exponentiated coefficient that controls the update speed and $u^{t-1} = \{u^{t-1}_1, \dots, u^{t-1}_{|S|}\}$ is the update vector used to adjust each preference value.
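As a rough illustration of an exponentiated update of this kind, the sketch below multiplies each preference value by an exponential of its scaled update value and then renormalizes. It assumes the multiplicative form $p_i \, e^{\beta u_i / p_i}$; the explicit renormalization step and the function name `exponentiated_update` are assumptions made for the example, and all entries of `p` must be strictly positive.

```python
import math
from typing import List


def exponentiated_update(p: List[float], u: List[float], beta: float) -> List[float]:
    """One exponentiated step on a preference distribution.

    Assumed rule: p_i <- p_i * exp(beta * u_i / p_i), then renormalize so the
    result is again a distribution. Requires every p_i > 0.
    """
    scaled = [p_i * math.exp(beta * u_i / p_i) for p_i, u_i in zip(p, u)]
    total = sum(scaled)
    return [s / total for s in scaled]
```

Items with a positive update value gain probability mass, items with a negative value lose it, and untouched items only shift through the renormalization.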
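In order to calculate the update vector $u$, we formalize the user’s selection process as a data labeling problem (Zhou et al., 2004) where for item $i \in L_t$, label $y^t_i = 1$ and for item $j \in K_t \setminus L_t$, label $y^t_j = -1$. Thus, the label vector $y^t = \{y^t_1, \dots, y^t_{|S|}\}$ provided by the user is:
$$y^t_i = \begin{cases} 1 & i \in L_t \\ -1 & i \in K_t \setminus L_t \\ 0 & i \notin K_t \end{cases} \quad (3)$$
For the update vector $u^t$, we expect that it is close to the label vector $y^t$ but with smooth propagation of label values to nearby neighbors. (For convenience, we omit the superscript that denotes the current iteration.) The update vector $u$ can be regarded as a softened label vector compared with $y$. To make the solution more computationally tractable, for each item $i$ with $y_i \neq 0$, we construct a locally connected undirected graph $G_i$ as Fig. 4 shows: $\forall j \in S$, add an edge $(i, j)$ if $\|f(x_i) - f(x_j)\| \le \delta$, where $f(x_i)$ is the FoodDist embedding of item $i$ and $\delta$ is a distance threshold. The labels for the vertices in graph $G_i$ are denoted $u^i$.
For each locally connected graph $G_i$, we fix the value $u^i_i$ as $u^i_i = y_i$ and propose the following regularized optimization method to compute the other elements $u^i_j$ ($j \neq i$) of the update vector $u^i$, which is inspired by the traditional label propagation method (Zhou et al., 2004). Consider the problem of minimizing the following objective function $Q(u^i)$:
$$Q(u^i) = \sum_{j \neq i} w_{ij} \left( u^i_j - y_i \right)^2 + \sum_{j \neq i} (1 - w_{ij}) \left( u^i_j - y_j \right)^2 \quad (4)$$
In Eqn. (4), $w_{ij}$ represents the similarity measure between food items $i$ and $j$:
$$w_{ij} = \begin{cases} \exp\left( -\frac{\|f(x_i) - f(x_j)\|^2}{2\sigma^2} \right) & \|f(x_i) - f(x_j)\| \le \delta \\ 0 & \text{otherwise} \end{cases} \quad (5)$$
The first term of the objective function is the smoothness constraint, as the update values for similar food items should not differ too much. The second term is the fitting constraint, which makes $u^i_j$ close to the initial labeling assigned by the user (i.e., $y_j$). However, unlike (Zhou et al., 2004), in our algorithm the trade-off between these two constraints is dynamically adjusted by the similarity between items $i$ and $j$: similar pairs are weighted more toward smoothness, and dissimilar pairs are forced to stay close to the initial labeling.
With Eqn. (4) defined, we can take the partial derivative of $Q(u^i)$ with respect to each $u^i_j$ and set it to zero:
$$\frac{\partial Q}{\partial u^i_j} = 2 w_{ij} \left( u^i_j - y_i \right) + 2 (1 - w_{ij}) \left( u^i_j - y_j \right) = 0 \quad (6)$$
As $y_j = 0$, then:
$$u^i_j = w_{ij}\, y_i \quad (7)$$
After all $u^i$ are calculated, the original update vector $u$ is then the sum of the $u^i$, i.e., $u = \sum_i u^i$. The pseudocode for updating the preference vector is shown in Algorithm 2.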
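The closed-form propagation described above, in which each labeled item pushes a similarity-weighted copy of its label onto its unlabeled neighbors and the per-item contributions are summed, can be sketched as follows. This is an illustration under stated assumptions: the similarity is taken to be a Gaussian kernel truncated at a distance threshold, the parameter names `sigma` and `delta` and their default values are placeholders, and labels are propagated only to items whose own label is 0.

```python
import math
from typing import List, Sequence


def similarity(a: Sequence[float], b: Sequence[float], sigma: float, delta: float) -> float:
    """Assumed similarity: Gaussian kernel on Euclidean distance, cut off at delta."""
    d = math.dist(a, b)
    return math.exp(-d * d / (2.0 * sigma * sigma)) if d <= delta else 0.0


def update_vector(
    features: List[Sequence[float]],
    labels: List[int],
    sigma: float = 1.0,
    delta: float = 2.0,
) -> List[float]:
    """Sum, over every labeled item i, the propagated values w_ij * y_i."""
    n = len(features)
    u = [0.0] * n
    for i, y_i in enumerate(labels):
        if y_i == 0:
            continue  # only labeled items act as propagation sources
        u[i] += y_i  # the source keeps its own label
        for j in range(n):
            if j != i and labels[j] == 0:
                u[j] += similarity(features[i], features[j], sigma, delta) * y_i
    return u
```

Items farther than `delta` from every labeled item receive an update value of zero, so a single user choice only affects a local neighborhood of the feature space.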
Explored food image set $B_t$: In order to balance exploitation and exploration in the image selection phase, we maintain a set $B_t$ that keeps track of all similar food items that have already been visited by the user. The updating rule for $B_t$ is as follows:
$$B_t = B_{t-1} \cup \left\{ j \in S : \min_{i \in K_{t-1}} \|f(x_i) - f(x_j)\| \le \delta \right\}$$
With the algorithms designed for updating the preference vector $p^t$ and the explored image set $B_t$, the overall functionality of the procedure update is shown in Algorithm 2.
After updating the user state, the select module picks the food images that will be presented in the next round. In order to trade off between exploration and exploitation in our algorithm, we propose different image selection strategies depending on the current iteration $t$.
For each of the first two iterations, we select ten different food images using the K-means++ algorithm (Arthur and Vassilvitskii, 2007), a seeding method for K-means clustering that tends to spread the selected items across the feature space. For our use case, the K-means++ algorithm is summarized in Algorithm 3.
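For readers unfamiliar with K-means++ seeding, the following is a minimal standard-library sketch of the standard procedure: the first item is chosen uniformly at random, and each later item is drawn with probability proportional to its squared distance from the nearest item already chosen. The function name `kmeanspp_seed` and the use of plain tuples as feature vectors are assumptions for illustration; the sketch returns indices into the input list.

```python
import math
import random
from typing import List, Sequence


def kmeanspp_seed(points: List[Sequence[float]], k: int, rng: random.Random) -> List[int]:
    """Pick up to k well-spread point indices with K-means++ (D^2) sampling."""
    chosen = [rng.randrange(len(points))]
    while len(chosen) < k:
        # squared distance from every point to its nearest chosen point
        d2 = [min(math.dist(p, points[c]) ** 2 for c in chosen) for p in points]
        candidates = [i for i, w in enumerate(d2) if w > 0]
        if not candidates:
            break  # every remaining point coincides with a chosen one
        weights = [d2[i] for i in candidates]
        chosen.append(rng.choices(candidates, weights=weights, k=1)[0])
    return chosen
```

Because already chosen points have zero weight, the returned indices are distinct, and far-away regions of the feature space are more likely to be sampled than dense neighborhoods around earlier picks.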
Starting from the third iteration, users are asked to make pairwise comparisons between food images. To balance exploitation and exploration, we always select one image from the area with higher preference values based on the current $p^t$ and another one from the unexplored area, i.e., $S \setminus B_t$. (Both selections are random within the given subset of food items.) With the above explanations, the image selection method we propose in this application is shown in Algorithm 4.
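The pairwise selection step can be sketched as below. The text does not specify how the "area with higher preference value" is delimited, so this sketch assumes it is the top fraction of items ranked by current preference; `top_fraction`, the function name `select_pair`, and the fallback used when every item has already been explored are all hypothetical choices made for the example.

```python
import random
from typing import List, Set, Tuple


def select_pair(
    p: List[float],
    explored: Set[int],
    rng: random.Random,
    top_fraction: float = 0.1,
) -> Tuple[int, int]:
    """Return (exploit_index, explore_index) for one pairwise comparison."""
    n = len(p)
    if n < 2:
        raise ValueError("need at least two items to form a pair")
    # exploitation: random item among those with the highest preference values
    k = max(1, int(n * top_fraction))
    top = sorted(range(n), key=lambda i: p[i], reverse=True)[:k]
    exploit = rng.choice(top)
    # exploration: random item outside the explored set
    unexplored = [i for i in range(n) if i not in explored and i != exploit]
    if not unexplored:  # assumed fallback once everything has been explored
        unexplored = [i for i in range(n) if i != exploit]
    explore = rng.choice(unexplored)
    return exploit, explore
```

Pairing a likely favorite with an unseen item means each comparison both refines the estimate in a promising region and gathers information about a part of the food space the user has not yet reacted to.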
Formally, the goal of FoodDist is to learn a feature extractor (embedding) $f(\cdot)$ such that, given an image $x$, $f(x)$ projects it to an $N$-dimensional feature vector for which the Euclidean distance to other such vectors reflects the similarities between food images, as Fig. 5 shows. Specifically, if image $x_1$ is more similar to image $x_2$ than to image $x_3$, then $\|f(x_1) - f(x_2)\| < \|f(x_1) - f(x_3)\|$.
We build FoodDist based on recent advances in deep Convolutional Neural Networks (CNN), which provide a powerful framework for automatic feature learning. Traditional feature representations for images are mostly hand-crafted, and were used with feature descriptors, such as SIFT (Scale Invariant Feature Transform) (Lowe, 2004), which aims for invariance to changes in object scale and illumination, thereby improving the generalizability of the trained model. However, in the face of highly diverse image characteristics, the one-size-fits-all feature extractor performs poorly. In contrast, deep learning adapts the features to particular image characteristics and extracts features that are most discriminative in the given task (Razavian et al., 2014).
As we present below, a feature extractor for food images can be learned through classification and metric learning, or through multitask learning, which concurrently performs these two tasks. We demonstrate that the proposed multitask learning approach enjoys the benefits of both classification and metric learning, and achieves the best performance.
One common way to learn a feature extractor for labeled data is to train a neural network that performs classification (i.e., mapping inputs to labels) and to take the output of a hidden layer as the feature representation; specifically, using a feedforward deep CNN with $n$ layers (as the upper half of Fig. 6 shows):
$$F(x) = g_n\big(g_{n-1}(\cdots g_1(x) \cdots)\big)$$
where $g_i(\cdot)$ represents the computation of the $i$-th layer (e.g., convolution, pooling, fully-connected, etc.), and $F(x)$ is the output class label. The difference between the output class label and the ground truth (i.e., the error) is back-propagated throughout the whole network from layer $n$ to layer $1$. We can take the output of layer $n-1$ as the feature representation of $x$, which is equivalent to having a feature extractor $f(\cdot)$ as:
$$f(x) = g_{n-1}(\cdots g_1(x) \cdots)$$
Usually, the last few layers will be fully-connected layers, and the last layer $g_n(\cdot)$ is roughly equivalent to a linear classifier that is built on the features $f(x)$ (Ian Goodfellow and Courville, 2016). Therefore, $f(x)$ is discriminative in separating instances under different categorical labels, and the Euclidean distances between normalized feature vectors can reflect the similarities between images.
Different from the classification approach, where the feature extractor is a by-product, metric learning proposes to learn the distance embedding directly from the paired inputs of similar and dissimilar examples. Prior work (Yang et al., 2015) used a Siamese network to learn a feature extractor for food images. The structure of a Siamese network resembles that in Fig. 6 but without the Class label, Fully connected, 101 and Softmax Loss layers. The inputs to the Siamese network are pairs of food images $(x_1, x_2)$. The images pass through CNNs with shared weights and the output of each network is regarded as the feature representation, i.e., $f(x_1)$ and $f(x_2)$, respectively. Our goal is for $f(x_1)$ and $f(x_2)$ to have a small distance value (close to 0) if $x_1$ and $x_2$ are similar food items; otherwise, they should have a larger distance value. The value of the contrastive loss is then back-propagated to optimize the Siamese network:
$$\mathcal{L}_{\text{contrastive}} = \frac{1}{2}\Big( y D^2 + (1 - y)\max(0,\, m - D)^2 \Big)$$
where the similarity label $y \in \{0, 1\}$ indicates whether the input pair of food items $x_1$, $x_2$ are similar or not ($y = 1$ for similar, $y = 0$ for dissimilar), $m > 0$ is the margin for dissimilar items, and $D$ is the Euclidean distance between $f(x_1)$ and $f(x_2)$ in the embedding space. Minimizing the contrastive loss pulls similar pairs together and pushes dissimilar pairs farther apart (beyond the margin $m$), which exactly matches our goal.
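The behaviour of the contrastive loss can be checked numerically with a short sketch. It assumes the standard form $\frac{1}{2}\big(yD^2 + (1-y)\max(0, m-D)^2\big)$ with $y = 1$ for similar pairs; the function name, the margin value and the toy embeddings are illustrative assumptions rather than values from our implementation.

```python
import math


def contrastive_loss(f1, f2, y, margin=1.0):
    """Contrastive loss for one pair of embeddings.

    y = 1 marks a similar pair, y = 0 a dissimilar pair. The default margin
    is an arbitrary illustrative value.
    """
    d = math.dist(f1, f2)  # Euclidean distance D in embedding space
    return 0.5 * (y * d ** 2 + (1 - y) * max(0.0, margin - d) ** 2)


# Similar pair that is far apart: penalised by the squared distance.
print(contrastive_loss([0.0, 0.0], [3.0, 4.0], y=1))              # 12.5
# Dissimilar pair already beyond the margin: no penalty.
print(contrastive_loss([0.0, 0.0], [3.0, 4.0], y=0, margin=1.0))  # 0.0
# Dissimilar pair inside the margin: penalised until pushed apart.
print(contrastive_loss([0.0, 0.0], [1.0, 0.0], y=0, margin=2.0))  # 0.5
```

The second case shows why the margin matters: once a dissimilar pair is farther apart than $m$, it no longer contributes to the gradient.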
The major advantage of metric learning is that the network will be directly optimized for our final goal, i.e., a robust distance measure between images. However, as shown in the model benchmarks, using the pairwise information alone does not improve the embedding performance as the process of sampling pairs loses the label information, which is arguably more discriminative than (dis)similar pairs.
Both methods above have their pros and cons. Learning with classification leverages the label information, but the network is not directly optimized for our goal. As a result, although the feature vectors are learned to be separable in the linear space, the intra- and inter-categorical distances might still be unbalanced. On the other hand, metric learning is explicitly optimized for our final objective by pushing the distances between dissimilar food items apart beyond a margin $m$. Nevertheless, sampling the similar or dissimilar pairs loses valuable label information. For example, given a pair of items with different labels, we only consider the dissimilarity between the two categories they belong to, but overlook the fact that each item is also different from the remaining $C - 2$ categories, where $C$ is the total number of categories.
In order to leverage the benefits of both tasks, we propose a multitask learning design (Ian Goodfellow and Courville, 2016) for FoodDist. The idea of multitask learning is to share part of the model across tasks so as to improve the generalization ability of the learned model (Ian Goodfellow and Courville, 2016). In our case, as Fig. 6 shows, we share the parameters between the classification network and the Siamese network, and optimize them simultaneously. We use the base structure of the Siamese network and share the upper CNN with a classification network, where the output of the CNN is fed into a cascade of a fully connected layer and a softmax loss layer. The final loss of the whole network is the weighted sum of the softmax loss $\mathcal{L}_{\text{softmax}}$ and the contrastive loss $\mathcal{L}_{\text{contrastive}}$:
$$\mathcal{L} = \omega\,\mathcal{L}_{\text{softmax}} + (1 - \omega)\,\mathcal{L}_{\text{contrastive}}$$
Our benchmark results (Section 6.2) suggest that the feature extractor built with multitask learning achieves the best of both worlds: it achieves the best performance for both classification and Euclidean distance-based retrieval tasks.
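To make the combination of the two objectives concrete, the sketch below computes a softmax (cross-entropy) loss for one image and mixes it with a contrastive loss value. It assumes the weights sum to one; the weight of 0.5, the function names and the toy logits are hypothetical choices for illustration only.

```python
import math


def softmax_cross_entropy(logits, label):
    """Softmax loss for one example: -log softmax(logits)[label]."""
    m = max(logits)  # subtract the max for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[label]


def multitask_loss(softmax_loss, contrastive_loss, weight=0.5):
    """Weighted sum of the two task losses.

    `weight` plays the role of the mixing coefficient; 0.5 is an arbitrary
    illustrative value, not one reported in this paper.
    """
    return weight * softmax_loss + (1.0 - weight) * contrastive_loss


# Toy usage: three-way classification logits plus a contrastive term.
cls = softmax_cross_entropy([2.0, 0.5, -1.0], label=0)
print(multitask_loss(cls, contrastive_loss=0.8, weight=0.5))
```

Setting the weight to 1 recovers learning with classification alone, and setting it to 0 recovers pure metric learning.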
We conduct user testing for the online learning framework and the end-to-end recommender system (Yum-me), as well as offline evaluation for the food image embedding model (FoodDist). Our hypotheses for the evaluations are summarized below:
H1: Our online learning framework learns a more accurate food preference profile than baseline approaches.
H2: FoodDist generates a better similarity measure for food images than state-of-the-art embedding models.
H3: Yum-me makes more accurate nutritionally appropriate meal recommendations than a traditional survey, as it integrates coarse-grained item filtering (provided by the survey) with fine-grained food preferences learned through adaptive elicitation.
In this section, we first present user testing results for the online learning framework in Section 6.1, then benchmark the FoodDist model offline on a large-scale real-world food image dataset in Section 6.2, and finally discuss the results of end-to-end user testing in Section 6.3.
In order to evaluate the accuracy of our online learning framework, we conducted a field study among 227 anonymous users recruited from social networks and university mailing lists. The experiment was approved by the Institutional Review Board (ID: 1411005129) at Cornell University. All participants were required to use the system independently three times. Each session consisted of the following two phases:
Training Phase. Users conducted the first $T$ iterations of food image comparisons, and the system learned and elicited a preference vector based on the algorithms proposed in this paper or on the baseline methods discussed later. We randomly picked $T$ from the set $\{5, 10, 15\}$ at the beginning but made sure that each user experienced each value of $T$ only once.
Testing Phase. After $T$ iterations of training, users entered the testing phase, which consisted of 10 rounds of pairwise comparisons. We picked testing images based on the preference vector learned from online interactions: one of them was selected from the food area that the user liked (i.e. item with top preference value) and the other from the area that the user disliked (i.e. item with bottom preference value). Both images were picked randomly among unexplored food items.
In order to evaluate the effectiveness of the user state update and image selection methods respectively, we conduct a 2-by-2 experiment in this section. For the user state update method, we compare the proposed Label propagation, Exponentiated Gradient (LE) algorithm to Online Perceptron (OP), and for the image selection method, we compare the proposed Exploration-Exploitation (EE) algorithm to Random Selection (RS). Specifically, the four frameworks presented below are evaluated. Users encountered them randomly when they logged into the system:
LE+EE: This is the online learning algorithm proposed in this paper that combines the ideas of Label propagation, Exponentiated Gradient algorithm for user state update and Exploitation-Exploration strategy for images selection.
LE+RS: This algorithm retains our method for user state update (LE) but randomly selects images to present to the user, without any exploitation or exploration.
OP+EE: As each item is represented by a 1000-dimensional feature vector, we can adopt the idea of regression to tackle this online learning problem (i.e., learning a weight vector $w$ such that $w^\top x_i$ is higher for an item $i$ that the user prefers). Hence, we compare our method with the Online Perceptron algorithm, which updates $w$ whenever it makes an error, i.e., if $y_i w^\top x_i \le 0$, assign $w \leftarrow w + y_i x_i$, where $y_i$ is the label for item $i$ (a pairwise comparison is regarded as binary classification in which the food item that the user selects is labeled +1 and the other -1). In this algorithm, we retain our image selection strategy (i.e., EE).
OP+RS: The last algorithm is the Online Perceptron mentioned above but with Random images Selection strategy.
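For reference, the Online Perceptron baseline used in OP+EE and OP+RS can be sketched as follows. The sketch assumes the textbook mistake-driven update ($w \leftarrow w + y\,x$ when $y\,w^\top x \le 0$); the helper `train_on_comparison` and the two-dimensional toy features are assumptions made for the example, not part of the study code.

```python
def dot(w, x):
    """Inner product of two equal-length vectors."""
    return sum(wi * xi for wi, xi in zip(w, x))


def perceptron_update(w, x, y):
    """One Online Perceptron step: update w only when (x, y) is misclassified."""
    if y * dot(w, x) <= 0:
        return [wi + y * xi for wi, xi in zip(w, x)]
    return list(w)


def train_on_comparison(w, chosen, rejected):
    """Treat one pairwise comparison as two labelled examples (+1 and -1)."""
    w = perceptron_update(w, chosen, +1)
    return perceptron_update(w, rejected, -1)


# Toy usage with 2-D features instead of the 1000-D vectors used in the study.
w = [0.0, 0.0]
w = train_on_comparison(w, chosen=[1.0, 0.0], rejected=[0.0, 1.0])
print(w)  # [1.0, -1.0]
```

With only a handful of comparisons and a 1000-dimensional feature space, such a linear separator is estimated from very few examples, which is one reason this baseline is a weak fit for the task.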
Among the 227 participants in our study, 58 used LE+EE and 57 used OP+RS. Of the remaining 112 users, half (56) tested OP+EE and the other half (56) tested LE+RS. Participants were assigned to algorithms at random, so the performances of the different models are directly comparable.
After all users went through the training and testing phases, we calculated the prediction accuracy for each individual user and aggregated the accuracies by the context that users encountered (i.e. the number of training iterations and the algorithm settings mentioned above). The prediction accuracies and their cumulative distributions are shown in Fig. 7, 8 and 9, respectively.
Length effects of training iterations. As shown in Fig. 7 and Fig. 8, the prediction accuracies of our online learning algorithm are all significantly higher than the baselines. The algorithm's performance improves further with a longer training period. As Fig. 8 shows, when the number of training iterations reaches 15, about half of the users experience a prediction accuracy that exceeds , which is promising considering the small number of interactions that the system elicited from scratch. These results indicate that the online preference learning algorithm can adjust itself to explore users' preference areas as more information becomes available from their choices. For the task of item-based food preference bootstrapping, our system can efficiently balance exploration and exploitation while providing reasonably accurate predictions.
Comparisons across different algorithms. As mentioned previously, we compared our algorithm with several obvious alternatives. As shown in Fig. 7 and Fig. 9, none of these algorithms works very well and the accuracy of prediction is actually decreasing as the user provides more information. Additionally, as is shown in Fig. 9, our algorithm has particular advantages when users are making progress (i.e. the number of training iterations reaches 15). The reason why these techniques are not suited for our application is mainly due to the following limitations:
Random Selection. Within a limited number of interactions, random selection can neither maintain the knowledge that it has already learned about the user (exploitation) nor explore unknown areas (exploration). In addition, it is more likely that the system will choose food items that are very similar to each other and thus hard for the user to choose between. Therefore, after short periods of interaction, the learned preference estimate becomes unreliable and the performance degrades.
Underfitting. The algorithm most likely to suffer from underfitting is the Online Perceptron (OP). For our application, each food item is represented by a 1000-dimensional feature vector, and OP is trying to learn a separating hyperplane based on a limited amount of training data. As each single feature is directly derived from a deep neural network, the linearity assumptions made by the perceptron might yield wrong predictions for dishes that have not been explored before.
Computing efficiency and user experience are two further important metrics for evaluating an online preference elicitation system. Therefore, we recorded the program execution time and user response time as a lens into the real-time performance of the online learning algorithm. As shown in Fig. 10(b), the program execution time is about for the first two iterations and less than for the iterations afterwards (our web system implementation is based on an Amazon EC2 t2-micro Linux 64-bit instance). Also, according to Fig. 10(a), the majority of users can make their decisions in less than for the task of comparison among ten food images, while the payload for the pairwise comparison is less than . As a final cumulative metric for the system overhead, Table 2 shows that even for iterations of training, users can typically complete the whole process within 53 seconds, which further supports that our online learning framework is lightweight and user-friendly in efficiently eliciting food preferences.
|# Iter: 5||# Iter: 10||# Iter: 15|
After the study, some participants sent us emails about their experiences with the adaptive visual interface. Most of the comments reflect the participants' satisfaction and suggest that our system is able to engage the user throughout the elicitation process. For example, “Now I’m really hungry and want a grilled cheese sandwhich!”, “That was fun seeing tasty food at top of the morning.” and “Pretty cool tool.”. However, they also highlight some limitations of our current prototype. For example, “I am addicted to spicy food and it totally missed it. There may just not be enough spicy alternatives in the different dishes to pick up on it.” points out that the prototype is limited by the size of the food database.
We develop FoodDist and the baseline models (Section 5) using the Food-101 training dataset, which contains 75,750 food images from 101 food categories (750 instances for each category) (Bossard et al., 2014). To the best of our knowledge, Food-101 is the largest and most challenging publicly available dataset for food images. We implement the models using Caffe (Jia et al., 2014) and experiment with two CNN architectures in our framework: AlexNet (Krizhevsky et al., 2012), which won first place in the ILSVRC2012 challenge, and VGG (Simonyan and Zisserman, 2014), which is the state-of-the-art CNN model. The inputs to the networks are image crops of sizes (VGG) or (AlexNet). They are randomly sampled from a pixelwise mean-subtracted image or its horizontal flip. In our benchmark, we train four different feature extractors: AlexNet+Learning with classification (AlexNet+CL), AlexNet+Multitask learning (AlexNet+MT), VGG+Learning with classification (VGG+CL) and VGG+Multitask learning (VGG+MT, FoodDist). For the multitask learning framework, we sample the similar and dissimilar image pairs with a 1:10 ratio from the Food-101 dataset based on the categorical labels, to be consistent with the previous work (Yang et al., 2015). The models are fine-tuned from networks pre-trained on the ImageNet data. We use Stochastic Gradient Descent with a mini-batch size of 64, and each network is trained for iterations. The initial learning rate is set to 0.001 and we use a weight decay of 0.0005 and momentum of 0.9.
We compare the performance of the four feature extractors, including FoodDist, with the state-of-the-art food image analysis models using the Food-101 testing dataset, which contains 25,250 food images from 101 food categories (250 instances for each category). The performance for the classification and retrieval tasks is evaluated as follows:
Classification: We test the performance of using learned image features for classification. For the classification deep neural network in each of the models above, we adopt the standard 10-crop testing, i.e., the network makes a prediction by extracting ten patches (the four corner patches and the center patch in the original image, and their horizontal reflections) and averaging the predictions at the softmax layer. The metrics used in this paper are Top-1 accuracy and Top-5 accuracy.
Retrieval: We use a retrieval task to evaluate the quality of the Euclidean distances between extracted features. Ideally, the distances should be smaller for similar image pairs and larger for dissimilar pairs. Therefore, as suggested by previous work (Yang et al., 2015; Yang et al., 2015), we check the nearest $k$ neighbors of each test image, for $k = 1, 2, \ldots, M$, where $M$ is the size of the testing dataset, and calculate the Precision and Recall values for each $k$. We use mean Average Precision (mAP) as the evaluation metric to compare the performance. For every method, the Precision/Recall values are averaged over all the images in the testing set.
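The retrieval metric can be illustrated with a small sketch that ranks all other test items by Euclidean distance and averages the precision at each rank where a same-category item is retrieved. This is one common definition of average precision; the exact protocol behind the reported numbers may differ in details such as tie handling, and the function names and toy data below are assumptions.

```python
import math


def average_precision(query, features, labels):
    """Average precision for one query index under Euclidean ranking."""
    others = [i for i in range(len(features)) if i != query]
    # Rank every other item by distance to the query (nearest first).
    others.sort(key=lambda i: math.dist(features[query], features[i]))
    hits, precisions = 0, []
    for rank, i in enumerate(others, start=1):
        if labels[i] == labels[query]:
            hits += 1
            precisions.append(hits / rank)  # precision at this recall point
    return sum(precisions) / len(precisions) if precisions else 0.0


def mean_average_precision(features, labels):
    """Mean of the per-query average precision over the whole set."""
    n = len(features)
    return sum(average_precision(q, features, labels) for q in range(n)) / n


# Toy usage: two well-separated categories give a perfect score.
feats = [[0.0], [0.1], [5.0], [5.1]]
cats = [0, 0, 1, 1]
print(mean_average_precision(feats, cats))  # 1.0
```

An embedding in which same-category images are always nearest neighbours scores 1.0, while interleaved categories pull the score down.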
The classification and retrieval performance of all models is summarized in Table 3 and Table 4, respectively. FoodDist performs the best among the four models and is significantly better than the state-of-the-art approaches in both tasks. For the classification task, the classifier built on FoodDist features achieves 83.09% Top-1 accuracy, which significantly outperforms the original RFDC (Bossard et al., 2014) model and the proprietary GoogLeNet model (Meyers et al., 2015); for the retrieval task, FoodDist doubles the mAP value reported by previous work (Yang et al., 2015) that only used the AlexNet and Siamese network architecture. The benchmark results demonstrate that FoodDist features possess high generalization ability and that the Euclidean distances between feature vectors reflect the similarities between food images with great fidelity. In addition, as both tables show, the multitask learning approach always performs better than learning with classification for both tasks, regardless of which CNN is used. This further supports the proposed multitask learning approach and its advantage of incorporating both label and pairwise distance information, which makes the learned features more generalizable and meaningful in the Euclidean distance embedding.
|Method||Top-1 ACC (%)||Top-5 ACC(%)|
|RFDC (Bossard et al., 2014)||50.76%|
|GoogLeNet (Meyers et al., 2015)||79%|
|Method||mean Average Precision (mAP)|
|Food-CNN (Yang et al., 2015)||0.3084|
We conducted end-to-end user testing to validate the efficacy of Yum-me recommendations. We recruited 60 participants through the university mailing list, Facebook, and Twitter. The goal of the user testing was to compare Yum-me recommendations with a widely used user onboarding approach, i.e. a traditional food preference survey (a sample survey used by PlateJoy is shown in Fig. 13). As Yum-me is designed for scenarios where no rating or food consumption history is available (which is common when the user is new to a platform or is visiting a nutritionist’s office), the collaborative filtering algorithms adopted by many state-of-the-art recommenders are not directly comparable to our system.
In this study, we used a within-subjects design in which each participant expressed their opinions regarding the meals recommended by both recommenders, and the effectiveness of the systems was compared on a per-user basis.
We created a traditional recommendation system by randomly picking out of meals in the candidate pool to recommend to the users. The values of and are controlled such that for both Yum-me and the traditional baseline. The user study consists of three phases, as Fig. 11 shows: (1) Each participant was asked to indicate their diet type and health goals through our basic user survey. (2) Each participant was then asked to use the visual interface. (3) 20 meal recommendations were arranged in a random order and presented to the participant at the same time, where 10 of them are made by Yum-me, and the other 10 are generated by the baseline. The participant was asked to express their opinion by dragging each of the 20 meals into either the Yummy or the No way bucket. To overcome the fact that humans would tend to balance the buckets if their previous choices were shown, the food item disappeared after the user dragged it into a bucket. In this way, users were not reminded of how many meals they had put into each bucket.
The user study systems were implemented as web services and participants accessed the study from desktop or mobile browsers. We chose a web service for its wide accessibility to the population, but we could easily fit Yum-me into other ubiquitous devices, as mentioned earlier.
The most common dietary choice among our 60 participants was No restrictions (48), followed by Vegetarian (9), Halal (2) and Kosher (1). No participants chose Vegan. Participant preferences in terms of nutrients are summarized in Table 5. For Calories and Fat, the top two goals were Reduce and Maintain. For Protein, participants tended to choose either Increase or Maintain. For health goals, the top four participant choices were Maintain calories-Maintain protein-Maintain fat (20), Reduce calories-Maintain protein-Reduce fat (10), Reduce calories-Maintain protein-Maintain fat (10) and Reduce calories-Increase protein-Reduce fat (5). The statistics match well with common health goals among the general population, i.e. people who plan to control weight and improve sports performance tend to reduce their intake of calories and fat and increase the amount of protein.
We use a quantitative approach to demonstrate that: (1) Yum-me recommendations yield higher meal acceptance rates than traditional approaches; and (2) meals recommended by Yum-me satisfy users’ nutritional needs.
In order to show higher meal acceptance rates, we calculated the participant acceptance rate of meal recommendations as:
$$\text{acceptance rate} = \frac{\#\ \text{meals in the Yummy bucket}}{\#\ \text{recommended meals}}$$
The cumulative distribution of the acceptance rate is shown in Fig. 12, and the average acceptance rate, Mean Absolute Error (MAE) and Root Mean Square Error (RMSE) of each approach are presented in Table 6. The results demonstrate that Yum-me significantly improves the quality of the presented food items. The per-user acceptance rate difference between the two approaches was normally distributed (a Shapiro-Wilk W test was not significant, which is consistent with the difference being normally distributed), and a paired Student’s t-test indicated a significant difference between the two methods. We also performed a non-parametric Wilcoxon signed-rank test and found a comparable result.
To quantify the improvement provided by Yum-me, we calculated the difference between the acceptance rates of the two systems, i.e. the Yum-me acceptance rate minus the baseline acceptance rate for each user. The distribution and average values of the differences are presented in Fig. 14 and Table 6, respectively. It is noteworthy that Yum-me outperformed the baseline by 42.63% in terms of the number of preferred recommendations, which demonstrates its utility over the traditional meal recommendation approach. However, Fig. 14 also shows that there are 12 users (20%) with zero acceptance rate difference, which may be due to the following two reasons: (1) Yum-me is not effective for this set of users and does not improve their preferences towards the recommended food items. (2) As we did not conduct participant control and filtering, some participants may not have been well engaged in the study and may have selected or dragged items randomly.
|Yum-me Avg. Acc.||0.7250||0.0299|
|Baseline Avg. Acc.||0.5083||0.0341|
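The relative improvement can be reproduced from the two average acceptance rates in Table 6. The short worked example below uses only those two reported averages; the helper `acceptance_rate` and the 7-out-of-10 example are illustrative assumptions.

```python
def acceptance_rate(accepted, recommended):
    """Fraction of recommended meals dragged into the Yummy bucket."""
    return accepted / recommended


# Average acceptance rates reported in Table 6.
yumme_avg = 0.7250
baseline_avg = 0.5083

absolute_gain = yumme_avg - baseline_avg
relative_gain = absolute_gain / baseline_avg

print(f"absolute gain: {absolute_gain:.4f}")  # 0.2167
print(f"relative gain: {relative_gain:.2%}")  # 42.63%

# Hypothetical single user: 7 of the 10 Yum-me meals accepted.
print(acceptance_rate(7, 10))  # 0.7
```

The absolute gain of about 0.22 corresponds to roughly two additional accepted meals out of every ten recommendations.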
To examine meal nutrition, we compare the nutritional facts of participants’ favorite meals with those of meals recommended (by Yum-me) and accepted (items dragged into the Yummy bucket) by the user. As shown in Fig. 15, for users with the same nutritional needs and no dietary restrictions, we calculate the average amount of protein, calories and fat (per serving) in (1) their favorite 20 meals (as determined by our online learning algorithm), and (2) their recommended and accepted meals, respectively. The mean values presented in Fig. 15 are normalized by the average amount of the corresponding nutrient in their favorite meals. The results demonstrate that, using a relatively simple nutritional ranking approach, Yum-me is able to satisfy most of the nutritional needs set by the users, including reduce, maintain and increase calories, increase protein, and reduce fat. However, our system fails to meet two nutritional requirements, i.e. maintain protein and maintain fat. Our results also show where Yum-me recommendations result in unintended nutritional composition. For example, the goal of reducing fat results in the reduction of protein and calories, and the goal of increasing calories ends up increasing the protein in meals. This is partially due to the inherent interdependence between nutrients, and we leave further investigation of this issue to future work.
To qualitatively understand the personalization mechanism of Yum-me, we randomly pick 3 participants with no dietary restrictions and with the health goal of reducing calories. For each user, we select the top-20 general food items the user likes most (inferred by the online learning algorithm). These food items played important roles in selecting the healthy meals to recommend to the user. To visualize this relationship, among these top-20 items, we further select the two food items that are most similar to the healthy items Yum-me recommended to the users and present three such examples in Fig. 16. Intuitively, our system is able to recommend healthy food items that are visually similar to the food items a user likes, but the recommended items are lower in calories due to the use of healthier ingredients or different cooking styles. These examples showcase how Yum-me can project users’ general food preferences to the domain of healthy options and find the ones that can most appeal to users.
Through a closer examination of the cases where our system performed, or did not perform, well, we observed a negative correlation between the entropy of the learned preference distribution (for a preference distribution $p$, the entropy is $H(p) = -\sum_i p_i \log p_i$) and the improvement of Yum-me over the baseline. This correlation suggests that when users’ preference distributions are more concentrated, the recommended meals tend to perform better. This is not too surprising because the entropy of the preference distribution roughly reflects the degree of confidence the system has in the users’ preferences, where the confidence is higher if the entropy is lower and vice versa. In Fig. 17, we show the evolution of the entropy value as users make more comparisons. The results demonstrate that the system becomes more confident about users’ preferences as they provide more feedback.
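The link between entropy and confidence can be seen in a small numerical sketch. It assumes the preference distribution is a normalized vector of non-negative values and uses Shannon entropy with the natural logarithm; the two toy distributions are made up for illustration.

```python
import math


def entropy(p):
    """Shannon entropy (in nats) of a discrete distribution p."""
    return -sum(pi * math.log(pi) for pi in p if pi > 0)


# A user whose preference mass is spread evenly over four food areas ...
uniform = [0.25, 0.25, 0.25, 0.25]
# ... versus a user whose preference is concentrated on one area.
concentrated = [0.85, 0.05, 0.05, 0.05]

print(entropy(uniform))       # log(4), the maximum for four outcomes
print(entropy(concentrated))  # noticeably lower
```

Lower entropy means the learned profile singles out a few food areas, which is the situation in which the recommended meals tended to perform better in our study.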
In this section, we discuss the limitations of the current prototype and study and present real world scenarios where Yum-me and its sub-modules can be used.
In evaluating the online learning framework, because no previous algorithm solves our preference elicitation problem end-to-end, the baselines are constructed by combining methods that intuitively fit the user state update and image selection modules, respectively. This introduces potential biases in baseline selection. Additionally, in the end-to-end user testing, the participants’ judgements of whether the food is Yummy or No way are potentially influenced by image quality and health concerns. These may be confounding factors in measuring users’ preferences towards food items and could be reduced by explicitly instructing the participants not to consider them. We leave further evaluations as future work.
The ultimate effectiveness of Yum-me in generating healthy meal suggestions is contingent on the appropriateness of the nutritional needs input by the user. In order to conduct such recommendations for people with different conditions, Yum-me could be used in the context of personal health coaches, nutritionists or coaching applications that provide reliable nutritional suggestions based on the user’s age, weight, height, exercise and disease history. For instance, general nutritional recommendations can be calculated using online services built on the guidelines from the National Institutes of Health, such as weight-success (http://www.weighing-success.com/NutritionalNeeds.html) and active (http://www.active.com/fitness/calculators/nutrition). Also, although we have demonstrated the feasibility of building a personalized meal recommender catering to people’s fine-grained food preferences and nutritional needs, the current prototype of Yum-me assumes a relatively simple strategy to rank nutritional appropriateness, and is limited in terms of the available options for nutrition. Future work should investigate more sophisticated ranking approaches and incorporate options relevant to the specific application context.
We envision that Yum-me has the potential to power many real-world dietary applications. For example, (1) User onboarding. Traditionally, food companies, e.g. Zipongo and Plated, address the cold start problem by asking each new user to answer a set of pre-defined questions, as shown in Section 6.3, and then recommend meals accordingly. Yum-me can enhance this process by eliciting user’s fine-grained food preference and informing an accurate dietary profile. Service providers can customize Yum-me to serve their own businesses and products by using a specialized backend food item database, and then use it as a step after the general questions. (2) Nutritional assistants. While visiting a doctor’s office, patients are often asked to fill out standard questionnaires to indicate food preferences and restrictions. Patients’ answers are then investigated by the professionals to come up with effective and personalized dietary suggestions. In such a scenario, the recommendations made by Yum-me could provide a complementary channel for communicating the patient’s fine-grained food preferences to the doctor to further tailor suggestions.
FoodDist provides a unified model to extract features from food images so that they are discriminative in the classification and clustering tasks, and its pairwise Euclidean distances are meaningful in reflecting similarities. The model is rather efficient (s/f on 8-core commodity processors) and can be ported to mobile devices with the publicly available caffe-android-lib framework (https://github.com/sh1r0/caffe-android-lib).
In addition to enabling Yum-me, we released the FoodDist model to the community (https://github.com/ylongqi/FoodDist) so that it can be used to fuel other nutritional applications. For the sake of space, we only briefly discuss two sample use cases below:
Food and meal recognition: Given a set of labels, e.g., food categories, cuisines, and restaurants, the task of food and meal recognition could be approached by first extracting food image features from FoodDist and then training a linear classifier, e.g., logistic regression or SVM, to classify food images that are beyond the categories given in the Food-101 dataset.
Nutrition Facts estimation: With the emergence of large-scale food item or recipe databases, such as Yummly, the problem of nutritional fact estimation might be converted to a simple nearest-neighbor retrieval task: given a query image, we find its closest neighbor in the FoodDist embedding space based on Euclidean distance, and use that neighbor’s nutritional information to estimate the nutrition facts of the query image (Meyers et al., 2015).
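The nearest-neighbor use case can be sketched in a few lines, assuming image features have already been extracted. The database layout (a list of feature/nutrition pairs), the function name and all values below are hypothetical; they stand in for FoodDist features and a real recipe database.

```python
import math


def estimate_nutrition(query_feature, database):
    """Return the nutrition record of the database item closest to the query.

    `database` is a list of (feature_vector, nutrition_dict) pairs; features
    are assumed to be precomputed embeddings of the database images.
    """
    _, nutrition = min(
        database,
        key=lambda entry: math.dist(query_feature, entry[0]),
    )
    return nutrition


# Toy database with 2-D stand-ins for the real feature vectors.
recipes = [
    ([0.0, 0.0], {"name": "garden salad", "calories": 120}),
    ([1.0, 1.0], {"name": "cheeseburger", "calories": 540}),
]
print(estimate_nutrition([0.9, 0.8], recipes))  # the cheeseburger record
```

A linear scan is sufficient for a sketch; a large database would call for an approximate nearest-neighbor index instead.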
In this paper, we propose Yum-me, a novel nutrient-based meal recommender that makes meal recommendations catering to users’ fine-grained food preferences and nutritional needs. We further present an online learning algorithm that is capable of efficiently learning food preference, and FoodDist, a best-of-its-kind unified food image analysis model. The user study and benchmarking results demonstrate the effectiveness of Yum-me and superior performance of FoodDist model.
Looking forward, we envision that the idea of using visual similarity for preference elicitation may have implications for the following research areas. (1) User-centric modeling: the fine-grained food preference learned by Yum-me can be seen as a general dietary profile of each user and be projected to other domains to enable more dietary applications, such as suggesting proper meal plans for diabetes patients. Moreover, a personal dietary API can be built on top of this profile to enable sharing and improvement across multiple dietary applications. (2) Food image analysis API for deeper content understanding: With the release of the FoodDist model and API, many dietary applications, in particular the ones that capture a large number of food images, might benefit from a deeper understanding of their image contents. For instance, food journaling applications could benefit from the automatic analysis of food images to summarize the day-to-day food intake or trigger timely reminders and suggestions when needed. (3) Fine-grained preference elicitation leveraging visual interfaces: The idea of eliciting users’ fine-grained preferences via visual interfaces is also applicable to other domains. The key insight here is that visual content captures many subtle variations among objects that text or categorical data cannot capture, and the learned representations can be used as an effective medium to enable fine-grained preference learning. For instance, IoT, wearable, and mobile systems for entertainment, consumer products, and general content delivery might leverage such an adaptive visual interface to design an onboarding process that learns users’ preferences in a much shorter time and potentially provides a more pleasant user experience than traditional approaches.
We would like to thank the anonymous reviewers for their insightful comments and thank Yin Cui, Fan Zhang, Tsung-Yi Lin, and Dr. Thorsten Joachims for discussion of machine learning algorithms.
Food-101–mining discriminative components with random forests. In Computer Vision–ECCV 2014. Springer, 446–461.
Computer Vision and Pattern Recognition, 2009. CVPR 2009. IEEE Conference on. IEEE, 248–255.
Proceedings of the twenty-fifth conference on uncertainty in artificial intelligence. AUAI Press, 452–461.
Feasibility of identifying eating moments from first-person images leveraging human computation. In Proceedings of the 4th International SenseCam & Pervasive Imaging Conference. ACM, 26–33.
Personalizing Food Recommendations with Data Science. http://blog.zipongo.com/blog/2015/8/11/personalizing-food-recommendations-with-data-science. (2015).