Real-Time Learning from An Expert in Deep Recommendation Systems with Marginal Distance Probability Distribution

10/12/2021 ∙ by Arash Mahyari, et al.

Recommendation systems play an important role in today's digital world. They are used in many domains, such as music platforms (e.g., Spotify) and movie streaming services (e.g., Netflix). Less research effort has been devoted to physical exercise recommendation systems. Sedentary lifestyles have become a major driver of several diseases as well as of healthcare costs. In this paper, we develop a recommendation system that recommends daily exercise activities to users based on their history, their profiles, and similar users. The developed recommendation system uses a deep recurrent neural network with user-profile attention and temporal attention mechanisms. Moreover, exercise recommendation systems differ significantly from streaming recommendation systems in that click feedback cannot be collected from the participants. Thus, we propose a real-time, expert-in-the-loop active learning procedure. The active learner calculates the uncertainty of the recommender at each time step for each user and asks an expert for a recommendation when the certainty is low. We derive the probability distribution function of the marginal distance and use it to determine when to ask the expert for feedback. Our experimental results on an mHealth dataset show improved accuracy after incorporating the real-time active learner into the recommendation system.




I Introduction

A major driver of healthcare costs in many countries is unhealthy behavior, such as physical inactivity, excessive food intake, and unhealthy food choices [1, 2]. Behavioral and environmental health factors account for more deaths than genetics [1]. Pervasive computational, sensing, and communication technology can be leveraged to support individuals in developing healthier lifestyles in their everyday lives. For instance, the ubiquity of smartphones makes them a potential platform for delivering behavior-change methods at great economies of scale. Commercial systems such as Noom [3] aim to provide psychological support via mobile health (mHealth) systems. Research platforms, such as the Fittle+ system [4], have demonstrated the efficacy of translating known behavior-change techniques [5] into personal mHealth applications. However, most work in the mHealth domain is limited to developing smartphone applications and connecting subjects with expert knowledge.

On the other hand, recommendation systems are becoming popular in various applications. E-commerce websites have been using recommendation systems to suggest new products to existing and new users to entice them into purchasing [6, 7]. With the advent of movie and music streaming services, e.g., Netflix and Spotify, service providers need to keep their users interested or lose customers and profits. Thus, these providers deploy recommendation systems that suggest new movies and songs to existing and new users based on their viewing and listening histories [8, 9]. However, few research studies have been devoted to developing recommendation systems for exercise activities.

In this paper, we develop an attention-based recommendation system that recommends exercise activities to new users of an mHealth application. The proposed recommendation system is based on a deep recurrent neural network that uses users' profiles and exercise characteristics as feature and temporal attention mechanisms. However, one major difference between exercise activities and other domains is that the recommendation system cannot collect users' feedback. In movie, e-commerce, and music applications, recommendation systems constantly receive feedback from users' click data. When a recommendation system suggests a new song (or movie) to a user, the user may select it and listen to (or watch) it completely. The click and the playback duration are used as feedback to fine-tune the recommendation system. When a recommendation system suggests an exercise activity, however, the application cannot collect click data. In other words, the recommendation system does not know whether the user completed the exercise, i.e., whether it was a good recommendation. In some mHealth apps [10], users were asked to provide that information manually, but much of this data is missing because most users ignored the request. The problem is even more challenging when the recommendation system faces a new user without any history. To address this issue, we use a real-time expert-in-the-loop mechanism. Our proposed network calculates the certainty of a new recommendation using the probability distribution of the marginal distance, i.e., the difference between the highest and the second-highest probability among the exercise classes in the output of the recommendation system. To quantify the certainty, we derive the probability distribution of the marginal distance from the probability distribution of the last layer of the recommender. Although the marginal distance has been used in active learning, this is the first work to provide its probability distribution for statistical hypothesis testing.

The other challenge with most recommendation systems is initializing the recommendation system for new users. Much work has been devoted to this issue, including meta-learning approaches [11]. In this paper, we use the questionnaires filled in by users and their demographic information to find existing users with similar interests and demographics. The global recommendation model is then fine-tuned with the history of these similar users. Compared with other existing methods, this approach is less complex. Fig. 1 shows the overall architecture of the recommendation system.

The rest of this paper is organized as follows: Section II describes the mHealth data used in this study. The architecture of the proposed recommendation system is explained in Section III-A. Section III-B explains how we take advantage of the user profiles to initialize a recommendation system for new users. Section IV describes the proposed active learning procedure based on the distribution of the marginal distance. Section V is devoted to the experimental results and discussion.

I-A Related Work

Since the advent of smartphones, mHealth systems have found applications in many healthcare domains. Dunsmuir et al. [12] developed an mHealth system for the diagnosis and management of pre-eclampsia in pregnant women. In [13], inter-pulse-interval-based security keys were used to authenticate entities in various mHealth applications. Schiza et al. [14] proposed a unified framework for a national eHealth system for the European Union. In another work [15], WE-CARE, a mobile ECG device, was developed to provide 24/7 cardiovascular monitoring. In [16], an mHealth system was designed and developed to provide exercise advice to participants based on their Body Mass Index, Basal Metabolic Rate, and the energy used in each activity or sport, e.g., aerobic dancing, cycling, jogging, walking, and swimming. However, this work does not use machine learning algorithms.

On the other hand, recommendation systems have been used in e-commerce and online shopping for several years. Their goal is to recommend products that suit consumers' tastes. Traditional recommendation systems have used collaborative filtering to suggest products similar to those the consumers have purchased. With advances in deep learning algorithms [17], several studies have proposed deep learning-based recommendation systems [18]. In [18], a multi-stack recurrent neural network (RNN) architecture is used to build a recommendation system that suggests businesses on Yelp based on their reviews. Wu et al. [19] used an RNN with long short-term memory (LSTM) units to predict future behavioral trajectories. A few studies have proposed exercise recommendation systems [20]. Sami et al. [20] used several independent variables to recommend various sports, such as swimming, using collaborative approaches. Ni et al. [21] developed an LSTM-based model called FitRec that estimates a user's heart-rate profile over candidate activities and then predicts activities on that basis. The model was tested on 250 thousand workout records with associated sensor measurements, including heart rate.

In the healthcare domain, Yoon et al. [22] developed a recommendation system for personalized clinical decision making. The system used the electronic health records of different patients and their clinical decisions to recommend clinical decisions for new patients. In a recent work [23], support vector machines (SVM), random forests, and logistic regression were used to recommend skin-health products based on the genetic phenotypes of consumers. In a similar study, user contextual features and daily step trajectories were used to develop a recommendation system for hour-by-hour activity planning. The model classifies users into subgroups and recommends activities based on the history of similar users. The method does not use any time-series model, e.g., recurrent neural networks, to learn patterns, and thus lacks the ability to generalize to more users.

II mHealth Data

The data we use come from the DStress mHealth experiment of Konrad et al. [10]. DStress was developed to provide coaching on exercise and meditation goals for adults seeking to reduce stress. The purpose of the experiment was to test the efficacy of an adaptive daily exercise recommender (DStress-adaptive) against two alternative exercise programs in which the daily exercises changed according to fixed schedules (Easy-fixed and Difficult-fixed). The DStress-adaptive recommender was a hand-engineered finite state machine. The transition rules are described in more detail in Konrad et al. [10], but they implement a policy whereby, if a person successfully completes all three exercises assigned for a day, they advance to the next higher level of exercise difficulty. If they do not succeed at the exercise or meditation activities, they are regressed to activities at an easier level of difficulty. The 44 exercises used in DStress (e.g., Wall Pushups, Standing Knee Lifts, Squats, and Burpees) and their difficulty ratings were obtained from three certified personal trainers. The experiment took place over a 28-day period. In a given week, users encountered three kinds of days: Exercise Days (Mondays, Wednesdays, Fridays), Meditation Days (Tuesdays, Thursdays, Saturdays), and Rest Days (Sundays).

Seventy-two adult participants (aged 19 to 59 years) were randomly assigned to three conditions with different 28-day goal progressions: (1) a DStress-adaptive condition using the adaptive coaching system, in which goal difficulties adjusted to the user based on past performance, (2) an Easy-fixed condition, in which the difficulty of daily goals increased at the same slow rate for all participants assigned to that condition, and (3) a Difficult-fixed condition, in which the goal difficulties increased at a faster rate. Konrad et al. [10] found that the DStress-adaptive condition produced significant reductions in self-reported stress levels compared with the Easy-fixed and Difficult-fixed goal schedules. The DStress-adaptive condition also produced higher rates of completing the assigned daily exercise goals.

Fig. 1: The overall architecture of the proposed recommendation system with expert-in-the-loop. The exercise activities are recommended through a smartphone app, and their completion is recorded on the smartphones. The deep recommendation system is trained on the collected history data and its augmentation. For each new participant, a new recommender is initialized from the globally trained model and fine-tuned with the data of users with similar profiles. At each time step, a new exercise is recommended to the user. If the recommendation system is uncertain about the new recommendation, i.e., whether the user will complete the exercise or not, it asks the expert for a correction.

User Profile: A variety of pretest survey data were collected in the Konrad et al. [10] study that provides our user data. These included (1) the Perceived Stress Scale (PSS), a 10-item psychometric scale assessing perceived stress over the past month, (2) the Depression, Anxiety, Stress Scale (DASS), a 21-item assessment of depression, anxiety, and stress, (3) the Cohen-Hoberman Inventory of Physical Symptoms (CHIPS), a 33-item scale measuring bothersome physical symptoms over the past two weeks, (4) BMI, the Body Mass Index, (5) the Godin Leisure-Time Exercise Questionnaire (GLTEQ), a 4-item scale measuring the frequency of physical activity during leisure time, and (6) the Exercise Self-Efficacy Scale (EXSE), an 8-item assessment of self-efficacy about exercising in the next 1 to 8 weeks.

Exercise Profile: The original Konrad et al. study [10] obtained difficulty ratings of the exercises from three subject matter experts (SMEs; personal trainers) that were predictive of the probability of performing the exercises [25]. We augmented these data with exercise classifications, attributes, and relations obtained from a 60-minute structured interview with an SME (fitness coach), followed up with specific clarification questions. The structured interview consisted of a card-sorting task and an exercise program planning task.

For the card-sorting task, exercise names and descriptions were placed on 3x5 cards. The cards were shuffled, and the SME was asked to go through them to familiarize herself with the exercises. The SME was asked to sort the exercises into piles by whatever criteria “seemed natural” and to label those piles. These original piles were grouped into super-categories and labeled. Then the original piles were sorted into subcategories and labeled recursively until no further subgroups made sense to the SME. The SME was then asked whether there was a possible “alternative grouping” of the exercises, and to rate the difficulty of each exercise. This card sorting produced an initial hierarchical classification of the exercises into a top level of resistance exercises and metabolic conditioning exercises. Within those super-categories there were subcategories for push, pull, squats, lunges, single-leg stance, and core exercises. Further subcategories consisted of back, chest, legs, legs/glutes, core/abs, and abs/glutes exercises. The alternative grouping consisted of full-body, compound, and power categories.

III Proposed Recommendation System

The goal of this paper is to recommend the next exercise for a given participant based on the history of exercises the participant completed. Let $X_i = [x_{i,1}, \dots, x_{i,T}]$ represent the exercise history of the $i$th participant, where $x_{i,t} \in \{0, 1\}^{N}$ is the one-hot encoding of the exercise at time $t$ and $N$ is the total number of exercises in the mHealth dataset. Let $u_i$ and $c_j$ represent the $i$th participant's profile and the $j$th exercise's profile, respectively. The recommender takes $X_i$ as input, in addition to the $i$th user's profile and the exercises' profiles, and predicts $x_{i,T+1}$ to recommend to the user.
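The input/target pairs implied by this formulation can be built with a sliding window over each participant's history. The sketch below is illustrative rather than the authors' code: it assumes histories are stored as integer exercise indices, and `one_hot`, `make_windows`, and the `window` argument (standing in for $T$) are hypothetical names.

```python
import numpy as np


def one_hot(indices, num_exercises):
    """Return a (len(indices), num_exercises) matrix of one-hot rows."""
    out = np.zeros((len(indices), num_exercises))
    out[np.arange(len(indices)), indices] = 1.0
    return out


def make_windows(history, num_exercises, window):
    """Split one participant's history into (input window, next exercise) pairs.

    history: exercise indices in the order performed; needs len(history) > window.
    Returns X with shape (num_samples, window, num_exercises) holding the one-hot
    inputs, and y with shape (num_samples,) holding the index of the next exercise.
    """
    if len(history) <= window:
        raise ValueError("history must be longer than the window")
    encoded = one_hot(np.asarray(history), num_exercises)
    X = np.stack([encoded[s:s + window] for s in range(len(history) - window)])
    y = np.asarray(history[window:])  # target for window s is history[s + window]
    return X, y


X, y = make_windows([0, 3, 3, 1, 2, 4], num_exercises=5, window=3)
print(X.shape, y)  # (3, 3, 5) [1 2 4]
```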

III-A Network Architecture

The proposed network consists of five modules: an encoder, a decoder, a recurrent neural network (RNN), user attention, and exercise temporal attention.

Exercise input: The encoder is a fully connected (FC) linear layer that embeds the $N$-dimensional exercise names onto a $d$-dimensional vector space:

$$v_{i,t} = \mathrm{ReLU}(W_x x_{i,t}),$$

where $v_{i,t} \in \mathbb{R}^{d}$ is the exercise name's embedding at time $t$ for the $i$th participant, $W_x \in \mathbb{R}^{d \times N}$ is a trainable weight matrix learned from the training data, and ReLU is the activation function.

User Profile input: The profile of the $i$th user is provided as a $p$-dimensional vector $u_i$, which is embedded onto a $d$-dimensional vector space:

$$g_i = \mathrm{ReLU}(W_u u_i),$$

where $g_i \in \mathbb{R}^{d}$ is the embedding of the $i$th user's profile, and $W_u \in \mathbb{R}^{d \times p}$.

Users' profiles, e.g., demographic information, indicate which aspects and features of the exercises deserve attention. For example, age and gender are two important variables that can significantly affect the types of exercises users are likely to perform. Thus, we combine the exercise name embedding $v_{i,t}$ and the user profile embedding $g_i$ using

$$\alpha_{i,t} = \mathrm{softmax}(W_1 v_{i,t} + W_2 g_i),$$

where $W_1$ and $W_2$ are trainable mapping matrices and $\alpha_{i,t} \in \mathbb{R}^{d}$ is the combination vector of the exercise and user profiles, which provides the attention probability for each entry of the input exercise embedding. The final vector is the element-wise multiplication of the exercise name embedding and the attention probability: $\tilde{v}_{i,t} = \alpha_{i,t} \odot v_{i,t}$. Because the exercise name embedding is a vector in a $d$-dimensional space, multiplying it element-wise by the attention probabilities re-weights its coordinates, which rotates (and rescales) the vector in that space.
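A minimal NumPy sketch of the exercise encoder and the user-profile attention, assuming the ReLU-and-softmax form written above and a single window of one participant. `user_attention`, the toy sizes, and the random weights are hypothetical; a real model would learn the matrices by back-propagation.

```python
import numpy as np


def relu(a):
    return np.maximum(a, 0.0)


def softmax(a, axis=-1):
    a = a - a.max(axis=axis, keepdims=True)  # shift for numerical stability
    e = np.exp(a)
    return e / e.sum(axis=axis, keepdims=True)


def user_attention(x_window, u, W_x, W_u, W_1, W_2):
    """Exercise encoder plus user-profile attention for one window.

    x_window: (T, N) one-hot exercises; u: (p,) user profile.
    Returns (v_att, alpha), both of shape (T, d).
    """
    v = relu(x_window @ W_x.T)              # v_{i,t} = ReLU(W_x x_{i,t})
    g = relu(W_u @ u)                       # g_i = ReLU(W_u u_i)
    alpha = softmax(v @ W_1.T + W_2 @ g)    # one distribution over the d entries per time step
    return alpha * v, alpha                 # element-wise re-weighting


rng = np.random.default_rng(0)
T, N, d, p = 3, 5, 4, 2
x_window = np.eye(N)[[0, 3, 1]]             # three one-hot exercises
u = np.array([0.3, 1.0])                    # toy profile, e.g. scaled age and a gender code
W_x, W_u = rng.normal(size=(d, N)), rng.normal(size=(d, p))
W_1, W_2 = rng.normal(size=(d, d)), rng.normal(size=(d, d))
v_att, alpha = user_attention(x_window, u, W_x, W_u, W_1, W_2)
```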

Fig. 2: The architecture and modules of the proposed deep recurrent neural network. The recommendation system uses user and exercise profiles as attention mechanisms. The attention mechanisms highlight the characteristics of the exercises that are most relevant to each user.

Exercise Profile input and temporal attention mechanism: Similarly, the profile of the $j$th exercise is provided as a $q$-dimensional vector $c_j$ and embedded using a linear layer:

$$f_{i,t} = W_f c_{i,t},$$

where $c_{i,t}$ is the profile of the exercise performed by the $i$th participant at time $t$, $f_{i,t}$ is its embedding, and $W_f$ is a trainable matrix. Note that we index the exercise profile by $t$, as we do for the exercise names, to represent the time information.

At each time step, the exercise the user performs strongly influences which exercises the user will want to complete next. For example, a user who is doing exercises focused on the upper body may want to continue with upper-body exercises for a day and then focus on lower-body exercises on another day. In another example, the difficulty of the current exercise affects the difficulty of the exercises in the short-term future. To incorporate this information into the recommendation system, we use the exercise profiles as an attention mechanism that weights different exercises at different time steps. The temporal attention mechanism assigns probability values to the time steps within the $T$-length window. The user-attended exercise embeddings $\tilde{v}_{i,t}$ are combined with the exercise profile embeddings $f_{i,t}$:

$$\beta_i = \mathrm{softmax}\big(W_3 [\tilde{v}_{i,1}; \dots; \tilde{v}_{i,T}] + W_4 [f_{i,1}; \dots; f_{i,T}]\big),$$

where $[\,\cdot\,; \dots; \,\cdot\,]$ denotes concatenation, $W_3$ and $W_4$ are trainable mapping matrices, and $\beta_i \in \mathbb{R}^{T}$ is the combination vector of the user-attended exercise embeddings and the exercise profiles, giving the attention probability for each time step. The final time series of vectors of length $T$ used as the input to the RNN module is $z_{i,t} = \beta_{i,t}\, \tilde{v}_{i,t}$ for $t = 1, \dots, T$.
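The temporal attention step can be sketched in the same way. This assumes the concatenation form of $\beta_i$ given above, an exercise-profile embedding size `d_f` chosen arbitrarily, and random stand-in weights; `temporal_attention` is a hypothetical helper.

```python
import numpy as np


def softmax(a):
    e = np.exp(a - a.max())  # shift for numerical stability
    return e / e.sum()


def temporal_attention(v_att, c_window, W_f, W_3, W_4):
    """Exercise-profile temporal attention over one window.

    v_att: (T, d) user-attended exercise embeddings;
    c_window: (T, q) profiles of the exercises performed at each step.
    Returns z of shape (T, d) and beta of shape (T,).
    """
    f = c_window @ W_f.T                             # f_{i,t} = W_f c_{i,t}
    scores = W_3 @ v_att.ravel() + W_4 @ f.ravel()   # one score per time step
    beta = softmax(scores)                           # attention over the T steps
    return beta[:, None] * v_att, beta               # z_{i,t} = beta_{i,t} * v~_{i,t}


rng = np.random.default_rng(1)
T, d, q, d_f = 3, 4, 6, 5
v_att = rng.random((T, d))
c_window = rng.random((T, q))                        # e.g. difficulty and category features
W_f = rng.normal(size=(d_f, q))
W_3, W_4 = rng.normal(size=(T, T * d)), rng.normal(size=(T, T * d_f))
z, beta = temporal_attention(v_att, c_window, W_f, W_3, W_4)
```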


The RNN module is responsible for learning the sequential pattern of the exercise history. Although RNN variants with long short-term memory (LSTM) and gated recurrent units (GRU) exist, we use a regular RNN module. LSTMs and GRUs have built-in gates that learn dependencies over time and accentuate or de-emphasize relevant information at different time steps, whereas here the temporal attention mechanism already accounts for the importance of different time steps. Moreover, our auto-correlation function (ACF) analysis of this dataset shows only short-term dependencies for exercise recommendation. For datasets with long-term dependencies, we would replace the regular RNN with LSTM or GRU units. The RNN module consists of one hidden layer with a ReLU activation function:

$$h_{i,t} = \mathrm{ReLU}(W_z z_{i,t} + W_h h_{i,t-1}),$$

where $z_{i,t}$ is the output of the temporal attention module, $W_z$ and $W_h$ are trainable weights, and $h_{i,t}$ is the hidden state at time $t$.

Exercise name prediction: The decoder converts the hidden state at the last time step $T$ into exercise names:

$$\hat{y}_{i,T+1} = \mathrm{softmax}(W_o h_{i,T}),$$

where $W_o$ is a trainable matrix and $\hat{y}_{i,T+1}$ is the multinomial distribution over the $N$ exercises.
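The RNN and decoder can be sketched as a loop over the window followed by a softmax. This assumes a zero initial hidden state and an arbitrary hidden size `H`, neither of which the text specifies; `rnn_decode` is a hypothetical name.

```python
import numpy as np


def relu(a):
    return np.maximum(a, 0.0)


def softmax(a):
    e = np.exp(a - a.max())  # shift for numerical stability
    return e / e.sum()


def rnn_decode(z, W_z, W_h, W_o):
    """Run the ReLU RNN over a window z of shape (T, d) and decode the last state.

    Returns the (N,) predicted distribution over the next exercise.
    """
    h = np.zeros(W_h.shape[0])                 # assumed h_{i,0} = 0
    for z_t in z:
        h = relu(W_z @ z_t + W_h @ h)          # h_{i,t} = ReLU(W_z z_{i,t} + W_h h_{i,t-1})
    return softmax(W_o @ h)                    # decoder over the N exercises


rng = np.random.default_rng(2)
T, d, H, N = 3, 4, 8, 5
z = rng.random((T, d))
y_hat = rnn_decode(z, rng.normal(size=(H, d)), rng.normal(size=(H, H)), rng.normal(size=(N, H)))
recommended = int(np.argmax(y_hat))            # top-1 recommendation
top5 = np.argsort(-y_hat)[:5]                  # top-5 list
```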

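Training: The whole network is trained end-to-end with the cross-entropy loss function:

$$\mathcal{L} = -\sum_{i} \sum_{n=1}^{N} x_{i,T+1,n} \log \hat{y}_{i,T+1,n},$$

where the outer sum runs over the training sequences, $x_{i,T+1}$ is the one-hot encoding of the exercise actually performed next, and $\hat{y}_{i,T+1}$ is the decoder output. All modules are differentiable, so the gradient can back-propagate through the decoder, through the recurrent layers over time, and through the user and temporal attention modules. The Adam optimizer [26] is used to train the network.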

III-B New User Initialization

The recommendation system is trained with a set of training data collected from different users, so it is general rather than personalized for new users. To personalize the recommender, we fine-tune the network with the training data of the users whose profiles are most similar to the new user's profile. To this end, we calculate the similarity between the new user's profile and the profiles of the existing users to find the most similar users. The recommender is then fine-tuned with their training data, using a small learning rate for one epoch.
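A sketch of the neighbor search, under stated assumptions: the text does not name the similarity measure, so this uses cosine similarity on standardized profile features, and `most_similar_users` (with `k=3`, the number of neighbors used in the experiments) is a hypothetical helper. The subsequent fine-tuning step is not shown.

```python
import numpy as np


def most_similar_users(u_new, profiles, k=3):
    """Indices of the k existing users whose profiles are closest to u_new.

    Features are standardized with the existing users' statistics and compared
    with cosine similarity (an assumption; the measure is not specified).
    """
    mean = profiles.mean(axis=0)
    std = profiles.std(axis=0) + 1e-12          # avoid division by zero for constant features
    P = (profiles - mean) / std
    q = (u_new - mean) / std
    sims = P @ q / (np.linalg.norm(P, axis=1) * np.linalg.norm(q) + 1e-12)
    return np.argsort(-sims)[:k]                # highest similarity first


# Toy demographic profiles: (age, gender code)
profiles = np.array([[25, 0], [60, 1], [26, 0], [30, 0], [58, 1]], dtype=float)
neighbors = most_similar_users(np.array([27.0, 0.0]), profiles, k=3)
```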

IV Active Learning

Active learning asks experts to provide annotations for unlabeled samples [27, 28]. However, it is impractical to ask experts to annotate a large set of unlabeled samples, so the active learner is usually given a limited budget of expert queries. The active learner selects the most informative samples based on the uncertainty that a trained classifier has about them. The two most common measures of uncertainty are entropy and marginal distance.
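Both uncertainty measures can be computed directly from the recommender's output distribution. The small sketch below (hypothetical helper names) contrasts a peaked distribution with a near two-way tie: the tie has higher entropy and a much smaller marginal distance.

```python
import numpy as np


def entropy(probs, eps=1e-12):
    """Shannon entropy (nats) of a predicted distribution; higher means less certain."""
    return float(-np.sum(probs * np.log(probs + eps)))


def marginal_distance(probs):
    """Gap between the largest and second-largest probability; lower means less certain."""
    second, first = np.sort(probs)[-2:]
    return float(first - second)


confident = np.array([0.90, 0.05, 0.03, 0.02])
uncertain = np.array([0.40, 0.38, 0.12, 0.10])
for name, probs in (("confident", confident), ("uncertain", uncertain)):
    print(name, round(entropy(probs), 3), round(marginal_distance(probs), 2))
```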

In this paper, we use active learning to personalize the recommender for each user. The recommender is trained on data from different people, so it is not tailored to new users. The user profiles drive the attention mechanism in the feature space, but they are not enough to personalize the recommender. When a new user joins, the trained recommender is used; however, when the recommender is uncertain about the next recommended exercise, it asks the expert to intervene and provide the next recommendation. We use the marginal distance and the entropy of the series of recommendations as criteria to decide when to ask the expert. Existing work, however, sets an arbitrary threshold on the marginal distance. In this paper, we derive the probability distribution function of the marginal distance.

IV-A Marginal Distance Random Variable

The output of the classifier (the last layer of the recommender) is a vector of random variables, $Y = [Y_1, \dots, Y_N]$, with a multinomial distribution. Ideally, one of the $Y_n$ is one and the rest are zero. In practice, however, the outputs represent the probabilities of the input sample belonging to each class. Let $p = [p_1, \dots, p_N]$ represent these probability values. Each $p_n$ has a Beta distribution, and the vector $p$ is drawn from a Dirichlet distribution $\mathrm{Dir}(\alpha_1, \dots, \alpha_N)$. Sorting these random variables in ascending order, $p_{(1)} \le p_{(2)} \le \dots \le p_{(N)}$, the marginal distance is defined as $M = p_{(N)} - p_{(N-1)}$. The distribution of the marginal distance is derived in Appendix A.
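As an illustration of how a density for $M$ can follow from the three-component model of Appendix A, the sketch below assumes $(q_1, q_2, q_3) = (p_{(N)}, p_{(N-1)}, 1 - q_1 - q_2)$ follows $\mathrm{Dir}(a_1, a_2, a_3)$, substitutes $s = q_2$ (so $q_1 = m + s$ and $q_3 = 1 - m - 2s$, with unit Jacobian), and integrates numerically: $f_M(m) = \int f(m + s,\, s,\, 1 - m - 2s)\,ds$. This is one possible route, not necessarily the appendix's derivation; an unconstrained Dirichlet also puts some mass on $m < 0$, which the sorted definition excludes. The parameter values are hypothetical.

```python
import math

import numpy as np


def dirichlet_log_norm(alpha):
    """log Gamma(a_0) - sum_k log Gamma(a_k) for Dirichlet parameters alpha."""
    return math.lgamma(sum(alpha)) - sum(math.lgamma(a) for a in alpha)


def marginal_distance_density(m, alpha, num_points=2000):
    """Density of M = q1 - q2 when (q1, q2, q3) ~ Dir(alpha) and q3 = 1 - q1 - q2.

    Integrates the Dirichlet density over s = q2 with the midpoint rule, which
    never evaluates the endpoints where the integrand may be singular.
    """
    a1, a2, a3 = alpha
    lo, hi = max(0.0, -m), (1.0 - m) / 2.0     # keeps q1, q2, q3 strictly positive
    if hi <= lo:
        return 0.0
    width = (hi - lo) / num_points
    s = lo + width * (np.arange(num_points) + 0.5)
    q1, q2, q3 = m + s, s, 1.0 - m - 2.0 * s
    log_f = (dirichlet_log_norm(alpha) + (a1 - 1) * np.log(q1)
             + (a2 - 1) * np.log(q2) + (a3 - 1) * np.log(q3))
    return float(np.exp(log_f).sum() * width)


alpha = (6.0, 2.0, 2.0)                          # hypothetical fitted parameters
grid_step = 2.0 / 400
grid = -1.0 + grid_step * (np.arange(400) + 0.5)   # midpoints covering M in (-1, 1)
density = np.array([marginal_distance_density(m, alpha) for m in grid])
```

Integrating `density` over the grid should give approximately one, and its mean should match $E[q_1] - E[q_2] = (a_1 - a_2)/a_0$.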


The Monte Carlo method will be used to approximate this distribution.
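A Monte Carlo approximation needs only Dirichlet draws: sample $p$, sort each draw, and take the gap between the two largest entries. The sketch assumes the Dirichlet parameters have already been fitted; the values below are hypothetical, and `sample_marginal_distance` is an illustrative helper built on NumPy's `Generator.dirichlet`.

```python
import numpy as np


def sample_marginal_distance(alpha, num_samples=100_000, seed=0):
    """Monte Carlo draws of M = p_(N) - p_(N-1) for p ~ Dir(alpha)."""
    rng = np.random.default_rng(seed)
    p = rng.dirichlet(alpha, size=num_samples)     # (num_samples, N)
    top2 = np.sort(p, axis=1)[:, -2:]              # second-largest and largest in each draw
    return top2[:, 1] - top2[:, 0]


alpha = np.array([8.0, 1.0, 1.0, 1.0])             # hypothetical parameters fitted on training outputs
m = sample_marginal_distance(alpha)
density, edges = np.histogram(m, bins=50, range=(0.0, 1.0), density=True)  # empirical f_M
```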

IV-B Active Learning Procedure

The key component of active learning is determining when to ask the expert for feedback. The recommender system should ask for the expert's opinion when the input sample lies outside the distribution of the training dataset. Let $P_{\text{train}}$ represent the distribution of the training dataset. We use $P_{\text{train}}$ to derive the probability distribution function of the marginal distance, as defined in § IV-A. Let $f_M$ represent this distribution, and let $m_{i,t}$ represent the marginal distance for the $i$th user at time $t$. Then our hypotheses are:

$$H_0: m_{i,t} \sim f_M \qquad \text{versus} \qquad H_1: m_{i,t} \not\sim f_M.$$

The hypothesis test at significance level $\gamma$ determines whether the recommended exercise should be given to the user or whether the expert should be asked for the right exercise recommendation (feedback or label). Because a small marginal distance signals uncertainty, the test is one-sided: $H_0$ is rejected when $m_{i,t}$ falls below a threshold $\tau$ satisfying $P(M < \tau) = \gamma$ under $f_M$. The trained recommender is then fine-tuned with the feedback from the expert for personalization.
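In practice, the one-sided test reduces to comparing $m_{i,t}$ with a lower quantile of $f_M$. The sketch below estimates that quantile from samples (for instance, Monte Carlo draws from the fitted distribution); `level` plays the role of the significance level, and both helpers are hypothetical.

```python
import numpy as np


def rejection_threshold(m_samples, level=0.05):
    """Lower `level`-quantile tau of the marginal-distance distribution: P(M < tau) = level."""
    return float(np.quantile(m_samples, level))


def should_ask_expert(probs, tau):
    """Reject H0 (and query the expert) when the observed marginal distance is below tau."""
    second, first = np.sort(probs)[-2:]
    return bool(first - second < tau)


# Samples of M from a hypothetical fitted Dirichlet; in practice these come from training data.
rng = np.random.default_rng(0)
draws = np.sort(rng.dirichlet([8.0, 1.0, 1.0, 1.0], size=50_000), axis=1)
tau = rejection_threshold(draws[:, -1] - draws[:, -2], level=0.05)
ask = should_ask_expert(np.array([0.40, 0.38, 0.12, 0.10]), tau)
```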

V Experimental Results

The code repository: /ExerRecomActiveLearn.

In this section, the proposed approach and its components are evaluated on the offline mHealth dataset.

V-A Data Augmentation

A common challenge for recommendation systems is insufficient training data. One way to increase the amount of training data is data augmentation [29, 30]. While data augmentation in computer vision is straightforward and can be achieved by adding noise or randomly cropping images, it requires careful attention for sequential and symbolic data, e.g., language [29]. Randomly generating sequential data has a negative effect because it introduces noise that does not follow the sequential patterns of the real data.

For exercise activities, there are two ways to augment the sequential data: asking a human expert or applying association rule mining to the training data. In the first approach, an exercise expert was asked to categorize the exercises available to participants. The augmentation algorithm then goes over the sequence of exercises of each participant in the training data, chooses a fraction of the exercises, and replaces them with similar exercises from the same category.
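The expert-based augmentation can be sketched as a category-preserving substitution. The replacement fraction is not given in the text, so `fraction` is a hypothetical parameter, and the exercise names and categories in `category_of` are illustrative stand-ins for the expert's categorization.

```python
import numpy as np


def augment_by_category(sequence, category_of, fraction=0.2, rng=None):
    """Return a copy of `sequence` in which a random `fraction` of the exercises is
    replaced by a different exercise from the same expert-defined category."""
    if rng is None:
        rng = np.random.default_rng()
    members = {}
    for exercise, category in category_of.items():
        members.setdefault(category, []).append(exercise)
    out = list(sequence)
    n_replace = int(round(fraction * len(out)))
    for pos in rng.choice(len(out), size=n_replace, replace=False):
        candidates = [e for e in members[category_of[out[pos]]] if e != out[pos]]
        if candidates:  # exercises without a same-category alternative stay unchanged
            out[pos] = candidates[rng.integers(len(candidates))]
    return out


category_of = {"wall_pushup": "push", "knee_pushup": "push",
               "squat": "squat", "chair_squat": "squat",
               "plank": "core", "side_plank": "core"}
history = ["wall_pushup", "squat", "plank", "squat"]
augmented = augment_by_category(history, category_of, fraction=0.5, rng=np.random.default_rng(0))
```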

In the second approach, we use association rule mining [31] to extract rules from frequent itemsets. The augmentation algorithm then goes over the sequence of exercises of each participant, chooses a fraction of the exercises, and replaces them with similar exercises according to these rules.
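For the rule-mining variant, a minimal sketch restricted to 2-itemsets (a simplification of full frequent-itemset mining such as Apriori) is shown below. It assumes each transaction is the set of exercises one participant performed on one day; `pairwise_rules`, its thresholds, and the toy data are hypothetical.

```python
from collections import Counter
from itertools import combinations


def pairwise_rules(transactions, min_support=0.1, min_confidence=0.5):
    """Return {(a, b): confidence} for rules a -> b mined from frequent 2-itemsets."""
    n = len(transactions)
    item_counts = Counter(item for t in transactions for item in set(t))
    pair_counts = Counter(pair for t in transactions
                          for pair in combinations(sorted(set(t)), 2))
    rules = {}
    for (a, b), count in pair_counts.items():
        if count / n < min_support:
            continue  # keep only frequent itemsets
        for lhs, rhs in ((a, b), (b, a)):
            confidence = count / item_counts[lhs]   # P(rhs | lhs)
            if confidence >= min_confidence:
                rules[(lhs, rhs)] = confidence
    return rules


transactions = [["squat", "lunge", "plank"], ["squat", "lunge"],
                ["squat", "plank"], ["burpee", "plank"]]
rules = pairwise_rules(transactions, min_support=0.3, min_confidence=0.7)
```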

V-B Experiment Setup and Discussion

To find the appropriate window length for the RNN model, we examined the autocorrelation function (ACF) of the exercise sequence. The ACF shows the degree of dependence within time series data and is often used to select the order of time series models. We set the sequence length $T$ of the RNN model according to this ACF analysis, and processing the data with this window length produces the training sequences.
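A sketch of an ACF-based window choice, assuming the exercise sequence is first mapped to a numeric series (for example, difficulty ratings, since one-hot exercise names have no natural ordering) and using the common $\pm 1.96/\sqrt{n}$ band as the significance cutoff. Both choices, and the helper names, are assumptions rather than details from the text.

```python
import numpy as np


def autocorrelation(series, max_lag):
    """Sample autocorrelation r_0..r_max_lag of a 1-D numeric series."""
    x = np.asarray(series, dtype=float)
    x = x - x.mean()
    denom = np.dot(x, x)
    return np.array([np.dot(x[:len(x) - k], x[k:]) / denom for k in range(max_lag + 1)])


def choose_window(series, max_lag):
    """Largest lag whose |r_k| exceeds the approximate 95% band 1.96 / sqrt(n)."""
    r = autocorrelation(series, max_lag)
    band = 1.96 / np.sqrt(len(series))
    significant = [k for k in range(1, max_lag + 1) if abs(r[k]) > band]
    return max(significant) if significant else 1


difficulty = [1, 2, 2, 3, 2, 3, 4, 3, 4, 4, 5, 4, 5, 5, 6, 5]  # hypothetical difficulty ratings
T = choose_window(difficulty, max_lag=4)
```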

Baseline: We use the proposed RNN model with the user profile attention mechanism and the exercise profile temporal attention mechanism (described in § III-A) as our baseline method. The model is trained with the cross-entropy loss function and the Adam optimizer [26]. We evaluate the recommender with a leave-one-participant-out scheme, i.e., k-fold cross-validation with one participant per fold: one participant is held out of the training dataset, and the remaining participants' data are used for training. The held-out participant is then treated as a new participant, and the trained recommender is used to recommend exercises to them. The held-out participant's actual data serve as the ground truth for calculating the accuracy of the recommendation system. We report top-1, top-5, and top-10 accuracy. The experiment is repeated, and the average top-k accuracies are reported in Table I. In the first experiment, we use only the users' demographic information in the attention mechanism of the proposed recommendation system; Table I row 1 shows these accuracies. In the second experiment, we used all information extracted from the questionnaires in addition to the demographic information for the attention mechanism and observed a slight decline in accuracy (Table I row 2). We hypothesize that most people do not have an accurate assessment of their own ability. Thus, their answers to the questionnaires may not accurately represent their profiles, leading to a mismatch between the exercises they performed and what they answered. For example, two people may give identical (possibly inaccurate) answers to the questions but have different abilities and interests in performing exercises. In the rest of this paper, we use only demographic information to represent the user's profile.
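The evaluation protocol can be sketched with two small helpers: a leave-one-participant-out split generator and a top-$k$ accuracy function. Both names are hypothetical, and the model training inside each fold is omitted.

```python
import numpy as np


def leave_one_participant_out(participant_ids):
    """Yield (held_out, training_ids) pairs, one fold per participant."""
    for held_out in participant_ids:
        yield held_out, [p for p in participant_ids if p != held_out]


def top_k_accuracy(probs, targets, k):
    """Fraction of rows whose true exercise index is among the k most probable classes."""
    top_k = np.argsort(-probs, axis=1)[:, :k]
    return float(np.mean([t in row for t, row in zip(targets, top_k)]))


probs = np.array([[0.1, 0.6, 0.3],
                  [0.5, 0.2, 0.3]])
targets = [2, 1]
folds = list(leave_one_participant_out(["a", "b", "c"]))
```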

Baseline with Data Augmentation: The training dataset was augmented with the two approaches described in § V-A. The training and augmented data were used to train the baseline model, which was evaluated as described for the baseline. Table I row 3 shows the accuracy of the baseline model trained with the training data and the expert-augmented data. The data augmentation helps the model generalize and improves the accuracy compared with the baseline (Table I row 1).

In contrast, when the baseline model is trained on the training data plus data augmented by association rule mining (Table I row 4), its top-1 and top-5 accuracies are lower than with expert augmentation (Table I row 3), although still higher than the baseline. This observation points to the importance of having the expert in the loop for exercise recommendation systems. Because the dataset augmented with expert knowledge gives higher accuracy in this experiment, we use this method for the other experiments in this paper.

Baseline with Active Learning: The training data were used to estimate the parameters of the Dirichlet distribution, and the distribution of the marginal distance was calculated numerically. The hypothesis test at significance level $\gamma$ yields a threshold $\tau$ satisfying $P(M < \tau) = \gamma$. During testing, if the marginal distance $m_{i,t}$ falls below $\tau$, the recommender asks the expert for feedback and fine-tunes the network with it. Because we are evaluating the proposed model on a dataset collected in the past, we cannot ask an expert for feedback in our evaluation. Therefore, whenever $m_{i,t}$ falls below $\tau$, we take the exercise the test participant actually performed at that time step and provide it to our active learning algorithm as the expert's feedback (Table I row 5). Compared with the baseline (Table I row 1), the top-1 accuracy increases from 63.78% to 74.45%. This increase results from fine-tuning the recommender with the feedback received from the expert, which personalizes the recommendation system.
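The offline replay with a simulated expert can be sketched as a loop over a held-out participant's windows. The `model` object with `predict` and `fine_tune` methods is a hypothetical interface, and the sketch scores the model's own top-1 prediction before any fine-tuning at that step; the text does not state how queried steps are scored.

```python
import numpy as np


def replay_with_simulated_expert(model, windows, actual_next, tau):
    """Offline evaluation of the expert-in-the-loop recommender.

    When the marginal distance of a prediction falls below tau, the exercise the
    participant actually performed stands in for the expert's label.
    Returns the model's top-1 predictions and the number of expert queries.
    """
    predictions, queries = [], 0
    for window, label in zip(windows, actual_next):
        probs = model.predict(window)
        predictions.append(int(np.argmax(probs)))
        second, first = np.sort(probs)[-2:]
        if first - second < tau:
            queries += 1
            model.fine_tune(window, label)   # personalize with the simulated expert's label
    return predictions, queries


class DummyModel:
    """Stand-in recommender that returns fixed outputs and records fine-tuning calls."""

    def __init__(self, outputs):
        self.outputs = list(outputs)
        self.tuned = []

    def predict(self, window):
        return self.outputs.pop(0)

    def fine_tune(self, window, label):
        self.tuned.append(label)


model = DummyModel([np.array([0.50, 0.45, 0.05]), np.array([0.90, 0.05, 0.05])])
preds, n_queries = replay_with_simulated_expert(model, ["w1", "w2"], [1, 0], tau=0.1)
```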

Baseline with New User Initialization: We calculated the pairwise similarity across all participants. For each new participant, we fine-tuned the recommendation system with the training data of the three participants with the most similar profiles and used the fine-tuned network to initialize the recommender for the new user. We did not use data augmentation or active learning in this experiment, to isolate the effect of new user initialization. We see a minor improvement in accuracy (Table I row 6; top-1 accuracy rises from 63.78% to 65.91%). However, we expect that for more diverse participants, e.g., different age groups, races, and ethnicities, the proposed initialization strategy would improve the accuracy significantly.


Baseline with Data Augmentation and Active Learning: In this experiment, we combined data augmentation and active learning, hypothesizing that the accuracy would improve. Table I row 7 shows the result. As expected, the accuracy improved relative to active learning alone (Table I row 5) and augmentation alone (Table I row 3).

Baseline with Data Augmentation, New User Initialization, and Active Learning: In the last experiment, we combined all modules. Table I row 9 shows the results, which indicate a very minor decline in accuracy from adding the new user initialization procedure (compared with Table I row 7). Further study with different questionnaires, medical records, etc. may improve the accuracy of the new user initialization method.

| Method | Top-1 accuracy | Top-5 accuracy | Top-10 accuracy |
| --- | --- | --- | --- |
| 1. Baseline (Demographic) | 63.78% | 92.19% | 97.33% |
| 2. Baseline (Full Profile) | 61.16% | 90.76% | 96.98% |
| 3. Baseline + Data Augmentation (Expert) | 72.53% | 95.53% | 98.56% |
| 4. Baseline + Data Augmentation (Rule-based) | 69.74% | 95.28% | 98.68% |
| 5. Baseline (Demographic) + Active Learning | 74.45% | 95.29% | 98.48% |
| 6. Baseline (Demographic) + New User Init | 65.91% | 93.14% | 97.65% |
| 7. Baseline + Data Augmentation (Expert) + Active Learning | 80.12% | 97.23% | 99.26% |
| 8. Baseline + Data Augmentation (Expert) + New User Init | 71.90% | 95.27% | 98.42% |
| 9. Baseline + Data Augmentation (Expert) + New User Init + Active Learning | 80.08% | 97.00% | 99.11% |
TABLE I: The accuracy of the recommendation system.

VI Conclusion

In this paper, a physical exercise recommendation system was developed. The main challenge in developing recommendation systems is personalization, especially for new users for whom no training data exist. The experimental results indicate the importance of user and exercise profiles as attention mechanisms. Thus, the developed system uses the health and demographic questionnaires filled in by users before joining the program as attention mechanisms.

Despite all the fine-tuning and the attention mechanism, the experimental results show that perfect personalization cannot be achieved. E-commerce, music, and movie recommendation systems can collect constant feedback from user interactions: in e-commerce the user either buys the product or not, and on a music platform the user either listens to a song completely or not. An exercise recommendation system has no such signal, because we do not know whether the user actually performed the recommended exercise, so an expert in the loop is indispensable. The expert provides knowledge when the recommender is uncertain. This is why combining the real-time active learning mechanism with our recommendation system significantly increased its accuracy: top-1 accuracy rose from 63.78% to 74.45% for the demographic baseline and from 72.53% to 80.12% with expert data augmentation (Table I).


The authors would like to thank Dr. Choh Man Teng.

Appendix A Probability Distribution of the Marginal Distance

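In the marginal-distance criterion, the recommender's output variables are sorted in ascending order as $y_1 \le y_2 \le \dots \le y_K$, where $K$ is the number of outputs. The marginal distance is defined as $d = y_K - y_{K-1}$, the gap between the two largest outputs. When the marginal distance falls below a given threshold $\tau$, the recommender asks the human expert for a label. In prior works the threshold was chosen by users. Instead, we derive the probability distribution of the marginal distance and use it to determine the threshold. The output vector is drawn from a Dirichlet distribution, $(y_1, \dots, y_K) \sim \mathrm{Dir}(\alpha_1, \dots, \alpha_K)$, whose components sum to one: $\sum_{i=1}^{K} y_i = 1$. For the marginal distance, we are only interested in $y_{K-1}$ and $y_K$. To simplify the calculation, we define a new random variable $z = \sum_{i=1}^{K-2} y_i = 1 - y_{K-1} - y_K$. The marginal distribution of each $y_i$ is a Beta distribution, and the sum of several components of a Dirichlet vector, such as $z$, is also Beta distributed. The joint distribution of $(y_{K-1}, y_K, z)$ is a Dirichlet distribution, $\mathrm{Dir}(\alpha_{K-1}, \alpha_K, \alpha_z)$:

$$f(y_{K-1}, y_K, z) = \frac{\Gamma(\alpha_0)}{\Gamma(\alpha_{K-1})\,\Gamma(\alpha_K)\,\Gamma(\alpha_z)}\; y_{K-1}^{\alpha_{K-1}-1}\, y_K^{\alpha_K-1}\, z^{\alpha_z-1},$$

where $\alpha_z = \sum_{i=1}^{K-2} \alpha_i$ and $\alpha_0 = \alpha_{K-1} + \alpha_K + \alpha_z$. The parameters $\alpha_{K-1}$, $\alpha_K$, and $\alpha_z$ are estimated from the training dataset using maximum likelihood [32, 33].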
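The maximum-likelihood step can be sketched as follows. This is a minimal sketch under one assumption: the Dirichlet is fit to the recommender's output vectors after each vector is sorted and collapsed to the triple $(y_{K-1}, y_K, z)$, meaning the two largest outputs plus the sum of the remaining $K-2$. The sketch uses Minka's fixed-point iteration from [32], $\psi(\alpha_k^{\text{new}}) = \psi(\sum_j \alpha_j) + \overline{\log y_k}$, with hand-written digamma and trigamma functions so that only NumPy is needed. It does not use the package in [33], whose API is not reproduced here. The function names, the moment-matching initialization, and the synthetic data are illustrative assumptions.

```python
import math

import numpy as np

EULER_GAMMA = 0.5772156649015329


def digamma(x: float) -> float:
    """psi(x) for x > 0: shift with the recurrence until x >= 6, then an asymptotic series."""
    result = 0.0
    while x < 6.0:
        result -= 1.0 / x
        x += 1.0
    f = 1.0 / (x * x)
    series = f * (1 / 12 - f * (1 / 120 - f * (1 / 252 - f * (1 / 240 - f / 132))))
    return result + math.log(x) - 0.5 / x - series


def trigamma(x: float) -> float:
    """psi'(x) for x > 0, computed the same way as digamma."""
    result = 0.0
    while x < 6.0:
        result += 1.0 / (x * x)
        x += 1.0
    f = 1.0 / (x * x)
    series = (1 / 6 - f * (1 / 30 - f * (1 / 42 - f / 30))) / x**3
    return result + 1.0 / x + 0.5 * f + series


def inv_digamma(y: float, newton_steps: int = 5) -> float:
    """Solve digamma(x) = y using Minka's initial guess followed by Newton steps."""
    x = math.exp(y) + 0.5 if y >= -2.22 else -1.0 / (y + EULER_GAMMA)
    for _ in range(newton_steps):
        x -= (digamma(x) - y) / trigamma(x)
    return x


def margin_triples(probs: np.ndarray):
    """Sort each output vector and keep (y_{K-1}, y_K, z); also return d = y_K - y_{K-1}.

    Requires K >= 3 so that z, the sum of the K - 2 smallest outputs, exists.
    """
    sorted_p = np.sort(probs, axis=1)
    y_k1, y_k = sorted_p[:, -2], sorted_p[:, -1]
    z = sorted_p[:, :-2].sum(axis=1)
    return np.column_stack([y_k1, y_k, z]), y_k - y_k1


def fit_dirichlet(samples: np.ndarray, max_iter: int = 1000, tol: float = 1e-8) -> np.ndarray:
    """Maximum-likelihood Dirichlet parameters via Minka's fixed-point iteration."""
    log_p_mean = np.log(np.clip(samples, 1e-300, None)).mean(axis=0)
    # Moment-matching start: (E[p1] - E[p1^2]) / Var[p1] estimates alpha_0.
    mean = samples.mean(axis=0)
    second = float(np.mean(samples[:, 0] ** 2))
    alpha = mean * (mean[0] - second) / (second - mean[0] ** 2)
    for _ in range(max_iter):
        target = digamma(float(alpha.sum())) + log_p_mean
        new_alpha = np.array([inv_digamma(float(t)) for t in target])
        if np.max(np.abs(new_alpha - alpha)) < tol:
            return new_alpha
        alpha = new_alpha
    return alpha


# Hypothetical softmax-like outputs over K = 6 exercises, drawn at random for illustration.
rng = np.random.default_rng(0)
probs = rng.dirichlet([1.0, 1.0, 2.0, 2.0, 4.0, 8.0], size=5000)
triples, margins = margin_triples(probs)
print("estimated (alpha_{K-1}, alpha_K, alpha_z):", fit_dirichlet(triples))
print("mean marginal distance:", margins.mean())
```

The fixed-point update is guaranteed not to decrease the likelihood. The moment-matching start only shortens the number of iterations needed.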

The probability distribution of the marginal distance is derived by transforming $y_{K-1}$ and $y_K$ according to $d = y_K - y_{K-1}$. Because the support of $(y_{K-1}, y_K, z)$ is restricted to the hyperplane defined by $y_{K-1} + y_K + z = 1$, the third argument, $z$, is known given $y_{K-1}$ and $y_K$. Fig. 5(a) shows the hyperplane. Because the support is restricted and the value of $z$ depends on $y_{K-1}$ and $y_K$, we can simply visualize the projection of the hyperplane onto the $(y_{K-1}, y_K)$ plane. This makes the boundaries of the integrals easier to define. Fig. 5(b) shows this projection together with the lines that bound the integration region. In the first step, we derive the cumulative distribution function $F_D(d) = P(y_K - y_{K-1} \le d)$, as in Eq. 11. Then, to obtain the probability density function, we differentiate both sides with respect to $d$ (Eq. 12). This yields the probability density function of the marginal distance, $f_D(d)$, in Eq. 13. We compute $f_D(d)$ numerically by approximating the integral with a summation for different values of $d$, although deriving a closed form is possible in principle.

Fig. 5: (a) The valid surface of the variables $(y_{K-1}, y_K, z)$ under the Dirichlet distribution. (b) The integration area projected onto the $(y_{K-1}, y_K)$ plane.
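The numerical computation of the marginal-distance distribution can be sketched as follows. Write $u = y_{K-1}$, $v = y_K$, and $z = 1 - u - v$. The change of variables $(u, v) \to (u, d)$ with $d = v - u$ has unit Jacobian, so $f_D(d) = \int_{\max(0,-d)}^{(1-d)/2} f(u,\, u+d,\, 1-2u-d)\,du$ for $-1 < d < 1$, where $f$ is the three-component Dirichlet density. This form is a reconstruction from the definitions in this appendix and may differ in presentation from Eq. 13. The sketch approximates the inner integral with a midpoint sum and accumulates the CDF on a grid of $d$ values. As an assumption not stated in the text, it then picks $\tau$ as the $q$-quantile of $D$, so that roughly a fraction $q$ of recommendations is routed to the expert. The parameter values $(2, 6, 3)$, the budget $q = 0.2$, and all function names are hypothetical. Because the Dirichlet model does not itself force $y_K \ge y_{K-1}$, the grid covers the full range $-1 \le d \le 1$.

```python
import math

import numpy as np


def marginal_distance_pdf(d: float, a1: float, a2: float, a3: float, n: int = 2000) -> float:
    """f_D(d) for D = y_K - y_{K-1} when (y_{K-1}, y_K, z) ~ Dir(a1, a2, a3).

    Integrates the joint density along v = u + d with a midpoint sum over
    max(0, -d) < u < (1 - d) / 2, where z = 1 - 2u - d.
    """
    lo, hi = max(0.0, -d), (1.0 - d) / 2.0
    if hi <= lo:  # for |d| >= 1 the line misses the interior of the simplex
        return 0.0
    h = (hi - lo) / n
    u = lo + h * (np.arange(n) + 0.5)  # midpoints stay strictly inside the simplex
    v = u + d
    z = 1.0 - u - v
    log_norm = math.lgamma(a1 + a2 + a3) - math.lgamma(a1) - math.lgamma(a2) - math.lgamma(a3)
    log_f = log_norm + (a1 - 1) * np.log(u) + (a2 - 1) * np.log(v) + (a3 - 1) * np.log(z)
    return float(np.exp(log_f).sum() * h)


def marginal_distance_distribution(a1: float, a2: float, a3: float, m: int = 801):
    """PDF and trapezoid-rule CDF of D on an evenly spaced grid over [-1, 1]."""
    grid = np.linspace(-1.0, 1.0, m)
    pdf = np.array([marginal_distance_pdf(d, a1, a2, a3) for d in grid])
    steps = 0.5 * (pdf[1:] + pdf[:-1]) * np.diff(grid)
    cdf = np.concatenate(([0.0], np.cumsum(steps)))
    return grid, pdf, cdf


def margin_threshold(q: float, grid: np.ndarray, cdf: np.ndarray) -> float:
    """Smallest grid value tau with P(D <= tau) >= q (CDF renormalized to end at 1)."""
    idx = int(np.searchsorted(cdf / cdf[-1], q))
    return float(grid[min(idx, len(grid) - 1)])


def ask_expert(probs: np.ndarray, tau: float) -> bool:
    """Query the expert when the gap between the two largest outputs is below tau."""
    second, first = np.sort(probs)[-2:]
    return bool(first - second < tau)


# Hypothetical fitted parameters (alpha_{K-1}, alpha_K, alpha_z) and query budget q = 0.2.
grid, pdf, cdf = marginal_distance_distribution(2.0, 6.0, 3.0)
tau = margin_threshold(0.2, grid, cdf)
print(f"tau = {tau:.4f}")
print("ask expert:", ask_expert(np.array([0.40, 0.35, 0.25]), tau))
```

The midpoint rule never evaluates the density on the simplex boundary. This avoids `log(0)` and keeps the sum finite even when some parameters are below one.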


  • [1] W. T. Riley, W. J. Nilsen, T. A. Manolio, D. R. Masys, and M. Lauer, “News from the NIH: potential contributions of the behavioral and social sciences to the precision medicine initiative,” Translational Behavioral Medicine, vol. 5, no. 3, pp. 243–246, 2015.
  • [2] K. E. Thorpe, “The future costs of obesity: National and state estimates of the impact of obesity on direct health care expenses,” A collaborative report from United Health Foundation, the American Public Health Association and Partnership for Prevention, 2009.
  • [3] “noom,”, 2021.
  • [4] P. Pirolli, G. M. Youngblood, H. Du, A. Konrad, L. Nelson, and A. Springer, “Scaffolding the mastery of healthy behaviors with Fittle+ systems: Evidence-based interventions and theory,” Human–Computer Interaction, vol. 36, no. 2, pp. 73–106, 2018.
  • [5] S. Michie, R. West, R. Campbell, J. Brown, and H. Gainforth, “ABC of behaviour change theories: an essential resource for researchers, policy makers and practitioners,” Silverback IS: Silverback Publishing, vol. 402, 2014.
  • [6] G. Linden, B. Smith, and J. York, “Amazon.com recommendations: Item-to-item collaborative filtering,” IEEE Internet Computing, vol. 7, no. 1, pp. 76–80, 2003.
  • [7] A. V. Bodapati, “Recommendation systems with purchase data,” Journal of Marketing Research, vol. 45, no. 1, pp. 77–93, 2008.
  • [8] J. Bennett, S. Lanning, et al., “The Netflix prize,” in Proceedings of KDD Cup and Workshop, vol. 2007. New York, NY, USA, 2007, p. 35.
  • [9] Y. Song, S. Dixon, and M. Pearce, “A survey of music recommendation systems and future perspectives,” in 9th International Symposium on Computer Music Modeling and Retrieval, vol. 4. Citeseer, 2012, pp. 395–410.
  • [10] A. Konrad, V. Bellotti, N. Crenshaw, S. Tucker, L. Nelson, H. Du, P. Pirolli, and S. Whittaker, “Finding the adaptive sweet spot: Balancing compliance and achievement in automated stress reduction,” in Proceedings of the 33rd Annual ACM Conference on Human Factors in Computing Systems, 2015, pp. 3829–3838.
  • [11] H. Bharadhwaj, “Meta-learning for user cold-start recommendation,” in 2019 International Joint Conference on Neural Networks (IJCNN). IEEE, 2019, pp. 1–8.
  • [12] D. T. Dunsmuir, B. A. Payne, G. Cloete, C. L. Petersen, M. Görges, J. Lim, P. Von Dadelszen, G. A. Dumont, and J. M. Ansermino, “Development of mHealth applications for pre-eclampsia triage,” IEEE Journal of Biomedical and Health Informatics, vol. 18, no. 6, pp. 1857–1864, 2014.
  • [13] R. M. Seepers, C. Strydis, I. Sourdis, and C. I. De Zeeuw, “Enhancing heart-beat-based security for mHealth applications,” IEEE Journal of Biomedical and Health Informatics, vol. 21, no. 1, pp. 254–262, 2015.
  • [14] E. C. Schiza, T. C. Kyprianou, N. Petkov, and C. N. Schizas, “Proposal for an eHealth based ecosystem serving national healthcare,” IEEE Journal of Biomedical and Health Informatics, vol. 23, no. 3, pp. 1346–1357, 2018.
  • [15] A. Huang, C. Chen, K. Bian, X. Duan, M. Chen, H. Gao, C. Meng, Q. Zheng, Y. Zhang, B. Jiao, et al., “WE-CARE: an intelligent mobile telecardiology system to enable mHealth applications,” IEEE Journal of Biomedical and Health Informatics, vol. 18, no. 2, pp. 693–702, 2013.
  • [16] P. Wuttidittachotti, S. Robmeechai, and T. Daengsi, “mHealth: A design of an exercise recommendation system for the Android operating system,” Walailak Journal of Science and Technology (WJST), vol. 12, no. 1, pp. 63–82, 2015.
  • [17] D. Ravì, C. Wong, F. Deligianni, M. Berthelot, J. Andreu-Perez, B. Lo, and G.-Z. Yang, “Deep learning for health informatics,” IEEE Journal of Biomedical and Health Informatics, vol. 21, no. 1, pp. 4–21, 2016.
  • [18] D. Z. Liu and G. Singh, “A recurrent neural network based recommendation system,” tech. rep., Stanford University, 2016.
  • [19] C.-Y. Wu, A. Ahmed, A. Beutel, A. J. Smola, and H. Jing, “Recurrent recommender networks,” in Proceedings of the Tenth ACM International Conference on Web Search and Data Mining, 2017, pp. 495–503.
  • [20] A. Sami, R. Nagatomi, M. Terabe, and K. Hashimoto, “Design of physical activity recommendation system,” in IADIS European Conf. Data Mining, 2008, pp. 148–152.
  • [21] J. Ni, L. Muhlstein, and J. McAuley, “Modeling heart rate and activity data for personalized fitness recommendation,” in The World Wide Web Conference, 2019, pp. 1343–1353.
  • [22] J. Yoon, C. Davtyan, and M. van der Schaar, “Discovery and clinical decision support for personalized healthcare,” IEEE Journal of Biomedical and Health Informatics, vol. 21, no. 4, pp. 1133–1145, 2016.
  • [23] X. Liu, C.-H. Chen, M. Karvela, and C. Toumazou, “A DNA-based intelligent expert system for personalised skin-health recommendations,” IEEE Journal of Biomedical and Health Informatics, vol. 24, no. 11, pp. 3276–3284, 2020.
  • [24] Z. Li, S. Das, J. Codella, T. Hao, K. Lin, C. Maduri, and C.-H. Chen, “An adaptive, data-driven personalized advisor for increasing physical activity,” IEEE Journal of Biomedical and Health Informatics, vol. 23, no. 3, pp. 999–1010, 2018.
  • [25] P. Pirolli, “A computational cognitive model of self-efficacy and daily adherence in mHealth,” Translational Behavioral Medicine, vol. 6, no. 4, pp. 496–508, 2016.
  • [26] D. P. Kingma and J. Ba, “Adam: A method for stochastic optimization,” arXiv preprint arXiv:1412.6980, 2014.
  • [27] H. H. Aghdam, A. Gonzalez-Garcia, J. v. d. Weijer, and A. M. López, “Active learning for deep detection neural networks,” in Proceedings of the IEEE/CVF International Conference on Computer Vision, 2019, pp. 3672–3680.
  • [28] B. Zhang, L. Li, S. Yang, S. Wang, Z.-J. Zha, and Q. Huang, “State-relabeling adversarial active learning,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020, pp. 8756–8765.
  • [29] S. Yu, J. Yang, D. Liu, R. Li, Y. Zhang, and S. Zhao, “Hierarchical data augmentation and the application in text classification,” IEEE Access, vol. 7, pp. 185476–185485, 2019.
  • [30] K. Kafle, M. Yousefhussien, and C. Kanan, “Data augmentation for visual question answering,” in Proceedings of the 10th International Conference on Natural Language Generation, 2017, pp. 198–202.
  • [31] P.-N. Tan, M. Steinbach, and V. Kumar, Introduction to Data Mining. Pearson Education India, 2016.
  • [32] T. Minka, “Estimating a Dirichlet distribution,” 2000.
  • [33] E. Suh, “Dirichlet Python package,”, 2021.