Learning Behavioral Representations from Wearable Sensors

The ubiquity of mobile devices and wearable sensors offers unprecedented opportunities for continuous collection of multimodal physiological data. Such data enables temporal characterization of an individual's behaviors, which can provide unique insights into their physical and psychological health. Understanding the relation between different behaviors/activities and personality traits such as stress or work performance can help build strategies to improve the work environment. Especially in workplaces like hospitals, where many employees are overworked, such policies improve the quality of patient care by prioritizing the mental and physical health of caregivers. One challenge in analyzing physiological data is extracting the underlying behavioral states from the temporal sensor signals and interpreting them. Here, we use a non-parametric Bayesian approach to model multivariate sensor data from multiple people and discover the dynamic behaviors they share. We apply this method to data collected from sensors worn by a population of workers in a large urban hospital, capturing their physiological signals, such as breathing and heart rate, and activity patterns. We show that the learned states capture behavioral differences within the population that can help cluster participants into meaningful groups and better predict their cognitive and affective states. This method offers a practical way to learn compact behavioral representations from dynamic multivariate sensor signals and provide insights into the data.




1. Introduction

Advances in sensing technologies have made wearable sensors more accurate and widely available, allowing for continuous and unobtrusive acquisition of multimodal physiological data, including heart rate, breathing, and physical activity. This data potentially allows for real-time, quantitative characterization of human behavior, which provides a basis to assess an individual’s health (Aral and Nicolaides, 2017) and psychological well-being (Wang et al., 2014).

To make sense of physiological data, however, a number of challenges must be solved. Sensor data is dynamic, very noisy, and often incomplete, with many missing values. Data streams coming from multiple sensors and individuals usually have very different time scales and cover different data collection periods. These characteristics make aggregating, reconciling, and modeling sensor data very challenging. Researchers have used Hidden Markov Models (HMMs) (Rabiner and Juang, 1986) to address some of these challenges and effectively capture temporal trends within physiological data (Singh et al., 2010; Amate et al., 2011; Novak et al., 2004; Pierson et al., 2018). HMMs are a family of generative probabilistic models for sequential data in which the system is represented as a Markov process with latent, or “hidden”, states. These hidden states determine the dynamics of the process. One weakness of traditional HMMs is that they constrain the model to a predefined number of states. When learning dynamic behaviors from multiple physiological signals, it may be difficult to enumerate the states that best represent the data without making strong assumptions. Even with prior knowledge, many variations in the signals, originating from environmental noise or artifacts in data collection, may change the distribution of underlying states and require additional states.

To address this challenge, we propose to use a non-parametric Markov Switching Autoregressive model (Fox et al., 2014)—the Beta Process Autoregressive HMM (BP-AR-HMM)—to learn shared latent states from data collected by wearable sensors. The number of states in this model is learned from the data: if an unusual new pattern appears in the data, another state is added to model that segment. This is beneficial in cases where a sensor malfunctions or produces noise: by assigning a separate state to such a segment, we can identify and disregard that state.

We apply the model to physiological data collected from about 200 workers at a large urban hospital. The workers agreed to participate in a 10-week study, during which they wore sensors that collected their bio-behavioral data. We show that the proposed model learns hidden states that correspond to shared behaviors of the workers, which provide useful features for behavioral modeling. We use these behavioral representations to better understand and analyze the data, to group similar individuals together, and as features to predict their psychological traits, personality, and demographics. Using this framework, employers can personalize each individual’s responsibilities depending on their identified clusters and estimated physical and psychological traits, in order to build a healthier and more productive workplace.

Our paper makes the following contributions:

  • We learn shared behaviors from multivariate physiological data collected in the wild, i.e., from people going about their daily lives, using the BP-AR-HMM model.

  • We use the learned shared states to define distance measures between participants, which are validated by clustering participants into meaningful groups.

  • We propose

    1. A compact representation of participants to gain more insight into the data, interpret the learned latent states, and find relations between different behaviors/states and individual attributes.

    2. A more advanced but less interpretable representation to predict individual attributes, such as personality traits, age, etc.

The rest of the paper is organized as follows. After reviewing relevant related work (§2), we describe the model (§3). We then describe the collected data (§4). Finally, we present the results of our data analysis and prediction, and conclude with a discussion of their significance (§5).

2. Related Work

Physiological data from wearable sensors have been used in a number of applications, most commonly for activity recognition. In such tasks, the goal is to learn the activity label of each time unit (e.g., running, jumping) based on the sensory data. Labels are usually collected in a controlled lab setting (Ravi et al., 2005; Fox et al., 2014). In the case of in-the-wild studies, labels are either provided by real-time annotations of the user (Bao and Intille, 2004) or by human annotators after the study is over (Tapia et al., 2004). Another line of research attempts to predict users’ personality traits (e.g., degree of extraversion vs. introversion), well-being, and performance (Hosseinmardi et al., 2018b, a), where the labels are collected from the participants as pre- and/or post-study surveys. In this line of research the entire signal is analyzed to predict a final outcome, which is different from activity recognition, where the goal is to predict a label for each time unit.

In this work, since we do not have access to activity labels of participants at each time step, we use information from pre-study surveys, such as personality traits, to interpret the underlying behavioral states we learn from the physiological data. In other words, we use this framework to investigate possible correlations between the learned behaviors (states) and certain personality traits. In the rest of this section, we describe related methods for extracting behavioral states or features from physiological data.

Learning compact representations becomes especially important when dealing with physiological data. In these types of problems we usually do not have a large number of participants; however, for each participant we have rich longitudinal data. Since we have one label per data stream (i.e., per participant), our training data is limited to the number of participants in the study, in this case 180 data points. Complex models, such as those learned by deep neural networks, tend to overfit on small training sets and are thus not applicable to these problems. In fact, having rich data only increases the number of parameters needed to learn compact, meaningful representations and hence increases the risk of overfitting.

Piecewise Aggregate Approximation (PAA) (Keogh et al., 2001) and Symbolic Aggregate Approximation (SAX) (Lin et al., 2003) are compact time series representations. PAA reduces a time series of length n to a representation of length w by dividing the signal into w equal-sized segments and replacing each segment with its average. SAX discretizes PAA representations into predefined letter sequences: symbolic representations aimed at replacing numerical time series with strings. Both PAA and SAX depend on the length of the time series, so signals with different lengths cannot be compared using these representations.
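As a concrete illustration, here is a minimal PAA/SAX sketch in Python (not from the original paper; the near-equal segment split and the hard-coded 4-letter Gaussian breakpoints are standard but assumed choices):

```python
import numpy as np

def paa(series, w):
    """Piecewise Aggregate Approximation: reduce a length-n series
    to w segment means."""
    x = np.asarray(series, dtype=float)
    segments = np.array_split(x, w)               # w near-equal segments
    return np.array([seg.mean() for seg in segments])

def sax(series, w, alphabet="abcd"):
    """Symbolic Aggregate Approximation: z-normalize, apply PAA, then
    discretize using breakpoints that split N(0, 1) into equal-mass bins
    (hard-coded here for a 4-letter alphabet)."""
    x = np.asarray(series, dtype=float)
    x = (x - x.mean()) / (x.std() + 1e-12)
    approx = paa(x, w)
    breakpoints = np.array([-0.6745, 0.0, 0.6745])  # N(0,1) quartiles
    return "".join(alphabet[i] for i in np.searchsorted(breakpoints, approx))
```

Because w is fixed relative to each input, two signals of different lengths yield representations on different time scales, which is exactly the comparability problem noted above.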

Hidden Markov Models (HMMs) represent temporal trends and dynamics of time series using states and transition probabilities. Since transition probabilities are not time-dependent, the parameters of HMMs learned on time series of different lengths can easily be compared. In prior research, HMMs are learned on each time series independently (Baum and Petrie, 1966). As a consequence, the learned states cannot be compared across different representations. In addition to HMMs, other recent methods for temporal modeling suffer from the same shortcoming when applied to multiple signals: for example, the approach by Hallac et al. (Hallac et al., 2017), which offers a new perspective on clustering subsequences of multivariate time series, cannot be used to learn representations across multiple signals. Another shortcoming of standard HMMs is that the number of states must be fixed a priori. Recent Bayesian approaches overcome these constraints by allowing infinitely many potential states using a Beta process, with states shared among all time series (Fox et al., 2009b, a, 2014). This allows each time series to be represented in the space of shared latent states. This approach has successfully been applied to automatically capture different types of eye movements (Houpt et al., 2018) and human body motion (Fox et al., 2014), and to understand the dynamics of forums on the deep and dark web (Tavabi et al., 2019). There is another line of work that allows for infinitely many potential states in a Hidden Markov Model: Beal et al. (Beal et al., 2002) use the Dirichlet process (Antoniak, 1974) as a prior over the hidden states, but the model is again designed to capture each time series independently.

Another approach to finding shared latent features among multiple time series is tensor decomposition. Tensor-based methods have been used extensively in different fields (Kolda and Bader, 2009), including behavioral modeling (Hosseinmardi et al., 2018b; Sapienza et al., 2018; Hosseinmardi et al., 2018a). A popular tensor decomposition method is Parafac2 (Harshman, 1970), which offers a multilinear higher-order decomposition that can handle missing values and time series of different lengths. Parafac2 itself is considered a traditional method; however, multiple works published in recent years improve its inference, impose constraints, etc. (Jørgensen et al., 2018; Cohen and Bro, 2018). Similarly, recent work by Wu et al. (Wu et al., 2018) proposes Random Warping Series (RWS), a model based on Dynamic Time Warping that finds similarities between multivariate time series of different lengths and embeds them in a fixed-dimensional space. This embedding technique was able to outperform state-of-the-art methods in clustering and classification of time series. In this paper, we compare our results against RWS and Parafac2.

3. Methods

3.1. Beta Process Autoregressive HMM

Figure 1. Illustration of the model. Left: the binary behavior matrix. Based on this matrix, the time series exhibits states A, B, and C, but there is zero probability of it entering state D. Right: the state sequences of the time series.

One of the most popular tools for studying multivariate time series is the vector autoregressive (VAR) model (Monbet and Ailliot, 2017). In a VAR model of lag r, each variable is a linear function of the previous r values of itself and the other variables. However, such models cannot describe time series with changing behaviors. To model such cases, Markov switching autoregressive models, which are generalizations of autoregressive models and Hidden Markov Models (Monbet and Ailliot, 2017), are used.
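To make the switching idea concrete, here is a minimal simulation of a two-regime Markov-switching AR(1) process (the coefficients and transition probabilities are illustrative, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)

# Two AR(1) regimes: a persistent, low-noise one and a volatile one.
ar_coef = {0: 0.9, 1: -0.5}
noise_sd = {0: 0.1, 1: 1.0}
# Markov transition matrix over the hidden regimes.
P = np.array([[0.95, 0.05],
              [0.10, 0.90]])

def simulate_ms_ar(T):
    """Simulate a Markov-switching AR(1) process of length T,
    returning the observations and the hidden regime sequence."""
    states = np.zeros(T, dtype=int)
    y = np.zeros(T)
    for t in range(1, T):
        states[t] = rng.choice(2, p=P[states[t - 1]])
        y[t] = ar_coef[states[t]] * y[t - 1] + rng.normal(0.0, noise_sd[states[t]])
    return y, states
```

A plain AR model would have to explain both regimes with one set of weights; the hidden Markov chain lets the dynamics switch.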

In this paper, we use a generative model proposed by (Fox et al., 2014, 2009a), called the Beta Process Autoregressive HMM (BP-AR-HMM), to discover behaviors, or regimes of Markov switching autoregressive models, that are shared by different time series. In this model, the entire set of time series is described by globally-shared states, or behaviors, where each time series is associated with a subset of them. The behaviors associated with different time series can be represented by a binary matrix F, where F_{ij} = 1 means time series i is associated with behavior j. Given the matrix F, each time series is modeled as a separate hidden Markov model over the states it exhibits. An example of the matrix F and the corresponding state sequences is shown in Figure 1.

Each HMM is represented by a transition matrix: a square matrix whose dimension equals the number of states the time series exhibits. Entry (j, k) is the probability of transitioning from state j to state k for that time series; hence each row sums to 1. Each state is modeled using a vector autoregressive process with lag r.


When a time series is in state k, its future values evolve according to state-specific autoregressive weight matrices A_{l,k} and noise e_t(k):

y_t = A_{1,k} y_{t-1} + ... + A_{r,k} y_{t-r} + e_t(k).

Since the number of such states in the data is not known in advance, the Beta process is used (Hjort and others, 1990; Thibaux and Jordan, 2007). A Beta process allows for an infinite number of behaviors but encourages sparse representations. Consider, as an example, a model with K behaviors. Each behavior (each column of the binary behavior matrix) is modeled by Bernoulli random variables f_{ik}, indicating whether time series i exhibits behavior k, whose parameter is drawn from a Beta distribution (a Beta-Bernoulli process), i.e.,

π_k ~ Beta(α/K, 1),    f_{ik} ~ Bernoulli(π_k).

The underlying distribution when this process is extended to an infinite number of behaviors—i.e., as K tends to infinity—is the Beta process. This process is also known as the Indian Buffet Process (Kingman, 1967; Griffiths and Ghahramani, 2011), which is best understood with the following “culinary metaphor” involving a sequence of customers (time series) selecting dishes (features) from an infinitely large buffet. The i-th customer selects dish k with probability m_k / i, where m_k, the number of previous customers who selected dish k, reflects the popularity of the dish, i.e., some features are going to be more prevalent than others. The customer then selects Poisson(α/i) new dishes. With this approach, the number of features can grow arbitrarily with the size of the dataset: in other words, the feature space increases if the data cannot be faithfully represented with the already-defined states. However, the probability of adding new states decreases with the number of observed time series. Finally, the distribution generated by the Indian Buffet Process is independent of the order of the customers. For posterior computations, we refer the reader to the original work (Fox et al., 2014, 2009a).
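The culinary metaphor translates directly into a sampler. Here is a minimal Indian Buffet Process sketch, assuming the standard rules stated above (existing dish k taken with probability m_k / i, plus Poisson(α/i) new dishes):

```python
import numpy as np

def indian_buffet(n_customers, alpha, rng):
    """Sample a binary customer-by-dish matrix from the Indian Buffet
    Process: customer i takes existing dish k with probability m_k / i,
    then tries Poisson(alpha / i) brand-new dishes."""
    counts = []                       # counts[k] = customers who took dish k
    rows = []
    for i in range(1, n_customers + 1):
        row = [rng.random() < m / i for m in counts]
        for k, taken in enumerate(row):
            if taken:
                counts[k] += 1
        n_new = rng.poisson(alpha / i)
        counts.extend([1] * n_new)    # each new dish starts with one taker
        row.extend([True] * n_new)
        rows.append(row)
    F = np.zeros((n_customers, len(counts)), dtype=int)
    for i, row in enumerate(rows):
        F[i, :len(row)] = row
    return F
```

The number of columns (behaviors) grows with the data, but at a decreasing rate, matching the sparsity argument above.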

3.2. Measuring Distance

When applied to multivariate physiological signals, the generative model described above learns a hidden Markov model for each signal. We use the learned HMMs to identify individuals with similar behaviors. In this section we propose two different methods for measuring the distance between HMMs (each HMM represents a participant in the study).

3.2.1. Likelihood Distance:

To define a similarity measure between two HMMs, one can measure the probability of their state sequences having been generated by the same process. Since each signal is associated with its own distinct generative process, we measure state sequences’ similarity as the likelihood that state sequence S_X (of time series X) was generated by H_Y, the process that gave rise to S_Y (of time series Y), and the likelihood that S_Y was generated by H_X. We average the two likelihoods to symmetrize the similarity measure:

Sim(X, Y) = (1/2) [ P(S_X | H_Y) + P(S_Y | H_X) ].

The likelihood is computed using the learned transition matrix of the HMM and the Markov process assumption, P(S | H) = ∏_t π_{s_{t-1}, s_t}, where π_{s_{t-1}} is the row of the transition matrix corresponding to state s_{t-1}. Since with this approach longer time series automatically have a smaller likelihood, we normalize by dividing by (1/K)^T, K being the number of states and T being the length of the time series. Finally, since these are small probabilities, we compute log(Sim(X, Y)), which yields negative similarity values; we negate them to obtain the distance between the time series.
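A sketch of this computation (the (1/K)^T normalization and the transition-only likelihood follow the description above; function and variable names are ours, and strictly positive transition probabilities are assumed):

```python
import numpy as np

def seq_log_likelihood(seq, trans):
    """Normalized log-probability of a state sequence under the Markov
    chain with transition matrix `trans`: the raw log-likelihood minus
    T * log(1/K), i.e., the likelihood divided by (1/K)^T.
    Assumes all needed transition probabilities are > 0."""
    K = trans.shape[0]
    ll = sum(np.log(trans[a, b]) for a, b in zip(seq[:-1], seq[1:]))
    return ll - len(seq) * np.log(1.0 / K)

def likelihood_distance(seq_x, trans_x, seq_y, trans_y):
    """Negated, symmetrized log-similarity between two learned HMMs."""
    sim = 0.5 * (seq_log_likelihood(seq_x, trans_y)
                 + seq_log_likelihood(seq_y, trans_x))
    return -sim
```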

3.2.2. Viterbi Distance:

The distance between different HMMs can also be computed with the Viterbi distance proposed in (Falkhausen et al., 1995), defined as:

d_V(H_1, H_2) = lim_{T→∞} (1/T) [ log P(Y, S_{H_1}(Y) | H_1) − log P(Y, S_{H_2}(Y) | H_2) ],

where Y is any possible time series, T is the length of Y, and P(Y, S_H(Y) | H) is the joint probability of Y and S_H(Y) (the Viterbi state sequence of Y given HMM H), defined as

P(Y, S_H(Y) | H) = max_S P(Y, S | H),

where S is any possible state sequence.

We use the same approximation as (Falkhausen et al., 1995) for this definition, which gives us:

d_V(H_1, H_2) ≈ Σ_j ν_j Σ_k a_{jk} log( a_{jk} / ã_{jk} ),

where the a_{jk} are the probabilities in the transition matrix of H_1, the ã_{jk} are the corresponding probabilities for H_2, and ν_j is the probability of state j in the stationary distribution of H_1. (The stationary distribution is further explained in Section 3.3.)
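This approximation can be sketched as a stationary-weighted KL divergence between corresponding transition rows (assuming both HMMs are expressed over the same global state set with strictly positive transition probabilities):

```python
import numpy as np

def stationary(P):
    """Stationary distribution: left eigenvector of P for eigenvalue 1."""
    vals, vecs = np.linalg.eig(P.T)
    v = np.abs(np.real(vecs[:, np.argmin(np.abs(vals - 1.0))]))
    return v / v.sum()

def viterbi_distance(P1, P2):
    """Approximate Viterbi distance between two HMMs: the KL divergence
    of each transition row of P1 from the matching row of P2, weighted
    by the stationary distribution of P1."""
    nu = stationary(P1)
    return float(sum(nu[j] * P1[j, k] * np.log(P1[j, k] / P2[j, k])
                     for j in range(P1.shape[0])
                     for k in range(P1.shape[1])))
```

Because only the transition matrices enter the computation, no state sequences are needed, which is why this measure is less sensitive to sequence-level noise.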

The Likelihood Distance computes the distance based on both the state sequences and the HMMs, where the state sequence is a sample from the (generative) HMM. The Viterbi Distance, in contrast, compares only the HMMs themselves. This makes the Viterbi distance less susceptible to noise observed in the state sequences, or in other words less sensitive to small changes. This trade-off causes one method to perform better than the other depending on the targeted construct and its sensitivity to small variations in the data.

Once the distance between participants is defined, we can perform a number of operations, including learning representations and clustering.

3.3. Learning Representations

In this section we describe two methods for learning representations from the HMMs. The first method is interpretable and can be used for analyzing the data. The second method, however, gives better performance in predicting most of the constructs.

3.3.1. Stationary Representation:

Each HMM is defined by a transition matrix π, which gives the probability of transitioning from one state to another; thus, if p_t is the probability distribution over states at time t, then p_t π is the distribution at time t+1, and the distribution ν satisfying ν π = ν is the stationary distribution. No matter the starting state, the relative amount of time spent in each state is given by the stationary distribution, which is unique for each (ergodic) transition matrix and is given by the left eigenvector corresponding to the largest eigenvalue (equal to 1) of the transition matrix. We use these stationary distributions as features for classification and regression tasks with different models.
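A sketch of extracting this feature vector with NumPy (the stationary distribution is the left eigenvector of the transition matrix for eigenvalue 1):

```python
import numpy as np

def stationary_distribution(P):
    """Return nu satisfying nu @ P = nu, normalized to sum to one:
    the left eigenvector of P associated with eigenvalue 1."""
    vals, vecs = np.linalg.eig(P.T)           # right eigenvectors of P.T
    k = np.argmin(np.abs(vals - 1.0))         # eigenvalue closest to 1
    v = np.abs(np.real(vecs[:, k]))
    return v / v.sum()
```

For a participant whose HMM has transition matrix P, `stationary_distribution(P)` is the stationary-representation feature vector: entry j is the long-run fraction of time spent in state j.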

3.3.2. Spectral Representation:

A drawback of using the stationary distribution of the transition matrix to represent participants is that it does not capture the relations between behavioral states; hence it may not distinguish between participants with similar behaviors that are ordered differently. In order to capture these differences, we use the distances between participants. Specifically, we perform these steps:

  1. Calculate the distance matrix between participants using either the likelihood distance or the Viterbi distance described in Section 3.2

  2. Compute the normalized Laplacian of the distance matrix

  3. Use the k largest eigenvectors (i.e., eigenvectors corresponding to the largest eigenvalues) as representations of the participants (k is a hyperparameter)

This approach is similar to spectral clustering methods (Von Luxburg, 2007). The intuition is that the distance matrix can be interpreted as a weighted adjacency matrix of a network between individuals, and the eigenvectors of the graph Laplacian provide information about the components and possible cuts in the graph.
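The three steps above can be sketched as follows (a sketch only: the authors' exact Laplacian normalization and eigenvector selection may differ):

```python
import numpy as np

def spectral_representation(D, k):
    """Embed participants from a symmetric pairwise distance matrix D:
    form the symmetric normalized Laplacian of D (treated as a weighted
    adjacency matrix) and keep the k eigenvectors with the largest
    eigenvalues as the representation."""
    d = D.sum(axis=1)
    d_inv_sqrt = np.diag(1.0 / np.sqrt(d + 1e-12))
    L = np.eye(len(D)) - d_inv_sqrt @ D @ d_inv_sqrt   # normalized Laplacian
    vals, vecs = np.linalg.eigh(L)                     # L is symmetric
    order = np.argsort(vals)[::-1]                     # largest first
    return vecs[:, order[:k]]
```

Each participant is then a point in R^k, and any standard classifier or regressor can consume these coordinates.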

4. Data

Figure 2. Histogram of the amount of data collected for the participants. X-axis gives the number of 12-hour shifts and the y-axis gives the number of participants with that many shifts.

The data used in this work comes from an ongoing research study of workplace well-being that measures physical activity, social interactions, and physiological states of hospital workers. The study recruited over 200 volunteers from among the employees of a large urban hospital. Participants were enrolled for ten weeks over the course of three “study waves”, each with a different start date (03/05/18, 04/09/18, and 05/05/18 for waves 1, 2, and 3, respectively). Participants were 31.1% male (n = 66) and 68.9% female (n = 146) and ranged in age from 21 to 65 years, with a median age of 36 and an average of 38.6 years. Most participants were college educated, with 59.4% holding a Bachelor's degree and 21.7% having completed at least some post-graduate study (Master's or Doctorate). The remaining 18.9% of participants had either a high school diploma or some college. Participants held a variety of job titles: 54.3% were registered nurses, 12% were certified nursing assistants, and the rest reported some other job title, such as occupational or respiratory therapist or technician. Overall, two-thirds of the subjects were nurses. It is worth mentioning that clinical staff in this study work a minimum of 3 days per week (in 12-hour shifts), which can be any day of the week. Nurses work exclusively in day shifts (7am-7pm) or night shifts (7pm-7am); subjects in other roles can have more flexible work shifts. Depending on the number of workdays during the study, participants wore the sensors for different numbers of days. Furthermore, participants exhibited varying compliance rates, with a few participants forgetting to wear their sensors on some days. Hence, the collected data varies in amount and length across participants. Figure 2 shows the distribution of collected data, in terms of 12-hour shifts, over the participants. For this paper, we focused on the 180 participants from whom at least 6 days of data were collected.

Name Description Instrument
ITP Job performance (Griffin et al., 2007)
IRB In Role Behavior (Williams and Anderson, 1991)
IOD-ID Counter-productive Work behavior (Berry et al., 2007)
IOD-OD Counter-productive Work behavior (Berry et al., 2007)
OCB Organizational Citizenship Behavior (Van Dyne et al., 1994)
Shipley Abstraction Cognitive ability (Shipley et al., 2009)
Shipley Vocabulary Cognitive ability (Shipley et al., 2009)
NEU Personality: Neuroticism (Gosling et al., 2003)
CON Personality: Conscientiousness (Gosling et al., 2003)
EXT Personality: Extraversion (Gosling et al., 2003)
AGR Personality: Agreeableness (Gosling et al., 2003)
OPE Personality: Openness (Gosling et al., 2003)
POS-AF Positive affect (Watson et al., 1988)
NEG-AF Negative affect (Watson et al., 1988)
STAI Anxiety (Spielberger et al., 1983)
AUDIT Alcohol Use Disorders Identification Test (Saunders et al., 1993)
IPAQ Physical activity (Maddison et al., 2007)
PSQI Sleep quality (Buysse et al., 1989)
Health limit Role limitations due to physical health problems (Ware Jr and Sherbourne, 1992)
Emotional limit Role limitations due to emotional problems (Ware Jr and Sherbourne, 1992)
Well-being Index of psychological well-being (Ware Jr and Sherbourne, 1992)
Social Functioning Index of social interaction ability (Ware Jr and Sherbourne, 1992)
Pain Index of physical pain (Ware Jr and Sherbourne, 1992)
General Health Index of general health (Ware Jr and Sherbourne, 1992)
Life Satisfaction Global life satisfaction (Diener et al., 1985)
Perceived stress Perceived stress indicator (Cohen et al., 1994)
PSY flexibility Ability to adapt to situational demands (Rogge, 2016)
PSY inflexibility Inability to adapt to situational demands (Rogge, 2016)
WAAQ Work-related Acceptance and Action Questionnaire (Bond et al., 2013)
Psychological Capital PCQ a measure of psychological capital (Luthans et al., 2007)
Challenge Stress Challenge stress indicator (positive stress) (Rodell and Judge, 2009)
Hindrance Stress Hindrance stress indicator (negative stress) (Rodell and Judge, 2009)
Table 1. Table of ground truth constructs collected during pre-study surveys.

4.1. Ground Truth Constructs

In addition to wearing sensors, participants were asked to complete a set of surveys prior to the study. These pre-study surveys measured job performance, cognitive ability, personality, affect, and health states, which serve as ground truth constructs for our study. The constructs are shown in Table 1. The pre-study surveys also included participants' demographics, such as age, gender, and job.

Signal group Features
6 features: Heart Rate, R-R Peak Coverage, Avg. Breathing Depth, Avg. Breathing Rate, Std. Breathing Depth, Std. Breathing Rate
15 features: Intensity, Cadence, Steps, Sitting, Supine, Avg. G Force, Std. G Force, Angle From Vertical, Low G Coverage, Avg. X-Acceleration, Std. X-Acceleration, Avg. Y-Acceleration, Std. Y-Acceleration, Avg. Z-Acceleration, Std. Z-Acceleration
Table 2. Extracted Features from OMsignal.

4.2. Sensors and Features

Data used in this paper was collected from a suite of wearable sensors produced by OMsignal Biometric Smartwear. These OMsignal garments include sensors embedded in the fabric that measure physiological data in real time and can relay this information to the participant's smartphone. The OMsignal sensor provides data including heart rate (HR), heart rate variability (HRV), breathing, and accelerometry (to infer sitting position, on-foot movement, and more). Table 2 shows a summary of the sensor signals that we use in this paper. All signal time series have five-minute resolution. Participants were instructed to wear the OMsignal garments only during their work shifts, although this is hard to verify.

5. Results

We used the BP-AR-HMM (with a fixed autoregressive lag) to model the temporal data collected from sensors worn by the 180 high-compliance participants in the study. For each participant, we constructed vectors representing the 21 features from the physiological and movement signals listed in Table 2. We used z-scores to normalize the features from the sensors. However, since some statistical features like mean and variance are useful for predicting constructs like age, both normalized and unnormalized signals were used in the prediction tasks reported in the tables below.

The model identified 23 shared latent states describing participants' behavior. Some of the states were exhibited by only a few participants. These rare states can convey useful information that helps identify noise or anomalies in the collected data; however, their sparseness is not beneficial to the prediction and clustering tasks. Therefore, we ignore states observed in only a small fraction of the time series. For example, one of these states exhibits a constant heart rate, which indicates a malfunction in the sensor.

5.1. Clustering

Figure 3. Dendrogram showing the similarity of participants based on their learned states.

To validate the distance measures defined in Section 3.2, we apply hierarchical agglomerative clustering to the distance matrices. As mentioned in Section 3.2, the likelihood distance is more sensitive to small variations in the data; hence the resulting dendrogram has more structure compared to the dendrogram generated using the Viterbi distance. However, the groupings of the participants are very similar in both cases. Therefore, we focus only on the dendrogram generated using the likelihood distance, shown in Figure 3.
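This step can be sketched with SciPy's agglomerative-clustering utilities (the distance matrix below is a random stand-in for the likelihood-distance matrix, and the `average` linkage is an assumed choice):

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform

rng = np.random.default_rng(0)

# Random stand-in for a symmetric participant-by-participant distance matrix.
n = 10
A = rng.random((n, n))
D = (A + A.T) / 2.0
np.fill_diagonal(D, 0.0)

# linkage() expects the condensed (upper-triangular) distance vector.
Z = linkage(squareform(D), method="average")
# Cutting the resulting dendrogram yields flat clusters to compare.
labels = fcluster(Z, t=3, criterion="maxclust")
```

The linkage matrix `Z` is also what dendrogram-plotting routines consume, and cutting at different depths produces the cluster partitions evaluated below.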

To evaluate the quality of the dendrogram, we used responses gathered from participants during the pre-study surveys and performed statistical tests to evaluate differences between the branches. We partitioned the dendrogram into clusters with more than five members by cutting the dendrogram horizontally at different depths. For continuous-valued labels we performed a one-way ANOVA test, a generalization of the t-test to multiple groups, and for categorical labels we performed the Kruskal-Wallis H-test in the same setting. Based on the p-values obtained from the ANOVA and Kruskal-Wallis H-tests, the most important features differentiating the branches (clusters) were job type, age, and gender, in that order. This aligned with our expectations, since different job types require different activities, and age and gender affect physiological signals (Lanitis, 2009; Wu et al., 2015).
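Both branch-comparison tests are available in SciPy; the three groups below are synthetic stand-ins for dendrogram branches, with a deliberately shifted second branch:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Synthetic construct values (e.g., age) for three dendrogram branches.
branch_a = rng.normal(35.0, 5.0, 30)
branch_b = rng.normal(42.0, 5.0, 25)   # shifted branch
branch_c = rng.normal(36.0, 5.0, 20)

# One-way ANOVA for continuous-valued labels.
f_stat, p_anova = stats.f_oneway(branch_a, branch_b, branch_c)
# Kruskal-Wallis H-test: rank-based, suitable for categorical/ordinal labels.
h_stat, p_kw = stats.kruskal(branch_a, branch_b, branch_c)
```

Small p-values indicate the construct differs significantly across branches, which is how job type, age, and gender surfaced as the top differentiators.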

The first cut point (marked 1 at the top of Figure 3) separates registered nurses from other job types with high precision and recall; in other words, most individuals in the red cluster (cluster 2) are registered nurses. The main difference between the two clusters (red and blue) is the frequency of three latent states, which we call A, B, and C. In states A and B, the participant was seated, based on the binary sitting signal. Additionally, variables related to acceleration and movement are almost zero for state A, whereas state B is more representative of higher activity levels: participants in state B have higher intensity, cadence, breathing, and heart rate. This pattern of participants being seated while features such as acceleration and steps are non-zero suggests that these participants are moving around, for example, in a rolling chair. The frequency of state B is higher in cluster 2, while the frequency of state A is higher for participants in cluster 3; since the majority of individuals in cluster 2 are registered nurses, state B could represent activities like checking on patients and moving around while seated, while state A could represent desk jobs. Figure 4 shows a visualization of these two states.

State C mostly captures flexibility of work hours for non-nurses (non-nurse participants are more likely to finish their shifts earlier and have less than 12 hours worth of data in one shift).

Figure 4. Comparison of states A and B using four signals from the same participant. For this plot, one segment in which the participant was in state A and another in which she was in state B were chosen, and the raw sensor values are displayed.

This clustering can also separate participants based on other demographic information, such as work shifts. Day and night shifts are distinguished by state D; how this is recognized is described further in the prediction section. In this state, the binary supine signal, which is activated when the participant is lying down, is on. Using only the percentage of time spent in state D, we can predict whether participants work day or night shifts with high accuracy. It appears that state D captures quick naps in the workplace and has a higher frequency among night-shift participants. Participants who exhibit state D are shown in yellow in Figure 3. There is, however, another state in which the supine variable is on: state E. This state has a higher breathing depth and less movement compared to D, and it usually lasts a few hours in a more continuous fashion. Participants with behavior E are shown in green in Figure 3. Although participants were asked to wear the sensors only while working, they could have kept them on at home, and state E may represent a longer night/day sleep at home rather than at the workplace. Recognizing this is beneficial for cleaning the data, even for other models.

5.2. Prediction

We use the learned representations for each participant as features to predict the ground-truth constructs. The objective is two-fold: we want not only to predict the constructs, but also to gain an understanding of what the latent states represent.

5.2.1. Qualitative Results

Some of the behaviors learned by the model have natural interpretations (states A and B from Figure 4), while others are harder to explain. One way to understand the latent behaviors is to quantify their importance in explaining the constructs. The stationary representation described in Section 3.3 has a clear interpretation: each dimension represents the percentage of time spent in the corresponding state. We interpret the behavioral states using the stationary representation with the following process:

  1. Compute the stationary representation of each participant.

  2. Run classification/regression on the representations to predict each construct.

  3. Retrieve the learned coefficients; each coefficient corresponds to one dimension of the embedding, i.e., one behavioral state.

  4. Select the states with the highest positive and lowest negative coefficients, and interpret these states based on their relation to the targeted construct.
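The four steps above can be sketched as follows. The state ids, the ridge penalty, and the synthetic construct scores are assumptions made for the example; the paper's actual regressors and data differ.

```python
# Hedged sketch of the interpretation procedure: stationary representation,
# regression, coefficients, then the most positive/negative states.
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(1)
n_states, n_participants, T = 6, 40, 300

# Step 1: stationary representation = fraction of time in each latent state.
seqs = rng.integers(0, n_states, size=(n_participants, T))
X = np.stack([np.bincount(s, minlength=n_states) / T for s in seqs])

# Synthetic construct driven by state 2 (positively) and state 4 (negatively).
y = 5.0 * X[:, 2] - 4.0 * X[:, 4] + rng.normal(0, 0.02, n_participants)

# Steps 2-3: regress the construct on the representation, read the coefficients.
model = Ridge(alpha=0.1).fit(X, y)
coefs = model.coef_

# Step 4: states with the largest positive / most negative coefficients.
print("most positive state:", int(np.argmax(coefs)))
print("most negative state:", int(np.argmin(coefs)))
```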

Figure 5 shows a subset of constructs and the states that best predict them. Based on this approach, we identified state D, described in Section 5.1, as the most relevant state for differentiating between day- and night-shift employees. State D also helps predict positive affect (POS-AF), well-being, and hindrance stress: it has a positive coefficient for POS-AF and well-being, and a negative coefficient for hindrance stress. Hindrance stress is generally perceived as a type of stress that prevents progress toward personal accomplishments. A plausible interpretation of these results is therefore that quick naps or breaks during work hours could increase positive affect and well-being and decrease hindrance stress. Similarly, as shown in Figure 5, state N, which has a large positive coefficient in predicting hindrance stress, has a large negative coefficient in predicting positive affect, well-being, life satisfaction, and extraversion.

(a) (b)
Figure 5. Bipartite graphs with a subset of the constructs from Table 1 and a subset of the states. In (a), each construct is connected to the two states whose regression coefficients are the highest (i.e., strongest positive relationship); in (b), each construct is connected to the two states with the lowest negative coefficients (i.e., strongest negative relationship).

| Construct | HMM-S Corr. | HMM-S RMSE | HMM-SL Corr. | HMM-SL RMSE | HMM-SV Corr. | HMM-SV RMSE | RWS Corr. | RWS RMSE | Parafac2 Corr. | Parafac2 RMSE |
|---|---|---|---|---|---|---|---|---|---|---|
| ITP | -0.729 | 0.493 | 0.073 | 0.488 | 0.288 | 0.469 | -0.143 | 0.494 | 0.105 | 0.487 |
| IRB | -0.481 | 4.166 | 0.193 | 4.066 | 0.203 | 4.055 | -0.146 | 4.18 | 0.265 | 4.014 |
| IOD-ID | -0.728 | 5.185 | 0.11 | 5.124 | 0.136 | 5.108 | -0.534 | 5.22 | -0.709 | 5.182 |
| IOD-OD | -0.326 | 6.917 | 0.202 | 6.715 | 0.244 | 6.634 | -0.168 | 6.891 | -0.01 | 6.864 |
| OCB | 0.168 | 12.035 | 0.167 | 12.035 | 0.245 | 11.884 | 0.112 | 12.159 | 0.215 | 11.986 |
| Shipley abstract | 0.178 | 3.732 | 0.085 | 3.777 | 0.179 | 3.756 | 0.148 | 3.764 | 0.314 | 3.603 |
| Shipley vocabulary | 0.26 | 4.713 | 0.085 | 4.841 | 0.213 | 4.748 | 0.297 | 4.643 | 0.399 | 4.486 |
| NEU | 0.066 | 0.726 | 0.159 | 0.718 | 0.174 | 0.722 | 0.048 | 0.728 | 0.116 | 0.724 |
| CON | -0.165 | 0.62 | 0.245 | 0.591 | 0.181 | 0.6 | -0.033 | 0.613 | 0.093 | 0.612 |
| EXT | 0.154 | 0.655 | 0.152 | 0.659 | 0.264 | 0.642 | 0.178 | 0.65 | 0.038 | 0.66 |
| AGR | -0.428 | 0.491 | 0.122 | 0.485 | 0.191 | 0.479 | 0.079 | 0.488 | 0.099 | 0.488 |
| OPE | 0.224 | 0.586 | 0.217 | 0.581 | 0.28 | 0.571 | 0.216 | 0.585 | -0.386 | 0.598 |
| POS-AF | 0.37 | 6.547 | 0.254 | 6.614 | 0.231 | 6.686 | 0.139 | 6.821 | 0.112 | 6.822 |
| NEG-AF | -0.278 | 5.293 | 0.235 | 5.139 | 0.206 | 5.195 | 0.045 | 5.286 | 0.139 | 5.238 |
| STAI | 0.016 | 8.975 | 0.196 | 8.817 | 0.112 | 8.919 | 0.128 | 8.912 | 0.095 | 8.966 |
| AUDIT | 0.1 | 2.159 | 0.362 | 2.017 | 0.153 | 2.142 | 0.053 | 2.169 | 0.244 | 2.113 |
| IPAQ | -0.57 | 15352 | 0.094 | 15191 | 0.115 | 15316 | 0.033 | 15311 | 0.097 | 15246 |
| PSQI | -0.682 | 2.366 | 0.178 | 2.318 | 0.142 | 2.33 | 0.193 | 2.322 | 0.194 | 2.311 |
| Age | 0.461 | 8.613 | 0.091 | 9.662 | 0.084 | 9.667 | 0.243 | 9.406 | 0.363 | 9.035 |
| Health Limit | -0.75 | 23.284 | 0.196 | 22.704 | 0.333 | 21.986 | 0.222 | 23.325 | 0.118 | 23.264 |
| Emotional Limit | -0.704 | 22.71 | 0.211 | 22.102 | 0.164 | 22.504 | 0.042 | 22.652 | 0.091 | 22.553 |
| Well being | 0.077 | 18.458 | 0.152 | 18.302 | 0.276 | 17.904 | 0.011 | 18.682 | 0.167 | 18.277 |
| Social Functioning | 0.057 | 21.94 | 0.109 | 21.684 | 0.191 | 21.547 | 0.085 | 21.857 | 0.218 | 21.541 |
| Pain | 0.167 | 18.613 | 0.134 | 18.448 | 0.239 | 18.164 | 0.023 | 18.658 | 0.102 | 18.571 |
| General Health | 0.211 | 17.062 | 0.27 | 16.792 | 0.171 | 17.28 | 0.151 | 17.311 | 0.2 | 17.105 |
| Life Satisfaction | -0.655 | 1.354 | 0.106 | 1.338 | 0.22 | 1.317 | -0.125 | 1.362 | 0.207 | 1.317 |
| Perceived Stress | 0.196 | 0.511 | 0.201 | 0.51 | 0.209 | 0.511 | 0.195 | 0.513 | -0.728 | 0.524 |
| PSY flexibility | -0.793 | 0.821 | 0.187 | 0.806 | 0.233 | 0.795 | -0.077 | 0.823 | 0.103 | 0.813 |
| PSY inflexibility | -0.66 | 0.803 | 0.182 | 0.785 | 0.152 | 0.79 | -0.013 | 0.803 | 0.006 | 0.8 |
| WAAQ | 0.31 | 5.65 | 0.284 | 5.705 | 0.205 | 5.833 | 0.153 | 5.878 | 0.163 | 5.866 |
| Psychological Capital | 0.188 | 0.656 | 0.129 | 0.661 | 0.17 | 0.662 | 0.12 | 0.662 | 0.08 | 0.664 |
| Challenge Stress | -0.639 | 0.622 | 0.171 | 0.615 | 0.078 | 0.62 | -0.097 | 0.623 | -0.789 | 0.621 |
| Hindrance Stress | 0.132 | 0.644 | 0.005 | 0.646 | 0.206 | 0.633 | 0.035 | 0.647 | 0.143 | 0.637 |

Table 3. Evaluation of the model on the construct prediction task. The best performing model’s results are highlighted in bold.

5.2.2. Quantitative Results

For predicting the constructs, we used both normalized (z-scored) and un-normalized signals as inputs to the BP-AR-HMM and obtained stationary representations as well as spectral representations under both distance measures (likelihood distance and Viterbi distance). We refer to the method with the stationary representation as HMM-S, the spectral representation with likelihood distance as HMM-SL, and the spectral representation with Viterbi distance as HMM-SV. The spectral representation requires a hyperparameter K, the number of eigenvectors to include in the representation; we set K to 10, 20, …, 100. We ran ridge, kernel ridge, and random forest regression on all three learned representations and report the best model. Results are reported as the correlation with the target construct and the Root Mean Squared Error (RMSE) under leave-one-out cross-validation. Table 3 shows the results of the regression task.
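The evaluation protocol can be sketched as below. The data are synthetic and the ridge penalty is an assumption; in the paper, the inputs are the learned participant representations and the targets are the construct scores.

```python
# Minimal sketch: leave-one-out cross-validation, then correlation and RMSE
# between held-out predictions and targets.
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import LeaveOneOut

rng = np.random.default_rng(2)
X = rng.normal(size=(30, 5))                         # e.g., representations
y = X @ rng.normal(size=5) + rng.normal(0, 0.1, 30)  # synthetic construct

preds = np.empty_like(y)
for train_idx, test_idx in LeaveOneOut().split(X):
    model = Ridge(alpha=1.0).fit(X[train_idx], y[train_idx])
    preds[test_idx] = model.predict(X[test_idx])

corr = np.corrcoef(preds, y)[0, 1]
rmse = np.sqrt(np.mean((preds - y) ** 2))
print(f"correlation = {corr:.3f}, RMSE = {rmse:.3f}")
```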

5.2.3. Baselines

We compare our results against Random Warping Series (RWS) (Wu et al., 2018) and Parafac2 (Harshman, 1970).

RWS is a state-of-the-art method for time-series embedding. It constructs kernels over features extracted from the time series, where the features are given by Dynamic Time Warping (DTW), an algorithm for measuring the similarity between sequences; each original sequence is compared against a number of randomly generated sequences. The method has three hyperparameters: the first specifies the size of the embedding, i.e., the number of random time series generated for comparison with the original series; the second is the length of the generated random series; and the third is the kernel parameter. Following the authors' suggestion, we fixed the embedding size to a large value (512, based on their implementation) and experimented with a few values of the other two hyperparameters. The resulting embeddings therefore lie in a 512-dimensional space.
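To make the idea concrete, here is an illustrative sketch of DTW-distance features against random series. This is not the authors' RWS implementation (RWS builds kernels from soft alignments); the random-walk generator, series length, and embedding size are assumptions for the example.

```python
# Illustrative DTW-based embedding: distances from a series to R random series.
import numpy as np

def dtw_distance(a, b):
    """Classic O(len(a)*len(b)) dynamic-time-warping distance, 1-D series."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

rng = np.random.default_rng(3)
R, D_len = 8, 20  # assumed embedding size and random-series length
random_series = [np.cumsum(rng.normal(size=D_len)) for _ in range(R)]

def embed(series):
    """Map a 1-D time series to an R-dimensional vector of DTW distances."""
    return np.array([dtw_distance(series, r) for r in random_series])

x = np.sin(np.linspace(0, 6, 50))
print(embed(x).shape)  # (8,)
```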

We used the same techniques (ridge, kernel ridge, and random forest regression) on the embeddings learned under different hyperparameter settings and compare the best results with our own. The same procedure applies to the next baseline, Parafac2.

The second baseline is Parafac2 (Harshman, 1970). This approach views the data as a three-dimensional tensor of participants × sensors × time and decomposes it into hidden components by applying SVD to each participant's multivariate time series while sharing the same factors across the variable dimension (e.g., heart rate, breathing rate). The resulting factor vectors can be interpreted as levels of involvement in each hidden component: the vector along the participant dimension gives the intensity of that hidden factor for each individual, which can be used as features for regression or classification. Parafac2 requires the number of hidden components to be given in advance; we varied it from one to ten, used both normalized and un-normalized signals as inputs, and report the best result in terms of correlation and RMSE in Table 3.
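As a simplified stand-in for this baseline (not a full Parafac2, which additionally couples factors across participants), one can extract per-participant features from the SVD of each sensors-by-time matrix; the dimensions and component count below are invented for the sketch.

```python
# Simplified illustration: top-k singular values of each participant's
# (sensors x time) matrix as fixed-length features, despite varying lengths.
import numpy as np

rng = np.random.default_rng(4)
k = 3  # assumed number of hidden components

def user_features(matrix, k):
    """Top-k singular values of one participant's (sensors x time) matrix."""
    s = np.linalg.svd(matrix, compute_uv=False)
    return s[:k]

# Participants may have different-length recordings (varying time dimension).
participants = [rng.normal(size=(6, T)) for T in (200, 150, 240)]
X = np.stack([user_features(m, k) for m in participants])
print(X.shape)  # (3, 3)
```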

Overall, predictions based on the HMM's latent states were systematically better, outperforming the baseline methods on 28 of the 33 constructs. It is worth noting that, except for HMM-S, which is non-parametric, all four other models in Table 3 have hyperparameters that must be set. We tuned the hyperparameters by running 10 different settings and selecting the best-performing one. This may explain why RWS does not perform as well: it has three hyperparameters, while the other models (except HMM-S) have one, so it requires more tuning than the others.

Among our own representations (HMM-S, HMM-SL, HMM-SV), HMM-SV performs better for some constructs while HMM-SL gives better results for others. This could be due to the differences between the Viterbi and likelihood distances and their sensitivity to small variations in the data, discussed further in Section 3.2. HMM-S is not a strong representation for prediction and is better suited for analysis of the data.

6. Conclusion

In this work, we described a method for learning behavioral representations from dynamic physiological data captured by wearable sensors, using a Bayesian non-parametric framework that combines Hidden Markov Models with the Beta Process. This method overcomes limitations of state-of-the-art alternatives, including handling unaligned time series of different lengths, robust inference under missing data and noise, and discovery of dynamic behaviors without any a priori knowledge of the behavioral states to be identified. We used this model to learn behavioral representations from data collected from workers in a large urban hospital: 200 volunteers were enrolled in a 10-week study and asked to wear a suite of sensors.

The latent states learned by the model capture behavioral differences within the population and can also be used to predict participants' self-reported health and psychological well-being. In comparison to alternative models, our framework improves performance with compact representations of the multivariate time series, leading to less overfitting and easier interpretation of the learned states. Finally, we showed that this framework can also cluster study participants into meaningful, cohesive groups exhibiting similar behaviors and characteristics.

This work can be extended in a number of ways. One possible direction is to make the framework supervised: as it stands, the framework helps analyze multivariate signals, and although the learned states predict some constructs well, a supervised variant could be better suited to prediction tasks.

7. Acknowledgement

The authors are grateful to the TILES team for their efforts in study design, data collection, and sharing, which enabled this work. This research is based upon work supported by the Office of the Director of National Intelligence (ODNI), Intelligence Advanced Research Projects Activity (IARPA), via IARPA Contract No. 2017-17042800005. The views and conclusions contained herein are those of the authors and should not be interpreted as necessarily representing the official policies or endorsements, either expressed or implied, of the ODNI, IARPA, or the U.S. Government. The U.S. Government is authorized to reproduce and distribute reprints for Governmental purposes notwithstanding any copyright annotation thereon.


  • L. Amate, F. Forbes, J. Fontecave-Jallon, B. Vettier, and C. Garbay (2011) Probabilistic model definition for physiological state monitoring. In Statistical Signal Processing Workshop (SSP), 2011 IEEE, pp. 457–460. Cited by: §1.
  • C. E. Antoniak (1974) Mixtures of dirichlet processes with applications to bayesian nonparametric problems. The annals of statistics, pp. 1152–1174. Cited by: §2.
  • S. Aral and C. Nicolaides (2017) Exercise contagion in a global social network. Nature communications 8, pp. 14753. Cited by: §1.
  • L. Bao and S. S. Intille (2004) Activity recognition from user-annotated acceleration data. In International conference on pervasive computing, pp. 1–17. Cited by: §2.
  • L. E. Baum and T. Petrie (1966) Statistical inference for probabilistic functions of finite state markov chains. The annals of mathematical statistics 37 (6), pp. 1554–1563. Cited by: §2.
  • M. J. Beal, Z. Ghahramani, and C. E. Rasmussen (2002) The infinite hidden markov model. In Advances in neural information processing systems, pp. 577–584. Cited by: §2.
  • C. M. Berry, D. S. Ones, and P. R. Sackett (2007) Interpersonal deviance, organizational deviance, and their common correlates: a review and meta-analysis.. Journal of applied psychology 92 (2), pp. 410. Cited by: Table 1.
  • F. W. Bond, J. Lloyd, and N. Guenole (2013) The work-related acceptance and action questionnaire: initial psychometric findings and their implications for measuring psychological flexibility in specific contexts. Journal of Occupational and Organizational Psychology 86 (3), pp. 331–347. Cited by: Table 1.
  • D. J. Buysse, C. F. Reynolds III, T. H. Monk, S. R. Berman, and D. J. Kupfer (1989) The pittsburgh sleep quality index: a new instrument for psychiatric practice and research. Psychiatry research 28 (2), pp. 193–213. Cited by: Table 1.
  • J. E. Cohen and R. Bro (2018) Nonnegative parafac2: a flexible coupling approach. In International Conference on Latent Variable Analysis and Signal Separation, pp. 89–98. Cited by: §2.
  • S. Cohen, T. Kamarck, R. Mermelstein, et al. (1994) Perceived stress scale. Measuring stress: A guide for health and social scientists, pp. 235–283. Cited by: Table 1.
  • E. Diener, R. A. Emmons, R. J. Larsen, and S. Griffin (1985) The satisfaction with life scale. Journal of personality assessment 49 (1), pp. 71–75. Cited by: Table 1.
  • M. Falkhausen, H. Reininger, and D. Wolf (1995) Calculation of distance measures between hidden markov models. In Fourth European Conference on Speech Communication and Technology, Cited by: §3.2.2, §3.2.2.
  • E. B. Fox, M. C. Hughes, E. B. Sudderth, M. I. Jordan, et al. (2014) Joint modeling of multiple time series via the beta process with application to motion capture segmentation. The Annals of Applied Statistics 8 (3), pp. 1281–1313. Cited by: §1, §2, §2, §3.1, §3.1.
  • E. Fox, M. I. Jordan, E. B. Sudderth, and A. S. Willsky (2009a) Sharing features among dynamical systems with beta processes. In Advances in Neural Information Processing Systems, pp. 549–557. Cited by: §2, §3.1, §3.1.
  • E. Fox, E. B. Sudderth, M. I. Jordan, and A. S. Willsky (2009b) Nonparametric bayesian learning of switching linear dynamical systems. In Advances in Neural Information Processing Systems, pp. 457–464. Cited by: §2.
  • S. D. Gosling, P. J. Rentfrow, and W. B. Swann Jr (2003) A very brief measure of the big-five personality domains. Journal of Research in personality 37 (6), pp. 504–528. Cited by: Table 1.
  • M. Griffin, A. Neal, and S. Parker (2007) A new model of work role performance: positive behavior in uncertain and interdependent contexts. Academy of Management Journal 50 (2), pp. 327–347. Cited by: Table 1.
  • T. L. Griffiths and Z. Ghahramani (2011) The indian buffet process: an introduction and review. Journal of Machine Learning Research 12 (Apr), pp. 1185–1224. Cited by: §3.1.
  • D. Hallac, S. Vare, S. Boyd, and J. Leskovec (2017) Toeplitz inverse covariance-based clustering of multivariate time series data. In Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 215–223. Cited by: §2.
  • R. A. Harshman (1970) Foundations of the parafac procedure: models and conditions for an “explanatory” multimodal factor analysis. Cited by: §2, §5.2.3, §5.2.3.
  • N. L. Hjort et al. (1990) Nonparametric bayes estimators based on beta processes in models for life history data. The Annals of Statistics 18 (3), pp. 1259–1294. Cited by: §3.1.
  • H. Hosseinmardi, A. Ghasemian, S. Narayanan, K. Lerman, and E. Ferrara (2018a) Tensor embedding: a supervised framework for human behavioral data mining and prediction. arXiv preprint arXiv:1808.10867. Cited by: §2, §2.
  • H. Hosseinmardi, H. Kao, K. Lerman, and E. Ferrara (2018b) Discovering hidden structure in high dimensional human behavioral data via tensor factorization. In WSDM Heteronam Workshop, Cited by: §2, §2.
  • J. W. Houpt, M. E. Frame, and L. M. Blaha (2018) Unsupervised parsing of gaze data with a beta-process vector auto-regressive hidden markov model. Behavior research methods 50 (5), pp. 2074–2096. Cited by: §2.
  • P. J. Jørgensen, S. F. Nielsen, J. L. Hinrich, M. N. Schmidt, K. H. Madsen, and M. Mørup (2018) Probabilistic parafac2. arXiv preprint arXiv:1806.08195. Cited by: §2.
  • E. Keogh, K. Chakrabarti, M. Pazzani, and S. Mehrotra (2001) Dimensionality reduction for fast similarity search in large time series databases. Knowledge and information Systems 3 (3), pp. 263–286. Cited by: §2.
  • J. Kingman (1967) Completely random measures. Pacific Journal of Mathematics 21 (1), pp. 59–78. Cited by: §3.1.
  • T. G. Kolda and B. W. Bader (2009) Tensor decompositions and applications. SIAM review 51 (3), pp. 455–500. Cited by: §2.
  • A. Lanitis (2009) A survey of the effects of aging on biometric identity verification. International Journal of Biometrics 2 (1), pp. 34–52. Cited by: §5.1.
  • J. Lin, E. Keogh, S. Lonardi, and B. Chiu (2003) A symbolic representation of time series, with implications for streaming algorithms. In Proceedings of the 8th ACM SIGMOD workshop on Research issues in data mining and knowledge discovery, pp. 2–11. Cited by: §2.
  • F. Luthans, B. J. Avolio, J. B. Avey, and S. M. Norman (2007) Positive psychological capital: measurement and relationship with performance and satisfaction. Personnel psychology 60 (3), pp. 541–572. Cited by: Table 1.
  • R. Maddison, C. N. Mhurchu, Y. Jiang, S. V. Hoorn, A. Rodgers, C. Lawes, and E. Rush (2007) International physical activity questionnaire (ipaq) and new zealand physical activity questionnaire (nzpaq): a doubly labelled water validation. Int. Journal of Behavioral Nutrition and Physical Activity 4 (1), pp. 62. Cited by: Table 1.
  • V. Monbet and P. Ailliot (2017) Sparse vector markov switching autoregressive models. application to multivariate time series of temperature. Computational Statistics & Data Analysis 108, pp. 40–51. Cited by: §3.1.
  • D. Novak, L. Lhotska, T. Al-Ani, Y. Hamam, D. Cuesta-Frau, P. Micó, and M. Aboy (2004) Morphology analysis of physiological signals using hidden markov models. In Pattern Recognition, 2004. ICPR 2004. Proceedings of the 17th International Conference on, Vol. 3, pp. 754–757. Cited by: §1.
  • E. Pierson, T. Althoff, and J. Leskovec (2018) Modeling individual cyclic variation in human behavior. In Proceedings of the 2018 World Wide Web Conference on World Wide Web, pp. 107–116. Cited by: §1.
  • L. R. Rabiner and B. Juang (1986) An introduction to hidden markov models. ieee assp magazine 3 (1), pp. 4–16. Cited by: §1.
  • N. Ravi, N. Dandekar, P. Mysore, and M. L. Littman (2005) Activity recognition from accelerometer data. In Aaai, Vol. 5, pp. 1541–1546. Cited by: §2.
  • J. B. Rodell and T. A. Judge (2009) Can “good” stressors spark “bad” behaviors? the mediating role of emotions in links of challenge and hindrance stressors with citizenship and counterproductive behaviors.. Journal of Applied Psychology 94 (6), pp. 1438. Cited by: Table 1.
  • R. Rogge (2016) Cited by: Table 1.
  • A. Sapienza, A. Bessi, and E. Ferrara (2018) Non-negative tensor factorization for human behavioral pattern mining in online games. Information 9 (3), pp. 66. Cited by: §2.
  • J. B. Saunders, O. G. Asaland, T. F. Babor, J. R. D. la Fuente, and M. Grant (1993) Development of the alcohol use disorders identification test (audit): who collaborative project on early detection of persons with harmful alcohol consumption‐ii. Addiction 89 (6). Cited by: Table 1.
  • W. C. Shipley, C. P. Gruber, T. A. Martin, and A. M. Klein (2009) Shipley-2 manual. Western Psychological Service, Los Angeles, CA. Cited by: Table 1.
  • A. Singh, T. Tamminedi, G. Yosiphon, A. Ganguli, and J. Yadegar (2010) Hidden markov models for modeling blood pressure data to predict acute hypotension. In Acoustics Speech and Signal Processing (ICASSP), 2010 IEEE International Conference on, pp. 550–553. Cited by: §1.
  • C. D. Spielberger, G. A. Jacobs, S. Russell, and R. S. Crane (1983) Assessment of anger: the state-trait anger scale. In Advances in Personality Assessment, Cited by: Table 1.
  • E. M. Tapia, S. S. Intille, and K. Larson (2004) Activity recognition in the home using simple and ubiquitous sensors. In International conference on pervasive computing, pp. 158–175. Cited by: §2.
  • N. Tavabi, N. Bartley, A. Abeliuk, S. Soni, E. Ferrara, and K. Lerman (2019) Characterizing activity on the deep and dark web. arXiv preprint arXiv:1903.00156. Cited by: §2.
  • R. Thibaux and M. I. Jordan (2007) Hierarchical beta processes and the indian buffet process. In Artificial Intelligence and Statistics, pp. 564–571. Cited by: §3.1.
  • L. Van Dyne, J. W. Graham, and R. M. Dienesch (1994) Organizational citizenship behavior: construct redefinition, measurement, and validation. Academy of management Journal 37 (4), pp. 765–802. Cited by: Table 1.
  • U. Von Luxburg (2007) A tutorial on spectral clustering. Statistics and computing 17 (4), pp. 395–416. Cited by: §3.3.2.
  • R. Wang, F. Chen, Z. Chen, T. Li, G. Harari, S. Tignor, X. Zhou, D. Ben-Zeev, and A. T. Campbell (2014) StudentLife: assessing mental health, academic performance and behavioral trends of college students using smartphones. In Proceedings of the 2014 ACM international joint conference on pervasive and ubiquitous computing, pp. 3–14. Cited by: §1.
  • J. E. Ware Jr and C. D. Sherbourne (1992) The mos 36-item short-form health survey (sf-36): i. conceptual framework and item selection. Medical care, pp. 473–483. Cited by: Table 1.
  • D. Watson, L. A. Clark, and A. Tellegen (1988) Development and validation of brief measures of positive and negative affect: the panas scales.. Journal of personality and social psychology 54 (6), pp. 1063. Cited by: Table 1.
  • L. J. Williams and S. E. Anderson (1991) Job satisfaction and organizational commitment as predictors of organizational citizenship and in-role behaviors. J. of Management 17 (3), pp. 601–617. Cited by: Table 1.
  • L. Wu, I. E. Yen, J. Yi, F. Xu, Q. Lei, and M. Witbrock (2018) Random warping series: a random features method for time-series embedding. arXiv preprint arXiv:1809.05259. Cited by: §2, §5.2.3.
  • Y. Wu, Y. Zhuang, X. Long, F. Lin, and W. Xu (2015) Human gender classification: a review. arXiv preprint arXiv:1507.05122. Cited by: §5.1.