1 Introduction
Smart health solutions are becoming ever more feasible with the rapid development of sensors and mobile applications that can continuously collect human behavioral data. Indeed, many sensors from various sources such as environmental (Poppe, 2007), bodyworn sensors (Lukowicz et al., 2004; Karantonis et al., 2006), or even smartphone based sensors (Anguita et al., 2013) are prevalent for health monitoring applications. With so many ambient sensors collecting different measurements, it becomes important not only to maintain good monitoring accuracy, but also low power consumption, to ensure effective and sustainable monitoring. Such a tradeoff between monitoring accuracy and monitoring resource allocation is natural and ubiquitous in many applications (He et al., 2006; Wiser et al., 2008; Wu et al., 2018; Wang et al., 2018).
Such problems require an accurate characterization of the tradeoff between monitoring resource usage and monitoring accuracy. In the case of human behavioral and health monitoring using ambient sensors, a power efficient sensing scheme can be achieved by deciding which group of sensors to use at a given time. Here, a tradeoff arises between sensor energy usage and the uncertainty in ignoring certain sensor signals while monitoring. To characterize this tradeoff, it is necessary to have a model that can quantify the uncertainty of the activity prediction with respect to the sensor measurements over time. With this model, a loss function associated with both energy cost and activity prediction uncertainty can be defined and therefore the adaptive monitoring problem can be solved by a sequential decision process.
Sequential decision process with Markov models have been well studied in the literature. For example, Markov decision process (MDP) and partially observed Markov decision process (POMDP)
(Krishnamurthy, 2016) are popular stochastic models for sequential decision process. They can be applied for optimal preventive maintenance policy (Byon, Ntaimo, and Ding, 2010) and path planning under uncertainty (Morere, Marchant, and Ramos, 2017). Other decision process, for example Bayesian Optimization, have been applied for optimal sensor set selection (Garnett, Osborne, and Roberts, 2010). In this paper, activity detection is based on time series sensor signals, so the sensor selection problem depends on not only the current state, but also the history. The computation of the loss function will therefore be complicated due to the sequential decision nature.Besides all above challenges in characterizing the resource limitation and monitoring accuracy tradeoff, modeling the measurements taken from these ambient sensors is also challenging by itself. For example, smartphone sensor measurements are abundant yet noisy, and thus require efficient computational methods to process effectively. Moreover, the sensor measurement frequency can vary with time, the duration of each activity may vary, and the multivariate effects between the large set of features can be difficult to capture. Methods to model such timeseries measurements include linear dynamical systems (Barber, 2012; Ardywibowo et al., 2018), ARMA models (Torres et al., 2005)
(Harvey, 1990), point processes (Gunawardana, Meek, and Xu, 2011), and Recurrent Neural Networks
(Funahashi and Nakamura, 1993). However, most of these existing methods focused on prediction only and did not attempt to characterize the uncertainty of the measurements.To tackle this problem, we derive a switching multivariate Gaussian process model for the goal of activity recognition. Our model is a hierarchical one consisting of a Hidden semiMarkov model (HSMM) for the discrete activity states, and a multivariate Gaussian process to model both the time dependent and the intervariable correlations between different sensor measurements. We use a block circulant embedding technique and use Fast Fourier Transforms (FFT) to speed up model inference and uncertainty quantification of our proposed model. Using this model, we then develop an adaptive monitoring scheme that for each monitoring period uses the optimal group of sensors by optimizing the Bayesian cost function considering both the activity predictive entropy^{1}^{1}1Here the predictive entropy is a measure for uncertainty, the reduction of which indicates the information provides by the selected sensors. It is computed approximately by Monte Carlo since it has no closedform expression. and the energy cost of selected sensors.
We implement our model on the UCI Human Activity Recognition using Smartphones dataset (Anguita et al., 2013)
. This dataset consists of labeled trajectories of smartphone sensor measurements from multiple subjects under 6 different activities: walking, walking upstairs, walking downstairs, sitting, standing, and laying. The time series trajectories consist of features extracted from gyroscope and accelerometer measurements, such as movement angle, jerk, acceleration, and moving averages. Extensive results demonstrate the effectiveness of our framework on achieving competitive performanceenergy tradeoffs
2 The Model: Switching Gaussian Process
Throughout our presentation, we will use the following notation convention: bold faces indicate vectors or multivariate processes, capital letters indicate matrices or covariance functions of two variables, and script letters indicate sets.
2.1 Model Formulation
We first describe the model for time series sensor measurements from one subject. For the different activity types, we assume that there is an underlying semiMarkov jump process (Yu, 2010) that governs the transitions between them. Denoting the discrete valued activity state at time as , the semiMarkov jump process can be expressed as:
(1) 
Here, are discrete activity states that discriminate between the different activity types, while and are the random start and end time points of the discrete activity state . We can denote the random duration that the subject stays in as . We adopt an explicitduration model for the sojourn times by modeling
using a Gamma distribution with state specific parameters as follows:
(2) 
where are the parameters for the duration distribution of activity with and
representing shape and scale parameters respectively. The advantage of this semiMarkov model is that it explicitly models the varying durations of different activity types. For example, in our application, we can see that the walking upstairs and walking downstairs activities all take less time to complete compared to the other activities. This activity duration may be informative in inferring the different activities. To complete the semiMarkov jump process model modeling the underlying activity states, the state transition probability themselves can be modeled by a transition probability matrix as follows:
(3) 
For the observation process , we incorporate a switching variate Gaussian Process. This process models the dynamic heterogeneity of the time series measurements by switching between different Gaussian process models for each inferred activity state or context. Specifically, we model , the observation process occurring at time period , as follows:
(4) 
Here, conditioned on the activity state , the observation process will be a Gaussian process with activitystatedependent mean and covariance function and respectively. The covariance function is a multivariate covariance function between two univariate variables and in for in the set of features .
With this, the observation process can be expressed as
(5) 
In order to handle the potential multivariate correlations between the sensor observations, we implement an intrinsic correlation model (Bonilla, Chai, and Williams, 2008). In this model, we assume that we can separate the covariance function of into a time dependent covariance function , and a freeform intervariable covariance matrix between two variables and in as follows:
(6) 
Knowing the activity state and a set of noisy observations at a set of time points , the prediction on a new set of points for the variable is given by:
(7) 
(8) 
(9) 
(10)  
Here, z is a vector of noisy multivariate measurements structured as , where is the measurement of variable at time . The vector selects the column of , while is a matrix of time dependent covariances between the prediction time points and the observed time points. Finally, is a
matrix of independent noise variances
for each multivariate variable . The parameters of our model can be combined as: . Where the covariance matrices can be further parameterized. Specifically, we use a Matern covariance function for the time dependent covariance, while a Cholesky factorization was used for the multivariable covariance to ensure positive definiteness. An illustration summarizing the model formulation is shown in Figure 1.2.2 Model Inference
We describe both the parameter inference and process inference algorithms for the proposed model with switching Gaussian process. For parameter inference, we use a population model to combine the data from multiple subjects, treating each subject as an independent realization of our model. We estimate the parameters of our model using maximum likelihood inference. Meanwhile, process inference can be efficiently done, similarly to classical filtering methods in traditional Hidden Markov Models (HMMs)
(Rabiner, 1986) using the ForwardBackward algorithm.2.2.1 Parameter Inference
In our application, since all of the activity states are labeled in the timeseries data, estimation of the semiMarkov jump process parameters is straightforward. Specifically, for each Gamma distributed activity state duration with parameter , we can derive the corresponding maximum likelihood estimates for and as follows:
(11) 
(12) 
(13) 
where is the duration of the th time stamp in state . Besides, the transition probability can be inferred by simply counting the number of transitions between activity states as follows:
(14) 
where is the number of transitions from state to state .
To ensure positive definiteness of the multivariate covariance, we can parametrize it using the Cholesky decomposition for lowertriangular matrix . Since each subject’s process model is independent from each other given the model parameters, we can write the complete loglikelihood of all subjects as a sum of individual loglikelihoods. Specifically, denote by as two times the negative loglikelihood of subject . This term can be expanded as follows: First, for each time frame corresponding to a different activity state, we put our observations vector in an matrix form, denoted by . Then, denote the matrix of corresponding mean functions for each as .
rCl
Q_j &=& ∑_τ=1^T_j(N_τlog—K_i_τ^Y— + P log—K_i_τ^T— + N_τP log2π
& &tr[(K_i_τ^Y)^1 F_τ^⊺(K_i_τ^T)] F_τ] + N_τ∑_p=1^Plogσ_p^2 +
& & tr[(Z_τ F_τ) D^1 (Z_τ F_τ)^⊺] ),
where is the activity state of subject in time frame with corresponding time dependent and multivariate covariance matrices and . With this, the complete loglikelihood will be a sum of the individual loglikelihoods:
(15) 
With this, any optimization method can be used to estimate the parameters of the multivariate Gaussian process model. The exact parameter updates are omitted from this presentation and the reader is referred to (Bonilla, Chai, and Williams, 2008) for more details.
Fast Inference using Block Circulant Embedding
Computing the likelihood as well as the gradient of the multivariate Gaussian Process models can be computationally challenging when dealing with a large number of features and the potential interactions between them. This is mainly due to the inversion and determinant calculation operations that need to be performed on the covariance matrix .
To speed up these computations, we propose a block circulant embedding approach. We first note that the measurements of our dataset are evenly spaced in time and assume that we only have one feature measurement. For this setting, the covariance matrix can be expressed as follows:
Here are scalars as we only have a single feature. Observe that we can embed this matrix in the following larger circulant matrix:
which is fully specified by the first row vector
. We can perform an eigenvalue decomposition of the above matrix as follows:
(16) 
and denote the Fourier and inverse Fourier transform respectively, and . Using the Fast Fourier Transform (FFT), this calculation can be done in . Consequently, inverse and determinant calculations can be done at this same time complexity (Davis, 2012). This idea can be extended to the case of block circulant matrices. First we note that the covariance matrix can be embedded into a block circulant matrix specified by the matrix . We then notice that each vector defines a circulant submatrix. To compute the determinant and eigendecomposition of each block circulant matrix, it suffices to simply decompose each of them using FFT. Finally, the model likelihood can be computed in .
2.2.2 Process Inference
Activity state inference can be conducted using standard filtering approaches similar to classical Hidden Markov Models (HMM) (Rabiner, 1986). Specifically, we are interested in estimating the current state at time given all previous observations.
Firstly, denote as a collection of variables for . The expression indicates that is in state between times and but no later or sooner. In other words, will be in a different state before and after time and respectively. With this, the activity detection problem can be expressed by the following maximum a posteriori (MAP) filtering problem:
(17) 
It can be solved recursively using dynamic programming. Specifically, at any given time point we need to compute the following forward and backward variables respectively:
(18) 
(19) 
Since we are merely interested in filtering the current activity states or contexts in an online fashion, we need only compute the forward variables . The recursion for computing them can be expressed as follows:
Here, is the transition probability from staying in state for duration, , towards staying in state for duration . Based on our model, this is a semiMarkov jump process, with state specific durations modeled as gamma distributions. Hence, this probability can be expressed as:
rCl
a_(i,d’)(j,d) &≔& P( x_[td+1:t] = j — x_[tdd’+1 : td] = i )
& = & Gamma(d’ — γ_i) P(x_td+1 = j — x_td = j) ×
& & Gamma(d — γ_j)
& = & Gamma (d’ — γ_j) P_ij Gamma(d — γ_j).
In (2.2.2), denotes the observation probability that is modeled as a switching Gaussian process. Specifically, this quantity can be expressed as follows:
rCl
b_j(y_t+1 : t+d) & ≔& P(y_t+1 : t+d — x_[t+1 : t+d] = j)
& = & N(y_t+1 : t+d — m_j^t+1 : t+d, K_j^t+1 : t+d),
where and are the predicted means and covariances from time points to .
3 The Adaptive Monitoring Framework
In this section, an adaptive monitoring method is proposed based on the hidden semiMarkov model with switching Gaussian Process. In particular, with the previous process inference procedure, we can derive the activity state predictive distribution. The prediction uncertainty can then be characterized by the entropy of the derived distribution. In the cases when we are confident about the activity state in the next time point based on the predictive entropy, we may not need all the sensor data for the next observations, and therefore we can choose only a subset of the sensors for the sake of saving energy. Adaptive monitoring will determine which group of sensor measurements may not be necessary, if we predefine groups of features based on the senor monitoring resource allocation requirements.
The set of predefined feature groups can be denoted as , from which one of the feature group is chosen at each observation time to observe signals . For each feature group , we also define its cost related to power consumption, which is set to a constant for simplicity. The selected feature group should balance the energy cost and prediction uncertainty gain. This is achieved by minimizing the following loss over all predefined feature groups:
(21) 
where denotes all the previous observations of feature groups selected from following the same rule, is the entropy function. In the loss function, the first term averages over the predictive distribution of given and it describes the remaining uncertainty of after observing , we would like to minimize this value to make sure that ignoring feature group does not increase much prediction uncertainty. At the same time, we also would like to achieve small energy cost .
Loss Function Computation
The minimization of (21) does not have an analytic solution form and must be solved approximately. We first approximate the expected predictive entropy in (21) by drawing samples from . For each of these samples, we then calculate the corresponding entropy function .
To derive the sampling probability , we adopt the following message passing algorithm:
which gives a Gaussian mixture distribution with respect to . For each samples , we can easily calculate based on (18). Then the loss function can be approximated by:
(23) 
where is the number of samples drawn from .
4 Results and Discussion
We apply our model to the popular UCI benchmark of “Human Activity Recognition using Smartphones” (Anguita et al., 2013)
. This dataset contains sensor measurements from a group of 30 volunteers performing six types of activities of daily living, including “walking” (1), “walking upstairs” (2), “walking downstairs” (3), “sitting” (4), “standing” (5), and “laying” (6). The sensor measurements are all taken at a constant rate and labeled manually with the corresponding activity type. We preprocess the data by Principal Component Analysis (PCA) to reduce computation overhead. Specifically, 10 principal components are derived from the 561 original features. We assume a constant mean for each activity by computing the population mean corresponding of each.
We evaluate the performance through two sets of evaluations. For the first evaluation, we assume activity states are known, and evaluate by the mean squared error (MSE) and absolute difference error (ABS) for trajectory prediction. In our second evaluation, we conduct joint activity recognition and trajectory prediction, where we only have observations up to the time of prediction. Several different Gaussian process setups and assumptions are compared to select the best performing model.
3 Model Setup  MSE  ABS 
3 Baseline + Separate time dependence + Separate Multivariate  0.4988  0.4480 
Baseline + Separate time dependence + Combined Multivariate  0.4040  0.4244 
Separate time dependence + Combined Multivariate  0.3852  0.4235 
4.1 Trajectory Performance Prediction assuming Known Activity States
We compare the sensor signal trajectory prediction performance under three different Gaussian process model assumptions. The first assumption uses two Gaussian process components: a baseline model to model the entire trajectory of all patients and an activity specific model to capture the varying dynamics of the different activities. We assume that either the temporal and multivariate dependencies are modeled separately for each activity, or the multivariate dependencies are combined between different activities. The prediction accuracy of these two models are shown in the first two entries of Table 1. Our best performing model, however, considers only a single multivariate effect model for different activities with no special consideration for any baseline trajectory. We use this best performer’s assumption for the remaining experiments. We set the ratio between observed measurements and heldout measurements to be 1:4. Figure 2
demonstrates prediction results of different variables where the model is able to follow the trajectories of the heldout data accurately. Moreover, the sensor signals for different activity states also have different variances, indicated by varying widths of the predicted confidence intervals.
We compare the activity state prediction results in two tasks. Figure 4 illustrates the first task, where the learned semiMarkov jump process duration distributions manifest a difference in duration between walking upstairs and downstairs compared to the other activities. Figure 3 demonstrates activity state prediction results in our second task. In this task, the average testing prediction accuracy can go up to 74.21%. Additionally, a large portion of inaccurate estimations are due to the time lag between the actual and predicted activity switch, which is a common problem in filtering tasks that need a certain amount of measurements in order to be confident in an activity switch.
4.2 Adaptive Monitoring
Finally, We implement the adaptive monitoring on the same UCI dataset with same training and testing setup. To perform sensor selection, we treat the 10 principal components as 10 different sensors and assume they have the same energy cost. We then define feature subsets consisting of all possible combinations of 4, 7, and 10 sensor measurements. Thus, the energy cost for each group is proportional to the number of measurements in the group. In our experiments, we vary the energy cost of a single sensor from 0 to 1 and view its impact on prediction performance.
As shown in Figure 5, we see that as the energy cost is increased, the prediction accuracy decreases as each sensor measurement comes at a higher cost. On the other hand, when there is no energy cost, the monitoring plan will prefer to utilize all of the sensor measurements available to perform the best prediction possible. A sample of the context prediction trajectory with energy cost
, which exhibits the most activity in adaptive feature selection, is shown in Figure
6. In this specific example, we are able to get an accuracy of 79.26% with an average sensor usage of 73.42%. The most used features are the leading principal components indexed starting from 10. This indicates that the leading principal components are better correlated to the activity recognition task compared to lower principal components. Moreover, this also shows that our adaptive monitoring framework can detect the most relevant principal component features without knowing their order in advance.5 Conclusion
In this paper, we propose a hierarchical model consisting of a multivariate switching Gaussian process to model the signals based on different activity types. We applied our model on trajectory and activity prediction with the UCI dataset for model verification. MSE for trajectory prediction can be as small as 0.3852, and activity recognition accuracy can reach 74.21%. Based on this model, we proposed an adaptive monitoring approach balancing the tradeoff between sensor energy cost and prediction uncertainty. Within this monitoring scheme, we characterize the tradeoff between monitoring accuracy and sensor energy efficiency. We show that our activity recognition scheme can stay robust and perform well under energy restrictions.
Acknowledgement
This project is in part supported by the Defense Advanced Research Projects Agency (FA87501820027) and the National Science Foundation Awards CCF1715027 and CCF1718513. Part of the computing time is provided by the Texas A&M High Performance Research Computing.
References
 Anguita et al. (2013) Anguita, D.; Ghio, A.; Oneto, L.; Parra, X.; and ReyesOrtiz, J. L. 2013. A public domain dataset for human activity recognition using smartphones. In ESANN.
 Ardywibowo et al. (2018) Ardywibowo, R.; Huang, S.; Gui, S.; Xiao, C.; Cheng, Y.; Liu, J.; and Qian, X. 2018. Switchingstate dynamical modeling of daily behavioral data. Journal of Healthcare Informatics Research 2(3):228–247.

Barber (2012)
Barber, D.
2012.
Bayesian Reasoning and Machine Learning
. Cambridge University Press.  Bonilla, Chai, and Williams (2008) Bonilla, E. V.; Chai, K. M.; and Williams, C. 2008. Multitask gaussian process prediction. In Advances in neural information processing systems, 153–160.
 Byon, Ntaimo, and Ding (2010) Byon, E.; Ntaimo, L.; and Ding, Y. 2010. Optimal maintenance strategies for wind turbine systems under stochastic weather conditions. IEEE Transactions on Reliability 59(2):393–404.
 Davis (2012) Davis, P. J. 2012. Circulant matrices. American Mathematical Soc.
 Funahashi and Nakamura (1993) Funahashi, K.i., and Nakamura, Y. 1993. Approximation of dynamical systems by continuous time recurrent neural networks. Neural networks 6(6):801–806.
 Garnett, Osborne, and Roberts (2010) Garnett, R.; Osborne, M. A.; and Roberts, S. J. 2010. Bayesian optimization for sensor set selection. In Proceedings of the 9th ACM/IEEE international conference on information processing in sensor networks, 209–219. ACM.
 Gunawardana, Meek, and Xu (2011) Gunawardana, A.; Meek, C.; and Xu, P. 2011. A model for temporal dependencies in event streams. In Advances in Neural Information Processing Systems, 1962–1970.
 Harvey (1990) Harvey, A. C. 1990. Forecasting, structural time series models and the Kalman filter. Cambridge university press.
 He et al. (2006) He, T.; Krishnamurthy, S.; Luo, L.; Yan, T.; Gu, L.; Stoleru, R.; Zhou, G.; Cao, Q.; Vicaire, P.; Stankovic, J. A.; et al. 2006. Vigilnet: An integrated sensor network system for energyefficient surveillance. ACM Transactions on Sensor Networks (TOSN) 2(1):1–38.

Karantonis et al. (2006)
Karantonis, D. M.; Narayanan, M. R.; Mathie, M.; Lovell, N. H.; and Celler,
B. G.
2006.
Implementation of a realtime human movement classifier using a triaxial accelerometer for ambulatory monitoring.
IEEE transactions on information technology in biomedicine 10(1):156–167.  Krishnamurthy (2016) Krishnamurthy, V. 2016. Partially Observed Markov Decision Processes. Cambridge University Press.
 Lukowicz et al. (2004) Lukowicz, P.; Ward, J. A.; Junker, H.; Stäger, M.; Tröster, G.; Atrash, A.; and Starner, T. 2004. Recognizing workshop activity using body worn microphones and accelerometers. In International conference on pervasive computing, 18–32. Springer.
 Morere, Marchant, and Ramos (2017) Morere, P.; Marchant, R.; and Ramos, F. 2017. Sequential bayesian optimization as a pomdp for environment monitoring with uavs. In Robotics and Automation (ICRA), 2017 IEEE International Conference on, 6381–6388. IEEE.
 Poppe (2007) Poppe, R. 2007. Visionbased human motion analysis: An overview. Computer vision and image understanding 108(12):4–18.
 Rabiner (1986) Rabiner, L. R. 1986. An introduction to hidden markov models. ieee assp magazine 3(1):4–16.
 Torres et al. (2005) Torres, J. L.; Garcia, A.; De Blas, M.; and De Francisco, A. 2005. Forecast of hourly average wind speed with arma models in navarre (spain). Solar Energy 79(1):65–77.
 Wang et al. (2018) Wang, Y.; Nguyen, T.; Zhao, Y.; Wang, Z.; Lin, Y.; and Baraniuk, R. 2018. Energynet: Energyefficient dynamic inference. NIPS Workshop on Compact Deep Neural Networks with Industrial Applications.
 Wiser et al. (2008) Wiser, R.; Bollinger, M.; Barbose, G.; Belyeu, K.; Hand, M.; Heimiller, D.; Lew, D.; Milligan, M.; Mills, A.; Moreno, A.; et al. 2008. Annual report on us wind power installation, cost, and performance trends: 2006.

Wu et al. (2018)
Wu, J.; Wang, Y.; Wu, Z.; Wang, Z.; Veeraraghavan, A.; and Lin, Y.
2018.
Deep kmeans: Retraining and parameter sharing with harder cluster assignments for compressing deep convolutions.
In International Conference on Machine Learning, 5359–5368.  Yu (2010) Yu, S.Z. 2010. Hidden semimarkov models. Artificial intelligence 174(2):215–243.