Entropy and ShaPe awaRe timE-Series SegmentatiOn for processing heterogeneous sensor data
Extracting informative and meaningful temporal segments from high-dimensional wearable sensor, smart device, or IoT data is a vital preprocessing step in applications such as Human Activity Recognition (HAR), trajectory prediction, gesture recognition, and lifelogging. In this paper, we propose ESPRESSO (Entropy and ShaPe awaRe timE-Series SegmentatiOn), a hybrid segmentation model for multi-dimensional time-series that is formulated to exploit the entropy and temporal shape properties of time-series. ESPRESSO differs from existing methods that focus exclusively upon particular statistical or temporal properties of time-series. As part of model development, a novel temporal representation of time-series, WCAC, was introduced along with a greedy search approach that estimates segments based upon an entropy metric. ESPRESSO was shown to offer superior performance to four state-of-the-art methods across seven public datasets of wearable and wear-free sensing. In addition, we undertake a deeper investigation of these datasets to understand how ESPRESSO and its constituent methods perform with respect to different dataset characteristics. Finally, we provide two interesting case-studies to show how applying ESPRESSO can assist in inferring daily activity routines and the emotional state of humans.
Today, there is a growing demand for data mining technologies to transform the complex, unwieldy data collected from a broad and diverse range of wearable devices, smartphones, and sensors into compact and actionable information. Whilst supervised methods work well, they require carefully labelled samples. Annotating datasets of wearable sensors can be challenging for a couple of reasons. In addition to the privacy issues associated with collecting human data, the huge volume of data and the hierarchical structure of human activities can make the annotation process time-consuming, expensive and sometimes even infeasible. Consequently, unsupervised and self-supervised techniques have gained a lot of attention (Saeed et al., 2019). Furthermore, automatic knowledge extraction techniques are required to factorise this large volume of sensor data into interpretable pieces of information.
Time-series segmentation is the process of partitioning time-series into a sequence of discrete and homogeneous segments. We propose a new multivariate time-series segmentation technique to be used as a preliminary processing or exploratory data analysis step prior to tasks such as prediction, feature selection, semi-supervised or unsupervised classification. Therefore, the motivation of this paper is to enable a deeper unsupervised exploration of wearable sensor datasets by factorising them into a set of atomic primitives of physical action or emotion. By enabling these primitives to be discovered, the process of feature engineering can be accelerated, whilst in addition, greater insights into the underlying properties of the data can be learned.
Recent studies in Human Activity Recognition (HAR) have demonstrated the effectiveness of using temporal segmentation in combination with classification (Aminikhanghahi and Cook, 2019; Liono et al., 2016; Shoaib et al., 2016; Wang and Zheng, 2018; Chamroukhi et al., 2013). In addition to HAR, time-series segmentation has been applied to other modeling tasks with wearable sensors, including trajectory prediction (Sadri et al., 2018), motion-based user authentication (Huang et al., 2019b), life-logging (Chavarriaga et al., 2013), elderly rehabilitation (Lam et al., 2016), anomaly detection (Rajagopalan and Ray, 2006), prediction (Song et al., 2018; Aminikhanghahi and Cook, 2017; Song et al., 2016), and feature selection (Lee et al., 2018).
In pervasive computing applications, the time-series being collected will often be heterogeneous, encompassing a diverse range of characteristics with respect to their dimensionality, continuity, statistical properties and shape. Figure 1 shows two time-series with very different properties. Figure 1(a) shows the repetitive temporal shape patterns of the human heart measured with a wearable electrocardiogram (ECG). Figure 1(b) shows a sequence of human postures that have been measured with a passive RFID tag array; the RSSI of each posture has different statistical properties. This figure clearly shows that the semantics of each use case should be extracted by exploiting different time-series properties. Statistical changes can be used to segment the human postures with high precision; however, temporal shape changes will fail to distinguish these segments. In contrast, exploiting temporal shape changes in the ECG data will be advantageous for segmenting abnormal heartbeats (the middle segment of the ECG), compared to using statistical changes that are more uniform across the segments in (a) than (b).
While current time-series segmentation methods exploit individual characteristics of the signal, such as the temporal shape, statistics, or probability distribution, we propose ESPRESSO (Entropy and ShaPe awaRe timE-Series SegmentatiOn), a hybrid model that incorporates multiple signal characteristics. To achieve this, ESPRESSO integrates the search-based and score-based mechanisms of segmentation through a newly proposed shape representation, WCAC, and a greedy search that exploits a non-parametric entropy-based cost function. The segmentation results are then further enhanced by devising an embedded channel ranking algorithm. ESPRESSO has been developed to accurately segment a wider range of time-series by relaxing some of the assumptions imposed by statistical or temporal shape-based methods.
The main challenges of current multi-dimensional time-series segmentation approaches are as follows: 1) Model assumptions: models make parametric assumptions about the underlying properties of time-series that can limit their application. 2) Model parameterisation: segmentation models generally utilise a number of parameters and thresholds that need to be carefully tuned based upon domain knowledge. 3) Channel ranking: in multi-dimensional time-series, not all dimensions (channels) are equally important to achieving accurate segmentation. Although there are numerous supervised channel selection techniques for classification tasks, there is little work on ranking the relevance of channels for unsupervised segmentation.
Current work on temporal shape-based segmentation (Gharghabi et al., 2019) operates under the principle that similarly shaped patterns are associated with the same segment class and occur within close temporal proximity. This assumption, however, can lead to degraded segmentation performance under any of the following conditions: a) several instances of the same class (with the same label) repeat multiple times across the time-series; b) segment classes are not comprised of repeated shape patterns; c) shape patterns drift with time. Each of these conditions is commonly encountered in wearable sensor use-cases. We propose a temporal shape-based segmentation method, the Weighted Chained Arc Curve (WCAC), to address these limitations. In addition to temporal shape-based methods, there is a range of statistical segmentation approaches. Such approaches have commonly employed parametric models in the form of Probability Density Functions (PDFs) (Basseville et al., 1993; Hallac et al., 2018; Ni et al., 2016) and auto-regressive models (Takeuchi and Yamanishi, ), but have somewhat limited application given they impose strong assumptions upon the statistical properties of the time-series. Non-parametric kernel-based methods have been proposed (Aminikhanghahi and Cook, 2019; Kawahara and Sugiyama, 2012; Liu et al., 2016; Yamada et al., 2013) to offer greater modelling flexibility, but can be difficult to train and provide poor estimates across smaller sample sets.
The main contributions of this paper are as follows:
ESPRESSO is a novel time-series segmentation approach that integrates temporal shape and entropy-based properties of multi-dimensional time-series. Unlike most state-of-the-art methods that require several carefully tuned parameters, ESPRESSO depends on only one parameter, which can be selected with minimal risk given ESPRESSO's performance is shown to be relatively consistent with respect to this parameter. ESPRESSO is shown to outperform four state-of-the-art segmentation methods in terms of F-score and RMSE across the seven public wearable sensor datasets in this experiment.
We propose WCAC to address particular limitations of existing shape-based segmentation approaches. In contrast to other temporal shape methods, WCAC can accommodate both repeated segments and shifts in temporal shape across the time-series.
An embedded channel ranking has been utilised in ESPRESSO to make segmentation more robust to noisy and/or irrelevant channels.
We categorise time-series datasets with regards to their continuity and repetitive patterns. An ablation study is performed to evaluate ESPRESSO’s performance with respect to these categories.
Finally, we demonstrate the interpretability of segmentation results obtained by ESPRESSO for two real-world use-cases. The first study shows the ability of ESPRESSO to discover deviations in the daily routines of people through life-logging data. The second study shows ESPRESSO can identify the emotional states of people and provide an interpretation of their emotional transitions.
Although the term segmentation is frequently used in the time-series literature, we focus upon approaches that partition time-series from the bottom up, by using salient changes in the series to identify individual segment boundaries, or from the top down, by identifying the segment boundaries that optimise a cost function across an entire time-series. First, we review current applications of time-series segmentation in the field of wearable sensor and device-free datasets. We then provide an overview of general time-series segmentation approaches and highlight their current limitations.
For wearable sensors, segmentation methods commonly use a fixed-length sliding window. The authors in (Shoaib et al., 2016) compared the effect of window size on detecting different types of activities in HAR applications. To estimate the effect of window size on activity recognition, they divided activities into two main groups: simple activities with periodic actions, such as running and waving, and more complex, non-periodic actions, such as drinking coffee. Shorter windows were shown to be effective for simple periodic activities but less reliable at representing the more complex activities. Consequently, recent works, such as (Peng et al., 2018), have exploited different window sizes to represent activities of varying complexity. The method in (Liono et al., 2016) proposed an optimization approach to find the optimal window size for activity segmentation; however, this can be a challenging task given the variety of sensors and activities associated with real-world applications.
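The fixed-length sliding-window baseline discussed above can be sketched in a few lines; `width` and `step` are illustrative values that would normally be tuned to the activity type and sampling rate:

```python
import numpy as np

def sliding_windows(series: np.ndarray, width: int, step: int) -> np.ndarray:
    """Split a (n_samples, n_channels) series into fixed-length windows.
    Returns an array of shape (n_windows, width, n_channels)."""
    starts = range(0, len(series) - width + 1, step)
    return np.stack([series[s:s + width] for s in starts])

# 1000 samples of 3-axis accelerometer data: 2-second windows at 100 Hz
# (width=200) with 50% overlap (step=100) gives 9 windows.
data = np.random.randn(1000, 3)
windows = sliding_windows(data, width=200, step=100)
```

A classifier is then applied per window; the trade-off described by (Shoaib et al., 2016) corresponds to varying `width`.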
Temporal segmentation has been shown to be an important pre-processing step in high-dimensional wearable and device-free sensor applications (Lin et al., 2016; Haladjian, 2019; Bulling et al., 2014). Segmentation has been identified as an open challenge in analyzing life-logging data from wearable sensors in order to build more accurate models (Chavarriaga et al., 2013). A recently published wearable development toolkit, WDK, has incorporated time-series segmentation methods; this demonstrates the impact that these techniques have had on wearable sensing applications (Haladjian, 2019). The authors in (Aminikhanghahi and Cook, 2019, 2017) showed that applying classification on top of a segmentation method produced more accurate results than performing classification with a fixed-length sliding window. They proposed a new probability metric, SEP, to improve current probability density-ratio change point detection techniques in smart home applications. The authors in (Wang and Zheng, 2018) propose a simple segmentation method that exploits knowledge of the statistical characteristics of low-level human activities (i.e. walking, running) to segment RFID signals. Temporal segmentation of motion data has been studied extensively. The method in (Noor et al., 2017) proposed using decision trees to find the split points. The authors in (Chamroukhi et al., 2013) proposed a multiple regression model based upon Expectation Maximization, MRHLP, which identifies activity boundaries as the points where there is a switch in the underlying models.
In addition to classification problems, the authors in (Sadri et al., 2018) have shown that using temporal segmentation in conjunction with a prediction model leads to performance improvements. They utilized an entropy-based temporal segmentation method to improve the prediction quality of user activity trajectories. For a user identification and authentication application, (Huang et al., 2019a) devised a sequence-labelling based segmentation approach to extract physical and behavioural characteristics of individuals from a sequence of their daily activities. Segmentation has also been used for feature selection in datasets of multi-dimensional human activity motion, electroencephalogram (EEG) signals and speech signals (Lee et al., 2018). (Sarker et al., 2017; Sarker and Salim, 2018) found temporal segmentation could be used to identify a user's behavioral characteristics from their smartphone usage data. Table 1 summarizes some recent wearable sensor applications that benefit from unsupervised segmentation. Beyond human-centric applications, time-series segmentation has been applied to a broad range of fields such as sensor data processing, environmental modelling, financial events, music and speech processing, and energy consumption prediction. (Aminikhanghahi and Cook, 2017) provides a detailed review of time-series change point detection methods.
| Year | Paper | Applying segmentation for … |
|------|-------|------------------------------|
| 2019 | (Aminikhanghahi and Cook, 2019) | Activity recognition in smart-home |
|      | (Huang et al., 2019a) | Authentication and identification |
|      | (Lee et al., 2018) | |
| 2018 | (Sadri et al., 2018) | User trajectory prediction |
|      | (Wang and Zheng, 2018) | HAR using reflection of RFID signals |
| 2017 | (Sarker et al., 2017) | |
|      | (Noor et al., 2017) | |
| 2016 | (Liu et al., 2016) | |
|      | (Lin et al., 2016) | |
|      | (Lam et al., 2016) | |
| 2013 | (Chamroukhi et al., 2013) | |
|      | (Li et al., 2013) | |
|      | (Lin and Kulić, 2013) | |
Time-series segmentation approaches can be divided into supervised and unsupervised techniques. Supervised methods infer the class labels of the underlying time-series using binary or multi-class classifiers formed from Hidden Markov Models (San-Segundo et al., 2016) and Decision Trees (Noor et al., 2017) in order to identify segment boundaries. Unsupervised techniques are more commonly utilized than supervised approaches given they do not require training sets of segmented data. Instead, unsupervised methods exploit the underlying signal properties to estimate the change points.
The statistical properties of a time-series are most frequently exploited in unsupervised segmentation. These methods can be categorised as top-down optimisation approaches (Hallac et al., 2018; Sadri et al., 2017) that search for the set of segments that maximise its particular cost function, or bottom up approaches that identify individual segment boundaries from local deviations in the time series (Basseville et al., 1993; Kawahara and Sugiyama, 2012; Liu et al., 2013; Ni et al., 2016; Takeuchi and Yamanishi, ; Yamada et al., 2013).
Sadri proposed a top-down temporal segmentation method, IGTS, which was based upon the information gain (IG) metric (Sadri et al., 2017). Segment boundaries were estimated by using a dynamic programming approach to maximise the IG of its constituent segments. A similar top-down approach was used in (Hallac et al., 2018), where a greedy search was used to identify the segment boundaries that maximize the regularized likelihood estimate of a segmented Gaussian model.
The statistical differences between time intervals have commonly been measured with the likelihood ratio formulation (Basseville et al., 1993; Kawahara and Sugiyama, 2012; Liu et al., 2013; Ni et al., 2016; Takeuchi and Yamanishi, ). Within this formulation, parametric models have been used to estimate the intervals as Probability Density Functions (PDFs) (Basseville et al., 1993; Ni et al., 2016), auto-regressive models (Takeuchi and Yamanishi, ) or state space models (Kawahara et al., 2007). The parametric assumptions of these models, however, limit the types of statistical changes that can be detected. For instance, by fitting a Gaussian distribution to segments, as in (Basseville et al., 1993; Ni et al., 2016), only differences in the mean and/or standard deviation of adjacent intervals can be used in segmentation. Whilst these limitations can be relaxed by considering non-parametric density estimation, this still remains a difficult estimation problem to address.
Flexible non-parametric solutions (Kawahara and Sugiyama, 2012; Liu et al., 2013) were proposed to compute the likelihood ratio without the need for density estimation. It was found that estimating the ratio of PDFs directly was a simpler problem to address than density estimation; hence, a non-parametric Gaussian kernel could successfully be used for this purpose. The Kullback-Leibler Importance Estimation Procedure (KLIEP) was used to directly estimate the ratio of PDFs (Kawahara and Sugiyama, 2012). Liu adopted Relative unconstrained Least-Squares Importance Fitting (RuLSIF) to directly estimate the relative ratio of PDFs (Liu et al., 2013). These non-parametric approaches for direct ratio estimation were challenging to train and required a cross-validation procedure for model selection. They also tend to produce poor estimates with small datasets. Yamada utilised the non-parametric additive Hilbert-Schmidt Independence Criterion (aHSIC) for time-series segmentation (Yamada et al., 2013). Change points were detected by using the aHSIC criterion to compute the dependency between time-adjacent intervals and the pseudo label of statistical change between the intervals.
Whilst non-parametric approaches offer greater flexibility to modeling statistical change than the earlier parametric methods, they are not universally applicable to HAR applications. They assume statistical homogeneity within each segment and statistical heterogeneity between different segments. Whilst this assumption is appropriate for low level segmentation tasks, it will not always be valid for wearable sensing applications where extracted segments need to characterise complex actions, emotions and behaviours.
The temporal shape is another unique property of time-series that can be exploited in segmentation (Gharghabi et al., 2019; Huang et al., 2014), where changes in the temporal shape patterns of a time-series are used to estimate the segment boundaries. FLOSS, Fast Low-cost Online Semantic Segmentation (Gharghabi et al., 2019), works under the principle that patterns of similar shape are each associated with the same segment class and occur within close temporal proximity of each other. The limitations of such assumptions were described in the Introduction section. In contrast to FLOSS, which is based on the most similar repeated patterns, the authors in (Huang et al., 2014) proposed a segmentation model based on rare temporal patterns. Although shape-based methods can be beneficial for time-series composed of repeated shape patterns, performance will degrade when segments are composed of diverse shapes or when the shape patterns of a segment drift over time. Recently, the authors of (Wang et al., 2019; Zhu et al., 2017) proposed a new pattern-based primitive, the Chain, to discover a chain of similarly shaped patterns. To make shape extraction robust against pattern drift, we customize this idea in our proposed shape-based segmentation method.
We follow the Matrix Profile framework introduced in (Yeh et al., 2016); for completeness, we first provide a definition of multi-dimensional time-series.
Definition 1. The high-dimensional time-series $T$ is an $n \times d$ matrix of $n$ samples and $d$ channels (time-series), such that $T = [T^{(1)}, \dots, T^{(d)}]$, where $t_i^{(j)}$ denotes the $i$th sample of the $j$th time-series channel.

Definition 2. The subsequence $T^{(j)}_{i,m}$ of a time-series is a vector of $m$ samples in channel $j$ ranging between index $i$ and index $i+m-1$; $m$ is the length of the subsequence.

Definition 3. Matrix Profile, $MP$, is a matrix where $MP^{(j)}_i$ denotes the distance between subsequence $T^{(j)}_{i,m}$ and its nearest neighbour in $T^{(j)}$. Here we employed the Euclidean distance as the similarity metric to compare subsequences.

Definition 4. Matrix Profile Index, $MPI$, is a matrix where $MPI^{(j)}_i$ denotes the index of the nearest neighbour (the most similar subsequence) for the subsequence $T^{(j)}_{i,m}$.

According to this definition, the most similar pattern to subsequence $T^{(j)}_{i,m}$ is $T^{(j)}_{MPI^{(j)}_i,\,m}$, with the similarity distance $MP^{(j)}_i$. The authors in (Gharghabi et al., 2019) defined the Arc Curve ($AC$) on top of $MPI$ in their shape-based segmentation technique.

Definition 5. $arc^{(j)}_i$ is an arc between position $i$ and $MPI^{(j)}_i$, and $AC^{(j)}$ is a vector of the same length as the time-series where $AC^{(j)}_k$ denotes how many arcs, $arc^{(j)}_i$, cross the $k$th time tick in the $j$th channel.
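The Matrix Profile and Arc Curve definitions above can be illustrated with a naive sketch. Production implementations use fast algorithms such as STOMP (Yeh et al., 2016); this O(n²) version with z-normalised Euclidean distance and a simple exclusion zone is for illustration only, and the function names are our own:

```python
import numpy as np

def matrix_profile(x: np.ndarray, m: int):
    """Naive O(n^2) matrix profile of a 1-D series x with subsequence
    length m. Returns (MP, MPI): the nearest-neighbour distance and index
    of every subsequence, with a trivial-match exclusion zone."""
    n = len(x) - m + 1
    subs = np.stack([x[i:i + m] for i in range(n)])
    # z-normalise subsequences so comparisons are shape-based
    subs = (subs - subs.mean(axis=1, keepdims=True)) \
        / (subs.std(axis=1, keepdims=True) + 1e-12)
    mp = np.empty(n)
    mpi = np.empty(n, dtype=int)
    excl = m // 2
    for i in range(n):
        d = np.linalg.norm(subs - subs[i], axis=1)
        d[max(0, i - excl):i + excl + 1] = np.inf  # ignore trivial matches
        mpi[i] = int(np.argmin(d))
        mp[i] = d[mpi[i]]
    return mp, mpi

def arc_curve(mpi: np.ndarray) -> np.ndarray:
    """Arc Curve: for each time tick, count the arcs (i -> MPI[i]) crossing it."""
    ac = np.zeros(len(mpi))
    for i, j in enumerate(mpi):
        lo, hi = min(i, int(j)), max(i, int(j))
        ac[lo:hi] += 1
    return ac

# Toy series: a sine regime followed by a square-wave regime.
x = np.concatenate([np.sin(np.linspace(0, 20, 200)),
                    np.sign(np.sin(np.linspace(0, 20, 200)))])
mp, mpi = matrix_profile(x, m=25)
ac = arc_curve(mpi)  # low values indicate candidate boundaries
```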
Problem definition. Given the multi-dimensional time-series $T$, we attempt to detect the transition times $t_1, t_2, \dots, t_k$ that are indicative of state changes in $T$. These transition times represent the boundaries needed to extract segments. Our proposed method consists of three steps:
Extracting potential segment boundary candidates by analyzing the temporal shape across all dimensions;
Employing a greedy search over boundary candidates in order to identify the set of segments with a minimum average entropy;
Ranking channels and estimating the number of segments.
Figure 2 provides a visual overview of the work flow associated with our method.
In this section we describe different parts of our proposed segmentation technique.
The main assumption of shape-based segmentation methods is that repeated patterns relate to the same segment class, and hence, occur within close temporal proximity. To extract the most similar shapes within each time-series, we utilized the MP technique (Yeh et al., 2016) and the AC definition (Gharghabi et al., 2019). The FLOSS (Gharghabi et al., 2019) method works under the principle that the AC will have its minimum values at segment boundaries, based on the assumption that a large majority of arcs will be confined to individual segments with very few arcs crossing over segments. This assumption is more likely to be violated, however, if the dataset contains repeated segments. Figure 3 shows an example of a sequence of physical activities that starts with jumping, is followed by running and then returns to jumping again. In this case, running subsequences can find their most similar subsequence within the alternative running segment. This leads to a larger number of arcs spanning the intermediate jumping segment, and hence, higher AC values being produced across this intermediate segment. This degrades the ability of FLOSS to estimate the activity transition times (compare Figure 3(b) and Figure 3(c)). In order to address this issue, FLOSS (Gharghabi et al., 2019) defined a temporal constraint parameter to ignore arcs that are longer than a threshold. Setting this threshold, however, requires detailed knowledge about the particular problem in question. Furthermore, a threshold places an upper limit upon the segment size that can be considered; this is undesirable from an algorithm design perspective. To address this problem, we propose a novel time-series representation primitive named the Weighted Chained Arc Curve (WCAC) to capture the density of pattern repetition with time. The WCAC weights each arc according to the temporal distance between the pair of similar subsequences it connects.
We show in Section 5 that we can achieve a far more accurate segmentation result by using WCAC when compared to FLOSS.
To define the WCAC, we first explain how a chain of similar arcs is calculated. To increase the robustness of the representation to noise and signal drift, we modified the AC definition to consider a chain of similar subsequences instead of only the most similar subsequence. The authors in (Zhu et al., 2017) proposed the time-series Chain as a new primitive that sits on top of the MP representation. We modified this chain definition to fit our problem. We believe that considering a chain of patterns is crucial in the context of Human Activity Recognition (HAR), given many activities are associated with motion patterns that can drift with time. We define the Chained Arc Curve (CAC) as follows:
Definition 6. The Chained Arc Curve, $CAC^{(j)}$, is built from chains of similar subsequences in the $j$th channel of the input. A chain $C = \{T^{(j)}_{c_1,m}, \dots, T^{(j)}_{c_l,m}\}$ is ordered in terms of the temporal distance of its subsequences to the query subsequence, and for any $r$ we have $1 \le c_r \le n - m + 1$, where $n$ is the length of channel $j$, $m$ is the size of the subsequence, and $l$ denotes the length of the chain.
Figure 4 illustrates a simplified version of the $CAC$ with arc chains of second-order neighbours. If $b$ is the nearest neighbour (the most similar subsequence) of $a$, and $c$ is the nearest neighbour of $b$, then we add an arc between $a$ and $c$ (if there is no such arc yet). The distance assigned to the new arc is the accumulated distance along the chain. Any higher-order arcs will be included in the $CAC$ if their distance is less than the specified threshold. To avoid trivial matches, we ignore similar subsequences within an exclusion zone around the location of the query. In order to ensure the extracted patterns in the chain are similar, we limit the length of the chain by defining a distance threshold between the first and last subsequences of the chain.
The other modification is to consider the locality of repeated patterns. Each arc in the $CAC$ is assigned a weight as an inverse function of its length. Consequently, arcs of smaller length are given greater weight, since they are more likely to belong to the same segment instance than longer arcs, which are more likely to cross over other segments. The $WCAC$ is defined as follows:
Figure 3(d) shows that the $WCAC$ is a more effective representation for estimating change points: unlike the original $AC$ (Figure 3(c)), there are two local minima in close proximity to the actual segment boundaries.
The algorithms to compute the $CAC$ and $WCAC$ are provided in Algorithms 1 and 2, respectively. The chain ($CAC$) is initialized with the MP index, which represents the arcs between each subsequence and its nearest-neighbour (i.e. most similar) subsequence (line 1). A set of second-level arcs are then constructed between each subsequence and the nearest neighbour of its own nearest neighbour (line 2). We then look for next-level arcs to add to the current chain (lines 3-13). If a new arc meets the distance condition (line 6), it is added to the chain in line 7. This process is repeated until there are no new arcs to add. To calculate the $WCAC$ (Algorithm 2), the weight of each arc in the $CAC$ is updated according to its similarity and normalized arc length in lines 6-9.
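As an illustration of Algorithms 1 and 2, the sketch below builds second-order chained arcs from the MP index and weights each arc by its normalised length and distance. The exponential decay used as the weight is an assumed form for illustration, not the paper's exact weighting:

```python
import numpy as np

def chain_arcs(mpi: np.ndarray, mp: np.ndarray, dist_thresh: float) -> dict:
    """Grow second-order chained arcs: link each subsequence to the nearest
    neighbour of its own nearest neighbour, when the accumulated distance
    stays below dist_thresh. Returns {(i, j): distance}."""
    arcs = {(i, int(mpi[i])): float(mp[i]) for i in range(len(mpi))}
    for i in range(len(mpi)):
        j = int(mpi[i])
        k = int(mpi[j])
        d = float(mp[i] + mp[j])  # accumulated distance along the chain
        if k != i and d <= dist_thresh and (i, k) not in arcs:
            arcs[(i, k)] = d
    return arcs

def weighted_arc_curve(arcs: dict, n: int) -> np.ndarray:
    """Accumulate arc weights per time tick. Shorter, closer-matching arcs
    contribute more; the exponential-decay weight is an assumption."""
    wcac = np.zeros(n)
    for (i, j), d in arcs.items():
        lo, hi = min(i, j), max(i, j)
        weight = np.exp(-(hi - lo) / n) / (1.0 + d)
        wcac[lo:hi] += weight
    return wcac

# Toy 6-subsequence profile (indices and distances are made up):
mpi = np.array([3, 2, 1, 0, 5, 4])
mp = np.array([0.5, 0.4, 0.4, 0.5, 0.3, 0.3])
arcs = chain_arcs(mpi, mp, dist_thresh=2.0)
wcac = weighted_arc_curve(arcs, n=6)
```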
Change point candidates are estimated based upon the intuition that actual change points should often coincide with either a local or global minimum of the $WCAC$. Figure 5 shows the $WCAC$ for six different time-series covering a sequence of sitting, running and sitting activities. The figure clearly illustrates that in each of the channels, there are local minima in close proximity to the actual change points. These change point candidates are used as the search space for the entropy-based segmentation described in the next section.
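The candidate extraction step can be sketched as a windowed local-minimum search over the WCAC; the neighbourhood size `order` is an illustrative parameter:

```python
import numpy as np

def boundary_candidates(wcac: np.ndarray, order: int) -> np.ndarray:
    """Indices that are minima of the WCAC within a +/- `order` sample
    neighbourhood; these form the search space for the entropy step."""
    cands = [i for i in range(order, len(wcac) - order)
             if wcac[i] == wcac[i - order:i + order + 1].min()]
    return np.array(cands)

# A V-shaped toy curve has a single local minimum at index 100.
wcac = np.abs(np.arange(-100, 101)).astype(float)
cands = boundary_candidates(wcac, order=10)
```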
The shape-based segmentation is unable to detect segment boundaries of time-series that have non-repeating shape patterns. Furthermore, the $WCAC$ representation can be biased by segment size variation across the time-series. Shorter segments are likely to possess fewer arcs than longer segments, and hence, the method can be biased towards estimating change points across the shorter segments. Consequently, we utilise the Information Gain (IG) metric to evaluate each local minimum of the $WCAC$ as a potential segment boundary. A greedy search procedure is implemented upon the set of boundary candidates in order to identify change points that minimize the entropy-based cost function in (2). Given that the first term of the cost function, $H(X)$, is the entropy of the whole time-series as a single segment (no change point) and has a constant value, maximising IG is equivalent to minimizing the entropy of the constituent segments. The cost function is defined as follows:

$$IG(X, S) = H(X) - \sum_{i=1}^{|S|+1} \frac{|s_i|}{|X|} H(s_i) \qquad (2)$$

where $X$ is the time-series of $d$ dimensions, $B$ is the set of segment boundary candidates that were selected during the previous shape-based segmentation, $s_i$ is the segment between the $i$th and $(i+1)$th selected boundaries, and $|\cdot|$ is the length operator. $S \subseteq B$ is the list of change points that are chosen by our greedy entropy-based method. $H(s_i)$ is the Shannon entropy of the segment $s_i$:

$$H(s_i) = -\sum_{j=1}^{d} p_j \log p_j \qquad (3)$$

where $p_j$ is the area of segment $s_i$ in series $j$ divided by the total area of this segment summed across all time-series.
Algorithm 3 describes the greedy search used to estimate the segment boundaries (GreedyEntropySearch), whilst Algorithm 4 explains the complete ESPRESSO approach. During each iteration of the greedy search, each remaining segment boundary candidate (not currently in the chosen set) was used to split an existing segment of the time-series into two segments. The entropy of the new segments was then computed (line 6). The candidate that produced the two segments of lowest entropy was then selected and added to the chosen set. This greedy search was repeated for each dimension of the time-series (Algorithm 4).
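A minimal sketch of the greedy entropy search follows. It assumes a non-negative multi-channel series (so that segment "areas" are well defined) and minimises the length-weighted segment entropy, which is equivalent to maximising the IG of (2); the function names are illustrative:

```python
import numpy as np

def shannon_entropy(segment: np.ndarray) -> float:
    """Entropy of a (length, d) non-negative segment: p_j is channel j's
    area divided by the segment's total area across all channels."""
    area = segment.sum(axis=0)
    p = area / area.sum()
    p = p[p > 0]
    return float(-(p * np.log(p)).sum())

def segment_cost(x: np.ndarray, bounds: list) -> float:
    """Length-weighted entropy of the segments induced by `bounds`."""
    cuts = [0] + sorted(bounds) + [len(x)]
    return sum((cuts[i + 1] - cuts[i]) / len(x)
               * shannon_entropy(x[cuts[i]:cuts[i + 1]])
               for i in range(len(cuts) - 1))

def greedy_entropy_search(x: np.ndarray, candidates: list, k: int) -> list:
    """Greedily add the boundary candidate that most reduces the weighted
    segment entropy, k times."""
    chosen, pool = [], list(candidates)
    for _ in range(k):
        best = min(pool, key=lambda c: segment_cost(x, chosen + [c]))
        chosen.append(best)
        pool.remove(best)
    return sorted(chosen)

# Two statistical regimes with a true boundary at t=100:
x = np.vstack([np.tile([5.0, 1.0], (100, 1)), np.tile([1.0, 5.0], (100, 1))])
bounds = greedy_entropy_search(x, candidates=[50, 100, 150], k=1)
```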
In this study, we showed that the segmentation accuracy was positively correlated with the entropy of the estimated segments in (2). Consequently, we attempt to exploit this observation to rank the channels of our multi-dimensional time-series and select the channel that is most likely to provide the highest segmentation accuracy. For each channel, a set of boundary candidates was estimated from its $WCAC$ representation. A greedy search (as outlined in Section 4.2) was then used to segment the channel based upon maximising the IG metric of (2).
The number of segments $k$ was estimated by analyzing the extent to which a new segment contributes to decreasing the entropy of the segmented time-series. The relationship between information gain and the number of segments has been proven to be monotonically increasing (Sadri et al., 2017), given that the entropy of the constituent segments will always decrease as $k$ is increased. The knee-point detection equation in (5), which was proposed by (Sadri et al., 2017), was used to estimate $k$ from the entropy reduction achieved at each number of segments.
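The knee-point idea can be sketched with a common elbow heuristic: pick the number of segments whose entropy-reduction value lies furthest above the chord joining the curve's endpoints. This is a generic heuristic, not necessarily the exact equation (5) of (Sadri et al., 2017):

```python
import numpy as np

def knee_point(gains: np.ndarray) -> int:
    """Number of segments at the knee of the entropy-reduction curve:
    the point furthest above the chord joining the curve's endpoints.
    gains[i] is the entropy reduction achieved with i+1 segments."""
    k = np.arange(1, len(gains) + 1)
    slope = (gains[-1] - gains[0]) / (k[-1] - k[0])
    chord = gains[0] + slope * (k - k[0])
    return int(k[np.argmax(gains - chord)])

# Entropy reduction that saturates after two segments:
gains = np.array([0.0, 0.8, 0.95, 1.0, 1.02])
k_est = knee_point(gains)  # 2
```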
We introduce the seven datasets and four benchmark segmentation methods used in our experiment. The metrics used to evaluate the performance of ESPRESSO and the benchmark segmentation methods are then defined. Finally, the results of the experiment and some use-case studies are presented. The source code is available at the GitHub page: https://github.com/cruiseresearchgroup/ESPRESSO.
Seven public datasets comprised of smartphones, RFID tags and different wearable sensors, including motion sensors, physiological sensors and eye-wear computing sensors, have been used to test the segmentation performance of ESPRESSO and the benchmark methods. Table 2 provides a detailed summary of the seven public datasets used in this experiment. We evaluated our method on two different types of wearable sensor datasets: continuous (C) and non-continuous (NC). The continuous datasets were comprised of sensor data collected across an uninterrupted sequence of different human activities; in these datasets, the activity transition times were manually recorded by human observers. The non-continuous datasets were comprised of individual recordings of human activity that were manually stitched together to form an activity sequence. Consequently, the transitions between adjacent activities were far more discontinuous in the second type of dataset than in the first. In addition, the datasets were categorised based on whether they contained repetitive temporal patterns (R) or exclusively non-repetitive patterns (NR). Table 3 presents the datasets and the categories they are each associated with.
| Dataset | Classes | Sensors | Length* | Dims* | Segments* |
| HandGesture (Bulling et al., 2014) | 10 hand gestures | Accelerometer, Gyrometer | 133.3K | 18 | 600 |
| PAMAP (Reiss and Stricker, 2012; Reiss et al., 2011) | 14 physical activities | Accelerometer, Temperature | 42.4K | 10 | 21 |
| USC-HAD (Zhang and Sawchuk, 2012) | 12 physical activities | Accelerometer, Gyrometer | 93.6K | 6 | 36 |
| EYE state (Rösler and Suendermann, 2013) | closed/open eye | EEG | 2K | 8 | 5 |
| WESAD (Schmidt et al., 2018) | | | | | |
| RFID (Yao et al., 2015) | 12 physical activities | RFID readers | 15.3K | 12 | 84 |
| Emotion (Heinisch et al., 2018) | | | | | |
* Length of each time-series, number of dimensions, and number of segments.
(Zhang and Sawchuk, 2012): The USC-HAD dataset includes twelve human activities that were each recorded separately across fourteen subjects. Each subject was fitted with a 3-axis accelerometer and a 3-axis gyrometer attached to the front of the right hip and sampled at 100 Hz. Activities were repeated five times per subject and consisted of: walking forward, walking left, walking right, walking upstairs, walking downstairs, running forward, jumping up, sitting, standing, sleeping, elevator up, and elevator down. To perform the experiments, recordings of the different activities were manually stitched together in random order; therefore, USC-HAD was considered an NC dataset.
(Reiss and Stricker, 2012; Reiss et al., 2011): This dataset includes fourteen low-level (such as walking and sitting) and high-level (such as ironing, which consists of two or more low-level activities) human activities undertaken by eight different subjects. Each subject was fitted with an IMU (inertial measurement unit) sensor on their wrist, chest and ankle. Each participant was given both an indoor and an outdoor activity schedule to perform the following activities sequentially: lying, sitting, standing, walking very slowly, normal walking, Nordic walking, running, ascending stairs, descending stairs, cycling, ironing, vacuum cleaning, jumping rope and playing soccer. Each IMU collected observations of temperature, 3-axis acceleration, 3-axis angular velocity (gyroscope) and 3-axis magnetic field (magnetometer) at a sampling rate of 100 Hz. As a result of missing readings in some of the sensors, only a subset of this IMU dataset was used in the experiment: the data from all three accelerometers and the hand-mounted thermometer.
(Bulling et al., 2014): Our experiment used the Hand Gesture dataset, a collection of twelve hand movement activities performed by two subjects. Activities were captured by three IMUs that were attached to the subject’s hand, upper arm and lower arm, respectively. The activities that were recorded within the experiments included: opening the window, closing the window, drinking, watering plants, cutting, chopping, stirring, reading a book, a tennis forehand, a tennis backhand and a tennis smash.
(Yao et al., 2015): In this experiment, nine passive RFID tags were placed on a wall. The dataset consists of six subjects who each performed twelve predefined postures between the wall and an RFID antenna, with each posture performed for 60 seconds. RFID was an NC dataset, given it was formed by concatenating the twelve postures of each of the six subjects.
(Heinisch et al., 2018, 2019): This dataset was collated to study the physiological response to different emotional states and to identify the effect of physical activity on these states. Five hours of physiological data from an E4 wristband, a Biosignalsplux device and a smartphone were collected from 18 subjects with respect to three emotion categories: High Positive Pleasure High Arousal (HPHA), High Negative Pleasure High Arousal (HNHA), and Neutral.
(Schmidt et al., 2018): WESAD is a well-known stress dataset acquired from multi-modal wearable sensors. It comprises physiological and motion data from the chest- and wrist-worn sensors of 15 subjects. In this study, only the chest-worn sensors, down-sampled by a factor of 10, were used to detect stress, amusement and meditation segments.
(Rösler and Suendermann, 2013): This dataset consists of 14980 samples from 15 EEG sensors collecting eye state data over 117 seconds. The labels (closed/open) were manually annotated using video collected during the measurements.
The performance of the proposed ESPRESSO method was compared to four state-of-the-art algorithms: Fast Low-Cost Semantic Segmentation (FLOSS) (Gharghabi et al., 2019), Information Gain-based Time-series Segmentation (IGTS) (Sadri et al., 2017), additive Hilbert-Schmidt Independence Criterion (aHSIC) (Yamada et al., 2013), and Relative unconstrained Least Square Importance Fitting (RuLSIF) (Liu et al., 2013). To avoid inconsistencies and implementation errors, we evaluated our method against benchmark algorithms with publicly available source code.
FLOSS is a shape-based segmentation method built on top of the Matrix Profile time-series representation, and IGTS is an information gain-based segmentation method; both have been described in previous sections. IGTS requires no input parameters, whilst FLOSS requires the subsequence length as its input parameter. RuLSIF is based on estimating the relative probability density ratio of subsequences. The number of subsequences in each round and the regularization constant were fixed at 10 and 0.01, respectively, as suggested by the authors. To enable a fair comparison with this method, we evaluated its performance across different subsequence lengths. The final method, aHSIC, is a multi-dimensional time-series segmentation method combined with channel selection. It first selects important channels using a supervised learning method and then scores each time step according to the proposed dependency measure with respect to a pseudo binary label. The regularization constant and the kernel parameter were set to 0.01 and 1, respectively, as suggested in the original paper. Segmentation performance was compared across a range of subsequence sizes unique to each dataset, based on its sample rate and a minimum segment duration of 0.5 seconds. The range of subsequence sizes varied from the narrowest set of 10 to 40 samples for the RFID dataset to the widest set of 100 to 900 samples for the PAMAP dataset.
The performance of the segmentation algorithms was evaluated with respect to the following metrics:
F-score: the F-score is defined as the harmonic mean of Precision (P) and Recall (R). An estimated segment boundary was counted as a True Positive (TP) when it was located within a specified time window of a ground truth segment boundary, while a ground truth boundary with no estimate inside its time window was counted as a False Negative (FN). When multiple estimates fell within the time window of a single ground truth boundary, only the closest estimate was counted as a TP and the remaining estimates were counted as False Positives (FP). Reflecting the sampling rates of the sensors in each dataset, the time window (i.e. the segmentation threshold) was set to 0.5 seconds for the EYE dataset and 2 seconds for each of the remaining datasets.
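The boundary-matching rule described above can be sketched as follows (the function and variable names are our own, not taken from the paper's released code):

```python
def boundary_f_score(truth, est, window):
    """Illustrative sketch of the boundary-matching F-score.

    Each ground-truth boundary is paired with its closest unmatched
    estimate within +/- window seconds; matched pairs count as true
    positives, leftover estimates as false positives, and unmatched
    ground-truth boundaries as false negatives.
    """
    matched = set()
    tp = 0
    for t in truth:
        # Candidate estimates inside this boundary's tolerance window.
        candidates = [(abs(e - t), i) for i, e in enumerate(est)
                      if i not in matched and abs(e - t) <= window]
        if candidates:
            _, best = min(candidates)  # keep only the closest estimate
            matched.add(best)
            tp += 1
    fp = len(est) - tp
    fn = len(truth) - tp
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```

For example, with ground-truth boundaries at 100 s and 200 s, estimates at 101 s, 205 s and 300 s, and a 2 s window, only the first estimate matches, giving a precision of 1/3 and recall of 1/2.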
RMSE: The Root Mean Square Error (RMSE) was computed between each ground truth segment boundary time and its nearest estimated segment boundary time. The RMSE was then normalized into the range [0, 1] by dividing it by the time-series duration.
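The normalised RMSE above can be sketched in a few lines (the function name is hypothetical):

```python
import numpy as np

def normalized_rmse(truth, est, duration):
    """Sketch of the normalised boundary RMSE: for each ground-truth
    boundary take the distance to its nearest estimate, compute the
    RMSE of those distances, then divide by the series duration."""
    truth = np.asarray(truth, dtype=float)
    est = np.asarray(est, dtype=float)
    # Distance from each ground-truth boundary to its nearest estimate.
    errors = np.min(np.abs(truth[:, None] - est[None, :]), axis=1)
    return float(np.sqrt(np.mean(errors ** 2)) / duration)
```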
MAE: To compare the performance of the proposed shape-based segmentation method, WCAC, with the other shape-based segmentation method, FLOSS, we employed the Mean Absolute Error (MAE), as used in (Gharghabi et al., 2017). For this particular study, segmentation performance was evaluated as the MAE between the estimated segment boundaries and the ground truth segment boundaries.
The F-score metric depends upon the selection of a threshold value. For example, suppose the actual segment transition time is at 250 seconds and a threshold of five seconds is set. If two segmentation methods estimate boundaries A and B at 256 seconds and 300 seconds, respectively, the F-score will treat both A and B as missed detections despite A being a far superior estimate to B. The RMSE and MAE address this problem, given their continuous metric space ensures A is represented as a superior estimate to B. All of these metrics could otherwise incorporate the errors of several transition estimates lying closest to a single ground truth boundary. Consequently, we ensure each ground truth boundary is mapped to at most one estimated boundary, so that the metrics are not biased by change-point estimates clustering around a subset of segment boundaries. Existing studies often evaluate performance using only one of these metrics; however, we believe reporting the F-score alongside the RMSE (or MAE) provides a more comprehensive evaluation.
In this section, we first compare the effectiveness of the proposed shape-based segmentation technique, WCAC, against FLOSS. Then, we investigate the performance of ESPRESSO against four state-of-the-art segmentation techniques. Finally, we undertake an ablation study to compare the performance of the shape and entropy based components of ESPRESSO.
In this section, we compare the performance of our proposed shape-based segmentation method, WCAC, and the state-of-the-art shape-based segmentation method FLOSS. The Hand Gesture, USC-HAD and RFID datasets were selected for this comparison given they contain a diverse set of sensors and repetitive temporal patterns. The experiments were repeated over a set of subsequence lengths ranging between 10 and 40 samples for the Hand Gesture dataset, between 50 and 550 samples for the USC-HAD dataset, and between 20 and 100 samples for the RFID dataset (Figure 7). The subsequence lengths were set based upon the sampling rate of each dataset: the minimum subsequence length was set to 0.5 seconds, whilst the maximum subsequence length was set to half of the minimum segment size in the dataset.
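The rule for deriving this sweep range can be written as a small helper (a sketch of the stated rule; the helper name is our own):

```python
def subsequence_length_range(sample_rate_hz, min_segment_samples):
    """Mirror of the rule above: the shortest subsequence covers
    0.5 s of samples, the longest is half the minimum segment size."""
    lo = int(0.5 * sample_rate_hz)
    hi = min_segment_samples // 2
    return lo, hi
```

For a 100 Hz dataset whose shortest segment spans 1100 samples, this yields the 50 to 550 sample sweep used for USC-HAD.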
Figure 7 shows the segmentation performance with respect to the Mean Absolute Error (MAE). These figures show that the proposed method not only had consistently superior segmentation performance to FLOSS, but was also far less sensitive to the subsequence length used.
The performance of ESPRESSO was compared to four competing segmentation techniques: FLOSS, IGTS, aHSIC, and RuLSIF. We performed extensive experiments across seven datasets and over a range of different subsequence lengths. Figure 7 compares the F-score of the ESPRESSO, aHSIC, RuLSIF and FLOSS methods with respect to the subsequence length across three datasets: Hand Gesture, PAMAP and RFID. The IGTS method was not included in this comparison given it does not utilise subsequences in order to perform segmentation.
For the PAMAP and RFID datasets, ESPRESSO was shown to offer a very high level of segmentation performance that was superior to three benchmark algorithms across all subsequence lengths, apart from a small set of subsequence lengths (with 33 to 37 samples) in the RFID dataset where aHSIC’s performance was equivalent. In addition, ESPRESSO’s performance was consistently high across all subsequence lengths, a desirable property of the algorithm.
For the Hand Gesture dataset, ESPRESSO achieved superior segmentation performance to the benchmark methods for a large majority of the subsequence lengths. The exception was a narrow range of subsequence lengths (with 35 to 55 samples) where FLOSS outperformed ESPRESSO. FLOSS’s performance was found to be far more sensitive to subsequence length than ESPRESSO, given it exhibited a significant performance decline outside of this optimal subsequence range.
Table 4 shows the average F-score and RMSE of ESPRESSO and the four benchmark methods across each of the seven datasets. For each of the methods apart from IGTS, the F-score and RMSE were averaged across all of the subsequence lengths and subjects of each dataset. For IGTS, the F-score was only averaged across the dataset subjects, given it is a top-down method that does not utilise subsequences. In the PAMAP and WESAD datasets, the window size of aHSIC was fixed at 50 samples, as suggested in the original paper, given its computational cost became prohibitively high over longer window sizes.
The segmentation results in Table 4 indicate that ESPRESSO was superior to the benchmark methods, on average, across all datasets, with an F-score improvement of 45.6%, 7%, 44.4%, and 45.2% over the FLOSS, IGTS, aHSIC, and RuLSIF methods, respectively. In addition, ESPRESSO had an average RMSE improvement of 140%, 21%, 92% and 224% over the FLOSS, IGTS, aHSIC, and RuLSIF methods, respectively. ESPRESSO had a clear advantage over the FLOSS, RuLSIF and aHSIC methods across each of the datasets. The RFID dataset was the only one where IGTS was shown to be superior to ESPRESSO across both performance metrics. ESPRESSO was advantageous over IGTS across four of the seven datasets (RFID, Hand Gesture, USC-HAD and WESAD), given both of its performance metrics were superior. We hypothesize that RFID was an optimal dataset for a top-down entropy based approach such as IGTS, given the segments had salient statistical differences (as shown in Figure 1(b)).
The effectiveness of ESPRESSO was then examined with respect to particular dataset characteristics in Figure 8:
Datasets with continuous (C) and non-continuous (NC) segments: the datasets associated with the C and NC categories are shown in Table 3. ESPRESSO was shown to be superior to each of the benchmark methods for the C and NC categories, with average F-scores of 0.59 and 0.67, respectively. Figure 8 shows ESPRESSO's performance advantage over the statistical methods (IGTS, RuLSIF) was more significant for the continuously recorded datasets than for the non-continuous datasets, suggesting ESPRESSO was better equipped than purely statistical methods to detect the more gradual segment transitions of continuous recordings. In contrast, ESPRESSO's performance advantage over the shape-based FLOSS method was greater for the NC datasets, in particular the NC datasets with non-repeating temporal patterns.
Datasets with repeating temporal patterns (R) and non-repeating temporal patterns (NR): HAR datasets consist of physical activities that contain both repeating (such as walking or stirring) and non-repeating (such as sitting or opening the window) actions. The association between the seven datasets used in the experiment and these action categories is shown in Table 3. In Figure 8, ESPRESSO's performance was shown to be far superior across datasets that possess at least some repeated patterns (R) (average F-score of 0.83) when compared to datasets composed exclusively of non-repeating patterns (NR) (average F-score of 0.6). This can be attributed to ESPRESSO using WCAC to identify potential segment boundary candidates; WCAC is far more effective at detecting segment transitions across time-series with repeating shape patterns. ESPRESSO was shown to be superior to each of the four benchmark methods across both categories of temporal patterns. Figure 8 shows ESPRESSO's performance advantage was greater across R datasets than NR datasets. This could be attributed to ESPRESSO's unique ability to exploit both the shape and statistical properties of time-series in the R datasets, where both properties are useful for segmenting the combination of repetitive and non-repetitive patterns.
An additional study was performed to independently investigate the temporal shape and statistical components of ESPRESSO. Table 5 compares the segmentation performance of ESPRESSO with its constituent shape-based method (WCAC) and entropy-based method (GreedyEntropySeg). Furthermore, Figure 9 compares the segmentation performance of the WCAC and GreedyEntropySeg methods across the four dataset categories introduced in Section 5.4.2.
The segmentation performance of ESPRESSO was largely attributable to the strong and consistent performance of the GreedyEntropySeg method on the NC and NR dataset categories. In contrast, the WCAC method made a more significant contribution to ESPRESSO's segmentation performance for the R dataset categories (with repeating patterns). This can be attributed to WCAC's ability to exploit repeated temporal shape patterns in order to perform accurate segmentation of the time-series.
* Combination of WCAC + GreedyEntropy Segmentation.
In this section we show how performing segmentation with the data of wearable sensors can help to extract a user’s daily life patterns and to identify any deviations in their daily routines. In this study, we used an existing life-logging dataset from NTCIR-13 Life-logging track (Gurrin et al., 2019). The data from three biometric sensors (calories burnt, heart rate, and skin temperature) and a step counter were used in this study. The aim of this study was to model the physical activity intensity of a user on a daily basis and then compare these in order to detect any activity deviations (Deldari et al., 2019). To extract different levels of activity intensity, we changed the granularity of the estimated segments from 4 segments to 11 segments per day.
Figure 10 shows the average extracted transition times over 19 weekdays (orange) and 8 holidays/weekend days (blue). As the number of segments per day increases, the granularity of the extracted routine increases. There are several distinct differences between weekdays and weekends in terms of transition times, segment lengths at different times of the day, and the timing of the first and last estimated segments (morning and evening). For example, during the weekdays the user usually starts the day at around 4:00 am, whilst on weekends this segment starts at approximately 5:00 am. It should be noted that this start time is associated with changes in the biometric parameters and may be related to the subject's biological clock; it also suggests the user is likely to wake up later on weekends due to having no work obligations. There were similar shifts in segment transitions in the middle of the day (noon) and later in the day (evening), which are highlighted with grey arrows in the figure. To detect unusual daily patterns, we devised a threshold-based algorithm that flags deviations from the normal daily routine: a day was considered unusual if its dissimilarity to the reference routine was greater than a predefined threshold. Using the corresponding images as ground truth, we attempted to explain the reasons behind the strong deviations in activity levels. Table 6 lists the unusual days that were identified and provides an interpretation of each.
| Date | Reason for deviation |
| 17-8-2016 | The user left work earlier to shop and have lunch. |
| 24-8-2016 | The user did not go to work. |
| 29-8-2016 | The user caught a bus instead of driving. |
| 30-8-2016 and 8-9-2016 | The user caught a flight and then went back to work. |
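The threshold rule behind this table can be sketched as follows (a minimal illustration; the function name and the idea of precomputed per-day dissimilarity scores are our own assumptions):

```python
def unusual_days(dissimilarity_by_day, threshold):
    """Flag a day as unusual when its dissimilarity to the reference
    daily routine exceeds the predefined threshold, as described above.

    dissimilarity_by_day: mapping of day label -> dissimilarity score.
    """
    return [day for day, score in dissimilarity_by_day.items()
            if score > threshold]
```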
The well-known stress and emotional affect dataset WESAD (Schmidt et al., 2018) was analysed from a new perspective. This dataset includes wearable physiological and motion sensors commonly used in medical applications, such as electrocardiogram (ECG), electromyogram (EMG), Blood Volume Pulse (BVP), temperature, electrodermal activity (EDA), respiration and accelerometer sensors. The data is labelled with five categories: Baseline, Stress, Amusement, Meditation and Not Defined (ND).
Through this experiment we aim to answer the following questions: 1) What sort of emotion transitions (for example, a transition from "Stress" to "Meditation") are more detectable? 2) How long does it take for a physiological response to occur for each category of emotion?
We applied ESPRESSO to the data of 15 subjects in two sets of experiments. The first involved estimating the emotion transition times between "Stress", "Amusement" and "Meditation" segments. The second involved estimating emotion transition times across the entire set of data, including the Not Defined segments. For each experiment, we evaluated the estimated emotion transitions in terms of the True Positive Rate (TPR); Section 5.3 explains how an extracted boundary is considered a True Positive. The TPR is calculated by dividing the number of true positives by the total number of segments. The detection threshold was varied from 15 seconds to 275 seconds to account for the delay in the physiological response. Figure 11 shows the TPR across the different threshold values for each emotion transition. These figures show that transitions into and out of the "Stress" state can be accurately detected in less than 100 seconds. This is due to the strength of the physiological response to "Stress" and hints at the negative impact that stress can have on human health. In contrast, we hypothesise the "Amusement" to "Meditation" transitions are detected with lower accuracy for several reasons. Firstly, the duration of "Amusement" segments is much shorter than that of the other emotional states. Secondly, the "Amusement" emotion is unlikely to have as strong a physiological effect as "Stress". Thirdly, subjectivity is often introduced into such experiments: for the "Amusement" segments, subjects were shown 11 "funny" clips, but these clips may not be found amusing by all subjects. We believe this study opens a new avenue towards improving personality-inference applications by dynamically considering a subject's reactions and response times in different situations.
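The TPR computation described above can be sketched as follows (our own names; matching is one-to-one, consistent with the evaluation protocol of Section 5.3):

```python
def true_positive_rate(truth, est, threshold):
    """Fraction of ground-truth emotion transitions that have an
    estimated boundary within `threshold` seconds, with each estimate
    matched to at most one ground-truth transition."""
    matched = set()
    tp = 0
    for t in truth:
        best, best_d = None, threshold
        for i, e in enumerate(est):
            d = abs(e - t)
            # Keep the closest unmatched estimate inside the threshold.
            if i not in matched and d <= best_d:
                best, best_d = i, d
        if best is not None:
            matched.add(best)
            tp += 1
    return tp / len(truth)
```

Sweeping `threshold` from 15 s to 275 s and plotting the resulting TPR reproduces the style of analysis shown in Figure 11.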
We propose a novel unsupervised method for multivariate time-series segmentation, ESPRESSO, and test it on a range of wearable-sensing and device-free applications. ESPRESSO has a hybrid formulation that enables time-series to be segmented on the basis of their temporal shape and statistical (i.e. entropy) properties. The proposed temporal shape representation, the Weighted Chained of Arc Curve (WCAC), was used to detect potential candidates for segment boundaries. Segments were then estimated with the GreedyEntropySeg method, which performed a greedy search over this limited space of boundary candidates using an entropy-based metric.
Our proposed shape segmentation method, the Weighted Chained of Arc Curve (WCAC), was shown to consistently outperform a state-of-the-art temporal shape-based method across three public datasets. This improvement was attributed to our novel primitive, WCAC, addressing the limitations of current shape-based methods, such as segmenting repeated segments or temporal shape patterns that drift over time.
Experiments were run across a diverse set of seven public datasets of wearable and device-free sensors, and showed that ESPRESSO achieved an average segmentation performance improvement (in terms of F-score and RMSE) over the four state-of-the-art methods FLOSS, aHSIC, RuLSIF and IGTS. Furthermore, it was demonstrated that ESPRESSO outperformed the four benchmark methods across the different categories of data related to the repetition of patterns and the continuity of segments. An ablation study of ESPRESSO demonstrated that the WCAC method offered a more significant contribution to segmenting time-series with repetitive patterns, whilst the GreedyEntropySeg method offered a greater contribution to segmenting time-series with non-repetitive patterns and time-series composed of non-continuous segments.
We demonstrated the value of using ESPRESSO in two real-world use-cases of inferring daily activity routines and emotional state transitions in an unsupervised context.
Future work will involve extending the current method to an online version of the channel-ranking algorithm, where the set of channels accounting for system change can be dynamically selected. Currently, we use the highest-ranked channel for segmentation; we also aim to consider subsets of channels and their correlations for future segmentation-based channel selection.
A unifying framework for detecting outliers and change points from non-stationary time-series data. In Proceedings of the Eighth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD '02, New York, NY, USA, pp. 676–681.
Proc. of the 23rd International Joint Conference on Artificial Intelligence (IJCAI).