Modelling temporal series of data is important in many different domains, including disciplines as diverse as hydrology or economics, but also for monitoring and understanding human behaviours through wireless sensor networks in smart environments [1, 2, 3]. Typically, different processes require different models to interpret and forecast new sensor data. The nature of the process, the amount of required data and the extent of the forecasting determine the kind of model finally chosen. Temporal models should be able to capture the frequencies of important event occurrences – e.g. the routine activities observed by a home-monitoring system for the elderly. Methods for frequency analysis (e.g. the Fourier transform) can reveal periodic patterns in the sensor data but, if these occur only within specific time intervals, they fail to determine when the periodicities start and end. Moreover, short events that manifest as localized peaks of the sensor signal are difficult to capture with standard Fourier analysis, unless a large number of frequency components is considered. And even with a high frequency resolution, temporal information – i.e. when those peaks happen – is lost in Fourier analysis.
In this paper, therefore, we propose a new wavelet-based method that is suitable for modelling sparse periodic and/or very short events in sensor data. Wavelet analysis indeed has the advantage that it simultaneously provides temporal and frequency information of a signal with very little loss of information, and it is therefore more powerful than Fourier analysis in capturing and forecasting sensor data in many real-world applications. One of these, Active & Assisted Living (AAL), is an important application area where good temporal representations of events can enable the implementation of many useful well-being services.
To this end, our wavelet-based temporal model can be used to identify patterns of human activity from smart-home sensors and detect anomalies in the occurrence of typical daily routines. The latter, indeed, have a significant temporal component, which is often periodic, but with occasional variations and very short-term events (e.g. repeatedly opening/closing the fridge in the morning, but only on weekdays). In particular, we adopt the anomaly definition in , which considers the amount of motion in specific locations as a normalized entropy beyond some given thresholds. Note that the term “motion” is used in a broad sense to include the activation of various binary sensors, such as passive infrared (PIR) motion detectors or contact sensors on doors, cupboards, etc. We also refer to this type of motion in the environment as activity level.
In this work, we apply our wavelet-based representation of human activities to a new anomaly detection system for AAL (see Fig. 1). In particular, given a set of smart-home binary sensors (i.e. motion detectors and contact sensors), we build accurate temporal models to represent and forecast their expected output. Then, using an entropy-based method , we estimate the current and expected levels of human activity. These two are finally compared by an original inference system, based on a Hybrid Markov Logic Network (HMLN), to detect potential anomalies. The paper includes three main contributions:
First, we propose a novel technique for temporal modelling of (long-term) human activities based on wavelet transforms. Among its possible applications, this wavelet-based temporal model enables the forecasting of smart-sensor signals for the detection of potential anomalies, i.e. human activities that deviate significantly from the norm. A software implementation of this temporal modelling tool is made publicly available.
Second, we describe a new automatic system for anomaly detection that uses a HMLN to combine three sources of information about human activities, namely i) actual entropy level from smart-home sensors, ii) expected entropy from wavelet-based temporal models, and iii) expert knowledge in the form of logic rules.
Finally, we present extensive experimental results based on two large datasets, one previously recorded in an office environment  and a new one from a real elderly home, which we also made publicly available. These datasets were recorded in MongoDB format  for easy access and re-usability by the scientific community.
The remainder of the paper is organized as follows. Sec. 2 reviews state-of-the-art methods for temporal modelling and anomaly detection with smart-home sensors, including relevant public datasets. Sec. 3 briefly introduces the wavelet transform and describes the respective temporal models of sensor data. Sec. 4 explains the entropy-based method used to represent human activity levels in smart-home scenarios. Sec. 5 describes the design of the HMLN-based inference system and its expert rules to analyse and detect anomalies in human activities. Sec. 6 illustrates the architecture and practical implementation of the anomaly detection system. Sec. 7 presents datasets and experiments to validate the effectiveness of the temporal models and the anomaly detection in office and AAL scenarios. Finally, Sec. 8 discusses advantages and disadvantages of the proposed approach, suggesting directions for future work in this area.
2 Related work
A reliable temporal model of human activities can benefit many smart-home and robotics applications for AAL . Such a model could help an automated system understand the current scenario and plan opportune interventions, for example by sending a mobile service robot to a human user when it is most likely to be helpful.
Temporal modelling is widely used to detect regular patterns in data. From time series analysis, a relevant tool is the autoregressive integrated moving average model (ARIMA) and its derivations , including the stationary process case described by the autoregressive moving average (ARMA) model. The main problem with these models is that they may become unstable  or are only suitable for relatively short temporal windows or known temporal trends .
Other non-linear techniques, such as Gaussian Processes , could theoretically achieve the full reconstruction of signals from mixture models. Similarly, Ghassemi & Deisenroth  use periodic Gaussian Processes for long-term forecasting. In , Poisson processes are used instead as probabilistic models to recognize patterns and, in combination with Markov Chains, to identify anomalies in the data. These models are typically robust against model instabilities, but they are computationally heavy.
A technique called FreMEn (Frequency Map Enhancement) has recently been proposed for spatio-temporal representations of robot environments in long-term scenarios . It uses Fourier analysis to extract periodicities in sensor data, in combination with a Bernoulli distribution or Poisson processes to represent binary information states. FreMEn is a simple yet effective modelling tool, but it is not suitable for describing sparse or very short events.
Wavelet-based methods have been used for temporal modelling in many different fields such as drought or price forecasting [17, 18], passenger flow prediction , human motion analysis  or iris recognition . Since wavelets contain both frequency and time domain information, they are particularly suitable to represent sparse non-stationary signals.
Some temporal models are specifically tailored to the sensor or data source at hand. For example, [21, 22] proposed spatio-temporal models of motion detectors in which an anomaly is seen as a significant deviation from the typical sensor response. Although relatively simple, this approach is very sensitive to potential misplacements or faults of the deployed sensors. Alternative activity and temporal models were proposed by  using 4D-fluents (i.e. logic predicates that depend on time) to add a temporal layer on top of an underlying description logic. , instead, proposed an Extended Episode Discovery model that defines habits in terms of length, frequency and periodicity for offline processing. In , the authors compare three sequential activity models – HMM, CRF and sequential MLN – where feature vectors were generated during fixed-time windows for online processing. These sequential activity models offer a straightforward approach to anomaly detection, although the latter is not addressed in those works.
Typically, anomaly detection systems are designed for the specific sensor(s) used, and approaches vary greatly depending on the input data. Wearable activity trackers like the one proposed by , for example, provide rich and continuous motion and pose information without requiring any additional preprocessing. But wearables can be forgotten, misplaced or misused by volunteers, leading to false anomalies in the datasets. Automated analysis of video sequences, instead, does not require explicit user intervention. However, extra effort is needed to extract meaningful information from the input sequences. For example, Xu et al.  used multiple one-class SVM models to predict anomaly scores, while Leyva et al. used Markov Chains to detect abnormal events in a video stream. Compared to camera-based systems, smart-home sensors offer a cheaper alternative for anomaly detection .
Markov Logic Networks (MLNs) are both a modelling [29, 30] and an inference [31, 32] tool, often used for their flexibility to define rich models. They are able to perform inference using imprecise or incomplete inputs, which is useful to deal with sensor faults and network errors. In addition, they can blend both sensor data and expert logic rules within a probabilistic framework for robust inference in real-time applications . Compared to SVM- and HMM-based systems, the advantage of using MLNs for anomaly detection is that they require a smaller amount of sensor data to build their models and that they better handle uncertain information . SVMs have been successfully combined with deep learning (DL) techniques for anomaly detection and achieved promising results in high-dimensional problems, but without exploiting the available temporal information. The HMLN proposed in this paper combines wavelet-based temporal models and expert rules, mixing discrete and continuous predicates for the first time, to infer potential anomalies. These expert rules also help to overcome the lack of data otherwise required to train DL-based methods.
Public datasets with labelled sensor data are important to test and compare different algorithms. The dataset hosted by Tim van Kasteren (https://sites.google.com/site/tim0306/datasets)  offers a collection of compressed Matlab files with several recordings of binary sensors (e.g. open/closed doors; pressure mats; motion detectors). The Center for Advanced Studies in Adaptive Systems (CASAS) also provides an extensive collection of datasets for activity recognition (http://casas.wsu.edu/datasets/), in which every entry has a different format, usually a compressed text or binary file. The Smart project , although focused on energy sustainability and consumption management, created a wide collection of datasets from real houses, including smart-home sensors (http://traces.cs.umass.edu/index.php/Smart/Smart). All these datasets contain non-standard, plain text or binary files which are difficult for other researchers to handle, especially when large. They lack a standardized format and access mechanisms suitable for systematic big data processing. To our knowledge, there are no smart-home datasets based on such standardised and easily manageable formats. Our new dataset, instead, was created by storing raw data in a MongoDB database. This approach provides an accessible, platform- and application-independent format readily available for other research in our paper’s application area and beyond.
3 Wavelet-based Temporal Forecasting
In this section we present a novel approach to forecast sensor data for human activity monitoring using a wavelet-based temporal model. We start with a brief description of the discrete wavelet transform algorithm, and then we explain how to tune and use this algorithm for building our temporal model of the sensor data.
Standard Fourier analysis is useful for the frequency decomposition of signals, but it does not keep important time information. That is, we know which frequency components are present in a signal, but not when they are present. In addition, signal discontinuities are poorly represented by the Fourier transform, since its basis is non-local. This is known as the Gibbs phenomenon .
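The Gibbs phenomenon can be reproduced numerically. The sketch below (assuming NumPy is available; the signal and sample counts are illustrative) reconstructs a square wave from a truncated Fourier series and measures the overshoot near the discontinuity, which remains around 9% of the jump no matter how many harmonics are added:

```python
import numpy as np

# Square wave with discontinuities, rebuilt from a truncated Fourier
# series: the partial sums overshoot near the jump (Gibbs phenomenon).
t = np.linspace(0, 1, 4096, endpoint=False)

def fourier_partial_sum(t, n_max):
    """Partial Fourier series of the unit square wave (odd harmonics)."""
    s = np.zeros_like(t)
    for k in range(1, n_max + 1, 2):  # k = 1, 3, 5, ...
        s += (4 / (np.pi * k)) * np.sin(2 * np.pi * k * t)
    return s

# Overshoot above the true amplitude (1.0) near the discontinuity;
# it does not vanish as the number of harmonics grows.
overshoot_63 = fourier_partial_sum(t, 63).max() - 1.0
overshoot_255 = fourier_partial_sum(t, 255).max() - 1.0
```

Both overshoots stay close to 9% of the total jump, illustrating why adding frequency components alone cannot fix the representation of discontinuities.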
Wavelets provide an alternative representation that overcomes the limitations of Fourier analysis. They decompose signals into individual components, which maintain both frequency and time information. Also, they can effectively represent and provide localized information about discontinuities. These advantages (i.e. time-frequency and discontinuity representations) are very important to handle the non-periodic and often “spiky” nature of real-world sensor data, especially in the context of activity monitoring.
3.1 Discrete Wavelet Transform
A discrete wavelet transform (DWT) is a sampled wavelet transform applicable to digital signals. Let us consider a discrete time signal $x$ in the space $V_0$, with finite energy, defined in the interval $[0, T]$ with a sampling frequency $f_s$. This signal can be represented using the following orthogonal decomposition:

$$V_0 = V_1 \oplus W_1$$   (1)

where $W_1$ is the orthogonal complement of subspace $V_1$ inside $V_0$. The subspace $V_1$ can be further subdivided into two orthogonal subspaces, $V_2$ and $W_2$, and so on recursively:

$$V_0 = W_1 \oplus W_2 \oplus \cdots \oplus W_N \oplus V_N$$   (2)

defines the $N$-level decomposition of the space $V_0$.
The subspace $V_N$ maintains the time domain properties of the signal, whereas the subspaces $W_1, \ldots, W_N$ preserve its properties in the frequency domain. These time and frequency subspaces are generated by the following function families:

$$\phi_{j,k}(t) = 2^{-j/2}\,\phi(2^{-j}t - k), \qquad \psi_{j,k}(t) = 2^{-j/2}\,\psi(2^{-j}t - k)$$   (3)

The scaling functions $\phi_{j,k}$ are weighted and displaced versions of a “father wavelet” function $\phi$. They can also be obtained by iteratively re-scaling a previous one. The parameter $j$ determines the scale and magnitude of the corresponding scaling function, keeping its energy constant. As a result, $\phi_{j,k}$ is only defined on a finite interval, whose width grows with $j$. Small values of $j$ turn the scaling function into a narrow, delta-like function, whereas large values lead to an almost constant (and low) one. Finally, the parameter $k$ determines the time displacement of the wavelet.
Similarly, every wavelet function $\psi_{j,k}$ is built by scaling and displacing a “mother wavelet” $\psi$, or recursively from previous ones. However, these functions are related to the higher frequency components of the signal instead of its average trends. Any function $x$ belonging to $V_0$ can then be represented by the following linear combination over the subspaces:

$$x(t) = \sum_{k} a_{N,k}\,\phi_{N,k}(t) + \sum_{j=1}^{N} \sum_{k} d_{j,k}\,\psi_{j,k}(t)$$   (4)

where the averaging coefficients $a_{j,k}$ and the detail coefficients $d_{j,k}$ are obtained using the following inner products:

$$a_{j,k} = \langle x, \phi_{j,k} \rangle, \qquad d_{j,k} = \langle x, \psi_{j,k} \rangle$$   (5)

This set of coefficients and the original wavelets are all we require to perform the inverse discrete wavelet transform (IDWT):

$$\tilde{x}(t) = \sum_{k} a_{N,k}\,\phi_{N,k}(t) + \sum_{j=1}^{N} \sum_{k} d_{j,k}\,\psi_{j,k}(t)$$   (6)
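As an illustration of this decomposition–reconstruction cycle, the sketch below uses the PyWavelets library on a synthetic sparse, spike-like signal (similar in shape to binary sensor activations, but not taken from the paper's datasets):

```python
import numpy as np
import pywt  # PyWavelets

# A sparse "spiky" signal, similar to binary sensor activations.
rng = np.random.default_rng(0)
x = np.zeros(256)
x[rng.choice(256, size=8, replace=False)] = 1.0

# 1-level DWT: averaging (approximation) and detail coefficients.
a1, d1 = pywt.dwt(x, 'haar')   # each has half the original length

# IDWT reconstructs the original signal from the two coefficient sets.
x_rec = pywt.idwt(a1, d1, 'haar')
```

With orthogonal wavelets such as Haar, the reconstruction is exact up to floating-point precision.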
Using a geometric analogy, this process can be seen as a change of basis. For example, Fig. 2(a) shows a signal in the vector space $V_0$ defined by the function family $\{\phi_{0,k}\}$. The DWT of this signal, shown in Fig. 2(b), is a representation of the same signal but with a different basis defined by the functions in (3). The inner products in (5) are then used to obtain the coordinates in the new basis.
The DWT is usually performed by a bank of equivalent filters , as depicted in Fig. 3. The input signal is processed by a series of low- and high-pass filters and then subsampled to obtain the averaging and detail coefficients, respectively. The figure also shows how each group of coefficients is related to a specific range of frequencies. Fig. 4, instead, illustrates the frequency bands corresponding to the function families $\{\phi_{j,k}\}$ and $\{\psi_{j,k}\}$. Here, the detail coefficients ($d_{j,k}$) concentrate on higher frequency bands depending on the decomposition level, while the averaging ones ($a_{N,k}$) belong to the narrow low-frequency band.
Using wavelets, we can study a signal at different frequency resolutions at once. Fig. 5 shows the scalogram of a signal generated by an infrared motion detector, installed in an office environment, over a period of 24h using the dataset from . In this representation, the horizontal axis shows the temporal displacement, while the vertical axis indicates the scale (or period) of the DWT. Higher scale values of the scalogram correspond to lower frequencies of the signal, represented with reduced temporal resolution. In the figure, we can see several peaks representing sudden spikes of the sensor data, repeated throughout the day and localized at certain temporal instants. The vertical bar on the left also shows the average energy per scale (or period) of the DWT, which can be interpreted as a discrete Fourier transform of the original signal.
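A scalogram of the kind shown in Fig. 5 can be computed with the continuous wavelet transform in PyWavelets. The simulated motion-detector trace below is purely illustrative (the burst times and wavelet choice are assumptions, not taken from the paper):

```python
import numpy as np
import pywt

# Simulated motion-detector trace: 24h at one sample per minute, with
# short activity bursts in the morning and evening.
fs = 1 / 60.0  # Hz (one sample per minute)
x = np.zeros(24 * 60)
x[8 * 60:8 * 60 + 15] = 1.0    # burst around 08:00
x[19 * 60:19 * 60 + 30] = 1.0  # burst around 19:00

scales = np.arange(1, 129)
coeffs, freqs = pywt.cwt(x, scales, 'morl', sampling_period=1 / fs)

# One row per scale, one column per time sample: the scalogram keeps
# both when (columns) and at which period (rows) activity occurs.
```

Plotting `abs(coeffs)` as an image would reproduce a scalogram: the two bursts appear as localized high-energy regions at small scales.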
3.2 Parameter Selection for Wavelet Transforms
In order to fully describe a DWT, we need to define its mother wavelet and decomposition level. The mother wavelet is usually chosen through qualitative or quantitative approaches. The former favour wavelets that are visually similar to the decomposed signals. The latter instead optimize specific criteria, such as the number of components needed to describe the signal, the fidelity of the reconstructed signal, or the denoising capabilities of the chosen wavelet.
In order to obtain the best possible fidelity, in our model we use an MSE (mean squared error) criterion. Originally proposed by , this criterion chooses the mother wavelet that minimizes the error of the reconstructed signal.
The decomposition level is limited by the length of the signal and by the chosen wavelet. Looking at the bank-of-filters implementation in Fig. 3, we can see that every decomposition level halves the length of the signal. A practical rule is to stop the decomposition before the signal becomes shorter than the length of the low-pass filter. Let $L_f$ be the length of the filter and $L_x$ the length of the signal. The maximum decomposition level is then:

$$N_{max} = \left\lfloor \log_2 \frac{L_x}{L_f - 1} \right\rfloor$$
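Assuming the rule above, the maximum level can be computed as follows; PyWavelets ships the same rule as `pywt.dwt_max_level`:

```python
import math
import pywt

def max_dwt_level(signal_len: int, filter_len: int) -> int:
    """Maximum decomposition level: floor(log2(L_x / (L_f - 1)))."""
    return int(math.floor(math.log2(signal_len / (filter_len - 1))))

w = pywt.Wavelet('db4')  # Daubechies-4: low-pass filter of length 8
level = max_dwt_level(1024, w.dec_len)
```

For a 1024-sample signal and a length-8 filter this gives floor(log2(1024/7)) = 7 levels.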
However, reaching the maximum decomposition level is not always necessary. In , for example, the authors proposed a method to choose the decomposition level based on the sparseness (i.e. number of zero-elements) of the signal. The same will be applied to our model to obtain a compact representation of the sensor data.
Another relevant parameter is the coefficient thresholding level. The number of coefficients obtained from the wavelet transform is initially equal to the length of the discrete input. Some of these coefficients, however, carry very little information, especially if the mother wavelet is optimal. We can discard the coefficients below a thresholding level $\lambda$, and still reconstruct the original signal with good approximation:

$$\hat{d}_{j,k} = \begin{cases} d_{j,k} & \text{if } |d_{j,k}| \ge \lambda \\ 0 & \text{otherwise} \end{cases}$$
This approach is commonly used in image processing to remove noise and perform lossless compression . Here we will use a statistical threshold, originally proposed by , that preserves some statistics of the compressed signal. In practice, we will use the set of coefficients above a certain threshold that still allows a lossless reconstruction of the signal. All the remaining coefficients, below the selected threshold, will be removed from our sensor data model.
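A minimal sketch of coefficient thresholding on a synthetic sparse signal, using PyWavelets' hard-threshold helper (the threshold value is illustrative, not the paper's statistical threshold):

```python
import numpy as np
import pywt

rng = np.random.default_rng(1)
x = np.zeros(512)
x[rng.choice(512, size=10, replace=False)] = 1.0  # sparse activations

# 1-level DWT, then hard-threshold: coefficients below the threshold
# carry little information and are set to zero.
a1, d1 = pywt.dwt(x, 'haar')
thr = 0.1
a1_t = pywt.threshold(a1, thr, mode='hard')
d1_t = pywt.threshold(d1, thr, mode='hard')

# Reconstruct from the retained coefficients and measure the error.
x_rec = pywt.idwt(a1_t, d1_t, 'haar')
rmse = float(np.sqrt(np.mean((x - x_rec) ** 2)))
kept = int(np.count_nonzero(a1_t) + np.count_nonzero(d1_t))
```

For a sparse input, only a handful of the 512 coefficients survive the threshold, yet the reconstruction error stays negligible.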
3.3 Sensor Data Modelling and Forecasting
After introducing the wavelet transform and its parameters, we can use them to model smart-home sensors and to forecast their data. Our model is an efficient representation of a generic temporal signal, similar to some compression techniques commonly used in image processing .
Let us consider the signal generated by a smart-home sensor over time, sampled with frequency $f_s$. Our training signal is transformed into the wavelet domain using a 1-level DWT decomposition. Since the input data is relatively sparse (i.e. it mostly contains localized activation peaks), a higher decomposition level would not bring any particular advantage to the resulting wavelet transform. We then threshold the wavelet coefficients and keep a significantly smaller number of them, while maintaining a low Root Mean Square Error (RMSE). We can finally reconstruct the signal from this small subset of coefficients using the inverse wavelet transform (IDWT).
Our wavelet-based model $M$ is therefore described by the subset of thresholded coefficients $\hat{d}$, a mother wavelet $\psi$, the decomposition level $N$, a coefficient threshold $\lambda$, the number of samples $N_s$, the sampling frequency $f_s$, and the time reference $t_0$:

$$M = \left( \hat{d}, \psi, N, \lambda, N_s, f_s, t_0 \right)$$   (7)
Once this model is available, it is possible to predict the sensor output at a future time instant $t$. The model in (7) assumes that the sensor output has periodicity $N_s / f_s$ starting from time $t_0$. The index of the sensor data sample at time $t$ is therefore given by the following equation:

$$n(t) = \left\lfloor (t - t_0)\, f_s \right\rfloor \bmod N_s$$   (8)

and the actual sensor data sample can then be obtained from the reconstructed signal $\tilde{x}$ as follows:

$$\hat{s}(t) = \tilde{x}\left[ n(t) \right]$$   (9)
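The periodic lookup described above can be sketched as follows; the function and variable names are illustrative, not taken from the paper's implementation:

```python
import math

def forecast(x_rec, fs, t0, t):
    """Expected sensor output at future time t (seconds), obtained by
    indexing periodically into the reconstructed training signal."""
    n = int(math.floor((t - t0) * fs)) % len(x_rec)  # periodic index
    return x_rec[n]

# One week of 30-second samples; the model repeats with that period.
fs = 1 / 30.0
week = 7 * 24 * 3600
x_rec = list(range(int(week * fs)))  # dummy reconstructed signal

assert forecast(x_rec, fs, t0=0.0, t=60.0) == 2         # sample index 2
assert forecast(x_rec, fs, t0=0.0, t=week + 60.0) == 2  # wraps around
```

The modulo operation implements the periodicity assumption: a query one full period later returns the same expected sample.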
In Sec. 7.2.1 we will describe an empirical method to determine the parameters of this model, including the most suitable mother wavelet and the thresholding level of the coefficients.
4 Entropy-based Activity Representation
4.1 Normalized Entropy
The metric used in our system to describe anomalous situations is based on the concept of entropy of a (discrete) probability distribution, as defined in information theory. Entropy is invariant to probability permutations, and it describes the overall information contained in the distribution as follows:

$$H = -\sum_{i=1}^{n} p_i \log p_i$$   (10)
Highly probable events carry little information, and therefore reduce the entropy. On the other hand, uniform probability distributions are characterised by high levels of entropy, denoting situations with significant amount of information (i.e. high uncertainty).
The entropy can be normalized using the maximum entropy of a discrete uniform distribution. Such entropy is given by the logarithm of the total number of possible outcomes. Therefore, our normalized entropy $\bar{H}$ for a probability distribution with entropy $H$ and $n$ possible outcomes is defined as follows:

$$\bar{H} = \frac{H}{\log n}$$   (11)
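A minimal implementation of the normalized entropy, with its two extreme cases:

```python
import math

def normalized_entropy(p):
    """Entropy of distribution p divided by the maximum entropy log(n)."""
    n = len(p)
    if n < 2:
        return 0.0
    h = -sum(pi * math.log(pi) for pi in p if pi > 0)
    return h / math.log(n)

uniform = [0.25] * 4            # maximum uncertainty
certain = [1.0, 0.0, 0.0, 0.0]  # a single certain outcome
```

A uniform distribution gives the maximum value of 1, while a fully certain outcome gives 0.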
In an environment monitored by sensors, the above quantity defines a metric to measure the amount of information that is associated to the events detected by the sensors. A method to determine the probability of an “activity” event from a motion detector was proposed in  and it is described in the next section, extended to the general case of binary smart-home sensors.
4.2 Activity Levels
We consider a network of binary smart-home sensors (e.g. motion detectors, contact sensors, etc.) distributed around different rooms, areas, or objects of interest in an indoor environment (e.g. office and apartment in Fig. 10 and Fig. 11 of the experiments). We want to model the probabilities of human activities associated to those sensors and compute the normalized entropy for the whole environment.
In this case, the activity probability can be obtained by observing the sensor’s output during a fixed time interval (e.g. 30 seconds). For example, motion detectors trigger an event whenever something moves within their detection field, while contact sensors can check whether doors have been opened or closed. From this, it is possible to observe for how long such activity was detected by sensor $i$, that is, the amount of time $t_i$ that the sensor was “on”. Under the assumption that there are no overlapping sensors (i.e. each sensor covers a different room, area, or object), we can define the probability of an activity detected by sensor $i$:

$$p_i = \frac{t_i}{\sum_{j=1}^{n} t_j}$$   (12)
The distribution of these probabilities provides some information about the current activity level in the environment, but it is not a good metric on its own to determine whether such activity should be considered “normal” or not. For example, the distribution depends on the order of the considered sensors, and a simple permutation of different sensor probabilities would change the distribution’s mean and standard deviation. This is illustrated by the example in Fig. 6: after the activity probabilities of two motion detectors in different rooms are swapped, the mean and the standard deviation of the distribution change significantly, whereas the total (normalized) entropy remains unaffected. The latter will be used therefore to represent the activity level in the environment as input for our anomaly detection system.
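The sketch below derives the activity probabilities from the sensors' "on" times and verifies that the normalized entropy is unaffected by a permutation of two sensors; the on-time values are invented for illustration:

```python
import math

def activity_probabilities(on_times):
    """Probability of activity per sensor from its total 'on' time."""
    total = sum(on_times)
    return [t / total for t in on_times]

def normalized_entropy(p):
    h = -sum(pi * math.log(pi) for pi in p if pi > 0)
    return h / math.log(len(p))

# Seconds each sensor was 'on' during an observation window, per room.
p = activity_probabilities([12.0, 3.0, 9.0, 6.0])
p_swapped = [p[1], p[0], p[2], p[3]]  # swap two rooms
```

Since entropy depends only on the set of probability values, swapping two sensors leaves it unchanged, which is what makes it a suitable whole-environment activity metric.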
5 Anomaly Detection
Markov Logic Networks can be used to combine different sources of information for probabilistic inference. In this paper, we use both smart-home motion sensors and their wavelet-based models to obtain, respectively, the actual and the expected entropy of the environment. The former represents the current activity level, whereas the latter represents the most likely one. These entropy values, together with direct sensor inputs and expert rules, provide the necessary information for our MLN to detect anomalous situations, as also shown in Fig. 1.
5.1 Hybrid Markov Logic Networks
MLNs combine both probabilistic and logical reasoning . Briefly, an MLN consists of a set of weighted first-order logic formulas or clauses. The latter include the following elements:
constants, which are possible objects in the domain of interest;
variables, describing a set of objects in that domain;
functions, mapping relations between different objects;
predicates, defining logical attributes or relationships over the domain’s elements, which can be combined into more complex formulas using logical connectors.
Functions, variables and constants are called terms. If they do not contain variables, they are ground terms. A predicate that contains only ground terms is a ground predicate. When a logical value is assigned to all grounded predicates in a network, we have a possible world.
Using evidence, MLNs can produce Markov networks that describe the probability of all possible combinations of grounded clauses. We can then perform inference on these Markov networks, usually by using approximate methods such as MC-SAT . Besides discrete evidence values, it is also possible to consider continuous ones using an extension called Hybrid Markov Logic Network (HMLN) . Thanks to the latter, we can thus consider predicates based on the continuous variables that contain our entropy values of the activity levels.
5.2 Wavelet Model as Prior for HMLN
The wavelet-based sensor data model defined in Sec. 3.3 can be used to predict the expected output of a particular sensor based on historical data. From the expected output of all the sensors, it is also possible to compute the normalized entropy that represents the expected activity level for the whole environment (see Fig. 1). The entropy from all the real sensors represents instead the current activity level. These two activity levels, current and expected, are compared by the following HMLN to determine whether an anomalous situation is occurring.
We define two clauses to combine our sources of information: one to check whether the current entropy is above a certain threshold, and the other to compare current versus expected entropy. The occurrence of one or both conditions indicates a potentially anomalous situation at time $t$, captured by the following anomaly predicate:
Here the threshold corresponds to 90% of the maximum entropy. This threshold was first suggested in  as a statistically meaningful indicator of anomaly. The predicate in (13) and its clauses are represented by the blue connected nodes in Fig. 7, which shows the graph of a grounded HMLN at time $t$.
An advantage of MLNs is that they can combine different logical rules. This allows us to include additional expert rules that describe “inappropriate behaviours”. For AAL applications, such rules could be provided by clinicians or professional carers and adapted to the specific person being monitored. For example, typical behaviours that are cause for concern in people with cognitive impairments include wandering and repetitive actions . In our system these can be monitored by means of motion detectors and contact sensors on doors and appliances. Their outputs determine the state of the predicate , which is implemented in our HMLN as follows (see also the yellow nodes in Fig. 7):
where the arguments of the predicate are, respectively, a contact sensor, a motion detector, the minimum time a door can be left open before it is considered an anomaly, and the resting time interval suggested by a human expert (e.g. 11:00 P.M. to 7:00 A.M.).
The two types of anomaly are finally combined by a single anomaly predicate (central node in Fig. 7), which holds whenever either of them occurs.
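A crisp (non-probabilistic) sketch of how the two anomaly types combine is given below. The actual system evaluates weighted HMLN formulas with probabilistic inference, so the hard boolean logic, thresholds and function names here are only illustrative:

```python
# Illustrative crisp version of the HMLN clauses; the real system
# assigns weights to these formulas and runs probabilistic inference
# (e.g. MC-SAT), so margins and names below are assumptions.
def entropy_anomaly(h_current, h_expected, h_threshold=0.9):
    above_threshold = h_current > h_threshold         # clause 1
    deviates = abs(h_current - h_expected) > 0.2      # clause 2 (margin illustrative)
    return above_threshold or deviates

def behaviour_anomaly(door_open_s, motion_hour, resting=(23.0, 7.0), min_open_s=600):
    door_left_open = door_open_s > min_open_s         # expert rule 1
    start, end = resting                              # window crosses midnight
    motion_while_resting = motion_hour >= start or motion_hour < end
    return door_left_open or motion_while_resting     # expert rule 2

def anomaly(h_current, h_expected, door_open_s, motion_hour):
    """Combined anomaly: entropy-based OR behaviour-based."""
    return (entropy_anomaly(h_current, h_expected)
            or behaviour_anomaly(door_open_s, motion_hour))
```

For instance, a current entropy of 0.95 triggers the entropy clause regardless of the expert rules, while normal entropy with no door or resting-time violations reports no anomaly.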
6 System Implementation
The solutions described in the previous sections have been implemented in ENRICHME (http://www.enrichme.eu), a research project integrating ambient intelligence and robotics to provide AAL services for elderly people with mild cognitive impairments . The ENRICHME system monitors the activity of these people at home, exchanging information between a network of smart-home sensors, a mobile robot and an auxiliary Ambient Intelligence Server (AIS) (see Fig. 8). The latter consists of an embedded PC, located in the home, which acts as a multiprotocol gateway, collecting and forwarding the information shared wirelessly between robot and smart-home sensors for monitoring human motion, door/cupboard use, and energy consumption. The sensor network is based on the Z-Wave communication protocol and uses the OpenHAB middleware (http://www.openhab.org), which supports a wide range of different smart-home technologies with a uniform interface, decoupling sensor information from specific smart-home protocols and manufacturers .
The embedded PC for data recording and processing is an Intel NUC with an i7-5557U CPU @ 3.10GHz and 8 GB of RAM, running 64-bit Ubuntu 14.04 Linux (see Fig. 8(a)). The smart-home sensors are commercial Z-Wave wireless devices produced by the Fibar Group (http://www.fibaro.com) (see Fig. 8(b)). These sensors are small, easily deployable, widely available and have a long battery life.
The anomaly detection system is implemented as a Robot Operating System (ROS, http://www.ros.org) module making use of efficient MLN libraries for online inference . ROS provides a common framework for information exchange between the AIS and the robot, so that the latter can easily access the results of the HMLN inference engine. The HMLN can be queried using evidence provided by any ROS source, including the actual and expected house entropies obtained from the sensors and the wavelet models, respectively. The output of the inference process is also available to any other node on the ROS network, for example to trigger a specific robot behaviour or alert a remote telecare system.
7 Experimental Results
The performance of our proposed solutions was evaluated using real data recorded in different scenarios. In this section, we will first describe two different datasets: one already presented in  and one newly recorded. Then, we will use them to evaluate the forecasting capabilities of our wavelet sensor model compared to a similar tool in the literature. Based on these wavelet models, we will calculate the expected entropy levels of the testing environments and finally demonstrate their use as priors for anomaly detection.
7.1 Sensor Datasets
All the datasets were recorded using MongoDB, an open-source cross-platform document-oriented database. MongoDB is a NoSQL database program that uses JSON-like documents with schemas. Compared to traditional log and spreadsheet files, this storage approach offers better data management and manipulation, which is particularly important for long-term datasets like ours. MongoDB also provides efficient and flexible querying methods, so we can easily retrieve any data interval or sensor set, and even combine data from other sources.
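An illustrative query over such a sensor-event collection can be built as follows. The field names (`topic`, `timestamp`), database and collection names are assumptions for illustration, not the datasets' actual schema; building the query document itself needs no running database:

```python
from datetime import datetime

def interval_query(sensor_topic, start, end):
    """MongoDB filter for one sensor's readings inside a time interval."""
    return {
        "topic": sensor_topic,
        "timestamp": {"$gte": start, "$lt": end},  # standard query operators
    }

q = interval_query("kitchen_motion",
                   datetime(2017, 3, 1), datetime(2017, 3, 8))

# Against a live database one would run, e.g.:
#   from pymongo import MongoClient
#   docs = MongoClient()["smarthome"]["sensors"].find(q).sort("timestamp", 1)
```

The `$gte`/`$lt` operators select a half-open time interval, which makes consecutive weekly queries non-overlapping.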
The first dataset was collected in an office environment (L-CAS dataset ) including: a lounge with sofas and a coffee table; a kitchenette with various appliances and cupboards for storing and preparing food; an entrance and a workshop area. This dataset contains data from ten different physical devices, which provided six different types of sensor data readings: humidity, temperature, light, energy consumption, motion, and binary contact (for door activation). The sensors were located in five different locations, and their data recorded every 30 seconds, generating more than 400,000 data entries in total.
More than ten people were working in the L-CAS premises during the recording. The sensors were mostly concentrated in places where a rich set of activities were typically performed (entering, exiting, eating, drinking, resting, etc.). Fig. 10 illustrates our sensors’ deployment and approximate area coverage. The dataset is split into two parts: the first one, used for training, includes sensor data continuously recorded for three and a half months; the second one includes one week of data used for testing.
| Dataset | Sensors | Locations | Entries | Data types | People in dataset | Duration (days) |
| L-CAS | Motion, Binary Contact, Humidity, Light, Energy Consumption, Temperature | Entrance, Fridge, Kitchen, Lounge, Workshop | 492,441 | Binary, Float, Integer | 12 | 104 |
| ENRICHME | Motion, Binary Contact, Light, Energy Consumption, Temperature | Entrance, Fridge, Kitchen, Bathroom, Bedroom, Livingroom, TV | 33,838 | Binary, Float, Integer | 2 | 31 |
The new dataset was recorded in the apartment of an elderly couple within the residential facilities of LACE Housing (http://lacehousing.org) as part of the ENRICHME project. It contains one month of sensor data with five types of readings (temperature, light, energy consumption, motion and door activation), corresponding to approximately 33,000 entries in total. The sensors covered most of the apartment area, recording data from the entrance, the kitchen, the living room, the main bedroom and the bathroom. Fig. 11 illustrates the approximate sensors’ position and area coverage. The first three weeks of the dataset were used for training, while the last week was used for testing. Table I summarizes the locations, the sensors and the general characteristics of the recorded datasets. Both datasets are publicly available: ENRICHME at https://lcas.lincoln.ac.uk/wp/lace-house-domotic-sensors-dataset/ and L-CAS at https://lcas.lincoln.ac.uk/wp/research/data-sets-software/l-cas-domotic-sensors-dataset/
7.2 Performance of Wavelet-based Models
In the following subsections, we first present an empirical method to select the best parameters of our wavelet-based sensor model (Sec. 3.3), and then use the latter to predict the expected sensor output in our datasets. Our wavelet-based sensor modelling system is available at https://github.com/LCAS/wtfacts as a ROS action server. This software allows creating, managing and querying multiple binary sensor models.
7.2.1 Model Parameters Selection
A key step for the compact representation of sensor data with our new model is the selection of the mother wavelet. There are several methods to do this, but in general it is common practice to choose the wavelet that best describes a signal by minimizing a given parameter. Here we propose to minimize the RMSE of the reconstructed signal.
We tested a set of wavelets from four different families, analysing different motion detection sequences as input signals from the sensors used to estimate the activity level. We compared the Daubechies, Haar, Biorthogonal and Reverse Biorthogonal wavelet families by accurately reconstructing signals with a large number of coefficients (i.e. a low threshold). The signals were one-month-long sequences from the L-CAS dataset, transformed using a 1-level DWT decomposition. We used the smallest coefficient threshold that produced a non-perfect reconstruction in all variants.
Among the reconstructed sequences, the ones using the Reverse Biorthogonal family produced the lowest RMSE, when compared to the original signals. Fig. 12 shows different RMSE values using wavelets from the Reverse Biorthogonal family. The best performance was obtained using the rbio3.1 wavelet, with small differences from other wavelets of the same family (i.e. RMSE ).
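This selection procedure can be sketched as follows. For brevity, a hand-rolled single-level Haar DWT stands in for the transforms of the families actually compared (in practice these would come from a wavelet library such as PyWavelets); the function names, input sequence and threshold value are illustrative:

```python
import numpy as np

def haar_dwt1(x):
    """Single-level orthonormal Haar DWT (even-length signal)."""
    a = (x[0::2] + x[1::2]) / np.sqrt(2)  # approximation coefficients
    d = (x[0::2] - x[1::2]) / np.sqrt(2)  # detail coefficients
    return a, d

def haar_idwt1(a, d):
    """Inverse of haar_dwt1."""
    x = np.empty(2 * len(a))
    x[0::2] = (a + d) / np.sqrt(2)
    x[1::2] = (a - d) / np.sqrt(2)
    return x

def rmse(x, y):
    return float(np.sqrt(np.mean((np.asarray(x) - np.asarray(y)) ** 2)))

def select_wavelet(signal, candidates, thr):
    """Pick the candidate transform (name -> (dwt, idwt)) whose
    hard-thresholded reconstruction has the lowest RMSE."""
    scores = {}
    for name, (dwt, idwt) in candidates.items():
        a, d = dwt(signal)
        d = np.where(np.abs(d) > thr, d, 0.0)  # keep only large details
        scores[name] = rmse(signal, idwt(a, d))
    return min(scores, key=scores.get), scores

# Short binary motion sequence (illustrative)
x = np.array([0., 0., 1., 1., 0., 1., 0., 0.])
best, scores = select_wavelet(x, {"haar": (haar_dwt1, haar_idwt1)}, thr=0.1)
```

In the actual experiments, the `candidates` dictionary would contain one analysis/synthesis pair per tested wavelet (db, haar, bior, rbio), and the winner is the one with the lowest reconstruction RMSE.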
After selecting the mother wavelet, the second step is to choose a threshold level for the coefficients. As anticipated in Sec. 3.2, this threshold determines a subset of meaningful coefficients, which should be as few as possible but also enough to reconstruct the original signal with good approximation. Because we are dealing with binary sensors, using a subset of coefficients introduces an error in the reconstruction, since the inverse transform generates non-binary values. We therefore discretize the reconstructed signal into a binary one, and measure the RMSE between the latter and the original signal. This process can be observed in Fig. 13, where the green subplot is the (non-binary) inverse transform of the original signal (red subplot), and the yellow one is the final binary prediction.
Fig. 14 illustrates the trade-off between the fidelity of our wavelet model representation (in terms of RMSE) and its size (as number of coefficients) for the Entrance motion detector in the L-CAS dataset. The blue line shows the decreasing number of coefficients as the threshold increases. The red line shows the corresponding increase of the reconstruction error. We therefore chose the lowest threshold (), across all the wavelet sensor models, allowing full reconstruction of all the signals in the training dataset.
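The two steps discussed above, coefficient thresholding and binarization of the reconstruction, can be sketched as follows; the coefficient values and the non-binary reconstruction are hypothetical:

```python
import numpy as np

def hard_threshold(coeffs, thr):
    """Zero out coefficients with magnitude <= thr; return the kept
    coefficients and how many survive (the model size)."""
    kept = np.where(np.abs(coeffs) > thr, coeffs, 0.0)
    return kept, int(np.count_nonzero(kept))

def binarize(reconstruction, level=0.5):
    """The inverse transform of thresholded coefficients is not binary:
    discretize it before comparing with the original binary signal."""
    return (np.asarray(reconstruction) >= level).astype(float)

def rmse(x, y):
    return float(np.sqrt(np.mean((np.asarray(x) - np.asarray(y)) ** 2)))

# Hypothetical coefficients: raising the threshold shrinks the model
coeffs = np.array([2.1, -0.3, 0.05, 1.4, -0.9, 0.02])
sizes = [hard_threshold(coeffs, t)[1] for t in (0.0, 0.1, 1.0)]

# Binarizing a (hypothetical) non-binary reconstruction and measuring
# its error against the original binary signal
pred = binarize(np.array([0.1, 0.8, 0.6, -0.05]))
err = rmse(pred, np.array([0., 1., 1., 0.]))
```

Sweeping the threshold as above produces exactly the two curves of Fig. 14: model size decreasing and reconstruction error increasing.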
We can thus reconstruct these signals using a small subset of coefficients and the inverse transform, discretized into binary values to obtain the original sensor output. Our wavelet-based model in (7) is therefore described by the subset of non-thresholded coefficients , the mother wavelet rbio3.1, the decomposition level , the threshold , the number of samples , the sampling frequency (Hz), and the time reference (POSIX time, in s).
7.2.2 Model Training and Forecasting
We divided our datasets (see Table I) into two folds: one for training and one for testing the prediction. In the L-CAS dataset, we used the first three months of sensor data for training and then one week for testing. The ENRICHME dataset had a smaller number of entries, so we used three weeks for training and one week for testing.
In order to evaluate the prediction quality of our wavelet sensor model, we compared it to another tool called Frequency Map Enhancement (FreMEn) , which was originally developed for robotics applications but then also applied to smart-home sensors . FreMEn models periodic changes of the environment using Fourier-based spectral analysis. It considers the probability of the environment’s state to be a function of time, represented by a (compressed) combination of harmonic components. The problem with Fourier-based methods, though, is that they are usually not suitable for describing sparse (i.e. non-periodic) or very short events, at least not without considering a very large number of harmonics, which is impractical for many applications. In these experiments we aim to demonstrate how the wavelet-based model overcomes some of those limitations, delivering more reliable sensor predictions.
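To make the comparison concrete, the following is a minimal sketch of the spectral idea behind FreMEn, not its actual implementation: the binary state probability is compressed into a mean plus the few most prominent harmonic components of its FFT:

```python
import numpy as np

def fremen_fit(s, n_harmonics=2):
    """Keep the mean and the n most prominent frequency components
    of a binary sequence s (a FreMEn-like compressed model)."""
    spectrum = np.fft.rfft(s) / len(s)
    mean = spectrum[0].real
    # indices of the n largest non-DC components
    idx = np.argsort(np.abs(spectrum[1:]))[::-1][:n_harmonics] + 1
    return mean, [(int(k), spectrum[k]) for k in idx]

def fremen_predict(model, n, t):
    """Reconstruct the state probability at sample times t (period n)."""
    mean, comps = model
    p = np.full(len(t), mean)
    for k, c in comps:
        p += 2 * np.abs(c) * np.cos(2 * np.pi * k * t / n + np.angle(c))
    return np.clip(p, 0.0, 1.0)

# A perfectly periodic on/off pattern is captured by a single harmonic
s = np.tile([1.0, 1.0, 0.0, 0.0], 8)
model = fremen_fit(s, n_harmonics=1)
p = fremen_predict(model, len(s), np.arange(len(s)))
```

A sparse, aperiodic spike train, by contrast, would need many harmonics to be represented this way, which is exactly the limitation the wavelet model addresses.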
To start with, Tab. II presents some statistics of the predictions in the L-CAS dataset. For all the considered metrics, we can see that our new wavelet model clearly outperforms the frequency-based one. In particular, the wavelet model performs much better in terms of accuracy. Tab. III also presents some results on the ENRICHME dataset. In this case, the precision of FreMEn is slightly higher than that of our wavelet model, probably due to the periodic nature of the activities in the considered scenario. FreMEn indeed captures all the most relevant frequency components, so the predicted activations can be very precise (i.e. few false positives among the predicted activations). However, for the recall, which considers the correct predictions over the total number of real activations, we can observe a significant improvement of the wavelet models compared to FreMEn, since the latter is not able to predict some of the sensor activations. This improvement is further confirmed by the F1 score and the accuracy, also shown in the same table.
[Table II — L-CAS dataset: F1 scores (%) of the wavelet model (W) for the presence detectors: 59.9, 67.5, 61.7, 52.5; sensor average: 63.2.]
[Table III — ENRICHME dataset: results for the presence detectors (Bathroom, Bedroom, Kitchen, Living room) and door sensors (Entrance, Fridge), with sensor average.]
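The metrics reported in the tables can be computed directly from the aligned real and predicted binary sequences; a minimal sketch (the sample sequences are illustrative):

```python
import numpy as np

def binary_scores(y_true, y_pred):
    """Precision, recall, F1 and accuracy for binary predictions."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    tp = int(np.sum((y_true == 1) & (y_pred == 1)))
    fp = int(np.sum((y_true == 0) & (y_pred == 1)))
    fn = int(np.sum((y_true == 1) & (y_pred == 0)))
    tn = int(np.sum((y_true == 0) & (y_pred == 0)))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"precision": precision, "recall": recall,
            "f1": f1, "accuracy": (tp + tn) / len(y_true)}

scores = binary_scores([1, 1, 0, 0, 1], [1, 0, 0, 1, 1])
```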
The wavelet model can also capture very short peaks of the sensor signal. Fig. 13 illustrates the real temporal evolution of a sensor (red), the activation probability and prediction computed by FreMEn (blue and purple, respectively), the output of our wavelet model (green) and its binarized version (yellow). Due to the limitations of the frequency-only representation, FreMEn fails to reproduce the original sensor data, whereas our wavelet model provides a reasonably good approximation of it. The improvement can be further appreciated in Fig. 15, where the FreMEn and wavelet models of the same sensor are compared over a one-week period, showing that the average daily activation of the sensor is better predicted by our model.
7.3 Performance of Activity Representation
In the following sub-sections we illustrate the performance of our system in representing human activities using the normalized entropy method of Sec. 4, comparing the expected levels of activity to the actual ones.
7.3.1 Real vs. Predicted Entropies
We compared the entropies of human activity predicted by our wavelet model with the actual ones computed on both datasets.
We used three popular metrics to measure the statistical similarity between these two entropies: RMSE, correlation coefficient, and explained variance.
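These three metrics can be computed from the aligned real and predicted entropy series; a minimal sketch (the sample values are illustrative):

```python
import numpy as np

def entropy_similarity(real, pred):
    """RMSE, Pearson correlation and explained variance between
    the real and predicted entropy time series."""
    real = np.asarray(real, float)
    pred = np.asarray(pred, float)
    rmse = float(np.sqrt(np.mean((real - pred) ** 2)))
    corr = float(np.corrcoef(real, pred)[0, 1])
    expl_var = float(1.0 - np.var(real - pred) / np.var(real))
    return rmse, corr, expl_var

# A prediction with a constant offset: perfectly correlated and with
# all variance explained, but a non-zero RMSE
r, c, e = entropy_similarity([0.0, 1.0, 2.0, 3.0], [0.5, 1.5, 2.5, 3.5])
```

The example also shows why the three metrics are complementary: explained variance and correlation ignore a constant bias that RMSE exposes.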
Table IV illustrates the good performance of our solution in predicting the entropy of human activities, showing better results than a FreMEn-based approach. We can also see that the entropy predicted by our wavelet model is slightly better for the ENRICHME dataset than for the L-CAS dataset (i.e. lower RMSE; higher correlation and explained variance). In both cases, however, our results confirm that real and predicted entropies are reasonably similar and, therefore, that the wavelet-based model is suitable to forecast the level of activity in the environment.
7.3.2 Examples of Activity Forecasting
As explained in Sec. 4, human activities can be represented by the normalized entropy of the environment. Fig. 16 illustrates two examples of such entropy calculated from the real sensors and predicted by our wavelet-based model. In particular, the red graph shows the real normalized house entropy (as percentage) based on the available sensor setups. The blue graph is the predicted entropy at the same time, using the wavelet models of our sensors.
Fig. 15(a) is based on the ENRICHME dataset, collected in the relatively quiet apartment of an elderly couple. The figure refers to a typical morning of the two residents. The predicted entropy of their activities differs from the real one by less than 10%, with only two significant exceptions: in the morning, around 10:00, the activity level was higher than expected (about 20% error between real and predicted entropies); a little later, around 11:30, the real activity entropy decreased sharply a few minutes after the usual time (still about 20% error). These differences between real and predicted data, however, are understandable under normal variations of the residents’ schedule, which cannot be predicted by our model. It is worth noticing that the latter is able to predict a very sharp transition, where the activity entropy drops from high to no activity at all. This shows the capability of our system to consider high-frequency elements in its wavelet-based model.
Fig. 15(b) refers instead to the activity of a non-typical Friday afternoon in the L-CAS offices. The real entropy (red) shows that it was a particularly busy day, with a high activity level for most of the time. However, a significant decrease of the entropy between 18:00 and 19:00, when most of the researchers left the office, is followed by another increase between 19:00 and 20:00, when some people came back. The activity then remained relatively high for the rest of the evening, which was unusual. The entropy prediction (blue) is able to capture several important trends of the activity levels, including a few small negative peaks between 17:00 and 18:00, which are probably due to some researchers leaving the office, and the sharp decrease around 18:00, when most of them left. Our model also captures some of the evening activities and the entropy increase between 19:00 and 20:00. Although after this time there is a significant difference between real and predicted entropies, due to the unusual presence of people on a Friday night, the general trends of the activity entropy are correctly captured by our prediction system.
7.4 Performance of Anomaly Detection
The normalized entropy computed by our system can indeed be used as a time series for unsupervised anomaly detection. Here we evaluate our HMLN-based anomaly detector against two state-of-the-art unsupervised methods from a previous statistical framework . We consider in particular the following anomaly detectors (see the open-source framework for real-time anomaly detection at https://github.com/MentatInnovations/datastream.io):
Gaussian1D – A frequentist anomaly detection method that assumes the input data is Gaussian, searching for low-likelihood values.
LOFEstimator – This method relies on local deviations of the density of a given sample with respect to its neighbors. It is local in the sense that the anomaly score depends on how isolated the object is from the surrounding neighborhood.
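As an illustration of the first detector, the sketch below flags low-likelihood samples under a fitted 1D Gaussian; the z-score cut-off is an assumption (≈2.58 corresponds to a two-sided tail probability of about 0.01). This is only a sketch of the idea, not the datastream.io implementation; the second detector corresponds to the local-outlier-factor approach available, e.g., in scikit-learn as LocalOutlierFactor.

```python
import numpy as np

def gaussian1d_anomalies(series, z_cut=2.58):
    """Flag samples whose likelihood under a fitted N(mu, sigma) is
    low, i.e. whose absolute z-score exceeds the cut-off."""
    series = np.asarray(series, float)
    mu, sigma = series.mean(), series.std()
    z = np.abs(series - mu) / sigma
    return np.where(z > z_cut)[0]

# 99 normal entropy samples plus one abnormal spike (illustrative)
series = np.concatenate([np.full(99, 1.0), np.array([10.0])])
idx = gaussian1d_anomalies(series)
```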
To compare our HMLN to the above methods, we first count the number of anomalies that each detector has in common with the other two. The results are summarized in Tables V and VI for the L-CAS and ENRICHME datasets, respectively. For a fair comparison, the tables also include a variant of our method (HMLN*) that does not implement any expert rule, but considers only statistical anomalies based on the activity entropy. We can see that all the anomalies reported by the HMLN* with no rules are also reported by the original HMLN, but not the opposite, as expected. The results also show that our HMLN approach shares a significant number of detections with the other two statistical methods. In particular, our solution enables a more balanced detector that captures a reasonable number of anomalies from both Gaussian1D and LOFEstimator.
To identify the best among these detection systems, but lacking a consistent and reliable annotation of true anomalies, we used the method proposed by Lamiroy & Sun  to estimate precision and recall, and from these compute the F1 score. Although not accurate in absolute terms, this approach has been shown to be useful for ranking different binary classifiers in the absence of ground truth. Fig. 17 summarizes our results for the two datasets. In particular, if no expert rules are considered (HMLN*, Fig. 16(a)), our approach always performs better than the other two methods. If the rules are taken into account, though (HMLN, Fig. 16(b)), the relative performance of our anomaly detector increases for the ENRICHME dataset, but decreases for the L-CAS one. The reason for this change is that our expert rules were specifically designed for the AAL scenarios of the former dataset. This shows that it is possible to ’tune’ the sensitivity of our anomaly detection system when additional expert knowledge is available, which is a desired feature in many applications.
8 Conclusions and Future Work
This paper presented a new approach for wavelet-based temporal modelling of smart binary sensors, which we used to forecast levels of human activity in dynamic indoor environments. We also proposed an original application of HMLNs combining real and predicted entropies of human activity with expert rules to detect potential anomalies. Our solutions have been evaluated using two large public datasets, one of which newly collected from a real elderly home, to demonstrate their effectiveness.
Although the proposed wavelet temporal model can be applied to any arbitrary signal, our current implementation focused only on binary sensor data, partly because it simplifies the subsequent entropy-based representation of human activities. It remains to be studied how analog smart sensors (e.g. light, temperature) can also be integrated and exploited by our system.
Finally, despite the flexibility of HMLNs, there are still limitations in the way logic rules are formulated and their weights learned, which require particular attention and fine-tuning to guarantee the convergence of the training process. Also, the time required by the latter grows exponentially with the number and complexity of the rules, which can be a problem in case a richer spectrum of human activities and sensor data is considered. Possible alternatives combining deep neural networks and symbolic representations, like Logic Tensor Networks, could potentially overcome some of these problems and enable more powerful inference systems for anomaly detection.
The research leading to these results has received funding from the EC H2020 Programme under grant agreement No. 643691, ENRICHME.
-  Y. Sun, B. Leng, and W. Guan, “A novel wavelet-svm short-time passenger flow prediction in beijing subway system,” Neurocomputing, vol. 166, pp. 109 – 121, 2015.
-  J. Galiana-Merino, C. Pla, A. Fernandez-Cortes, S. Cuezva, J. Ortiz, and D. Benavente, “Environmentalwavelettool: Continuous and discrete wavelet analysis and filtering for environmental time series,” Computer Physics Communications, vol. 185, no. 10, pp. 2758 – 2770, 2014.
-  Y.-A. L. Borgne, S. Santini, and G. Bontempi, “Adaptive model selection for time series prediction in wireless sensor networks,” Signal Processing, vol. 87, no. 12, pp. 3010 – 3020, 2007, special Section: Information Processing and Data Management in Wireless Sensor Networks.
-  N. Bellotto, M. Fernandez-Carmona, and S. Cosar, “ENRICHME integration of ambient intelligence and robotics for AAL,” in Wellbeing AI: From Machine Learning to Subjectivity Oriented Computing (AAAI 2017 Spring Symposium). AAAI, March 2017.
-  M. Fernandez-Carmona, S. Cosar, C. Coppola, and N. Bellotto, “Entropy-based Abnormal Activity Detection Fusing RGB-D and Domotic Sensors,” in IEEE Int. Conf. on Multisensor Fusion and Integration for Intelligent Systems (MFI), 2017.
-  J. Wang and P. Domingos, “Hybrid Markov Logic Networks,” in 23rd National Conference on Artificial Intelligence, ser. AAAI’08, vol. 2. AAAI Press, 2008, pp. 1106–1111.
-  K. Chodorow, MongoDB: The Definitive Guide. O’Reilly Media, 2013.
-  C. Coppola, T. Krajnik, T. Duckett, and N. Bellotto, “Learning temporal context for activity recognition,” in 2016 European Conference on Artificial Intelligence (ECAI), 2016, pp. 107–115.
-  P. Chen, T. Pedersen, B. Bak-Jensen, and Z. Chen, “Arima-based time series model of stochastic wind power generation,” IEEE Transactions on Power Systems, vol. 25, no. 2, pp. 667–676, May 2010.
-  H. Zou and Y. Yang, “Combining time series models for forecasting,” International Journal of Forecasting, vol. 20, no. 1, pp. 69–84, jan 2004.
-  C. Xie, A. Bijral, and J. L. Ferres, “NonSTOP: A NonSTationary Online Prediction Method for Time Series,” IEEE Signal Processing Letters, vol. 25, no. 10, pp. 1545–1549, oct 2018. [Online]. Available: https://ieeexplore.ieee.org/document/8450033/
-  R. J. Povinelli, M. T. Johnson, A. C. Lindgren, and J. Ye, “Time series classification using gaussian mixture models of reconstructed phase spaces,” IEEE Transactions on Knowledge and Data Engineering, vol. 16, no. 6, pp. 779–783, June 2004.
-  N. H. Ghassemi and M. P. Deisenroth, “Analytic long-term forecasting with periodic gaussian processes,” in AISTATS, 2014, pp. 303–311.
-  A. Ihler, J. Hutchins, and P. Smyth, “Adaptive event detection with time-varying poisson processes,” in Proceedings of the 12th ACM SIGKDD international conference on Knowledge discovery and data mining, ser. KDD ’06. New York, NY, USA: ACM, 2006, pp. 207–216.
-  T. Krajnik, J. P. Fentanes, G. Cielniak, C. Dondrup, and T. Duckett, “Spectral analysis for long-term robotic mapping,” in 2014 IEEE International Conference on Robotics and Automation (ICRA), May 2014, pp. 3706–3711.
-  F. Jovan, J. Wyatt, N. Hawes, and T. Krajník, “A poisson-spectral model for modelling temporal patterns in human data observed by a robot,” in 2016 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Oct 2016, pp. 4013–4018.
-  R. Maity, M. Suman, and N. K. Verma, “Drought prediction using a wavelet based approach to model the temporal consequences of different types of droughts,” Journal of Hydrology, vol. 539, pp. 417 – 428, 2016.
-  A. J. Conejo, M. A. Plazas, R. Espinola, and A. B. Molina, “Day-ahead electricity price forecasting using the wavelet transform and arima models,” IEEE Transactions on Power Systems, vol. 20, no. 2, pp. 1035–1042, May 2005.
-  B. Ayrulu-Erdem, B. Barshan, B. Ayrulu-Erdem, and B. Barshan, “Leg Motion Classification with Artificial Neural Networks Using Wavelet-Based Features of Gyroscope Signals,” Sensors, vol. 11, no. 2, pp. 1721–1743, jan 2011.
-  S. Majumder, K. Jilenkumari Devi, and S. K. Sarkar, “Singular value decomposition and wavelet-based iris biometric watermarking,” IET Biometrics, vol. 2, no. 1, pp. 21–27, mar 2013.
-  O. Aran, D. Sanchez-Cortes, M. T. Do, and D. Gatica-Perez, “Anomaly Detection in Elderly Daily Behavior in Ambient Sensing Environments,” Human Behavior Understanding, pp. 51–67, 2016.
-  E.-E. Steen, T. Frenken, M. Eichelberg, M. Frenken, and A. Hein, “Modeling individual healthy behavior using home automation sensor data: Results from a field trial,” JAISE, vol. 5, pp. 503–523, 2013.
-  G. Okeyo, L. Chen, and H. Wang, “Combining ontological and temporal formalisms for composite activity modelling and recognition in smart homes,” Future Generation Computer Systems, vol. 39, pp. 29–43, 2014.
-  J. Soulas, P. Lenca, and A. Thépaut, “Unsupervised discovery of activities of daily living characterized by their periodicity and variability,” Engineering Applications of Artificial Intelligence, vol. 45, pp. 90–102, 2015.
-  P. Chahuara, A. Fleury, and M. Vacher, “On-line Human Activity Recognition from Audio and Home Automation Sensors: comparison of sequential and non-sequential models in realistic Smart Homes,” Journal of ambient intelligence and smart environments, vol. 8, no. 4, pp. 399–422, 2016.
-  A. Godfrey, M. Leonard, S. Donnelly, M. Conroy, G. ÓLaighin, and D. Meagher, “Validating a new clinical subtyping scheme for delirium with electronic motion analysis,” Psychiatry Research, vol. 178, no. 1, pp. 186–190, 2010.
-  D. Xu, Y. Yan, E. Ricci, and N. Sebe, “Detecting Anomalous Events in Videos by Learning Deep Representations of Appearance and Motion,” Computer Vision and Image Understanding, vol. 156, pp. 117–127, 2016.
-  R. Leyva, V. Sanchez, and C.-T. Li, “Video Anomaly Detection With Compact Feature Sets for Online Performance,” IEEE Transactions on Image Processing, vol. 26, no. 7, pp. 3463–3478, jul 2017.
-  T. Li, H. Guan, J. Ma, G. Zhang, and K. Liang, “Modeling travel mode choice behavior with bounded rationality using Markov Logic Networks,” Transportation Letters, vol. 11, no. 6, pp. 303–310, 2019.
-  A. Vrečko, A. Leonardis, and D. Skočaj, “Modeling binding and cross-modal learning in Markov logic networks,” Neurocomputing, vol. 96, pp. 29–36, 2012.
-  J. Jiang, X. Li, C. Zhao, Y. Guan, and Q. Yu, “Learning and inference in knowledge-based probabilistic model for medical diagnosis,” Knowledge-Based Systems, vol. 138, pp. 58–68, 2017.
-  Y. Liu, C. Ouyang, and J. Li, “Ensemble method to joint inference for knowledge extraction,” Expert Systems with Applications, vol. 83, pp. 114–121, 2017.
-  T. Sztyler, G. Civitarese, and H. Stuckenschmidt, “Modeling and reasoning with problog: An application in recognizing complex activities,” in 2018 IEEE International Conference on Pervasive Computing and Communications Workshops (PerCom Workshops), March 2018, pp. 259–264.
-  K. S. Gayathri, S. Elias, and B. Ravindran, “Hierarchical activity recognition for dementia care using markov logic network,” Personal Ubiquitous Comput., vol. 19, no. 2, pp. 271–285, Feb. 2015.
-  S. M. Erfani, S. Rajasegarar, S. Karunasekera, and C. Leckie, “High-dimensional and large-scale anomaly detection using a linear one-class svm with deep learning,” Pattern Recognition, vol. 58, pp. 121 – 134, 2016. [Online]. Available: http://www.sciencedirect.com/science/article/pii/S0031320316300267
-  T. van Kasteren, G. Englebienne, and B. Kröse, “Human activity recognition from wireless sensor network data: Benchmark and software,” in Activity Recognition in Pervasive Intelligent Environments, ser. Ambient and Pervasive Intelligence. Atlantis Press, 2010.
-  S. Barker, A. Mishra, D. Irwin, E. Cecchet, P. Shenoy, and J. Albrecht, “Smart*: An open data set and tools for enabling research in sustainable homes,” in 1st KDD Workshop on Data Mining Applications in Sustainability, 2012.
-  E. Hewitt and R. E. Hewitt, “The Gibbs-Wilbraham Phenomenon: An Episode in Fourier Analysis,” Archive for History of Exact Sciences, vol. 21, no. 2, pp. 129–160, 1979.
-  M. Vetterli and C. Herley, “Wavelets and filter banks: theory and design,” IEEE Transactions on Signal Processing, vol. 40, no. 9, pp. 2207–2232, Sep 1992.
-  A. Phinyomark, C. Limsakul, and P. Phukpattaranont, “Evaluation of mother wavelet based on robust emg feature extraction using wavelet packet transform,” in 13th International Annual Symposium on Computational Science and Engineering, 2009, pp. 333–339.
-  L. Lei, C. Wang, and X. Liu, “Discrete wavelet transform decomposition level determination exploiting sparseness measurement,” International Journal of Electrical, Computer, Energetic, Electronic and Communication Engineering, vol. 7, no. 9, pp. 1182 – 1185, 2013.
-  A. Fathi and A. R. Naghsh-Nilchi, “Efficient image denoising method based on a new adaptive wavelet packet thresholding function,” IEEE Transactions on Image Processing, vol. 21, no. 9, pp. 3981–3990, Sept 2012.
-  A. A. Nashat and N. M. H. Hassan, “Image compression based upon wavelet transform and a statistical threshold,” in 2016 International Conference on Optoelectronics and Image Processing (ICOIP), June 2016, pp. 20–24.
-  M. Richardson and P. Domingos, “Markov logic networks,” Machine learning, vol. 62, no. 1-2, pp. 107–136, 2006.
-  D. Goldberg and Y. Shan, “The Importance of Features for Statistical Anomaly Detection,” 7th USENIX Workshop on Hot Topics in Cloud Computing (HotCloud 15), pp. 1–6, 2015.
-  K. Cubit, G. Farrell, A. Robinson, and M. Myhill, “A survey of the frequency and impact of behaviours of concern in dementia on residential aged care staff,” Australasian Journal on Ageing, vol. 26, no. 2, pp. 64–70, 2007.
-  L. Smirek, G. Zimmermann, and D. Ziegler, “Towards Universally Usable Smart Homes – How Can MyUI , URC and openHAB Contribute to an Adaptive User Interface Platform ?” in CENTRIC 2014 : The Seventh International Conference on Advances in Human-oriented and Personalized Mechanisms, Technologies, and Services, no. c, Nice, France, 2014, pp. 29–38.
-  M. Fernandez-Carmona and N. Bellotto, “On-line inference comparison with markov logic network engines for activity recognition in AAL environments,” in IEEE International Conference on Intelligent Environments. IEEE, September 2016.
-  T. Niemueller, G. Lakemeyer, and S. S. Srinivasa, “A generic robot database and its application in fault analysis and performance evaluation,” in 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, Oct 2012, pp. 364–369.
-  J. Fairbanks, D. Ediger, R. McColl, D. A. Bader, and E. Gilbert, “A statistical framework for streaming graph analysis,” in 2013 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM 2013), Aug 2013, pp. 341–347.
-  B. Lamiroy and T. Sun, “Computing precision and recall with missing or uncertain ground truth,” in Proceedings of the 9th International Conference on Graphics Recognition: New Trends and Challenges, ser. GREC’11. Berlin, Heidelberg: Springer-Verlag, 2013, pp. 149–162.
-  L. Serafini and A. A. Garcez, “Logic tensor networks: Deep learning and logical reasoning from data and knowledge,” in 11th Int. Workshop on Neural-Symbolic Learning and Reasoning (NeSy16), 2016.