Energy Prediction using Spatiotemporal Pattern Networks

02/03/2017 ∙ by Zhanhong Jiang, et al. ∙ Iowa State University of Science and Technology

This paper presents a novel data-driven technique based on the spatiotemporal pattern network (STPN) for energy/power prediction for complex dynamical systems. Built on symbolic dynamic filtering, the STPN framework is used to capture not only the individual system characteristics but also the pair-wise causal dependencies among different sub-systems. For quantifying the causal dependency, a mutual information based metric is presented. An energy prediction approach is subsequently proposed based on the STPN framework. For validating the proposed scheme, two case studies are presented, one involving wind turbine power prediction (supply side energy) using the Western Wind Integration data set generated by the National Renewable Energy Laboratory (NREL) for identifying the spatiotemporal characteristics, and the other, residential electric energy disaggregation (demand side energy) using the Building America 2010 data set from NREL for exploring the temporal features. In the energy disaggregation context, convex programming techniques beyond the STPN framework are developed and applied to achieve improved disaggregation performance.




1 Introduction

Energy prediction is essential for operating, monitoring, and optimizing (in terms of efficiency and cost) diverse energy systems, from the supply side (e.g., wind energy, solar energy, power systems, batteries) to the demand side (e.g., load monitoring, electric vehicle usage, building energy management). Numerous studies predict energy generation or consumption from time-series data ziel2016forecasting ; liu2015highly ; alessandrini2015analog ; zuluaga2015short ; wang2015study ; garshasbi2016hybrid . For instance, Kalman filtering, wavelet packet transforms, and least squares support vector machines are used to predict wind power performance zuluaga2015short ; wang2015study , while an analog ensemble method is applied to forecast solar power alessandrini2015analog . Liu et al. liu2015highly predict the remaining state of charge of electric vehicle batteries based on predictive control theory. Hybrid genetic algorithms and Monte Carlo simulation approaches are applied to predict energy generation and consumption in net-zero energy buildings garshasbi2016hybrid . Modern energy systems usually involve a large number of subsystems; for example, hundreds of wind turbines are closely collocated in a wind farm, where they share a similar wind resource and similar conditions for transmitting power to the power system. As a result, the outputs of the turbines are related to one another, and the characteristics of their spatial interactions can potentially be applied for prediction jiang2015understanding and design optimization. The prediction approaches discussed above can be viewed as methods for exploring temporal relationships. Spatial and temporal relationships exist widely in energy systems jain2014forecasting ; liu2010prediction ; jung2014current ; kwon2010uncertainty , yet spatiotemporal features are less commonly leveraged for energy prediction problems. The exploration of such features has been shown to be effective in wind speed forecasting problems tascikaraoglu2016exploiting ; jung2014current ; tascikaraoglucompressive .

To facilitate energy prediction for energy systems with both spatial and temporal characteristics, probabilistic graphical models (PGMs) can possibly be employed, since spatiotemporal interactions are naturally suited to graph representations and can be evaluated through the associated probabilities. PGMs comprise a variety of models described by conditional dependence structures (so-called graphs), including Bayesian networks and undirected/directed Markov networks, and can be used to deal with dynamic systems and relational data. Bayesian networks are a type of PGM that captures causal relationships using directed edges, where the overall joint probability distribution of the network nodes (variables) is computed as a product of the conditional distributions (factors) defined by the nodes in the network. However, prediction problems are not straightforward for Bayesian networks, as they only encode node-based conditional probabilities, and the approximation of the joint distribution using node-based structures is often intractable sarkar2016pgm . This is because a given directed acyclic graph structure may not allow easy and exact computation of the probabilities required by certain inference questions.

Markov models, as a class of statistical models, have been widely applied in different domains, e.g., natural language processing and speech recognition leek1997information . These models have been shown to be efficient at identifying probabilistic dependencies among random variables in both directed and undirected settings. Hidden Markov Models (HMMs) have been particularly successful for learning the temporal dynamics of an underlying process rabiner1989tutorial . Several modifications of HMMs have been proposed: the infinite HMM (IHMM) beal2001infinite integrates out several parameters, leaving three hyper-parameters, to model countably infinite hidden state sequences; the infinite hierarchical HMM (IHHMM) heller2009infinite extends HMMs to an infinite number of hierarchical levels; and wakabayashi2012forward applies a forward-backward algorithm to reduce model complexity through the order of operations. However, Markov models with hidden states usually rely on iterative learning algorithms that may be computationally expensive. To alleviate such issues, symbolic dynamic filtering (SDF) was proposed ray2004symbolic ; rajagopalan2006symbolic based on the concepts of symbolic dynamics and probabilistic finite state automata (PFSA). Within the SDF framework, several improvements have been proposed over the last decade, related to coarse graining of continuous variables SSS13 , state splitting and merging techniques for PFSA mukherjee2014state , efficient inference algorithms sarkar2013symbolic , and hierarchical model learning akintayo2015symbolic . SDF has been shown to be extremely efficient for anomaly detection and fault diagnostics of various complex systems, such as gas turbine engines SSM12 , shipboard auxiliary systems SSV14 , nuclear power plants JGSRE11 , coal gasification systems CSGR08 , and bridge monitoring processes LGLPS17 .

To address prediction problems in disparate energy systems, this work presents a new data-driven framework, the spatiotemporal pattern network (STPN), that leverages the spatiotemporal interactions of energy systems for prediction. Built on SDF, an STPN aims to capture the spatiotemporal characteristics of complex energy systems and to perform prediction at both spatial and temporal resolutions. For validation, two representative case studies are presented: the first, from the energy supply side, is wind power prediction in a large-scale wind farm; the second, from the energy demand side, is energy disaggregation. Energy disaggregation, also known as non-intrusive load monitoring (NILM), is a well-established problem that involves disaggregating the total electrical energy consumption of a household into its constituent load components without extensive metering installations on individual households or appliances GH92 ; MZKR11 ; cominola2017hybrid .

Contributions: First, a novel data-driven method for energy prediction based on the STPN framework is proposed, and the concepts of interest and relevance are established. Second, two typical case studies based on wind turbine power (supply side energy) and residential building energy disaggregation (demand side energy) are performed to validate the proposed scheme. For wind turbine power prediction, the spatiotemporal characteristics between different wind turbines are identified, while for home energy disaggregation, the complex coupled temporal features are explored. An STPN-based convex programming approach is presented in order to improve energy disaggregation performance. We also present a comparative study of the energy prediction performance of the proposed technique against other state-of-the-art methods for both cases.

The remaining sections are organized as follows. Section 2 presents the necessary background on SDF and the concept of a D-Markov machine. Section 3 gives the prediction approach based on the STPN, and two typical case studies that validate the proposed framework, on the supply side (wind turbines) and the demand side (NILM), are presented in Section 4 and Section 5, respectively. Section 6 offers concluding remarks and future research directions beyond the existing results.

2 Symbolic Dynamic Filtering and D-Markov Machines

This section gives the background on symbolic dynamic filtering needed to describe the proposed prediction method. We refer interested readers to SSS13 for more details. SDF is built upon concepts from discrete dynamic systems, in which discretization and symbolization are the critical steps that convert collected or observed continuous data into discrete symbol sequences. The dynamic systems can then be studied in deterministic or probabilistic settings in the symbolic space by using language-theoretic approaches, e.g., shift maps and sliding block codes. The simplest partitioning approaches are uniform partitioning and maximum entropy partitioning, although these two methods have mainly been applied to simple dynamic systems whose data have low variance. State-of-the-art partitioning or discretization approaches include symbolic false nearest neighbor partitioning (SFNNP) PhysRevLett.91.084102 , wavelet transform SSS13 , and Hilbert-transform-based analytic signal space partitioning (ASSP) SR08 . Recently, a supervised partitioning scheme, maximally bijective discretization (MBD) SSS13 , has been proposed for modeling and analyzing complex dynamic systems. Unlike the other methods, MBD maximally preserves, after discretization, the input-output relationship that the dynamical system exhibits in the continuous domain.
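As a minimal sketch of the simplest scheme mentioned above, maximum entropy partitioning can be approximated by placing bin edges at empirical quantiles, so that every symbol is (nearly) equally likely. The function names, the toy signal, and the alphabet size of 4 are illustrative assumptions and are not taken from the paper.

```python
import numpy as np

def max_entropy_partition(x, n_symbols):
    """Return interior bin edges that put roughly equal sample counts in each bin."""
    # Interior quantile levels 1/n, 2/n, ..., (n-1)/n
    levels = np.linspace(0.0, 1.0, n_symbols + 1)[1:-1]
    return np.quantile(x, levels)

def symbolize(x, edges):
    """Map each sample to a symbol in {0, ..., len(edges)}."""
    return np.searchsorted(edges, x, side="right")

rng = np.random.default_rng(0)
signal = rng.exponential(scale=2.0, size=1000)   # skewed toy data (assumption)
edges = max_entropy_partition(signal, n_symbols=4)
symbols = symbolize(signal, edges)
counts = np.bincount(symbols, minlength=4)       # near-uniform symbol usage
```

Uniform partitioning would instead use equally spaced edges, which on skewed data like this leaves some symbols almost unused.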

After discretization of the time-series data in the continuous domain, symbolization is carried out to establish the D-Markov machines. A critical assumption in SDF is that any symbol sequence generated from time-series data can be approximated as a Markov chain of order $D$ (a positive integer). Such a Markov chain is called a D-Markov machine, and it is used to model each time series through the temporal features of its symbol sequence. The relevant definitions are given formally as follows.

Definition 2.1.

SSV14 (DFSA) A deterministic finite state automaton (DFSA) is a 3-tuple $G = (\Sigma, Q, \delta)$ where:

  1. $\Sigma$ is a set of finite size for the symbol alphabet and $\Sigma \neq \emptyset$;

  2. $Q$ is a set of finite size for states and $Q \neq \emptyset$;

  3. $\delta : Q \times \Sigma \to Q$ is the mapping function for state transition;

while $\Sigma^{\star}$ represents the collection of all finite symbol sequences from $\Sigma$ including the empty sequence $\epsilon$.

Definition 2.2.

SSV14 (PFSA) A probabilistic finite state automaton (PFSA) is an extension of a DFSA to the probabilistic setting as a pair $(G, \pi)$, i.e., the PFSA is a 4-tuple $K = (\Sigma, Q, \delta, \pi)$, where:

  1. $\Sigma$, $Q$, and $\delta$ have the same definitions as in Definition 2.1;

  2. $\pi : Q \times \Sigma \to [0, 1]$ is defined as a symbol generation function, i.e., a probability morph function, such that $\sum_{\sigma \in \Sigma} \pi(q, \sigma) = 1 \ \forall q \in Q$, where $\pi_{ij}$ indicates the probability of the symbol $\sigma_j \in \Sigma$ occurring with the state $q_i \in Q$.

Definition 2.3.

SSV14 (D-Markov) A D-Markov machine is an extension of a PFSA where the previous $D$ symbols form a state, as defined by:

  1. $D$ signifies the depth of a Markov machine;

  2. $Q$ is a set of finite size for states with $|Q| \le |\Sigma|^{D}$, i.e., each state in a Markov machine is identified by some equivalence class of symbol strings of length $D$ with symbols in $\Sigma$;

  3. $\delta : Q \times \Sigma \to Q$ signifies the state transition function such that if $|Q| = |\Sigma|^{D}$, then there exist two symbols $\alpha, \beta \in \Sigma$ and a string $x \in \Sigma^{\star}$ such that $\delta(\alpha x, \beta) = x\beta$ and $\alpha x, x\beta \in Q$.

Remark 2.1.

Based on Definition 2.3, a D-Markov machine is naturally a stationary stochastic process $S = \cdots s_{-1} s_{0} s_{1} \cdots$, in which the probability of occurrence of a new symbol is determined by the last $D$ symbols, i.e., $P(s_n \mid \cdots s_{n-D} \cdots s_{n-1}) = P(s_n \mid s_{n-D} \cdots s_{n-1})$.

We denote by $\Pi$ the state transition matrix, each entry of which gives the transition probability from one symbolic state to another. As a simple illustration, let the state of a dynamical system at step $n$ be $q_n$; then the $ij$-th entry $\pi_{ij}$ of the matrix $\Pi$ is the probability that the next state is $j$ given that the previous state was $i$, i.e.,

$$\pi_{ij} = P\left(q_{n+1} = j \mid q_{n} = i\right).$$
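For depth $D = 1$, where each state is a single symbol, the state transition matrix can be estimated by counting consecutive symbol pairs. The sketch below assumes this frequency-count estimator; the uniform fallback for unvisited states and all names are illustrative choices rather than the paper's implementation.

```python
import numpy as np

def transition_matrix(symbols, n_states):
    """Estimate Pi[i, j] ~ P(q_{n+1} = j | q_n = i) by frequency counting (D = 1)."""
    counts = np.zeros((n_states, n_states))
    for prev, nxt in zip(symbols[:-1], symbols[1:]):
        counts[prev, nxt] += 1
    row_sums = counts.sum(axis=1, keepdims=True)
    # States never visited fall back to a uniform row (an arbitrary choice).
    return np.where(row_sums > 0, counts / np.maximum(row_sums, 1), 1.0 / n_states)

seq = [0, 1, 1, 2, 0, 1, 2, 2, 0]
Pi = transition_matrix(seq, n_states=3)
# Row 0 is [0, 1, 0]: in this toy sequence, state 0 is always followed by state 1.
```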

Moreover, an individual dynamical system can be modeled using a D-Markov machine. Because a D-Markov machine cannot capture the interaction dependencies among multiple systems or sub-systems of a large complex system, it has recently been extended to the xD-Markov machine, which was originally developed to obtain the causal dependencies among different systems or sub-systems. Unlike correlation-based analysis, such a model can efficiently build up and generalize the causal dependencies C14 . The formal definition of the xD-Markov machine is as follows.

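Definition 2.4.

SSV14 (xD-Markov) Let $K_1 = (\Sigma_1, Q_1, \delta_1, \pi_1)$ and $K_2 = (\Sigma_2, Q_2, \delta_2, \pi_2)$ be the PFSAs which correspond to symbol streams $\{s_1\}$ and $\{s_2\}$, respectively. Then an xD-Markov machine is defined as a 5-tuple $K_{1 \to 2} = (Q_1, \Sigma_1, \Sigma_2, \delta_1, \Pi_{12})$ such that:

  1. $\Sigma_1$ represents the alphabet set of symbol sequence $\{s_1\}$;

  2. $Q_1$ is the state set which corresponds to symbol sequence $\{s_1\}$;

  3. $\Sigma_2$ represents the alphabet set of symbol sequence $\{s_2\}$;

  4. $\delta_1 : Q_1 \times \Sigma_1 \to Q_1$ gives the state transition mapping that maps the transition in symbol sequence $\{s_1\}$ from one state to another based on the occurrence of a symbol in $\{s_1\}$;

  5. $\Pi_{12}$ is the symbol generation matrix of size $|Q_1| \times |\Sigma_2|$; the $ij$-th entry of $\Pi_{12}$ denotes the probability of obtaining the symbol $\sigma_j$ of $\{s_2\}$ while making a transition from the state $q_i$ of $\{s_1\}$.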

Therefore, a D-Markov machine gives the probability of a new symbol occurring given the previous $D$ symbols of an individual symbol sequence. To obtain the probability of a new symbol occurring in one symbol sequence given the last $D$ symbols of a different symbol sequence, an xD-Markov machine can be applied instead. Equivalently, an xD-Markov machine captures the causal dependency of one symbol sequence on another.

Figure 1: Illustration of generation of a PFSA using (a) maximally bijective discretization and (b) maximum entropy partitioning for system A.

3 Spatiotemporal Pattern Network

This section presents how to construct the spatiotemporal pattern network (STPN) for two dynamical systems, $A$ and $B$, based on the concepts of SDF introduced above. We start with data partitioning/discretization and symbolization, followed by the details of STPN construction.

3.1 Discretization and Symbolization

Suppose there are two different dynamic systems $A$ and $B$. In real-world problems such as wind power prediction, $A$ and $B$ can represent two different wind turbines in a large wind farm. Alternatively, in residential home energy disaggregation, $A$ and $B$ could represent the electricity consumption of the HVAC system and that of all appliances. Each system has various measured variables, and typically a few key observations are selected to build and analyze the model. For a wind turbine, for example, wind speed and wind power are the two key observations for power prediction. Other variables, e.g., wind direction and humidity, may also affect power and can be taken into account as well. The first step in modeling dynamic systems with symbolic dynamics is data discretization. As mentioned above, many approaches can be used; in this paper, maximally bijective discretization (MBD) is applied to the supply side dynamic systems (wind turbines) and maximum entropy partitioning is used for the demand side dynamic systems (HVAC, appliances, etc.). Different methods are selected because the measured variables differ. For wind turbines, wind speed and wind power are chosen, and their input-output relation in the continuous domain can be maximally maintained. For home energy disaggregation, however, the only variable for each part of the home energy use is the energy consumption itself, so there is no input-output relation in the continuous domain.

3.2 Symbolic Modeling of Dynamical Systems and Interactions

Figure 1 shows symbol sequence generation in the form of a PFSA using two different methods, maximally bijective discretization and maximum entropy partitioning. As discussed above, a D-Markov machine can be represented by a PFSA in which the previous $D$ symbols indicate one state. In this context, we consider two different systems in order to quantify their spatiotemporal or temporal relations. In Figure 2, the state transition matrices $\Pi^{A}$ and $\Pi^{B}$ describe the self-relations of systems $A$ and $B$, respectively, while the cross-state transition matrices $\Pi^{AB}$ and $\Pi^{BA}$ represent the cause-effect relations from $A$ to $B$ and from $B$ to $A$, respectively. Note that these causal dependencies between systems $A$ and $B$ are not necessarily equivalent. To quantify the relations captured by a D-Markov machine and an xD-Markov machine, atomic patterns (APs) and relational patterns (RPs) were introduced in SSV14 , which gives more details. More formally, the entries of the cross-state transition matrices $\Pi^{AB}$ and $\Pi^{BA}$ can be expressed as:

$$\pi^{AB}_{kl} \triangleq P\left(q^{B}_{n+1} = l \mid q^{A}_{n} = k\right) \ \forall n, \qquad \pi^{BA}_{ij} \triangleq P\left(q^{A}_{n+1} = j \mid q^{B}_{n} = i\right) \ \forall n,$$

where $j, k \in Q^{A}$ and $i, l \in Q^{B}$. These relations show that a cross-state transition matrix can be constructed from the symbol sequences of two different dynamical systems, with each entry giving the transition probability from a state of the first dynamical system to a state of the second. For instance, $\pi^{AB}_{kl}$ is the transition probability from state $k$ in system $A$ to state $l$ in system $B$.
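The cross-state transition matrices can be estimated in the same counting fashion as a single-system transition matrix, pairing the current symbol of one system with the next symbol of the other. The following is a sketch for depth 1 under that assumption; the function name, the `lag` parameter, and the toy sequences are hypothetical.

```python
import numpy as np

def cross_transition_matrix(sym_a, sym_b, n_a, n_b, lag=1):
    """Estimate Pi_AB[k, l] ~ P(q^B_{n+lag} = l | q^A_n = k) by counting."""
    counts = np.zeros((n_a, n_b))
    for k, l in zip(sym_a[:-lag], sym_b[lag:]):
        counts[k, l] += 1
    rows = counts.sum(axis=1, keepdims=True)
    # Unvisited states of A get a uniform row (an arbitrary fallback).
    return np.where(rows > 0, counts / np.maximum(rows, 1), 1.0 / n_b)

# Toy data: B repeats A with a one-step delay, so A predicts B perfectly.
a = [0, 1, 0, 1, 0, 1, 0, 1]
b = [0, 0, 1, 0, 1, 0, 1, 0]
Pi_ab = cross_transition_matrix(a, b, 2, 2)   # identity matrix
Pi_ba = cross_transition_matrix(b, a, 2, 2)   # [[0.75, 0.25], [0.0, 1.0]]
```

The two matrices differ, which illustrates on a small scale why the dependency from A to B need not equal the dependency from B to A.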

Figure 2: Construction of STPN: Atomic patterns (APs) and relational patterns (RPs) formulation.

Moreover, we use an information-theoretic metric to quantify the atomic and relational patterns (in this work the relational pattern is the major concern); mutual information is the metric of interest. For example, in Figure 2, we denote by $I^{AA}$ and $I^{AB}$ the atomic pattern of system A and the relational pattern from system A to system B, respectively. Formally, the atomic pattern of system A is expressed as follows:

$$I^{AA} = I\left(q^{A}_{n+1}; q^{A}_{n}\right) = H\left(q^{A}_{n+1}\right) - H\left(q^{A}_{n+1} \mid q^{A}_{n}\right),$$

where

$$H\left(q^{A}_{n+1}\right) = -\sum_{i \in Q^{A}} P\left(q^{A}_{n+1} = i\right) \log_2 P\left(q^{A}_{n+1} = i\right), \qquad H\left(q^{A}_{n+1} \mid q^{A}_{n}\right) = \sum_{i \in Q^{A}} P\left(q^{A}_{n} = i\right) H\left(q^{A}_{n+1} \mid q^{A}_{n} = i\right).$$

Therefore, based on the quantity $I^{AA}$ (defined using the entropy values presented above), the temporal self-prediction capability of system A can be identified.

On the other hand, the mutual information for the relational pattern between systems A and B can be described as:

$$I^{AB} = I\left(q^{B}_{n+1}; q^{A}_{n}\right) = H\left(q^{B}_{n+1}\right) - H\left(q^{B}_{n+1} \mid q^{A}_{n}\right).$$

Hence, the quantity $I^{AB}$ identifies system A's capability of predicting system B's outputs, and vice versa for $I^{BA}$. Furthermore, based on the mutual information, patterns can be assigned weights, so that patterns with low mutual information may be rejected to simplify the model. Interested readers can find more details in SSV14 .
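A plug-in estimate of the relational-pattern mutual information can be computed directly from symbol co-occurrence frequencies. The sketch below uses the equivalent form of mutual information as a sum over the joint distribution, measured in bits; the base-2 logarithm, the names, and the toy sequences are assumptions made for illustration.

```python
import numpy as np

def mutual_information(sym_a, sym_b, n_a, n_b, lag=1):
    """Plug-in estimate (bits) of I(q^B_{n+lag}; q^A_n) = H(B) - H(B | A)."""
    joint = np.zeros((n_a, n_b))
    for k, l in zip(sym_a[:-lag], sym_b[lag:]):
        joint[k, l] += 1
    joint /= joint.sum()
    p_a = joint.sum(axis=1, keepdims=True)   # marginal of A's current state
    p_b = joint.sum(axis=0, keepdims=True)   # marginal of B's next state
    mask = joint > 0                          # skip zero cells (0 * log 0 = 0)
    return float(np.sum(joint[mask] * np.log2(joint[mask] / (p_a @ p_b)[mask])))

a = [0, 1] * 50
b_copy = [0] + a[:-1]        # B repeats A one step later
b_const = [0] * 100          # B carries no information about A
mi_copy = mutual_information(a, b_copy, 2, 2)    # close to 1 bit
mi_const = mutual_information(a, b_const, 2, 2)  # 0 bits
```

Calling the same function with `sym_b = sym_a` gives the atomic-pattern value for a single system.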

Based on the above analysis, the proposed STPN can be an effective tool for capturing the spatiotemporal interactions between different dynamic systems. To validate this data-driven method, this paper offers two case studies, on supply side dynamic systems (i.e., wind turbines in a wind farm) and demand side dynamic systems (i.e., home electric energy disaggregation), to demonstrate its efficacy. The prediction process can be described as follows. Given a training data set in the continuous domain, we use partitioning methods to discretize and symbolize the data for running the xD-Markov machine. The resulting probability transition matrices are used for prediction in the symbolic or continuous domain. For symbolic prediction, we find the most likely symbol sequence for system $B$ given a symbol sequence of system $A$ by running the xD-Markov model numerous times. In the continuous domain, the prediction is obtained from the symbolic prediction using the expectation as follows:

$$E(t) = \sum_{j=1}^{N} P_t(s_j)\, E(s_j), \tag{1}$$

where $E(t)$ represents the expectation of energy at the $t$-th instant, $P_t(s_j)$ signifies the probability of symbol $s_j$ occurring at the $t$-th instant after running numerous Monte Carlo simulations of the Markov chain, and $E(s_j)$ indicates the expectation of energy for the discrete bin labelled by symbol $s_j$ (suppose that there are $N$ discrete symbols).
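The continuous-domain prediction is a probability-weighted average of per-bin energy values. A minimal sketch follows, assuming the per-bin expectation is the mean of the training samples that fall in each bin; that choice, the zero placeholder for empty bins, and all names are illustrative.

```python
import numpy as np

def bin_means(values, symbols, n_symbols):
    """Mean training energy of the samples labelled with each symbol."""
    values = np.asarray(values, dtype=float)
    symbols = np.asarray(symbols)
    # Empty bins get 0.0 as a placeholder (an arbitrary choice for this sketch).
    return np.array([values[symbols == j].mean() if np.any(symbols == j) else 0.0
                     for j in range(n_symbols)])

def expected_energy(symbol_probs, means):
    """Row t holds the symbol probabilities at instant t; returns the expectation per instant."""
    return np.asarray(symbol_probs, dtype=float) @ means

train_energy = [1.0, 3.0, 10.0, 14.0]
train_symbols = [0, 0, 1, 1]
means = bin_means(train_energy, train_symbols, 2)     # [2.0, 12.0]
probs = np.array([[1.0, 0.0], [0.25, 0.75]])           # symbol probabilities at two instants
pred = expected_energy(probs, means)                   # [2.0, 9.5]
```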

The pseudocode of energy prediction based on STPN is as follows.

Input : Training data sets of systems $A$, $B$ ($B$ represents any system), depth $D$
Output : Predicted results
1 Discretize and symbolize the continuous data to symbol sequences;
2 Calculate the state transition matrices and the mutual information;
3 Calculate the expected value of energy in each discrete bin;
4 Use Eqn. 1 to calculate the prediction results;
Algorithm 1 Energy Prediction based on STPN
Figure 3: Geographical information of wind turbines under analysis, which are located in California between 35.28-35.33°N and 118.09-118.17°W
Figure 4: Representation of STPN for 12 wind turbines

4 Supply Side: Wind Turbines

4.1 Geographical information

In this subsection, a case study based on supply side energy systems, i.e., wind turbines, is used to validate the data-driven method proposed in this work. The STPN framework is applied to a wind turbine network in order to capture the causal dependencies among different wind turbines, which can be regarded as sub-systems of a wind farm. This paper uses the 2006 Western Wind Integration data set obtained from NREL NREL06 to uncover causal dependencies, which are vitally important to individual wind turbine power prediction in a mutual turbine-turbine setting. To establish the STPN, twelve wind turbines (located in California) with capacity factors in excess of 40% are chosen; their IDs are 4494, 4495, 4496, 4497, 4423, 4424, 4425, 4426, 4427, 4361, 4313 and 4314 (labeled 1-12 here), and their capacity factors are between approximately 41% and 45%. For completeness, the geographical information of the wind turbines is also provided. The annual average wind velocity in the area where the considered turbines are located is around 9 m/s, with an elevation from 1019 to 1207 m.

Figure 5: Discretization of a typical wind turbine system using MBD
Figure 6: Symbol sequence plot for a typical wind turbine

As shown in Figure 3, the twelve wind turbines are distributed over various locations and are identified as nodes in the STPN represented by Figure 4. Figure 5 shows the relation between wind speed and wind power for one turbine; the other wind turbines exhibit the same pattern. The input-output relation of a wind turbine is significant, and MBD enables the maximum preservation of this correspondence in the symbolic domain. The spatiotemporal patterns and the relational patterns between different wind turbines can be found from the symbol sequences. Figure 6 shows an example symbol sequence for a wind turbine, in which most of the symbols are 1, 5, 8 and 9.

Figure 7: Mutual information of relational patterns for selected pairs of wind turbines.

4.2 Results and Discussion

The mutual information of the RP between a pair of wind turbines is investigated first, using the state transition matrices generated by xD-Markov machines. We set the depth to 1 for simplicity; the parameter can be increased. Consequently, the current state of one selected wind turbine depends only on the last state of another selected wind turbine. The effect of time lag on the mutual information between wind turbines is studied to address the temporal characteristics. The results in Figure 7 show that the mutual information decreases as the time lag increases. Thus, in this work the causal dependency between any two different wind turbines is strongest at time lag 1.
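The lag dependence can be illustrated on synthetic data. The sketch below does not use the NREL data: it assumes a first-order autoregressive signal for one "turbine" and a noisy copy of it for a neighbour, symbolizes both with quartile edges, and evaluates a plug-in mutual information estimate at a few lags. All parameters (the coefficient 0.9, the noise levels, 4 symbols, the lags) are hypothetical.

```python
import numpy as np

def lagged_mi(sym_a, sym_b, n_sym, lag):
    """Plug-in estimate (bits) of I(q^B_{n+lag}; q^A_n) from symbol co-occurrences."""
    joint = np.zeros((n_sym, n_sym))
    np.add.at(joint, (sym_a[:-lag], sym_b[lag:]), 1)   # count (A now, B later) pairs
    joint /= joint.sum()
    indep = joint.sum(axis=1, keepdims=True) * joint.sum(axis=0, keepdims=True)
    nz = joint > 0
    return float(np.sum(joint[nz] * np.log2(joint[nz] / indep[nz])))

rng = np.random.default_rng(1)
n = 5000
x = np.zeros(n)
for t in range(1, n):                      # synthetic persistent signal
    x[t] = 0.9 * x[t - 1] + rng.normal()
y = x + 0.1 * rng.normal(size=n)           # neighbouring signal: noisy copy of x

edges = np.quantile(x, [0.25, 0.5, 0.75])  # 4 roughly equiprobable symbols
sa, sb = np.searchsorted(edges, x), np.searchsorted(edges, y)
mi_by_lag = {lag: lagged_mi(sa, sb, 4, lag) for lag in (1, 5, 20)}
```

On this synthetic pair the estimate at lag 1 exceeds the estimate at lag 20, in line with the qualitative trend described for Figure 7.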

Figure 8: Spatiotemporal pattern network for the group of wind turbines

The spatial relationship between two different wind turbines is another critical factor in the STPN. Wind turbines labeled 5, 6, 7, 1, and 10 are chosen for this analysis. Figure 8 shows that the causal dependency between any two wind turbines decreases as the geographical (spatial) distance between them increases along any direction. Figure 9 also illustrates that the mutual information metric for a pair of wind turbines exhibits a generally decreasing trend with the Euclidean distance between them. In summary, based on both of these observations, the mutual information metric is an effective way to capture the spatial and temporal patterns in wind turbine systems.

Figure 9: A monotonically decreasing relationship for different pairs of wind turbines when spatial distances increase

Next, we evaluate the effectiveness of the STPN in revealing causal dependencies through wind power prediction. The symbolic and continuous prediction of the power of one wind turbine is based on the observed symbol sequence of another turbine. Following the energy prediction procedure described above, Figure 10 and Figure 11 show the symbol prediction results, in which the symbol sequences predicted for wind turbine 5 from the observations of wind turbines 6 and 7, respectively, are compared to the true symbol sequences of wind turbine 5. The model is trained on data from the first half of 2006 and tested on data from the second half. The two plots show that for most of the time the proposed xD-Markov machines have a strong prediction capability, while some errors may come from the transient symbols. Moreover, the prediction by wind turbine 6 is slightly better than that by wind turbine 7, as implied by the mutual information.

Figure 12 shows the mean square error (MSE) as a function of the spatial distance between pairs of wind turbines, using wind turbines 5, 6, 7, 8, and 9; it displays a monotonically increasing trend. This demonstrates the symbolic prediction capacity of the proposed STPN. To validate the energy prediction method in the continuous domain, an example of energy prediction for wind turbine 5 given the observed symbol sequence of wind turbine 6 is shown. Figure 13 shows that the continuous domain prediction captures the major trend in the actual data quite well, as the MBD partitioning method is effective in preserving the input-output relation. A finer discretization may improve the prediction result in the continuous domain, although it requires a larger amount of data and increases the computational complexity correspondingly.

To evaluate the proposed scheme in wind power prediction, we compare the performance of the STPN framework with that of a popular approach, namely the Hidden Markov Model (HMM) with mixture, which is adapted from the HMM to deal with multiple variables. A toolbox compatible with MATLAB murphy2013hidden is applied in this context. The results in Figure 13 show that, under visual inspection, the proposed prediction method based on the STPN framework outperforms the HMM with mixture. Quantitatively, while the MSE for predicted power using HMM with mixture is , the MSE for predicted power using the proposed algorithm is . Therefore, it can be concluded that the STPN scheme, in which causal dependencies between different wind turbines are captured, is an effective technique for wind power prediction.

Figure 10: Symbolic prediction of wind turbine 5 behavior with the observation of wind turbine 6
Figure 11: Symbolic prediction of wind turbine 5 behavior with the observation of wind turbine 7
Figure 12: MSEs of prediction of wind turbine 5 power using observation from other turbines: As geographical (spatial) distance increases, MSE increases
Figure 13: Wind power prediction for wind turbine 5 under the observation of symbol sequence of wind turbine 6 using STPN and HMM with mixture

5 Demand Side: Non-Intrusive Load Monitoring

This section presents a second case study based on demand side energy systems, in particular non-intrusive load monitoring (NILM) of electrical demand with the purpose of identifying electric load components for residential homes. As in the previous section, the STPN framework is used, here for electric load component disaggregation. In order to best identify the disaggregated energy usage of each electric energy consuming component from the total energy consumption, convex programming is applied. This is necessary because NILM has no clear input-output relation, so the results obtained from the STPN alone may not be optimal. Here, optimal means that the load components of the residential home energy consumer sum to the whole building electricity use. Therefore, a convex programming based modification is applied to the STPN prediction results to achieve this optimal disaggregation.

5.1 Problem Description

For this case, the data set used for energy disaggregation is based on the Building America 2010 data set available from NREL hendron2010building . The data are for the hot and dry location of Bakersfield, California, with ample use of heating, ventilation, and air-conditioning (HVAC) in the summer, and include the whole building electric (WBE), which is the sum of HVAC, lights (LIGHTS), appliances (APPL), and miscellaneous electric loads (MELS). The goal here is to use the measured WBE time series to predict HVAC, LIGHTS, APPL, and MELS, respectively. WBE is the only known variable, and for each prediction one month of data is adopted, where the first three weeks are used for training the model and the fourth week for testing.

Convex Programming: Before stating the prediction results, the convex programming problem is formulated for completeness. Suppose that the results obtained by the STPN framework are the ground truth for each part except WBE. The optimization problem can then be expressed as

$$\min_{x_1, \dots, x_4} \ \sum_{i=1}^{4} \lVert x_i - \hat{x}_i \rVert_2 \quad \text{subject to} \quad \sum_{i=1}^{4} x_i = y, \tag{2}$$

where $x_i$ represent the decision variables to be determined, $\hat{x}_i$ signify the prediction results obtained from STPN, $y$ is the known values of WBE, and $\lVert x_i - \hat{x}_i \rVert_2$ is the Euclidean norm between $x_i$ and $\hat{x}_i$.
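To make the role of the optimization step concrete, the sketch below solves a simplified variant in closed form. It assumes the objective is the sum of squared Euclidean distances between the decision variables and the STPN estimates, with the single constraint that the components add up to the measured WBE and with no non-negativity constraint; the source states only that a Euclidean norm is used, so these details and all names are assumptions. Under them, the solution spreads the residual equally across the components.

```python
import numpy as np

def reconcile(x_hat, y):
    """Least-squares adjustment of component estimates so they sum to the total.

    x_hat: array of shape (m, T) with estimates for m load components.
    y:     array of shape (T,) with the measured total (WBE).
    """
    x_hat = np.asarray(x_hat, dtype=float)
    residual = np.asarray(y, dtype=float) - x_hat.sum(axis=0)
    # Equal split of the residual is the minimizer of the assumed squared objective.
    return x_hat + residual / x_hat.shape[0]

# Hypothetical estimates for HVAC, LIGHTS, APPL, MELS at two time steps.
x_hat = np.array([[2.0, 3.0], [1.0, 1.0], [0.5, 0.5], [0.5, 1.5]])
y = np.array([5.0, 6.0])
x = reconcile(x_hat, y)   # first column shifts by 0.25 each; second is unchanged
```

A general-purpose convex solver would be needed if non-negativity or other constraints were added.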

The pseudocode of energy prediction based on the STPN framework and convex programming is shown as follows. We refer to the combination of the STPN framework and the convex programming technique as STPN+convex programming throughout the rest of the analysis.

Input : Training data sets, depth $D$
Output : Optimal results
1 Run all of the steps in Algorithm 1;
2 Get the results by STPN and solve the optimization problem in Eq. 2;
3 Obtain the optimal results;
Algorithm 2 Energy Prediction using STPN+convex programming

Factorial Hidden Markov Model: The Factorial Hidden Markov Model (FHMM) ZGMJ97 is an extension of Hidden Markov Models that runs multiple Markov models in parallel in a distributed manner and performs task-related inference to arrive at the predicted observation. Such models are applied by representing each end-use as a hidden state modeled by a multinomial distribution over discrete values, and then summing each appliance meter's independent contribution to obtain the expected observation (i.e., the total expected main meter value). The AFAMAP JZKTJ12 variant of FHMM, which includes the trends in the hidden states of the FHMM, has also been reported to be effective in the disaggregation task. In our application of FHMM, the number of hidden states is the number of testing appliances, while = 3 in order to keep the computational requirements low.

Combinatorial Optimization: The combinatorial optimization (CO) algorithm is a heuristic scheme that attempts to minimize the norm of the difference between the total power at the mains and the sum of the power of the end-uses, given either a single-state or a multi-state formulation of the sum. The drawbacks of CO for disaggregation tasks are its sensitivity to transients and its degradation with an increasing number of devices or with similarity in device characteristics.
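The idea behind CO can be illustrated with a short sketch. It assumes that each end-use has a small, known set of discrete power levels and that the mismatch is measured by the absolute difference between the mains reading and the summed levels. The state levels, the function name, and the choice of absolute difference are assumptions made for illustration; this is not the NILMTK implementation.

```python
from itertools import product


def combinatorial_disaggregate(mains, state_levels):
    """Pick, for each mains reading, one power level per end-use such that the
    summed levels are closest (in absolute difference) to the reading.

    mains: list of total power readings
    state_levels: dict mapping an end-use name to its candidate power levels
    """
    names = list(state_levels)
    # Enumerate every combination of one level per end-use (exhaustive search).
    combos = list(product(*(state_levels[name] for name in names)))
    result = {name: [] for name in names}
    for reading in mains:
        best = min(combos, key=lambda combo: abs(reading - sum(combo)))
        for name, level in zip(names, best):
            result[name].append(level)
    return result


# Illustrative levels only (hypothetical values, not learned from data).
levels = {"HVAC": [0.0, 1.5, 3.0], "LIGHTS": [0.0, 0.4]}
estimate = combinatorial_disaggregate([1.9, 3.1, 0.3], levels)
```

The number of combinations grows exponentially with the number of end-uses, and similar power levels across devices make the minimizer ambiguous, which is consistent with the drawbacks noted above.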

We applied the algorithms as available in the non-intrusive load monitoring toolkit NJOHWAAM14 , with exact inference ZGMJ97 for the FHMM.

Figure 14: Mutual information between WBE and HVAC, WBE and LIGHTS, WBE and APPL, and WBE and MELS with the increment of time lag of 2 minutes in July, 2010
Figure 15: STPN using variables, WBE, HVAC, LIGHTS, APPL and MELS in July

5.2 Results and Discussion

For validation of the proposed energy prediction approach, two months, i.e., April and July, are selected to study the prediction performance. As the Building America 2010 data set has a 1-hour sampling interval and only three weeks of data are used for training, this amount of data may not be sufficient for the construction of the STPN. Building the STPN with too little data may result in poor accuracy of the estimated causal dependencies between different variables. Therefore, a data preprocessing technique, i.e., upsampling, is applied in this case with an upsampling fold of 30, such that the sampling interval of the data set becomes 2 minutes.
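The upsampling step can be sketched as follows. The interpolation scheme is not specified in the text, so linear interpolation between consecutive hourly samples is assumed here purely for illustration, and the function name is hypothetical. With a fold of 30, each 60-minute interval is divided into 30 steps of 2 minutes.

```python
def upsample_linear(samples, fold):
    """Upsample a series by an integer fold using linear interpolation.

    Each interval between consecutive samples is split into `fold` equal steps,
    so a series of n samples becomes (n - 1) * fold + 1 samples.
    Linear interpolation is an assumption; the source does not name the scheme.
    """
    if fold < 1:
        raise ValueError("fold must be a positive integer")
    if not samples:
        return []
    upsampled = []
    for left, right in zip(samples, samples[1:]):
        for k in range(fold):
            upsampled.append(left + (right - left) * k / fold)
    upsampled.append(samples[-1])
    return upsampled


hourly = [1.0, 2.0, 4.0]                  # three hourly readings (illustrative)
two_minute = upsample_linear(hourly, 30)  # 60 min / 30 = 2 min spacing
```

Upsampling increases the number of symbols available for estimating transition probabilities, but it does not add new information beyond the original hourly measurements.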

First, we study the causal dependencies among these five variables by computing the mutual information. Figure 14 shows the variation of mutual information with respect to time lag, in increments of 2 minutes, to address the temporal characteristics. The depth of the x-Markov machine is still 1, such that the current symbol of any of HVAC, LIGHTS, APPL and MELS depends only on the previous symbol of WBE. Unlike in the wind turbine systems, the causal dependencies between WBE and the other four load components decrease only slightly as the time lag increases, which indicates that using WBE to predict the other parts of energy consumption is temporally robust. It also shows that the causal dependency between WBE and HVAC in July is larger than those between WBE and the other load components (i.e., LIGHTS, APPL, and MELS), which suggests that the prediction of HVAC using WBE yields the best accuracy.

The results in Figure 15 show the causal dependencies, quantified by mutual information, among all five variables. It can be observed that the causal dependency between HVAC and APPL is larger than that between HVAC and MELS as well as that between HVAC and LIGHTS, while the relations among LIGHTS, APPL and MELS are quite significant according to the causal dependencies obtained in this context. In summary, this relational pattern network captures temporal interactions between different end uses and can be an effective technical tool for energy disaggregation.

Figure 16 shows the energy disaggregation of HVAC, LIGHTS, APPL and MELS using STPN and STPN+convex programming in April. In this month, the energy consumption of HVAC is the most significant, accounting for the largest percentage of WBE. The strong prediction capability of STPN can be observed from the plots, and STPN+convex programming further improves on the STPN performance, which is attributed to the constraint imposed in the convex programming. It can also be seen from Figure 17 that the total energy consumption obtained by STPN without convex programming is less accurate than the STPN+convex programming result, for which the optimal disaggregation appears to be achieved. However, the prediction performance for APPL and LIGHTS is slightly worse than for HVAC and MELS because they account for a lower percentage of WBE, which is also suggested by Figure 18.

Therefore, it can be inferred that, for energy disaggregation, a more accurate prediction can be achieved when a load component (i.e., HVAC, LIGHTS, APPL, or MELS) accounts for a more significant percentage of WBE. Figure 18 shows that the prediction for the last two days of the fourth week is worse, although it still captures the trend; this may be attributed to transient external factors on those two days, such as weather and occupancy, that affect the energy consumption. A similar observation can be made from Figure 19: the optimal disaggregation can be obtained via STPN+convex programming. For a direct visual inspection of the difference in prediction capability, Figure 20 and Figure 21 show that STPN+convex programming outperforms STPN alone, as the energy consumption of each part is predicted optimally. The fact that these two plots show an energy prediction difference of less than 5% for STPN and STPN+convex programming demonstrates the efficacy of the proposed framework.

Figure 16: Energy prediction of HVAC, LIGHTS, APPL, and MELS in April 2010 using STPN, STPN+convex programming, FHMM, and CO separately shown in (b) for better visualization
Figure 17: Calculated WBE from disaggregated energy values in April 2010 using STPN, STPN+convex programming, FHMM and CO
Figure 18: Energy prediction of HVAC, LIGHTS, APPL, and MELS in July 2010 using STPN, STPN+convex programming, FHMM, and CO separately shown in (b) for better visualization
Figure 19: Calculated WBE from disaggregated energy values in July 2010 using STPN, STPN+convex programming, FHMM and CO
Figure 20: Energy prediction difference of HVAC, LIGHTS, APPL, and MELS in April 2010 among STPN, STPN+convex programming, FHMM and CO
Figure 21: Energy prediction difference of HVAC, LIGHTS, APPL, and MELS in July 2010 among STPN, STPN+convex programming, FHMM and CO

To compare the proposed method with current state-of-the-art techniques in the literature, we compare STPN and STPN+convex programming with FHMM and CO. To obtain sufficiently accurate prediction results, the data set is also upsampled for FHMM, with an upsampling fold of 1200. The sampling interval thus becomes 3 seconds, and the number of states used is 3. The energy disaggregation results in Figure 16 show that both FHMM and CO perform worse than the proposed method, although the predicted WBE in Figure 17 looks quite promising. This is because FHMM cannot predict the transient peaks as well as the proposed method, and CO is unable to disaggregate the load components well. A very similar conclusion holds for the month of July. Figure 18 shows that when the energy curves are more oscillatory, the proposed method is able to outperform FHMM and CO. Figures 17 and 19 both suggest that the proposed STPN and STPN+convex programming give better energy prediction in terms of WBE. Figures 20 and 21 and Table 1 quantify the differences among the proposed method (STPN, STPN+convex programming), FHMM, and CO. This strengthens the conclusion that STPN and STPN+convex programming yield encouraging and promising disaggregation results in NILM. Hence, the comparison of the proposed method with FHMM and CO indicates the effectiveness of the STPN-based scheme as a tool for energy prediction. We also remark on the computational efficiency of the proposed method, FHMM, and CO.

| Method | Time (s) | Memory (MB) | Accuracy (MSE) |
|---|---|---|---|
| STPN | 28.74 | 962 | 0.0072 |
| STPN+convex programming | 369.64 | 2756 | 0.0070 |
| FHMM | 38.10 | 798.67 | 0.0163 |
| CO | 11.25 | 769.37 | 0.0564 |

Table 1: Computational information for different methods in April
Remark 5.1.

In this case we also consider the computational time and memory along with accuracy (MSE) in order to compare the performance of the different methods. FHMM and CO were implemented in an IPython notebook using the NILM toolkit (NILMTK), while STPN and STPN+convex programming were implemented in the MATLAB environment with the CVX package grant2008cvx . The results in Table 1 show that STPN takes less time than FHMM (28.74 s versus 38.10 s) but requires more memory, as the number of states for STPN is larger than for FHMM in this case. The STPN+convex programming approach needs more computational time and memory to run the whole process due to the optimization iterations. FHMM and CO use less memory than the proposed schemes. However, in terms of accuracy, STPN outperforms the FHMM and CO approaches, as shown in Table 1: the MSE of FHMM is more than twice that of STPN. Moreover, STPN+convex programming improves on the accuracy obtained from the STPN framework. In summary, the energy prediction method based on the STPN framework may be an effective approach for energy prediction applications. Note that the FHMM and CO codes used here are part of a well-optimized toolbox, and we expect that similar code and platform optimization can bring our proposed methods to a comparable level in terms of memory and time complexity.
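The comparisons in this remark can be checked with a few lines of arithmetic. The sketch below assumes the standard definition of mean squared error (the normalization of the data behind the reported values is not stated here) and uses the figures from Table 1; the dictionary layout and function names are illustrative.

```python
def mean_squared_error(actual, predicted):
    """Standard MSE between two equally long series (assumed definition)."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("series must be non-empty and of equal length")
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


# Values copied from Table 1 (April).
table_1 = {
    "STPN": {"time_s": 28.74, "memory_mb": 962.0, "mse": 0.0072},
    "STPN+convex programming": {"time_s": 369.64, "memory_mb": 2756.0, "mse": 0.0070},
    "FHMM": {"time_s": 38.10, "memory_mb": 798.67, "mse": 0.0163},
    "CO": {"time_s": 11.25, "memory_mb": 769.37, "mse": 0.0564},
}


def ratio_to_stpn(metric, method):
    """Ratio of a method's metric to the STPN value for the same metric."""
    return table_1[method][metric] / table_1["STPN"][metric]


fhmm_mse_ratio = ratio_to_stpn("mse", "FHMM")                        # about 2.26
convex_time_ratio = ratio_to_stpn("time_s", "STPN+convex programming")  # about 12.9
```

These ratios show the trade-off in Table 1: the convex programming step lowers the MSE slightly relative to STPN alone, at roughly thirteen times the run time.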

6 Conclusions and Future Work

This paper presents a novel data-driven framework, the spatiotemporal pattern network (STPN), to predict energy consumption for both supply side and demand side energy systems. While symbolic dynamic filtering performs the discretization and symbolization of continuous domain data for data level fusion of different variables in a dynamic system, a D-Markov machine is able to capture its temporal characteristics. This work establishes another PFSA, called the x-Markov machine, to capture the causal dependencies between two time-series. Moreover, a mutual information based metric is applied to quantify the causal dependencies. Prediction based on the STPN framework is proposed using expectation from the symbolic domain to the symbolic and continuous domains.

The proposed scheme is validated by two case studies, wind turbine power prediction (supply side energy systems) and non-intrusive load monitoring (demand side energy systems). For wind power prediction, the primary observation made in this paper is that the proposed STPN models can capture the salient spatiotemporal features and it is demonstrated that causal dependencies decrease with an increase in both spatial distances and temporal lags as intuitively expected. Based on such observation, the power prediction for a wind turbine is performed by using the observation from another wind turbine with a high degree of accuracy. For non-intrusive load monitoring, energy disaggregation performance of the proposed STPN framework with and without a convex programming step is evaluated. While the STPN scheme shows that each part of disaggregated energy can be predicted significantly better than state-of-the-art techniques such as FHMM and combinatorial optimization, a convex programming approach based on STPN is able to improve the prediction performance to achieve a further optimized disaggregation involving the constraint – disaggregated energy values should sum up to the total energy usage.

While current efforts are focusing on applying the proposed techniques on real data and problems, some of the other future research directions include:

  1. For wind power prediction – Impact analysis of other physical variables, e.g., wind direction on model quality for wind power prediction;

  2. For energy disaggregation – Joint state prediction by taking multiple variables into account for energy disaggregation;

  3. For energy disaggregation – Weighted factor and penalty term analysis in convex optimization for energy disaggregation.


This work was supported by the National Science Foundation under Grant No. CNS-1464279.


  • (1) F. Ziel, C. Croonenbroeck, D. Ambach, Forecasting wind power–modeling periodic and non-linear effects under conditional heteroscedasticity, Applied Energy 177 (2016) 285–297.
  • (2) G. Liu, M. Ouyang, L. Lu, J. Li, J. Hua, A highly accurate predictive-adaptive method for lithium-ion battery remaining discharge energy prediction in electric vehicle applications, Applied Energy 149 (2015) 297–314.
  • (3) S. Alessandrini, L. Delle Monache, S. Sperati, G. Cervone, An analog ensemble for short-term probabilistic solar power forecast, Applied Energy 157 (2015) 95–110.
  • (4) C. D. Zuluaga, M. A. Álvarez, E. Giraldo, Short-term wind speed prediction based on robust kalman filtering: An experimental comparison, Applied Energy 156 (2015) 321–330.
  • (5) J.-Z. Wang, Y. Wang, P. Jiang, The study and application of a novel hybrid forecasting model–a case study of wind speed forecasting in china, Applied Energy 143 (2015) 472–488.
  • (6) S. Garshasbi, J. Kurnitski, Y. Mohammadi, A hybrid genetic algorithm and monte carlo simulation approach to predict hourly energy consumption and generation by a cluster of net zero energy buildings, Applied Energy 179 (2016) 626–637.
  • (7) Z. Jiang, S. Sarkar, Understanding wind turbine interactions using spatiotemporal pattern network, in: ASME 2015 Dynamic Systems and Control Conference, American Society of Mechanical Engineers, 2015, p. V001T05A001.
  • (8) R. K. Jain, K. M. Smith, P. J. Culligan, J. E. Taylor, Forecasting energy consumption of multi-family residential buildings using support vector regression: Investigating the impact of temporal and spatial monitoring granularity on performance accuracy, Applied Energy 123 (2014) 168–178.
  • (9) H. Liu, J. Shi, E. Erdem, Prediction of wind speed time series using modified taylor kriging method, Energy 35 (12) (2010) 4870–4879.
  • (10) J. Jung, R. P. Broadwater, Current status and future advances for wind speed and power forecasting, Renewable and Sustainable Energy Reviews 31 (2014) 762–777.
  • (11) S.-D. Kwon, Uncertainty analysis of wind energy potential assessment, Applied Energy 87 (3) (2010) 856–865.
  • (12) A. Tascikaraoglu, B. M. Sanandaji, K. Poolla, P. Varaiya, Exploiting sparsity of interconnections in spatio-temporal wind speed forecasting using wavelet transform, Applied Energy 165 (2016) 735–747.
  • (13) A. Tascikaraoglu, B. Sanandaji, G. Chicco, V. Cocina, F. Spertino, O. Erdinc, N. Paterakis, J. P. Catalao, Compressive spatio-temporal forecasting of meteorological quantities and photovoltaic power, IEEE Transactions on Sustainable Energy 7 (2016) 1295–1305.
  • (14) D. Koller, N. Friedman, Probabilistic graphical models: principles and techniques, MIT press, 2009.
  • (15) S. Sarkar, Z. Jiang, A. Akintayo, S. Krishnamurthy, A. Tewari, Probabilistic graphical modeling of distributed cyber-physical systems, in: H. Song, D. B. Rawat, S. Jeschke, C. Brecher (Eds.), Cyber-Physical Systems: Foundations, Principles and Applications, Todd Green, 2016, Ch. 18, pp. 265–286.
  • (16) T. R. Leek, Information extraction using hidden markov models, Ph.D. thesis, University of California, San Diego (1997).
  • (17) L. R. Rabiner, A tutorial on hidden markov models and selected applications in speech recognition, Proceedings of the IEEE 77 (2) (1989) 257–286.
  • (18) M. J. Beal, Z. Ghahramani, C. E. Rasmussen, The infinite hidden markov model, in: Advances in neural information processing systems, 2001, pp. 577–584.
  • (19) K. A. Heller, Y. W. Teh, D. Görür, G. Unit, Infinite hierarchical hidden markov models., in: AISTATS, 2009, pp. 224–231.
  • (20) K. Wakabayashi, T. Miura, Forward-backward activation algorithm for hierarchical hidden markov models, in: Advances in Neural Information Processing Systems, 2012, pp. 1493–1501.
  • (21) A. Ray, Symbolic dynamic analysis of complex systems for anomaly detection, Signal Processing 84 (7) (2004) 1115–1130.
  • (22) V. Rajagopalan, A. Ray, Symbolic time series analysis via wavelet-based partitioning, Signal Processing 86 (11) (2006) 3309–3320.
  • (23) S. Sarkar, A. Srivastav, M. Shashanka, Maximally bijective discretization for data-driven modeling of complex systems, in: Proceedings of American Control Conference, Washington, DC, USA (June, 2013) 2674–2679.
  • (24) K. Mukherjee, A. Ray, State splitting and merging in probabilistic finite state automata for signal representation and analysis, Signal processing 104 (2014) 105–119.
  • (25) S. Sarkar, K. Mukherjee, S. Sarkar, A. Ray, Symbolic dynamic analysis of transient time series for fault detection in gas turbine engines, Journal of Dynamic Systems, Measurement, and Control 135 (1) (2013) 014506.
  • (26) A. Akintayo, S. Sarkar, A symbolic dynamic filtering approach to unsupervised hierarchical feature extraction from time-series data, in: 2015 American Control Conference (ACC), IEEE, 2015, pp. 5824–5829.
  • (27) S. Sarkar, S. Sarkar, K. Mukherjee, A. Ray, A. Srivastav, Multi-sensor information fusion for fault detection in aircraft gas turbine engines, Proc IMechE Part G: J Aerospace Engineering 227 (2012) 1988–2001.
  • (28) S. Sarkar, S. Sarkar, N. Virani, A. Ray, M. Yasar, Sensor fusion for fault detection & classification in distributed physical processes, frontiers in Robotics and AI - Sensor Fusion and Machine Perception.
  • (29) X. Jin, Y. Guo, S. Sarkar, A. Ray, R. M. Edwards, Anomaly detection in nuclear power plants via symbolic dynamic filtering, Nuclear Science, IEEE Transactions on 58 (1).
  • (30) S. Chakraborty, S. Sarkar, S. Gupta, A. Ray, Damage monitoring of refractory wall in a generic entrained-bed slagging gasification system, Proceedings of I Mech E Part A: Journal of Power and Energy 222 (8) (October, 2008) 791–807.
  • (31) C. Liu, Y. Gong, S. Laflamme, B. Phares, S. Sarkar, Bridge damage detection using spatiotemporal patterns extracted from dense sensor network, Measurement Science and Technology 28 (1) (2017) 014011.
  • (32) G. Hart, Nonintrusive appliance load monitoring, Proceedings of IEEE 80 (12).
  • (33) M. Zeifman, K. Roth, Non-intrusive appliance load monitoring (nialm): Review and outlook, International Conference on Consumer Electronics (2011) 1–27.
  • (34) A. Cominola, M. Giuliani, D. Piga, A. Castelletti, A. Rizzoli, A hybrid signature-based iterative disaggregation algorithm for non-intrusive load monitoring, Applied Energy 185 (2017) 331–344.
  • (35) M. B. Kennel, M. Buhl, Estimating good discrete partitions from observed data: Symbolic false nearest neighbors, Phys. Rev. Lett. 91 (2003) 084102. doi:10.1103/PhysRevLett.91.084102.
  • (36) A. Subbu, A. Ray, Space partitioning via hilbert transform for symbolic time series analysis, Applied Physics Letters 92 (2008) 084107.
  • (37) I. Chattopadhyay, Causality network, arXiv:1406.6651v1[cs.LG].
  • (38)
  • (39) K. Murphy, Hidden markov model (hmm) toolbox for matlab, 2005, URL http://www.cs.ubc.ca/murphyk/Software/HMM/hmm.html.
  • (40) R. Hendron, C. Engebrecht, Building america house simulation protocols (revised), Tech. rep., National Renewable Energy Laboratory (NREL), Golden, CO. (2010).
  • (41) Z. Ghahramani, M. I. Jordan, Factorial hidden markov models, Kluwer Academic Publishers, Boston, MA (1997).
  • (42) J. Z. Kolter, T. Jaakkola, Approximate inference in additive factorial hidden markov models with application in energy disaggregation, in: Proceedings of the International Conference on Artificial Intelligence and Statistics, La Palma, Canary Islands (2012) 1472–1482.
  • (43) W. J. Cook, W. H. Cunningham, W. R. Pulleyblank, A. Schrijver, Combinatorial Optimization, 5th Edition, Vol. 21, Springer, 53113 Bonn, Germany, 2011.
  • (44) N. Batra, J. Kelly, O. Parson, H. Dutta, W. Knottenbelt, A. Rogers, A. Singh, M. Srivastava, Nilmtk: An open source toolkit for non-intrusive load monitoring, 5th International Conference on Future Energy Systems (ACM e-Energy), Cambridge UK (2014) 1–14.
  • (45) M. Grant, S. Boyd, Y. Ye, Cvx: Matlab software for disciplined convex programming (2008).