1 Introduction
Energy prediction is essential for operating, monitoring, and optimizing (in terms of efficiency and cost) diverse energy systems, from the supply side (e.g., wind energy, solar energy, power systems, batteries) to the demand side (e.g., load monitoring, electric vehicle usage, building energy management). Numerous studies predict energy generation or consumption from time-series data ziel2016forecasting ; liu2015highly ; alessandrini2015analog ; zuluaga2015short ; wang2015study ; garshasbi2016hybrid . For instance, Kalman filtering, wavelet packet transforms, and least squares support vector machines are used to predict wind power performance zuluaga2015short ; wang2015study , while an analog ensemble method is applied to forecast solar power alessandrini2015analog . Liu et al. liu2015highly predict the remaining state of charge of electric vehicle batteries based on predictive control theory. Hybrid genetic algorithms and Monte Carlo simulation approaches are applied to predict energy generation and consumption in net-zero energy buildings garshasbi2016hybrid . Modern energy systems usually involve a large number of subsystems. For example, hundreds of wind turbines are closely collocated in a wind farm, where the wind resource is similar and the turbines are subject to analogous conditions in terms of power transmission to the power system. As a result, the outputs of the individual turbines are related to one another, and the characteristics of their spatial interactions can potentially be applied to prediction jiang2015understanding and design optimization. The prediction approaches discussed above can be viewed as methods of exploring temporal relationships. Spatial and temporal relationships are widespread in energy systems jain2014forecasting ; liu2010prediction ; jung2014current ; kwon2010uncertainty , yet spatiotemporal features are less commonly leveraged for energy prediction problems. The exploration of such features has been shown to be effective in wind speed forecasting problems tascikaraoglu2016exploiting ; jung2014current ; tascikaraoglucompressive .

To facilitate energy prediction for systems with both spatial and temporal characteristics, probabilistic graphical models (PGMs) can be employed, since spatiotemporal interactions are naturally suited to graph representations and can be evaluated through the associated probabilities. PGMs comprise a variety of models described by conditional dependence structures, so-called graphs, including Bayesian networks and undirected/directed Markov networks, and can be used to deal with dynamical systems and relational data koller2009probabilistic . Bayesian networks are a type of PGM that captures causal relationships using directed edges koller2009probabilistic , where the overall joint probability distribution of the network nodes (variables) is computed as a product of the conditional distributions (factors) defined by the nodes in the network. However, prediction problems are not straightforward for Bayesian networks, as they only encode node-based conditional probabilities, and the approximation of the joint distribution using node-based structures is often intractable sarkar2016pgm . This is because a given directed acyclic graphical structure may not allow for easy and exact computation of certain probabilities related to inference questions.

Markov models, as a class of statistical models, have been widely applied in different domains, e.g., natural language processing and speech recognition leek1997information . These models have been shown to be efficient in identifying the probabilistic dependencies among random variables in both directed and undirected settings. Hidden Markov models (HMMs) have been particularly successful for learning the temporal dynamics of an underlying process rabiner1989tutorial . Several modifications of HMMs have been proposed: the infinite HMM (IHMM) beal2001infinite , which integrates several parameters into three hyperparameters to model countably infinite hidden state sequences; the infinite hierarchical HMM (IHHMM) heller2009infinite , which extends HMMs to an infinite number of hierarchical levels; and a forward-backward algorithm that reduces model complexity through the order of operations wakabayashi2012forward . However, Markov models with hidden states usually rely on iterative learning algorithms that may be computationally expensive. To alleviate such issues, symbolic dynamic filtering (SDF) was proposed ray2004symbolic ; rajagopalan2006symbolic based on the concepts of symbolic dynamics and probabilistic finite state automata (PFSA). Several improvements related to coarse graining of continuous variables SSS13 , state splitting and merging techniques for PFSA mukherjee2014state , efficient inference algorithms sarkar2013symbolic , and hierarchical model learning akintayo2015symbolic have been proposed over the last decade within the SDF framework. SDF has been shown to be extremely efficient for anomaly detection and fault diagnostics of various complex systems, such as gas turbine engines SSM12 , shipboard auxiliary systems SSV14 , nuclear power plants JGSRE11 , coal gasification systems CSGR08 , and bridge monitoring processes LGLPS17 .

To address prediction problems in disparate energy systems, this work presents a new data-driven framework, the spatiotemporal pattern network (STPN), which leverages the spatiotemporal interactions of energy systems for prediction. Built on SDF, an STPN aims to capture the spatiotemporal characteristics of complex energy systems and to perform prediction at both spatial and temporal resolutions. For validation, two representative case studies are presented. The first is taken from the energy supply side: wind power prediction in a large-scale wind farm. The second is from the energy demand side: energy disaggregation, also known as non-intrusive load monitoring (NILM), a well-established problem that involves disaggregating the total electrical energy consumption of a household into its constituent load components without the need for extensive metering installations on individual households or appliances GH92 ; MZKR11 ; cominola2017hybrid .
Contributions: First, a novel data-driven method for energy prediction based on the STPN framework is proposed, and the concepts of interest and relevance are established. Second, two typical case studies, based on wind turbine power (supply-side energy) and residential building energy disaggregation (demand-side energy), are performed to validate the proposed scheme. For wind turbine power prediction, the spatiotemporal characteristics between different wind turbines are identified, while for home energy disaggregation the complex coupled temporal features are identified. An STPN-based convex programming formulation is also presented in order to improve energy disaggregation performance. Finally, we present a comparative study of the energy prediction performance of the proposed technique against other state-of-the-art methods for both cases.
The remaining sections are organized as follows. Section 2 presents the necessary background on SDF and the concept of a Markov machine. Section 3 gives the prediction approach based on STPN. Two typical case studies for validating the proposed framework, i.e., the supply side (wind turbines) and the demand side (NILM), are presented in Section 4 and Section 5, respectively. Section 6 offers concluding remarks and future research directions.
2 Symbolic Dynamic Filtering and Markov Machines
This section gives the background on symbolic dynamic filtering necessary to characterize the proposed prediction method. We refer interested readers to SSS13 for more details. SDF is built upon concepts from discrete dynamical systems, in which discretization and symbolization are the critical steps that convert observed continuous data into discrete symbol sequences. The dynamical systems can then be studied in deterministic or probabilistic settings in the symbolic space by using language-theoretic approaches, e.g., shift maps and sliding block codes. The simplest partitioning approaches are uniform partitioning and maximum entropy partitioning, although these two methods have mainly been applied to simple dynamical systems whose data have low variance. State-of-the-art partitioning or discretization approaches include symbolic false nearest neighbor partitioning (SFNNP) PhysRevLett.91.084102 , wavelet transform SSS13 , and Hilbert-transform-based analytic signal space partitioning (ASSP) SR08 . Recently, a supervised partitioning scheme, maximally bijective discretization (MBD) SSS13 , has been proposed for modeling and analyzing complex dynamical systems. Unlike the other methods, MBD maximally preserves, after discretization, the input-output relationship that originates in the continuous domain.

After discretization of the time-series data in the continuous domain, symbolization is carried out to establish the $D$-Markov machines. A critical assumption of SDF is that any symbol sequence generated from time-series data can be approximated as a Markov chain of order $D$ (a positive integer). Such a Markov chain is called a $D$-Markov machine, and it is used to model each time series through the temporal features associated with its symbol sequence. Some relevant definitions are given more formally as follows.

Definition 2.1.
SSV14 (DFSA) A deterministic finite state automaton (DFSA) is a 3-tuple $G = (\Sigma, Q, \delta)$ where:

$\Sigma$ is a set of finite size for the symbol alphabet and $\Sigma \neq \emptyset$;

$Q$ is a set of finite size for states and $Q \neq \emptyset$;

$\delta: Q \times \Sigma \to Q$ is the mapping function for state transition;
while $\Sigma^{\star}$ represents the collection of all finite symbol sequences from $\Sigma$ including the empty sequence $\epsilon$.
Definition 2.2.
SSV14 (PFSA) A probabilistic finite state automaton (PFSA) extends a DFSA $G$ to the probabilistic setting as a pair $(G, \pi)$, i.e., the PFSA is a 4-tuple $P = (\Sigma, Q, \delta, \pi)$, where:

$\Sigma$, $Q$, and $\delta$ have the same definitions as in Definition 2.1;

$\pi: Q \times \Sigma \to [0, 1]$ is the symbol generation function, i.e., the probability morph function, which satisfies $\sum_{\sigma \in \Sigma} \pi(q, \sigma) = 1$ for all $q \in Q$, where $\pi_{ij}$ indicates the probability of the symbol $\sigma_j \in \Sigma$ occurring with the state $q_i \in Q$.
Definition 2.3.
SSV14 ($D$-Markov) A $D$-Markov machine is a PFSA in which the previous $D$ symbols form a state, as defined by:

$D$ signifies the depth of the Markov machine;

$Q$ is a set of finite size for states with $|Q| \leq |\Sigma|^D$, i.e., each state in a Markov machine is identified by some equivalence class of symbol strings of length $D$ with symbols in $\Sigma$;

$\delta: Q \times \Sigma \to Q$ signifies the state transition function such that if $|Q| = |\Sigma|^D$, then there exist two symbols $\alpha, \beta \in \Sigma$ and a string $x \in \Sigma^{\star}$ such that $\delta(\alpha x, \beta) = x\beta$ and $\alpha x, x\beta \in Q$.
Remark 2.1.
Based on Definition 2.3, a $D$-Markov machine is naturally a stationary stochastic process $S = \cdots s_{-1} s_0 s_1 \cdots$, in which the probability of occurrence of a new symbol is determined by the last $D$ symbols, i.e., $P(s_n \mid \cdots s_{n-D} \cdots s_{n-1}) = P(s_n \mid s_{n-D} \cdots s_{n-1})$.
We denote by $\Pi$ the state transition matrix, each entry of which gives the transition probability from one symbolic state to another. As a simple example, let the state of a dynamical system at instant $n$ be $q_n$; then the $jk$-th entry $\pi_{jk}$ of the matrix $\Pi$ is the probability that $q_{n+1}$ is $k$ given that the previous state $q_n$ was $j$, i.e., $\pi_{jk} = P(q_{n+1} = k \mid q_n = j)$.
Moreover, an individual dynamical system can be modeled using a $D$-Markov machine. Because a $D$-Markov machine cannot capture the interaction dependencies among multiple systems or subsystems in a large complex system, it has recently been extended to the x$D$-Markov machine, which was originally developed to obtain the causal dependencies among different systems or subsystems. Unlike correlation-based analysis, such a model can efficiently build up and generalize the causal dependencies C14 . The formal definition of the x$D$-Markov machine is as follows.
Definition 2.4.
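SSV14 (x$D$-Markov) Let $M_1 = (Q_1, \Sigma_1, \delta_1, \tilde{\Pi}_1)$ and $M_2 = (Q_2, \Sigma_2, \delta_2, \tilde{\Pi}_2)$ be the PFSAs which correspond to symbol streams $\{s_1\}$ and $\{s_2\}$, respectively. Then an x$D$-Markov machine is defined as a 5-tuple $M_{1 \to 2} = (Q_1, \Sigma_1, \Sigma_2, \delta_1, \tilde{\Pi}_{12})$ such that:

$\Sigma_1$ represents the alphabet set of symbol sequence $\{s_1\}$;

$Q_1$ is the state set which corresponds to symbol sequence $\{s_1\}$;

$\Sigma_2$ represents the alphabet set of symbol sequence $\{s_2\}$;

$\delta_1: Q_1 \times \Sigma_1 \to Q_1$ gives the state transition mapping that maps the transition in symbol sequence $\{s_1\}$ from one state to another based on the occurrence of a symbol in $\{s_1\}$;

$\tilde{\Pi}_{12}$ is the symbol generation matrix of size $|Q_1| \times |\Sigma_2|$; the $ij$-th entry of $\tilde{\Pi}_{12}$ denotes the probability of obtaining the symbol $\sigma_j$ of $\{s_2\}$ while making a transition from the state $q_i$ of $\{s_1\}$.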
It can thus be observed that a $D$-Markov machine gives the probability of a new symbol occurring after the previous $D$ symbols within an individual symbol sequence. In order to obtain the probability of a new symbol occurring in one symbol sequence given the last $D$ symbols of a different symbol sequence, an x$D$-Markov machine can be applied instead. Equivalently, an x$D$-Markov machine captures the causal dependency of one symbol sequence on another.
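As a rough illustration of how a state transition matrix and a cross symbol generation matrix might be estimated from data, the following Python sketch counts transitions for depth 1, where a state is simply the last symbol. The function name `transition_matrix`, the toy sequences, and the uniform fallback for unvisited states are assumptions made for illustration; they are not taken from the SDF literature.

```python
from collections import Counter


def transition_matrix(states, next_symbols, n_states, n_symbols):
    """Row-normalised counts of (state at n, symbol at n + 1) pairs.

    Passing the same sequence twice gives a self-transition matrix;
    passing two different sequences gives a cross matrix (depth 1 assumed).
    """
    counts = Counter(zip(states[:-1], next_symbols[1:]))
    matrix = []
    for q in range(n_states):
        row = [counts[(q, s)] for s in range(n_symbols)]
        total = sum(row)
        # Assumption: an unvisited state gets a uniform row so that
        # every row remains a valid probability distribution.
        matrix.append([c / total if total else 1.0 / n_symbols for c in row])
    return matrix


if __name__ == "__main__":
    seq_1 = [0, 0, 1, 1, 0, 1, 0, 0]  # hypothetical symbol stream of system 1
    seq_2 = [1, 1, 0, 0, 1, 0, 1, 1]  # hypothetical symbol stream of system 2
    print("self :", transition_matrix(seq_1, seq_1, 2, 2))
    print("cross:", transition_matrix(seq_1, seq_2, 2, 2))
```

For larger depths, each state would index a block of consecutive symbols rather than a single symbol, and the number of rows grows accordingly.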
3 Spatiotemporal Pattern Network
This section presents how to construct the spatiotemporal pattern network (STPN) for two dynamical systems, $A$ and $B$, based on the concepts of SDF introduced above. We start with data partitioning/discretization and symbolization, followed by the details of STPN construction.
3.1 Discretization and Symbolization
Suppose there are two different dynamical systems $A$ and $B$. In real-world problems such as wind power prediction, $A$ and $B$ can represent two different wind turbines in a large wind farm. Alternatively, in residential home energy disaggregation, $A$ and $B$ could represent the electricity consumption of the HVAC system and that of all appliances. For each system there are various measured variables, and typically a few key observations are selected to establish and analyze the model. For a wind turbine, for example, wind speed and wind power are the two key observations for power prediction. Other variables, e.g., wind direction and humidity, may also affect power and can be taken into account as well. The first step in modeling dynamical systems in terms of symbolic dynamics is data discretization. As mentioned above, many approaches can be used; in this paper, maximally bijective discretization (MBD) is applied to the supply-side dynamical systems (wind turbines), and maximum entropy partitioning is used for the demand-side dynamical systems (HVAC, appliances, etc.). Different methods are selected because the measured variables differ. For wind turbines, wind speed and wind power are chosen, and their input-output relation in the continuous domain can be maximally preserved. For home energy disaggregation, however, the only variable for each part of the home energy use is the energy consumption itself, so there is no input-output relation in the continuous domain.
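To make the discretization step concrete, the sketch below illustrates maximum entropy partitioning in Python, under the common reading that bin edges are placed at equal-frequency quantiles so that every symbol is roughly equally likely. The function names, the toy load values, and the choice of three symbols are hypothetical; MBD is not sketched here.

```python
from bisect import bisect_right


def max_entropy_edges(values, n_symbols):
    """Return n_symbols - 1 interior bin edges at equal-frequency quantiles."""
    ordered = sorted(values)
    n = len(ordered)
    return [ordered[(k * n) // n_symbols] for k in range(1, n_symbols)]


def symbolize(values, edges):
    """Map each value to the index of the bin it falls into (0 .. len(edges))."""
    return [bisect_right(edges, v) for v in values]


if __name__ == "__main__":
    # Hypothetical hourly energy readings for one end use.
    load = [0.2, 0.3, 0.25, 1.8, 2.1, 0.4, 5.0, 4.7, 0.35, 2.0, 1.9, 4.9]
    edges = max_entropy_edges(load, n_symbols=3)
    print("edges  :", edges)
    print("symbols:", symbolize(load, edges))
```

Because the edges follow the data distribution, densely populated regions of the signal receive narrower bins than sparsely populated ones.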
3.2 Symbolic Modeling of Dynamical Systems and Interactions
Figure 1 shows symbol sequence generation in the form of a PFSA using two different methods, i.e., maximally bijective discretization and maximum entropy partitioning. As discussed before, a $D$-Markov machine can be represented by a PFSA in which the previous $D$ symbols indicate one state. In this context, we consider two different systems in order to quantify their spatiotemporal or temporal relations. In Figure 2, the state transition matrices $\Pi^{A}$ and $\Pi^{B}$ show the self-relations of systems $A$ and $B$, respectively. The cross-state transition matrices $\Pi^{AB}$ and $\Pi^{BA}$ correspondingly represent the cause-effect relations from $A$ to $B$ and from $B$ to $A$. It should be noted, however, that such causal dependencies between systems $A$ and $B$ are not necessarily equivalent. To quantify the relations in a $D$-Markov machine and an x$D$-Markov machine, atomic patterns (AP) and relational patterns (RP) were introduced in SSV14 , where more details can be found. More formally, with $q^A_n$ and $q^B_n$ denoting the states of systems $A$ and $B$ at instant $n$, the entries of the cross-state transition matrices $\Pi^{AB}$ and $\Pi^{BA}$ can be expressed by:
$$\pi^{AB}_{kl} = P(q^B_{n+1} = l \mid q^A_n = k), \qquad \pi^{BA}_{ij} = P(q^A_{n+1} = j \mid q^B_n = i),$$
where $j, k \in Q^A$ and $i, l \in Q^B$. The above relations show that a cross-state transition matrix can be constructed from the symbol sequences obtained from two different dynamical systems, and that each entry of each matrix signifies the transition probability from one state in the first dynamical system to another state in the second dynamical system. For instance, $\pi^{AB}_{kl}$ is the transition probability from state $k$ in system $A$ to state $l$ in system $B$.
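Moreover, we use an information-theoretic metric to quantify the value of the atomic and relational patterns (in this work the relational pattern is the major concern). Mutual information is the metric used for this purpose. For example, in Figure 2, we denote by $I^{AA}$ and $I^{AB}$ the mutual information of the atomic pattern of system $A$ and of the relational pattern from system $A$ to system $B$, respectively, where $q^A_n$ and $q^B_n$ denote the states of systems $A$ and $B$ at instant $n$. Formally, the atomic pattern of system $A$ is expressed as follows:
$$I^{AA} = I(q^A_{n+1}; q^A_n) = H(q^A_{n+1}) - H(q^A_{n+1} \mid q^A_n),$$
where
$$H(q^A_{n+1}) = -\sum_{i} P(q^A_{n+1} = i) \log_2 P(q^A_{n+1} = i),$$
$$H(q^A_{n+1} \mid q^A_n) = \sum_{i} P(q^A_n = i)\, H(q^A_{n+1} \mid q^A_n = i),$$
$$H(q^A_{n+1} \mid q^A_n = i) = -\sum_{j} P(q^A_{n+1} = j \mid q^A_n = i) \log_2 P(q^A_{n+1} = j \mid q^A_n = i).$$
Therefore, based on the quantity $I^{AA}$ (defined using the different entropy values presented above), the temporal self-prediction capability of system $A$ can be identified.
On the other hand, the mutual information for the relational pattern involving systems $A$ and $B$ can be described as:
$$I^{AB} = I(q^B_{n+1}; q^A_n) = H(q^B_{n+1}) - H(q^B_{n+1} \mid q^A_n),$$
where
$$H(q^B_{n+1} \mid q^A_n) = \sum_{i} P(q^A_n = i)\, H(q^B_{n+1} \mid q^A_n = i),$$
$$H(q^B_{n+1} \mid q^A_n = i) = -\sum_{j} P(q^B_{n+1} = j \mid q^A_n = i) \log_2 P(q^B_{n+1} = j \mid q^A_n = i).$$
Hence, the quantity $I^{AB}$ identifies system $A$'s capability of predicting system $B$'s outputs, and vice versa for $I^{BA}$. Furthermore, based on the mutual information, patterns can be assigned weights, so that patterns with low mutual information may be rejected to simplify the model. Interested readers can find more details in SSV14 .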
The above analysis shows that the proposed STPN can be an effective tool for capturing the spatiotemporal interactions between different dynamical systems. To validate this data-driven method, this paper offers two case studies, one on supply-side dynamical systems (wind turbines in a wind farm) and one on demand-side dynamical systems (home electric energy disaggregation). The prediction process is as follows. Given a training data set in the continuous domain, we use partitioning methods to discretize and symbolize the data and then learn the x$D$-Markov machine. The resulting probability transition matrices are used for prediction in the symbolic or the continuous domain. For symbolic prediction, we find the most likely symbol sequence for system $B$ given a symbol sequence of system $A$ by running the x$D$-Markov model numerous times. In the continuous domain, the prediction is obtained from the symbolic prediction using the expectation:
$$E(t) = \sum_{j} P_j(t)\, E_j \tag{1}$$
where $E(t)$ represents the expectation of energy at instant $t$, $P_j(t)$ signifies the probability of symbol $\sigma_j$ occurring at instant $t$ after running numerous Monte Carlo simulations of the Markov chain, and $E_j$ indicates the expectation of energy for the discrete bin labelled by symbol $\sigma_j$ (suppose that in that bin there are discrete symbols).
The pseudocode of energy prediction based on STPN is as follows.
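A minimal Python sketch of the continuous-domain step is given below. It is not the paper's algorithm listing: it assumes depth 1, takes the symbol probabilities directly from the row of an already estimated cross matrix instead of estimating them by repeated simulation, and uses the mean training energy in each bin as the bin expectation. The names `bin_means` and `predict_energy` and all numbers are hypothetical.

```python
def bin_means(energy, symbols, n_symbols):
    """Mean training energy inside each discretization bin."""
    sums = [0.0] * n_symbols
    counts = [0] * n_symbols
    for e, s in zip(energy, symbols):
        sums[s] += e
        counts[s] += 1
    # Assumption: an empty bin contributes an expectation of zero.
    return [sums[s] / counts[s] if counts[s] else 0.0 for s in range(n_symbols)]


def predict_energy(observed_states, cross_matrix, means):
    """Expected energy at the next instant for each observed source state.

    Each prediction is the probability-weighted sum of the bin means,
    with probabilities read from the cross matrix row of the observed state.
    """
    return [
        sum(p * m for p, m in zip(cross_matrix[q], means))
        for q in observed_states
    ]


if __name__ == "__main__":
    means = bin_means([1.0, 3.0, 5.0, 7.0], [0, 0, 1, 1], n_symbols=2)
    cross = [[0.75, 0.25], [0.0, 1.0]]  # hypothetical cross matrix
    print(predict_energy([0, 1, 1, 0], cross, means))
```

A finer partition gives more bins and hence a smoother expectation, at the cost of needing more data to estimate each row reliably.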
4 Supply Side: Wind Turbines
4.1 Geographical information
In this subsection, a case study based on supply-side energy systems, i.e., wind turbines, is used to validate the data-driven method proposed in this work. The STPN framework is applied to a wind turbine network in order to capture the causal dependencies among different wind turbines, which can be regarded as subsystems of a wind farm. This paper uses the 2006 Western Wind Integration data set obtained from NREL NREL06 to uncover causal dependencies that are vitally important to individual wind turbine power prediction in a mutual turbine-turbine setting. To establish the STPN, twelve wind turbines located in California with capacity factors in excess of 40% are chosen; their IDs are 4494, 4495, 4496, 4497, 4423, 4424, 4425, 4426, 4427, 4361, 4313, and 4314 (labeled 1 to 12 in this context), and their capacity factors are approximately between 41% and 45%. For completeness, the geographical information of the wind turbines is also provided. The annual average wind velocity in the area where the considered turbines are located is around 9 m/s, with elevations from 1019 to 1207 m.
As shown in Figure 3, the twelve wind turbines are distributed over various locations and are identified as nodes in the STPN represented in Figure 4. Figure 5 shows the relation between wind speed and wind power for one turbine; the other wind turbines follow the same pattern. Because the input-output relation of a wind turbine is significant, MBD is used to maximally preserve this correspondence in the symbolic domain. The spatiotemporal patterns, and in particular the relational patterns, between different wind turbines can be extracted from the symbol sequences. Figure 6 shows an instance of a symbol sequence for a wind turbine; most of the symbols are 1, 5, 8, and 9.
4.2 Results and Discussion
The mutual information of the RP between a pair of wind turbines is investigated first, based on the state transition matrices generated by x$D$-Markov machines. We set the depth to 1 for simplicity, although the parameter can be increased. Consequently, the current state of one selected wind turbine depends only on the last state of another selected wind turbine. The effect of time lag on the mutual information between wind turbines is studied to address the temporal characteristics. The results in Figure 7 show that the mutual information decreases as the time lag increases. Thus, in this work the causal dependency between any two different wind turbines is maximized at time lag 1.
The spatial characteristics between two different wind turbines are another critical factor in STPN. Wind turbines 5, 6, 7, 1, and 10 are chosen for this analysis. Figure 8 shows that the causal dependency between any two wind turbines decreases as the geographical (spatial) distance between them increases along any direction. Figure 9 also illustrates that the mutual-information-based metric for a pair of wind turbines exhibits a generally decreasing trend with the Euclidean distance between them. In summary, both observations indicate that the metric based on mutual information is an effective means of capturing the spatial and temporal patterns in wind turbine systems.
Next, we evaluate the effectiveness of the STPN in revealing causal dependencies through wind power prediction. The symbolic and continuous predictions of the power of one wind turbine are based on the observed symbol sequence of another turbine. Following the energy prediction procedure described above, Figure 10 and Figure 11 show the symbol prediction results, in which the symbol sequences predicted for wind turbine 5 from the observations of wind turbines 6 and 7, respectively, are compared with the true symbol sequences of wind turbine 5. The model is trained on data from the first half of 2006 and tested on data from the second half. The two plots show that the proposed x$D$-Markov machines have a strong prediction capability most of the time, while some errors may come from transient symbols. Moreover, the prediction by wind turbine 6 is slightly better than that by wind turbine 7, as implied by the mutual information.
Figure 12 shows the mean square error (MSE) as a function of the spatial distance between pairs of wind turbines, using wind turbines 5, 6, 7, 8, and 9; it displays a monotonically increasing trend. This demonstrates the symbolic prediction capacity of the proposed STPN. To validate the energy prediction method in the continuous domain, an example is shown in which the energy of wind turbine 5 is predicted from the observed symbol sequence of wind turbine 6. Figure 13 shows that the continuous-domain prediction captures the major trend in the actual data well, as the MBD partitioning method is effective in preserving the input-output relation. A finer discretization may further improve the continuous-domain prediction, although it requires a larger amount of data and correspondingly increases the computational complexity.
To evaluate the proposed scheme in wind power prediction, we compare the STPN framework with a popular approach, namely the Hidden Markov Model (HMM) with mixture, which is adapted from the HMM to deal with multiple variables. A MATLAB-compatible toolbox murphy2013hidden is used for this purpose. The results in Figure 13 show that, under visual inspection, the proposed prediction method based on the STPN framework outperforms the HMM with mixture. Quantitatively, while the MSE for predicted power using HMM with mixture is , the MSE for predicted power using the proposed algorithm is . Therefore, it can be concluded that the STPN scheme, in which causal dependencies between different wind turbines are captured, is an effective technique for wind power prediction.
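For reference, the MSE used in such comparisons can be computed as in the short Python sketch below. The function name and the sample values are hypothetical and do not reproduce any result reported here.

```python
def mean_squared_error(actual, predicted):
    """Average of the squared differences between two equally long series."""
    if len(actual) != len(predicted):
        raise ValueError("series must have the same length")
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


if __name__ == "__main__":
    actual = [1.0, 2.0, 3.0]     # hypothetical measured power
    predicted = [1.0, 2.0, 5.0]  # hypothetical predicted power
    print(mean_squared_error(actual, predicted))
```

Because the errors are squared, a few large misses on transient peaks weigh more heavily than many small deviations.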
5 Demand Side: Non-Intrusive Load Monitoring
This section presents a second case study based on demand-side energy systems, namely non-intrusive load monitoring (NILM) of electrical demand, with the purpose of identifying the electric load components of residential homes. As in the previous section, the STPN framework is used, here for electric load component disaggregation. In order to best identify the disaggregated energy usage of each energy-consuming component from the total energy consumption, convex programming is also applied. This is necessary because NILM has no clear input-output relation, so the results obtained by the STPN alone may not be optimal. Here, optimal means that all load components of the residential energy consumer sum to the whole-building electricity use. Therefore, a convex-programming-based modification is applied to the STPN prediction results to achieve this optimal disaggregation.
5.1 Problem Description
For this case, the data set used for energy disaggregation is the Building America 2010 data set available from NREL hendron2010building . The data are for the hot and dry location of Bakersfield, California, with ample heating, ventilation, and air-conditioning (HVAC) use in the summer, and include the whole building electric (WBE) load, which is the sum of HVAC, lights (LIGHTS), appliances (APPL), and miscellaneous electric loads (MELS). The goal is to use the measured WBE time series to predict HVAC, LIGHTS, APPL, and MELS individually. WBE is the only known variable. For each prediction, one month of data is used, of which the first three weeks are used for training the model and the fourth week for testing.
Convex Programming: Before stating the prediction results, the convex programming problem is formulated for completeness. Suppose that the results obtained by the STPN framework are the ground truth for each part except WBE. The optimization problem can then be expressed as
$$\min_{x_1, \dots, x_4} \; \sum_{i=1}^{4} \| x_i - \hat{x}_i \|_2 \quad \text{subject to} \quad \sum_{i=1}^{4} x_i = y \tag{2}$$
where $x_i$ represent the decision variables to be determined, $\hat{x}_i$ signify the prediction results obtained from STPN, $y$ is the known values of WBE, and $\| x_i - \hat{x}_i \|_2$ is the Euclidean norm between $x_i$ and $\hat{x}_i$.
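The effect of such a sum constraint can be illustrated with a simplified variant that has a closed-form solution. The Python sketch below assumes a squared Euclidean objective at a single time step and no other constraints (for example, no non-negativity bounds), in which case the optimum spreads the mismatch equally over all components. This is a swapped-in simplification for illustration, not the solver or exact formulation used in this work, and the function name `reconcile` is hypothetical.

```python
def reconcile(predictions, total):
    """Closest point (in squared Euclidean distance) whose entries sum to `total`.

    With only the equality constraint, the Lagrange conditions give every
    component the same correction: (total - sum(predictions)) / n.
    """
    share = (total - sum(predictions)) / len(predictions)
    return [p + share for p in predictions]


if __name__ == "__main__":
    # Hypothetical STPN predictions for HVAC, LIGHTS, APPL, MELS at one instant.
    stpn = [2.0, 0.5, 0.75, 0.25]
    wbe = 4.0  # hypothetical measured whole-building value
    adjusted = reconcile(stpn, wbe)
    print(adjusted, sum(adjusted))
```

With inequality constraints or an unsquared norm, a general convex solver would be needed instead of this closed form.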
The pseudocode of energy prediction based on the STPN framework and convex programming is shown as follows. Throughout the rest of the analysis, STPN+convex programming refers to the combination of the STPN framework and the convex programming technique.
Factorial Hidden Markov Model: The Factorial Hidden Markov Model (FHMM) ZGMJ97 is an extension of the Hidden Markov Model that runs multiple Markov models in parallel in a distributed manner and performs task-related inference to arrive at the predicted observation. In this application, each end use is represented as a hidden state modeled by a multinomial distribution over discrete values, and the individual, independent contributions of the appliance meters are summed to give the expected observation (i.e., the total expected main meter value). The AFAMAP JZKTJ12 variant of FHMM, which includes the trends in the hidden states of the FHMM, has also been reported to be effective in the disaggregation task. In our application of FHMM, the number of hidden states is the number of testing appliances, while = 3 in order to keep the computational requirements low.
Combinatorial Optimization: The combinatorial optimization (CO) BKJV11 algorithm is a heuristic scheme that attempts to minimize the –norm of the difference between the total power at the mains and the sum of the power of the end uses, given either a single-state or a multi-state formulation of the sum. The drawbacks of CO for disaggregation tasks are its sensitivity to transients and its degradation with an increasing number of devices or with similarity in device characteristics.

We applied the algorithms as available in the non-intrusive load monitoring toolkit NJOHWAAM14 , with exact inference ZGMJ97 for the FHMM.
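The idea behind combinatorial optimization can be sketched in a few lines of Python: enumerate one power level per end use and keep the combination whose sum is closest to the mains reading. The sketch assumes an absolute-difference criterion and a small, hand-picked set of power levels; it is an illustration of the technique and not the toolkit implementation, and the function name and all values are hypothetical.

```python
from itertools import product


def combinatorial_disaggregate(mains, state_powers):
    """Pick one power level per end use so that their sum is closest to `mains`.

    `state_powers` maps each end-use name to its candidate power levels.
    The search is exhaustive, so its cost grows exponentially with the
    number of end uses.
    """
    best = min(
        product(*state_powers.values()),
        key=lambda combo: abs(mains - sum(combo)),
    )
    return dict(zip(state_powers.keys(), best))


if __name__ == "__main__":
    levels = {
        "HVAC": [0.0, 1.5, 3.0],
        "LIGHTS": [0.0, 0.2],
        "APPL": [0.0, 0.5],
    }
    print(combinatorial_disaggregate(3.7, levels))
```

The sketch also makes the stated drawbacks visible: end uses with similar power levels produce near-ties, and each time step is solved independently, so transients are not smoothed.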
5.2 Results and Discussion
To validate the proposed energy prediction approach, two months, April and July, are selected to study the prediction performance. The Building America 2010 data set has a 1-hour sampling interval, and only three weeks of data are used for training, so the amount of data may not be sufficient for constructing the STPN. Building an STPN with insufficient data may result in poor estimates of the causal dependencies between different variables. Therefore, a data preprocessing technique, upsampling, is applied with an upsampling factor of 30, so that the sampling interval of the data set becomes 2 minutes.
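The interpolation scheme used for upsampling is not specified here. As one possible reading, the Python sketch below upsamples by linear interpolation, which is an assumption; with hourly data and a factor of 30 it yields points spaced 2 minutes apart. The function name and sample values are hypothetical.

```python
def upsample_linear(samples, fold):
    """Insert fold - 1 linearly interpolated points between consecutive samples."""
    out = []
    for left, right in zip(samples, samples[1:]):
        for k in range(fold):
            out.append(left + (right - left) * k / fold)
    out.append(samples[-1])  # keep the final original sample
    return out


if __name__ == "__main__":
    hourly = [0.0, 3.0, 0.0]  # hypothetical hourly readings
    print(upsample_linear(hourly, fold=3))
    # One day of hourly data (25 points, inclusive of both ends) at factor 30.
    print(len(upsample_linear([0.0] * 25, fold=30)))
```

Interpolation increases the number of training symbols but adds no new information, so the upsampled sequence should be read as a smoother version of the hourly series.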
First, we study the causal dependencies among the five variables by computing the mutual information. Figure 14 shows the variation of mutual information with respect to time lag, in units of 2 minutes, to address the temporal characteristics. The depth of the x$D$-Markov machine is again 1, so that the current symbol of each of HVAC, LIGHTS, APPL, and MELS depends only on the previous symbol of WBE. Unlike in the wind turbine systems, the causal dependencies between WBE and the other four load components decrease only slightly as the time lag increases, which indicates that using WBE to predict the other parts of energy consumption is temporally robust. The figure also shows that in July the causal dependency between WBE and HVAC is the largest among those between WBE and the load components, exceeding those for LIGHTS, APPL, and MELS, so that the prediction of HVAC using WBE yields the best accuracy.
The results in Figure 15 show the causal dependencies, quantified by mutual information, among all five variables. The causal dependency between HVAC and APPL is larger than that between HVAC and MELS and that between HVAC and LIGHTS, while the relations among LIGHTS, APPL, and MELS are quite significant. In summary, this relational pattern network captures the temporal interactions between different end uses and can be an effective tool for energy disaggregation.
Figure 16 shows the energy disaggregation of HVAC, LIGHTS, APPL, and MELS using STPN and STPN+convex programming in April. In this month, the energy consumption of HVAC is the most significant, accounting for the largest percentage of WBE. The plots show the strong prediction capability of STPN, which STPN+convex programming improves further owing to the constraint imposed in the convex program. Figure 17 shows that the total energy consumption predicted by STPN without convex programming is worse than that predicted by STPN+convex programming, for which the optimal disaggregation appears to be achieved. However, the prediction performance for APPL and LIGHTS is slightly worse than that for HVAC and MELS because they account for a lower percentage of WBE, as also suggested by Figure 18.
Therefore, for energy disaggregation, a more accurate prediction can be achieved when a load component (i.e., HVAC, LIGHTS, APPL, or MELS) accounts for a more significant percentage of WBE. Figure 18 shows that the prediction for the last two days of the fourth week is worse, although it still captures the trend; this may be attributed to transient external factors on those two days, such as weather and occupancy, that affect the energy consumption. Similarly, Figure 19 shows that the optimal disaggregation can be obtained via STPN+convex programming. For a direct visual inspection of the difference in prediction capability, Figure 20 and Figure 21 reveal that STPN+convex programming outperforms STPN alone, as the energy consumption of each part is predicted optimally. These two plots show an energy prediction difference of less than 5% for both STPN and STPN+convex programming, which demonstrates the efficacy of the proposed framework.
To compare the proposed method with current state-of-the-art techniques in the literature, we compare STPN and STPN+convex programming with FHMM and CO. To obtain sufficiently accurate prediction results, the data set is also upsampled for FHMM with an upsampling fold of 1200. The sampling interval accordingly becomes 3 s, and the number of states used is 3. The energy disaggregation results in Figure 16 show that both FHMM and CO perform worse than the proposed method, although the predicted WBE in Figure 17 looks quite promising. This is because FHMM cannot predict the transient peaks as well as the proposed method, and CO is unable to disaggregate the load components well. A very similar conclusion holds for the month of July. Figure 18 shows that when the energy curves are more oscillatory, the proposed method outperforms FHMM and CO. Figures 17 and 19 both suggest that the proposed STPN and STPN+convex programming give better energy prediction in terms of WBE. The results in Figures 20 and 21 and Table 1 quantify the difference among the proposed method (STPN, STPN+convex programming), FHMM, and the combinatorial optimization method. They strengthen the conclusion that STPN and STPN+convex programming yield encouraging disaggregation results in NILM. Hence, the comparison of the proposed method with FHMM and combinatorial optimization indicates the effectiveness of the STPN-based energy prediction scheme as a tool for energy prediction. We also remark on the computational efficiency of the proposed method, FHMM, and CO.
Table 1: Computational time, memory, and accuracy (MSE) of the compared methods.
Method                     Time (s)   Memory (MB)   Accuracy (MSE)
STPN                       28.74      962           0.0072
STPN+convex programming    369.64     2756          0.0070
FHMM                       38.10      798.67        0.0163
CO                         11.25      769.37        0.0564
Remark 5.1.
In this case, we also consider the computational time and memory along with the accuracy (MSE) in order to compare the performance of the different methods. FHMM and the combinatorial optimization method were implemented in an IPython notebook using the NILM toolkit (NILMTK), while STPN and STPN+convex programming were implemented in the MATLAB environment with the CVX package grant2008cvx . The results in Table 1 show that STPN takes less time than FHMM but requires more memory, as the number of states for STPN is larger than that for FHMM in this case. The STPN+convex programming approach needs more computational time and memory to run the whole process due to the optimization iterations. FHMM and CO use less memory than the proposed schemes. However, in terms of accuracy, STPN outperforms the FHMM and CO approaches, as shown in Table 1. The MSE of FHMM is more than twice that of STPN. Moreover, STPN+convex programming improves the accuracy obtained from the STPN framework. In summary, the STPN framework may be an effective approach for energy prediction applications. Note that the FHMM and CO codes used here are part of a well-optimized toolbox, and we expect that similar code and platform optimization can bring the proposed methods to a comparable level in terms of memory and time complexity.
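The relative comparisons in this remark can be checked directly from the entries of Table 1. The short sketch below recomputes each metric as a ratio to the STPN baseline; the dictionary layout and the helper name `ratio_to_baseline` are illustrative choices, while the numbers are those reported in the table.

```python
# Values copied from Table 1 (time in s, memory in MB, accuracy as MSE).
table1 = {
    "STPN": {"time_s": 28.74, "memory_mb": 962.0, "mse": 0.0072},
    "STPN+convex programming": {"time_s": 369.64, "memory_mb": 2756.0, "mse": 0.0070},
    "FHMM": {"time_s": 38.10, "memory_mb": 798.67, "mse": 0.0163},
    "CO": {"time_s": 11.25, "memory_mb": 769.37, "mse": 0.0564},
}

def ratio_to_baseline(metric, baseline="STPN"):
    """Return each method's value of `metric` divided by the baseline's value."""
    base = table1[baseline][metric]
    return {name: row[metric] / base for name, row in table1.items()}

mse_ratio = ratio_to_baseline("mse")      # FHMM is about 2.26 times the STPN MSE
time_ratio = ratio_to_baseline("time_s")  # cost of the convex programming step
mem_ratio = ratio_to_baseline("memory_mb")
```

Read this way, the convex programming step buys a small reduction in MSE at a roughly thirteenfold increase in run time, which is the trade-off the remark describes.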
6 Conclusions and Future Work
This paper presents a novel data-driven framework, the spatiotemporal pattern network (STPN), to predict energy consumption for both supply-side and demand-side energy systems. While symbolic dynamic filtering performs the discretization and symbolization of continuous-domain data for data-level fusion of different variables in a dynamic system, a Markov machine captures its temporal characteristics. This work establishes another PFSA, called the xMarkov machine, to capture the causal dependencies between two time series. Moreover, a mutual-information-based metric is applied to quantify the causal dependencies. Prediction based on the STPN framework is proposed using expectation from the symbolic domain to the symbolic and continuous domains.
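The pipeline summarized above can be sketched in a few lines under simplifying assumptions: uniform partitioning for symbolization, a depth-1 cross model estimated by counting with add-one smoothing, and a continuous-domain prediction taken as the expectation over bin centers. The function names `symbolize`, `cross_transition_matrix`, and `predict_next`, as well as these specific choices, are hypothetical and are not claimed to match the implementation used in this work.

```python
import numpy as np

def symbolize(series, n_symbols):
    """Map a continuous series to integer symbols by uniform partitioning (assumed)."""
    series = np.asarray(series, dtype=float)
    edges = np.linspace(series.min(), series.max(), n_symbols + 1)
    symbols = np.digitize(series, edges[1:-1])   # values in 0 .. n_symbols - 1
    centers = 0.5 * (edges[:-1] + edges[1:])     # representative value per symbol
    return symbols, centers

def cross_transition_matrix(src, dst, n_src, n_dst):
    """Estimate P(dst[t+1] | src[t]) for a depth-1 cross model by counting."""
    counts = np.ones((n_src, n_dst))             # add-one smoothing (assumed)
    np.add.at(counts, (src[:-1], dst[1:]), 1.0)
    return counts / counts.sum(axis=1, keepdims=True)

def predict_next(src_symbol, trans, dst_centers):
    """Continuous prediction as the expected bin center of the next dst symbol."""
    return float(trans[src_symbol] @ dst_centers)

# Usage on synthetic data: predict one load component from the whole-building signal.
t = np.linspace(0.0, 20.0, 400)
whole = np.sin(t) + 1.0
part = 0.6 * whole
s_whole, _ = symbolize(whole, 5)
s_part, c_part = symbolize(part, 5)
P = cross_transition_matrix(s_whole, s_part, 5, 5)
estimate = predict_next(int(s_whole[-1]), P, c_part)
```

Each row of `P` is a conditional distribution over the next symbol of the target series, so the prediction always lies within the range spanned by the target's bin centers.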
The proposed scheme is validated by two case studies: wind turbine power prediction (supply-side energy systems) and nonintrusive load monitoring (demand-side energy systems). For wind power prediction, the primary observation is that the proposed STPN models capture the salient spatiotemporal features, and it is demonstrated that causal dependencies decrease with an increase in both spatial distance and temporal lag, as intuitively expected. Based on this observation, the power of a wind turbine is predicted with a high degree of accuracy using the observations from another wind turbine. For nonintrusive load monitoring, the energy disaggregation performance of the proposed STPN framework with and without a convex programming step is evaluated. The STPN scheme predicts each part of the disaggregated energy significantly better than state-of-the-art techniques such as FHMM and combinatorial optimization, and a convex programming approach based on STPN further improves the prediction performance by imposing the constraint that the disaggregated energy values should sum to the total energy usage.
While current efforts focus on applying the proposed techniques to real data and problems, other future research directions include:

For wind power prediction – impact analysis of other physical variables (e.g., wind direction) on model quality;

For energy disaggregation – joint state prediction that takes multiple variables into account;

For energy disaggregation – analysis of weighting factors and penalty terms in the convex optimization.
Acknowledgement
This work was supported by the National Science Foundation under Grant No. CNS-1464279.
References

(1) F. Ziel, C. Croonenbroeck, D. Ambach, Forecasting wind power–modeling periodic and nonlinear effects under conditional heteroscedasticity, Applied Energy 177 (2016) 285–297.
(2) G. Liu, M. Ouyang, L. Lu, J. Li, J. Hua, A highly accurate predictive-adaptive method for lithium-ion battery remaining discharge energy prediction in electric vehicle applications, Applied Energy 149 (2015) 297–314.
(3) S. Alessandrini, L. Delle Monache, S. Sperati, G. Cervone, An analog ensemble for short-term probabilistic solar power forecast, Applied Energy 157 (2015) 95–110.
(4) C. D. Zuluaga, M. A. Álvarez, E. Giraldo, Short-term wind speed prediction based on robust Kalman filtering: An experimental comparison, Applied Energy 156 (2015) 321–330.
(5) J.-Z. Wang, Y. Wang, P. Jiang, The study and application of a novel hybrid forecasting model–a case study of wind speed forecasting in China, Applied Energy 143 (2015) 472–488.
(6) S. Garshasbi, J. Kurnitski, Y. Mohammadi, A hybrid genetic algorithm and Monte Carlo simulation approach to predict hourly energy consumption and generation by a cluster of net zero energy buildings, Applied Energy 179 (2016) 626–637.
(7) Z. Jiang, S. Sarkar, Understanding wind turbine interactions using spatiotemporal pattern network, in: ASME 2015 Dynamic Systems and Control Conference, American Society of Mechanical Engineers, 2015, p. V001T05A001.
(8) R. K. Jain, K. M. Smith, P. J. Culligan, J. E. Taylor, Forecasting energy consumption of multi-family residential buildings using support vector regression: Investigating the impact of temporal and spatial monitoring granularity on performance accuracy, Applied Energy 123 (2014) 168–178.
(9) H. Liu, J. Shi, E. Erdem, Prediction of wind speed time series using modified Taylor Kriging method, Energy 35 (12) (2010) 4870–4879.
(10) J. Jung, R. P. Broadwater, Current status and future advances for wind speed and power forecasting, Renewable and Sustainable Energy Reviews 31 (2014) 762–777.
(11) S.-D. Kwon, Uncertainty analysis of wind energy potential assessment, Applied Energy 87 (3) (2010) 856–865.
(12) A. Tascikaraoglu, B. M. Sanandaji, K. Poolla, P. Varaiya, Exploiting sparsity of interconnections in spatiotemporal wind speed forecasting using wavelet transform, Applied Energy 165 (2016) 735–747.
(13) A. Tascikaraoglu, B. Sanandaji, G. Chicco, V. Cocina, F. Spertino, O. Erdinc, N. Paterakis, J. P. Catalao, Compressive spatiotemporal forecasting of meteorological quantities and photovoltaic power, IEEE Transactions on Sustainable Energy 7 (2016) 1295–1305.
(14) D. Koller, N. Friedman, Probabilistic graphical models: principles and techniques, MIT Press, 2009.
(15) S. Sarkar, Z. Jiang, A. Akintayo, S. Krishnamurthy, A. Tewari, Probabilistic graphical modeling of distributed cyber-physical systems, in: H. Song, D. B. Rawat, S. Jeschke, C. Brecher (Eds.), Cyber-Physical Systems: Foundations, Principles and Applications, Todd Green, 2016, Ch. 18, pp. 265–286.
(16) T. R. Leek, Information extraction using hidden Markov models, Ph.D. thesis, University of California, San Diego (1997).
(17) L. R. Rabiner, A tutorial on hidden Markov models and selected applications in speech recognition, Proceedings of the IEEE 77 (2) (1989) 257–286.
(18) M. J. Beal, Z. Ghahramani, C. E. Rasmussen, The infinite hidden Markov model, in: Advances in Neural Information Processing Systems, 2001, pp. 577–584.
(19) K. A. Heller, Y. W. Teh, D. Görür, G. Unit, Infinite hierarchical hidden Markov models, in: AISTATS, 2009, pp. 224–231.
(20) K. Wakabayashi, T. Miura, Forward-backward activation algorithm for hierarchical hidden Markov models, in: Advances in Neural Information Processing Systems, 2012, pp. 1493–1501.
(21) A. Ray, Symbolic dynamic analysis of complex systems for anomaly detection, Signal Processing 84 (7) (2004) 1115–1130.
(22) V. Rajagopalan, A. Ray, Symbolic time series analysis via wavelet-based partitioning, Signal Processing 86 (11) (2006) 3309–3320.
(23) S. Sarkar, A. Srivastav, M. Shashanka, Maximally bijective discretization for data-driven modeling of complex systems, in: Proceedings of American Control Conference, Washington, DC, USA (June, 2013) 2674–2679.
(24) K. Mukherjee, A. Ray, State splitting and merging in probabilistic finite state automata for signal representation and analysis, Signal Processing 104 (2014) 105–119.
(25) S. Sarkar, K. Mukherjee, S. Sarkar, A. Ray, Symbolic dynamic analysis of transient time series for fault detection in gas turbine engines, Journal of Dynamic Systems, Measurement, and Control 135 (1) (2013) 014506.
(26) A. Akintayo, S. Sarkar, A symbolic dynamic filtering approach to unsupervised hierarchical feature extraction from timeseries data, in: 2015 American Control Conference (ACC), IEEE, 2015, pp. 5824–5829.
(27) S. Sarkar, S. Sarkar, K. Mukherjee, A. Ray, A. Srivastav, Multi-sensor information fusion for fault detection in aircraft gas turbine engines, Proc IMechE Part G: J Aerospace Engineering 227 (2012) 1988–2001.
(28) S. Sarkar, S. Sarkar, N. Virani, A. Ray, M. Yasar, Sensor fusion for fault detection & classification in distributed physical processes, Frontiers in Robotics and AI - Sensor Fusion and Machine Perception.
(29) X. Jin, Y. Guo, S. Sarkar, A. Ray, R. M. Edwards, Anomaly detection in nuclear power plants via symbolic dynamic filtering, Nuclear Science, IEEE Transactions on 58 (1).
(30) S. Chakraborty, S. Sarkar, S. Gupta, A. Ray, Damage monitoring of refractory wall in a generic entrained-bed slagging gasification system, Proceedings of I Mech E Part A: Journal of Power and Energy 222 (8) (October, 2008) 791–807.

(31) C. Liu, Y. Gong, S. Laflamme, B. Phares, S. Sarkar, Bridge damage detection using spatiotemporal patterns extracted from dense sensor network, Measurement Science and Technology 28 (1) (2017) 014011. URL http://stacks.iop.org/09570233/28/i=1/a=014011
(32) G. Hart, Nonintrusive appliance load monitoring, Proceedings of IEEE 80 (12).
(33) M. Zeifman, K. Roth, Nonintrusive appliance load monitoring (NIALM): Review and outlook*, International Conference on Consumer Electronics (2011) 1–27.
(34) A. Cominola, M. Giuliani, D. Piga, A. Castelletti, A. Rizzoli, A hybrid signature-based iterative disaggregation algorithm for nonintrusive load monitoring, Applied Energy 185 (2017) 331–344.
(35) M. B. Kennel, M. Buhl, Estimating good discrete partitions from observed data: Symbolic false nearest neighbors, Phys. Rev. Lett. 91 (2003) 084102. doi:10.1103/PhysRevLett.91.084102. URL http://link.aps.org/doi/10.1103/PhysRevLett.91.084102
(36) A. Subbu, A. Ray, Space partitioning via Hilbert transform for symbolic time series analysis, Applied Physics Letters 92 (2008) 084107.
(37) I. Chattopadhyay, Causality network, arXiv:1406.6651v1 [cs.LG].
(38) http://wind.nrel.gov/webnrel/.
(39) K. Murphy, Hidden Markov model (HMM) toolbox for MATLAB, 2005, URL http://www.cs.ubc.ca/murphyk/Software/HMM/hmm.html.
(40) R. Hendron, C. Engebrecht, Building America house simulation protocols (revised), Tech. rep., National Renewable Energy Laboratory (NREL), Golden, CO. (2010).
(41) Z. Ghahramani, M. I. Jordan, Factorial hidden Markov models, Kluwer Academic Publishers, Boston, MA (1997).
(42) J. Z. Kolter, T. Jaakkola, Approximate inference in additive factorial hidden Markov models with application in energy disaggregation, in: Proceedings of the International Conference on Artificial Intelligence and Statistics, La Palma, Canary Islands (2012) 1472–1482.
(43) W. J. Cook, W. H. Cunningham, W. R. Pulleyblank, A. Schrijver, Combinatorial Optimization, 5th Edition, Vol. 21, Springer, 53113 Bonn, Germany, 2011.
(44) N. Batra, J. Kelly, O. Parson, H. Dutta, W. Knottenbelt, A. Rogers, A. Singh, M. Srivastava, NILMTK: An open source toolkit for nonintrusive load monitoring, 5th International Conference on Future Energy Systems (ACM e-Energy), Cambridge, UK (2014) 1–14.
(45) M. Grant, S. Boyd, Y. Ye, CVX: MATLAB software for disciplined convex programming (2008).