1. Introduction
Energy management is known as one of the most crucial problems in the vision of smart buildings (Manic et al., 2016). On a global scale, buildings account for 30% of total energy consumption and 60% of electricity consumption (Manic et al., 2016). To reduce carbon emissions and enhance environmental sustainability, there is strong demand to redesign and deploy a new generation of building management systems (BMS), which aim to minimize building energy consumption by controlling and scheduling the components in buildings based on accurate models of their energy behavior.
Among the components controlled by a BMS in modern commercial buildings, the chiller plant is recognized as the most energy-hungry. In Singapore, statistics reveal that about one third of household energy is spent on air-conditioning equipment (SES, 2016). Despite the long history of chiller plant optimization techniques, it remains challenging to generate the cooling load demanded by the building while minimizing total energy consumption. The challenges stem mainly from two difficulties: modelling an unsteady, dynamic physical system, and accurately measuring the running status of the chiller plant. Conventional optimization techniques developed in the mechanical engineering community mostly rely on physical models of the equipment, built on oversimplified assumptions about the running conditions. Real-world chiller plants rarely match these physical models, owing to variance arising from differences in installation and from equipment ageing.
Only recently, thanks to rapid advances in sensor technologies, have chiller plant systems become able to monitor all of their components accurately in real time, as shown in Figure 1. The explosive growth of data collected from chiller plants, together with increasing analytical power, is now raising new interest in data-driven approaches to chiller plant optimization. By utilizing both historical and fresh sensor data, data analysis is expected to enhance the performance of existing optimization techniques.
One key advantage of the data-driven optimization approach is its high adaptivity to varying external conditions, ageing devices and human behavior, all of which are known to affect performance but are hard to capture with physical models alone. To exploit this advantage fully, however, we must resort to powerful analytical tools that capture the variables not covered by traditional physical models. Blind optimization over massive data records may not improve on the simple rule-based strategies commonly used in existing building management systems (BMS) (Doukas et al., 2007). Instead, the success of data-driven approaches relies heavily on appropriately integrating the domain knowledge of experts with valuable data insights unknown to those experts. DeepMind (Dee, 2016), for example, is attempting to optimize the power used for cooling in Google’s data centers, based on patterns of energy consumption linked to the running status of the machines in the data centers.
In this paper, our discussion of data-driven chiller plant optimization is threefold. First, we present how we solve the chiller plant energy optimization problem by following a carefully chosen technical roadmap with three steps, i.e., active data enrichment, energy consumption prediction and real-time configuration search. Each subproblem in the roadmap is designed to fully exploit the corresponding data available to the system, yielding more accurate models of targets such as future cooling load and chiller energy consumption. Second, we vertically decompose the whole chiller plant architecture into a number of modules, such that the accurate prediction output of one module can be used as input to other modules. Each module is then analyzed independently by training models on the sensor data. Given the high accuracy of the models of individual modules and the low dependency between modules, again thanks to our domain expertise, the combined models deliver excellent overall performance on every subproblem in our technical roadmap. Third, we discuss how our domain knowledge guides model selection. Instead of using complex deep learning models, e.g., Recurrent Neural Networks (Hochreiter and Schmidhuber, 1997), our system adopts simple yet useful models based on our understanding of the general running mechanisms of the equipment, avoiding the underfitting problems of overcomplicated deep learning models.
To elaborate on the details of our technique, the rest of the paper is organized as follows. Section 2 reviews existing studies on chiller plant optimization and deep learning in the time series domain. Section 3 discusses data acquisition and preprocessing. Section 4 presents the models employed to predict cooling load and energy consumption. Section 5 introduces the real-time optimization system we implemented to apply our optimization algorithms. Section 6 empirically evaluates the usefulness of our models and system, and Section 7 concludes the paper.
2. Related Work
In this section, we review existing studies on a variety of relevant research problems, including models of cooling loads and energy performance, control approaches using neural networks and extremum seeking, and data-driven optimization techniques for other Internet of Things (IoT) systems.
Chiller plant optimization is a traditional research topic in mechanical engineering. A large number of works in the literature model the behavior of chiller plants based on the physical rules of the cooling devices, e.g., (Zhao and Magoulès, 2012; Yang et al., 2005; Iwafune et al., 2014; Ben-Nakhi and Mahmoud, 2004; Li et al., 2009; Li and Wen, 2014; Momtazpour et al., 2015). Most of these works, however, do not consider varying factors such as equipment ageing and indoor activities.
Neural networks have been considered as a modelling option in the mechanical engineering community (Xu et al., 2005; Wang and Ma, 2008; Chow et al., 2002). These works plug neural networks directly into their prediction models, an approach that may not reflect the actual running mechanisms of the equipment. Moreover, because of the existing control policy of a chiller plant, the data collected by the sensor network usually span only a subspace of the configuration space, and training over such data may cause high generalization errors. In our work, we solve these problems with active data enrichment and by selecting models based on domain knowledge of the equipment.
Performance measurement is as important as optimization in a chiller plant, since a systematic approach is needed to evaluate the improvement brought by new optimization strategies. This is not a simple task, because external variables keep changing over time: reduced energy consumption could be due entirely to more favorable weather rather than better control. It is therefore necessary to build a general baselining technique that estimates the energy consumption the chiller plant would have had under the old control strategy, such as the methods proposed in (Jelali, 2006; Salsbury and Alcala, 2015). The technique proposed in this paper can also be used for baselining; in our empirical evaluations, we use our trained model for energy baselining.
Extremum seeking is an active and dynamic search approach, designed to seek the optimal system configuration even when the system is constantly in an unstable state, e.g., (Tyagi et al., 2006; Mu et al., 2016). However, existing results show that extremum seeking may not always find the optimum in the search space, because of the dynamic nature of the chiller plant system. We handle this problem simply by hard-coding domain knowledge into the optimization procedure, a strategy that turns out to be highly effective for finding nearly optimal control decisions.
Recent years have witnessed rapid advances in deep learning. Recurrent neural networks (RNNs) (Sak et al., 2014; Graves et al., 2009; Hochreiter and Schmidhuber, 1997; Liang et al., 2016) and their variants are the most popular models for the time series domain. However, RNNs are well known for their high complexity and low training efficiency, and an underfitted RNN model, i.e., one without sufficient training, fails to exploit the full power of deep neural networks. In this paper, we solve this problem by decomposing the chiller plant system into small modules and choosing appropriate simple models for individual modules based on our domain knowledge. We show that the accuracy of this approach is excellent and its training efficiency is much better than that of complex neural networks.
Internet of Things (IoT) technology is growing quickly, thanks to the cheap and reliable data acquisition enabled by a new generation of sensor networks. Predictive maintenance, for example, is one of the most important IoT applications. Jung et al. (Jung et al., 2017) show that analytics over vibration sensors on motors and tubes can accurately predict equipment problems. Such analysis supports timely replacement of equipment instead of traditional fixed-period replacement strategies. In our work, as shown in the experiments, an accurate model of the running status of the equipment also supports real-time fault diagnosis.
3. Preliminaries
Chilled-water-based cooling systems are commonly used to cool and dehumidify air in various types of large buildings, such as offices, industrial sites, hospitals and schools (Resources, 2010). A typical (chilled-water-based) chiller plant has four types of energy-consuming equipment: chillers, cooling towers, condenser water pumps and chilled water pumps. In Figure 1, the example chiller plant consists of three chillers, three cooling towers, three condenser water pumps and three chilled water pumps. To better explain how a chiller plant works, we present a simplified structure in Figure 2. Every chiller plant has two independent cycles. The inner cycle pushes chilled water from the chillers to the indoor space of the building for air cooling; the chilled water absorbs heat from the building and returns to the chillers at a higher temperature. Similarly, the outer cycle pushes condenser water from the chillers to the cooling towers, which reject the heat to the outdoor environment, and colder water returns to the chillers. Heat exchange between the inner and outer cycles takes place in the chillers. The chilled water and condenser water are circulated by the chilled water pumps and condenser water pumps respectively. To simplify the presentation in the rest of the paper, we summarize our notation in Table 1, following most research works on chiller plant optimization.
Symbol  Description 

CH  Chiller 
CT  Cooling tower 
CWP  Condenser water pump 
CHWP  Chilled water pump 
CWFM  Condenser water flow model 
CHFM  Chilled water flow model 
CWTM  Condenser water temperature model 
VSD  Variable speed drive 
VSD speed of condenser water pump  
VSD speed of cooling tower fan  
VSD speed of chilled water pump  
kW  Kilowatt 
Chiller power in kW  
Cooling tower power in kW  
Condenser water pump power in kW  
Chilled water pump power in kW  
RT  Refrigeration ton 
RLA  Rated load amperage 
chilled water flow in/out chillers  
condenser water flow in/out chillers  
condenser water temperature into chillers  
chiller setpoint indicating the desired chiller water temperature  
dry bulb temperature  
relative humidity 
3.1. Data Collection
Generally speaking, a chiller plant control system collects three types of data: control data, external condition data and sensor data.
Control Data: In a chiller plant, equipment such as cooling towers and water pumps is controlled by variable-speed drives (VSDs; see https://en.wikipedia.org/wiki/Adjustable-speed_drive), which are inverters that control the speed of motors. Using Kaer’s Krealtime system, we record changes in the following running parameters:

VSD speed of condenser water pump in percentage

VSD speed of cooling tower fan in percentage

VSD speed of chilled water pump in percentage

Binary Configurations (On/Off) of chillers

Binary Configurations (On/Off) of condenser water pumps

Binary Configurations (On/Off) of chilled water pumps

Binary Configurations (On/Off) of cooling towers
External Condition Data: Besides control parameters, weather data are also collected, including relative humidity and dry bulb temperature. One measurement is collected every minute, in order to track immediate changes in weather conditions.
Sensor Data: Sensors are deployed on all equipment in the chiller plant. Specifically, we record the power of each piece of equipment, i.e., chillers, pumps and cooling towers, the inlet and outlet temperatures of the chilled water and condenser water, and the flow rate of the circulating water. We also record the system power, the cooling load in refrigeration tons (RT) and the heat balance of the system. All sensor readings are collected every minute. The details of the sensor data are described in Table 1.
3.2. Technical Target and Roadmap
By elementary physics, the cooling load generated by the chiller plant, measured in refrigeration tons (RT), is proportional to the chilled water flow rate as well as to the temperature difference between the output and input chilled water. Therefore, to generate the desired cooling load, the major tradeoff is between the chilled water flow rate and the temperature of the chilled water supplied to the building. The chilled water flow rate is mainly controlled by the chilled water pumps, which increase the flow rate by consuming more electricity. The temperature of the supplied chilled water depends on the power consumption of the chillers, as well as of the condenser water pumps and cooling towers. Basically, the system can spend more energy either on speeding up the chilled water pumps or on lowering the chilled water temperature. The key to energy saving is to find the balance between these two factors, i.e., a configuration that achieves the cooling load while minimizing total power consumption over all equipment.
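The proportionality above can be made concrete with the standard heat balance Q = ṁ · c_p · ΔT. The sketch below is illustrative only: it assumes flow in litres per second, a water density of about 1 kg/L, a specific heat of 4.186 kJ/(kg·K) and the conversion 1 RT ≈ 3.517 kW; the function name `cooling_load_rt` is our own and is not part of the plant's system.

```python
# Illustrative sketch: cooling load delivered by the chilled water loop.
# Assumptions (not from the paper): flow in L/s, water density ~1 kg/L,
# specific heat 4.186 kJ/(kg*K), and 1 RT ~= 3.517 kW.
WATER_CP_KJ_PER_KG_K = 4.186
WATER_DENSITY_KG_PER_L = 1.0
KW_PER_RT = 3.517


def cooling_load_rt(flow_l_per_s: float, t_return_c: float, t_supply_c: float) -> float:
    """Cooling load in refrigeration tons (RT).

    The load is proportional both to the chilled water flow rate and to the
    temperature difference between the returning and supplied chilled water.
    """
    mass_flow_kg_per_s = flow_l_per_s * WATER_DENSITY_KG_PER_L
    delta_t = t_return_c - t_supply_c
    load_kw = mass_flow_kg_per_s * WATER_CP_KJ_PER_KG_K * delta_t  # kJ/s == kW
    return load_kw / KW_PER_RT


# 100 L/s of chilled water warming from 7 C to 12 C carries roughly 595 RT;
# doubling either the flow or the temperature difference doubles the load.
print(round(cooling_load_rt(100.0, 12.0, 7.0), 1))
print(round(cooling_load_rt(200.0, 12.0, 7.0), 1))
```

The two levers in the function signature, flow and temperature difference, are exactly the tradeoff the paragraph describes: the same load can be met with more pump energy or with colder supply water.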
In this paper, we mainly aim to optimize chiller plant energy consumption by applying micro-control strategies to the equipment. As discussed in Section 3.1, there are two types of control parameters in a chiller plant: the VSD speeds of water pumps and cooling tower fans, and the on/off configurations of equipment. We do not consider on/off decisions for the equipment, i.e., chillers, pumps and cooling towers. This is because, in practice, most macro-control policies over the number of simultaneously running units are based on long-term cooling load estimations, and building managers can easily make optimal macro-control decisions by prescheduling equipment configurations on a daily basis. Based on our testing results, machine learning and data analysis do not outperform humans at equipment scheduling, mainly because of the limited scheduling options. In Figure 1, for example, only three chillers are available to the plant, giving only seven possible combinations of running chillers.
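The count of seven follows from enumerating every non-empty subset of three chillers (2³ − 1 = 7). A small illustrative enumeration, with hypothetical unit names:

```python
from itertools import combinations


def running_combinations(units):
    """All non-empty on/off combinations of the given units (2**n - 1 of them)."""
    combos = []
    for k in range(1, len(units) + 1):
        combos.extend(combinations(units, k))
    return combos


chillers = ["CH1", "CH2", "CH3"]  # hypothetical labels for the three chillers
for combo in running_combinations(chillers):
    print(combo)
print(len(running_combinations(chillers)))
```

With so few options, a building manager can reason about every schedule directly, which is why the paper leaves these on/off decisions to humans.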
Based on these observations, we focus on optimizing micro-control strategies by fine-tuning the control parameters, i.e., the VSD speeds of the equipment. To fully optimize micro-control, we follow the technical roadmap shown in Figure 3, which has three major technical components. In data preprocessing, we enrich the data to overcome the difficulty of generalization. In data modelling, we decompose the chiller plant into modules and build data models of the modules independently; there are two types of modules, capturing the power of equipment and the relationships between equipment respectively. In real-time optimization, we deploy the power model on a controller computer directly connected to the chiller plant, which decides the parameters on the fly. The technical details of these components are given in the following sections.
4. Data Techniques
4.1. Data Enrichment in Preprocessing
The lack of generality in the data is an important problem that is easily overlooked. Naive data modelling over existing chiller plant data may produce a useless model with high generalization error. In an extreme case, a chiller plant always runs at a fixed configuration, e.g., fixed VSD speeds for pumps and fans. A model trained on data from such a plant is only applicable to the current or similar configurations, and does not generate meaningful predictions for other configurations. Figure 4 plots the data distribution over cooling tower speed and total system power, collected in 2016 from a fully controlled chiller plant with a fixed VSD configuration setting (denoted as original data) and with random VSD configurations (denoted as rich data), respectively. In the original data, the cooling tower fan is mainly operated within a limited range of its maximum speed. With fixed VSD configuration settings for all devices, total system power and cooling tower speed clearly lie on a linear subspace. The results show that a data model built on a fixed VSD configuration has little generalization capability when the chiller plant uses other configurations.
To tackle the problem, we randomly update each control parameter of the chiller plant so as to explore the full range of values. As shown in Figure 4, the rich data spans a much larger space, providing more accurate insights into the behavior of the equipment and offering better opportunity for modeling and optimization.
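A minimal sketch of the random exploration step, assuming each VSD setpoint is drawn uniformly within hypothetical safe bounds and held for a fixed interval so the plant can settle. The bounds, the hold time and the function name `random_vsd_schedule` are our own illustrative choices; the paper only states that control parameters are randomly updated to cover their full range (Section 6.1 adds that this happens a few times a day for half an hour each time).

```python
import random

# Hypothetical safe operating bounds for each VSD, as fractions of maximum speed.
VSD_BOUNDS = {
    "cwp_speed": (0.5, 1.0),
    "ct_fan_speed": (0.3, 1.0),
    "chwp_speed": (0.5, 1.0),
}


def random_vsd_schedule(duration_min, hold_min, bounds=VSD_BOUNDS, seed=None):
    """Return (start_minute, setpoints) pairs for one enrichment session.

    Each setpoint is drawn uniformly within its bounds and held for `hold_min`
    minutes, so the plant can settle before the next random update.
    """
    rng = random.Random(seed)
    schedule = []
    for start in range(0, duration_min, hold_min):
        setpoints = {name: rng.uniform(lo, hi) for name, (lo, hi) in bounds.items()}
        schedule.append((start, setpoints))
    return schedule


# One 30-minute session with a new random setpoint every 5 minutes.
session = random_vsd_schedule(duration_min=30, hold_min=5, seed=42)
for start, setpoints in session:
    print(start, {name: round(v, 2) for name, v in setpoints.items()})
```

Uniform sampling is the simplest way to spread samples over the whole control range; in practice the bounds would come from the equipment's safe operating limits.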
4.2. Overall Model over Chiller Plant
Figure 5 depicts the overall idea of decomposing the chiller plant for data modelling. A chiller plant is decomposed into modules, each of which corresponds to a block in Figure 5. Each block also represents a module-wise data model, with incoming edges as input variables and outgoing edges as output (prediction) variables. The connections among the modules, i.e., the prediction results fed from one module to others, are designed based on our understanding of the mechanisms underlying the chiller plant. The overall model therefore reflects our domain knowledge of the equipment.
Specifically, there are two types of modules in the overall model. Rectangular blocks denote energy consumption models of single pieces of equipment. Rounded blocks denote relationship models that predict particular properties of the system affecting two types of equipment. For instance, the Condenser Water Flow Model (CWFM) predicts the flow rate of the condenser water running through the condenser water pumps and chillers. The chiller is singled out in the figure because it is the most complex equipment in the chiller plant system. In particular, the chiller model uses the outputs of other modules, including the flow rates of the condenser water and chilled water, and the temperature of the condenser water flowing into the chillers. Note that, as shown in Figure 2, the chillers are affected by the condenser water and chilled water flowing in and out. However, the temperature of the condenser water flowing out of the chillers and the temperatures of the chilled water flowing in/out of the chillers are not modeled (Figure 5), because they are captured by the cooling load of the system, which is also an input to the chiller model. In this work, we assume that the current cooling load of the chiller plant is known and changes slowly over time.
In the rest of this section, we reuse CWP, CHWP, CT, CH, CHFM, CWFM and CWTM to denote the seven modules in Figure 5. In the next subsection, we discuss how to pick the right data model for each module.
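The wiring of Figure 5 can be sketched as a composition of per-module predictors, where the relationship models feed the chiller model and the energy models of pumps and fans are summed alongside it. In this minimal sketch each module is any callable; the toy lambdas, their coefficients, the argument names and the `predict_total_power` helper are hypothetical placeholders for the trained models of Section 4.3, and all running units of one type are assumed to share a single VSD speed.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass
class PlantModels:
    """Module-wise predictors wired as in the decomposition of Figure 5."""
    cwp: Callable[[float], float]                                # speed -> power per CWP (kW)
    chwp: Callable[[float], float]                               # speed -> power per CHWP (kW)
    ct: Callable[[float], float]                                 # speed -> power per CT fan (kW)
    chfm: Callable[[float, Sequence[int]], float]                # chilled water flow
    cwfm: Callable[[float, Sequence[int]], float]                # condenser water flow
    cwtm: Callable[[float, Sequence[int], float, float], float]  # condenser water temp into chillers
    ch: Callable[[float, float, float, float, float], float]    # chiller power (kW)


def predict_total_power(m, s_cwp, s_ct, s_chwp, on_cwp, on_chwp, on_ct,
                        dry_bulb, rel_humidity, cooling_load, setpoint):
    """Total plant power: relationship-model outputs become chiller-model inputs."""
    f_chw = m.chfm(s_chwp, on_chwp)
    f_cw = m.cwfm(s_cwp, on_cwp)
    t_cw = m.cwtm(s_ct, on_ct, dry_bulb, rel_humidity)
    p_chiller = m.ch(f_chw, f_cw, t_cw, cooling_load, setpoint)
    # SISO energy models, multiplied by the number of running units.
    p_aux = (sum(on_cwp) * m.cwp(s_cwp) + sum(on_chwp) * m.chwp(s_chwp)
             + sum(on_ct) * m.ct(s_ct))
    return p_chiller + p_aux


# Toy stand-ins for trained models (hypothetical coefficients, for illustration only).
toy = PlantModels(
    cwp=lambda s: 30 * s**3,
    chwp=lambda s: 25 * s**3,
    ct=lambda s: 20 * s**3,
    chfm=lambda s, on: 100 * s * sum(on),
    cwfm=lambda s, on: 120 * s * sum(on),
    cwtm=lambda s, on, tdb, rh: tdb - 2 * s * sum(on) + 0.02 * rh,
    ch=lambda fchw, fcw, tcw, load, sp: 0.6 * load + 2.0 * (tcw - sp) - 0.05 * (fchw + fcw),
)

total = predict_total_power(toy, 1.0, 1.0, 1.0, [1, 1, 0], [1, 1, 0], [1, 1, 0],
                            dry_bulb=30.0, rel_humidity=80.0, cooling_load=500.0, setpoint=7.0)
print(round(total, 1))
```

Because each field is just a callable, any module can be swapped (for example a polynomial for a pump, an MLP for the chiller) without touching the wiring.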
4.3. Models over Modules
The modules of the chiller plant model in Figure 5 are treated in two different ways according to their number of inputs: single input single output (SISO) and multiple input single output (MISO). The chiller module, for example, adopts a MISO model, since the chiller is the core and most complex equipment in a chiller plant and its performance is affected by all other modules. All SISO modules, in contrast, follow the Affinity Laws (aff, 2017). Based on this observation, we apply different models to match the complexity of each module. Table 2 summarizes the models employed in our system for the individual modules, and the rest of this subsection elaborates on them.
Module  Type  Model Type  Prediction Variable 

CWP  SISO  Polynomial  Power 
CHWP  SISO  Polynomial  Power 
CT  SISO  Polynomial  Power 
CH  MISO  MLP  Power 
CHFM  MISO  MLP  Water flow 
CWFM  MISO  MLP  Water flow 
CWTM  MISO  MLP  Water temperature 
Affinity Laws The Affinity Laws for centrifugal pumps and fans describe the relationship between power and shaft speed with the impeller diameter held constant (aff, 2017). Power is proportional to the cube of shaft speed:
$$\frac{P_1}{P_2} = \left(\frac{N_1}{N_2}\right)^3$$
where $P_1$ and $P_2$ are the power draws at shaft speeds $N_1$ and $N_2$. The affinity laws imply that polynomial regression is sufficient to model the power of the water pumps and cooling tower fans in a chiller plant, so there is no need for more complex models.
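As a quick numerical illustration of the cube law: running a pump or fan at 80% speed needs only about half of its full-speed power (0.8³ ≈ 0.51), which is why fine-tuning VSD speeds pays off. The 30 kW rated power below is an arbitrary illustrative value.

```python
def affinity_power(p1_kw: float, n1: float, n2: float) -> float:
    """Power at shaft speed n2, given power p1_kw at speed n1 (P1/P2 = (N1/N2)^3)."""
    return p1_kw * (n2 / n1) ** 3


# A fan drawing a hypothetical 30 kW at full speed, evaluated at lower speeds.
full_speed_kw = 30.0
for fraction in (1.0, 0.9, 0.8, 0.7):
    print(f"{fraction:.0%} speed -> {affinity_power(full_speed_kw, 1.0, fraction):.2f} kW")
```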
Polynomial Regression for SISO Modules Following the affinity laws for pumps and fans, we apply polynomial regression to model the heads and power of water pumps and cooling tower fans:
$$\hat{y} = \beta_0 + \beta_1 x + \beta_2 x^2 + \cdots + \beta_n x^n + \epsilon \qquad (1)$$
where
$\hat{y}$: predicted value of power
$x$: control parameter
$\beta_i$: regression coefficient, $i = 0, 1, \ldots, n$
$\epsilon$: error term
The input and output of SISO models CT, CWP and CHWP are summarized as follows:
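Model  Input $x$  Output $\hat{y}$

CT  VSD speed of cooling tower fan  Cooling tower power in kW
CWP  VSD speed of condenser water pump  Condenser water pump power in kW
CHWP  VSD speed of chilled water pump  Chilled water pump power in kW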
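A minimal sketch of fitting Eq. (1) with NumPy's `np.polyfit`, assuming a cubic polynomial (n = 3) in the VSD speed, motivated by the cube law above. The degree, the noise level and the synthetic readings are illustrative assumptions, not the plant's data.

```python
import numpy as np


def fit_siso_power_model(vsd_speed, power_kw, degree=3):
    """Least-squares fit of power = beta_0 + beta_1*x + ... + beta_n*x^n.

    Returns an np.poly1d that maps VSD speed to predicted power.
    """
    coeffs = np.polyfit(vsd_speed, power_kw, deg=degree)  # highest power first
    return np.poly1d(coeffs)


# Synthetic readings roughly following the cube law, with measurement noise.
rng = np.random.default_rng(0)
speed = rng.uniform(0.3, 1.0, size=200)  # fraction of maximum speed
power = 30.0 * speed**3 + rng.normal(0.0, 0.3, size=speed.size)

model = fit_siso_power_model(speed, power)
print(model(0.8))  # should be close to 30 * 0.8**3 = 15.36
```

A low-degree polynomial has only a handful of coefficients, so it trains instantly and is cheap to evaluate inside an optimizer loop.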
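MLP for MISO Modules
A multilayer perceptron (MLP) is a feedforward artificial neural network (ANN) that learns a function mapping input features $x$ to a target output $y$. Figure 6 shows a one-hidden-layer MLP. We apply MLPs to model the relationships between VSD speeds and water flow rates and temperature. The input and output of each module are defined as follows.
Model  Input $x$  Output $y$

CHFM  VSD speed of CHWPs, on/off of CHWPs  Chilled water flow in/out chillers
CWFM  VSD speed of CWPs, on/off of CWPs  Condenser water flow in/out chillers
CWTM  VSD speed of CT fans, on/off of CTs, dry bulb temperature, relative humidity  Condenser water temperature into chillers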
CHFM predicts the flow rate of chilled water flowing in/out of the chillers. Its input features are the VSD speed of the chilled water pumps and their configurations, i.e., the on/off states of the chilled water pumps.
CWFM predicts the flow rate of condenser water flowing in/out of the chillers. Its input features are the VSD speed of the condenser water pumps and their configurations, i.e., the on/off states of the condenser water pumps.
CWTM predicts the temperature of the condenser water fed into the chillers. Its input features are the weather (dry bulb temperature and relative humidity), the VSD speed of the cooling tower fans and the configurations of the cooling towers. Note that this temperature does not depend on the condenser water pump speed.
CH Similarly, we apply an MLP to model the power of the chillers. The outputs of CHFM, CWFM and CWTM are fed into the CH models for chiller power prediction. The input features of a chiller model CH are the chilled water flow rate, the condenser water flow rate, the temperature of the condenser water fed into the chillers, the cooling load of the system and the chilled water setpoint. The output of CH is the chiller power.
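The evaluation in Section 6 configures the MLP with one hidden layer of 3 nodes and a logistic activation. The sketch below is a minimal NumPy version of such a network, trained by full-batch gradient descent on squared error to map (pump VSD speed, share of pumps running) to a chilled water flow, in the spirit of CHFM. The training procedure, learning rate, input scaling and synthetic data are our own assumptions; a library implementation such as scikit-learn's `MLPRegressor(hidden_layer_sizes=(3,), activation="logistic")` offers the same architecture.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


class OneHiddenLayerMLP:
    """Feedforward network: inputs -> logistic hidden layer -> linear output."""

    def __init__(self, n_inputs, n_hidden=3, seed=0):
        rng = np.random.default_rng(seed)
        self.w1 = rng.normal(0.0, 0.5, size=(n_inputs, n_hidden))
        self.b1 = np.zeros(n_hidden)
        self.w2 = rng.normal(0.0, 0.5, size=n_hidden)
        self.b2 = 0.0

    def predict(self, x):
        hidden = sigmoid(x @ self.w1 + self.b1)
        return hidden @ self.w2 + self.b2

    def mse(self, x, y):
        return float(np.mean((self.predict(x) - y) ** 2))

    def fit(self, x, y, lr=0.2, epochs=5000):
        n = len(y)
        for _ in range(epochs):
            hidden = sigmoid(x @ self.w1 + self.b1)  # (n, n_hidden)
            err = hidden @ self.w2 + self.b2 - y     # (n,)
            # Backpropagate the gradient of 0.5 * mean squared error.
            grad_w2 = hidden.T @ err / n
            grad_b2 = err.mean()
            delta = np.outer(err, self.w2) * hidden * (1.0 - hidden)
            grad_w1 = x.T @ delta / n
            grad_b1 = delta.mean(axis=0)
            self.w2 -= lr * grad_w2
            self.b2 -= lr * grad_b2
            self.w1 -= lr * grad_w1
            self.b1 -= lr * grad_b1
        return self


# Synthetic CHFM-style data: scaled chilled water flow vs. speed and pumps running.
rng = np.random.default_rng(1)
speed = rng.uniform(0.5, 1.0, size=300)
pumps_on = rng.integers(1, 4, size=300)  # 1 to 3 pumps running
flow = 0.3 * speed * pumps_on + rng.normal(0.0, 0.01, size=300)
x = np.column_stack([speed, pumps_on / 3.0])  # inputs scaled to [0, 1]

mlp = OneHiddenLayerMLP(n_inputs=2)
mse_before = mlp.mse(x, flow)
mlp.fit(x, flow)
mse_after = mlp.mse(x, flow)
print(mse_before, mse_after)
```

With only 3 hidden nodes the network stays small enough to train quickly, while still capturing interactions (here, speed times number of pumps) that a single-input polynomial cannot.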
5. Realtime Power Optimization
The ultimate goal of this work is to minimize the total power consumption of a chiller plant, and hence its operating cost, while still meeting the cooling load demand. With all the models of the chiller plant, the power optimization problem can be formulated as follows:
$$\min_{\mathbf{u}} \ \hat{P}(\mathbf{u}) \quad \text{subject to} \quad \mathbf{u} \in \mathcal{U}, \quad \hat{F}_{chw}(\mathbf{u}) \in \mathcal{F}_{chw}, \quad \hat{F}_{cw}(\mathbf{u}) \in \mathcal{F}_{cw}, \quad \hat{T}_{cw}(\mathbf{u}) \in \mathcal{T}_{cw}$$
where $\mathbf{u}$ is the set of control parameters, i.e., the VSD speeds of the condenser water pumps, cooling tower fans and chilled water pumps, $\hat{P}(\mathbf{u})$ is the predicted total power of the chiller plant, and $\mathcal{U}$, $\mathcal{F}_{chw}$, $\mathcal{F}_{cw}$ and $\mathcal{T}_{cw}$ denote the admissible ranges. The constraints on the controllable $\mathbf{u}$ ensure that the cooling load of each chiller does not exceed its maximum cooling load, and the constraints on the predicted chilled water flow $\hat{F}_{chw}$, condenser water flow $\hat{F}_{cw}$ and condenser water temperature $\hat{T}_{cw}$, which are inputs to the chillers, prevent unnecessary fluctuation in chiller performance (these constraints require domain knowledge and are therefore usually provided by the chiller manufacturer). Given the weather conditions and the current state of the chiller plant (e.g., the cooling load and the equipment configurations, which follow predefined schedules), the optimizer searches for the control parameters that minimize the total power while satisfying all constraints. A derivative-free optimization method, constrained optimization by linear approximation (COBYLA) (Powell, 2007), is used to solve the optimization problem. In the real-time optimization applied to chiller plants, control parameters are updated every 23 minutes.
Note that meeting the cooling load demand is not explicitly modeled as a constraint in the optimization problem, for two reasons. First, the optimizer usually updates the control parameters by small amounts in real time, which have little effect on the cooling load. Second, when a chiller plant is designed, designers add a considerable margin to the maximum cooling load obtained from the energy audit, which leaves a large space for optimization. Therefore, the plant rarely fails to meet the cooling load during optimization, especially when the chiller configurations are not changed.
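A minimal sketch of the constrained search, assuming SciPy's `scipy.optimize.minimize` with `method="COBYLA"` as the COBYLA implementation. The surrogate models are hypothetical toys rather than the trained module models: pump and fan power follow the cube law, chiller power drops linearly as auxiliary equipment speeds up, and a single lower bound on predicted chilled water flow stands in for the manufacturer-provided constraints.

```python
import numpy as np
from scipy.optimize import minimize

# u = [cwp_speed, ct_fan_speed, chwp_speed] as fractions of maximum speed.


def total_power(u):
    """Hypothetical surrogate for the predicted total plant power (kW)."""
    s_cwp, s_ct, s_chwp = u
    aux = 30 * s_cwp**3 + 20 * s_ct**3 + 25 * s_chwp**3  # affinity-law pumps and fans
    chiller = 300 - 60 * s_cwp - 50 * s_ct - 40 * s_chwp  # faster aux -> lighter chiller load
    return aux + chiller


def chilled_water_flow(u):
    """Hypothetical CHFM surrogate: chilled water flow in L/s."""
    return 100.0 * u[2]


LOWER, UPPER = 0.3, 1.0
MIN_CHW_FLOW = 80.0

# COBYLA takes inequality constraints of the form g(u) >= 0.
constraints = [{"type": "ineq", "fun": lambda u: chilled_water_flow(u) - MIN_CHW_FLOW}]
for i in range(3):
    constraints.append({"type": "ineq", "fun": lambda u, i=i: u[i] - LOWER})
    constraints.append({"type": "ineq", "fun": lambda u, i=i: UPPER - u[i]})

u0 = np.array([1.0, 1.0, 1.0])  # current setpoints
result = minimize(total_power, u0, method="COBYLA", constraints=constraints,
                  tol=1e-8, options={"rhobeg": 0.1, "maxiter": 2000})
print(result.x, total_power(u0), total_power(result.x))
```

In this toy problem the pump and fan speeds settle where the cubic power increase balances the chiller's linear saving, while the chilled water pump is held at the flow constraint (speed 0.8). COBYLA needs only function values, so the module models can be any black-box predictors.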
6. Empirical Evaluations
6.1. Settings
Data We evaluate our proposed power prediction models on real-world data collected from the chiller plant shown in Figure 1, which provides cooling for a multi-building campus in Singapore. The data set consists of 12,520 samples collected on 15 days between January 24, 2017 and February 16, 2017, with sensor data recorded every minute. We divide the data set into five folds, each consisting of data from three consecutive days, for five-fold cross validation.
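A minimal sketch of the day-blocked five-fold split, assuming "consecutive" means consecutive among the recorded days in date order, and that each record carries its date under a hypothetical `"day"` key. The 15 calendar-consecutive days in the usage example are illustrative only; the real data's 15 days are spread over a longer window.

```python
from datetime import date, timedelta


def consecutive_day_folds(days, days_per_fold=3):
    """Group the distinct recorded days, in date order, into folds of consecutive days."""
    ordered = sorted(set(days))
    return [ordered[i:i + days_per_fold] for i in range(0, len(ordered), days_per_fold)]


def train_test_split_by_fold(records, folds, k):
    """Hold out fold k for testing and train on all other folds."""
    test_days = set(folds[k])
    train = [r for r in records if r["day"] not in test_days]
    test = [r for r in records if r["day"] in test_days]
    return train, test


# Hypothetical example: one record per day for 15 recorded days.
days = [date(2017, 1, 24) + timedelta(days=i) for i in range(15)]
records = [{"day": d, "power_kw": 500.0} for d in days]
folds = consecutive_day_folds(days)
for k in range(len(folds)):
    train, test = train_test_split_by_fold(records, folds, k)
    print(k, folds[k][0], folds[k][-1], len(train), len(test))
```

Splitting by whole days, rather than by random minutes, keeps strongly correlated neighboring samples out of both sides of the split, so the reported error reflects prediction on unseen days.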
During data collection, the data enrichment scheme (Sec 4.1) is employed to enhance the generality of the data. To enrich the data with minimal effect on the normal chilling service, random updates of the control parameters are applied only a few times a day, lasting half an hour each time. When the chiller plant is under testing, or some equipment is under repair or replacement, the data enrichment scheme is run for a longer period, e.g., half a day.
Because a chiller plant is not a stable system, outliers commonly occur in the data set, especially in sensor readings. In data preprocessing, we apply random sample consensus (RANSAC) (Strutz, 2010) to filter out the outliers.
Metric The mean absolute percentage error (MAPE) is used to evaluate the performance of the proposed prediction models. MAPE is formally defined as follows:
$$\mathrm{MAPE} = \frac{100\%}{n} \sum_{i=1}^{n} \left| \frac{y_i - \hat{y}_i}{y_i} \right| \qquad (2)$$
where $y_i$ is the actual value, $\hat{y}_i$ is the prediction outcome, and $n$ is the number of samples.
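The preprocessing filter and the metric above can be sketched together. The RANSAC part is a minimal hand-written version, assuming the outliers are detected against a low-degree polynomial relation between two correlated readings; the degree, the residual threshold and the iteration count are illustrative assumptions, not the paper's settings. The `mape` function follows Eq. (2).

```python
import numpy as np


def ransac_inlier_mask(x, y, degree=1, threshold=0.5, n_iter=200, seed=0):
    """RANSAC-style inlier mask for a polynomial relation y ~ poly(x).

    Repeatedly fits the polynomial to a minimal random subset, keeps the
    hypothesis with the most points within `threshold`, then refits on those
    inliers and recomputes the mask.
    """
    rng = np.random.default_rng(seed)
    n_min = degree + 1
    best = np.zeros(len(x), dtype=bool)
    for _ in range(n_iter):
        idx = rng.choice(len(x), size=n_min, replace=False)
        coeffs = np.polyfit(x[idx], y[idx], degree)
        mask = np.abs(np.polyval(coeffs, x) - y) < threshold
        if mask.sum() > best.sum():
            best = mask
    coeffs = np.polyfit(x[best], y[best], degree)
    return np.abs(np.polyval(coeffs, x) - y) < threshold


def mape(actual, predicted):
    """Mean absolute percentage error, Eq. (2), in percent."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return 100.0 * np.mean(np.abs((actual - predicted) / actual))


# Synthetic readings on a line with three injected spikes.
x = np.linspace(0.0, 1.0, 50)
y = 2.0 * x + 1.0
y[[5, 20, 35]] += 10.0
inliers = ransac_inlier_mask(x, y)
print(np.flatnonzero(~inliers))  # indices flagged as outliers
print(mape([100.0, 200.0], [110.0, 190.0]))
```

MAPE divides by the actual value, so it assumes readings are never zero; that holds for power draws of running equipment but should be checked before applying it to other signals.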
Baselines We use DDO to denote our own data-driven optimization approach, which uses polynomial regression (PR) for SISO modules and MLP for MISO modules. A one-hidden-layer MLP with 3 hidden nodes and a logistic activation function is deployed. We compare DDO mainly with long short-term memory (LSTM) (Hochreiter and Schmidhuber, 1997), the state-of-the-art model for time series prediction. LSTM is implemented in Keras 1.2.2 with Theano 0.8.2. The batch size is set to 128. We use MAPE as the loss function and run 2,000 epochs for each model training. For SISO modules and chillers, we use a one-layer LSTM with 8 hidden state nodes. For other MISO modules, a three-layer LSTM is trained, with the number of hidden state nodes in each layer set to a multiple of the input size. The hyperparameter settings of LSTM are tuned for best performance.
Hardware & Software All our algorithms are implemented in Python and applied to the chiller plant using an IPC (industrial PC) with an Intel Core i5-6599TE and 8GB of RAM, running Ubuntu 14.04 LTS. Communication with the chiller plant is through Building Automation and Control networks (BACnet).
6.2. Evaluation of Data Enrichment Scheme
We evaluate the effect of data enrichment on power prediction accuracy. Two PR models are trained to predict cooling tower power, using the data in Figure 4 with and without enrichment respectively. The results are shown in Figure 7. The model trained on the original data deviates significantly from the true data points in VSD ranges outside the original data, and therefore has a much higher MAPE than the model trained on the rich data. This indicates that data enrichment is essential before applying any chiller plant modelling.
6.3. Evaluation of ModuleWise Models
Model  PR (DDO)  MLP  LSTM 

CHWP1  1.67  1.59  5.32 
CHWP2  2.10  2.16  6.27 
CHWP3  2.33  2.41  4.64 
AVG_CHWP  2.03  2.05  5.41 
CWP1  1.48  1.43  2.16 
CWP2  1.33  1.16  2.92 
CWP3  1.02  1.02  1.79 
AVG_CWP  1.28  1.20  2.29 
CT1  1.49  0.77  1.86 
CT2  0.94  0.94  2.23 
CT3  0.77  0.77  1.59 
AVG_CT  1.07  0.83  1.89 
Evaluation of SISO Models The power prediction results of the SISO models (PR, MLP and LSTM) are summarized in Table 3 as MAPE (%). For each module, we report the MAPE of each unit (e.g., CT1/2/3) and the average MAPE over all units (e.g., AVG_CT). As shown in Table 3, PR in DDO and MLP perform comparably on all modules: PR achieves the smaller average MAPE on CHWP (2.03 vs. 2.05), while MLP performs slightly better than PR on CWP (1.20 vs. 1.28) and CT (0.83 vs. 1.07). We notice that MLP easily gets trapped in local optima during training and delivers results with large variance across different models. Considering the difference in model complexity and training complexity, PR appears to be the better choice for online prediction and optimization. LSTM performs worst on SISO modules because of the lack of temporal dependency in the data; especially after the data cleaning operation that removes outliers, LSTM has difficulty learning the right updating rules over non-consecutive records. Overall, simple polynomial regression achieves small average MAPEs on SISO modules.
Model  MLP (DDO)  PR  LSTM

CHFM  1.59  1.74  4.11 
CWFM  1.27  1.28  2.80 
CWTM  0.92  1.33  6.46 
CH1  1.82  3.26  4.22 
CH2  2.09  2.25  2.98 
CH3  2.23  2.70  3.45 
AVG_CH  2.05  2.73  3.55 
Evaluation of MISO Models The prediction results of the MISO models are summarized in Table 4. For chilled water flow rate, condenser water flow rate and condenser water temperature prediction, we report the MAPEs of MLP, PR and LSTM. The outputs of CHFM, CWFM and CWTM are fed into the CH models for chiller power prediction. For chillers, we report the MAPE of each chiller and the average MAPE of each model. MLP (DDO) performs best on all MISO modules: its margin over PR is negligible on CWFM (1.27 vs. 1.28) but clear on CWTM (0.92 vs. 1.33) and on the chillers (average 2.05 vs. 2.73), and it outperforms LSTM everywhere. LSTM performs poorly on water flow rate and temperature prediction, which further degrades its chiller power prediction. In contrast, with low MAPEs in water flow rate and temperature prediction, MLP minimizes the error propagated from the CHFM, CWFM and CWTM models to the chiller model. Our method DDO, using MLP, captures the dynamics of chiller power well, achieving the smallest average MAPE of around 2%.
6.4. Evaluation of Total Power Prediction
Metric  DDO  PR  MLP  LSTM (black-box)

MAPE  1.86  2.24  1.81  2.25 
Combining all module-wise models, we are able to predict the total power of the chiller plant; the results are reported in Table 5. Because of the poor performance of LSTM on the module-wise models, we separately train a black-box LSTM model to better predict the total system power directly from all original input variables: the VSD speeds of pumps and fans, the equipment configurations, the weather, the system cooling load and the setpoint. The MAPE of the black-box LSTM is presented in Table 5. Among all models, DDO and MLP perform best, with MLP achieving a slightly smaller MAPE (1.81 vs. 1.86). However, the savings in training and optimization time from using PR for the SISO modules in DDO outweigh this small difference.
6.5. Evaluation of Real-Time Optimization
We started applying data-driven real-time optimization to a real-world chiller plant in July 2016 and have continually improved the optimization technique since August 2016. The optimization results are shown in Figure 8 and Figure 9. One challenge of performance evaluation is normalizing the results appropriately, so that the energy behavior of different strategies adopted at different time intervals can be compared fairly. In our testing, we collected a group of historical data under operation without optimization in June 2016, indicated as ML baseline data in Figure 8. A power consumption prediction model is built over the ML baseline data, using weather and system cooling load as input features. This model is then applied to new records to estimate what the energy consumption would have been without optimization from August 2016 onward (when the chiller plant is controlled by our DDO approach). We then compute the percentage difference between the estimated and actual electricity consumption and plot it in Figure 8. From August 2016 onward, actual energy consumption is significantly lower than the estimate for operation without optimization, with an average energy saving between 5% and 10%. Using the same estimation approach, Figure 9 shows that energy efficiency is also consistently improved. These savings amount to thousands of dollars for the chiller plant under our DDO approach. The outliers in January 2017 are due to equipment replacement.
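The normalization procedure above amounts to a counterfactual baseline: learn power as a function of weather and cooling load on unoptimized data, then compare its predictions with actual consumption during optimized operation. The Python sketch below illustrates the idea with a linear least-squares model and synthetic data. The paper does not specify the baseline model type, the exact weather features, or the percentage formula, so the linear form, the two feature columns (outdoor temperature and cooling load), and dividing by the estimate are our assumptions, and the 7% saving is built into the synthetic data.

```python
import numpy as np


def design_matrix(features):
    """Prepend an intercept column to an (n, k) feature matrix."""
    return np.column_stack([np.ones(len(features)), features])


def fit_baseline(features, power):
    """Least-squares baseline: power ~ intercept + linear terms (assumed form)."""
    coef, *_ = np.linalg.lstsq(design_matrix(features), power, rcond=None)
    return coef


def predict_baseline(coef, features):
    return design_matrix(features) @ coef


def saving_percent(estimated, actual):
    """Percentage of energy saved relative to the no-optimization estimate."""
    return 100.0 * (estimated - actual) / estimated


rng = np.random.default_rng(1)
# Hypothetical baseline period: outdoor temperature (degC), cooling load; power in kW.
base_x = np.column_stack([rng.uniform(25, 33, 500), rng.uniform(800, 1400, 500)])
base_power = 50 + 2.0 * base_x[:, 0] + 0.6 * base_x[:, 1] + rng.normal(0, 5, 500)
coef = fit_baseline(base_x, base_power)

# Hypothetical optimized period in which the plant draws 7% less power.
new_x = np.column_stack([rng.uniform(25, 33, 200), rng.uniform(800, 1400, 200)])
actual = 0.93 * (50 + 2.0 * new_x[:, 0] + 0.6 * new_x[:, 1])
savings = saving_percent(predict_baseline(coef, new_x), actual)
print(f"mean saving: {savings.mean():.1f}%")
```

Normalizing against a model of the unoptimized plant, rather than comparing raw consumption across months, is what allows records collected under different weather and load conditions to be compared fairly.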
7. Conclusion
In this paper, we present our data-driven optimization techniques and report empirical evaluations of these techniques on real-world chiller plants. Unlike existing machine learning approaches, we design the framework and choose the data models based on domain knowledge. We show that complex machine learning models, such as popular recurrent neural networks, may not be an optimal solution for highly dynamic and complex mechanical systems. Instead, simple models may better capture the actual mechanisms within the equipment used in chiller plants. Moreover, active data enrichment is an effective solution to the generalization problem that affects existing approaches based on data analysis alone. The combination of these new but simple techniques enables our system to accurately capture the running status and dynamically optimize the chiller plant in real time, achieving significant power savings for energy-hungry chiller plants.
In the future, we will explore the following research directions. First, we will attempt to collect more data from sensors in smart buildings, including video feeds from surveillance cameras and audio data from microphones. Such data can help the system better capture ongoing activities in the buildings and thereby improve cooling load prediction. Second, we will look into transfer learning techniques in order to combine data from multiple chiller plants for more accurate diagnosis. Such methods will be extremely useful, especially for rare faults that occur only a few times in the history of any single chiller plant.
Acknowledgements.
This research is funded by the Republic of Singapore’s National Research Foundation (NRF) through the Building and Construction Authority (BCA)’s Green Buildings Innovation Cluster (GBIC) R&D Grant, BCA RID 94.17.2.8 (Application No.: NRF2015ENCGBICRD001065).
References
 Dee (2016) 2016. DeepMind AI Reduces Google Data Centre Cooling Bill by 40%. https://deepmind.com/blog/deepmind-ai-reduces-google-data-centre-cooling-bill-40/. (2016). Accessed: 2017-01-25.
 SES (2016) 2016. Singapore Energy Statistics 2016. https://www.ema.gov.sg/Singapore_Energy_Statistics.aspx. (2016). Accessed: 2017-02-06.
 aff (2017) 2017. Affinity Laws. (2017). https://en.wikipedia.org/wiki/Affinity_laws
 Ben-Nakhi and Mahmoud (2004) Abdullatif E Ben-Nakhi and Mohamed A Mahmoud. 2004. Cooling load prediction for buildings using general regression neural networks. Energy Conversion and Management 45, 13 (2004), 2127–2141.
 Chow et al. (2002) TT Chow, GQ Zhang, Z Lin, and CL Song. 2002. Global optimization of absorption chiller system by genetic algorithm and neural network. Energy and Buildings 34, 1 (2002), 103–109.
 Doukas et al. (2007) Haris Doukas, Konstantinos D Patlitzianas, Konstantinos Iatropoulos, and John Psarras. 2007. Intelligent building energy management system using rule sets. Building and Environment 42, 10 (2007), 3562–3569.
 Graves et al. (2009) Alex Graves, Marcus Liwicki, Santiago Fernández, Roman Bertolami, Horst Bunke, and Jürgen Schmidhuber. 2009. A novel connectionist system for unconstrained handwriting recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence 31, 5 (2009), 855–868.
 Hochreiter and Schmidhuber (1997) Sepp Hochreiter and Jürgen Schmidhuber. 1997. Long short-term memory. Neural Computation 9, 8 (1997), 1735–1780.
 Iwafune et al. (2014) Yumiko Iwafune, Yoshie Yagita, Takashi Ikegami, and Kazuhiko Ogimoto. 2014. Short-term forecasting of residential building load for distributed energy management. In IEEE International Energy Conference (ENERGYCON). 1197–1204.
 Jelali (2006) Mohieddine Jelali. 2006. An overview of control performance assessment technology and industrial applications. Control Engineering Practice 14, 5 (2006), 441–466.
 Jung et al. (2017) Deokwoo Jung, Zhenjie Zhang, and Marianne Winslett. 2017. Vibration Analysis for IoT Enabled Predictive Maintenance. In ICDE.
 Li et al. (2009) Qiong Li, Qinglin Meng, Jiejin Cai, Hiroshi Yoshino, and Akashi Mochida. 2009. Applying support vector machine to predict hourly cooling load in the building. Applied Energy 86, 10 (2009), 2249–2256.
 Li and Wen (2014) Xiwang Li and Jin Wen. 2014. Building energy consumption online forecasting using physics based system identification. Energy and Buildings 82 (2014), 1–12.
 Liang et al. (2016) Victor C. Liang, Richard T. B. Ma, Wee Siong Ng, Li Wang, Marianne Winslett, Huayu Wu, Shanshan Ying, and Zhenjie Zhang. 2016. Mercury: Metro density prediction with recurrent neural network on streaming CDR data. In ICDE. 1374–1377.
 Manic et al. (2016) Milos Manic, Dumidu Wijayasekara, Kasun Amarasinghe, and Juan J Rodriguez-Andina. 2016. Building energy management systems: The age of intelligent and adaptive buildings. IEEE Industrial Electronics Magazine 10, 1 (2016), 25–39.
 Momtazpour et al. (2015) Marjan Momtazpour, Jinghe Zhang, Saifur Rahman, Ratnesh Sharma, and Naren Ramakrishnan. 2015. Analyzing invariants in cyber-physical systems using latent factor regression. In SIGKDD. 2009–2018.
 Mu et al. (2016) Baojie Mu, Yaoyu Li, Timothy I Salsbury, and John M House. 2016. Optimization and sequencing of chilled-water plant based on extremum seeking control. In American Control Conference (ACC). 2373–2378.
 Powell (2007) Michael JD Powell. 2007. A view of algorithms for optimization without derivatives. (2007).
 Resources (2010) Energy Design Resources. 2010. Chiller Plant Efficiency Design Brief. (2010). https://energydesignresources.com/media/1681/edr_designbriefs_chillerplant.pdf
 Sak et al. (2014) Hasim Sak, Andrew W Senior, and Françoise Beaufays. 2014. Long short-term memory recurrent neural network architectures for large scale acoustic modeling. In Interspeech. 338–342.
 Salsbury and Alcala (2015) Timothy I Salsbury and Carlos F Alcala. 2015. Two new normalized EWMA-based indices for control loop performance assessment. In American Control Conference (ACC). 962–967.
 Strutz (2010) Tilo Strutz. 2010. Data fitting and uncertainty: A practical introduction to weighted least squares and beyond. Vieweg and Teubner.
 Tyagi et al. (2006) Vipin Tyagi, Harshad Sane, and Swaroop Darbha. 2006. An extremum seeking algorithm for determining the set point temperature for condensed water in a cooling tower. In American Control Conference (ACC).
 Wang and Ma (2008) Shengwei Wang and Zhenjun Ma. 2008. Supervisory and optimal control of building HVAC systems: A review. HVAC&R Research 14, 1 (2008), 3–32.
 Xu et al. (2005) Jun Xu, Peter B Luh, William E Blankson, Ron Jerdonek, and Khalil Shaikh. 2005. An optimization-based approach for facility energy management with uncertainties. HVAC&R Research 11, 2 (2005), 215–237.
 Yang et al. (2005) Jin Yang, Hugues Rivard, and Radu Zmeureanu. 2005. Online building energy prediction using adaptive artificial neural networks. Energy and Buildings 37, 12 (2005), 1250–1259.
 Zhao and Magoulès (2012) Haixiang Zhao and Frédéric Magoulès. 2012. A review on the prediction of building energy consumption. Renewable and Sustainable Energy Reviews 16, 6 (2012), 3586–3592.