1 Introduction
Commercial buildings have significant impacts on humans and the environment. They consume more than 19% of the total energy consumption in the US. Heating, Ventilation and Air Conditioning (HVAC) systems account for 27% of energy consumption and 45% of peak electrical demand in commercial buildings and represent a substantial energy use reduction opportunity book2010us . The main purpose of HVAC systems in commercial buildings is to provide satisfactory thermal conditions and indoor air quality for occupants working inside these buildings. Providing satisfactory thermal conditions in indoor environments is critical since it directly affects occupants’ comfort wagner2007thermal ; frontczak2012quantitative , health fisk1997estimates , and productivity leaman1999productivity ; wargocki1999perceived ; wyon2004effects ; tham2010room and has a significant effect on energy performance of buildings korkas2016occupancy ; yang2014thermal ; jazizadeh2018personalized ; liu2018model .
To predict thermal comfort in indoor environments, researchers have developed empirical models with data collected from climate chamber experiments and field studies, using large populations wang2018individual . These models have succeeded in predicting average comfortrelated responses; consequently, they were adopted in standards for operation of heating, ventilation, and airconditioning (HVAC) systems in office buildings en200715251 . However, numerous field studies have shown that individual occupants prefer different thermal conditions and as such general thermal comfort models cannot accurately predict individual thermal preferences ghahramani2015online ; jazizadeh2013human ; feldmeier2010personalized ; lee2017bayesian . These approaches fail to capture that thermal preferences vary from person to person and may change over time ghahramani2015online . As a result, typical HVAC systems operation cannot achieve high levels of satisfaction for all occupants. Moreover, because of the conservative control settings based on general comfort standards, HVAC energy use in office buildings remains significant despite the higher equipment efficiency and the use of improved controls. Also, to implement general thermal comfort model, such as the mostwidely used PMV model fanger1970thermal , one needs very specific input features (e.g., metabolic rate, clothing insulation, air speed) that are difficult to obtain in a realworld setting and, therefore, they are often assumed or simplified erickson2012thermovote . Finally, the parameters of the PMV model (e.g., functions, coefficients) are trained using the original data set (i.e., Fanger’s laboratory (chamber based) experimental data), and need to be updated to reflect the actual personalized thermal preferences of occupants in other settings.
Researchers have acknowledged these limitations and suggested solutions that incorporate building occupants in sensing and control frameworks and tune these systems based on personalized preferences to achieve optimal conditioning for improved occupant satisfaction and energy efficiency gao2013optimal ; gao2013spot ; jazizadeh2014user ; liu2007neural ; daum2011personalized ; ghahramani2016infrared . Results of initial studies have shown that tuning HVAC systems’ operation based on occupants’ feedback has the potential to increase their satisfaction and reduce energy consumption. For example, Ghahramani et al. ghahramani2016energy showed that selecting the daily optimal setpoint temperature in the range of , and resulted in average savings of 7.5%, 12.7% and 16.4%, respectively, compared to the baseline setpoint temperature of . Murakami et al. murakami2007field developed an automatic control logic which maintained a balance between occupants’ satisfaction and energy consumption and resulted in 20% energy savings without increasing the percentage of occupant dissatisfaction.
1.1 Current approaches for learning occupants’ personalized thermal preferences
Thermal preferences are seen to vary from person to person and may change over time due to seasonal variations or acclimation ghahramani2015online . Furthermore, what suits one group or a type of occupant may be uncomfortable for others wang2018individual ; nakano2002differences . A wide array of field studies have shown individual differences in thermal comfort due to different age, weight, gender, historical thermal experiences etc. karjalainen2012thermal ; indraganti2010effect ; indraganti2015thermal . For example, Fountain et al. fountain1996expectations argued that differences in individual occupant’s response to comfort queries (on a 7point thermal sensation scale) were frequently greater than 1 scale unit. In another study, Humphreys and Nicol humphreys2002validity
analyzed ASHRAE database of fieldstudies and found that thermal comfort responses given by individual occupants have a standard deviation of around 1 scale unit (on the 7point thermal sensation scale). Oneunit difference in ASHRAE’s 7point scale corresponds to approximately
difference in indoor air temperature values, further indicating that the individual differences between preferred indoor air temperature for occupants may be as large as .Due to these individual differences in thermal comfort, it is often very difficult to provide a thermal environment which is satisfactory for all of the occupants. Personalized thermal comfort modeling approaches aim to address this challenge by building predictive models for inferring individual occupant’s thermal satisfaction profile, instead of the average profile of a large population kim2018personal .The opportunities arising from development of personal comfort models has generated a lot of interest within the academic and industry communities. With the rapid development in sensing technology and increase in computational power, researchers are exploring new approaches to better predict individual occupant’s thermal comfort in office buildings kim2018pp
. Many studies have focussed on development of personalized models which estimate the model parameters to describe individual occupant’s thermal comfort status (expressed as preference, satisfaction, or sensation) based on data collected from the field. These models inferred personalized thermal comfort of occupants by establishing correlations between environmental measurements and occupants’ response to comfort queries (obtained via surveys). They employed different forms of surrogate models such as neural networks
liu2007neural rana2013feasibility ; jiang2016modelling , Gaussian processes cheung2017longitudinal ; guenther2018feature , fuzzy rules jazizadeh2014user, and Bayesian networks
lee2017bayesian ; lee2019inference to approximate individual occupant’s thermal preferences, resulting in improved data representation and predictive accuracy. The results obtained using these models show significant improvement in performance (2030% gain in terms of predictive accuracy) compared to conventional comfort models (PMV) kim2018pp . For example, Ghahramani et al. ghahramani2015online introduced an online Bayesian networkbased learning algorithm for modeling personalized thermal preferences. They tested their framework on 33 different subjects by collecting their responses to comfort queries over a course of approximately 3 months. The results showed an average accuracy of 70.14% (as compared to 56.1% accuracy obtained by PMV model on the same dataset). In daum2011personalized, Daum et al. proposed a learning method based on multinomial logistic regression and showed that personalized thermal satisfaction profiles were needed in order to predict occupant’s thermal sensation properly. In this study, authors collected data from 28 different occupants during a period of three years. They found that the individual satisfaction profiles evolved as new information (data) was collected and used, and that congruency of the proposed model increased from 0.66 to 0.87 after 90 votes were added. They concluded that, for obtaining converged satisfaction profiles, more than 90 (indoor air temperature based) data points were needed. In another study, Feldmeier and Paradiso
feldmeier2010personalized developed an occupantcentric personalized HVAC control system which inferred individual occupant’s preferences using linear discriminant analysis. They conducted this study from May through August of 2009.As seen from all these works, without sufficient survey data, the proposed personal comfort models would largely fail to infer individual occupantspecific thermal preference needs. However, securing data collection through surveys is difficult due to the cost associated with conducting the experiments, potential fatigue and eventual decay in participation kim2018personal ; kim2018pp . Having occupants responding to surveys for a long time is certainly not a practical solution that can be implemented in real buildings. A potential solution to address these problems is to deploy novel active learning methods, that are able to lean fast and efficiently using a small amount of data. Therefore, in this paper, an alternative robust online strategy/framework is proposed to actively collect the “most informative” data in order to describe individual occupants’ thermal preferences and infer their maximally preferred indoor air temperatures.
1.2 Research contributions
When it comes to development of smart thermostat in offices, our overarching goal is to develop a recommender system that makes temperature setpoint recommendations with the aim of maximizing some occupant’s satisfaction in the light of inherent uncertainty over occupants and indoor environment. Traditionally, big companies like Amazon, Netflix, Google, Facebook etc. have had deployed recommender systems to assist users when they search in large information spaces such as collection of products (books, movies, music), documents (news articles, Wikipedia articles) etc smith2017two ; gomez2016netflix ; ricci2015recommender . These recommender systems assist in saving a user’s efforts in searching for products by interacting with users in order to reduce uncertainty in their preferences. Preference elicitation (PE) is a crucial component of recommender systems that aims to make optimal recommendations to the users by actively asking them questions about their preferences guo2010gaussian ; guo2010real ; guo2011bayesian . An important requirement for PE framework is that they should be able to make optimal or near optimal recommendations based only on a small number of survey questions.
The objective of this paper is to present, for the first time (in the context of smart building applications), a PE framework which sequentially poses intelligent queries to a new occupant in order to optimally learn the indoor air temperature values which maximize his/her satisfaction. Our central hypothesis is that an occupant’s preference relation over indoor air temperatures can be described using a scalar function of these temperatures, which we call the “occupant’s thermal utility function.” The mathematical characteristics of our PE framework are that: (1) It maintains a flexible representation of an occupant’s thermal utility function using Gaussian processes (GP); (2) By adhering to a fully Bayesian paradigm, it handles uncertainty in a principled manner; (3) It sequentially selects queries that yield the maximum expected improvement in utility; and (4) It incorporates prior knowledge from different sources guo2011bayesian , e.g., that there exists a unique maximally preferred indoor air temperature. From a thermal comfort perspective, our framework comes under the paradigm of design of experiments for training personalized thermal preference models kim2018personal ; kim2018pp . The key practical characteristics of our PE framework are that: (1) It takes an individual occupant as the unit of analysis rather than population or groups of people; (2) It uses direct feedback from occupants to train and suggests new temperature to query next; (3) It prioritizes costeffective and easilyobtainable data (we only use indoor air temperature as a predictor which affects occupant’s satisfaction); (4) It has the capacity to adapt as new data is introduced into the modeling framework. A detailed comparison of our approach with respect to existing personalized thermal comfort models is shown in the Table 1. This PE framework is an important step towards the development of intelligent HVAC systems which would be able to respond to individual occupants’ personalized thermal comfort needs. In order to encourage the use of our PE framework and ensure the reproducibility of our results, we publish an implementation of our work named GPPrefElicit^{2}^{2}2https://github.com/nawalgao/GPPrefElicit in the Python language .
To summarize, this paper starts with an introduction of the PE framework used for inferring the maximally preferred indoor air temperature values in Section 2. In Section 3, the performance of this framework is evaluated (by showing how this framework would work in case of synthetic occupants). In Section 4, the results of an experimental study (using our newly developed PE framework) for inferring maximally preferred indoor air temperatures is presented. We then discuss some of the limitations of this PE framework in Section 5. Section 6 summarizes our findings and concludes the paper.
Model  Data  Model inputs  Model output  Methodology  Model evaluation  
Occupant’s response  Measurements^{a}  
cheung2017longitudinal 


Ta, RH, CO 




lee2017bayesian 


Ta, RH, MRT, Va 




ghahramani2015online 


Ta 




jazizadeh2013human 


Ta 

Fuzzy rules 


rana2013feasibility 


Ta, RH  ASHRAE 7point thermal sensation scale 

Overall accuracy using campus dataset = 80%  
daum2011personalized 

ASHRAE 7point thermal sensation scale  Ta 

Logistic regression 





Ta 




Note: Ta = indoor air temperature, RH = relative humidity, MRT = metabolic rate, Va = air velocity, CO = carbon dioxide level
2 Methodology
2.1 Problem formulation
We assume that each occupant has a welldefined preference relation over the set of all possible indoor thermal environments. If this preference relation satisfies a set of reasonable preference axioms morgenstern1953theory , then we can encode it as a scalar function of the thermal environment taking values in a set . That is, for a given occupant there exists a function such that the statement “the occupant prefers the thermal environment to the thermal environment ” is equivalent to the mathematical statement furnkranz2010preference . For example, since we associate the thermal environment of a room with its air temperature in degrees C, is a real function of indoor air temperature and a preference of over is equivalent to . We call , the thermal utility function of the occupant. The maximally preferred indoor air temperature is then given as:
(1) 
where the maximization takes place over the set of all possible indoor air temperatures .
The problem presented in Equation 1 is hard because information about is only indirectly available to us gonzalez2017preferential . One way around this limitation is to ask the occupant to compare between two different thermal environments of the room sui2018advancements ; busa2018preference ; houlsby2012collaborative ; dewancker2018sequential . Indeed, xiong2018inferring and awalgaonkar2018design followed exactly this approach to infer visual preferences of occupants in private office spaces. However, our pilot studies revealed that this approach is not suitable for eliciting thermal preferences, reason being that, as the time required to change the steady state indoor air temperature is of the order of half an hour (in our experimental setup), it becomes very likely that the occupant forgets the initial temperature and pairwise comparisons between two consecutive thermal environments of the room is not viable in such a scenario. For this reason, we have resorted to instantaneous thermal preference queries of the form “prefer warmer,” “prefer cooler,” “satisfied ” which we interpret as statements about the derivative of the utility function, i.e., they mean that is “increasing,” “decreasing,” “constant” respectively. An addendum is that such queries have low cognitive load chajewska2000making .
2.2 Outline of Preference Elicitation framework
The proposed PE framework operates as follows:

An occupant working in an office space is exposed to indoor air temperature in . After 30 minutes, we ask the occupant about his/her thermal preference in the form of a query, “How satisfied are you with current thermal conditions?”

If the occupant responds “I’m satisfied with current condition”, we record discrete variable response as . If the occupant responds “I prefer warmer” we record variable to be equal to . Lastly, if the occupant responds “I prefer cooler” we record variable to be equal to .

Using all data collected so far, we update our beliefs about the occupant’s utility function , see Sec. 2.3.

We identify the indoor air temperature in that maximizes the expected improvement in the occupant’s utility, see Sec. 2.4.

We implement the thermal condition and we GOTO 1.
2.3 Learning an occupant’s thermal utility function
Assume that we have observed number of thermal preference queries so far, which are given as:
(2) 
where are the observed indoor air temperature and is the occupant’s response to the queries. In this section, we discuss how to make use of thermal preference data to update our beliefs about the utility function . For notational simplicity, we skip the subscript and we write instead of , instead of , and instead of . We follow a Bayesian approach where we put a suitable prior on the space of ’s, condition it on
and sample from the resulting posterior using a variant of Markov chain Monte Carlo. The technical details are outlined below.
From previous research, we know that an occupant’s thermal utility function is smooth, it has a unique maximum, and that “extreme” indoor air temperatures are rarely preferred by individual occupants lee2017bayesian ; gunay2018modelling ; gunay2018development . Mathematically, we assume that is infinitely differentiable, and that it is unimodal. The latter means that there exists a temperature in such that is monotonically increasing for temperatures smaller than and monotonically decreasing for temperatures greater than . Of course, this implies that the derivative of is positive for temperatures less than and negative for temperatures greater than . We present examples of such utility functions in Sec. 3. To encode this prior information, we construct a hierarchical Bayesian model for unimodal functions based on Gaussian processes andersen2017bayesian . We encode the prior belief about the unimodality of the utility function via virtual derivative observations on that would satisfy the above defined inequality constraints riihimaki2010gaussian .
2.3.1 Encoding our beliefs about the smoothness of the thermal utility
Our model assigns a zeromean GP prior on ’s, i.e.,
(3) 
where is a positive definite function known as the covariance function, and are its parameters. The smoothness of function samples from a GP prior is directly connected to the smoothness of the covariance function rasmussen2006gaussian ; banerjee2003smoothness . Therefore, to enforce that sampled ’s are infinitely differentiable, we use the squared exponential covariance function:
(4) 
where
: the signal variance
and lengthscale. Following a fully Bayesian modeling approach, we need to place priors over these hyperparameters
such that they reflect our beliefs about them i.e., what do we believe about the length of the ‘wiggles’ in utility function? How sensitive is the utility function to change in any of the input features? How much farther away is the true utility function from its mean function values? Since we are quite uncertain about answers to these questions, we reflect this uncertainty by assigning uninformative Gamma prior distributions over signal variance and lengthscale . We assume that and are a priori independent, i.e., and we choose:(5) 
where
is probability density of a Gamma r.v. (random variable) with shape parameter
and scale parameter thom1958note . We set hyperparameters , , and reflecting our uncertain beliefs.A GP defines a probability measure on the space of utility functions such that any finite collection of function values follows a multivariate Gaussian distribution
rasmussen2006gaussian . For example, by assigning a GP prior on the space of ’s, the joint probability density over the utility function values at all the observed indoor air temperature , denoted by vector , is the multivariate zeromean Gaussian , where the covariance matrix , by definition, has elements:(6) 
where is the covariance operator conditional on . In B.1 and B.2, we show how GP would work in a conventional regression problem (see Fig. 12(a) and Fig. 14(a)).
Furthermore, the derivative of is also a GP since differentiation is a linear operator rasmussen2006gaussian . This makes it possible to include derivative observations in the GP model, or to compute predictions related to it wu2017bayesian ; wu2017exploiting ; eriksson2018scaling . More importantly,
, seen as a twooutput function of indoor air temperature, is a twooutput Gaussian process, i.e., all finite dimensional joint distributions of
and values are multivariate Gaussian. The covariance function of the process is given as follows (see riihimaki2010gaussian ):(7)  
(8) 
For example, with these definitions, the joint distribution of function values and function derivatives is the multivariate zeromean Gaussian:
(9) 
where is the matrix defined by:
(10) 
with as in Eq. (6), computed using Eq. (7) and computed using Eq. (8). This result generalizes to any collection of utility and utility derivatives values.
2.3.2 Encoding our beliefs about the unimodality of the utility
Assume that is a unimodal function of indoor air temperature and let in be the unique maximally preferred indoor air temperature. Then, since is continuously differentiable, it holds that for and for . In other words, as we move through the interval , the utility derivative must flip signs from positive to negative only once. It is not trivial to model this constraint directly since, apart from the single sign flipping, is arbitrary. In B.1, we show how a unimodal Gaussian process model (UGP) andersen2017bayesian would work in a conventional regression setting. The idea is twofold. First, we introduce an easytomodel latent GP that satisfies the same sign flips as , and second we force the signs of and to agree. We discuss these two points sequentially.
Let be a zeromean latent GP with a squared exponential covariance function, hyperparameters , and hyperparameter prior . We assign the same uninformative Gamma prior distribution over as in Eq.(5). We want to force to flip signs at most once in . The easiest way to encode this information is using a monotonicity constraint (see B.1). In particular, it suffices to force to be monotonically decreasing, which is equivalent to forcing the derivative to be negative throughout . A common approach is to enforce the monotonicity constraint using a set of virtual derivative observations. Basically, instead of using “ is negative throughout ”, we use the weaker, but more practical, constraint “ is negative on a finite set of indoor air temperatures in .” The set should cover as much of as possible. In this work, we take . Let and be the dimensional vectors of values and derivatives of at the set points . The conditional prior joint distribution of and is:
(11) 
where the first part of the right hand side is the equivalent of Eq. (9). The second part of the right hand side of Eq. (11) puts higher probability to negative . Here,
is the cumulative distribution function of the standard normal distribution, also called probit link function and is given as:
. The positive parameter controls how fast the probit transitions from zero to one. We take , which practically turns into a step function.Second, we force the signs of and to agree. Similarly to the previous paragraph, we weaken the constraint “the signs of and agree throughout to the more practical “the signs of and agree on a finite set of points .” To achieve this, we simply introduce the set of virtual observations . We connect to the sign of through the generative model:
(12) 
where is the probability mass density of a Bernoulli r.v. with probability of success , and is a positive parameter. The parameter controls the strength of the connection between the and the sign of . In the limit of , the two match perfectly. In the limit of , there is no connection between the two, i.e., is completely random. For any finite, nonzero the strength of the connection between the signs depends on the magnitude of . The two signs match well when the absolute value of is large. As expected, the largest uncertainty about the sign is at the point where crosses zero, i.e., at the maximally preferred temperature. Note that can be absorbed by the signal variance of . Thus, without loss of generality, we take .
Finally, we condition the original GP to satisfy the virtual observations . This yields a prior model for that satisfies the unimodality constraint. Unfortunately, because of the nonlinear nature of the sign operator, the resulting process is not a GP. Nevertheless, the joint prior of the utility and utility derivative values at arbitrary points is a modified multivariate Gaussian of the form. This is enough for characterizing the posterior through sampling. In particular, the joint prior of , , and is
(13) 
The first part of the right hand side is a trivial extension of Eq. (9). The second part of the right hand side of Eq. (13) acts as an indicator function. It give higher probability when the signs of and match and lower when they do not. We use .
2.3.3 The likelihood of the data
The response to a preference query gives us information about the sign of derivative of utility function at the indoor air temperature where we ask these queries. Essentially, the observed data are noisy observations of the sign of derivatives . The likelihood function is a model of the measurement process and it establishes the connection between and . We define:
(14) 
where is a positive parameter controlling the level of the noise. The proposed likehood encodes the following intuitive characteristics. First, the likelihood is high when and have the same sign. Second, the likelihood is low when and have opposite signs. Third, as gets close to zero, any answer is equally probable. Without loss of generality, we take a the signal variance of can absorb it.
2.3.4 The posterior state of knowledge
The posterior probability density of all model parameters is given by Bayes rule:
(15) 
where all the terms of the right hand side have been already introduced in the previous sections. Note, that it is possible to analytically marginalize over the virtual observations getting:
(16) 
To get the marginalized posterior of any variable, one just needs to integrate out all other variables. In particular, the joint posterior probability density of the utility values at the virtual indoor air temperature values and the hyperparameters is given by:
(17) 
2.3.5 Making predictions at unobserved indoor air temperatures
Now that we have quantified our posterior beliefs regarding the utility function values at virtual indoor air temperatures (see Eq.(17)), we shift our attention to prediction of utility function at an arbitrary indoor air temperature value in
. This posterior predictive distribution over utility function values
is:(18) 
where, going from the first to the second line, we used Bayes rule. The first term inside the integral of the second line of Eq. (18) is the posterior probability density of conditioned on the observations , the virtual observations , and the hyperparameters , while the second term is given in Eq. (17). Since the virtual points cover the input space densely, the information contained in overshadows the information in . In other words, to a good approximation, the effect of the observations can be dropped. Mathematically, when we know that and are jointly distributed as specified in Eq. (17), then
and, thus, Eq. (18) becomes:
(19) 
Finally, is analytically available:
(20) 
where the mean is:
(21) 
and the variance is:
(22) 
All the quantities in Eqs. (21) and (22) involve the prior covariance of with hyperparameters . In particular, the matrix is the crosscovariance matrix between and , the matrix is the covariance matrix of , and the scalar is the a priori variance of , i.e., .
2.3.6 Characterizing our state of knowledge about the maximally preferred indoor air temperature
In this section we summarize our state of knowledge about the maximally preferred indoor air temperature values. To this end, we define the random variable representing the location of the maximally preferred indoor air temperature:
(23) 
The uncertainty in is, of course, induced by our uncertainty about the utility . Our state of knowledge is neatly captured by the probability density which is formally given by
(24) 
where is the conditional expectation over the stochastic process conditioned on . In what follows, we derive an efficient approximation of this probability density so that it can be characterized by sampling.
Instead of working directly with the random variable , we define a discretized version of it that considers only the utility values at the virtual temperatures:
(25) 
This suffices, because as, as the number of virtual temperatures , converges to
almost surely. Subsequently, instead of probability density function
, we characterize the probability mass function defined by:(26) 
We approximate the second line of Eq. (26) by building a histogram from samples from the marginalized posterior .
Eq. (26
) can be used to decide when to stop the preference elicitation process. In this work, we monitor the 95% credible interval, i.e., the range of indoor air temperature
for which(27) 
and one of the criterion for stopping the elicitation process is when becomes smaller than a specific threshold. For the purpose of this paper, we have set this temperature threshold to be .
2.3.7 Sampling from the posterior and posterior predictive distributions
The posterior probability density is analytically intractable and it can only be characterized via sampling. We use Hamiltonian Monte Carlo (HMC) duane1987hybrid to sample from Eq. (16). We implemented the model using the GPflow 0.4.0 package matthews2017gpflow
, a GP library that uses TensorFlow for its core computations and Python for its front end. After analyzing the traces, autocorrelation, and the effective sample size (ESS)
gilks1995markov of all quantities, we decided to burn the first 5,000 samples, and gather a total of samples by keeping one sample out of every 3 samples gelman2013bayesian . In what follows, we denote HMC samples from the joint posterior conditioned on by for . We use these samples to carry out any integration over the posterior using sampling averages.2.4 Selecting the next thermal preference query
In this paper, we develop a PE framework that uses the previously seen thermal preference data to select the next query to pose to the occupant by maximizing the expected improvement in utility. Symbolically, we select the next indoor air temperature to query by solving
(28) 
where denotes the set of all the available indoor air temperature values reachable from the current indoor air temperature , denotes the expected utility improvement of a hypothetical query at given the data (to be defined below). In practice, the set of reachable indoor air temperatures is finite and, thus, the optimization problem of Eq. (28) can be solved by a simple search algorithm. In our studies, is a grid of separated indoor air temperatures reaching below and above the current indoor air of the room . See Subsection 3.2 for examples.
We define EUI by generalizing the expected improvement (EI) acquisition function (for a detailed discussion of EI see the Bayesian global optimization literature snoek2012practical ). Traditionally, EI is defined to be the expected marginal gain with respect to the current best observed quantity of interest. However, the traditional definition of EI is not directly applicable because the quantity of interest, the utility function, is not directly observed in our problem.
Therefore, in order to tackle this problem, we develop a contextspecific EI that exploits the semianalytical tractability of the approximation to the pointpredictive posterior we derived in Eq. (19). We start by conditioning on the virtual observations and the hyperparameters , and we proceed to integrate them out using samples from their posterior, see Eq. (18) and Sec. 2.3.7. Our current best (conditional) estimate of the utility over the observed indoor air temperatures is given by:
(29) 
where is the conditional expectation operator which we evaluated using Eq. (20). Thus, the (conditional) improvement in utility of a hypothetical observation in is
(30) 
The expected (conditional) improvement in utility is then:
(31) 
where
(32) 
and is the probability density function of the standard normal. Finally, we integrate over the joint posterior of and to get:
(33) 
where in the last step we used a sampling average approximation of the integral, see Sec. 2.3.7.
We present a concise algorithmic representation of the proposed PE scheme in Algorithm 1.
3 Validation of the methodology
3.1 Synthetic occupant data generation
In order to validate that our PE framework can indeed discover the maximally preferred indoor air temperature value, we first run our PE framework using three synthetic occupants’ preference data. For generating a thermal preference dataset of synthetic occupants,we take the following approach:

We assume that we have synthetic occupants working in private offices whose utilities for different indoor air temperatures are as shown in the Fig. 1. We want to learn the indoor air temperatures which would provide maximum thermal satisfaction to these occupants. For synthetic occupant 1, the maximally preferred indoor air temperature is ; for synthetic occupant 2, the maximally preferred indoor air temperature is and for synthetic occupant 3, the maximally preferred indoor air temperature is .

We assume that we can only query thermal preference responses (“How satisfied are you with current thermal condition?”) from these occupants.

The outcome (response of the synthetic occupant) for a given indoor air temperature is generated as described in subsubsection 2.3.3, i.e., the outcome is drawn from the distribution , where the response takes values (prefer cooler), (satisfied) and (prefer warmer). We start with an initial indoor air temperature of and elicit thermal query related responses for all the three occupants. At this thermal indoor air temperature, each of the three synthetic occupants prefer warmer temperatures .
3.2 The framework in action
For selecting the next indoor air temperature to query, the setup is as follows:

As mentioned earlier, to find the next query we search over a grid of  separated indoor air temperatures reaching below and above the current indoor air temperature of the room. For example, if the current indoor air temperature value is , then our framework would look for next temperature to query from the set of available indoor air temperatures .

We start each elicitation process with a query at . We continue to run this elicitation process until we converge towards the maximally preferred indoor air temperature, see the end of the Sec. 2.3.6. All of the indoor air temperatures queried along with their associated thermal preference responses, EUI values, ratio of EUI value at previous iteration and EUI value at current iteration (EUI ratio) for synthetic occupant 1, occupant 2 and occupant 3 are shown in Table 2. The posterior predictive distributions over utility function and maximally preferred indoor air temperature values are shown in Figs. 2–4.
Query #  Synthetic occupant 1  Synthetic occupant 2  Synthetic occupant 3  
Temp. ()  Response  EUI  EUI Ratio  Temp. ()  Response  EUI  EUI Ratio  Temp. ()  Response  EUI  EUI Ratio  
1  21  1  0.465    21  1  0.534    21  1  0.537   
2  23.5  1  0.285  1.631  23.5  1  0.1963  8.30  23.5  1  0.238  2.22 
3  26  1  0.01  28.5  23  1  0.295  0.665  22.5  1  0.009  26.4 
4  24.5  1  0.016  0.625  23.5  1  0.210  1.40  21.5  1  0.010  0.9 
5  25.5  1  0.009  2  23  1  0.234  0.897  22  1  0.005  2 
6  25  0  0.009  1  23  1  0.221  1.05  22  1  0.005  1 
7  25  0  0.006  1.5  23.5  1  0.231  0.956  22.5  1  0.005  1 
8  25  0  0.007  0.857  23.5  1  0.008  28.875  22  1  0.005  1 
9  25  0  0.008  0.875  23.5  1  0.005  1.6  22.5  1  0.005  1 
10  25  0      23.5  1      22  1     
(a) utility samples ()  (b) utility samples ()  
(c) utility samples ()  (d) utility samples ()  (e) max. preferred temp. () 
(f) utility samples ()  (g) utility samples ()  (h) max. preferred temp. () 
(i) utility samples ()  (j) utility samples ()  (k) max. preferred temp. () 
(l) utility samples ()  (m) utility samples ()  (n) max. preferred temp. () 
(a) utility samples ()  (b) utility samples ()  
(c) utility samples ()  (d) utility samples ()  (e) maximally preferred temp. () 
(f) utility samples ()  (g) utility samples ()  (h) maximally preferred temp. () 
(i) utility samples ()  (j) utility samples ()  (k) maximally preferred temp. () 
(a) utility samples ()  (b) utility samples ()  
(c) utility samples ()  (d) utility samples ()  (e) maximally preferred temp. () 
(f) utility samples ()  (g) utility samples ()  (h) maximally preferred temp. () 
(i) utility samples ()  (j) utility samples ()  (k) maximally preferred temp. () 
(l) utility samples ()  (m) utility samples ()  (n) maximally preferred temp. () 
As we can see from Figs. 2–4, our PE framework searches through all of the available indoor air temperature values to converge towards the temperature that maximizes the thermal satisfaction/utility of the occupants. For synthetic occupant 1, the maximum preferred indoor air temperature value was and our framework was able to infer it in six queries (Fig. 2). In case of synthetic occupant 2, due to the constraints we have put on the framework, i.e., to only search for temperatures values with minimum temperature difference of 0.5, our framework converges to the maximally preferred indoor air temperature value of instead of actual maxima at . This happens in six queries to the occupant (Fig. 3). Next, the process converges to indoor air temperature value of for synthetic occupant 3 (actual maxima is at ) after collecting five active thermal preference queries (Fig. 4).
As shown in Figs. 2–4, all of the inferred posterior predictive utility samples are unimodal. Because of this strict unimodality constraint, the EUI acquisition function governing the search for maximally preferred indoor air temperature value corrects the boundary overexploration effect which is typically seen in conventional Bayesian global optimization problems siivola2017correcting . Intuitively, once we have observed that a given occupant responds “prefers warmer” at , the model becomes sufficiently confident that this occupant is going to respond “prefers warmer” at temperature values as well. Similarly, if the occupant responds that he “prefers cooler” at , then our model becomes confident that this occupant is also going to respond “prefers cooler” at temperature values. Because of the unimodality constraint, in this case, the model would place high probability of the maximally preferred indoor air temperature of being in between and .
3.3 Performance of Expected Improvement based Preference Elicitation Framework
In this subsection, we analyze the performance of our newly developed PE framework and illustrate some of its key properties. Put simply, we seek to answer the following questions:

How long does it take for PE framework to converge towards the maximally preferred indoor air temperature values?

How does changing the initial preference query posed to the occupant affect the convergence of PE framework?

How is the EUI based PE framework performance compared to just randomly searching the input space? Does it converge faster towards the maximally preferred indoor air temperature values?
3.3.1 Convergence towards maximally preferred indoor air temperature values
As discussed in the previous subsection (see Section 3.2), each of the PE run starts with one initial thermal preference query to the occupants at and a total budget of 10 queries are posed to the occupants. We report the posterior distribution over maximally preferred indoor air temperature values at each step of the PE runs. The boxplots of these inferred posterior distribution over maximally preferred indoor air temperature values (for each of the synthetic occupants) are illustrated in Fig. 5
. Minimum and maximum values of inferred preferred indoor air temperature values are used to define the whiskers of the boxplots. The interquantile range (IQR) for visualization of boxplots is calculated as the difference between the first and third quantiles and 1.5 times IQR is used to define the outliers. As seen from these subfigures, our PE framework converges to the maximally preferred indoor air temperature values (
Fig. 4(a) shows convergence of PE runs to in case of synthetic occupant 1 ; Fig. 4(b) shows convergence of PE runs to in case of synthetic occupant 2 and Fig. 4(c) shows convergence of PE runs to in case of synthetic occupant 3).3.3.2 Convergence analysis
In this subsubsection, we show the robustness of our PE framework to converge towards the maximally preferred indoor air temperature values (independent of the initial query posed to the occupants). That is, no matter the initial thermal preference query posed to the occupant (whether it be at or or ), in a perfectly ideal experimental setting, it will converge towards the maximally preferred indoor air temperatures (in less than 10 queries) to occupants. We run the PE scheme (for all the occupants) three times, starting from initial comfort query at , and . We report the mean of inferred posterior maximally preferred indoor air temperature values at each step of the PE runs (for all the three occupants, with different initial preference query) in Fig. 6. As seen from these subfigures, no matter from where we start the PE runs, our algorithm will eventually converge towards the maximally preferred indoor air temperature values (in less than 10 queries).
3.3.3 Expected Improvement in Utility (EUI) vs. Random Search (RS)
In this subsubsection, we address the question of “how fast does the EUI based PE framework converge towards the maximally preferred indoor air temperature values (as compared to just randomly searching (RS) over the indoor air temperature values)?” For evaluating this comparison between EUI vs. RS, we introduce the concept of averaged Euclidean distance (AED) performance metric.
AED between two vectors gives a measure of “similarity” between the two vectors. In this problem, we calculate the averaged Euclidean distance between the inferred maximally preferred indoor air temperature (at each step of the elicitation runs) and true maximally preferred indoor air temperature values. We run both EUI and RS based elicitation 3 times (trials) with different initial thermal preference query posed to the occupants (same for both EUI and RS based exploration) and we report the average performance across all these trials. The lower the value of AED metric, the better the given framework and vice versa. It is given as awalgaonkar2018design :
(34) 
where are HMC samples approximating posterior distribution over maximally preferred indoor air temperature values and is the true maximally preferred indoor air temperature ( in case of occupant 1 ; in case of occupant 2 and in case of occupant 3). Fig. 7 shows the performance of EUI approach against randomized data collection (as we add more and more thermal comfort queries). The results are consistent across all the three occupants, that is, EUI exploration approach has consistently been proven better than randomized data collection (since it converges in less number of thermal preference queries to the occupants). For more indepth analysis of the convergence properties of expected improvement acquisition function, we refer readers to the work done by vazquez2010convergence .
3.4 Performance of Unimodal Gaussian Process Preference Learning Model
As we have iterated numerous times in the previous sections, the objective of this paper is to sequentially pose intelligent queries to occupants in order to optimally learn the indoor air temperature which would maximize their satisfaction. However, after analyzing the posterior distribution over utilities and inferred maximally preferred indoor air temperature values, a question which naturally arises in our minds is “can the UnimodalGP based preference learning model be used in for classifying prefer warmer/ prefer cooler conditions?” Indeed, due to the strict unimodality constraints we have imposed on the structure of utility function, one positive/favorable sideeffect of running our UEI based PE scheme is that the trained UnimodalGP can be further used to classify prefer warmer vs. prefer cooler conditions.
In order to validate if our systematic approach provides higher accuracy for classifying prefer warmer vs. prefer cooler conditions, we compared the performance of our proposed UnimodalGP preference learning model against two wellknown classification techniques : logistic regression and support vector machine (SVM) with linear basis function. Each of the abovementioned classification models has one or several hyperparameters that need to be properly tuned for providing their best performance during classification. We do not discuss the process of hyperparameter tuning in this paper. Instead, for learning more about hyperparameter optimization of machine learning models, we refer readers to work done by bishop2006pattern . In order to achieve fair comparison between these models, we provided two sets of training data: (1) 5 datapoints based on EUI search (first 5 rows of Table 2) and (2) 5 randomly selected (RS) thermal preference queries to the models, trained them and then checked the classification results on a testing test (different from the training data). Specifically, we used hitrate accuracy (HRA) as the measure of performance of different classification models (same as in ghahramani2015online ). HRA is used as a statistical measure of how well a binary classification model correctly identifies or excludes a condition. Put simply, HRA is defined as the average chance of correct prediction. Higher the value of HRA, the better the performance of model is. In Table 3, we report the 3fold validation HRA across all the above mentioned models. As it can be seen in Table 3, hit rate accuracy of the proposed UnimodalGP (trained on data collected using EUI based PE framework) was relatively higher than the other classification models. The relatively higher accuracy of the proposed model is due to the strict unimodality prior constraints imposed on the structure of utility functions as well as due to the collection of most informative thermal preference query data (as elicited by our PE framework). However, as promising as this may be, we also like to note that one should be cautious while interpreting Table 3, as the HRA metrics values were obtained using synthetic occupants’ (ideal) preference data whose true utilities were unimodal, favoring our unimodal GP model and this might not always be the case when concerned with real occupants.
Model  Hit rate accuracy (%)  
Synthetic occupant 1  Synthetic occupant 2  Synthetic occupant 3  
Average  Standard deviation  Average  Standard deviation  Average  Standard deviation  
Unimodal GP (trained on EUI data)  100%  0%  97.33%  2.5%  100%  0% 
Logistic Regression (trained on EUI data)  100%  0%  81.33%  6.6%  74.66%  5.24% 
SVM (trained on EUI data)  93.33%  4.10%  46%  8.48%  56.66%  5.73% 
Unimodal GP (trained on RS data)  70%  5.65%  68.66%  13.19%  72.66%  4.10% 
Logistic Regression (trained on RS data)  64.66%  11.46%  73.33%  6.6%  69.33%  6.8% 
SVM (trained on RS data)  64.66%  11.46%  78%  3.26%  69.33%  6.8% 
4 Experimental results
4.1 Experiment Setup
We conducted realtime experiments in three identical southfacing private offices (3.3m x 3.7m x 3.2m high) in a building located in West Lafayette, Indiana. Fig. 7(a) shows a general view of this building and the monitored offices. The offices have one exterior curtain wall facade with 54% windowtowall ratio and highperformance glazing units with selective lowemissivity coating (normal visible transmittance: 70%, normal solar transmittance: 33%). Darkcolored motorized interior roller shades are installed in the offices with a total normal visible transmittance of 2.53% and an openness factor of 2.18%. Each office has two electric lighting fixtures with two 32W T5 fluorescent lamps (total of 128 W). The heating and cooling delivery to the spaces is achieved through a Variable Air Volume (VAV) system with a central air handling unit (AHU), which supplies conditioned air to the offices at a constant temperature, but variable flow rate. Each office has a VAV box with zone damper that can modulate the supply air flow rate in the cooling mode and a reheat coil to increase the supply air temperature as needed to achieve the setpoint temperatures. The cooling and heating source (chilled water and steam) are provided from the campus plant. Furthermore, a Building Management System (BMS) is operated through the installed Tridium JACE controllers and Niagara/AX software framework, which in addition to a variety of internetenabled features provides the ability to monitor, control and automate all of the building systems. During the course of this study, the air flow rate, supply air temperature, reheat coil mode, occupancy, setpoint temperature, shading position, and electric light level were monitored. The sensors/actuators for these variables were connected with the BMS using BACnet protocol. The indoor air temperature (Jtype thermocouples, resolution: , accuracy: 0.4%) was measured at seating height and on two sides of the occupant’s regular work position (see Fig. 7(b)). They were placed in a closed proximity (less than 1 m) to the test subjects. The average reading of two was used to reduce the influence of spatial temperature distribution. In addition, we monitored, the relative humidity (BAPIStat4, resolution: 1%, accuracy: 2%), air velocity (SENSOR anemometer, resolution: 0.01 m/s, accuracy: 1%) and globe temperature (SMART black globe temperature sensor, resolution: , accuracy: 0.5%) near the desks, transmitted solar irradiance (LICOR 200SL pyranometer, resolution: 0.1 W/m2, accuracy: 3%) on the facade. The sensors for these variables were connected with the National Instrument wireless data acquisition system, which communicated with the BMS using Modbus protocol. The instruments used were installed in configurations that would not interfere with occupants’ positions or tasks. In this study, we control the dimming level of the electric lights to obtain 300 lux on the workplane and fix the shade position at 25% to eliminate the potential visual environment impacts on the thermal preference of testsubjects. During the experiment, we deployed an automated control setup with no manual overrides allowed. This study was approved by the Institutional Review Board (IRB Protocol #: 1503015873).
We conducted this study over the span of 23 days in October 2018. Overall, six testsubjects (20  35 years old; three males and three females), not familiar with this research, participated in this field study. Each participant occupied one office everyday between 9:00 AM to 4:00 PM. Upon arrival in the morning (9:00 AM), we set the indoor air temperature to the minimum temperature value that was achievable during that particular day (for most of the occupants, this temperature was ). We asked all the occupants to perform their usual work routine (computerrelated work, reading, writing etc.) during the day. Also, we advised them to wear clothes they would normally wear while working inside offices (to marginalize out the impact of clothing on their thermal preferences). They were free to take breaks to have lunch, use washrooms, etc., that is, breaks one would normally take while working in offices. This setup was used to create realistic occupancy dynamics and allows for marginalizing out its impact on thermal preferences. During the experiments, the setpoint indoor air temperature of the offices were automatically adjusted by the research team without giving override access to the participants. After 30 minutes, when the participants adopted to the steady state thermal conditions (also supported in findings by ghahramani2015online ), we asked them to answer a short webbased questionnaire (see Fig. 7(c)), and based on their responses, the PE algorithm determined the next indoor air temperature to test for the participants.
Before conducting the actual field studies with real occupants, we analyzed the impact of temperature difference constraints, temperature range to query over, the time difference between consecutive queries, etc., by running pilot experiments using expert occupants (authors of this paper and other researchers from our group). After conducting these preliminary pilot studies, the indoor air temperatures we decided to test over were from to (supported in previous studies by guenther2019feature ; lee2017bayesian ), with the minimum perceivable temperature difference set to be (supported in a previous study by wang2018individual ). That is, we are going to test for all queries . Also, to avoid drastic difference in indoor air temperature values (i.e. to avoid going from to and vice versa), we constrained the search to temperatures from the current value.