Estimation of Evaporator Valve Sizes in Supermarket Refrigeration Cabinets

by   Kenneth Leerbeck, et al.

In many applications, e.g. fault diagnostics and optimized control of supermarket refrigeration systems, it is important to determine the heat demand of the cabinets. This can easily be achieved by measuring the mass flow through each cabinet, however, that is expensive and not feasible in large-scale deployments. Therefore it is important to be able to estimate the valve sizes from the monitoring data, which is typically measured. The valve size is measured by an area, which can be used to calculate mass flow through the valve – this estimated value is referred to as the valve constant. A novel method for estimating the cabinet evaporator valve constants is proposed in the present paper. It is demonstrated using monitoring data from a refrigeration system in a supermarket consisting of data sampled at a one-minute sampling rate, however it is shown that a sampling time of around 10-20 minutes is adequate for the method. Through thermodynamic analysis of a two-stage CO2 refrigeration system, a linear regression model for estimating valve constants was developed using time series data. The linear regression requires that transient dynamics are not present in the data, which depends on multiple factors e.g. the sampling time. If dynamics are not modeled it can be detected by a significant auto-correlation of the residuals. In order to include the dynamics in the model, an Auto-Regressive Moving Average model with eXogenous variables (ARMAX) was applied, and it is shown how it effectively eliminates the auto-correlation and provides more unbiased estimates, as well as improved the accuracy estimates. Furthermore, it is shown that the sample time has a huge impact on the valve estimates. Thus, a method for selecting the optimal sampling time is introduced. It works individually for each of the evaporators, by exploring their respective frequency spectrum.



page 7


Model-Free Tests for Series Correlation in Multivariate Linear Regression

Testing for series correlation among error terms is a basic problem in l...

Beta regression control chart for monitoring fractions and proportions

Regression control charts are usually used to monitor variables of inter...

VEST: Automatic Feature Engineering for Forecasting

Time series forecasting is a challenging task with applications in a wid...

Fault detection and diagnosis of batch process using dynamic ARMA-based control charts

A wide range of approaches for batch processes monitoring can be found i...

Estimating the Expected Value of Sample Information across Different Sample Sizes using Moment Matching and Non-Linear Regression

Background: The Expected Value of Sample Information (EVSI) determines t...

Exhaustive search for sparse variable selection in linear regression

We propose a K-sparse exhaustive search (ES-K) method and a K-sparse app...

Daily milk yield correction factors: what are they?

Cows are typically milked two or more times on a test day, but not all t...
This week in AI

Get the week's most popular data science and artificial intelligence research sent straight to your inbox every Saturday.


OLS Ordinary Least Square
Auto-Regressive Moving Average model with
eXogenous variables
MIMO Multiple Inlet Multiple Outlet
CI Confidence Interval

1 Introduction

To take advantage of system modeling and automation, the use of digital twins (a digital representation of a physical system) are increasingly evolving [Yuchen, BACHER20111511]. With the increasing amount and resolution of data being collected from all kinds of sources, the potential and applications are expanded, opening up new research questions to be answered. This paper focus specifically on estimation of evaporator valve constants (a measure of the valve area when fully opened) of the evaporators in supermarket refrigeration cabinets, to enable a better understanding of these systems on a large scale. Without correct valve constants, the energy demand for each evaporator cannot be determined without measuring the mass flow directly which is expensive and practically infeasible.

Grey box modeling – characterized by being a combination of white-box (physically based models) and black-box (purely data-driven models) – of supermarket refrigeration systems is a research area of great interest and potential. The models can be used in for applications in optimized control and fault detection. Thus, we achieve physical insight of the parameters of the overall and complex system using measured time series data, [MARUTA20131090, KRISTENSEN20041431]. The method has been well tested in many applications, see e.g. [BROK2019494, KIM2018359]

for refrigeration systems applications. A promising approach to data-driven modeling and optimization of refrigeration systems is with neural networks and predictive control presented in

[KIZILKAN201111686] to increase the efficiency of the compressor in a refrigeration system. This is later followed-up in [pr8091106], where an energy reduction of 17% was achieved on a one-stage system. However, supermarkets’ refrigeration systems are usually two-stage systems with two temperature levels, one for frozen goods and one for refrigerated goods, each with its own compressor equipment, which significantly complicates the models.

Data-driven modelling of supermarket refrigeration systems can be done by modelling single components; compressor, condenser or evaporator separately [Perez, Cecchinato, Willatzen] – or the system may be modelled as a near-complete refrigeration system with several components integrated [Aprea, Wang, Zhou]. Detailed data-driven models of a complete system was developed in [Glenn], using sub-space models – a method for model parameterization of non-linear Multiple Inlet Multiple Outlet (MIMO) systems [Katayama]. In that work, the cooling cabinets were not considered individually. For single component models, the parameterization can be done in a higher detail and will have fewer parameters per model compared to multi-component model. A clever way to use both principles, is to first parameterize separate models of the main components (e.g. cooling cabinets, compressors and the condenser) – this is done in [Ehsan], using Predictive Error Minimization for parameter estimation. Afterwards the complete system was modelled using the estimated parameters. The model was used to develop control strategies for providing energy flexibility aiming to integrate supermarket refrigeration systems in smart grids.

Figure 1: Exemplifying flow sheet of supermarket CO2 refrigeration system. The valve constant for evaporator is denoted with . The and are the number of Medium -and Low Temperature evaporators respectively.

Even though the models after parameter estimation may generate good predictions, there is no guarantee that the parameters are reasonable from a physical perspective. It is quite often the case that there are too many parameters to estimate, such that the models become over-parameterized and unidentifiable. In the modelling of cooling cabinets in [Ehsan], this is the case. The model can predict the temperature which is the only thing needed for control. However, if the parameters are not physically correct, we have no knowledge about the energy demand.

The problem with many previously proposed models is their lack of practicality due to high complexity. In [GreyBox], this issue is addressed, arguing that simpler models are necessary for automatic large-scale deployment. Furthermore, the issue related to constantly changing parameters in the system model due to varying amounts of goods on store was addressed. Here, a fixed valve constant is used (known from specifications), the valve constants will in most cases be unknown for automatic large-scale deployment. Therefore, the present paper focuses on estimating these valve constants using a data driven approach. First, a mass balance is formulated as a linear regression model – with the valve constants being the model coefficients. Thus, the Ordinary Least Squares solution (OLS) is used to estimate the valve constants. Furthermore, a method for selecting the optimal sampling time is suggested and applied for each evaporator, it is based on their respective frequency spectrum. One assumption which must hold for the linear regression model to provide unbiased estimates is that there are no transient dynamics in the modeled data. By analysing residuals it is found that they have a significant and high auto-correlation – which means that the dynamics were not modeled. To account for this, an ARMAX (Auto Regressive Moving Average with eXogenous variables) model is applied for more accurate estimates of the valve constants. The methods presented in this paper opens up for a fully automatized method for modelling supermarket refrigeration systems on a large scale with limited access to knowledge of the individual supermarkets, only monitoring data, which is typically available, is used.

This is essentially determined by the cabinet valve constant – a measure of the specific evaporator valve size, varying from cabinet to cabinet. Therefore, it is very useful to know these valve constants thereby enabling a correct modelling of the energy demand from each cabinet. In the present paper, two methods are presented for estimating the valve constants of a refrigeration system. First an estimation method where knowledge of the compressor stroke volume is required – here, a mass balance is formulated and a linear regression method is used to estimate the valve constants. For the second method, we have no knowledge of the compressor stroke volume – here a novel algorithm is presented, using prior knowledge of the different valve sizes that can possibly be installed to estimate the stroke volume, k-means clustering and an optimization algorithm. Finally, a residual model is applied for estimation of the valve sizes much in line with those estimated in the first method. This opens up for a fully automatized method for modelling supermarket refrigeration systems on a large scale with limited access to system data for individual supermarkets.

Good models of the energy demand in supermarket refrigeration systems are of great importance in developing fault detection systems, cabinet categorization and energy optimization control strategies.

[A]Mass flow [A]Pressure [B]recReceiver [B]Evaporator [B]liqLiquid phase [A]Density [A]Opening degree (0-1) [A]Valve constant [A]Valve constant [A]fVirtual compressor frequency (load as percentage of maximum capacity 0-1) [A]Volumetric efficiency (0-1) [A]Compressor stroke volume [B]gasGas phase [A]Gas quality (0-1) [A]Enthalpy [B]LTLow temperature [B]MTMedium temperature [B]compCompressor rack [B]gcGas cooler

2 The system

The refrigeration system under study is illustrated in Figure 1. Temperature and pressure measurements are accessible at all numbered points on the figure. The process is the following. After the receiver, at stage "1", the refrigerant is liquefied and split into the medium temperature (MT) string and the low temperature (LT) string, where expansion valves drops the pressure to the desired saturation temperature. The evaporators are controlled with valves using either hysteresis (open/close) or modulating (continues open) control – valve has a corresponding valve constant, denoted ,. After the evaporators at stage "3a" and "3b", the refrigerant is superheated to avoid any droplets from entering the compressor. At stage "4a", "4b" and "5", after the low pressure compressor rack, the pressure is the same, but the enthalpy varies as the MT string and by-pass (bp) string from the receiver connects. The refrigerant now enters the high pressure compressor rack and continues through the condenser to a sub-cooled state throttled to stage ”8” and split in a liquid and a gas fraction – and the cycle repeats. Note, at higher ambient temperatures the refrigerant does not condense, but the gas is throttled and this result in a lower liquid and a higher gas fraction at “8”.

This research focus on modelling the evaporators, hence anything between stage "6" and "8" will not be discussed further.

3 Methodology

In this section, first, under the assumption that there are no transient dynamics in processes, the steps in setting up a linear regression model for estimation of the valve constant are described, and second, the applied OLS estimation technique is presented, and finally to include linear dynamics an ARMAX model [madsen2007time] is presented.

The valve constant is used to determine the refrigerant mass flow through evaporator by


where is the receiver pressure at time and is the density of the liquefied refrigerant, which is found as a function of using CoolProp [coolprop]. For the ’th cabinet is the evaporator pressure at time , is the opening degree of the valve at time , given as a value between 0 and 1 and is controlled either using hysteresis (open/close) or modular (continues open) control. Hence, the valve constant, , is defined as an area equivalent to the area of the valve opening when the valve is fully open.

The mass balance at stage "4a" (Figure 1) can be written as


where the is the mass flow going through the medium pressure (temperature) compressor at time and is the by-pass mass flow entering from the receiver at time . The right side mass flows are simply the sum of all the mass flows through the LT and MT cabinets, where and are the total number of LT and MT cabinets, respectively.

The mass flows on the left side of Equation 2 are calculated by


where f is the virtual compressor frequency (load as percentage of maximum capacity) at time , , is the displacement volume which is specified for the compressor, and finally the mass flow from the by-pass valve is calculated by


where is the gas quality of the refrigerant from the by-pass valve as a function of the enthalpy in the gas cooler, , and the receiver pressure at time .

In order to set up the a linear regression model with the valve constants as coefficients, we let


such that the right side of Equation 2 can be rewritten as




From the overview in Figure 1, it is clear that

For the MT cabinets, we can setup the following mass balance


where is the number of MT cabinets. We can calculate the mass flow from the by-pass valve as


where is the gas quality of the refrigerant from the by-pass valve as a function of the enthalpy in the gas cooler, , and the receiver pressure . This can be combined to the expression


. The linear regression model can by set up by


where is the number of observations and , and thus can be written as


where the errors are random variables in the vector


which must fulfill and i.i.d. Note, that is capital, since it is the model output and thus a vector of random variables.

The parameter vector can be estimated from the OLS regression that minimize


which has the OLS solution


Furthermore, modelling time series variables with a linear regression model will, if there are any dynamics present, lead to auto-correlation of the errors. This violates the assumption of i.i.d and consequently, the coefficient estimates can be biased and statistics not valid. Thus, to account for this, we will consider an ARMAX model. Using the back-shift operator, , this model can be written as


where and . The model is refered to as an ARMAXp,q model, where parameters and determines the order of the AR and MA part, respectively. Note, that there are no lags in input variable, , in this formulation, hence it is not a full ARMAX model since it is constrained to a model order of the input polynomial set to zero, i.e.  according to [TSA]. For the parameter estimation, the R function arimax from the TSA package is used. It is estimating the parameters using based the maximum likelihood method.


Considering an AR model the prediction errors are obtained as

Figure 2: of the cooling room MT the 29th of October 2013.
Figure 3: The frequency spectrum for each of the evaporators.

4 Case Study

4.1 Data and sampling time

The data used consists of 1-minute time series covering an eight months period from November 2013 to June 2014. The values for one cabinet during one night is illustrated in Figure 3. If the raw 1-minute data is used, a lot of redundant high frequency dynamics are present in the data that will lead to a poor model with high auto correlation. Thus we need to explore the impact of the sampling time on the model. From the frequency spectrum shown in Figure 3 we observe that most of the cabinets have a clear peak. We can extract the inverse frequency at the peak for each cabinet – that is the most often occurring cycle time for the valve opening. The peak cycle times identified are listed in Table 1. Note, cabinet MT has an extremely flat curve compared to all the other evaporators, hence its 14 min. cycle time is not as significant. This is because it uses modulating control (continuously valve opening) thus it has no clear peak in the frequency spectrum. Furthermore, it is noticed that MT has the largest cycle time of 83 minutes. However, its spectrum peak is low compared to the other evaporators – it is a booster evaporator for MT (they are both located in the refrigeration room) and continuously supplies the cooling that MT cannot. The rest of the evaporators have cycles varying from 20 min. to 40 min., where MT, MT and MT are clustered around 40 min. – these are three different evaporators in the same cooling cabinet. For a good model, we will need to have at least, and ideally, two measurements for each cycle (one open and one closed). Because the evaporators have such different cycle times, there is no sampling time that will fit all the evaporators optimally. For instance, a 20-minute sampling time is best for MT, MT and MT, whereas a 10-minute sampling time is best for MT, LT and LT

. For this study, we will therefore examine sampling time in the range of 1 min. to 30 min. The re-sampling from 1-minute sampling time to higher sampling times is carried out by averaging in the time periods, such that aliasing is avoided and a better signal-to-noise ratio in the re-sampled data is obtained


In order to evaluate if the estimated valve constants, , have realistic values they are compared to the specified valve constants [Fredslund] (see Table 2).

The 10-minute sampling time data is illustrated in Figure 4. Notably, we have two large gaps of missing data in November and December 2013, however, this will not affect the results for linear regression. Examining closely the smoothed average curve for , it begins to increase in May, suggesting some seasonality with increased load during the summer. Furthermore, there is a change in the operation of LT around February 2014. Prior, there was larger fluctuation and the regulation was forcing a thermostat cutout often, which is just the normal operation of a hysteresis control when the temperature reaches the cutout temperature. After February, the cabinet never reaches the cutout temperature and the valve is triggered by the pressure control instead.

83.17 32.19 39.92 41.58 41.58 21.23
14.68 31.19 25.59 20.37 30.24
Table 1: The dominant inverse frequency for each cabinet in minutes.
Figure 4: Daily averages of for each cabinet and .
Figure 5: Estimated valve constant, , for evaporator MT as a function of the sampling time using all the data – November 2013 to June 2014.

The operation of evaporators within the same cabinet is obviously quite similar, thus correlations might occur – this can be crucial for the model, as this can lead to problems with co-linearity in the estimation. When this happens the condition number of the matrix is lowered, making the determinant of it close to zero, and thus inverting it results in large numerical errors, see Equation 15. However, in the present case the co-linearity was not an issue (generally, a correlation coefficient indicates co-linearity [Dormann]

). To confirm further, the Variance Inflation Factor (VIF) is a widely used tool – a variant of stepwise regression used to detect multi co-linearity. A VIF less than 10 is an acceptable level

[Hair], in present case the VIF was less than 2.

5 Results

As discussed in Section 4, the sampling time potentially impacts the estimated parameters, thus this was investigated and the result of this is presented in this section – and how the suggested method for determining the optimal time resolution, identified based on their cycle frequencies, see Figure 3 and Table 1, lead to reliable estimates. First the results of applying the linear regression model are presented and thereafter the results from the ARMAX model.

5.1 Linear Regression Model

The estimated valve constants are presented as function of the sampling time in Figure 5. In each figure the red line marks the specified valve constant (see Table 2), the green line marks the identified optimal sampling time and the black line is the valve constant estimates (with the 95% CI) vs. the sampling time from 1 min. to 30 min.. The estimates clearly vary with the sampling time, however the estimator converge to a constant level with higher sampling times in most cases. Both the estimates for MT and MT seem to decrease after the sampling time exceeds their respective optimal sampling time. Furthermore, at low sampling times (< 10 minutes) the estimates are typically diverging. By observing the Auto Correlation Function (ACF) of the residuals for the 1-minute model and the 15-minute model presented in Figure 6 (top left and top right plot respectively), there is clearly significant auto-correlation for both models, which indicates the need for modelling dynamics with an ARMAX model.

Figure 6: Auto Correlation functions (ACF) for the linear regression models and ARMAX models using 1 minute and 15 minute sampling time.
Cabinet Valve
MT AKV 10-3 0.58765
MT AKV 10-5 1.48523
MT AKV 10-4 0.94185
MT AKV 10-5 1.48523
MT AKV 10-5 1.48523
MT AKV 10-5 1.48523
MT AKV 10-2 0.37191
LT AKV 10-3 0.58765
LT AKV 10-2 0.37191
LT AKV 10-2 0.37191
LT AKV 10-2 0.37191
Table 2: Valves installed in Fakta Otterup and their valve constants - derived from a previous study [Fredslund]

5.2 ARMAX Model

The ACF of the linear regression model using 1-minute sampling time, see Figure 6 top left plot, shows seasonality caused by hysteresis control with roughly 15 lags between the wave peaks. Now, consider the ACF of the 15-minute sampling time model (Figure 6, top right), there is a large negative correlation to lag 1, which implies the need for a moving average term. The positive correlation of lag 4 implies the need for auto-regressive terms. An ARMAX model removed all auto-correlation in the 15 min. sampling time model (Figure 6, bottom right). In this study, an ARMAX will be used for all sampling times, though the best ARMAX fit may be different for other sampling times. The 1-min. model still has some seasonal auto-correlation (Figure 6, bottom left), but it is much improved. Considering the resulting valve constant estimates, presented in Figure 7, firstly, it is found that the CIs are much more narrow compared with the CIs for the linear regression model. Secondly, the estimates vary with the sampling time and above a certain level the estimates converge.

Figure 7: Estimated parameters as a function of the sampling time using all the data - November 2013 to June 2014.
Figure 8: Linear regression model: Auto Correlation Function for the 1-minute model.
resolution model.
Figure 9: Linear regression model: Auto Correlation Function for the 15-minute model.
resolution model.
Figure 10: ARMAX model: Auto Correlation Function for the 1-minute model.
resolution model.
Figure 11: ARMAX model: Auto Correlation Function for the 15-minute model.
resolution model.

6 Discussion

The method for valve estimation provides useful results, however, the sampling time turned out to have a huge impact which must be handled to achieve reliable estimates. An issue with the method is that the evaporators have very different optimal sampling times given their different frequencies. Thus, there exist no single sampling time suitable for all the evaporators. However, an approach for identifying an optimal sampling time for each evaporator was presented and is was shown that estimates, at their respective optimal sampling time, fits fairly well with the specified valve constants. Notice, for the linear regression model the optimal sampling time is at a point on the graphs (in Figure 5), where the valve constant estimates either begin to converge, or they are found in a local optimum. For the ARMAX model, the sampling time is far less important – the valve constant estimates do not vary much using sampling times above min.. However, we still see generally more accurate results around the optimal sampling time. These effects should be studied in further research including multiple supermarkets.

Models using sampling times below min. provided generally very different results and the residuals were also highly correlated (See Figure 6 left plots for the 1 minute models) – both in the linear regression model as well as the ARMAX model. For models using sampling times min., the results were much more uniform in both the linear regression model and the ARMAX model. Problems with the residuals are however still present for the linear regression model – these are fixed in the ARMAX model.

It is clear that the CIs are much more narrow for the ARMAX models, suggesting a generally better model. The estimates for the ARMAX models using a sampling time above min. are also less dependent on the sampling time than for the linear regression model. However, for the cabinet MT, there are still some rather large estimate variations. This is because the cabinet uses modulating control rather than hysteresis control as all the others use. Therefore, the more accurate valve constant estimates may be an average of the estimates from the ARMAX models using sampling times between 15 and 20 min. to even out the variation. We may calculate an average of the estimates from models using sampling times around the optimal sampling time, if this exists.

The method has only been tested on one refrigeration system with mixed controls – one evaporator used modulating – and the rest used hysteresis control. It will be interesting to expand the study to include supermarkets that only use modulating control and supermarkets that only use hysteresis control. Consistency among the evaporators is expected to give better results.

This study used 8 months of data for the results – it is, however, possible to calculate the estimates with less data, but the uncertainty will increase. As much data as possible should be used for the estimates, however, periods where faults have occurred should be removed to ensure right estimates.

7 Conclusion

Models for estimating the valve constants in supermarket refrigeration systems are presented and results are obtained from data collected in a regular supermarket in Denmark. A mass balance is formulated leading to a linear regression model from which the valve constants can be estimated – and it is extended to an ARMAX model, which can describe the system dynamics. The estimated results are analysed and it is shown that reliable values can be obtained, but that several factors must be taken into account. Firstly, co-linearity in the inputs can cause issues, but this is not an issue in the case study – potentially it can be a problem for other systems, and should be checked e.g. with the Variance Inflation Factor. Furthermore, the coefficient estimates vary with the sampling time and it is shown that no single sampling time is optimal for all evaporators, because they have different cycle times. However, for models using sampling times above 15 min., the valve estimates become more stable and less dependent on the sampling time. Some variation is still found, hence some smoothing of the estimates as a function of the sampling time can be beneficial, however an approach to this must be developed with data from multiple supermarkets. The suggested approach for an optimal sampling time is shown to be a good indicator for the best estimates. Auto-correlation is highly present in the linear regression models, thus to account for this, an ARMAX is applied. After applying the ARMAX model, the CI for the valve estimates becomes more narrow, suggesting a more accurate model. Furthermore, the estimates are also slightly less dependent on the sampling time than with the linear regression model. Thus it is found, that the ARMAX model is preferable over the linear regression model, however still studies including a wide range of supermarkets must be carried out to further develop the methodology and study its full potential.

8 Acknowledgement & Funding

This document is the results of the research projects Digital twins for large-scale heat pumps and refrigeration systems (EUDP 64019-0570) and Flexibile Energy Denmark (FED) (IFD 8090-00069B).