Industrial batch processes are used to produce a wide range of products. This kind of process generates for each batch run time series representing multiple measurements of process variables. That peculiar data structure containing samples of time series and data with a strong dynamic feature makes it still very challenging to develop control charts based on time series models.
Traditional monitoring approaches do not take into account directly the time series nature of the data. Most of them decompose the tri-dimensional data array (batches variables time-instants) in a two-dimensional array (batches variables/time-instants), based on the precursor approach of Nomikos and MacGregor (1995)
. In this context, considering batches as sample replications, control approaches are proposed by using multivariate techniques (as Principal Components, Partial Least Squares, Discriminant Analysis, Support Vector machines, Neural Networks, etc) applied in the variable/time domain. These multivariate-based control charts are able to capture the dynamic data behavior in some way. We can mention a number of papers presenting improvements, applications and fruitful discussions in this direction inWold et al. (1998), Jia et al. (2010), Van den Kerkhof et al. (2012), Peng et al. (2014), Kuang et al. (2015), Wang et al. (2018), Peres et al. (2019), Li et al. (2020) and Wang et al. (2021).
Control charts based on time series models are well known in the context of continuous processes. These kinds of processes have an intrinsic two-way data structure (samples
variables) since there are no replications of measures in each sample, i.e, there is no time dimension. Those models are mainly aimed to deal with the serial sample correlation, by using traditional control charts for the uncorrelated residuals or the cumulative-based charts for the fitted values. In both cases, as the first step, the model is adjusted from historical in-control samples and, in the next step, future samples are monitored through those charts. To accomplish that goal there are a wide range of propositions using the Autoregressive Moving Average (ARMA) and Vector Autoregressive (VAR) models. The good illustration can be seen in a case study presented in [Montgomery, 2007], in which a Shewhart control chart is used to monitor the residuals of the AR(1) model. In the context of a multivariate continuous process, the same goal is accomplished by using the VAR model for uncorrelated data and the Hotelling-based control chart for monitoring the vector of residuals. Some papers presenting the ground theory and additional contributions in such direction can be found inSnoussi (2011), Pan and Jarrett (2012), Vanhatalo and Kulahci (2015), Leoni et al. (2015) and Simões et al. (2016). We can also find in the literature approaches for monitoring fitted values from those models. In this case, fitted values from new samples are monitored by using the exponentially weighted moving average (EWMA) based control charts. A good review of those procedures can be seen in Alwan and Roberts (1988), Jiang et al. (2000), Jiang and Tsui (2001), Jiang (2007), De la Torre Gutiérrez and Pham (2018), De la Torre-Gutiérrez and Pham (2019), Costa and Fichera (2021), Sheikhrabori and Aminnayeri (2021), Nguyen et al. (2021) and Jafarian-Namin et al. (2021).
There are a few works in the literature presenting approaches for batch process monitoring based on time series models. The reason why this topic indeed hasn’t been fully explored yet is that the time series inference theory is not based on replications of time series (each one bringing successive measurements of processes variables), as it is in batch process data. It is based on the variability in a time domain. This fact demands some adaptations of this theory in order to accommodate the coefficient estimates, taking into account the number of batch samples (batch to batch variability - batch domain analysis). The main problem is to build a single estimate for the model coefficients given a number of time series available, combining information from the batch and time domains. We highlight the work of Choi et al. (2008), which propose a set of charts based on the traditional Hotelling statistic for the VAR residuals and fitted vector of observations, obtained thought the adjusted VAR from the historical time series samples batches in a reduced variable space. In this approach, the VAR coefficient estimates are done by using the Partial Least Square regression technique instead of by using the VAR estimation theory. Wang et al. (2017) propose a group of control charts based on the 2D ARMA model. That model formulation try to capture the within-batch and batch-to-batch variability. For each batch they use the iterative step-wise regressions (SWR) and the least absolute shrinkage and selection operator (LASSO) to identify the model order and select the model coefficients. Future batch samples are monitored through control charts based in two stability index built from the coefficients estimates (the batch-to-batch and the within-batch index). Marcondes Filho and Valk (2020) presented a recent approach to deal with batch processes using VAR models focused on the VAR coefficients directly. In short, the VAR coefficients are estimated for each historical time series sample batch and by using a single estimate [as a combination of individual ordinary least square (OLS) estimates from each batch], the Hotelling and the Generalized Variance
Generalized Variancecontrol charts are used for monitoring new batches.
This paper proposes a new approach grounded in a group of control charts based on the classical ARMA model for dynamic monitoring and diagnostic of batch processes. Considering one variable at the time, we present a control approach based on the ARMA coefficients. The model coefficients are estimated for each historical time series sample batch and the control charts based on the modified Hotelling distribution are used to monitor new batches. Additionally, we extend this idea by using a group of control charts, one for each ARMA coefficient, based on the modified t-Stutent distribution, in order to help the diagnostic of disturbances detected by the Hotelling chart. There are two meaningful contributions in our proposition. The first one is that through the modified Hotelling and t-Stutent distributions the model coefficient estimates generated from the number of historical batches can be easily accommodated. There is no need to use any estimation method based on complex algorithms and so deal with convergence and computational time issues. Also, the derived exact distributions makes the control charts capable to detect disturbances of any level, even in a scenario in which there are few in-control batch samples available. The second one is that, unlike these mentioned approaches, we address the fault diagnose problem by using a group of t-Stutent control charts. We show through a simulated batch process that the proposed approach outperforms a competitor based on the model residuals (the most common approach used in the continuous and batch processes) and it is powerful in terms of disturbance diagnosis. Furthermore, this approach seems to work well when applied in a real data set.
In order to describe our proposition, the paper is organized as follows: Section 2 brings a detailed description of our methodology, including the basis of ARMA models, the Hotelling and t-Stutent modified statistics and the ARMA-based control charts. In Section 3, the proposed approach is illustrated through simulated batch data. Section 4 shows an application in a real data set. Conclusions are presented in Section 5.
2 ARMA-based control approach
2.1 ARMA model
The Autoregressive-Moving-Average (ARMA) model is widely used in time series analysis and forecasting due to the flexibility and suitable statistical properties. The class of ARMA model was popularized by Box et al. (1970) and is characterized by a simple and parsimonious formulation. It combines the Autoregressive (AR) model, which involves regressing a variable on its own lagged values, with moving averages (MA) model, which considers the error term as a linear combination of its own lagged terms. In general, we can write the ARMA model as follows:
where the error term is a white noise process (WN) with zero mean and variance, noted by . This formulation is usually referred as ARMA() since lags of the return are used as well as lags of the error term to specify the linear functional form to be estimated. Let’s consider the vector of parameters
Assuming that the process defined in (1) is causal and invertible, than for time-series of length sampled from this process, the asymptotic distribution (in ) of the OLS estimators , where , is given by Theorem 8.11.1 in Brockwell and Davis (1991). Under suitable conditions, it follows that
where plays the role of the variance and covariance matrix of the estimators and means asymptotic convergence in distribution, when increases.
where is the corresponding element of the main diagonal of the .
2.2 Hottelling and t-Student adjusted distributions
Following the aim of our work, we now assume the scenario with many trials available from an ARMA process, i.e., data samples representing time series. This is a typical context of a batch process that generates data representing trajectories of a variable to be considered under monitoring. In the next Section, the set of ARMA-based control charts for this kind of process will be proposed. They are based on the adaptation of the classical Hotelling statistic Montgomery (2007). In the two theorems below we demonstrate the distribution of the quantities that are the ground of our approach.
Let’s consider time series of length from an ARMA() process in (1), the vector of parameters defined in (2) and the vector of OLS estimates for the sample, both vectors of length . Assume that is an element . Than,
where , and is the Student distribution with degrees of freedom.
Note that are independent and identically distributed (
) random variables withand , for . We must remember the following results of univariate distributions (Mood et al. (1974) ):
If is normally distributed, than .
Let and be independent random variables, where and , than .
Now, we can write
The quantities and can be unfolded like:
The variables and are independent and, for large , and . Consequently, or . From (i),
By it follows that,
In case of data coming from a normally distribute variable, the distribution in (5) becomes an exact distribution rather than asymptotic one.
The same as in Theorem 1. ∎
where and .
Note that are random vectors with and , for . We must remember the following results of multivariate distributions (Johnson et al. (2002)):
If is normally distributed, than ( is the Wishart distribution);
and are independent;
It follows from and that .
Now, let’s find the distribution of :
It follows from (3) that
Finally, from the equation above we can write like:
Now we can note that the quantity
has the following probability distribution:
In case of data coming from a normally distributed variable, the distribution in (6) becomes an exact distribution rather than asymptotic one.
The same as in Theorem 2. ∎
2.3 Dynamic ARMA-based control charts
Consider a historical data set of batches yielding products compliant with specifications. For each batch we have a time series representing the trajectory of one variable, measured at time-instants, from the process under normal regime (in-control sample batches). Let’s assume that the variable dynamics can be described by the ARMA process.
In order to find a reference distribution of the ARMA () coefficient estimates in Phase , we firstly save the OLS vector of estimates for each batch. Considering that and , in the next step, we build the unique estimates of the mean and the covariance of by combining the individual estimates like:
These estimates hold relevant information about the variable dynamic (serial correlation) of the process operating in a normal regime. In Phase we propose one approach based on the modified Hotelling statistic to monitor the future batch samples. We have shown in Theorem 2 that
where . Scores above the percentile in imply that the variable dynamics in a new batch are different to their expected behaviour for the in-control process. Once the chart pointed out a batch sample out of limit, we can investigate the coefficients most affected by using the control chart based on the modified t-Student distribution. We have shown in Theorem 1 that
where and are elements of the vectors and , respectively. is an element of the main diagonal of . Scores above the percentile in can signalize a disturbance in the dynamic caused by the changing in a specific coefficient.
3 Simulation study
In this Section, we generate batch processes in which the dynamic is described by an ARMA(,) model. In order to illustrate our method, we present a Monte Carlo simulation using varieties of this model, including combinations of . Both models with intercept term. The model with the highest number of parameters in this study is an ARMA(2,2), explicitly written as
with the vector of parameters . Table 1 show the set of ARMA parameters for in-control process and the simulation settings. In phase , we considering scenarios with a wide range of disturbances in the intercept term and in the AR/MA part of the model, represented by and , respectively.
|ARMA||ARMA||Disturbed||Disturbance||# Batches||# Batches||Batch||Run|
|0.5||30, 50,||500||100, 200,||1000|
We generate scenarios including different numbers of batches with different time-length (from 100 to 1000). Each scenario was replicated 1000 times. In phase we do variate the number of batches from 30 to 100. The (8) and (9) charts were setting to the false alarm probability of . In phase 500 batches were generated in each scenario. The rate of batches beyond the control () and the index (Average Run Length) were adopted to evaluate the chart’s performance, where . The is the average number of batches until a false alarm (for , ), i.e., points above control limits in the process without disturbances (in-control process). In contrast, is the average number of samples until an out-of-control batch falls outside the control limits. The former is a measure of the chart’s sensibility.
As a benchmark approach, we consider the usual way to build time series-based control charts for monitoring the residuals from the fitted model in phase Montgomery (2007) by adapting this methodology for the case of batch processes. We use as the variable the residual mean from the residuals for each batch, where for the ARMA() model with intercept. We know that if a new batch comes from the in-control process, . We set the limits of the residual control chart using the probability of false alarm of . The EWMA approach Montgomery (2007) is used in order to improve the power of the chart to detect disturbances representing small changes in the residual mean. Simulations and calculations were conducted using R Core Team (2020).
. The tables show the mean and standard deviation ofvalues for each disturbance. The scenarios in Tables 2 and 3 are very similar in terms of results and so they will not be commented apart. These Tables include disturbances in AR coefficient and MA coefficient , each one at the time. The observed is close to the chosen nominal value of , which is consistent with the fact that no disturbance was introduced in the process. The values show that the chart outperforms the in detecting disturbances of different intensities. Additionally, we notice that the degree of detection in chart increases faster as the perturbations get more intense. Even for the higher values of disturbances, the performance of our approach remains better than the residual-based charts. We emphasize here the power of the proposed approach (based on estimates of correlations in ) to capture information about process dynamics.
Table 4 shows the in-control process based on the ARMA(1,1) model and disturbances included in the intercept parameter , in which represent levels of change in the process mean, i.e., those time series are generated from different mean drifts. The is well suitable to capture this kind of change, as expected. Even being considered as the underdog in this scenario, we noticed the performance close to the residual one for the highest number of reference batches with the highest time-instants in any disturbances. In other words, even for the changes in the mean, instead of in the data dynamic, our approach seems to work well.
Tables S1 to S4 in the Supplementary Material show the results of Monte Carlo Simulations for ARMA(2,2), AR(1) and MA(1), respectively. They are very similar compared to the study presented in Tables 2 to 4.
3.1 Monitoring and diagnosis case
This Section is aimed to show the full potential of our approach. For a simulated ongoing batch process we display the group of chart and charts for the individual coefficients in order to show its performance to detect and diagnose the imposed disturbances.
Let’s consider the industrial process generating batches of 200 time-length following the ARMA(1,1) model, with , and set as in Table 1. We built the charts using and considered 30 in-control batches as the reference. We simulate 20 new batches with two levels of disturbances in the parameter: (i) a moderate level (from =0.2 to =0); and (ii) a intense level (from =0.2 to =0.6).
Tables 1 and 2 show the group of control charts for the moderate and intense level of disturbance, respectively. In Table 1 we noticed that the multivariate starts to signalize a number of points out of the limits just after the moderate disturbance has imposed (after the batch). It reinforces a good performance in the simulated study shown in Table 2, as expected. Additionally, from the charts there is a good tip about the source of the disturbance, since the chart for points out some points above the limits. The other two charts remain with nearly all points randomly running within the charts boundaries, as it should be, since there are no disturbances imposed in and .
In Table 2 it becomes even more pronounced since the level of disturbance is higher. We can see nearly all points out of limits in the and chart for . The for show a few points outside the limits after the batch. Those few false alarms are likely to be due to coefficient covariance. In general, these charts seem to work really well to signalize and diagnose the source of disturbance.
In order to illustrate the applicability of our methodology we consider a real dataset of time series representing measurements of engine noise from Time-Series-Classification-Website . We can understand each time series as one batch sampled from an industrial process. Let’s consider only the training dataset which has 3271 batches, each one with 500 time instants, sampled from the process operating under two different conditions, labeled as +1 and -1 with 1755 and 1846 number of batches, respectively. In this application we assume that the group labeled as +1 is the reference group, i.e., sample batches coming from the process operating in a standard condition.
The chart on the left on Figure (3) shows the average behaviour of the autocorrelation function (ACF) for all batches according to their group. The colours green and red represent the reference group (labeled as +1) and the monitoring group (labeled as -1), respectively. We note that the main feature is the cyclical behavior of data in both groups. Following the aim of our methodology, in Phase data from the reference group are modeled by using the class of ARMA models. We know that this class of models is suitable to capture the time series dynamic rather than other features like cycles, trends, etc. For that reason we chose an AR model of high order to capture the cycles as the dynamics. Here an AR(12) was adopted in order to model this feature since the order 12 was necessary and sufficient for getting uncorrelated and normally distributed residuals. The Ljung–Box test pointed out that 100% of the residuals are uncorrelated, non significant to presence of correlation and 95% of the residuals were normally distributed according to the Shapiro-Wilk test for normality, using a significance level of 5%.
Although we have 12 AR coefficients in the model for the reference group, we noticed in Figure (3
) on the right chart that the large portion of them is possibly not significantly different from zero. Thus, proceeding an individual t-test for each coefficient and verifying their statistical significance at a level of 5%, only the first three coefficients of the fitted AR model were significant in 95% of batches. These results are summarized in Table5. For this reason, the control chart was built with the first three coefficients. Its seems to be a good choice insofar as Figure (3) shows a clear visual difference between the means of the adjusted coefficients from group +1 and -1 just in those 3 first AR coefficients (represented by green and red lines).
In order to show the performance of our approach we take randomly in-control batches (i.e., from the ones labeled as +1) as the reference batches to fit the AR(12) model and build the chart. We do variate the study by using values of 10, 30, 100, 200, 300, 500 with 200 replications each. The control limits are set with false alarm probability () of 0.10, 0.05 and 0.01. The remaining 1755 - batches are used to evaluate the empirical false alarm probability. The 1846 batches labeled as -1 are used to evaluate the power of chart.
|False alarm probability ()|
Table 6 summarize the results of chart. The and are the rate of false alarm and disturbed batches detected, respectively. We noticed that the observed (highlighted in the gray line) is closer to the chosen nominal value of as the number of samples increases, which is consistent with the theoretical distribution derived in Theorem 2. The values show the performance of chart to signalize the out-of-control (labeled as -1) batches. As we expected the degree of detection increases as the number of reference batches in phase increases. Even for the very small number of batches compared to the overall number of batches available labeled as +1, the shows a good rate of detection for each false alarm probability .
This paper introduced a new approach to deal with batch processes through a set of ARMA-based control charts. Through in-control batch samples available we fitted the ARMA model and built a group of charts based on the coefficient estimates from historical in-control batches. The modified Hotelling and t-Stutent distributions can easily accommodate those estimates and a decision rule was made for monitoring future samples. Additionally, the modified t-Student charts help to look for the source of disturbances.
The simulated batch process generating samples of time series from an ARMA model was presented. The chart outperforms the traditional competitor based on the residuals for detecting changes of any level in the process dynamic. Furthermore, we have shown how powerful are the individual charts to identify the source of disturbances imposed in the process.
The applicability of our approach was illustrated through a real data set in which the good performance is clearly noticed, even for a very small sample reference batches compared to the overall number of in-control batches available.
Finally, it’s important to noticed that we can built the group of and charts from any ARMA (,) sub model, including only AR() or MA() component with the order less or equal to or , with and without intercept. It opens the applicability of this approach to a wide range of batch processes.
Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
- Supplementary material:
R-functions containing all methods developed in this article (will be available in the dvqcc package at CRAN).
Dataset used in the application and corresponding script (zip).
- Time-series modeling for statistical process control. Journal of business & economic statistics 6 (1), pp. 87–95. Cited by: §1.
- Forecasting and control. Time Series Analysis 3, pp. 75. Cited by: §2.1.
- Stationary time series. In Time Series: Theory and Methods, pp. 1–41. Cited by: §2.1.
- Dynamic model-based batch process monitoring. Chemical Engineering Science 63 (3), pp. 622–636. Cited by: §1.
- Economic-statistical design of adaptive arma control chart for autocorrelated data. Journal of Statistical Computation and Simulation 91 (3), pp. 623–647. Cited by: §1.
- Identification of patterns in control charts for processes with statistically correlated noise. International Journal of Production Research 56 (4), pp. 1504–1520. Cited by: §1.
A control chart pattern recognition system for feedback-control processes. Expert Systems with Applications 138, pp. 112826. Cited by: §1.
- An integrated quality, maintenance and production model based on the delayed monitoring under the arma control chart. Journal of Statistical Computation and Simulation, pp. 1–25. Cited by: §1.
On-line batch process monitoring using batch dynamic kernel principal component analysis. Chemometrics and Intelligent Laboratory Systems 101 (2), pp. 110–122. Cited by: §1.
- Some properties of the arma control chart. NONLINEAR analysis-theory METHODS & APPLICATIONS 47 (3), pp. 2073–2088. Cited by: §1.
- Average run length computation of arma charts for stationary processes. Communications in Statistics-Simulation and Computation 30 (3), pp. 699–716. Cited by: §1.
- A new spc monitoring method: the arma chart. Technometrics 42 (4), pp. 399–410. Cited by: §1.
- Applied multivariate statistical analysis. Vol. 5, Prentice hall Upper Saddle River, NJ. Cited by: §2.2.
- Multivariate fault isolation via variable selection in discriminant analysis. Journal of Process Control 35, pp. 30–40. Cited by: §1.
- The skipping strategy to reduce the effect of the autocorrelation on the t2 chart’s performance. The International Journal of Advanced Manufacturing Technology 80 (9-12), pp. 1547–1559. Cited by: §1.
- Latent variable iterative learning model predictive control for multivariable control of batch processes. Journal of Process Control 94, pp. 1–11. Cited by: §1.
- Dynamic var model-based control charts for batch process monitoring. European Journal of Operational Research 285 (1), pp. 296–305. Cited by: §1.
- Introduction to statistical quality control. John Wiley & Sons. Cited by: §2.2, §3.
- (1974), introduction to the theory of statistics. McGraw-Hill. Cited by: §2.2.
- The effect of autocorrelation on the shewhart-rz control chart. arXiv preprint arXiv:2108.05239. Cited by: §1.
- Multivariate spc charts for monitoring batch processes. Technometrics 37 (1), pp. 41–59. Cited by: §1.
Why and how to use vector autoregressive models for quality control: the guideline and procedures. Quality & Quantity 46 (3), pp. 935–948. Cited by: §1.
- ASCS online fault detection and isolation based on an improved mpca. Chinese Journal of Mechanical Engineering 27 (5), pp. 1047–1056. Cited by: §1.
- Fault detection in batch processes through variable selection integrated to multiway principal component analysis. Journal of Process Control 80, pp. 223–234. Cited by: §1.
- R: a language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. External Links: Cited by: §3.
- Maximum likelihood estimation of the change point in stationary state of auto regressive moving average (arma) models, using svd-based smoothing. Communications in Statistics-Theory and Methods, pp. 1–19. Cited by: §1.
- Synthetic charts to control bivariate processes with autocorrelated data. Computers & Industrial Engineering 97, pp. 15–25. Cited by: §1.
- SPC for short-run multivariate autocorrelated processes. Journal of Applied Statistics 38 (10), pp. 2303–2312. Cited by: §1.
-  () Time series classification website. Note: http://www.timeseriesclassification.com/description.php?Dataset=FordA(Accessed on 07/08/2021) Cited by: §4.
- Dynamic model-based fault diagnosis for (bio) chemical batch processes. Computers & chemical engineering 40, pp. 12–21. Cited by: §1.
- The effect of autocorrelation on the hotelling t2 control chart. Quality and Reliability Engineering International 31 (8), pp. 1779–1796. Cited by: §1.
- Dynamic hypersphere based support vector data description for batch process monitoring. Chemometrics and Intelligent Laboratory Systems 172, pp. 17–32. Cited by: §1.
- Efficient iterative dynamic kernel principal component analysis monitoring method for the batch process with super-large-scale data sets. ACS omega 6 (15), pp. 9989–9997. Cited by: §1.
- Stability monitoring of batch processes with iterative learning control. Advances in Mathematical Physics 2017. Cited by: §1.
- Modelling and diagnostics of batch processes and analogous kinetic experiments. Chemometrics and intelligent laboratory systems 44 (1-2), pp. 331–340. Cited by: §1.