1 Introduction
A linear regression model is a commonly used statistical tool to analyze the relationship between the response (dependent) and some explanatory (independent) variables. In these models, the error term is usually considered a zeromean and constantvariance random variable independent of the other error terms. However, observations collected over time are often autocorrelated rather than independent. Examples of this data abound in economics, medicine, environmental monitoring, and social sciences. The study of the dependence among observations is of considerable practical interest
(see tsay2005analysis; box2015time).A stochastic model that has been successfully used in many realworld applications to deal with autocorrelated structures in the error term is the autoregressive (AR) model. In the AR model, the current value of the process is expressed as a finite linear combination of previous values of the process and a stochastic shock of disturbance. The shock of disturbance is also known as innovation in the time series literature. In general, it is assumed that the disturbance is normally distributed. For example,
beach1978maximum suggested a maximum likelihood (ML) estimation method to estimate the parameters of AR(1) error term regression models. ullah1983estimationused the twostage PraisWinsten estimator for a linear regression model with firstorder autocorrelated disturbances. The authors considered a Gaussian distribution for the innovations to derive the largesample asymptotic approximation for the variancecovariance matrix.
alpuim2008efficiencyproposed a linear regression model with the sequence of error terms following a Gaussian autoregressive stationary process, in which the parameters of the model are estimated using ML and ordinary least squares (OLS) estimation. Nevertheless, in statistical modeling of datasets involving errors with outliers and a stronger possibility of extreme values, the normality assumption is not satisfied. In these cases, heavytailed distributions may be considered. In this context,
tuacc2018robust proposed considering thedistribution with known degrees of freedom as an alternative to the Normal distribution and applying the conditional maximum likelihood (CML) method to find the estimators of the model parameters.
Moreover, modeling data with AR errors can be challenging when some observations are censored or missing. In the first case, values can occur out of the range of a measuring instrument, and in the second, the value is unknown. Censored time series data are frequently encountered in environmental monitoring, medicine, and economics. In the literature, there are some works related to censored autoregressive linear regression models with Normal innovations. For example, zeger1986regression proposed a Gaussian censored autocorrelated model with parameters estimated by maximizing the full likelihood using the quasiNewton and the EMalgorithms. The authors pointed out that the quasiNewton approach is preferable because it estimates the variancecovariance matrix for the parameters. park2007censored
developed a censored autoregressive moving average (CENARMA) model with Gaussian white noise, in which the algorithm works iteratively imputing the censored or missing observations by sampling from the truncated Normal distribution. Then the parameters are estimated by maximizing an approximate full likelihood function. Besides,
wang2018quasi suggested a quasilikelihood method using the completeincomplete data framework. Recently, schumacher2017censored considered an autoregressive censored linear model with the parameters estimated via the SAEM algorithm in which the conditional mean is used to impute the censored observations. Lastly, liu2019parameter proposed a coupled MCMCSAEM algorithm to fit an AR regression model with Student innovations and only missing data, i.e., no censoring was considered.During the bibliographic review, no studies were found that consider the Student
distribution for the innovations in censored autoregressive models on a likelihoodbased perspective. Hence in this work, we propose an Expectation Maximization (EM)type algorithm to estimate the parameters of a censored regression model with autoregressive errors and innovations following a Student
distribution. The SAEM algorithm was employed to obtain the ML estimates because of the difficulties in computing the Estep in the EM algorithm (dempster1977maximum). As it was demonstrated by jank2006implementing, the computational effort of SAEM is much smaller and achieves convergence in just a fraction of the simulation size when compared to the Monte Carlo EM algorithm (wei1990monte), thanks to a recycling of the simulated variables from one iteration to the next.The paper is organized as follows. Section 2 shows a brief description of autoregressive regression models of order and the SAEM algorithm. Section 3 is devoted to formulate the censored autoregressive model with Student innovations. It also provides the expressions used to estimate the parameters of the proposed model, the method used to estimate the observed information matrix (louis1982finding), and the expression to make predictions. Section 4 displays the results of two simulation studies carried out to examine the asymptotic properties of the estimates, and to demonstrate the robustness of the model. Section 5 shows the estimates for the parameters of the total phosphorus concentration dataset, available in the R package ARCensReg (schumacher2016package)
, together with the analysis of quantile residuals. Section
6 concludes with a discussion.2 Preliminaries
We begin by defining some notations and introducing the basic definitions and algorithms used throughout this work. The Normal distribution with mean and variance is denoted by , the notation for a variate Normal distribution with mean and variancecovariance matrix is , and a truncated multivariate Normal distribution with truncation region is denoted by
. The notation for a random variable with Gamma distribution is given by
, where is the shape parameter and is the rate parameter. The Student distribution with location parameter , scale parameter and degrees of freedom is denoted by . A random variablewith uniform distribution on the interval
is denoted by . The expression “” means independent and identically distributed,represents the transpose of a matrix or a vector,
is the gamma function evaluated at point , and is an increasing sequence of field such that , with being a sequence of random variables. We use the index to represent a specific vector or matrix, i.e., and where is the th row of matrix X.2.1 The autoregressive regression model of order
Consider a classical linear regression model with the difference that the errors are autocorrelated as a discretetime autoregressive process; thus, the model for an observation at time is given by
(1)  
(2) 
where represents the dependent variable observed at time , is a vector of known covariables, is a vector of unknown regression parameters to be estimated, and is the regression error. The regression error follows an autoregressive model as given in Equation 2, where is a vector of autoregressive coefficients and is a shock of disturbance with distribution . The is also known as the innovation in the time series literature (see, for instance, box2015time, schumacher2017censored, and wang2018quasi).
Now suppose that in Equation 2 follows a Student distribution with location parameter 0, scale parameter and
degrees of freedom, whose probability density function (PDF) will be given by
(3) 
Then the model defined by Equations 13 will be called the autoregressive linear regression model of order (AR) hereinafter. Moreover, the distribution of might be written using a scale mixture of Normal (SMN) distributions as showed in kotz2004multivariate, where can be written as , where , , with being independent of . Consequently, the conditional distribution of given is Normal with zero mean and variance , i.e.,
2.2 The SAEM algorithm
dempster1977maximum introduced a general approach to the iterative computation of maximum likelihood (ML) estimates in problems with incomplete data. This algorithm is called the ExpectationMaximization (EM) algorithm. Let be the complete data, where is the observed data and is the incomplete data. Let be the complete loglikelihood function. The EM algorithm proceeds as follows:

[leftmargin=0.5cm]

Estep: Let be the estimate of at the th iteration. Compute the conditional expectation of the complete loglikelihood function .

Mstep: Maximize with respect to to obtain .
There are situations where the Estep has no analytic form. To deal with these cases, wei1990monte proposed to replace the expectation in the computation of with a Monte Carlo (MC) approximation based on a large number of independent simulations of the missing data. This algorithm was called the Monte Carlo EM (MCEM) algorithm. As an alternative to the computationally intensive MCEM algorithm, delyon1999convergence proposed a scheme that splits the Estep into a simulation step and an integration step, while the maximization step remains unchanged. The SAEM algorithm is performed as follows:

[leftmargin=0.5cm]

Estep:

Simulation: Draw samples of the missing data from the conditional distribution .

Stochastic approximation: Update by
where is a decreasing sequence of positive numbers such that and , is also known as the smoothness parameter (kuhn2005maximum).


Mstep: Update as .
This process is iterated until some distance between two successive evaluations of the actual loglikelihood function becomes small enough.
3 Proposed model and parameter estimation
This section is devoted to formulating the censored autoregressive linear model. Here we specify the loglikelihood function, the algorithm used to overcome the parameter estimation problem, expressions to approximate the standard errors, and expressions to predict at unobserved times when the values of some covariables (if needed) at these times are available.
3.1 The censored ART() model
Assume that the response variable
, in Equation 1, is not fully observed for all times . We observe instead , where represents either an observed value or the limit of detection (LOD) of a censored variable, and is the censoring indicator defined as(4) 
Thereby the model defined by Equations 14 will be called censored autoregressive regression model of order (CAR). In this context, is leftcensored if and , it is rightcensored if and , it is interval censored if and , and represents a missing value if and .
To compute the loglikelihood function of the CAR model, we ignore the marginal distribution of the first observations and we denote by the vector of observed data and by the vector of censored or missing observations, with . Note also that the distribution of conditioning on all the preceding data only depends on the previous observations. Then, given and has a Student distribution with location parameter , scale parameter and degrees of freedom, where is a matrix with the covariates for and is a vector with the covariates related to , i.e.,
Thus the observed loglikelihood function can be computed by
(5) 
3.2 Parameter estimation via the SAEM algorithm
The observed loglikelihood function in Equation 5 involves complex expressions, making work directly with very difficult. Hence to obtain the ML estimates of , we apply an EMtype algorithm considering the hierarchical representation in the SMN distributions mentioned in Subsection 2.1. Treating and as hypothetical missing data, and augmenting those to the observed data vector , where and , the resulting complete data is given by the vector , and consequently, the complete loglikelihood function can be written as
with , , , and being a constant.
Let be the current estimate of . Then the conditional expectation of the complete loglikelihood given the observed data, without the constant is
where
such that , , , , , , and , for all .
It is easy to observe that the Estep reduces to the computation of , , , , , , and , for all . Because of the difficulties in the Estep, we work with a variation of the EM algorithm called the SAEM algorithm. Assuming that is the current estimate of , the SAEM algorithm proceeds as follows:
Step E1 (Sampling). In the simulation step, we generate samples from the conditional distribution of the latent variables via the Gibbs sampling algorithm. Thus, we consider the observed and censored/missing part of y separately, and rearrange the location vector and scale matrix parameters in the conditional distribution of according to the following scheme:

[leftmargin=0.5cm]

Step 2: Sample from , whose elements are independent and Gamma distributed as with , , and , .
Step E2 (Stochastic Approximation). Since the sequence for is available, we replace the conditional expectations in by the following stochastic approximations:
The variable will be considered as suggested by galarza2017quantile, i.e., if and if , where is the maximum number of iterations and is a cutoff point which determines the percentage of initial iterations with nomemory. If , the algorithm will have a memory for all iterations and hence will converge slowly to the ML estimates, and needs to be large. If
, the algorithm will be memoryfree, it will converge quickly to a solution neighborhood, and the algorithm will initiate a Markov chain leading to a reasonably wellestimated mean after applying the necessary
burnin and thinning steps. A number between 0 and 1 will assure an initial convergence in distribution to a solution neighborhood for the first iterations and an almost sure convergence for the rest of the iterations.The selection of and could affect the speed of convergence of the SAEM algorithm. A graphical approach can monitor the convergence of the estimates for all parameters and determine the values for these constants, as lavielle2014mixed suggested. An advantage of the SAEM algorithm is that, even though it performs an MCMC Estep, it requires a small and fixed sample size , making it much faster than the MCEM algorithm.
Step CM (Conditional Maximization). The conditional maximization step is carried out, and is updated by maximizing over to obtain a new estimate , which leads to the expressions:
(6)  
(7)  
(8)  
(9) 
with for all .
3.3 Standard error approximation
The Fisher information matrix is a good measure of the amount of information a sample dataset provides about the parameters, and it can be used to compute the asymptotic variance of the estimators. louis1982finding developed a procedure for extracting the observed information matrix when the EM algorithm is used to find the ML estimates in incomplete data problems.
Let and be the first and negative second derivatives of the completedata loglikelihood function with respect to the parameter vector of the complete data, respectively. Let and be the corresponding first and negative second derivatives of the loglikelihood function of the observed data. Then the observed information matrix is given by
(10) 
delyon1999convergence adapted the method proposed by louis1982finding to compute the observed information matrix when the SAEM algorithm is used to estimate the parameters. For this, it is necessary to compute the auxiliary terms
(11)  
in which is the vector of observations sampled at iteration for all and , where is the maximum number of iteration of the SAEM algorithm and is the number of MC samples for stochastic approximation. The inverse of the limiting value of can be used to assess the dispersion of the estimators (delyon1999convergence).The analytical expressions for the first and second derivatives of the complete data loglikelihood function are given in Appendix A.2.
3.4 Prediction
For generating predicted values from the CAR model, we denote by to the vector of random variables (RVs) with the observed values and by another vector of RVs of length whose realized values we want to predict from the observed ones.
Let be equal to , where is the vector of uncensored observations and is the vector of censored or missing observations. To deal with the censored values existing in , we use an imputation procedure that consists in replacing the censored values with the values obtained in the last iteration of the SAEM algorithm, i.e., , since elements of can also be updated during the Step E2 of the SAEM algorithm as
(12) 
with the same , and settings. The new vector of observed values will be denoted by .
Now, supposing all values in are completely observed, and the explanatory variables for are available, the forecasting procedure will be performed recursively (box2015time) as follows
where is substituted by and is the estimate obtained via the SAEM algorithm. So .
4 Simulation Study
In this section, we examine the asymptotic properties of the SAEM estimates through a simulation study considering different sample sizes and levels of censoring. A second simulation study is performed to demonstrate the robustness of the estimates obtained from the proposed model when the data is perturbed.
4.1 Asymptotic properties (Simulation study 1)
This simulation aims to provide empirical evidence about the consistence of the ML estimates under different scenarios. Hence three datasets with 300 MC samples were generated. The sample size for each dataset was , and . The data was generated from the model specified by Equations 1 and 2, considering a Student distribution for the innovations (’s) with and . The rest of the parameters were set as , , and the covariables at time were , with and .
The idea is to have four scenarios with an average level of censoring/missing of 0%, 5%, 20%, and 35%, respectively. The first case is related to the data without censoring (original data), for the other cases were considered 1.60, 3.45, and 4.30 as LODs, i.e., values lower than the LOD are substituted by the LOD, and additionally 20% of the desired censored rate corresponds to observations randomly selected considered as missing.
Average  Parameters  

LOD  level of  Measure  
censoring  5.00  0.50  0.90  0.40  0.12  2.00  4.00  
100  –  0%  MCMean  4.988  0.501  0.937  0.409  0.093  1.779  4.150 
IMSE  0.325  0.141  0.565  0.086  0.086  0.531  2.612  
MCSD  0.334  0.149  0.567  0.090  0.091  0.465  2.089  
Comments
There are no comments yet.