Described in several textbooks and used in a wide range of applications, the linear minimum mean square error (LMMSE) estimator [kailath_sayed_hassibi_linear_estimation_2000, kay1993fundamentals] is one of the fundamental estimation methods of signal processing. As a Bayesian estimation approach, it models the parameters of interest as random variables with some joint probability density function (pdf), based on some background knowledge. The LMMSE estimator is optimal among all linear (more precisely, affine) estimators in terms of minimizing the mean squared error (MSE), and it depends only on the means and covariances. If the assumed covariance matrices are inaccurate, which is generally the case for real-world problems, then the computed LMMSE estimator can be suboptimal. In this article, we focus on characterizing such performance degradation.
We consider the underlying system , where is the observed output vector, is the matrix of regressors, is some unknown noise vector, and the vector denotes the unknown model parameters which we want to estimate. We model and as random vectors, and propose an LMMSE estimation framework which allows us to systematically study the MSE when only a subset of the columns in is available for estimation. In particular, the mismatched estimator is based on the assumed system , where the assumed number of unknowns (length of ) is smaller than the number of unknowns in the underlying system (length of ). We model the regressors in as random variables and derive an analytical expression for the expected MSE of the low-order LMMSE estimator, over the distribution of .
A range of methods have been proposed for robustness against uncertainties or model flaws in LMMSE estimation. Methods to deal with covariance matrix uncertainties have been presented in [lederman_tabrikian_constrained_2006, mittelman_miller_robust_2010, Zachariah_Shariati_Bengtsson_Estimation_2014], and in [Liu_Zachariah_Stoica_Robust_2020] the effect of having missing features, i.e., unknowns, in the underlying model was investigated. Robustness has also been investigated under a classical estimation framework where the unknown is modeled as deterministic, such as for uncertainties in the regressors [eldar_bental_nemirovski_robust_2005]. Further model mismatch trade-offs in classical estimation settings have been studied, focusing on the relationship between model size and number of observations [breiman_how_nodate, belkin_two_2019]. In our setup, only parts of the regressors are available for estimation, and the respective models on and do not match the underlying models and , constituting a hybrid setting with uncertain regressors and a model mismatch in the unknowns and the noise.
We study the average MSE performance under a model mismatch and with an isotropic Gaussian model on the regressors. Our contributions can be summarized as follows: i) Our analytical results show that the MSE depends on the respective signal powers of , and , but not on the general covariance structure of the unknowns . ii) These results quantify how the MSE heavily depends on the relation between the number of samples, the underlying and the assumed model orders: If the number of samples is not sufficiently large, then the performance is not guaranteed to improve by increasing the number of samples or the assumed model complexity. In particular, lowering the assumed model order can improve the performance even when the number of samples is larger than the number of unknowns in the underlying system.
The rest of the paper is organized as follows: In Section II, we provide the problem formulation. In Section III, we present and discuss our main analytical results, which are numerically verified in Section LABEL:sec:numerical. Conclusions are summarized in Section LABEL:sec:conclusions.
Notation: We denote the Moore-Penrose pseudoinverse and the transpose of a matrix as and , respectively. The identity matrix is denoted as . The Euclidean norm and trace operator are denoted by and , respectively. We use the notation or to emphasize that the expectation is taken with respect to the random variable . For two column vectors , , we denote their covariance matrix by . For auto-covariance matrices, we write the subscript only once: .
II Problem Statement
II-A The Underlying System
The observations come from the following linear system
where denotes the unknowns, denotes the vector of observations, denotes the known matrix of regressors, and denotes the unknown noise. Here, and are modeled as zero-mean uncorrelated random vectors. Note that denotes the th row of , i.e., the regressors corresponding to the observation .
Consider the class of linear estimators, i.e., the estimators such that the estimate is a linear function of the vector of observations , with where . The mean squared error (MSE) associated with is given by
Under the linear model in (1), is found as
The linear minimum MSE (LMMSE) estimator, i.e., the matrix that minimizes the MSE over all , is given by [kailath_sayed_hassibi_linear_estimation_2000]
where we have used the fact that under (1), we have and . In (6) we have used the Moore-Penrose pseudoinverse rather than the ordinary inverse; as discussed in [kailath_sayed_hassibi_linear_estimation_2000, Theorem 3.2.3], this choice minimizes the MSE regardless of whether is singular or not.
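The estimator described above can be sketched numerically as follows. This is a minimal illustration, not the authors' implementation. The names `A`, `K_x`, `K_v` (regressor matrix, covariance of the unknowns, noise covariance) and both function names are assumptions introduced here. The sketch relies on the standard identities for a zero-mean linear model with uncorrelated unknowns and noise: the cross-covariance of the unknowns and the observations is `K_x @ A.T`, and the covariance of the observations is `A @ K_x @ A.T + K_v`.

```python
import numpy as np

def lmmse_estimator(A, K_x, K_v):
    """Sketch of the LMMSE matrix W = K_xy K_y^+ for y = A x + v (names are assumptions)."""
    K_xy = K_x @ A.T              # cross-covariance of x and y
    K_y = A @ K_x @ A.T + K_v     # covariance of y
    # The pseudoinverse keeps the expression well defined when K_y is singular.
    return K_xy @ np.linalg.pinv(K_y)

def linear_mse(W, A, K_x, K_v):
    """MSE E||x - W y||^2 of any linear estimator W under y = A x + v."""
    K_xy = K_x @ A.T
    K_y = A @ K_x @ A.T + K_v
    # tr(K_x) - 2 tr(W K_yx) + tr(W K_y W^T), with K_yx = K_xy^T
    return np.trace(K_x) - 2.0 * np.trace(W @ K_xy.T) + np.trace(W @ K_y @ W.T)
```

Evaluating `linear_mse` at `lmmse_estimator(A, K_x, K_v)` and at a perturbed matrix gives a quick numerical check that the former attains the smaller error.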
II-B Model Mismatch and Assumed Model
In this paper, our focus is on estimation under a model mismatch. In particular, we consider the case where the LMMSE estimator relies on an incorrect signal model such that i) only a subset of the unknowns is assumed to be present in the system equation; ii) the assumed covariances are possibly inconsistent with the underlying system in (1). Let this subset of be denoted by and its complement (i.e., the elements of that are not in ) by , where . Let and denote the submatrices of consisting of the columns corresponding to the indices that are in and in , respectively.
The estimator uses the following partial model
where and the noise are assumed to be uncorrelated and zero-mean, and is known. Here, the respective assumed covariance matrices for and the noise are given by and . We have used the notation to emphasize that these covariance matrices are not necessarily the same as the ones that can be derived from (1). Hence, there is a model mismatch between (7) and (1). According to (7), other covariance matrices of interest are given by
Let be an estimate of , where . Then the corresponding MSE for is given by
We note that the estimator in (11) would be the true LMMSE estimator if the observations were in fact generated by the model in (7). However, this is not the case. Here, actually comes from the underlying system in (1), hence the true LMMSE estimate of , minimizing the MSE in (10), is
To summarize our setting, is generated by the system in (1), while the estimation is performed under the assumption that is generated by (7). Hence, the LMMSE estimator in (11) is used instead of the correct estimator in (12). In other words, we consider LMMSE estimation under a model mismatch.
In order to take into account the part of that is not estimated in this partial setting, i.e., , we also define the MSE associated with the whole vector under as
Note that the subscript in emphasizes that the error is over whereas refers to the error in the whole vector . Here, corresponds to the error associated with estimating with while setting the estimate of to .
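The mismatched setting of this subsection can be sketched in code as follows. This is an illustrative sketch under stated assumptions. The index list `idx_S`, the "hat" arguments for the assumed covariances, and all function names are hypothetical labels introduced here. The estimator is built from the assumed partial model only, while its error is evaluated against the covariances of the underlying system. The error over the whole vector adds the power of the unestimated part, since that part is estimated as zero and is zero-mean.

```python
import numpy as np

def partial_lmmse(A_S, K_xS_hat, K_w_hat):
    """Estimator designed from the assumed partial model (assumed covariances only)."""
    K_y_hat = A_S @ K_xS_hat @ A_S.T + K_w_hat
    return K_xS_hat @ A_S.T @ np.linalg.pinv(K_y_hat)

def mse_over_x_S(W_S, A, K_x, K_v, idx_S):
    """True E||x_S - W_S y||^2 when y is generated by the underlying system y = A x + v."""
    idx_S = np.asarray(idx_S)
    K_y = A @ K_x @ A.T + K_v                 # true covariance of y
    K_yxS = A @ K_x[:, idx_S]                 # true cross-covariance of y and x_S
    K_xS = K_x[np.ix_(idx_S, idx_S)]
    return np.trace(K_xS) - 2.0 * np.trace(W_S @ K_yxS) + np.trace(W_S @ K_y @ W_S.T)

def mse_over_x(W_S, A, K_x, K_v, idx_S):
    """Error over the whole vector: x_S estimated by W_S y, the complement estimated as 0."""
    idx_C = np.setdiff1d(np.arange(K_x.shape[0]), idx_S)
    power_C = np.trace(K_x[np.ix_(idx_C, idx_C)])   # E||x_C||^2 for zero-mean x_C
    return mse_over_x_S(W_S, A, K_x, K_v, idx_S) + power_C
```

Because the design step never sees the true covariances, `mse_over_x_S` for the mismatched estimator is in general larger than the value obtained with the true LMMSE estimate of the same subvector.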
II-C Expected MSE over Regressors
We are interested in the average behaviour of the MSE of the partial LMMSE estimator in (11) over regressor matrices . We model ’s as independent and identically distributed (i.i.d.) Gaussian random vectors, i.e., , with . The expected MSE over the distribution of ’s is given by
Note that this is the expected MSE associated with . Here is a function of (more precisely a function of , a submatrix of ), and varies with . We are interested in how the MSE varies for different choices of , i.e., the number of estimated parameters, and , i.e., the number of samples in . Hence, is defined as a function of these variables.
Here, we analyze the MSE from the perspective of repeated experiments using different matrices , hence we model as a random matrix. Nevertheless, note that while doing the LMMSE estimation, and are known in (6) and (11), respectively.
We similarly define the expected MSE associated with the whole vector as
As a part of our analysis of , we also compare it to the expected MSE associated with the full LMMSE estimator
where is the estimator in (6).
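The expectation over regressors defined in this subsection can be approximated by Monte Carlo averaging. The sketch below is one hedged way to do so. It assumes the rows of the regressor matrix are i.i.d. standard Gaussian with identity covariance, and that both the true and the assumed noise are white with variances `sigma_v2` and `sigma_w2_hat`. These choices, and every name in the code, are assumptions made for illustration. For each draw, the error over the unknowns and the noise is computed in closed form, so only the regressors are sampled.

```python
import numpy as np

def expected_mse_mc(n, p, idx_S, K_x, sigma_v2, K_xS_hat, sigma_w2_hat,
                    trials=2000, seed=0):
    """Monte Carlo estimate of the expected MSE over x_S, averaged over Gaussian regressors."""
    rng = np.random.default_rng(seed)
    idx_S = np.asarray(idx_S)
    K_xS = K_x[np.ix_(idx_S, idx_S)]
    total = 0.0
    for _ in range(trials):
        A = rng.standard_normal((n, p))       # rows i.i.d. N(0, I_p) (assumption)
        A_S = A[:, idx_S]
        # Estimator designed from the assumed partial model.
        K_y_hat = A_S @ K_xS_hat @ A_S.T + sigma_w2_hat * np.eye(n)
        W_S = K_xS_hat @ A_S.T @ np.linalg.pinv(K_y_hat)
        # Error evaluated under the underlying system.
        K_y = A @ K_x @ A.T + sigma_v2 * np.eye(n)
        K_yxS = A @ K_x[:, idx_S]
        total += (np.trace(K_xS) - 2.0 * np.trace(W_S @ K_yxS)
                  + np.trace(W_S @ K_y @ W_S.T))
    return total / trials
```

Calling the function over a grid of sample counts `n` and subset sizes `len(idx_S)` gives an empirical counterpart to the analytical expected MSE studied in the next section.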
III Expected MSE under a Model Mismatch
The following result describes the generalization error associated with the partial LMMSE estimator in (11):
Theorem 1: With , and , i.e., the noise is assumed to be zero, the partial LMMSE estimator in (11) has the expected MSE
with defined as
Proof: See Section LABEL:proof:thm:mse.
Theorem 1 quantifies the dependence of the expected MSE on the individual powers of , and the noise , i.e., , and . It also reveals that the error does not depend on the covariance between and , or on the general structure of .
Effect of : The factor , hence , can take extremely large values if the number of samples is too close to the number of estimated parameters . We observe that if both and are identically zero, then does not affect the MSE; however, this is generally not the case.
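The blow-up near equal sample and parameter counts can be illustrated on a related, well-known quantity. The sketch below does not reproduce the factor from Theorem 1. It assumes an identity assumed covariance for the estimated unknowns, zero assumed noise, and at least as many samples as estimated parameters, in which case the partial estimator reduces to the pseudoinverse of the available regressors, and white noise of variance `sigma_v2` contributes `sigma_v2 * tr((A_S^T A_S)^{-1})` to the MSE. For standard Gaussian regressors, the mean of this trace is `p_S / (n - p_S - 1)` when `n > p_S + 1` (the inverse-Wishart mean), which grows without bound as `n` approaches `p_S + 1`. The function name and arguments are hypothetical.

```python
import numpy as np

def mean_trace_inv_gram(n, p_S, trials=2000, seed=0):
    """Monte Carlo mean of tr((A_S^T A_S)^{-1}) for an n-by-p_S standard Gaussian A_S."""
    rng = np.random.default_rng(seed)
    total = 0.0
    for _ in range(trials):
        A_S = rng.standard_normal((n, p_S))
        total += np.trace(np.linalg.inv(A_S.T @ A_S))
    return total / trials

def inverse_wishart_mean_trace(n, p_S):
    """Closed-form mean p_S / (n - p_S - 1), valid only for n > p_S + 1."""
    if n <= p_S + 1:
        raise ValueError("the mean is finite only for n > p_S + 1")
    return p_S / (n - p_S - 1)
```

Comparing the two functions for a fixed `p_S` while decreasing `n` toward `p_S + 1` shows the empirical mean tracking the closed form where it is well behaved and becoming large and erratic close to the boundary.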
We continue the discussion of the behaviour of by considering the following scenarios of versus : i) , and ii) .
i) : Here, the MSE component from is always zero, and in (21a) decreases monotonically with an increasing . Hence, if the noise level per sample does not increase with the number of samples, i.e., if does not increase with , then the MSE monotonically decreases with increasing . Regarding the MSE’s dependency on , we will show in Corollary LABEL:col:n_SNR_dependency that if is not large enough, then the expected MSE is not guaranteed to improve with , under some additional constraints.
ii) : The result in Theorem 1 shows that the performance is not guaranteed to improve by having more samples. In (21b), we see that for , increases with , hence the expected MSE can also increase with . In particular, as we will illustrate with numerical examples in Section LABEL:sec:numerical, the power in must be significantly larger than the combined powers of and , in order for the MSE to decrease as increases. For such small , it is also not immediately apparent which choice of gives the lowest MSE. This insight is illustrated in the numerical examples in Section LABEL:sec:numerical.
The following corollary is a special case of Theorem 1 where the powers in and are directly proportional to and and the noise level per sample is constant: