 # Likelihood Estimation with Incomplete Array Variate Observations

Missing data present an important challenge when dealing with high-dimensional data arranged in the form of an array. In this paper, we propose methods for estimation of the parameters of the array variate normal probability model from partially observed multiway data. The methods developed here are useful for missing data imputation and for estimation of the mean and covariance parameters of multiway data. A multiway semi-parametric mixed effects model that allows separation of multiway covariance effects is also defined, and an efficient algorithm for estimation is recommended. We provide simulation results along with real-life data from genetics to demonstrate these methods.


## 1 Introduction

A vector is a one-way array, a matrix is a two-way array, and by stacking matrices we obtain three-way arrays, and so on. Array variate random variables of up to two dimensions have been studied intensively in Gupta and Nagar (2000) and by many others. For arrays of three or, in general, $i$ dimensions, probability models with Kronecker delta covariance structure have been proposed very recently (Akdemir and Gupta (2011), Srivastava et al. (2008a) and Ohlson et al. (2011)). The estimation and inference for the parameters of the array normal distribution with Kronecker delta covariance structure, based on a random sample of fully observed arrays, can be accomplished by maximum likelihood estimation (Srivastava et al. (2008b), Akdemir and Gupta (2011), Srivastava et al. (2008a) and Ohlson et al. (2011)) or by Bayesian estimation (Hoff (2011)).

Array variate random variables are mainly useful for data that can naturally be arranged in array form. Some examples include two- and three-dimensional image and video data, spatial-temporal data, and repeated measures data. It is true that any array data can also be represented uniquely in vector form, and a more general covariance structure can be assumed for this vector representation. However, the model with the Kronecker structure is far more parsimonious and usually proves to be the better model. When the dimensions coincide with conceptually separable dimensions, we gain further insight about the individual dimensions.

The array variate data models and the estimation techniques we have mentioned above assume that we have a random sample of fully observed arrays. However, in practice most array data come with many missing cells. The purpose of this article is to develop likelihood based methods for estimation and inference for a class of array random variables when we only have partially observed arrays in the random sample. In addition to the estimation of the mean and covariance parameters for multi-way data, we obtain estimated values for the missing cells; this provides a generalization of regression to multi-way data.

Another novelty in this article involves the definition and development of a multiway mixed effects model. This model is useful for analyzing multiway response variables that depend on separable effects; through it we can incorporate the known covariance structures along some dimensions of the response, and we can estimate the unknown covariance components. In general, the known covariance components are calculated using the variables that define the levels of the corresponding array dimensions.

The remainder of the article is organized as follows: In Section 2, we introduce the normal model for array variables. In Section 3, we introduce the updating equations for parameter estimation and missing data imputation. In Section 4, the basic "Flip-Flop" algorithm is introduced. In Section 5, we define a semi-parametric array variate mixed model with Kronecker covariance structure and describe an efficient algorithm for the estimation of the variance components. Examples illustrating the use of these methods are provided in Section 6, followed by our conclusions in Section 7.

## 2 Array Normal Random Variable

The family of normal densities with Kronecker delta covariance structure is given by

$$\phi(\tilde{X};\tilde{M},A_1,A_2,\ldots,A_i)=\frac{\exp\left(-\tfrac{1}{2}\left\|(A_1^{-1})_1(A_2^{-1})_2\ldots(A_i^{-1})_i(\tilde{X}-\tilde{M})\right\|^2\right)}{(2\pi)^{(\prod_j m_j)/2}|A_1|^{\prod_{j\neq 1}m_j}|A_2|^{\prod_{j\neq 2}m_j}\ldots|A_i|^{\prod_{j\neq i}m_j}} \quad (1)$$

where $A_1, A_2, \ldots, A_i$ are non-singular matrices of orders $m_1\times m_1, m_2\times m_2, \ldots, m_i\times m_i$. The R-matrix multiplication (Rauhala (2002)), which generalizes the matrix multiplication (array multiplication in two dimensions) to the case of $i$-dimensional arrays, is defined element-wise as

$$\left((A_1)_1(A_2)_2\ldots(A_i)_i\tilde{X}_{m_1\times m_2\times\ldots\times m_i}\right)_{q_1q_2\ldots q_i}=\sum_{r_1=1}^{m_1}(A_1)_{q_1r_1}\sum_{r_2=1}^{m_2}(A_2)_{q_2r_2}\sum_{r_3=1}^{m_3}(A_3)_{q_3r_3}\ldots\sum_{r_i=1}^{m_i}(A_i)_{q_ir_i}(\tilde{X})_{r_1r_2\ldots r_i}$$

and the squared norm of $\tilde{X}$ is defined as

$$\|\tilde{X}\|^2=\sum_{j_1=1}^{m_1}\sum_{j_2=1}^{m_2}\ldots\sum_{j_i=1}^{m_i}\left((\tilde{X})_{j_1j_2\ldots j_i}\right)^2.$$

Note that R-Matrix multiplication is sometimes referred to as the Tucker product or mode product (Kolda (2006)).
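As a concrete sketch, the R-matrix (mode) product above can be implemented in a few lines of numpy; the function name and axis conventions below are our own, not from the paper:

```python
import numpy as np

def mode_multiply(A, X, k):
    """Multiply array X by matrix A along its k-th dimension (mode product)."""
    # Bring mode k to the front, flatten the remaining modes into columns,
    # left-multiply by A, then restore the original axis order.
    Xk = np.moveaxis(X, k, 0).reshape(X.shape[k], -1)
    Y = (A @ Xk).reshape((A.shape[0],) + X.shape[:k] + X.shape[k + 1:])
    return np.moveaxis(Y, 0, k)
```

Applying `mode_multiply` successively for each mode realizes the full transform $(A_1)_1(A_2)_2\ldots(A_i)_i\tilde{X}$.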

An important operation on an array is matricization (also known as unfolding or flattening), the process of arranging the elements of an array in a matrix. Matricization of an array of dimensions $m_1\times m_2\times\ldots\times m_i$ along its $k$th dimension is obtained by stacking the $m_k$-dimensional column vectors along the $k$th dimension in the order of the levels of the other dimensions, and results in an $m_k\times\prod_{j\neq k}m_j$ matrix.
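A minimal numpy sketch of matricization (the exact column ordering convention may differ from the paper's, but any fixed ordering serves the estimation formulas below):

```python
import numpy as np

def matricize(X, k):
    """Unfold array X along mode k: rows index the levels of dimension k,
    columns run over all combinations of the remaining dimensions."""
    return np.moveaxis(X, k, 0).reshape(X.shape[k], -1)
```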

The operator $rvec$ describes the relationship between an array $\tilde{X}$ and its mono-linear form $x=rvec(\tilde{X})$, the column vector obtained by stacking the elements of the array in the order of its dimensions.

The following are very useful properties of the array normal variable with Kronecker delta covariance structure (Akdemir and Gupta (2011)).

###### Property 2.1

If $\tilde{X}\sim\phi(\tilde{X};\tilde{M},A_1,A_2,\ldots,A_i)$ then $rvec(\tilde{X})\sim N_{\prod_j m_j}(rvec(\tilde{M}),\Lambda)$ where $\Lambda=(A_iA_i')\otimes(A_{i-1}A_{i-1}')\otimes\ldots\otimes(A_1A_1')$.

###### Property 2.2

If $rvec(\tilde{X})\sim N_{\prod_j m_j}(rvec(\tilde{M}),\Lambda)$ with $\Lambda=(A_iA_i')\otimes(A_{i-1}A_{i-1}')\otimes\ldots\otimes(A_1A_1')$ then $\tilde{X}\sim\phi(\tilde{X};\tilde{M},A_1,A_2,\ldots,A_i)$.

In the remainder of this paper we will assume that the matrices $A_1,A_2,\ldots,A_i$ are unique square roots (obtained, for example, from the eigenvalue or Cholesky decompositions) of the positive definite matrices $\Sigma_k=A_kA_k'$ for $k=1,2,\ldots,i$, and we will write $\Lambda=\Sigma_i\otimes\Sigma_{i-1}\otimes\ldots\otimes\Sigma_1$.

## 3 Updates for missing values and the parameters

Using linear predictors for the purpose of imputing missing values in multivariate normal data dates back at least as far as Anderson (1957). The EM algorithm (Dempster et al. (1977)) is usually utilized for the multivariate normal distribution with missing data; the EM method goes back to Orchard and Woodbury (1972) and Beale and Little (1975). Trawinski and Bargmann (1964) and Hartley and Hocking (1971) developed the Fisher scoring algorithm for incomplete multivariate normal data. The notation and the algorithms described in this section were adapted from Jørgensen and Petersen (2012).

Let $x$ be an $n$-dimensional observation vector which is partitioned as

$$\begin{bmatrix}R\\ M\end{bmatrix}x=\begin{bmatrix}x_r\\ x_m\end{bmatrix}$$

where $x_r$ and $x_m$ represent the vectors of observed values and missing observations correspondingly. Here

$$\begin{bmatrix}R\\ M\end{bmatrix}$$

is an orthogonal permutation matrix of zeros and ones and

$$x=\begin{bmatrix}R\\ M\end{bmatrix}'\begin{bmatrix}x_r\\ x_m\end{bmatrix}.$$

The mean vector and the covariance matrix of $x$ are given by

$$\begin{bmatrix}R\\ M\end{bmatrix}E(x)=\begin{bmatrix}\mu_r\\ \mu_m\end{bmatrix}$$

and

$$\begin{bmatrix}R\\ M\end{bmatrix}cov(x)\begin{bmatrix}R\\ M\end{bmatrix}'=\begin{bmatrix}\Sigma_{rr}&\Sigma_{rm}\\ \Sigma_{mr}&\Sigma_{mm}\end{bmatrix}$$

correspondingly.
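The selector matrices are simply rows of the identity matrix; a small numerical illustration (indices and values our own):

```python
import numpy as np

# Selector matrices R and M: rows of the identity matrix picking the
# observed and missing coordinates of a length-4 observation vector x.
obs_idx, mis_idx = [0, 2], [1, 3]
I = np.eye(4)
R, M = I[obs_idx], I[mis_idx]
x = np.array([10.0, 20.0, 30.0, 40.0])
x_r, x_m = R @ x, M @ x                      # observed and missing parts
P = np.vstack([R, M])                        # the permutation matrix [R; M]
x_back = P.T @ np.concatenate([x_r, x_m])    # x = [R; M]' [x_r; x_m]
```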

Let $\tilde{X}_1,\tilde{X}_2,\ldots,\tilde{X}_N$ be a random sample of array observations from the distribution with density $\phi(\tilde{X};\tilde{M},A_1,A_2,\ldots,A_i)$. Let the current values of the parameters at time $t$ be $\tilde{M}_t, A_{1t},\ldots,A_{it}$.

The mean of the conditional distribution of $\tilde{X}_l$ given the observed values $x_{rl}$ and the estimates of the parameters at time $t$ can be obtained using

$$rvec(\hat{\tilde{X}}_{lt})=rvec(\tilde{M}_t)+\Lambda_t R_l'(R_l\Lambda_t R_l')^{-1}(x_{rl}-R_l\,rvec(\tilde{M}_t)). \quad (2)$$
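Equation (2) is the familiar conditional mean of a multivariate normal. A minimal sketch in numpy, indexing by boolean masks instead of the permutation matrix $R_l$ (which is equivalent); the function name is ours:

```python
import numpy as np

def conditional_mean_impute(x, mu, Lam):
    """Replace the NaN entries of x by E[x_m | x_r] under x ~ N(mu, Lam).
    Equivalent to equation (2): observed entries are returned unchanged."""
    obs = ~np.isnan(x)
    x_hat = x.copy()
    L_mr = Lam[np.ix_(~obs, obs)]   # cov(missing, observed)
    L_rr = Lam[np.ix_(obs, obs)]    # cov(observed, observed)
    x_hat[~obs] = mu[~obs] + L_mr @ np.linalg.solve(L_rr, x[obs] - mu[obs])
    return x_hat
```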

The updating equation for the mean parameter $\tilde{M}$ is given by

$$rvec(\tilde{M}_{t+1})=\frac{1}{N}\sum_{l=1}^{N}rvec(\hat{\tilde{X}}_{lt}). \quad (3)$$

To update the covariance matrix along the $k$th dimension, calculate

$$\tilde{Z}=(A_1^{-1})_1(A_2^{-1})_2\ldots(A_{k-1}^{-1})_{k-1}(I_{m_k})_k(A_{k+1}^{-1})_{k+1}\ldots(A_i^{-1})_i(\hat{\tilde{X}}_t-\tilde{M})$$

using the most recent estimates of the parameters. Assuming that these parameter values are correct, we can write $\tilde{Z}\sim\phi(\tilde{Z};0,I_{m_1},\ldots,I_{m_{k-1}},A_k,I_{m_{k+1}},\ldots,I_{m_i})$. Let $Z_{(k)}$ denote the matrix obtained by stacking the elements of $\tilde{Z}$ along the $k$th dimension. Then the columns $Z_{(k)1},Z_{(k)2},\ldots$ of these matrices can be treated as a random sample of size $N\prod_{j\neq k}m_j$ from the $m_k$-variate normal distribution with mean zero and covariance $\Sigma_k$. An update for $\Sigma_k$ can be obtained by calculating the sample covariance matrix

$$\Sigma_{k(t+1)}=\frac{1}{N\prod_{j\neq k}m_j}\sum_{q=1}^{N\prod_{j\neq k}m_j}Z_{(k)q}Z_{(k)q}'. \quad (4)$$
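Update (4) is the pooled sample covariance of the mode-$k$ fibers. A sketch (our own helper, assuming the standardized residual arrays are stored in a Python list):

```python
import numpy as np

def update_sigma_k(Z_list, k):
    """Sigma_k update as in (4): average outer product of the mode-k
    matricization columns, pooled over the N residual arrays."""
    mk = Z_list[0].shape[k]
    S = np.zeros((mk, mk))
    n_cols = 0
    for Z in Z_list:
        Zk = np.moveaxis(Z, k, 0).reshape(mk, -1)   # mode-k matricization
        S += Zk @ Zk.T                              # sum of z_q z_q'
        n_cols += Zk.shape[1]
    return S / n_cols
```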

## 4 Flip-Flop Algorithm for Incomplete Arrays

Inference about the parameters of the model in (1) for the matrix variate case has been considered in the statistical literature (Roy and Khattree (2003), Roy and Leiva (2008), Lu and Zimmerman (2005), Srivastava et al. (2008b), etc.). The Flip-Flop algorithm (Srivastava et al. (2008b)) is proven to attain the maximum likelihood estimators of the parameters of the two dimensional array variate normal distribution. In Akdemir and Gupta (2011), Ohlson et al. (2011) and Hoff (2011), the Flip-Flop algorithm was extended to the general array variate case.

For incomplete matrix variate observations with Kronecker delta covariance structure, parameter estimation and missing data imputation methods have been developed in Allen and Tibshirani (2010).

The following is a modification of the Flip-Flop algorithm for the incomplete array variable observations:

Algorithm for estimation 1:

Given the current values of the parameters, repeat steps 1 through 3 until convergence:

1. Update the missing cells of each array using (2),

2. Update $\tilde{M}$ using (3),

3. For $k=1,2,\ldots,i$, update $\Sigma_k$ using (4).

###### Theorem 4.1

In a sufficient number of steps, the Flip-Flop algorithm will converge to parameter values that maximize the likelihood function for the model in (1).

###### Proof 4.1

In the first step of the algorithm, we calculate the expected values of the missing cells given the last updates of the parameters. In the second step, we calculate the value of the mean parameter that maximizes the likelihood function given the expected values of the response and the last updates of the covariance parameters. In the third step, for each $k$ the likelihood function for $\Sigma_k$ is concave given the other parameters and the current expectation of the response, i.e., we can find the unique global maximum of this function with respect to $\Sigma_k$, and a step that improves the likelihood is taken. Therefore, our algorithm is, in fact, a generalized expectation maximization (GEM) algorithm, which converges to the parameter values that maximize the likelihood function by the result in Dempster et al. (1977).

## 5 A semi-parametric mixed effects model

A semi-parametric mixed effects model (SPMM) for the $n\times 1$ response vector $y$ is expressed as

$$y=X\beta+Zg+e \quad (5)$$

where $X\beta$ is the mean vector and $Z$ is the design matrix for the random effects; the random effects $(g',e')'$ are assumed to follow a multivariate normal distribution with mean $0$ and covariance

$$\begin{pmatrix}\sigma_g^2K&0\\ 0&\sigma_e^2I_n\end{pmatrix}$$

where $K$ is a kernel matrix. In general, the kernel matrix is a non-negative definite matrix that measures the known degree of relationships between the random effects. By the property of the multivariate normal distribution, the response vector $y$ has a multivariate normal distribution with mean $X\beta$ and covariance $\sigma_g^2ZKZ'+\sigma_e^2I_n$.

The parameters of this model can be obtained by maximizing the likelihood or the restricted likelihood (defined as the likelihood function with the fixed effect parameters integrated out (Dempster 1981)). The estimators for the coefficients of the SPMM in (5) can be obtained via Henderson's iterative procedure. Bayesian procedures are discussed in detail in the book by Sorensen & Gianola. Here, we will adopt an efficient likelihood based algorithm (the efficient mixed model association (EMMA)) that was described in Kang et al. (2007). Following their discussion, the log-likelihood for the SPMM in (5) can be written as

$$\ell(y;\beta,\sigma_g^2,\lambda=\sigma_e^2/\sigma_g^2)=\frac{1}{2}\left[-n\log(2\pi\sigma_g^2)-\log|H|-\frac{1}{\sigma_g^2}(y-X\beta)'H^{-1}(y-X\beta)\right]$$

where $H=ZKZ'+\lambda I_n$. The likelihood function is optimized at

$$\hat{\beta}=(X'H^{-1}X)^{-1}X'H^{-1}y$$

and

$$\hat{\sigma}_g^2=\frac{(y-X\hat{\beta})'H^{-1}(y-X\hat{\beta})}{n}$$

for fixed values of $\lambda$. Using the spectral decompositions of $H=U\,\mathrm{diag}(\epsilon_1+\lambda,\ldots,\epsilon_n+\lambda)\,U'$ and of $SHS=W\,\mathrm{diag}(\tau_1+\lambda,\ldots,\tau_{n-q}+\lambda)\,W'$, where $S=I_n-X(X'X)^{-1}X'$ and $q$ is the rank of $X$, and letting $\eta=W'y$, the log-likelihood function for $\lambda$ at $\hat{\beta}$ and $\hat{\sigma}_g^2$ can be written as

$$l(\lambda)=\frac{1}{2}\left[-n\log\frac{2\pi(y-X\hat{\beta})'H^{-1}(y-X\hat{\beta})}{n}-\log|H|-n\right] \quad (6)$$
$$=\frac{1}{2}\left[n\log\frac{n}{2\pi}-n-n\log\left(\sum_{s=1}^{n-q}\frac{\eta_s^2}{\tau_s+\lambda}\right)-\sum_{i=1}^{n}\log(\epsilon_i+\lambda)\right]$$

which can be maximized using univariate optimization.
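This one-dimensional maximization can be sketched with numpy and scipy; for brevity the mean is taken as zero, so the $X\hat{\beta}$ rotation is omitted and the kernel eigenvalues serve directly (function names are ours):

```python
import numpy as np
from scipy.optimize import minimize_scalar

def profile_loglik(lam, eta2, eps, n):
    """Profile log-likelihood l(lambda) from (6) for the zero-mean case:
    eta2 are the squared rotated observations, eps the kernel eigenvalues."""
    return 0.5 * (n * np.log(n / (2 * np.pi)) - n
                  - n * np.log(np.sum(eta2 / (eps + lam)))
                  - np.sum(np.log(eps + lam)))

def estimate_lambda(y, K, lo=1e-5, hi=1e5):
    """One-dimensional maximization of the profile log-likelihood."""
    eps, U = np.linalg.eigh(K)
    eta2 = (U.T @ y) ** 2
    n = len(y)
    res = minimize_scalar(lambda l: -profile_loglik(l, eta2, eps, n),
                          bounds=(lo, hi), method="bounded")
    return res.x
```

The rotated form agrees with the direct evaluation $\tfrac{1}{2}[-n\log(2\pi\hat{\sigma}^2)-\log|H|-n]$, which is a useful check on an implementation.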

When there is more than one source of variation acting upon the response vector, we may want to separate the influence of these sources. For such cases, we recommend using the following multi-way random effects model based on the multi-way normal distribution in (1).

###### Definition 5.1

A multi-way random effects model (AVSPMM) for the response array $\tilde{Y}$ can be expressed as

$$\tilde{Y}\sim\phi(\tilde{M}(X),\sigma(K_1+\lambda_1I_{m_1})^{1/2},(K_2+\lambda_2I_{m_2})^{1/2},\ldots,(K_i+\lambda_iI_{m_i})^{1/2}) \quad (7)$$

where $\tilde{M}(X)$ is an $m_1\times m_2\times\ldots\times m_i$ dimensional mean function of the observed fixed effects and $K_1,K_2,\ldots,K_i$ are $m_1\times m_1,\ldots,m_i\times m_i$ dimensional known kernel matrices measuring the similarity of the levels of the random effects. The parameters of the model are $\sigma$ and $\lambda_k$ for $k=1,2,\ldots,i$. If the covariance structure along the $k$th dimension is unknown, then the covariance along this dimension is assumed to be an unknown correlation matrix, i.e., we replace the term $(K_k+\lambda_kI_{m_k})^{1/2}$ by a single covariance matrix, say $\Sigma_k^{1/2}$.

The parameter $\sigma$ is arbitrarily associated with the first variance component; $\sigma^2$ measures the total variance in the response variable explained by the similarity matrices, and $\lambda_k$ represents the error to signal variance ratio along the $k$th dimension. For the identifiability of the model, additional constraints on the covariance parameters are needed. Here, we adopt the restriction that the first diagonal element of the unknown covariance matrices is equal to one.

It is insightful to write the covariance structure for the vectorized form of the 2-dimensional array model: In this case,

$$cov(rvec(\tilde{Y}))=\sigma^2(K_2+\lambda_2I_{m_2})\otimes(K_1+\lambda_1I_{m_1}) \quad (8)$$
$$=\sigma^2(K_2\otimes K_1+\lambda_1K_2\otimes I_{m_1}+\lambda_2I_{m_2}\otimes K_1+\lambda_1\lambda_2I_{m_1m_2}).$$
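The expansion of the Kronecker product above can be checked numerically; a short script with arbitrarily chosen dimensions and values:

```python
import numpy as np

m1, m2 = 2, 3
rng = np.random.default_rng(1)
B1 = rng.standard_normal((m1, m1))
B2 = rng.standard_normal((m2, m2))
K1, K2 = B1 @ B1.T, B2 @ B2.T   # positive semi-definite kernel matrices
lam1, lam2, sigma2 = 0.3, 0.7, 2.0

# Left-hand side: the factored covariance of the vectorized 2-D array.
lhs = sigma2 * np.kron(K2 + lam2 * np.eye(m2), K1 + lam1 * np.eye(m1))
# Right-hand side: the four-term expansion.
rhs = sigma2 * (np.kron(K2, K1)
                + lam1 * np.kron(K2, np.eye(m1))
                + lam2 * np.kron(np.eye(m2), K1)
                + lam1 * lam2 * np.eye(m1 * m2))
```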

If the covariance structure along the second dimension is unknown then the model for the covariance of the response becomes

$$cov(rvec(\tilde{Y}))=\sigma^2\,\Sigma_2\otimes(K_1+\lambda_1I_{m_1}) \quad (9)$$
$$=\sigma^2(\Sigma_2\otimes K_1+\lambda_1\Sigma_2\otimes I_{m_1}).$$

It should be noted that the SPMM is related to reproducing kernel Hilbert space (RKHS) regression, and so is the AVSPMM.

### 5.1 The mean and the covariance parameters

A simple model for the mean is given by

$$\tilde{M}=(\beta_1)_11_{1\times m_2\times m_3\times\ldots\times m_i}+(\beta_2)_21_{m_1\times 1\times m_3\times\ldots\times m_i}+\ldots+(\beta_i)_i1_{m_1\times m_2\times m_3\times\ldots\times 1} \quad (10)$$

where the $\beta_k$ for $k=1,2,\ldots,i$ are $m_k\times 1$ coefficient vectors and the notation $1_{m_1\times m_2\times\ldots\times m_i}$ refers to an $m_1\times m_2\times\ldots\times m_i$ dimensional array of ones. Element-wise, this can be written as

$$(\tilde{M})_{q_1q_2\ldots q_i}=(\beta_1)_{q_1}+(\beta_2)_{q_2}+\ldots+(\beta_i)_{q_i}.$$
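For two dimensions, the additive mean above is simply an outer sum; a numpy sketch with arbitrary coefficient vectors:

```python
import numpy as np

beta1 = np.array([1.0, 2.0])          # coefficients for dimension 1
beta2 = np.array([0.5, -0.5, 1.5])    # coefficients for dimension 2
# (M)_{q1 q2} = (beta1)_{q1} + (beta2)_{q2}, via broadcasting
M = beta1[:, None] + beta2[None, :]
```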

For 2-dimensional arrays, this model of the mean reduces to the one recommended in Allen and Tibshirani (2010). The fixed effects variables are implicitly the effects of the levels of the separable dimensions, some of which might be excluded by fixing them at zero at the modeling stage.

Let $\tilde{X}_1,\tilde{X}_2,\ldots,\tilde{X}_N$ be a random sample of array observations from the distribution with density $\phi(\tilde{X};\tilde{M},A_1,A_2,\ldots,A_i)$ where $\tilde{M}$ has the parametrization in (10). In this case, the variable

$$\tilde{Z}=(A_1^{-1})_1(A_2^{-1})_2\ldots(A_i^{-1})_i(\tilde{X}-\tilde{M}(\beta_1,\beta_2,\ldots,\beta_k=0,\ldots,\beta_i))$$

has a density that depends on $\beta_k$ only through its mean. Let $Z_{(k)}$ denote the matrix obtained by matricization of $\tilde{Z}$ along the $k$th dimension. The corresponding columns $z_1,z_2,\ldots$ provide a random sample of size $N\prod_{j\neq k}m_j$ from an $m_k$-variate normal distribution whose mean carries the information about $\beta_k$. Hence, the likelihood estimator of this mean is given by

$$\frac{1}{N\prod_{j\neq k}m_j}\sum_{q=1}^{N\prod_{j\neq k}m_j}z_q. \quad (11)$$

Assume that the mean and all variance parameters other than $\sigma$ and $\lambda_k$ are known. By standardizing the centered array variable in all but the $k$th dimension, followed by matricization along the same dimension and finally vectorization (denote this vector by $z_{(k)}$), we obtain a multivariate mixed model for which estimates of $\sigma$ and $\lambda_k$ can be obtained efficiently by using a slight modification of the EMMA (Kang et al. (2007)) method. The distribution of $z_{(k)}$ is

$$z_{(k)}\sim\phi_{N\prod_{j=1}^{i}m_j}\left(0,\;\sigma^2\left(I_{N\prod_{j\neq k}m_j}\otimes K_k+\lambda_kI\right)\right).$$

Let $H_k=I_{N\prod_{j\neq k}m_j}\otimes K_k+\lambda_kI$. The likelihood function is optimized at

$$\hat{\sigma}^2=\frac{z_{(k)}'H_k^{-1}z_{(k)}}{N\prod_{j=1}^{i}m_j}$$

for fixed values of $\lambda_k$. Using the spectral decomposition $H_k=U\,\mathrm{diag}(\epsilon_1+\lambda_k,\ldots,\epsilon_{n^*}+\lambda_k)\,U'$, and letting $\eta=U'z_{(k)}$ and $n^*=N\prod_{j=1}^{i}m_j$, the log-likelihood function for $\lambda_k$ at $\hat{\sigma}^2$ can be written as

$$l(\lambda_k)=\frac{1}{2}\left[-n^*\log\frac{2\pi z_{(k)}'H_k^{-1}z_{(k)}}{n^*}-\log|H_k|-n^*\right] \quad (12)$$
$$=\frac{1}{2}\left[n^*\log\frac{n^*}{2\pi}-n^*-n^*\log\left(\sum_{i=1}^{n^*}\frac{\eta_i^2}{\epsilon_i+\lambda_k}\right)-\sum_{i=1}^{n^*}\log(\epsilon_i+\lambda_k)\right]$$

which can be maximized using univariate optimization. An additional efficiency is obtained by considering the singular value decomposition of a Kronecker product: if $A=U_AD_AV_A'$ and $B=U_BD_BV_B'$, then $A\otimes B=(U_A\otimes U_B)(D_A\otimes D_B)(V_A\otimes V_B)'$. That is, the left and right singular vectors and the singular values are obtained as Kronecker products of the corresponding matrices of the components. Therefore, we can calculate the eigenvalue decomposition of $H_k$ efficiently using

$$H_k=(I\otimes U_k)(I\otimes(D_k+\lambda_kI))(I\otimes U_k)' \quad (13)$$

where $K_k=U_kD_kU_k'$ is the eigenvalue decomposition of $K_k$ and (13) is the corresponding eigenvalue decomposition of $H_k$.
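Equation (13) means only the small $m_k\times m_k$ eigen-problem has to be solved; a sketch (helper name ours):

```python
import numpy as np

def hk_eigvals(Kk, lam, n_rep):
    """Eigenvalues of H_k = I_{n_rep} kron K_k + lam*I, using only the
    eigen-decomposition of the m_k x m_k matrix K_k, as in (13)."""
    d, U = np.linalg.eigh(Kk)           # K_k = U D U'
    return np.tile(d + lam, n_rep), U   # eigenvalues of H_k, tiled n_rep times
```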

Algorithm for estimation 2:

Given the current values of the parameters, repeat steps 1 through 3 until convergence:

1. Update the missing cells of each array using (2),

2. Update the mean parameters using (11) and the imputed arrays,

3. For $k=1,2,\ldots,i$, update $\lambda_k$ using (12) and (13) if $K_k$ is known; otherwise update $\Sigma_k$ using (4).

## 6 Illustrations

###### Example 6.1

For this first example, we generated a random sample of array random variables according to a known array variate distribution. We then randomly deleted a given proportion of the cells of these arrays. Algorithm for estimation 1 was used to estimate the parameters and to impute the missing cells. Finally, the correlation between the observed values of the missing cells and the imputed values, and the mean squared error (MSE) of the estimates of the overall Kronecker structured covariance matrix, were calculated. We tried several sample sizes and missing data proportions. The correlations and the MSEs were calculated for 30 independent replications, and these results are presented in Figure 1. As expected, the solutions from our methods improve as the sample size increases or as the proportion of missing cells decreases.

Figure 1: The boxplots of the correlations (left) and the MSEs (right) for varying values of the sample size and missing cell proportions. As expected, the solutions from our methods improve as the sample size increases (top to bottom) or as the proportion of missing cells decreases (left to right).
###### Example 6.2

In an experiment conducted in Aberdeen during 2013, 524 barley lines from the North American Small Grain Collection were grown using combinations of two experimental factors. The levels of the first factor were low and normal nitrogen, and the levels of the second experimental factor were dry and irrigated conditions. The low nitrogen and irrigation combination was not reported. Five traits, i.e., plant height, test weight, yield, whole grain protein and heading date (Julian), were used here. We constructed an incomplete array from this data and induced additional missingness by selecting a proportion of the cells at random and deleting the recorded values in these cells (regardless of whether the cell was already missing). In addition, 4803 SNP markers were available for all of the 524 lines, which allowed us to calculate the covariance structure along this dimension; the covariance structure along the other dimensions was assumed unknown. An additive mean structure was used for the means of the different traits, and all the mean parameters related to the other dimensions were assumed to be zero. For each trait, the correlation between the observed and the corresponding estimated values was calculated for 30 independent replications of this experiment with differing proportions of missing values, and these are summarized in Figure 2. The results indicate that our methods provide a means of estimating the traits, which were generated by the combined effect of genetics and environment.

Figure 2: The accuracies for the scenario in Example 2 summarized with boxplots. The number of missing cells is highest for the bottom figure and lowest for the top figure.
###### Example 6.3

In this example, we used the data from an experiment conducted over two years. 365 lines from the spring wheat association mapping panel were each observed for three agronomical traits (plant height, yield, physiological maturity date) in two separate year/location combinations under irrigated and dry conditions. A relationship matrix was obtained using 3735 SNP markers in the same fashion as in Example 2. However, since we wanted to study the effect of the number of different genotypes on the accuracies, we selected random samples of genotypes out of the 365. The phenotypic data was used to form an array, and the entry in each cell was deleted with varying probabilities. Finally, within-trait correlations between the missing cells and the corresponding estimates from the AVSPMM over 30 replications of each of the settings of this experiment are summarized by the boxplots in Figure 3.

Figure 3: The accuracies for the scenario in Example 3 summarized with boxplots. The number of missing cells decreases from left to right and p1 increases from top to bottom.
###### Example 6.4

This example involves simulations from a known AVSPMM model. We demonstrate that the MSE for the overall covariance decreases with increasing $p_1$, where $p_1$ stands for the number of levels of the dimension for which the covariance structure is available in the estimation process. After generating the array variate response, each cell was deleted with a given probability. This was replicated 30 times. The correlations between the estimated response and the corresponding known (but missing) cells and the MSE between the estimated and the known covariance parameters are displayed in Figure 4.

Figure 4: The figures on the left display the MSE between the estimated and the known covariance parameters, and the figures on the right display the correlations between the estimated response and the corresponding known (but missing) cells, for p1=100,200,300 increasing downwards and probability of missingness .6,.5,.4,.2,.1 decreasing towards the right.

## 7 Conclusions

We have formulated a parametric model for array variate data and developed suitable estimation methods for the parameters of this distribution with possibly incomplete observations. The main application of this paper has been to multi-way regression (missing data imputation): once the model parameters are given, we are able to estimate the unobserved components of any array from the observed parts of the array. We have assumed no structure on the missingness pattern; however, we have not explored the estimability conditions.

The AVSPMM is a suitable model when the response variable is considered transposable. This allows us to separate the variance in the array variate response into components along its dimensions. This model also allows us to make predictions for the unobserved level combinations of the dimensions as long as we know the relationship of these new levels to the partially observed levels along each separate dimension.

The methods developed here use the assumption that the data are generated from a distribution with Kronecker delta covariance structure. The suitability of this model to any given data set is questionable; the choice of model and determination of its order could be accomplished using a model selection criterion based on the likelihood function, which is available in this paper.

## References

• Akdemir and Gupta  D. Akdemir and A. K. Gupta. Array variate random variables with multiway kronecker delta covariance matrix structure. Journal of Algebraic Statistics, 2(1):98–113, 2011.
• Allen and Tibshirani  G.I. Allen and R. Tibshirani. Transposable regularized covariance models with an application to missing data imputation. The Annals of Applied Statistics, 4(2):764–790, 2010.
• Anderson  T.W. Anderson. Maximum likelihood estimates for a multivariate normal distribution when some observations are missing. Journal of the American Statistical Association, 52(278):200–203, 1957.
• Beale and Little  E.M.L. Beale and R.J.A. Little. Missing values in multivariate analysis. Journal of the Royal Statistical Society. Series B (Methodological), pages 129–145, 1975.
• Dempster et al.  A.P. Dempster, N.M. Laird, and D.B. Rubin. Maximum likelihood from incomplete data via the em algorithm. Journal of the Royal Statistical Society. Series B (Methodological), pages 1–38, 1977.
• Gupta and Nagar  A.K. Gupta and D.K. Nagar. Matrix Variate Distributions. Chapman and Hall/CRC Monographs and Surveys in Pure and Applied Mathematics. Chapman and Hall, 2000.
• Hartley and Hocking  HO Hartley and RR Hocking. The analysis of incomplete data. Biometrics, pages 783–823, 1971.
• Hoff  P.D. Hoff. Hierarchical multilinear models for multiway data. Computational Statistics & Data Analysis, 55(1):530–543, 2011.
• Jørgensen and Petersen  Bent Jørgensen and Hans Chr Petersen. Efficient estimation for incomplete multivariate data. Journal of Statistical Planning and Inference, 142(5):1215–1224, 2012.
• Kolda  T.G. Kolda. Multilinear operators for higher-order decompositions. United States. Department of Energy, 2006.
• Lu and Zimmerman  N. Lu and D.L. Zimmerman. The Likelihood Ratio Test for a Separable Covariance Matrix. Statistics & Probability Letters, 73(4):449–457, 2005.
• Ohlson et al.  M. Ohlson, M. Rauf Ahmad, and D. von Rosen. The multilinear normal distribution: Introduction and some basic properties. Journal of Multivariate Analysis, 2011.
• Orchard and Woodbury  T. Orchard and M.A. Woodbury. A missing information principle: theory and applications. In Proceedings of the 6th Berkeley Symposium on Mathematical Statistics and Probability, volume 1, pages 697–715, 1972.
• Rauhala  U.A. Rauhala. Array Algebra Expansion of Matrix and Tensor Calculus: Part 1. SIAM Journal on Matrix Analysis and Applications, 24:490, 2002.
• Roy and Khattree  A. Roy and R. Khattree. Tests for Mean and Covariance Structures Relevant in Repeated Measures Based Discriminant Analysis. Journal of Applied Statistical Science, 12(2):91–104, 2003.
• Roy and Leiva  A. Roy and R. Leiva. Likelihood Ratio Tests for Triply Multivariate Data with Structured Correlation on Spatial Repeated Measurements. Statistics & Probability Letters, 78(13):1971–1980, 2008.
• Srivastava et al. [2008a] MS Srivastava, T. Nahtman, and D. von Rosen. Estimation in General Multivariate Linear Models with Kronecker Product Covariance Structure. Research Report Centre of Biostochastics, Swedish University of Agriculture science. Report, 1, 2008a.
• Srivastava et al. [2008b] M.S. Srivastava, T. von Rosen, and D. Von Rosen. Models with a Kronecker Product Covariance Structure: Estimation and Testing. Mathematical Methods of Statistics, 17(4):357–370, 2008b.
• Trawinski and Bargmann  I.M. Trawinski and RE Bargmann. Maximum likelihood estimation with incomplete multivariate data. The Annals of Mathematical Statistics, 35(2):647–657, 1964.