I Introduction
Reduced rank regression (RRR) [1, 2] is a multivariate linear regression model in which the coefficient matrix has a low-rank structure. The name "reduced-rank regression" was first introduced by Izenman [3]. Denote the response (or dependent) variables by $\mathbf{y}\in\mathbb{R}^{M}$ and the predictor (or independent) variables by $\mathbf{x}\in\mathbb{R}^{N}$; a general RRR model is given as follows:
$$\mathbf{y}=\mathbf{A}\mathbf{B}^{T}\mathbf{x}+\boldsymbol{\varepsilon},\qquad(1)$$
where the regression parameters are $\mathbf{A}\in\mathbb{R}^{M\times r}$ and $\mathbf{B}\in\mathbb{R}^{N\times r}$ with rank $r\le\min(M,N)$, and $\boldsymbol{\varepsilon}$ is the model innovation. Matrix $\mathbf{A}$ is often called the sensitivity (or exposure) matrix, and $\mathbf{B}$ is called the factor matrix, with the linear combinations $\mathbf{B}^{T}\mathbf{x}$ called latent factors. The low-rank structure formed by $\mathbf{C}=\mathbf{A}\mathbf{B}^{T}$ essentially reduces the parameter dimension and improves the explanatory ability of the model. The RRR model is widely used when the response variables are believed to depend on a few linear combinations of the predictor variables, or when such linear combinations are of special interest.
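The parameter savings of the low-rank structure can be made concrete with a small simulation sketch. All dimensions, the noise level, and the random draws below are illustrative assumptions, not values from the paper:

```python
import numpy as np

# Minimal sketch: simulate from a rank-r RRR model y = A @ B.T @ x + eps.
# The dimensions, noise level, and random draws are illustrative assumptions.
rng = np.random.default_rng(0)
m, n, r, T = 10, 20, 3, 500            # responses, predictors, rank, samples

A = rng.standard_normal((m, r))        # sensitivity (exposure) matrix
B = rng.standard_normal((n, r))        # factor matrix
C = A @ B.T                            # rank-r coefficient matrix

X = rng.standard_normal((T, n))        # stacked predictor samples
E = 0.1 * rng.standard_normal((T, m))  # model innovations
Y = X @ C.T + E                        # stacked response samples

# The low-rank structure cuts the parameter count from m*n to r*(m+n).
print(m * n, r * (m + n))              # 200 90
```

Here a full-rank coefficient matrix would need $MN=200$ parameters, while the rank-3 factorization needs only $r(M+N)=90$.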
The RRR model has been used in many signal processing problems, e.g., array signal processing [4], state space modeling [5], filter design [6], channel estimation and equalization for wireless communication [7, 8, 9], etc. It is also widely applied in econometrics and financial economics. Problems in econometrics were also the motivation for the pioneering work on the RRR estimation problem [1]. In financial economics, it can be used when modeling a group of economic indices by the lagged values of a set of economic variables. It is also widely used to model the relationship between financial asset returns and some related explanatory variables. Several asset pricing theories have been proposed for testing the efficiency of portfolios [10] and empirical verification using asset returns data on industry portfolios has been made through tests for reduced rank regression [11]
. The RRR model is also closely related to the vector error correction model [12] in time series modeling, and the latent factors can be used for statistical arbitrage [13] in finance. More applications of the RRR model can be found in, e.g., [14].

In addition to the low-rank structure for factor extraction, row-wise group sparsity on the matrix $\mathbf{B}$ can also be imposed to further realize predictor variable selection, which leads to the sparse RRR (SRRR) model [15]. Since $\mathbf{B}$ defines the linear factors linking the response variables and the predictors, the SRRR model can generate factors using only a subset of all the predictors. Variable selection is an important target in data analytics since it helps with model interpretability and improves estimation and forecasting accuracy.
In [15], the authors first considered the SRRR estimation problem, where the group sparsity was induced via the group lasso penalty [16]. An algorithm based on the alternating minimization (AltMin) method [17] was proposed. However, that algorithm has a double loop, where a subgradient or variational method is used to solve the inner problem. Such an algorithm can be very slow in practice, since many inner iterations may be needed to obtain a sufficiently accurate solution at each outer iteration. Moreover, beyond convex sparsity-inducing functions, it is generally acknowledged that nonconvex sparsity-inducing functions can attain better performance [18]; such a function is adopted for sparsity estimation in this paper.
In this paper, the objective of the SRRR estimation problem is given as the ordinary least squares loss with a sparsity-inducing penalty. An orthogonality constraint is added for model identification purposes [15]. To solve this problem, an efficient AltMin-based single-loop algorithm is proposed. In order to pursue low-cost updating steps, a majorization-minimization method [19] and a nonconvexity redistribution method [20] are further adopted, making the variable updates two closed-form steps. Numerical simulations show that the proposed algorithm is more efficient than the benchmarks and that the nonconvex function attains a better estimation accuracy.

II Sparse Reduced Rank Regression
The SRRR estimation problem is formulated as follows:
$$\underset{\mathbf{A},\mathbf{B}}{\operatorname{minimize}}\;\;F(\mathbf{A},\mathbf{B})=L(\mathbf{A},\mathbf{B})+h(\mathbf{B})\quad\text{subject to}\quad\mathbf{A}^{T}\mathbf{A}=\mathbf{I},\qquad(2)$$
where $L(\mathbf{A},\mathbf{B})$ is the sample loss function and $h(\mathbf{B})$ is the row-wise group-sparsity regularizer. The orthogonality constraint is added for identification purposes, to deal with the unitary invariance of the parameters [15]. We further assume a sample path $\{(\mathbf{y}_{t},\mathbf{x}_{t})\}_{t=1}^{T}$ is available from (1).

The least squares loss for the RRR model is obtained by minimizing a sample norm loss as follows¹:
$$L(\mathbf{A},\mathbf{B})=\frac{1}{2T}\left\Vert\mathbf{Y}-\mathbf{X}\mathbf{B}\mathbf{A}^{T}\right\Vert_{F}^{2},\qquad(3)$$
where $\mathbf{Y}=\left[\mathbf{y}_{1},\ldots,\mathbf{y}_{T}\right]^{T}$ and $\mathbf{X}=\left[\mathbf{x}_{1},\ldots,\mathbf{x}_{T}\right]^{T}$.

¹In this paper, the intercept term has been omitted without loss of generality as in [15], since it can always be removed by assuming that the response and predictor variables have zero mean.
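As a quick consistency check of the stacked form of the loss, the per-sample squared errors and the Frobenius-norm form agree. The matrices below are illustrative random draws, and the common $1/(2T)$ scaling is omitted since it cancels on both sides:

```python
import numpy as np

# Sanity check: the per-sample least-squares loss equals its stacked matrix
# form ||Y - X B A^T||_F^2, since row t of X @ B @ A.T is (A B^T x_t)^T.
rng = np.random.default_rng(3)
T, m, n, r = 50, 4, 6, 2
X = rng.standard_normal((T, n))
Y = rng.standard_normal((T, m))
A = np.linalg.qr(rng.standard_normal((m, r)))[0]   # orthonormal columns
B = rng.standard_normal((n, r))

per_sample = sum(np.linalg.norm(Y[t] - A @ B.T @ X[t]) ** 2 for t in range(T))
stacked = np.linalg.norm(Y - X @ B @ A.T, "fro") ** 2
assert np.isclose(per_sample, stacked)
```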
Sparse optimization [21] has become the focus of much research interest as a way to realize variable selection (e.g., the group lasso method). For a vector $\mathbf{b}$, the sparsity level is usually measured by the $\ell_{0}$-norm, i.e., $\left\Vert\mathbf{b}\right\Vert_{0}$. In practice, the $\ell_{1}$-norm is used as the tightest convex relaxation to approximate it, as in [15]. Although it is easy to optimize and has been shown to favor sparse solutions, the $\ell_{1}$-norm can lead to biased estimation, with solutions not as accurate and sparse as desired, and can produce inferior prediction performance [18]. Nonconvex regularizers sacrifice convexity but can approximate the $\ell_{0}$-norm more tightly, and are therefore proposed here for sparsity inducing. In this paper, two nonsmooth sparsity-inducing functions denoted by $g$ are considered: the nonconvex Geman function [22] and the convex $\ell_{1}$-norm (absolute value). Then, the row-wise group-sparsity regularizer induced by $g$ is given as follows:
$$h(\mathbf{B})=\lambda\sum_{i=1}^{N}g\left(\left\Vert\mathbf{b}_{i}\right\Vert_{2}\right),\qquad(4)$$
where $\mathbf{b}_{i}$ denotes the $i$th row of $\mathbf{B}$ and $g$ is either the Geman function or the absolute value, which are shown in Figure 1.
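The regularizer (4) is cheap to evaluate. In the sketch below, the Geman parameterization $g(x)=x/(\theta+x)$ with $\theta=1$ is an illustrative assumption about the form of [22]; the identity choice for $g$ recovers the plain group lasso:

```python
import numpy as np

# Row-wise group-sparsity regularizer h(B) = lam * sum_i g(||b_i||_2).
# g(x) = x / (theta + x) is one common (assumed) Geman parameterization;
# g(x) = x on [0, inf) gives the classical group lasso.
def geman(x, theta=1.0):
    return x / (theta + x)

def group_penalty(B, lam, g):
    row_norms = np.linalg.norm(B, axis=1)   # ||b_i||_2 for each row of B
    return lam * np.sum(g(row_norms))

B = np.array([[3.0, 4.0],       # active row, norm 5
              [0.0, 0.0]])      # zero row contributes nothing
print(group_penalty(B, 1.0, lambda x: x))   # group lasso: 5.0
print(group_penalty(B, 1.0, geman))         # Geman: 5/6 ~ 0.833
```

Note how the Geman function flattens for large row norms, so large active rows are penalized much less than under the group lasso.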
Based on $L(\mathbf{A},\mathbf{B})$ and $h(\mathbf{B})$, the problem in (2) is a nonconvex nonsmooth optimization problem, due to the nonconvex nonsmooth objective and the nonconvex constraint set.
III Problem Solving Based on Alternating Minimization
The objective function in problem (2) has two variable blocks $(\mathbf{A},\mathbf{B})$. In this section, an alternating minimization (a.k.a. two-block coordinate descent) algorithm [17] will be proposed to solve it. At the $k$th iteration, this algorithm updates the variables according to the following two steps:
$$\mathbf{A}^{(k+1)}=\underset{\mathbf{A}^{T}\mathbf{A}=\mathbf{I}}{\arg\min}\;F\left(\mathbf{A},\mathbf{B}^{(k)}\right),\qquad\mathbf{B}^{(k+1)}=\underset{\mathbf{B}}{\arg\min}\;F\left(\mathbf{A}^{(k+1)},\mathbf{B}\right),\qquad(5)$$
where $\mathbf{A}^{(k+1)}$ and $\mathbf{B}^{(k+1)}$ are the updates generated at the $k$th iteration.
First, let us start with the minimization step w.r.t. variable $\mathbf{A}$ when $\mathbf{B}$ is fixed at $\mathbf{B}^{(k)}$; the problem becomes²
$$\underset{\mathbf{A}^{T}\mathbf{A}=\mathbf{I}}{\operatorname{minimize}}\;L(\mathbf{A})\doteq-\frac{1}{T}\mathrm{tr}\left(\mathbf{A}^{T}\mathbf{Y}^{T}\mathbf{X}\mathbf{B}^{(k)}\right),\qquad(6)$$
where "$\doteq$" means equivalence up to additive constants. This nonconvex problem is the classical orthogonal Procrustes problem (projection) [23], which has a closed-form solution given in the following lemma.

²For simplicity, $L(\mathbf{A},\mathbf{B}^{(k)})$ is written as $L(\mathbf{A})$; likewise, the fixed variables $\mathbf{A}$ and/or $\mathbf{B}$ in other functions will also be suppressed in the following.
Lemma 1.
[23] The orthogonal Procrustes problem in (6) can be equivalently reformulated into the following form:
$$\underset{\mathbf{A}^{T}\mathbf{A}=\mathbf{I}}{\operatorname{maximize}}\;\mathrm{tr}\left(\mathbf{A}^{T}\mathbf{M}\right),$$
where $\mathbf{M}=\mathbf{Y}^{T}\mathbf{X}\mathbf{B}^{(k)}$. Let the thin singular value decomposition (SVD) be $\mathbf{M}=\mathbf{U}\boldsymbol{\Sigma}\mathbf{V}^{T}$, where $\mathbf{U}^{T}\mathbf{U}=\mathbf{I}$ and $\mathbf{V}^{T}\mathbf{V}=\mathbf{I}$; then the optimal update is given by
$$\mathbf{A}^{(k+1)}=\mathbf{U}\mathbf{V}^{T}.\qquad(7)$$
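The Procrustes update is a few lines of linear algebra. The sketch below follows the standard SVD derivation; using $\mathbf{M}=\mathbf{Y}^{T}\mathbf{X}\mathbf{B}^{(k)}$ for the RRR loss is the conventional choice assumed here:

```python
import numpy as np

# Closed-form Procrustes update of Lemma 1: the maximizer of tr(A^T M)
# over A^T A = I is U @ Vt from the thin SVD M = U S Vt.
def procrustes_update(M):
    U, _, Vt = np.linalg.svd(M, full_matrices=False)   # thin SVD
    return U @ Vt                                      # optimal orthonormal A

rng = np.random.default_rng(1)
M = rng.standard_normal((8, 3))
A = procrustes_update(M)

assert np.allclose(A.T @ A, np.eye(3))                 # feasibility
# A random feasible point never beats the closed-form maximizer.
Q = np.linalg.qr(rng.standard_normal((8, 3)))[0]
assert np.trace(A.T @ M) >= np.trace(Q.T @ M)
```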
Then, fixing $\mathbf{A}$ at $\mathbf{A}^{(k+1)}$, the problem for $\mathbf{B}$ is
$$\underset{\mathbf{B}}{\operatorname{minimize}}\;F(\mathbf{B})\doteq\frac{1}{2T}\left\Vert\mathbf{Y}-\mathbf{X}\mathbf{B}\mathbf{A}^{T}\right\Vert_{F}^{2}+h(\mathbf{B}),\qquad(8)$$
which is a penalized multivariate regression problem. It has no analytical solution, but standard nonconvex optimization algorithms or solvers can be applied to solve it. However, such methods lead to an inner iterative process, which can be undesirable in terms of efficiency. In addition, due to the nonconvexity of this problem, if no guarantee on the solution quality can be claimed, the overall alternating algorithm is in general not guaranteed to converge to a meaningful point.
In this paper, the subproblem is instead solved via a simple update rule while still guaranteeing convergence of the overall algorithm. We propose to update $\mathbf{B}$ by solving a majorized surrogate problem for problem (8) [19, 24], written as
$$\mathbf{B}^{(k+1)}=\underset{\mathbf{B}}{\arg\min}\;\overline{F}\left(\mathbf{B};\mathbf{B}^{(k)}\right),\qquad(9)$$
where $\overline{F}\left(\mathbf{B};\mathbf{B}^{(k)}\right)$, or simply $\overline{F}(\mathbf{B})$, is the majorizing function of $F(\mathbf{B})$ at $\mathbf{B}^{(k)}$. To get $\overline{F}$, we need the following results.
Lemma 2.
[19] For a quadratic function $f(\mathbf{x})=\frac{1}{2}\mathbf{x}^{T}\mathbf{P}\mathbf{x}+\mathbf{p}^{T}\mathbf{x}$ with $\mathbf{P}\succeq\mathbf{0}$ and any $\kappa\ge\lambda_{\max}(\mathbf{P})$, $f$ is majorized at any point $\mathbf{x}^{(k)}$ by
$$\overline{f}\left(\mathbf{x};\mathbf{x}^{(k)}\right)=f\left(\mathbf{x}^{(k)}\right)+\nabla f\left(\mathbf{x}^{(k)}\right)^{T}\left(\mathbf{x}-\mathbf{x}^{(k)}\right)+\frac{\kappa}{2}\left\Vert\mathbf{x}-\mathbf{x}^{(k)}\right\Vert_{2}^{2}.$$
Observing that the first part of $F(\mathbf{B})$, i.e., the least squares loss $L(\mathbf{B})$, is quadratic in $\mathbf{B}$, based on Lemma 2 we can have the following result.

Lemma 3.
The function $L(\mathbf{B})$ can be majorized at $\mathbf{B}^{(k)}$ by
$$\overline{L}\left(\mathbf{B};\mathbf{B}^{(k)}\right)=L\left(\mathbf{B}^{(k)}\right)+\mathrm{tr}\left(\nabla L\left(\mathbf{B}^{(k)}\right)^{T}\left(\mathbf{B}-\mathbf{B}^{(k)}\right)\right)+\frac{\kappa}{2}\left\Vert\mathbf{B}-\mathbf{B}^{(k)}\right\Vert_{F}^{2},$$
where $\nabla L\left(\mathbf{B}^{(k)}\right)=\frac{1}{T}\mathbf{X}^{T}\left(\mathbf{X}\mathbf{B}^{(k)}\mathbf{A}^{T}-\mathbf{Y}\right)\mathbf{A}$ and $\kappa=\frac{1}{T}\lambda_{\max}\left(\mathbf{X}^{T}\mathbf{X}\right)$.

Proof:
The proof is trivial and hence omitted. ∎
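The quadratic majorization behind Lemma 3 can be verified numerically: the loss never exceeds its linearization plus the $\frac{\kappa}{2}\Vert\cdot\Vert_{F}^{2}$ term. All matrices below are illustrative random draws:

```python
import numpy as np

# Numeric check of the quadratic majorization: since L(B) is quadratic in B
# with Hessian (1/T)(I kron X^T X) under A^T A = I, it is upper-bounded by
# its linearization plus (kappa/2)||B - B0||_F^2, kappa = lambda_max(X^T X)/T.
rng = np.random.default_rng(4)
T, m, n, r = 40, 3, 5, 2
X = rng.standard_normal((T, n))
Y = rng.standard_normal((T, m))
A = np.linalg.qr(rng.standard_normal((m, r)))[0]   # A^T A = I

loss = lambda B: np.linalg.norm(Y - X @ B @ A.T, "fro") ** 2 / (2 * T)
grad = lambda B: X.T @ (X @ B @ A.T - Y) @ A / T
kappa = np.linalg.eigvalsh(X.T @ X).max() / T

B0 = rng.standard_normal((n, r))
for _ in range(100):
    B = rng.standard_normal((n, r))
    surrogate = (loss(B0) + np.sum(grad(B0) * (B - B0))
                 + kappa / 2 * np.linalg.norm(B - B0, "fro") ** 2)
    assert loss(B) <= surrogate + 1e-9   # majorization never violated
```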
Likewise, the majorization method can also be applied to the regularizer $h(\mathbf{B})$. But we first need the following result.

Proposition 4.
[20] The nonsmooth sparsity-inducing function $g$ can be decomposed as
$$g(x)=x+q(x),$$
where $q$ is a smooth and concave function when $x\ge0$. Specifically, for the absolute value, $q(x)=0$; and for the Geman function, $q(x)=g(x)-x$.
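The decomposition of Proposition 4 is easy to check numerically. The Geman form $g(x)=x/(\theta+x)$ with $\theta=1$ below is an illustrative assumption about the parameterization:

```python
import numpy as np

# Proposition 4 in action: split g(x) = x + q(x) on x >= 0 with q smooth and
# concave. For the (assumed) Geman form g(x) = x/(theta+x), q(x) = g(x) - x;
# for the plain absolute-value / group-lasso case, q = 0.
theta = 1.0
g = lambda x: x / (theta + x)
q = lambda x: g(x) - x

x = np.linspace(0.0, 5.0, 201)
assert np.allclose(g(x), x + q(x))            # decomposition is exact

# Concavity of q, checked via non-positive second differences.
assert np.all(np.diff(q(x), 2) <= 1e-12)
```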
An illustrating example for Proposition 4 is given in Figure 2. Based on Proposition 4, we can accordingly decompose the row-wise group-sparsity regularizer as
$$h(\mathbf{B})=\lambda\sum_{i=1}^{N}\left\Vert\mathbf{b}_{i}\right\Vert_{2}+\lambda\sum_{i=1}^{N}q\left(\left\Vert\mathbf{b}_{i}\right\Vert_{2}\right)\triangleq h_{1}(\mathbf{B})+h_{2}(\mathbf{B}),\qquad(10)$$
where $h_{1}(\mathbf{B})$ exactly takes the form of the classical group lasso and $h_{2}(\mathbf{B})$ collects the smooth concave part. For $h_{2}(\mathbf{B})$, we can have the following majorization result.
Lemma 5.
The function $h_{2}(\mathbf{B})$ can be majorized at $\mathbf{B}^{(k)}$ by
$$\overline{h}_{2}\left(\mathbf{B};\mathbf{B}^{(k)}\right)=h_{2}\left(\mathbf{B}^{(k)}\right)+\mathrm{tr}\left(\mathbf{D}^{T}\left(\mathbf{B}-\mathbf{B}^{(k)}\right)\right),$$
where $\mathbf{D}=\nabla h_{2}\left(\mathbf{B}^{(k)}\right)$ is the gradient of $h_{2}$ at point $\mathbf{B}^{(k)}$, and specifically its $i$th row is
$$\mathbf{d}_{i}=\lambda\,q^{\prime}\left(\left\Vert\mathbf{b}_{i}^{(k)}\right\Vert_{2}\right)\frac{\mathbf{b}_{i}^{(k)}}{\left\Vert\mathbf{b}_{i}^{(k)}\right\Vert_{2}},$$
where $\mathbf{b}_{i}^{(k)}$ denotes the $i$th row of $\mathbf{B}^{(k)}$.

Proof:
The proof is trivial and hence omitted. ∎
Based on $\overline{L}$ in Lemma 3 and $\overline{h}_{2}$ in Lemma 5, we can finally have the majorizing function for $F(\mathbf{B})$ given as
$$\overline{F}\left(\mathbf{B};\mathbf{B}^{(k)}\right)=\overline{L}\left(\mathbf{B};\mathbf{B}^{(k)}\right)+h_{1}(\mathbf{B})+\overline{h}_{2}\left(\mathbf{B};\mathbf{B}^{(k)}\right).\qquad(11)$$
The effect of using Lemma 3 and Lemma 5 is that we shift the nonconvexity associated with the nonconvex regularizer to the loss function and transform the nonconvex regularizer into the familiar convex group lasso regularizer. It is easy to observe that the algorithm derivation above can also be applied to the classical group lasso, in which case $\overline{h}_{2}=0$.
Finally, the majorizing problem for the $\mathbf{B}$-subproblem is given in the following form:
$$\mathbf{B}^{(k+1)}=\underset{\mathbf{B}}{\arg\min}\;\frac{\kappa}{2}\left\Vert\mathbf{B}-\left(\mathbf{B}^{(k)}-\frac{1}{\kappa}\left(\nabla L\left(\mathbf{B}^{(k)}\right)+\mathbf{D}\right)\right)\right\Vert_{F}^{2}+\lambda\sum_{i=1}^{N}\left\Vert\mathbf{b}_{i}\right\Vert_{2},\qquad(12)$$
which becomes separable among the rows of matrix $\mathbf{B}$. The resulting separable problems can be efficiently solved using proximal algorithms [25] and have closed-form solutions, which are given in the following lemma.
Lemma 6.
Let $\mathbf{Z}=\mathbf{B}^{(k)}-\frac{1}{\kappa}\left(\nabla L\left(\mathbf{B}^{(k)}\right)+\mathbf{D}\right)$ with rows $\mathbf{z}_{i}$. The solution of problem (12) is given row-wise by
$$\mathbf{b}_{i}^{(k+1)}=\left(1-\frac{\lambda}{\kappa\left\Vert\mathbf{z}_{i}\right\Vert_{2}}\right)_{+}\mathbf{z}_{i},\quad i=1,\ldots,N,$$
where $(x)_{+}=\max(x,0)$.
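This row-wise closed form is the standard group soft-thresholding (proximal) operator; a sketch with generic threshold constants (the exact pairing of step size and threshold is a simplifying assumption):

```python
import numpy as np

# Group soft-thresholding: the proximal operator of tau*||.||_2 applied to
# one row z of the gradient-step matrix Z; rows with small norm are zeroed,
# which is exactly how predictor rows of B get selected out.
def group_soft_threshold(z, tau):
    nz = np.linalg.norm(z)
    if nz <= tau:
        return np.zeros_like(z)
    return (1.0 - tau / nz) * z

z = np.array([3.0, 4.0])               # ||z||_2 = 5
print(group_soft_threshold(z, 1.0))    # shrinks: [2.4 3.2]
print(group_soft_threshold(z, 6.0))    # threshold exceeds norm: [0. 0.]
```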
III-A AltMin-MM: Algorithm for SRRR Estimation
Based on the alternating minimization algorithm together with the majorization and nonconvexity redistribution methods, the original SRRR estimation problem (2) can be solved by alternatingly updating the variables with the closed-form solutions above until convergence.
The overall algorithm is summarized in the following.
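An end-to-end sketch of the alternation is given below. The group-lasso penalty, step constant, initialization, and iteration count are simplifying assumptions for illustration, not the paper's exact algorithm or tuning:

```python
import numpy as np

# Illustrative AltMin-MM sketch for (1/2T)||Y - X B A^T||_F^2 + lam*h(B)
# s.t. A^T A = I: alternate the Procrustes update of A with one majorized
# proximal (group soft-thresholding) step on B.
def altmin_mm(Y, X, r, lam=0.05, iters=200):
    T, m = Y.shape
    n = X.shape[1]
    rng = np.random.default_rng(0)
    B = 0.01 * rng.standard_normal((n, r))
    kappa = np.linalg.norm(X, 2) ** 2 / T          # majorization constant
    for _ in range(iters):
        # A-step: orthogonal Procrustes on M = Y^T X B.
        U, _, Vt = np.linalg.svd(Y.T @ X @ B, full_matrices=False)
        A = U @ Vt
        # B-step: gradient step on the loss, then row-wise group prox.
        G = X.T @ (X @ B @ A.T - Y) @ A / T
        Z = B - G / kappa
        norms = np.maximum(np.linalg.norm(Z, axis=1, keepdims=True), 1e-12)
        B = np.maximum(0.0, 1.0 - (lam / kappa) / norms) * Z
    return A, B

# Hypothetical data with a row-sparse B: the fit should beat predicting zero.
rng = np.random.default_rng(5)
T, m, n, r = 300, 6, 10, 2
B_true = np.zeros((n, r)); B_true[:3] = rng.standard_normal((3, r))
A_true = np.linalg.qr(rng.standard_normal((m, r)))[0]
X = rng.standard_normal((T, n))
Y = X @ B_true @ A_true.T + 0.05 * rng.standard_normal((T, m))

A_hat, B_hat = altmin_mm(Y, X, r)
assert np.allclose(A_hat.T @ A_hat, np.eye(r), atol=1e-8)   # feasible A
assert np.linalg.norm(Y - X @ B_hat @ A_hat.T) < np.linalg.norm(Y)
```

Each iteration costs one thin SVD plus a few matrix products, which is what makes the single-loop scheme cheap compared to solving the $\mathbf{B}$-subproblem exactly.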
IV Numerical Simulations
In order to test the performance of the proposed model and algorithm, numerical simulations are considered in this section. An SRRR model with an underlying group-sparse structure for $\mathbf{B}$ is specified first; then a sample path is generated.
We first examine the efficiency of our proposed AltMin-MM algorithm when the sparsity regularizer is the group lasso penalty, i.e., the convex choice adopted in [15]. We compare our algorithm with the AltMin-based algorithms for the proposed problem in (2) in which the subproblem is solved by the subgradient method (AltMin-SubGrad) or by the variational inequality method (AltMin-VarIneq). The convergence of the objective function value is shown in Fig. 3. It is easy to see that our proposed algorithm converges faster. It should be mentioned that although the first descent step can attain a better solution in the benchmark methods, they show slower convergence in general, since many iterations can be required to obtain a sufficiently accurate solution.
We further test the case when the regularizer is based on the nonconvex Geman function. Since there is no benchmark in the literature, our proposed AltMin-MM algorithm is compared with a benchmark where a convex subproblem, derived as a tight majorized problem of the original problem by majorizing only the nonconvex term, is solved at each iteration. The objective function convergence result is shown in Fig. 3 and Fig. 4.
We also examine the estimation accuracy of the proposed formulation and algorithm. It is evaluated by computing the angle between the estimated factor matrix space and the true space, denoted by $\theta^{(s)}$ for the $s$th Monte-Carlo simulation. The angle is computed as follows [2]. First, compute the QR decompositions $\widehat{\mathbf{B}}=\mathbf{Q}_{1}\mathbf{R}_{1}$ and $\mathbf{B}_{\mathrm{true}}=\mathbf{Q}_{2}\mathbf{R}_{2}$. Next, compute the SVD of $\mathbf{Q}_{1}^{T}\mathbf{Q}_{2}$, whose singular values are written as $\sigma_{1}\ge\sigma_{2}\ge\cdots$. Then, the minimum angle is given by $\theta=\arccos\left(\sigma_{1}\right)$. The averaged angle over $S$ Monte-Carlo runs is given by $\bar{\theta}=\frac{1}{S}\sum_{s=1}^{S}\theta^{(s)}$, which can take values from $0^{\circ}$ (identical subspaces) to $90^{\circ}$ (orthogonal subspaces). We compared three cases: RRR estimation (without sparsity), SRRR estimation with the convex sparsity-inducing function, and SRRR estimation with the nonconvex sparsity-inducing function. It is easy to see that the SRRR problem formulation can really exploit the group-sparsity structure in $\mathbf{B}$, and that the nonconvex function shows better performance than the convex one.
V Conclusions
The SRRR model estimation problem has been considered in this paper. It has been formulated as minimizing the least squares loss with a group-sparsity penalty under an orthogonality constraint. A nonconvex nonsmooth sparsity-inducing function has been proposed. An efficient algorithm based on the alternating minimization method, the majorization-minimization method, and the nonconvexity redistribution method has been developed, with all variables updated in closed form. Numerical simulations have shown that the proposed algorithm is more efficient than the benchmarks and that the nonconvex regularizer can result in better performance than the convex one.
References

[1] T. W. Anderson, "Estimating linear restrictions on regression coefficients for multivariate normal distributions," The Annals of Mathematical Statistics, pp. 327–351, 1951.
[2] T. W. Anderson, An Introduction to Multivariate Statistical Analysis. Wiley, 1984.
[3] A. J. Izenman, "Reduced-rank regression for the multivariate linear model," Journal of Multivariate Analysis, vol. 5, no. 2, pp. 248–264, 1975.
[4] M. Viberg, P. Stoica, and B. Ottersten, "Maximum likelihood array processing in spatially correlated noise fields using parameterized signals," IEEE Transactions on Signal Processing, vol. 45, no. 4, pp. 996–1004, 1997.
[5] P. Stoica and M. Jansson, "MIMO system identification: State-space and subspace approximations versus transfer function and instrumental variables," IEEE Transactions on Signal Processing, vol. 48, no. 11, pp. 3087–3099, 2000.
[6] J. H. Manton and Y. Hua, "Convolutive reduced rank Wiener filtering," in Proc. 2001 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP'01), vol. 6. IEEE, 2001, pp. 4001–4004.
[7] E. Lindskog and C. Tidestav, "Reduced rank channel estimation," in Proc. 1999 IEEE 49th Vehicular Technology Conference, vol. 2. IEEE, 1999, pp. 1126–1130.
[8] Y. Hua, M. Nikpour, and P. Stoica, "Optimal reduced-rank estimation and filtering," IEEE Transactions on Signal Processing, vol. 49, no. 3, pp. 457–469, 2001.
[9] M. Nicoli and U. Spagnolini, "Reduced-rank channel estimation for time-slotted mobile communication systems," IEEE Transactions on Signal Processing, vol. 53, no. 3, pp. 926–944, 2005.
[10] G. Zhou, "Small sample rank tests with applications to asset pricing," Journal of Empirical Finance, vol. 2, no. 1, pp. 71–93, 1995.
[11] P. Bekker, P. Dobbelstein, and T. Wansbeek, "The APT model as reduced-rank regression," Journal of Business & Economic Statistics, vol. 14, no. 2, pp. 199–202, 1996.
[12] Z. Zhao and D. P. Palomar, "Robust maximum likelihood estimation of sparse vector error correction model," in Proc. 2017 5th IEEE Global Conference on Signal and Information Processing, Montreal, QC, Canada, Nov. 2017, pp. 913–917.
[13] ——, "Mean-reverting portfolio with budget constraint," IEEE Transactions on Signal Processing, vol. PP, no. 99, p. 1, 2018.
[14] R. Velu and G. C. Reinsel, Multivariate Reduced-Rank Regression: Theory and Applications. Springer Science & Business Media, 2013, vol. 136.
[15] L. Chen and J. Z. Huang, "Sparse reduced-rank regression for simultaneous dimension reduction and variable selection," Journal of the American Statistical Association, vol. 107, no. 500, pp. 1533–1545, 2012.
[16] M. Yuan and Y. Lin, "Model selection and estimation in regression with grouped variables," Journal of the Royal Statistical Society: Series B (Statistical Methodology), vol. 68, no. 1, pp. 49–67, 2006.
[17] D. P. Bertsekas, Nonlinear Programming. Athena Scientific, Belmont, MA, 1999.
[18] J. Fan and R. Li, "Variable selection via nonconcave penalized likelihood and its oracle properties," Journal of the American Statistical Association, vol. 96, no. 456, pp. 1348–1360, 2001.
[19] Y. Sun, P. Babu, and D. P. Palomar, "Majorization-minimization algorithms in signal processing, communications, and machine learning," IEEE Transactions on Signal Processing, vol. 65, no. 3, pp. 794–816, Aug. 2016.
[20] Q. Yao and J. T. Kwok, "Efficient learning with nonconvex regularizers by nonconvexity redistribution," Journal of Machine Learning Research, 2018.
[21] F. Bach, R. Jenatton, J. Mairal, and G. Obozinski, "Optimization with sparsity-inducing penalties," Foundations and Trends in Machine Learning, vol. 4, no. 1, pp. 1–106, 2012.
[22] D. Geman and G. Reynolds, "Constrained restoration and the recovery of discontinuities," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 14, no. 3, pp. 367–383, 1992.
[23] J. C. Gower and G. B. Dijksterhuis, Procrustes Problems. Oxford University Press, 2004, vol. 3.
[24] M. Hong, M. Razaviyayn, Z.-Q. Luo, and J.-S. Pang, "A unified algorithmic framework for block-structured optimization involving big data: With applications in machine learning and signal processing," IEEE Signal Processing Magazine, vol. 33, no. 1, pp. 57–77, 2016.
[25] N. Parikh and S. Boyd, "Proximal algorithms," Foundations and Trends in Optimization, vol. 1, no. 3, pp. 127–239, 2014.