1 Introduction
1.1 Motivation
Many machine learning applications require processing large, high-dimensional data. The data could be images, videos, kernel matrices, spectral graphs, etc., represented as a matrix $X \in \mathbb{R}^{m\times n}$. The data size and the amount of redundancy grow rapidly with $m$ and $n$. To make analysis and interpretation easier, it is favorable to obtain a compact and concise low-rank approximation of the original data $X$. Such low-rank approximations are known to be very efficient in a wide range of applications, such as text mining [2, 27, 30], document classification [3], clustering [20, 32], spectral data analysis [2, 13, 35], and many more.

There exist many different low-rank approximation methods. For instance, two well-known strategies, broadly used for data analysis, are the singular value decomposition (SVD) [10] and principal component analysis (PCA) [12]. Much real-world data is nonnegative, and the related hidden parts express physical features only when the nonnegativity holds. The factor matrices in SVD or PCA can have negative entries, making it hard or impossible to put a physical interpretation on them. Nonnegative matrix factorization was introduced as an attempt to overcome this drawback, i.e., to provide the desired low-rank nonnegative matrix factors.

1.2 Problem formulation
A nonnegative matrix factorization problem (NMF) is the problem of factorizing the input nonnegative matrix $X \in \mathbb{R}_+^{m\times n}$ into the product of two lower-rank nonnegative matrices $U \in \mathbb{R}_+^{m\times k}$ and $V \in \mathbb{R}_+^{k\times n}$:

$$X \approx UV, \qquad (1)$$

where $X$ usually corresponds to the data matrix, $U$ represents the basis matrix, and $V$ is the coefficient matrix. With $k$ we denote the number of factors, for which it is desired that $k \ll \min\{m, n\}$. If we consider each of the $n$ columns of $X$ to be a sample of $m$-dimensional vector data, the factorization represents each instance (column) as a nonnegative linear combination of the columns of $U$, where the coefficients correspond to the columns of $V$. The columns of $U$ can therefore be interpreted as the pieces that constitute the data $X$. To compute $U$ and $V$, condition (1) is usually rewritten as a minimization problem using the Frobenius norm:

$$\min_{U \ge 0,\; V \ge 0} \; \|X - UV\|_F^2. \qquad \text{(NMF)}$$
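To make the objective concrete, the (NMF) misfit can be evaluated in a few lines of NumPy. This is our own illustrative sketch; the names `nmf_objective`, `X`, `U`, `V` simply mirror the notation above.

```python
import numpy as np

def nmf_objective(X, U, V):
    """Frobenius-norm NMF objective ||X - U V||_F^2."""
    return np.linalg.norm(X - U @ V, ord="fro") ** 2

rng = np.random.default_rng(0)
m, n, k = 6, 5, 2
U = rng.random((m, k))   # nonnegative basis matrix
V = rng.random((k, n))   # nonnegative coefficient matrix
X = U @ V                # data with an exact nonnegative rank-k factorization

# An exact factorization drives the objective to (numerical) zero.
assert nmf_objective(X, U, V) < 1e-18
```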
It has been demonstrated in certain applications that the performance of the standard problem (NMF) can often be improved by adding auxiliary constraints, such as sparseness, smoothness, and orthogonality. Orthogonal NMF (ONMF) was introduced by Ding et al. [9]. To improve the clustering capability of the standard NMF, they imposed orthogonality constraints on the columns of $U$ or on the rows of $V$. Considering the orthogonality on the columns of $U$, it is formulated as follows:

$$\min_{U \ge 0,\; V \ge 0} \; \|X - UV\|_F^2 \quad \text{s.t.} \quad U^T U = I. \qquad \text{(ONMF)}$$

If we enforce orthogonality on the columns of $U$ and on the rows of $V$, we obtain the biorthogonal ONMF (biONMF), which is formulated as

$$\min_{U \ge 0,\; V \ge 0} \; \|X - UV\|_F^2 \quad \text{s.t.} \quad U^T U = I, \; V V^T = I, \qquad \text{(biONMF)}$$

where $I$ denotes the identity matrix.
1.3 Related work
NMF was first studied by Paatero et al. [26, 1] and was made popular by Lee and Seung [18, 19]. There are several different existing methods to solve (NMF). The most used approach is the simple multiplicative update (MU) method proposed by Lee and Seung [18, 19]. In Chu et al. [6], several gradient-type approaches are mentioned: Chu et al. reformulated (NMF) as an unconstrained optimization problem and then applied the standard gradient descent method. Considering both $U$ and $V$ as variables in (NMF), the objective is obviously a nonconvex function. However, considering $U$ and $V$ separately, we obtain two convex subproblems. Accordingly, a block coordinate descent (BCD) approach [19] can be applied to obtain values of $U$ and $V$ that correspond to a local minimum. Generally, the scheme adopted by BCD algorithms is to recurrently update only blocks of variables, while the remaining variables are fixed. NMF methods which adopt this optimization technique are, e.g., the MU rule [18], the active-set-like method [16], and the projected gradient (PG) method for NMF [21]. In [21], two PG methods were proposed for the standard NMF. The first is an alternating least squares (ALS) method using projected gradients: $U$ is fixed first and a new $V$ is obtained by PG; then, with $V$ fixed at the new value, the PG method looks for a new $U$. The objective function in each least squares problem is quadratic, which enabled the author to use a Taylor expansion of the objective function to obtain a condition equivalent to the Armijo rule, while checking the sufficient decrease of the objective function as a termination criterion in the step-size selection procedure. The other method proposed in [21] is a direct application of the PG method to (NMF). There is also a hierarchical ALS method for NMF, originally proposed in [7, 11] as an improvement of the ALS method; it consists of a BCD method with single component vectors as coordinate blocks.
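For reference, the classical Lee-Seung MU iteration mentioned above can be sketched as follows. This is our own NumPy sketch; the factor names `U`, `V` and the small `eps` safeguard in the denominators are our conventions.

```python
import numpy as np

def mu_step(X, U, V, eps=1e-12):
    """One Lee-Seung multiplicative update for min ||X - U V||_F^2.

    eps guards the denominators against division by zero."""
    U = U * (X @ V.T) / (U @ V @ V.T + eps)
    V = V * (U.T @ X) / (U.T @ U @ V + eps)
    return U, V

rng = np.random.default_rng(1)
X = rng.random((8, 6))
U, V = rng.random((8, 3)), rng.random((3, 6))
errs = []
for _ in range(50):
    U, V = mu_step(X, U, V)
    errs.append(np.linalg.norm(X - U @ V, "fro"))

# The reconstruction error is nonincreasing under the MU rule.
assert all(a >= b - 1e-9 for a, b in zip(errs, errs[1:]))
```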
As the original ONMF algorithms in [20, 32] and their variants [33, 34, 5] are all based on the MU rule, there has been no convergence guarantee for these algorithms. For example, Ding et al. [9] only prove that the successive updates of the orthogonal factors converge to a local minimum of the problem. Because the orthogonality constraints cannot be rewritten into a nonnegatively constrained ALS framework, convergent algorithms for the standard NMF (e.g., see [21, 15, 14, 17]) cannot be used for solving the ONMF problems. Thus, no convergent algorithm was available for ONMF until recently. Mirzal [24] developed a convergent algorithm for ONMF, designed by generalizing the work of Lin [23], in which a convergent algorithm was provided for the standard NMF based on a modified version of the additive update (AU) technique of Lee and Seung [19]. Mirzal [24] proves global convergence of his algorithm for the ONMF problem: he first proves the nonincreasing property of the objective function evaluated along the sequence of iterates; secondly, he shows that every limit point of the generated sequence is a stationary point; and finally he proves that the sequence of iterates possesses a limit point.
1.4 Our contribution
In this paper, we consider the penalty reformulation of (biONMF): we add the orthogonality constraints, multiplied by penalty parameters, to the objective function, and thereby obtain penalized reformulations of (ONMF) and (biONMF). The main contributions are:

We develop an algorithm for the penalized versions of (ONMF) and (biONMF), which is essentially a BCD algorithm, in the literature also known as alternating minimization, coordinate relaxation, the Gauss-Seidel method, subspace correction, domain decomposition, etc., see e.g. [4, 29]. For each block optimization, we use a PG method with the Armijo rule to find a suitable step size.

The implemented algorithms are compared on constructed synthetic datasets in terms of (i) the accuracy of the reconstruction and (ii) the deviation of the factors from orthonormality. Accuracy is measured by the so-called root-square error (RSE), defined as

$$\mathrm{RSE} = \frac{\|X - UV\|_F}{\|X\|_F}, \qquad (2)$$

and deviations from orthonormality are computed using formulas (17) and (18) from Sect. 4. Our numerical results show that our algorithm is very competitive and almost always outperforms the MU algorithms.
1.5 Notations
Some notation used throughout our work is described here. We denote scalars and indices by lowercase Latin letters, vectors by lowercase boldface Latin letters, and matrices by capital Latin letters. $\mathbb{R}^{m\times n}$ denotes the set of $m \times n$ real matrices, and $I$ symbolizes the identity matrix. We use $\nabla f$ to denote the gradient of a real-valued function $f$. We define $[A]_+$ and $[A]_-$ as the positive and (unsigned) negative parts of a matrix $A$, respectively, i.e., $A = [A]_+ - [A]_-$. $\odot$ and $\oslash$ denote elementwise multiplication and elementwise division, respectively.
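A short NumPy illustration of this notation (the array names are ours):

```python
import numpy as np

# Positive and (unsigned) negative parts of a matrix, A = [A]_+ - [A]_-.
A = np.array([[1.5, -2.0], [0.0, 3.0]])
A_pos = np.maximum(A, 0.0)    # [A]_+
A_neg = np.maximum(-A, 0.0)   # [A]_-
assert np.allclose(A, A_pos - A_neg)

# Elementwise (Hadamard) multiplication and elementwise division;
# in NumPy, * and / already act elementwise on arrays.
B = np.array([[2.0, 4.0], [1.0, 2.0]])
assert np.allclose((A * B) / B, A)
```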
1.6 Structure of the paper
The rest of our work is organized as follows. In Sect. 2, we review the well-known MU method and the rules used for updating the factors in each iteration of our computations; we also outline the globally convergent MU version of Mirzal [24]. In Sect. 3, we present our PG method and discuss its stopping criteria. Sect. 4 presents the synthetic data and the results of the implementation of the three decomposition methods presented in Sects. 2 and 3. This implementation is done for both the problem (ONMF) and (biONMF). Some concluding remarks are presented in Sect. 5.
2 Existing methods to solve (NMF)
2.1 MU method of Ding [9]
Several popular approaches to solving (NMF) are based on so-called MU algorithms, which are simple to implement and often yield good results. The MU algorithms originate from the work of Lee and Seung [19]. Various MU variants were later proposed by several researchers; for an overview see [8]. At each iteration of these methods, the elements of $U$ and $V$ are multiplied by certain updating factors.
As already mentioned, (ONMF) was proposed by Ding et al. [9] as a tool to improve the clustering capability of the associated optimization approaches. To adapt the MU algorithm to this problem, they employed standard Lagrangian techniques: they introduced a Lagrangian multiplier $\Lambda$ (a symmetric matrix of size $k \times k$) for the orthogonality constraint and minimized the Lagrangian function, in which the orthogonality constraint is moved to the objective function as the penalty term $\operatorname{tr}\!\big(\Lambda (U^T U - I)\big)$. The complementarity conditions from the related KKT conditions can be rewritten as a fixed-point relation, which finally leads to the following MU rule for (ONMF):

$$U \leftarrow U \odot (X V^T) \oslash (U U^T X V^T), \qquad V \leftarrow V \odot (U^T X) \oslash (U^T U V). \qquad (3)$$
They extended this approach to nonnegative three-factor factorization with the demand that two factors satisfy orthogonality conditions, which is a generalization of (biONMF). The MU rules (28)–(30) from [9], adapted to (biONMF), are the main ingredients of Algorithm 1, which we will call Ding's algorithm.
Algorithm 1. Ding's MU algorithm for (biONMF)
INPUT: $X \in \mathbb{R}_+^{m\times n}$, $k \in \mathbb{N}$.

Initialize: generate $U$ as a random $m \times k$ matrix and $V$ as a random $k \times n$ matrix.

Repeat
$$U \leftarrow U \odot (X V^T) \oslash (U U^T X V^T), \qquad V \leftarrow V \odot (U^T X) \oslash (U^T X V^T V). \qquad (4)$$
Until convergence, or a maximum number of iterations or a time limit is reached.
OUTPUT: $U, V$.
Algorithm 1 converges in the sense that the solution pairs $(U, V)$ generated by it yield a sequence of decreasing RSEs, see [9, Theorems 5, 7].

If $U$ or $V$ has zero columns or rows, a division by zero may occur; moreover, denominators close to zero may still cause numerical problems. To escape this situation, we follow [28] and add a small positive number to the denominators of the MU terms (4). Note that Algorithm 1 can easily be adapted to solve (ONMF) by replacing the second MU rule in (4) with the second MU rule of (3).
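A minimal sketch (ours, in NumPy) of why this safeguard is needed: a zero row in one factor makes an MU denominator vanish, and adding a small positive `eps` keeps the update finite.

```python
import numpy as np

X = np.ones((3, 3))
U = np.array([[1.0, 0.0], [1.0, 0.0], [1.0, 0.0]])  # second column all zero
V = np.vstack([np.ones((1, 3)), np.zeros((1, 3))])  # second row all zero

denom = U @ V @ V.T             # denominator of the MU update for U
assert denom[0, 1] == 0.0       # zero entry -> division by zero without a guard

eps = 1e-9
update = U * (X @ V.T) / (denom + eps)
assert np.isfinite(update).all()   # safeguarded update stays finite
```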
2.2 MU method of Mirzal [24]
In [24], Mirzal proposed an algorithm for (ONMF) designed by generalizing the work of Lin [23]. Mirzal used the so-called modified additive update rule (the MAU rule), where the update term is added to the current value of each factor. This additive rule had been used by Lin in [23] in the context of the standard NMF. Lin also provided a convergence proof, stating that the iterates generated by his algorithm converge in the sense that the RSE is decreasing and the limit point is a stationary point. In [24], Mirzal discussed the orthogonality constraint on the rows of the coefficient factor, while in [25] the same results are developed for the case of (biONMF).
Here we review Mirzal's algorithm for (biONMF), presented in the unpublished paper [25]. This algorithm actually solves the equivalent problem (penONMF), where the orthogonality constraints are moved into the objective function (the so-called penalty approach) and their importance is controlled by the penalty parameters $\lambda_U, \lambda_V \ge 0$:

$$\min_{U \ge 0,\; V \ge 0} \; f(U, V) = \|X - UV\|_F^2 + \lambda_U \|U^T U - I\|_F^2 + \lambda_V \|V V^T - I\|_F^2. \qquad \text{(penONMF)}$$

The gradients of the objective function with respect to $U$ and $V$ are:

$$\nabla_U f = 2\,(UV - X)V^T + 4\lambda_U\, U (U^T U - I), \qquad \nabla_V f = 2\,U^T (UV - X) + 4\lambda_V\, (V V^T - I)\, V. \qquad (5)$$
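The closed-form gradients in (5) can be sanity-checked against finite differences. This is our own verification sketch; `pen_obj`, `grad_U`, and the penalty weights `lam_u`, `lam_v` are assumed names.

```python
import numpy as np

def pen_obj(X, U, V, lam_u, lam_v):
    """Penalised objective of (penONMF)."""
    k = U.shape[1]
    return (np.linalg.norm(X - U @ V, "fro") ** 2
            + lam_u * np.linalg.norm(U.T @ U - np.eye(k), "fro") ** 2
            + lam_v * np.linalg.norm(V @ V.T - np.eye(k), "fro") ** 2)

def grad_U(X, U, V, lam_u):
    """Gradient (5) with respect to U."""
    k = U.shape[1]
    return 2 * (U @ V - X) @ V.T + 4 * lam_u * U @ (U.T @ U - np.eye(k))

# Central finite difference at one entry matches the closed form.
rng = np.random.default_rng(2)
X, U, V = rng.random((5, 4)), rng.random((5, 2)), rng.random((2, 4))
lam_u, lam_v, h = 0.5, 0.7, 1e-6
G = grad_U(X, U, V, lam_u)
E = np.zeros_like(U); E[1, 1] = h
fd = (pen_obj(X, U + E, V, lam_u, lam_v)
      - pen_obj(X, U - E, V, lam_u, lam_v)) / (2 * h)
assert abs(fd - G[1, 1]) < 1e-4
```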
For the objective function in (penONMF), Mirzal proposed MAU rules of the form

$$U \leftarrow U - \bar{U} \oslash \big(\nabla_U^+ f(\bar{U}, V) + \delta\big) \odot \nabla_U f(\bar{U}, V), \qquad (6)$$

$$V \leftarrow V - \bar{V} \oslash \big(\nabla_V^+ f(U, \bar{V}) + \delta\big) \odot \nabla_V f(U, \bar{V}), \qquad (7)$$

where $\delta$ is a small positive number, $\nabla^+$ denotes the positive part of the gradient, and $\bar{U}, \bar{V}$ are used instead of $U$ and $V$ to avoid the zero-locking phenomenon [24, Section 2]: they agree with $U$ and $V$ except at entries where the factor is zero and the corresponding gradient entry is negative, which are replaced by a small positive constant.
Note that algorithms working with the MU rules for (penONMF) must be initialized with positive matrices to avoid zero locking from the start, while nonnegative matrices suffice to initialize the algorithm working with the MAU rules (see [25]).

Mirzal [25] used the MAU rules with some modifications, considering $\delta_U$ and $\delta_V$ in order to guarantee the nonincreasing property, with a constant step that makes $\delta_U$ and $\delta_V$ grow until the property is satisfied. Here, $\delta_U$ and $\delta_V$ are the values added within the MAU terms to the denominators of the update terms for $U$ and $V$, respectively. The algorithm proposed by Mirzal [25] is summarised as Algorithm 2 below.
3 PG method for (ONMF) and (biONMF)
3.1 Main steps of PG method
In this subsection we adapt the PG method proposed by Lin [21] to solve both (ONMF) and (biONMF). Lin applied PG to (NMF) in two ways. The first approach is actually a BCD method: it consecutively fixes one block of variables ($U$ or $V$) and minimizes the simplified problem in the other variable. The second approach directly minimizes (NMF). Lin's main focus was on the first approach, and we follow it. We again solve the penalised version of the problem, (penONMF), by the block coordinate descent method, which is summarised in Algorithm 3.
Algorithm 3. BCD method for (penONMF)
INPUT: inner dimension $k$, initial matrices $U^0 \in \mathbb{R}_+^{m\times k}$, $V^0 \in \mathbb{R}_+^{k\times n}$.

Set $t = 0$.

Repeat

Fix $V = V^t$ and compute a new $U^{t+1}$ as follows:
$$U^{t+1} = \operatorname*{argmin}_{U \ge 0} \; \|X - U V^t\|_F^2 + \lambda_U \|U^T U - I\|_F^2. \qquad (8)$$
Fix $U = U^{t+1}$ and compute a new $V^{t+1}$ as follows:
$$V^{t+1} = \operatorname*{argmin}_{V \ge 0} \; \|X - U^{t+1} V\|_F^2 + \lambda_V \|V V^T - I\|_F^2. \qquad (9)$$

Set $t = t + 1$.

Until some stopping criterion is satisfied.
OUTPUT: $U^t, V^t$.
The objective function in (penONMF) is no longer quadratic, so we lose the nice properties of Armijo's rule that Lin could exploit. We nevertheless use the Armijo rule directly and still obtain good numerical results, see Sect. 4.

We refer to (8) and (9) as subproblems. Obviously, solving these subproblems in every iteration may be more costly than an iteration of Algorithm 1 or Algorithm 2; therefore, we must find effective methods for solving them. Similarly to Lin, we apply the PG method to solve the subproblems (8)–(9). Algorithm 4 contains the main steps of the PG method for solving (9) and can be straightforwardly adapted for (8).
For the sake of simplicity, we denote by $\bar f(U)$ the function that we optimize in (8); it is a simplified version (constant terms removed) of the objective function from (penONMF) for $V$ fixed:

$$\bar f(U) = \|X - U V\|_F^2 + \lambda_U \|U^T U - I\|_F^2.$$

In Algorithm 4, $P[\cdot]$ is the projection operator which projects the new point (matrix) onto the cone of nonnegative matrices (we simply set negative entries to 0). Inequality (10) is the Armijo rule used to find a step size $\alpha$ guaranteeing a sufficient decrease. Searching for $\alpha$ is a time-consuming operation, so we strive to do only a small number of trials for the new $\alpha$ in the step-size search of Algorithm 4. Similarly to Lin [21], we allow $\alpha$ to take any positive value. More precisely, we start with the value of $\alpha$ from the previous iteration: if the Armijo rule (10) is satisfied, we increase $\alpha$ by dividing it by $\beta < 1$, and repeat this until (10) is no longer satisfied or the same matrix as in the previous iteration is obtained. If the starting $\alpha$ does not yield a point satisfying the Armijo rule (10), we decrease it by the factor $\beta$ and repeat this until (10) is satisfied. The numerical results obtained using different values of the parameters $\beta$ (updating factor for $\alpha$) and $\sigma$ (parameter used to check (10)) are reported in the following subsections.
Algorithm 4. PG method using the Armijo rule to solve subproblem (9)
INPUT: $X$, $U$, $\lambda_V$, and initial $V^0 \ge 0$.

Set $t = 0$.

Repeat

Find a step size $\alpha^t$ (using updating factor $\beta$) such that for
$$V^{t+1} = P\big[V^t - \alpha^t\, \nabla_V f(V^t)\big]$$
we have
$$f(V^{t+1}) - f(V^t) \le \sigma\, \big\langle \nabla_V f(V^t),\; V^{t+1} - V^t \big\rangle. \qquad (10)$$

Set $t = t + 1$.

Until some stopping criterion is satisfied.
OUTPUT: $V^t$.
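The projected-gradient step with Armijo backtracking used above can be sketched in NumPy as follows. This is a simplified variant of our own: only the decrease phase of the step-size search is shown (the algorithm above also tries to enlarge the step size), and all function names are ours.

```python
import numpy as np

def f(X, U, V, lam):
    """Subproblem objective (9): ||X - U V||_F^2 + lam ||V V^T - I||_F^2."""
    k = V.shape[0]
    return (np.linalg.norm(X - U @ V, "fro") ** 2
            + lam * np.linalg.norm(V @ V.T - np.eye(k), "fro") ** 2)

def grad_V(X, U, V, lam):
    k = V.shape[0]
    return 2 * U.T @ (U @ V - X) + 4 * lam * (V @ V.T - np.eye(k)) @ V

def pg_armijo_step(X, U, V, lam, alpha=1.0, beta=0.1, sigma=0.01, max_tries=20):
    """One projected-gradient step; alpha is shrunk by beta until the
    Armijo-type sufficient-decrease test (10) holds."""
    g = grad_V(X, U, V, lam)
    fV = f(X, U, V, lam)
    for _ in range(max_tries):
        V_new = np.maximum(V - alpha * g, 0.0)   # projection P[.] onto V >= 0
        if f(X, U, V_new, lam) - fV <= sigma * np.sum(g * (V_new - V)):
            return V_new, alpha                  # (10) satisfied
        alpha *= beta                            # shrink the step size
    return V, alpha

rng = np.random.default_rng(3)
X, U, V = rng.random((6, 5)), rng.random((6, 2)), rng.random((2, 5))
V1, _ = pg_armijo_step(X, U, V, lam=0.1)
assert f(X, U, V1, 0.1) <= f(X, U, V, 0.1) + 1e-12   # objective does not increase
assert (V1 >= 0).all()                               # iterate stays feasible
```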
3.2 Stopping criteria for Algorithms 3 and 4
As practiced in the literature (e.g., see [22]), in a constrained optimization problem with a nonnegativity constraint on the variable $x$, a common condition to check whether a point $x^t$ is close to a stationary point is

$$\big\|\nabla^P f(x^t)\big\|_F \le \epsilon\, \big\|\nabla f(x^0)\big\|_F, \qquad (11)$$

where $f$ is the differentiable function that we try to optimize and $\nabla^P f$ is the projected gradient, defined elementwise as

$$\nabla^P f(x)_i = \begin{cases} \nabla f(x)_i, & \text{if } x_i > 0,\\ \min\{0, \nabla f(x)_i\}, & \text{if } x_i = 0,\end{cases} \qquad (12)$$

and $\epsilon$ is a small positive tolerance. For Algorithm 3, (11) becomes

$$\big\|\nabla^P f(U^t, V^t)\big\|_F \le \epsilon\, \big\|\nabla f(U^0, V^0)\big\|_F. \qquad (13)$$
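The projected gradient (12) is cheap to form elementwise; a small sketch (ours) in NumPy:

```python
import numpy as np

def projected_grad(x, g):
    """Projected gradient (12) for the constraint x >= 0 (elementwise):
    equal to g where x > 0, and min(0, g) where x = 0."""
    return np.where(x > 0, g, np.minimum(g, 0.0))

x = np.array([0.0, 0.0, 2.0, 1.0])
g = np.array([3.0, -1.0, 0.5, -2.0])
pg = projected_grad(x, g)
# At x_i = 0 a positive gradient component is zeroed out (a step along it
# would leave the feasible set); all other components are kept.
assert np.allclose(pg, [0.0, -1.0, 0.5, -2.0])
```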
We impose a time limit in seconds and a maximum number of iterations for Algorithm 3 as well. Following [21], we also define stopping conditions for the subproblems. The matrices $U^{t+1}$ and $V^{t+1}$ returned by Algorithm 4 must satisfy

$$\big\|\nabla_U^P f(U^{t+1}, V^t)\big\|_F \le \epsilon_U \quad \text{and} \quad \big\|\nabla_V^P f(U^{t+1}, V^{t+1})\big\|_F \le \epsilon_V, \qquad (14)$$

where initially

$$\epsilon_U = \epsilon_V = \epsilon\, \big\|\nabla f(U^0, V^0)\big\|_F \qquad (15)$$

and $\epsilon$ is the same tolerance used in (13). If the PG method for solving subproblem (8) or (9) stops after the first iteration, then we decrease the corresponding stopping tolerance as follows:

$$\epsilon_U \leftarrow \rho\, \epsilon_U, \qquad \epsilon_V \leftarrow \rho\, \epsilon_V, \qquad (16)$$

where $\rho$ is a constant smaller than 1.
4 Numerical results
In this section we demonstrate how the PG method described in Sect. 3 performs compared to the MU-based algorithms of Ding and Mirzal, which were described in Subsections 2.1 and 2.2, respectively.
4.1 Artificial data
We created two sets of synthetic data using MATLAB [31]. The first set we call the biorthonormal set (BION). It consists of instances of matrices $X = UV$, where $U \in \mathbb{R}_+^{n\times k}$ has orthonormal columns and $V \in \mathbb{R}_+^{k\times n}$ has orthonormal rows. We created five instances of $X$ for each pair $(n, k)$ from Table 1.
Matrices $U$ were created in two phases: first, we randomly (uniform distribution) selected a position in each row; second, we selected a random number (uniform distribution) for the selected position in each row. Finally, if after this procedure some column of $U$ is zero or has a norm below a small threshold, we find the first nonzero element in the largest column of $U$ (according to the Euclidean norm) and move it into the zero column. We created $V$ similarly.

Each triple $(X, U, V)$ was saved as a triple of txt files. For example, NMF_BIOG_data_R_n=200_k=80_id=5.txt contains the matrix obtained by multiplying matrices $U$ and $V$, which were generated as explained above. With id=5 we denote that this is the fifth matrix corresponding to the pair $(n, k) = (200, 80)$. The second set contains similar data to BION, but only one factor ($U$) is orthonormal, while the other ($V$) is nonnegative but not necessarily orthonormal. We call this dataset uniorthonormal (UNION).

All computations were done using MATLAB [31] on a high-performance computer available at the Faculty of Mechanical Engineering of the University of Ljubljana: an Intel Xeon X5670 (1536 hypercores) HPC cluster and an E5-2680 v3 (1008 hypercores) DP cluster, with an IB QDR interconnect, 164 TB of LUSTRE storage, 4.6 TB of RAM, and 24 TFlop/s of performance.
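The two-phase construction of the orthonormal factors can be sketched as follows. This is our own reconstruction in NumPy; the value range, the zero-column threshold, and the final column normalisation are assumptions on top of the description above. Since each row carries exactly one nonzero entry, the columns have disjoint supports, so normalising them yields orthonormal columns.

```python
import numpy as np

def make_factor(n, k, rng):
    """Sketch of the two-phase construction of an n-by-k factor with
    orthonormal columns: one random position per row gets a random value,
    and empty columns are repaired by moving an entry from the largest column."""
    U = np.zeros((n, k))
    for i in range(n):
        U[i, rng.integers(k)] = rng.random()        # phases 1 and 2
    for j in range(k):
        if np.linalg.norm(U[:, j]) < 1e-12:         # repair zero columns
            big = int(np.argmax(np.linalg.norm(U, axis=0)))
            i = int(np.nonzero(U[:, big])[0][0])    # first nonzero entry
            U[i, j], U[i, big] = U[i, big], 0.0     # move it across
    # Disjoint row supports make U^T U diagonal; normalising the columns
    # therefore yields orthonormal columns.
    return U / np.linalg.norm(U, axis=0)

rng = np.random.default_rng(4)
U = make_factor(50, 8, rng)
assert np.allclose(U.T @ U, np.eye(8), atol=1e-8)   # orthonormal columns
assert (U >= 0).all()                               # nonnegative factor
```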
4.2 Numerical results for UNION
In this subsection, we present numerical results obtained by Ding's, Mirzal's, and our algorithm for the uniorthogonal problem (ONMF), using the UNION data introduced in the previous subsection. We adapted the last two algorithms (Algorithms 2 and 3) to UNION data by setting the penalty parameter of the second orthogonality constraint to zero in the problem formulation (biONMF) and in all formulas underlying these two algorithms.
Recall that for UNION data we have, for each pair $(n, k)$ from Table 1, five symmetric matrices for which we try to solve (ONMF) by Algorithms 1, 2, and 3. Note that all these algorithms demand as input the internal dimension, i.e., the number of columns of the factor $U$, which is in general not known in advance. Even though we know this dimension by construction for UNION data, we tested the algorithms using internal dimensions equal to 20%, 40%, 60%, 80%, and 100% of $k$. For the internal dimension equal to $k$, we know the optimum of the problem, which is 0, so in this case we can also estimate how good the tested algorithms are at finding the global optimum.
The first question we had to answer was which value of the penalty parameter $\lambda$ to use in Mirzal's and our PG algorithm. It is obvious that larger values of $\lambda$ move the focus from optimizing the RSE to guaranteeing orthonormality, i.e., feasibility for the original problem. We decided not to fix the value of $\lambda$ but to run both algorithms for four increasing values of $\lambda$ and report all results.
For each solution pair $(U, V)$ returned by the algorithms, the nonnegativity constraints hold by the construction of the algorithms, so we only need to consider the deviation of $U$ from orthonormality, which we call infeasibility and define as

$$\mathrm{infeas}(U) = \|U^T U - I\|_F. \qquad (17)$$
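In NumPy, this infeasibility measure amounts to one line (the function name is ours):

```python
import numpy as np

def infeasibility(U):
    """Deviation of U from having orthonormal columns, ||U^T U - I||_F,
    as in (17)."""
    k = U.shape[1]
    return np.linalg.norm(U.T @ U - np.eye(k), "fro")

# A matrix with orthonormal columns has zero infeasibility ...
Q, _ = np.linalg.qr(np.random.default_rng(5).random((10, 3)))
assert infeasibility(Q) < 1e-10
# ... and rescaling destroys orthonormality: ||4I - I||_F = 3*sqrt(3).
assert infeasibility(2 * Q) > 1.0
```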
The computational results that follow in the rest of this subsection were obtained by setting a small tolerance $\epsilon$ in the stopping criterion (13), a maximum number of outer iterations in Algorithm 3, and a maximum of 20 iterations in Algorithm 4. We also set a time limit in seconds. Additionally, fixed values were chosen for $\sigma$ (the parameter used to check (10)) and $\beta$ (the updating factor for $\alpha$ in Algorithm 4), as well as for $\rho$ from (16).
In general, Algorithm 3 converges to a solution in early iterations, and the norm of the projected gradient falls below the tolerance shortly after the algorithm starts.
Results in Tables 2 and 3 and their visualisations in Figures 1(a)–1(f) and 2(a)–2(f) confirm expectations. More precisely, we can see that the smaller the value of $\lambda$, the better the RSE; likewise, the larger the value of $\lambda$, the smaller the infeasibility. In practice, we want to reach both criteria, small RSE and small infeasibility, so some compromise must be made: if RSE is more important than infeasibility, we choose a smaller value of $\lambda$, and vice versa. We can also observe that regarding RSE the three compared algorithms do not differ much. However, when the input dimension approaches the real inner dimension $k$, Algorithm 3 comes closest to the global optimum 0. The situation with infeasibility is a bit different. While Algorithm 1 performs very well on all instances, Algorithm 2 reaches good feasibility only for smaller values of $n$, and Algorithm 3 outperforms the others for larger $n$.
$n$  internal dim. (% of $k$)  RSE of Alg. 1  RSE of Alg. 2 (four increasing values of $\lambda$)  RSE of Alg. 3 (four increasing values of $\lambda$)
50  40  0.3143  0.2965  0.3070  0.3329  0.3898  0.2963  0.3081  0.3425  0.3508 
50  60  0.2348  0.2227  0.2356  0.2676  0.3459  0.2201  0.2382  0.2733  0.2765 
50  80  0.1738  0.1492  0.1634  0.1894  0.3277  0.1468  0.1620  0.1953  0.2053 
50  100  0.0002  0.0133  0.0004  0.0932  0.2973  0.0000  0.0000  0.0000  0.0000 
100  20  0.4063  0.3914  0.3955  0.4063  0.4254  0.3906  0.3959  0.4083  0.4210 
100  40  0.3384  0.3139  0.3210  0.3415  0.3677  0.3116  0.3210  0.3488  0.3625 
100  60  0.2674  0.2462  0.2541  0.2730  0.2978  0.2403  0.2528  0.2801  0.2974 
100  80  0.1847  0.1737  0.1581  0.1909  0.2263  0.1629  0.1744  0.1959  0.2090 
100  100  0.0126  0.0532  0.0427  0.0089  0.1515  0.0000  0.0000  0.0000  0.0075 
200  20  0.4213  0.4024  0.4077  0.4080  0.4257  0.4005  0.4032  0.4162  0.4337 
200  40  0.3562  0.3315  0.3398  0.3401  0.3647  0.3270  0.3313  0.3497  0.3738 
200  60  0.2845  0.2675  0.2746  0.2748  0.2955  0.2573  0.2617  0.2812  0.3061 
200  80  0.1959  0.1958  0.2013  0.1996  0.2085  0.1773  0.1819  0.1960  0.2133 
200  100  0.0191  0.0753  0.0632  0.0622  0.0415  0.0000  0.0000  0.0069  0.0181 
500  20  0.4332  0.4120  0.4119  0.4120  0.4121  0.4092  0.4096  0.4197  0.4346 
500  40  0.3711  0.3506  0.3509  0.3507  0.3505  0.3430  0.3440  0.3537  0.3753 
500  60  0.3003  0.2919  0.2923  0.2916  0.2909  0.2756  0.2766  0.2845  0.3031 
500  80  0.2098  0.2186  0.2192  0.2207  0.2151  0.1931  0.1941  0.1999  0.2122 
500  100  0.0273  0.0822  0.0864  0.0853  0.0713  0.0002  0.0003  0.0002  0.0097 
1000  20  0.4386  0.4195  0.4194  0.4193  0.4195  0.4156  0.4160  0.4216  0.4324 
1000  40  0.3777  0.3641  0.3640  0.3638  0.3637  0.3545  0.3548  0.3588  0.3707 
1000  60  0.3070  0.3047  0.3055  0.3051  0.3036  0.2881  0.2880  0.2906  0.3006 
1000  80  0.2164  0.2265  0.2248  0.2254  0.2236  0.2024  0.2029  0.2050  0.2106 
1000  100  0.0329  0.0725  0.0772  0.0761  0.0709  0.0173  0.0030  0.0035  0.0035 
$n$  internal dim. (% of $k$)  Infeas. of Alg. 1  Infeas. of Alg. 2 (four increasing values of $\lambda$)  Infeas. of Alg. 3 (four increasing values of $\lambda$)
50  20  0.0964  0.2490  0.0924  0.0155  0.0038  0.2298  0.0909  0.0154  0.0022 
50  40  0.0740  0.1886  0.0676  0.0131  0.0040  0.1845  0.0670  0.0135  0.0023 
50  60  0.0553  0.1324  0.0465  0.0068  0.0040  0.1245  0.0440  0.0091  0.0015 
50  80  0.0324  0.0964  0.0241  0.0053  0.0034  0.0789  0.0250  0.0069  0.0020 
50  100  0.0023  0.0257  0.0022  0.0023  0.0039  0.0000  0.0000  0.0000  0.0000 
100  20  0.0774  0.2624  0.1441  0.0258  0.0064  0.2588  0.1308  0.0258  0.0036 
100  40  0.0539  0.1754  0.0928  0.0168  0.0036  0.1654  0.0819  0.0182  0.0035 
100  60  0.0400  0.1205  0.0545  0.0102  0.0024  0.1109  0.0487  0.0138  0.0033 
100  80  0.0239  0.0890  0.0324  0.0062  0.0022  0.0623  0.0258  0.0083  0.0018 
100  100  0.0062  0.0452  0.0153  0.0009  0.0016  0.0002  0.0000  0.0000  0.0000 
200  20  0.0584  0.2157  0.1437  0.1433  0.0054  0.2087  0.1512  0.0348  0.0074 
200  40  0.0356  0.1379  0.1004  0.1000  0.0036  0.1240  0.0806  0.0207  0.0053 
200  60  0.0260  0.0955  0.0791  0.0793  0.0031  0.0754  0.0434  0.0143  0.0047 
200  80  0.0154  0.0657  0.0634  0.0629  0.0017  0.0416  0.0218  0.0080  0.0026 
200  100  0.0059  0.0412  0.0517  0.0512  0.0016  0.0002  0.0001  0.0002  0.0001 
500  20  0.0332  0.1587  0.1894  0.1908  0.1908  0.1475  0.1268  0.0436  0.0087 
500  40  0.0189  0.1155  0.1343  0.1349  0.1347  0.0770  0.0621  0.0227  0.0069 
500  60  0.0134  0.0889  0.1095  0.1102  0.1055  0.0412  0.0312  0.0123  0.0038 
500  80  0.0084  0.0656  0.0946  0.0954  0.0826  0.0300  0.0154  0.0061  0.0021 
500  100  0.0050  0.0499  0.0847  0.0853  0.0693  0.0249  0.0003  0.0001  0.0001 
1000  20  0.0211  0.1200  0.1344  0.1349  0.1350  0.1043  0.0970  0.0471  0.0097 
1000  40  0.0122  0.0863  0.0951  0.0954  0.0954  0.0542  0.0422  0.0199  0.0059 
1000  60  0.0073  0.0662  0.0776  0.0779  0.0779  0.0414  0.0205  0.0098  0.0037 
1000  80  0.0045  0.0539  0.0671  0.0675  0.0675  0.0336  0.0103  0.0047  0.0018 
1000  100  0.0040  0.0475  0.0600  0.0603  0.0604  0.0296  0.0066  0.0005  0.0003 
4.3 Numerical results for biorthonormal data (BION)
In this subsection we provide the same type of results as in the previous subsection, but for the BION dataset.

We used almost the same settings as for the UNION dataset: the same tolerance $\epsilon$, maximum numbers of iterations, and time limit. The parameters $\sigma$ and $\beta$ were slightly changed, based on experimental observations. Additionally, we decided to take the same value $\lambda$ for both penalty parameters in Algorithms 2 and 3, since the matrices in the BION dataset are symmetric and both orthogonality constraints are equally important. We computed the results for four increasing values of $\lambda$. In Tables 4–5 we report the average RSE and average infeasibility, respectively, of the solutions obtained by Algorithms 1, 2, and 3. Since for this dataset we need to monitor how orthonormal both matrices $U$ and $V$ are, we adapt the measure for infeasibility as follows:

$$\mathrm{infeas}(U, V) = \frac{\|U^T U - I\|_F + \|V V^T - I\|_F}{2}. \qquad (18)$$
$n$  internal dim. (% of $k$)  RSE of Alg. 1  RSE of Alg. 2 (four increasing values of $\lambda$)  RSE of Alg. 3 (four increasing values of $\lambda$)
50  20  0.7053  0.7053  0.7053  0.7053  0.8283  0.7053  0.7053  0.7055  0.8259 
50  40  0.6108  0.6108  0.6108  0.6108  0.9066  0.6108  0.6108  0.6108  0.6631 
50  60  0.4987  0.4987  0.4987  0.5442  0.9665  0.4987  0.4987  0.4987  0.5000 
50  80  0.3526  0.3671  0.3742  0.4497  1.0282  0.3526  0.3796  0.3527  0.4374 
50  100  0.0607  0.1712  0.2786  0.5198  1.0781  0.1145  0.1820  0.2604  0.3689 
100  20  0.7516  0.7516  0.7516  0.7517  0.9070  0.7516  0.7516  0.7517  0.8224 
100  40  0.6509  0.6509  0.6509  0.7174  0.9779  0.6509  0.6509  0.6509  0.6514 
100  60  0.5315  0.5315  0.5315  0.5504  1.0401  0.5315  0.5315  0.5315  0.5352 
100  80  0.3758  0.3787  0.4106  0.4542  1.1082  0.3801  0.3888  0.3917  0.3898 
100  100  0.1377  0.1993  0.3311  0.4898  1.1734  0.0457  0.1016  0.2758  0.3757 
200  20  0.7884  0.7884  0.7884  0.7884  0.9499  0.7884  0.7884  0.7884  0.7888 
200  40  0.6828  0.6828  0.6828  0.6828  1.0325  0.6828  0.6828  0.6828  0.6828 
200  60  0.5575  0.5575  0.5575  0.5647  1.0938  0.5575  0.5575  0.5575  0.5610 
200  80  0.3942  0.3942  0.3965  0.5019  1.1618  0.3942  0.3942  0.3942  0.4373 
200  100  0.1447  0.1851  0.3014  0.5400  1.2297  0.0202  0.1429  0.2964  0.3315 
500  20  0.8242  0.8242  0.8242  0.8242  0.9956  0.8242  0.8242  0.8242  0.8243 
500  40  0.7138  0.7138  0.7138  0.7138  1.0679  0.7138  0.7138  0.7138  0.7138 
500  60  0.5828  0.5828  0.5828  0.6045  1.1534  0.5828  0.5828  0.5828  0.5828 
500  80  0.4121  0.4121  0.4203  0.5285  1.2160  0.4121  0.4121  0.4121  0.4334 
500  100  0.1405  0.1814  0.3401  0.5854  1.2822  0.0067  0.1059  0.2044  0.3378 
1000  20  0.8436  0.8436  0.8436  0.8436  1.0261  0.8436  0.8436  0.8436  0.8436 
1000  40  0.7306  0.7306  0.7306  0.7309  1.0916  0.7306  0.7306  0.7306  0.7306 
1000  60  0.5965  0.5965  0.5965  0.6121  1.1669  0.5965  0.5965  0.5965  0.5968 
1000  80  0.4218  0.4218  0.4256  0.5338  1.2389  0.4218  0.4218  0.4218  0.4397 
1000  100  0.1346  0.1635  0.3324  0.5755  1.3080  0.0096  0.0697  0.1661  0.2188 
$n$  internal dim. (% of $k$)  Infeas. of Alg. 1  Infeas. of Alg. 2 (four increasing values of $\lambda$)  Infeas. of Alg. 3 (four increasing values of $\lambda$)
50  20  0.0001  0.0070  0.0036  0.0010  0.0068  0.0017  0.0021  0.0021  0.0026 
50  40  0.0000  0.0041  0.0021  0.0004  0.0056  0.0008  0.0012  0.0012  0.0014 
50  60  0.0000  0.0030  0.0009  0.0032  0.0038  0.0005  0.0008  0.0009  0.0009 
50  80  0.0000  0.0183  0.0030  0.0021  0.0028  0.0004  0.0202  0.0006  0.0013 
50  100  0.0355  0.0533  0.0127  0.0045  0.0027  0.0418  0.0478  0.0123  0.0021 
100  20  0.0001  0.0051  0.0024  0.0006  0.0063  0.0010  0.0012  0.0013  0.0016 
100  40  0.0000  0.0029  0.0017  0.0066  0.0040  0.0004  0.0006  0.0007  0.0007 
100  60  0.0000  0.0019  0.0008  0.0009  0.0027  0.0003  0.0004  0.0005  0.0005 
100  80  0.0000  0.0039  0.0048  0.0015  0.0021  0.0062  0.0149  0.0037  0.0006 
100  100  0.0606  0.0454  0.0105  0.0022  0.0018  0.0106  0.0228  0.0173  0.0028 
200  20  0.0002  0.0033  0.0019  0.0005  0.0043  0.0005  0.0007  0.0007  0.0007 
200  40  0.0001  0.0017  0.0010  0.0002  0.0027  0.0002  0.0003  0.0004  0.0003 
200  60  0.0001  0.0010  0.0005  0.0004  0.0019  0.0001  0.0002  0.0002  0.0004 
200  80  0.0000  0.0006  0.0006  0.0015  0.0014  0.0001  0.0001  0.0002  0.0013 
200  100  0.0425  0.0280  0.0064  0.0019  0.0015  0.0046  0.0224  0.0240  0.0034 
500  20  0.0001  0.0017  0.0011  0.0003  0.0025  0.0002  0.0003  0.0003  0.0003 
500  40  0.0001  0.0008  0.0005  0.0001  0.0016  0.0001  0.0001  0.0002  0.0002 
500  60  0.0000  0.0005  0.0003  0.0006  0.0013  0.0001  0.0001  0.0001  0.0002 
500  80  0.0000  0.0003  0.0009  0.0009  0.0008  0.0000  0.0001  0.0001  0.0016 
500  100  0.0258  0.0184  0.0045  0.0013  0.0007  0.0017  0.0101  0.0175  0.0053 
1000  20  0.0001  0.0010  0.0006  0.0002  0.0024  0.0001  0.0002  0.0002  0.0002 
1000  40  0.0000  0.0005  0.0003  0.0001  0.0009  0.0001  0.0002  0.0003  0.0002 
1000  60  0.0000  0.0003  0.0002  0.0004  0.0009  0.0003  0.0002  0.0003  0.0003 
1000  80  0.0000  0.0002  0.0005  0.0007  0.0006  0.0040  0.0001  0.0002  0.0020 
1000  100  0.0173  0.0117  0.0031  0.0009  0.0005  0.0043  0.0050  0.0121  0.0060 
Figures 3(a)–3(f) and 4(a)–4(f) depict the RSE and infeasibility reached by the three compared algorithms. We can see that all three algorithms behave well; however, Algorithm 3 is more stable and less dependent on the choice of $\lambda$. It is interesting to see that $\lambda$ does not have a big impact on the RSE and infeasibility of Algorithm 3; a significant difference can be observed only when the internal dimension is equal to the real internal dimension, i.e., to 100% of $k$. Based on these numerical results, we can conclude that smaller values of $\lambda$ achieve better RSE and almost the same infeasibility, so it would make sense to use a smaller $\lambda$.
For Algorithm 2 these differences are bigger, and it is less obvious which $\lambda$ is appropriate. Again, if RSE is more important, then smaller values of $\lambda$ should be taken, otherwise larger values.
5 Concluding remarks
We presented a projected gradient method to solve the orthogonal nonnegative matrix factorization problem. We penalized the deviation from orthonormality with positive parameters and added the resulting terms to the objective function of the standard nonnegative matrix factorization problem. We then minimized the resulting objective function under the nonnegativity conditions only, in a block coordinate descent approach.
The method was tested on two sets of synthetic data, one containing uniorthonormal matrices and the other containing biorthonormal matrices. Different values of the parameters controlling the orthogonality penalties were tried in the implementation to determine good pairs of such values. The performance of our algorithm was compared with two algorithms based on multiplicative update rules, regarding the quality of the factorization (RSE) and how much the resulting factors deviate from orthonormality.
We provided an extensive set of numerical results which demonstrate that our method is very competitive and outperforms the other two.
Acknowledgment
The work of the first author is supported by the Swiss Government Excellence Scholarships grant number ESKAS-2019.0147. This author also thanks the University of Applied Sciences and Arts Northwestern Switzerland for supporting the work.
The work of the second author was partially funded by the Slovenian Research Agency under research program P2-0256 and research projects N1-0057, N1-0071, and J1-8155.
The authors would also like to thank Andri Mirzal (Faculty of Computing, Universiti Teknologi Malaysia) for providing the code for his algorithm (Algorithm 2.2) to solve (ONMF). This code was also adapted by the authors to solve (biONMF).
References
 [1] (1995) Source identification of bulk wet deposition in Finland by positive matrix factorization. Atmospheric Environment 29 (14), pp. 1705–1718. Cited by: §1.3.
 [2] (2007) Algorithms and applications for approximate nonnegative matrix factorization. Computational statistics & data analysis 52 (1), pp. 155–173. Cited by: §1.1.
 [3] (2009) Document classification using nonnegative matrix factorization and underapproximation. In 2009 IEEE International Symposium on Circuits and Systems, pp. 2782–2785. Cited by: §1.1.
 [4] (2016) Nonlinear programming. Athena scientific optimization and computation series, Athena Scientific. External Links: ISBN 9781886529052, Link Cited by: 1st item.
 [5] (2008) Algorithms for orthogonal nonnegative matrix factorization. In 2008 IEEE international joint conference on neural networks (IEEE world congress on computational intelligence), pp. 1828–1832. Cited by: §1.3.
 [6] (2004) Optimality, computation, and interpretation of nonnegative matrix factorizations. SIAM Journal on Matrix Analysis, Cited by: §1.3.
 [7] (2007) Hierarchical ALS algorithms for nonnegative matrix and 3D tensor factorization. In International Conference on Independent Component Analysis and Signal Separation, pp. 169–176. Cited by: §1.3.
 [8] (2009) Nonnegative matrix and tensor factorizations: applications to exploratory multiway data analysis and blind source separation. John Wiley & Sons. Cited by: §2.1.
 [9] (2006) Orthogonal nonnegative matrix t-factorizations for clustering. In Proceedings of the 12th ACM SIGKDD international conference on Knowledge discovery and data mining, pp. 126–135. Cited by: 3rd item, §1.2, §1.3, §2.1, §2.1, §2.1.
 [10] (1971) Singular value decomposition and least squares solutions. In Linear Algebra, pp. 134–151. Cited by: §1.1.
 [11] (2011) Finding structure with randomness: probabilistic algorithms for constructing approximate matrix decompositions. SIAM review 53 (2), pp. 217–288. Cited by: §1.3.
 [12] (2005) Principal component analysis. Wiley Online Library. Cited by: §1.1.
 [13] (2006) Nonnegative matrix factorization features from spectral signatures of aviris images. In 2006 IEEE International Symposium on Geoscience and Remote Sensing, pp. 549–552. Cited by: §1.1.
 [14] (2007) Fast Newton-type methods for the least squares nonnegative matrix approximation problem. In Proceedings of the 2007 SIAM international conference on data mining, pp. 343–354. Cited by: §1.3.
 [15] (2008) Fast projection-based methods for the least squares nonnegative matrix approximation problem. Statistical Analysis and Data Mining: The ASA Data Science Journal 1 (1), pp. 38–51. Cited by: §1.3.
 [16] (2008) Nonnegative matrix factorization based on alternating nonnegativity constrained least squares and active set method. SIAM journal on matrix analysis and applications 30 (2), pp. 713–730. Cited by: §1.3.
 [17] (2008) Toward faster nonnegative matrix factorization: a new algorithm and comparisons. In 2008 Eighth IEEE International Conference on Data Mining, pp. 353–362. Cited by: §1.3.
 [18] (1999) Learning the parts of objects by nonnegative matrix factorization. Nature 401 (6755), pp. 788. Cited by: §1.3.
 [19] (2001) Algorithms for nonnegative matrix factorization. In Advances in neural information processing systems, pp. 556–562. Cited by: §1.3, §1.3, §2.1.
 [20] (2006) The relationships among various nonnegative matrix factorization methods for clustering. In Sixth International Conference on Data Mining (ICDM’06), pp. 362–371. Cited by: §1.1, §1.3.
 [21] (2007) Projected gradient methods for nonnegative matrix factorization. Neural computation 19, pp. 2756–2779. Cited by: §1.3, §1.3, §3.1, §3.1, §3.2.
 [22] (1999) Newton’s method for large boundconstrained optimization problems. SIAM Journal on Optimization 9 (4), pp. 1100–1127. Cited by: §3.2.
 [23] (2007) On the convergence of multiplicative update algorithms for nonnegative matrix factorization. IEEE Transactions on Neural Networks 18 (6), pp. 1589–1596. Cited by: §1.3, §2.2.
 [24] (2014) A convergent algorithm for orthogonal nonnegative matrix factorization. Journal of Computational and Applied Mathematics 260, pp. 149–166. Cited by: 3rd item, §1.3, §1.6, §2.2, §2.2, §2.2.
 [25] (2017) A convergent algorithm for bi-orthogonal nonnegative matrix tri-factorization. arXiv preprint arXiv:1710.11478. Cited by: §2.2, §2.2, §2.2, §2.2, §2.2.
 [26] (1994) Positive matrix factorization: a nonnegative factor model with optimal utilization of error estimates of data values. Environmetrics 5 (2), pp. 111–126. Cited by: §1.3.
 [27] (2004) Text mining using nonnegative matrix factorizations. In Proceedings of the 2004 SIAM International Conference on Data Mining, pp. 452–456. Cited by: §1.1.
 [28] (2004) Object characterization from spectral data using nonnegative factorization and information theory. In Proceedings of AMOS Technical Conference, Cited by: §2.1.
 [29] (2014) Iteration complexity of randomized block-coordinate descent methods for minimizing a composite function. Mathematical Programming 144 (1–2), pp. 1–38. Cited by: 1st item.
 [30] (2006) Document clustering using nonnegative matrix factorization. Information Processing & Management 42 (2), pp. 373–386. Cited by: §1.1.
 [31] MATLAB version r2019a. Cited by: 3rd item, §4.1, §4.1.
 [32] (2003) Document clustering based on nonnegative matrix factorization. In Proceedings of the 26th annual international ACM SIGIR conference on Research and development in information retrieval, pp. 267–273. Cited by: §1.1, §1.3.
 [33] (2008) Orthogonal nonnegative matrix factorization: multiplicative updates on Stiefel manifolds. In International conference on intelligent data engineering and automated learning, pp. 140–147. Cited by: §1.3.
 [34] (2010) Orthogonal nonnegative matrix tri-factorization for co-clustering: multiplicative updates on Stiefel manifolds. Information processing & management 46 (5), pp. 559–570. Cited by: §1.3.
 [35] (2006) Exploiting discriminant information in nonnegative matrix factorization with application to frontal face verification. IEEE Transactions on Neural Networks 17 (3), pp. 683–695. Cited by: §1.1.