I Introduction
^{†}^{†}footnotetext: This material is based in part upon work supported by the NSF under grant number DMS1312831. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the National Science Foundation.What do document clustering, recommender systems, and audio signal processing have in common? All of them are problems that involve finding patterns buried in noisy data. As a result, these three problems are common applications of algorithms that solve nonnegative matrix factorization, or NMF [2, 6, 7].
Nonnegative matrix factorization involves factoring some matrix , usually large and sparse, into two factors and , usually of low rank
(1) 
Because all of the entries in , , and must be nonnegative, and because of the imposition of low rank on and , an exact factorization rarely exists. Thus NMF algorithms often seek an approximate factorization, where is close to . Despite the imprecision, however, the low rank of and forces the solution to describe using fewer parameters, which tends to find underlying patterns in . These underlying patterns are what make NMF of interest to a wide range of applications.
In the decades since NMF was introduced by Seung and Lee [5], a variety of algorithms have been published that compute NMF [1]. However, the nondeterministic nature of these NMF algorithms make them difficult to test. First, NMF asks for approximations rather than exact solutions, so whether or not an output is correct is somewhat subjective. Although cost functions can quantitatively indicate how close a given solution is to being optimal, most algorithms do not claim to find the globally optimal solution, so whether or not an algorithm gives useful solutions can be ambiguous. Secondly, all of the algorithms produced so far are stochastic algorithms, so running the algorithm on the same input multiple times can give different outputs if they use different random number sequences. Thirdly, the algorithms themselves, though often simple to implement, can have very complex behavior that is difficult to understand. As a result, it can be hard to determine whether a proposed algorithm really “solves” NMF.
This paper proposes some test cases that NMF algorithms should solve verifiably. The approach uses very simple input, such as matrices that have exact nonnegative factorizations, that reduce the space of possible solutions and ensure that the algorithm finds correct patterns with little noise. In addition, small perturbations of these simple matrices are also used, to ensure that small variations in the matrix do not drastically change the generated solution.
Ii Perturbations of order
Suppose NMF is applied to a nonnegative matrix to get nonnegative matrices and such that . If is chosen to have an exact nonnegative factorization, then the optimal solution satisfies . Furthermore, if is simple enough, most “good” NMF algorithms will find the exact solution.
For example, suppose is a nonnegative square diagonal matrix, and the output and is also specified to be square. Let the diagonal matrix be denoted , where is an
dimensional vector, so that the diagonal entries
are . It is easy to show that and must be monomial matrices (diagonal matrices under a permutation) [3]. Ignoring the permutation and similarly denoting and , then for applicable . Such diagonal matrices were given as input to the known NMF algorithms described in the next section, and all of the algorithms successfully found exact solutions in the form of monomial matrices for and .One way to analyze the properties of an algorithm is to perturb the input by a small amount and see how the output changes. Formally, if the input gives output , then the output generated from can be approximated as . It is assumed that is sufficiently small that terms are negligible.
For the test case, the nonzero entries of were chosen to be the on the superdiagonal (the first diagonal directly above the main diagonal). This matrix is denoted as , where is an dimensional vector such that . The resulting matrix has entries on its main diagonal, entries on the superdiagonal, and zeroes elsewhere. It is assumed that all the vector entries and are of comparable magnitude.
Iii Results from Various Algorithms
Three published NMF algorithms were implemented and run with input of the form as described above. Algorithm 1 was the multiplicative update algorithm described by Seung and Lee in their groundbreaking paper [5], which was run for iterations in each test. Algorithm 2 was the ALS algorithm described in [1], and which was run for iterations as well. Algorithm 3 was a gradient descent method as described by Guan and Tao [4], and was run for iterations. These three algorithms were chosen because they were representative and easytoimplement algorithms of three distinct types. Many published NMF algorithms are variations of these three algorithms.
The experiments began with the simplest nontrivial case, in which is a matrix with only three nonzero entries, with fixed and , while was varied over several different values. Each of the algorithms used randomness in the form of initial seed values for and . The random seeds were held constant as varied. As a result, the outputs from the algorithms with different values of were comparable within each test case.
For the case, it is possible to enumerate all of the nonnegative exact factorizations of . Given that the factors and are also matrices, they can be written as shown below.
(2) 
Multiplying the matrices directly produces the the following four equations:
(3)  
(4)  
(5)  
(6) 
Recall that all entries must be nonnegative, so from equation (5), either or must be 0, and either or must be 0. Furthermore, it cannot be that because that would contradict equation (6), and it cannot be that because that would contradict equation (3). Thus two cases remain: and .
Substituting into equations (3), (4), and (6) and solving for , , and gives
(7) 
Likewise, substituting into (3), (4), and (6) and solving for , , and to gives
(8) 
Observe that these two solutions look similar. In fact, they differ merely by a permutation. In the first case, and have the same main diagonal and superdiagonal format as , and can be written in matrix notation as
(9) 
The second case can be written as , where is the permutation matrix .
All three of the algorithms tested gave solutions of this form 1000 times out of 1000, for each of several values of . The consistency of the solutions enabled further analysis. The change in the solution can be measured by the change in the three parameters , , and (ignoring the permutation if present). Figure 1 shows the change in each of the three parameters from the base case for several different values of when input into Algorithm 1. Each of the values is the arithmetic mean of the corresponding values generated from 1000 different random seeds.
Of course, the precise values depend on the distribution of randomness used. But notice that as approaches 0, the values of the three parameters become very nearly linear in . The results for Algorithms 2 and 3 were very similar  they also showed linearity of the parameters in , with comparable slopes.
However, was not always linear in , even for small . In some cases, the difference approached 0 much more quickly. To see why this occurred, consider that the entries in could have been chosen to be the parameters rather than the entries in . Also, recall that in the base case , in which , since both entries are off the diagonal. Thus, when either is linear in , they are of the form for some slope . Since the solution is exact, it can be deduced that
(10) 
Therefore, in the cases that approaches 0 very quickly, since approaches a large, stable value as approaches 0, must be nearly linear in . So in the cases that is not linear in , its symmetrical counterpart, , is. To simplify this complication out of the data, the parameters in were chosen when was closer to linearity in , and the parameters in were chosen when was closer to linearity in .
Curiously, although it was possible for and to “split” the nonlinearity so that both were somewhat linear, this rarely occurred. All three algorithms preferred to make one of them very close to linear at the expense of the other. When approached zero very rapidly, by equations (3) and (4), , and similarly, when is negligible, .
Next, different values for the entries of and were tried, so they had a range of entries rather than all 1’s. The algorithms all behaved similarly; up to permutation, they satisfied the following formula
(11) 
Note that equation (9) is just a special case of this equation in which . The same phenomena was also observed in which the algorithm usually made one of and be nearly linear in and the other approach zero rapidly, rather than having both entries be nonnegligible. As long as the entries of and are roughly on the order of 1, the algorithms operated similarly.
The next case examined set to be a matrix. Using similar logic to the case, it can be deduced that any exact factorization of is likely to be of the form
(12) 
Indeed, all three algorithms always gave solutions of this form. In fact, most of the time there were two more zero entries than necessary  either or , and either or . This is similar to the way that or often approached 0 rapidly in the case. To note another similarity to the case, whenever was significant and was not, was very close to  in similar situations was approximately .
As a result, there were 4 distinct configurations of the nonzero elements in the solutions, as given by Figure 2.
Type  equal to 0 

Type I  
Type II  
Type III  
Type IV 
Note that Type IV appears to be an inexact solution; since it has positive and , the entry at position in the product would have to be nonzero. However, both and , like all entries on the superdiagonal, are , so is , and is considered negligible. In fact, most of the solutions generated by the algorithms had nonzero values for entries that were supposed to be zero, but for this analysis anything below was considered negligible.
Each algorithm was run 100 times on the input with , , and . The solutions were categorized by the solution type in Figure 2. The distributions of the solutions by algorithm type are given in Figure 2. Note that some solutions did not have two negligible entries among , , , and , in which case the smaller entry was ignored for the sake of sorting  this accounted for about 20% of the three algorithms, the majority occurring in Algorithm 1.
It is significant to note that even the solutions that didn’t fall cleanly into a “type” still satisfied the pattern given in (12). It seems that an NMF algorithm should satisfy this pattern, but little more is required.
Next, entries in and , were changed as in the case. As long as the entries were (as opposed to or ), the behavior of the algorithms was similar.
Finally, larger than were examined. Several different sizes of matrices were tested, ranging from to , always keeping , , and square, with positive entries only on the main diagonal and the superdiagonal. The experiments followed the same general pattern; nonzero entries in and appeared only on the main diagonal and superdiagonal. Using similar logic to the and cases, it can be shown that these are the only exact solutions. However, in practice, as the matrices get larger, exceptions to this pattern become more common, particularly in Algorithm 3. The general rule seems to mostly hold (over half the time) until becomes around . Note, however, that because the runtime of the algorithms are cubic in the size of the matrix, at best, the sample size for large matrices is small.
Iv Proposed Tests for NMF Algorithms
Since all three algorithms, which cover a variety of approaches to NMF, had a lot in common in their solutions, it is propose that these inputs could be used as a test case of an NMF algorithm implementation. In this section, it is proposed how such test cases could be executed.
The test begins with input of the form
(13) 
is square, and preferably somewhere between and in size, although bigger inputs may be useful as well. The entries should vary between tests. Each test should start by using so that is diagonal. The results of this test should have and monomial  only one nonzero element in each row and column. Ignore entries that are below , for the entirety of testing, as any such entries are negligible.
If or is not monomial, or if the product is not equal to to within a negligible margin of error, the algorithm fails the test. Otherwise, the generated solution can be used to find the permutation matrix that makes and diagonal by replacing the nonzero entries of with 1’s. Since is diagonal, is also diagonal, and since is diagonal, so is . Knowing will make the rest of the testing much simpler since it is easier to identify whether a solution is of the form given above when it is not permuted.
Next, run the test again using a positive value for ; seems to work well, although using a variety of is also recommended. Make sure to use the same random seeds that were used in the test to produce corresponding output. Then check that the and given by the algorithm are such that and have nonzero entries only on the two diagonals that they are supposed to. If this doesn’t hold, changing might have changed which permutation returns and to the proper form, so check again; this happens more commonly among larger matrices than smaller ones. However, if and really do break the form, or , the algorithm fails the test on this input. Otherwise, it passes.
Note that even widely accepted algorithms do fail these tests occasionally, especially with matrices larger than , so it’s advisable to perform the test many times to get a more accurate idea of an algorithm’s performance.
V Conclusion
This paper proposes an approach to the problem of testing NMF algorithms by running the algorithms on simple input that can produce an exact nonnegative factorization, and perturbations of such input. In particular, square matrices with entries on the main diagonal and entries on the superdiagonal are proposed, because they have exact solutions that can enumerated mathematically, or because they are perturbations of matrices with exact solutions.
The test cases have been used as input on three known NMF algorithms that represent a variety of algorithms, and all of them behaved similarly, which suggests testable, quantifiable behaviors that many NMF algorithms share. These test cases offer one approach for testing candidate NMF implentations to help determine whether it behaves as it should.
Acknowledgment
The authors would like to thank Dr. Alan Edelman for providing and overseeing this research opportunity, and Dr. Vijay Gadepally for his advice and expertise.
References
 [1] Berry Browne Langville Pauca and Plemmons, Algorithms and applications for approximate nonnegative matrix factorization, Computational Statistics and Data Analysis 52 (2007), 155–173.

[2]
Rainer Gemulla, Erik Nijkamp, Peter J. Haas, and Yannis Sismanis,
Largescale matrix factorization with distributed stochastic gradient descent
, Proceedings of the 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (New York, NY, USA), KDD ’11, ACM, 2011, pp. 69–77.  [3] John Gilbert, personal communication, Sep 2015.
 [4] N. Guan, D. Tao, Z. Luo, and B. Yuan, Nenmf: An optimal gradient method for nonnegative matrix factorization, IEEE Transactions on Signal Processing 60 (2012), no. 6, 2882–2898.
 [5] Daniel D. Lee and H. Sebastian Seung, Algorithms for nonnegative matrix factorization, Advances in Neural Information Processing Systems 13 (T. K. Leen, T. G. Dietterich, and V. Tresp, eds.), MIT Press, 2001, pp. 556–562.
 [6] Suvrit Sra and Inderjit S. Dhillon, Generalized nonnegative matrix approximations with bregman divergences, Advances in Neural Information Processing Systems 18 (Y. Weiss, B. Schölkopf, and J. C. Platt, eds.), MIT Press, 2006, pp. 283–290.
 [7] Wenwu Wang, Instantaneous vs. convolutive nonnegative matrix factorization: Models, algorithms and applications, Machine Audition: Principles, Algorithms and Systems: Principles, Algorithms and Systems (2010), 353.
Comments
There are no comments yet.