1 Introduction
Summarizing large data sets using pairwise co-occurrence frequencies is a powerful tool for data mining. Objects can often be better described by their relationships than by their inherent characteristics. Communities can be discovered from friendships [1], song genres can be identified from co-occurrence in playlists [2], and neural word embeddings are factorizations of pairwise co-occurrence information [3, 4]. Recent Anchor Word algorithms [5, 6] perform spectral inference on co-occurrence statistics to infer topic models [7, 8]. Co-occurrence statistics can be calculated in a single parallel pass through a training corpus. While these algorithms are fast, deterministic, and come with provable guarantees, they are sensitive to observation noise and small samples, often producing effectively useless results on real documents that present no problems for probabilistic algorithms.
We cast this general problem of learning overlapping latent clusters as Joint-Stochastic Matrix Factorization (JSMF), a subset of nonnegative matrix factorization that contains topic modeling as a special case. We explore the conditions necessary for inference from co-occurrence statistics and show that the Anchor Words algorithms necessarily violate these conditions. We then propose a rectified algorithm that matches the performance of probabilistic inference, even on small and noisy datasets, without losing efficiency or provable guarantees. Validating on both real and synthetic data, we demonstrate that our rectification not only produces better clusters, but also, unlike previous work, learns meaningful cluster interactions.
Let the $N \times N$ matrix $C$ represent the co-occurrence of pairs drawn from $N$ objects: $C_{ij}$ is the joint probability $p(X_1 = i, X_2 = j)$ for a pair of objects $i$ and $j$. Our goal is to discover $K$ latent clusters by approximately decomposing $C \approx BAB^{\top}$. $B$ is the $N \times K$ object-cluster matrix, in which each column corresponds to a cluster and $B_{ik} = p(X = i \mid Z = k)$ is the probability of drawing an object $i$ conditioned on the object belonging to the cluster $k$; $A$ is the $K \times K$ cluster-cluster matrix, in which $A_{kl} = p(Z_1 = k, Z_2 = l)$ represents the joint probability of pairs of clusters. We call the matrices $C$ and $A$ joint-stochastic (i.e., nonnegative with entries summing to one) due to their correspondence to joint distributions; $B$ is column-stochastic. Example applications are shown in Table 1.

Table 1: Example applications.

Domain  Object  Cluster  Basis
Document  Word  Topic  Anchor Word 
Image  Pixel  Segment  Pure Pixel 
Network  User  Community  Representative 
Legislature  Member  Party/Group  Partisan 
Playlist  Song  Genre  Signature Song 
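To make the shapes and stochasticity constraints concrete, here is a minimal numpy sketch (illustrative only; the toy sizes and random matrices are our own choices, not tied to any dataset) that builds a joint-stochastic $C$ from a column-stochastic $B$ and a joint-stochastic $A$:

```python
import numpy as np

rng = np.random.default_rng(0)
N, K = 12, 3  # toy sizes: N objects, K clusters

# Column-stochastic object-cluster matrix B: each column sums to 1.
B = rng.random((N, K))
B /= B.sum(axis=0, keepdims=True)

# Joint-stochastic cluster-cluster matrix A: symmetric, nonnegative, entries sum to 1.
A = rng.random((K, K))
A = (A + A.T) / 2
A /= A.sum()

# The resulting co-occurrence matrix C = B A B^T is again joint-stochastic.
C = B @ A @ B.T
assert np.isclose(C.sum(), 1.0)   # entries sum to one
assert np.all(C >= 0)             # entries are nonnegative
assert np.allclose(C, C.T)        # symmetric, like a joint distribution
```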
Anchor Word algorithms [5, 6] solve JSMF problems using a separability assumption: each topic contains at least one “anchor” word that has non-negligible probability exclusively in that topic. The algorithm uses the co-occurrence patterns of the anchor words as a summary basis for the co-occurrence patterns of all other words. The initial algorithm [5] is theoretically sound but unable to produce a column-stochastic word-topic matrix $B$ due to unstable matrix inversions. A subsequent algorithm [6] fixes negative entries in $B$, but still produces large negative entries in the estimated topic-topic matrix $A$. As shown in Figure 3, the proposed algorithm infers valid topic-topic interactions.

2 Requirements for Factorization
In this section we review the probabilistic and statistical structures of JSMF and then define the geometric structures of co-occurrence matrices required for successful factorization. $C$ is a joint-stochastic matrix constructed from $M$ training examples, each of which contains some subset of the $N$ objects. We wish to find $K$ latent clusters by factorizing $C$ into a column-stochastic matrix $B$ and a joint-stochastic matrix $A$, satisfying $C \approx BAB^{\top}$.
Probabilistic structure.
Figure 2 shows the event space of our model. The distribution over pairs of clusters is generated first from a stochastic process with a hyperparameter $\alpha$. If the $m$-th training example contains a total of $n_m$ objects, our model views the example as consisting of all $n_m(n_m - 1)$ possible pairs of objects. (Due to the bag-of-words assumption, every object can pair with any other object in that example, except itself; one implication of our work is a better understanding of self-co-occurrences, the diagonal entries in the co-occurrence matrix.) For each of these pairs, cluster assignments are sampled from the selected distribution ($z_1, z_2 \sim W_m$, the example's latent cluster distribution). Then an actual object pair is drawn with respect to the corresponding cluster assignments ($x_1 \sim B_{z_1}$, $x_2 \sim B_{z_2}$). Note that this process does not explain how each training example is generated from a model, but shows how our model understands the objects in the training examples. Following [5, 6], our model views $B$ as a set of parameters rather than random variables. (In LDA, each column of $B$ is generated from a known Dirichlet distribution.)

The primary learning task is to estimate $B$; we then estimate $A$ to recover the hyperparameter $\alpha$. Due to the conditional independence of $x_1$ and $x_2$ given $(z_1, z_2)$, the factorization $C = BAB^{\top}$ is equivalent to
$$p(x_1 = i, x_2 = j) = \sum_{k,l} p(x_1 = i \mid z_1 = k)\, p(x_2 = j \mid z_2 = l)\, p(z_1 = k, z_2 = l).$$
Under the separability assumption, each cluster $k$ has a basis object $s_k$ such that $B_{s_k k} > 0$ and $B_{s_k l} = 0$ for $l \neq k$. In matrix terms, we assume the submatrix of $B$ comprised of the rows with indices $S = \{s_1, \ldots, s_K\}$ is diagonal. As these rows form a nonnegative basis for the row space of $C$, the assumption implies $\mathrm{rank}_+(C) = K$ (where $\mathrm{rank}_+$ denotes the nonnegative rank of a matrix and $\mathrm{rank}$ the usual rank). Providing identifiability to the factorization, this assumption becomes crucial for inference of both $B$ and $A$. Note that the JSMF factorization is unique up to column permutation, meaning that no specific ordering exists among the discovered clusters, equivalent to probabilistic topic models (see the Appendix).
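The following toy simulation sketches how the model views a single training example (the variable names, the Dirichlet choice for the cluster distribution, and the toy sizes are our illustrative assumptions, not the paper's exact specification):

```python
import numpy as np

rng = np.random.default_rng(1)
N, K, n_m = 12, 3, 50   # objects, clusters, tokens in the m-th example

B = rng.random((N, K))
B /= B.sum(axis=0, keepdims=True)   # column-stochastic topics (parameters, not random variables)

alpha = np.full(K, 0.5)             # assumed Dirichlet hyperparameter
W_m = rng.dirichlet(alpha)          # latent cluster distribution for this example

# Sample a cluster assignment and then an object for every token.
z = rng.choice(K, size=n_m, p=W_m)
x = np.array([rng.choice(N, p=B[:, k]) for k in z])

# The model treats the example as all n_m*(n_m - 1) ordered pairs of distinct
# tokens; self-pairs (a token with itself) are excluded.
pairs = [(x[i], x[j]) for i in range(n_m) for j in range(n_m) if i != j]
assert len(pairs) == n_m * (n_m - 1)
```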
Statistical structure.
Let $f$ be a (known) distribution of distributions from which a cluster distribution $W_m$ is sampled for each training example. Saying $W = \{W_1, \ldots, W_M\}$, we have $M$ i.i.d. samples which are not directly observable. Defining the posterior cluster-cluster matrix $A^{*}$ and its expectation $A$, Lemma 2.2 in [5] showed that (this convergence is not trivial, whereas the convergence of $\frac{1}{M}\sum_m W_m$ to $\mathbb{E}_f[W]$ as $M \to \infty$ follows from the Central Limit Theorem)

$$A^{*} := \frac{1}{M}\sum_{m=1}^{M} W_m W_m^{\top} \;\longrightarrow\; \mathbb{E}_{W \sim f}\!\left[W W^{\top}\right] =: A \quad \text{as } M \to \infty. \qquad (1)$$

Denote the posterior co-occurrence for the $m$-th training example by $C_m^{*} = B W_m W_m^{\top} B^{\top}$ and for all examples by $C^{*} = \frac{1}{M}\sum_{m} C_m^{*}$. Then $C^{*} = B A^{*} B^{\top}$, and thus

$$C^{*} = B A^{*} B^{\top} \;\longrightarrow\; B A B^{\top} \quad \text{as } M \to \infty. \qquad (2)$$

Denote the noisy observation for the $m$-th training example by $C_m$ and for all examples by $C = \frac{1}{M}\sum_{m} C_m$, and let $B$ be the matrix of topics. We will construct $C_m$ so that it is an unbiased estimator of $C_m^{*}$. Thus, as $M \to \infty$,

$$C = \frac{1}{M}\sum_{m=1}^{M} C_m \;\longrightarrow\; B A B^{\top}. \qquad (3)$$
Geometric structure.
Though the separability assumption allows us to identify $B$ even from the noisy observation $C$, we need to thoroughly investigate the structure of cluster interactions. This is because it will eventually be related to how much useful information the co-occurrence between the corresponding anchor bases contains, enabling us to best use our training data. Let $\mathcal{DN}$ be the set of doubly nonnegative matrices: entrywise nonnegative and positive semidefinite (PSD).
Claim. $A^{*} \in \mathcal{DN}$ and $A \in \mathcal{DN}$.

Proof. Take any vector $y \in \mathbb{R}^{K}$. As $A^{*}$ is defined as a sum of outer products,

$$y^{\top} A^{*} y = \frac{1}{M}\sum_{m=1}^{M} y^{\top} W_m W_m^{\top} y = \frac{1}{M}\sum_{m=1}^{M} \left(W_m^{\top} y\right)^2 \geq 0. \qquad (4)$$

Thus $A^{*}$ is positive semidefinite. In addition, $A^{*}_{kl} \geq 0$ for all $k, l$. Proving $A \in \mathcal{DN}$ is analogous by the linearity of expectation. Relying on the double nonnegativity of $A^{*}$ and $A$, Equations (2) and (3) imply not only the low-rank structure of the posterior co-occurrence $C^{*}$, but also its double nonnegativity, by a similar proof (see the Appendix).
The Anchor Word algorithms in [5, 6] consider neither the double nonnegativity of cluster interactions nor its implications for co-occurrence statistics. Indeed, the empirical co-occurrence matrices collected from limited data are generally indefinite and full-rank, whereas the posterior co-occurrences must be positive semidefinite and low-rank. Our new approach efficiently enforces double nonnegativity and low-rankness of the co-occurrence matrix based on the geometric properties of its posterior behavior. We will later clarify how this process substantially improves the quality of the clusters and their interactions by eliminating noise and restoring missing information.
3 Rectified Anchor Words Algorithm
In this section, we describe how to estimate the co-occurrence matrix $C$ from the training data, and how to rectify $C$ so that it is low-rank and doubly nonnegative. We then decompose the rectified $C$ in a way that preserves the doubly nonnegative structure in the cluster-interaction matrix $A$.
Generating co-occurrence $C$.
Let $H_m$ be the vector of object counts for the $m$-th training example, and let $p_m = B W_m$ where $W_m$ is the document's latent topic distribution. Then $H_m$ is assumed to be a sample from a multinomial distribution $\mathrm{Mult}(n_m, p_m)$ where $n_m = \sum_i (H_m)_i$, and recall $\mathbb{E}[H_m] = n_m p_m$ and $\mathrm{Cov}(H_m) = n_m\left(\mathrm{diag}(p_m) - p_m p_m^{\top}\right)$. As in [6], we generate the co-occurrence for the $m$-th example by

$$C_m = \frac{H_m H_m^{\top} - \mathrm{diag}(H_m)}{n_m(n_m - 1)}. \qquad (5)$$

The diagonal penalty in Eq. (5) cancels out the diagonal matrix term in the variance-covariance matrix, making the estimator unbiased: $\mathbb{E}[C_m \mid W_m] = B W_m W_m^{\top} B^{\top} = C_m^{*}$. Putting $C = \frac{1}{M}\sum_{m} C_m$, we thus have $\mathbb{E}[C] = C^{*}$ by the linearity of expectation.
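A minimal sketch of the estimator in Eq. (5), assuming the counts are stored as a documents-by-vocabulary matrix (the variable names are ours):

```python
import numpy as np

def cooccurrence(H):
    """Unbiased co-occurrence estimator of Eq. (5).

    H: (M, N) array of per-document object counts."""
    M, N = H.shape
    C = np.zeros((N, N))
    for h in H:
        n = h.sum()
        if n < 2:
            continue  # need at least two tokens to form a pair
        # The outer product counts all ordered pairs; subtracting diag(h)
        # removes self-pairs, which is exactly the diagonal penalty in Eq. (5).
        C += (np.outer(h, h) - np.diag(h)) / (n * (n - 1))
    return C / M
```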
Rectifying co-occurrence $C$.
While $C$ is an unbiased estimator for $C^{*}$ in our model, in reality the two matrices often differ due to a mismatch between our model assumptions and the data (there is no reason to expect real data to be generated from topics, much less from exactly $K$ latent topics) or due to error in estimation from limited data. The computed $C$ is generally full-rank with many negative eigenvalues, causing a large approximation error. As the posterior co-occurrence $C^{*}$ must be low-rank, doubly nonnegative, and joint-stochastic, we propose two rectification methods: Diagonal Completion (DC) and Alternating Projection (AP). DC modifies only diagonal entries so that $C$ becomes low-rank, nonnegative, and joint-stochastic, while AP modifies every entry and enforces the same properties as well as positive semidefiniteness. As our empirical results strongly favor alternating projection, we defer the details of diagonal completion to the Appendix.

Based on the desired properties of the posterior co-occurrence $C^{*}$, we seek to project our estimator $C$ onto the set of joint-stochastic, doubly nonnegative, low-rank matrices. Alternating projection methods like Dykstra's algorithm [9] allow us to project onto an intersection of finitely many convex sets using projections onto each individual set in turn. In our setting, we consider the intersection of three sets of symmetric matrices: the elementwise nonnegative matrices $\mathcal{P}_{NN}$, the normalized matrices $\mathcal{P}_{N}$ whose entry sum is equal to 1, and the positive semidefinite matrices with rank at most $K$, $\mathcal{P}_{PSD}^{K}$. We project onto these three sets as follows:

$$\Pi_{NN}(C)_{ij} = \max(C_{ij}, 0), \qquad \Pi_{N}(C) = C + \frac{1 - \sum_{ij} C_{ij}}{N^2}\,\mathbf{1}\mathbf{1}^{\top}, \qquad \Pi_{PSD}^{K}(C) = U \Lambda_{K}^{+} U^{\top},$$

where $C = U \Lambda U^{\top}$ is an eigendecomposition and $\Lambda_{K}^{+}$ is the matrix $\Lambda$ modified so that all negative eigenvalues and all but the $K$ largest positive eigenvalues are set to zero. Truncated eigendecompositions can be computed efficiently, and the other projections are likewise efficient. While $\mathcal{P}_{NN}$ and $\mathcal{P}_{N}$ are convex, $\mathcal{P}_{PSD}^{K}$ is not. However, [10] show that alternating projection with a nonconvex set still works under certain conditions, guaranteeing local convergence. Thus iterating the three projections in turn until convergence rectifies $C$ to be in the desired space. We will show how to satisfy such conditions and examine the convergence behavior in Section 5.
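Below is a plain cyclic-projection sketch of AP rectification using the three projections above (it omits Dykstra's correction terms, uses a dense eigendecomposition rather than a truncated one, and the projection order and iteration count are our choices):

```python
import numpy as np

def proj_nonnegative(C):
    return np.maximum(C, 0.0)

def proj_normalized(C):
    # Frobenius projection onto {sum of entries = 1}: shift every entry uniformly.
    N = C.shape[0]
    return C + (1.0 - C.sum()) / (N * N)

def proj_psd_rank(C, K):
    # Keep only the K largest positive eigenvalues; zero out all others.
    vals, vecs = np.linalg.eigh((C + C.T) / 2)
    keep = np.argsort(vals)[-K:]
    vals_trunc = np.clip(vals[keep], 0.0, None)
    return (vecs[:, keep] * vals_trunc) @ vecs[:, keep].T

def rectify_ap(C, K, iters=150):
    R = C.copy()
    for _ in range(iters):
        R = proj_psd_rank(proj_normalized(proj_nonnegative(R)), K)
    return R
```

At scale, a truncated eigensolver would replace the dense `eigh` call.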
Selecting basis $S$.
The first step of the factorization is to select the subset $S$ of objects that satisfy the separability assumption. We want the best $K$ rows of the row-normalized co-occurrence matrix $\bar{C}$ so that all other rows lie nearly in the convex hull of the selected rows. [6] use the Gram-Schmidt process to select anchors, which computes a pivoted QR decomposition, but does not utilize the sparsity of $\bar{C}$. To scale beyond small vocabularies, they use random projections that approximately preserve distances between rows of $\bar{C}$. For all experiments we use a new pivoted QR algorithm (see the Appendix) that exploits sparsity instead of using random projections, and thus preserves deterministic inference. (To effectively use random projections, it is necessary to either find proper dimensions based on multiple trials or perform low-dimensional random projections multiple times [11] and merge the resulting anchors.)
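As a reference point, here is a simplified dense greedy pivoted-QR-style selection of anchor rows (the sparse pivoted QR described in the Appendix differs in how residual norms are maintained; this version is for illustration only):

```python
import numpy as np

def select_anchors(C, K):
    """Greedily pick K rows of the row-normalized C that best span the rest."""
    Cbar = C / np.maximum(C.sum(axis=1, keepdims=True), 1e-12)  # row-normalize
    residual = Cbar.copy()
    anchors = []
    for _ in range(K):
        norms = np.linalg.norm(residual, axis=1)
        pick = int(np.argmax(norms))            # row farthest from the current span
        anchors.append(pick)
        q = residual[pick] / norms[pick]        # new orthonormal direction
        residual -= np.outer(residual @ q, q)   # deflate all rows against it
    return anchors
```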
Recovering object-cluster $B$.
After finding the set of basis objects $S$, we can infer each entry of $B$ by Bayes' rule as in [6]. Let $\gamma_i$ be the coefficients that reconstruct the $i$-th row of $\bar{C}$ in terms of the basis rows corresponding to $S$. Since $\gamma_{ik} = p(z = k \mid x = i)$, we can use the corpus frequencies $p(x = i)$ to estimate $B_{ik} = p(x = i \mid z = k)$. Thus the main task for this step is to solve $N$ simplex-constrained QPs to infer one set of such coefficients for each object. We use an exponentiated gradient algorithm to solve the problem, similar to [6]. Note that this step can be efficiently done in parallel for each object.
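A sketch of the exponentiated gradient step for one object (the squared-error objective, step size, and iteration count are illustrative choices; [6] describe the full algorithm):

```python
import numpy as np

def recover_coeffs(Cbar_i, Cbar_S, iters=500, eta=1.0):
    """Simplex-constrained weights gamma with gamma @ Cbar_S ~= Cbar_i.

    Cbar_i: (N,) row of the row-normalized co-occurrence matrix.
    Cbar_S: (K, N) rows corresponding to the anchor objects."""
    K = Cbar_S.shape[0]
    gamma = np.full(K, 1.0 / K)                          # start at the uniform distribution
    for _ in range(iters):
        grad = 2.0 * Cbar_S @ (gamma @ Cbar_S - Cbar_i)  # gradient of the squared error
        gamma = gamma * np.exp(-eta * grad)              # multiplicative update
        gamma /= gamma.sum()                             # stay on the simplex
    return gamma
```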
Recovering cluster-cluster $A$.
[6] recovered $A$ by minimizing the approximation error $\|C - BAB^{\top}\|$; but the inferred $A$ generally has many negative entries, failing to model the probabilistic interaction between topics. While we can further project $A$ onto the joint-stochastic matrices, this produces a large approximation error.

We consider an alternate recovery method that again leverages the separability assumption. Let $C_{SS}$ be the submatrix of $C$ whose rows and columns correspond to the selected objects $S$, and let $B_S$ be the diagonal submatrix of rows of $B$ corresponding to $S$. Then

$$C_{SS} = B_S A B_S^{\top} \quad \Longrightarrow \quad A = B_S^{-1} C_{SS} B_S^{-\top}. \qquad (6)$$

This approach efficiently recovers a cluster-cluster matrix mostly based on the co-occurrence information between the corresponding anchor bases, and produces no negative entries due to the stability of diagonal matrix inversion. Note that the principal submatrices of a PSD matrix are also PSD; hence, if $C \in \mathcal{DN}$ then $C_{SS} \in \mathcal{DN}$. Thus, not only is the recovered $A$ an unbiased estimator, but it is also doubly nonnegative, as $C \in \mathcal{DN}$ after the rectification. (We later realized that essentially the same approach was previously tried in [5], but it was not able to generate a valid topic-topic matrix, as shown in the middle panel of Figure 3.)
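A sketch of the recovery in Eq. (6) (the names are ours; since $B_S$ is diagonal, the inversion reduces to an element-wise rescaling of the anchor-anchor block):

```python
import numpy as np

def recover_A(C_rect, B, anchors):
    """Recover the cluster-cluster matrix A from the anchor-anchor block of C."""
    S = np.asarray(anchors)
    C_SS = C_rect[np.ix_(S, S)]        # co-occurrence among the anchor objects
    b = B[S, np.arange(len(S))]        # diagonal of B_S, i.e. B[s_k, k]
    # Eq. (6): A = B_S^{-1} C_SS B_S^{-T}; with B_S diagonal this is just an
    # element-wise rescaling, so no unstable matrix inversion is needed.
    return C_SS / np.outer(b, b)
```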
4 Experimental Results
Our Rectified Anchor Words algorithm with alternating projection fixes many problems in the baseline Anchor Words algorithm [6] while matching the performance of Gibbs sampling [12] and maintaining spectral inference's determinism and independence from corpus size. We evaluate direct measurements of matrix quality as well as indicators of topic utility. We use two text datasets: NIPS full papers and New York Times news articles (https://archive.ics.uci.edu/ml/datasets/Bag+of+Words). We eliminate a minimal list of 347 English stop words, prune rare words based on tf-idf scores, and remove documents with fewer than five tokens after vocabulary curation. We also prepare two non-textual item-selection datasets: users' movie reviews from the MovieLens 10M dataset (http://grouplens.org/datasets/movielens) and music playlists from the complete Yes.com dataset (http://www.cs.cornell.edu/~shuochen/lme). We perform similar vocabulary curation and document tailoring, with the exception of frequent stop-object elimination. Playlists often contain the same songs multiple times, but users are unlikely to review the same movie more than once, so we augment the movie dataset so that each review contains a number of copies of the movie determined by the half-scale rating information that varies from 0.5 stars to 5 stars. Statistics of our datasets are shown in Table 2.
Dataset  Docs ($M$)  Vocab ($N$)  Avg. Len
NIPS  1,348  5k  380.5 
NYTimes  269,325  15k  204.9 
Movies  63,041  10k  142.8 
Songs  14,653  10k  119.2 
We run DC 30 times for each experiment, randomly permuting the order of objects and using the median results to minimize the effect of different orderings. We also run 150 iterations of AP, alternating the projections onto $\mathcal{P}_{NN}$, $\mathcal{P}_{N}$, and $\mathcal{P}_{PSD}^{K}$ in turn. For probabilistic Gibbs sampling, we use Mallet with the standard options, running 1,000 iterations. All metrics are evaluated against the original $C$, not against the rectified $C$, whereas we use the $B$ and $A$ inferred from the rectified $C$.
Qualitative results.
Although [6] report results comparable to probabilistic algorithms for LDA, the algorithm fails under many circumstances. The algorithm prefers rare and unusual anchor words that form a poor basis, so topic clusters consist of the same high-frequency terms repeated, as shown in the upper third of Table 3. In contrast, our algorithm with AP rectification successfully learns themes similar to the probabilistic algorithm. One can also verify that the cluster interactions given in the third panel of Figure 3 explain how the five topics correlate with each other.
Arora et al. 2013 (Baseline) 

neuron layer hidden recognition signal cell noise 
neuron layer hidden cell signal representation noise 
neuron layer cell hidden signal noise dynamic 
neuron layer cell hidden control signal noise 
neuron layer hidden cell signal recognition noise 
This paper (AP) 
neuron circuit cell synaptic signal layer activity 
control action dynamic optimal policy controller reinforcement 
recognition layer hidden word speech image net 
cell field visual direction image motion object orientation 
gaussian noise hidden approximation matrix bound examples 
Probabilistic LDA (Gibbs) 
neuron cell visual signal response field activity 
control action policy optimal reinforcement dynamic robot 
recognition image object feature word speech features 
hidden net layer dynamic neuron recurrent noise 
gaussian approximation matrix bound component variables 
Similar to [13], we visualize the five anchor words after 2D PCA of the co-occurrence space. Each panel in Figure 1 shows a 2D embedding of the NIPS vocabulary as blue dots and five selected anchor words in red. The first plot shows standard anchor words and the original co-occurrence space. The second plot shows anchor words selected from the rectified space overlaid on the original co-occurrence space. The third plot shows the same anchor words as the second plot overlaid on the AP-rectified space. The rectified anchor words provide better coverage of both spaces, explaining why we are able to achieve reasonable topics even with $K = 5$.
Rectification also produces better clusters in the non-textual movie dataset. Each cluster is notably more genre-coherent and year-coherent than the clusters from the original algorithm. With 15 clusters, for example, we find a cluster of Walt Disney 2D Animations mostly from the 1990s and a cluster of Fantasy movies represented by the Lord of the Rings films, similar to clusters found by probabilistic Gibbs sampling. The Baseline algorithm [6] repeats Pulp Fiction and Silence of the Lambs 15 times.
Quantitative results.
We measure the intrinsic quality of inference and summarization with respect to the JSMF objectives as well as the extrinsic quality of the resulting topics. Lines in each plot correspond to four methods: Baseline for the algorithm in the previous work [6] without any rectification, DC for Diagonal Completion, AP for Alternating Projection, and Gibbs for Gibbs sampling.
Anchor objects should form a good basis for the remaining objects. We measure Recovery error with respect to the original matrix, not the rectified matrix. AP reduces error in almost all cases and is more effective than DC. Although we expect error to decrease as we increase the number of clusters $K$, reducing recovery error for a fixed $K$ by choosing better anchors is extremely difficult: no other subset selection algorithm [14] decreased error by more than 0.001. A good matrix factorization should have small elementwise Approximation error $\|C - BAB^{\top}\|$. DC and AP preserve more of the information in the original matrix than the Baseline method, especially when $K$ is small. (In the NYTimes corpus, even a small absolute error is large relative to individual entries, since each element of $C$ is tiny due to the number of normalized entries.) We expect non-trivial interactions between clusters, even when we do not explicitly model them as in [15]. Greater diagonal Dominancy indicates lower correlation between clusters. (Dominancy in the Songs corpus lacks Baseline results at some values of $K$ because dominancy is undefined if the algorithm picks a song that occurs at most once in each playlist as a basis object; in this case, the original construction of $C$, and hence of $A$, has a zero diagonal element, making dominancy NaN.) AP and Gibbs results are similar. We do not report held-out probability because we find that relative results are determined by user-defined smoothing parameters [13, 16].
Specificity measures how much each cluster is distinct from the corpus distribution. When anchors produce a poor basis, the conditional distribution of clusters given objects becomes uniform, making each cluster's object distribution similar to the corpus distribution. Inter-topic Dissimilarity counts the average number of objects in each cluster that do not occur in any other cluster's top 20 objects. Our experiments validate that AP and Gibbs yield comparably specific and distinct topics, while Baseline and DC simply repeat the corpus distribution as in Table 3. Coherence penalizes topics that assign high probability to top-ranked words that do not occur together frequently. AP produces results close to Gibbs sampling, and far from the Baseline and DC. While this metric correlates with human evaluation of clusters [17], “worse” coherence can actually be better because the metric does not penalize repetition [13].
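The exact metric definitions follow the descriptions above and the Appendix; the sketch below is only one plausible formulation, and the Frobenius norm for approximation error and the diagonal-mass ratio for dominancy are our assumptions rather than the paper's definitions:

```python
import numpy as np

def approximation_error(C, B, A):
    # Element-wise (Frobenius) error between the original C and its factorization.
    return np.linalg.norm(C - B @ A @ B.T)

def dominancy(A):
    # Assumed formulation: share of probability mass on the diagonal of A;
    # larger values indicate clusters that interact mostly with themselves.
    return np.trace(A) / A.sum()
```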
In semi-synthetic experiments [6], AP matches Gibbs sampling and outperforms the Baseline, but the discrepancies in topic quality metrics are smaller than in the real experiments (see the Appendix). We speculate that semi-synthetic data is more “well-behaved” than real data, explaining why these issues were not recognized previously.
5 Analysis of Algorithm
Why does AP work?
Before rectification, the diagonal of the empirical matrix $C$ may be far from correct. Bursty objects yield diagonal entries that are too large; extremely rare objects that occur at most once per document yield zero diagonals. Rare objects are problematic in general: the corresponding rows of $C$ are sparse and noisy, and these rows are likely to be selected by the pivoted QR. Because rare objects are likely to be anchors, the recovered cluster-interaction matrix is likely to be highly diagonally dominant and provides an uninformative picture of topic correlations. These problems are exacerbated when $K$ is small relative to the effective rank of $C$, so that an early choice of a poor anchor precludes a better choice later on, and when the number of documents is small, in which case the empirical $C$ is relatively sparse and strongly affected by noise. To mitigate this issue, [16] run an exhaustive grid search to find document frequency cutoffs that yield informative anchors. As model performance is inconsistent across cutoffs and the search requires cross-validation for each case, it is nearly impossible to find good heuristics for each dataset and number of topics.
Fortunately, a low-rank PSD matrix cannot have too many diagonally-dominant rows, since this violates the low-rank property. Nor can it have diagonal entries that are small relative to off-diagonals, since this violates positive semidefiniteness. Because the anchor word assumption implies that the nonnegative rank and the ordinary rank are the same, the AP algorithm ideally does not remove the information we wish to learn; rather, 1) the low-rank projection in AP suppresses the influence of a small number of noisy rows associated with rare words which may not be well correlated with the others, and 2) the PSD projection in AP recovers missing information in the diagonals. (As illustrated in the Dominancy panel of the Songs corpus in Figure 4, AP shows valid dominancies even at larger $K$, in contrast to the Baseline algorithm.)
Why does AP converge?
AP enjoys local linear convergence [10] if 1) the initial $C$ is near the convergence point, 2) $\mathcal{P}_{PSD}^{K}$ is super-regular at that point, and 3) strong regularity holds there. For the first condition, recall that we rectify $C$ by pushing it toward $C^{*}$, which is the ideal convergence point inside the intersection. Since $\mathbb{E}[C] = C^{*}$ as shown in (5), $C$ is close to $C^{*}$ as desired. The prox-regular sets (a set is prox-regular at a point if the projection onto it is locally unique near that point) are subsets of the super-regular sets, so prox-regularity of $\mathcal{P}_{PSD}^{K}$ at the convergence point is sufficient for the second condition. For a permutation-invariant set $\mathcal{M}$ of eigenvalue vectors, the spectral set of symmetric matrices is defined as $\lambda^{-1}(\mathcal{M}) = \{X : \lambda(X) \in \mathcal{M}\}$, and $\lambda^{-1}(\mathcal{M})$ is prox-regular if and only if $\mathcal{M}$ is prox-regular [18, Th. 2.4]. Let $\mathcal{M}$ be the set of nonnegative eigenvalue vectors with at most $K$ positive components, so that $\mathcal{P}_{PSD}^{K} = \lambda^{-1}(\mathcal{M})$. Since each element of $\mathcal{M}$ has at most $K$ positive components and all others are zero, the projection onto $\mathcal{M}$ is locally unique almost everywhere, satisfying the second condition almost surely. (As the intersection of the convex cone of PSD matrices and the smooth manifold of rank-$K$ matrices, $\mathcal{P}_{PSD}^{K}$ is a smooth manifold almost everywhere.)
Checking the third condition a priori is challenging, but we expect noise in the empirical $C$ to prevent an irregular solution, following the argument of Numerical Example 9 in [10]. We therefore expect AP to converge locally linearly, and we can verify local convergence of AP in practice: empirically, the ratio of average distances between two successive iterations stays below 1 on the NYTimes dataset (see the Appendix), and other datasets behave similarly. Note again that our rectified $C$ is the result of pushing the empirical $C$ toward the ideal $C^{*}$. Because the approximation factors of [6] are all computed based on how far $C$ and its co-occurrence shape can be from those of $C^{*}$, all provable guarantees of [6] hold better with our rectified $C$.
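A small self-contained sketch of this empirical check (it assumes a `step` function implementing one round of the three projections, e.g. one pass of the Section 3 rectification sketch); ratios consistently below 1 are what local linear convergence predicts:

```python
import numpy as np

def convergence_ratios(C, step, iters=150):
    """Track d_t = ||C_{t+1} - C_t||_F and return the successive ratios d_{t+1}/d_t.

    `step` applies one full round of the three projections to its argument."""
    cur = C.copy()
    dists = []
    for _ in range(iters):
        nxt = step(cur)
        dists.append(np.linalg.norm(nxt - cur))
        cur = nxt
    return [d2 / d1 for d1, d2 in zip(dists, dists[1:]) if d1 > 0]
```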
6 Related and Future Work
JSMF is a specific structure-preserving Nonnegative Matrix Factorization (NMF) that performs spectral inference. [19, 20] exploit a similar separable structure for NMF problems. To tackle hyperspectral unmixing problems, [21, 22] assume pure pixels, a separability equivalent in computer vision. In more general NMF without such structures, RESCAL [23] studies a tensorial extension of a similar factorization and SymNMF [24] infers $C \approx BB^{\top}$ rather than $BAB^{\top}$. For topic modeling, [25] performs spectral inference on the third-moment tensor, assuming topics are uncorrelated.
As the core of our algorithm is to rectify the input co-occurrence matrix, it can be combined with several recent developments. [16] proposes two regularization methods for recovering a better $B$. [13] nonlinearly projects the co-occurrence matrix to a low-dimensional space via t-SNE and achieves better anchors by finding the exact anchors in that space. [11] performs multiple random projections to low-dimensional spaces and recovers approximate anchors efficiently by a divide-and-conquer strategy. In addition, our work opens several promising research directions. How exactly do anchors found in the rectified space form better bases than ones found in the original space? Since the topic-topic matrix $A$ is again doubly nonnegative and joint-stochastic, can we learn super-topics in a multi-layered hierarchical model by recursively applying JSMF to the topic-topic co-occurrence?
Acknowledgments
This research is supported by NSF grant HCC:Large0910664. We thank Adrian Lewis for valuable discussions on AP convergence.
References
 [1] Alan Mislove, Bimal Viswanath, Krishna P. Gummadi, and Peter Druschel. You are who you know: Inferring user profiles in Online Social Networks. In Proceedings of the 3rd ACM International Conference of Web Search and Data Mining (WSDM’10), New York, NY, February 2010.
 [2] Shuo Chen, J. Moore, D. Turnbull, and T. Joachims. Playlist prediction via metric embedding. In ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD), pages 714–722, 2012.
 [3] Jeffrey Pennington, Richard Socher, and Christopher D. Manning. GloVe: Global vectors for word representation. In EMNLP, 2014.
 [4] Omer Levy and Yoav Goldberg. Neural word embedding as implicit matrix factorization. In NIPS, 2014.
 [5] S. Arora, R. Ge, and A. Moitra. Learning topic models – going beyond SVD. In FOCS, 2012.
 [6] Sanjeev Arora, Rong Ge, Yonatan Halpern, David Mimno, Ankur Moitra, David Sontag, Yichen Wu, and Michael Zhu. A practical algorithm for topic modeling with provable guarantees. In ICML, 2013.
 [7] T. Hofmann. Probabilistic latent semantic analysis. In UAI, pages 289–296, 1999.

 [8] D. Blei, A. Ng, and M. Jordan. Latent Dirichlet allocation. Journal of Machine Learning Research, pages 993–1022, 2003. Preliminary version in NIPS 2001.
 [9] James P. Boyle and Richard L. Dykstra. A method for finding projections onto the intersection of convex sets in Hilbert spaces. In Advances in Order Restricted Statistical Inference, volume 37 of Lecture Notes in Statistics, pages 28–47. Springer New York, 1986.
 [10] Adrian S. Lewis, D. R. Luke, and Jérôme Malick. Local linear convergence for alternating and averaged nonconvex projections. Foundations of Computational Mathematics, 9:485–513, 2009.
 [11] Tianyi Zhou, Jeff A Bilmes, and Carlos Guestrin. Divideandconquer learning by anchoring a conical hull. In Advances in Neural Information Processing Systems 27, pages 1242–1250, 2014.
 [12] T. L. Griffiths and M. Steyvers. Finding scientific topics. Proceedings of the National Academy of Sciences, 101:5228–5235, 2004.

 [13] Moontae Lee and David Mimno. Low-dimensional embeddings for interpretable anchor-based topic inference. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 1319–1328. Association for Computational Linguistics, 2014.
 [14] Mary E. Broadbent, Martin Brown, Kevin Penner, I. Ipsen, and R. Rehman. Subset selection algorithms: Randomized vs. deterministic. SIAM Undergraduate Research Online, 3:50–71, 2010.
 [15] D. Blei and J. Lafferty. A correlated topic model of science. Annals of Applied Statistics, pages 17–35, 2007.
 [16] Thang Nguyen, Yuening Hu, and Jordan BoydGraber. Anchors regularized: Adding robustness and extensibility to scalable topicmodeling algorithms. In Association for Computational Linguistics, 2014.
 [17] David Mimno, Hanna Wallach, Edmund Talley, Miriam Leenders, and Andrew McCallum. Optimizing semantic coherence in topic models. In EMNLP, 2011.
 [18] A. Daniilidis, A. S. Lewis, J. Malick, and H. Sendov. Proxregularity of spectral functions and spectral sets. Journal of Convex Analysis, 15(3):547–560, 2008.
 [19] Christian Thurau, Kristian Kersting, and Christian Bauckhage. Yes we can: simplex volume maximization for descriptive webscale matrix factorization. In CIKM’10, pages 1785–1788, 2010.
 [20] Abhishek Kumar, Vikas Sindhwani, and Prabhanjan Kambadur. Fast conical hull algorithms for near-separable nonnegative matrix factorization. CoRR, 2012.
 [21] José M. P. Nascimento and José M. Bioucas-Dias. Vertex component analysis: A fast algorithm to unmix hyperspectral data. IEEE Transactions on Geoscience and Remote Sensing, pages 898–910, 2005.

 [22] Cécile Gomez, H. Le Borgne, Pascal Allemand, Christophe Delacourt, and Patrick Ledru. N-FindR method versus independent component analysis for lithological identification in hyperspectral imagery. International Journal of Remote Sensing, 28(23):5315–5338, 2007.
 [23] Maximilian Nickel, Volker Tresp, and Hans-Peter Kriegel. A three-way model for collective learning on multi-relational data. In Proceedings of the 28th International Conference on Machine Learning (ICML-11), pages 809–816. ACM, 2011.
 [24] Da Kuang, Haesun Park, and Chris H. Q. Ding. Symmetric nonnegative matrix factorization for graph clustering. In SDM. SIAM / Omnipress, 2012.
 [25] Anima Anandkumar, Dean P. Foster, Daniel Hsu, Sham Kakade, and YiKai Liu. A spectral algorithm for latent Dirichlet allocation. In Advances in Neural Information Processing Systems 25: 26th Annual Conference on Neural Information Processing Systems 2012. Proceedings of a meeting held December 36, 2012, Lake Tahoe, Nevada, United States., pages 926–934, 2012.