Curve Registered Coupled Low Rank Factorization

02/09/2018 ∙ by Jeremy Emile Cohen, et al. ∙ 0

We propose an extension of the canonical polyadic (CP) tensor model where one of the latent factors is allowed to vary through data slices in a constrained way. The components of the latent factors, which we want to retrieve from data, can vary from one slice to another up to a diffeomorphism. We suppose that the diffeomorphisms are also unknown, thus merging curve registration and tensor decomposition in one model, which we call registered CP. We present an algorithm to retrieve both the latent factors and the diffeomorphism, which is assumed to be in a parametrized form. At the end of the paper, we show simulation results comparing registered CP with other models from the literature.

READ FULL TEXT VIEW PDF

Authors

page 1

page 2

page 3

page 4

This week in AI

Get the week's most popular data science and artificial intelligence research sent straight to your inbox every Saturday.

1 Introduction

Joint decomposition models such as the canonical polyadic (CP) tensor decomposition [4] allow to blindly extract patterns of underlying hidden phenomena from a block of data measurements based on their algebraic properties without statistical assumptions. Thanks to their uniqueness properties under mild conditions [4], tensor decompositions have been applied in many domains: neurosciences [1], chemometrics [21] and digital communications [20] to name a few.

To retrieve the latent patterns without statistical assumptions, the number of free parameters must be rather low (i.e. the number of latent patterns is small with respect to the data dimensions). For example, in the CP model for a 3-way data block, , each slice in one of the dimensions is approximated by a rank matrix decomposition: , where , , and is the diagonal matrix formed with the -th row of . Here the columns of these matrices are the latent patterns that we are searching for and the fundamental constraint is that the matrix factors and are exactly the same as varies. Clearly, the model for the slices in any of the 3-ways of the CP decomposition corresponds to a coupled matrix decomposition where the and matrix factors are shared. Other models with less stringent coupling constraints have been considered in the literature, for example, PARAFAC2 [7], Shift-PARAFAC [13, 6], soft non-negative matrix co-factorization [19] or probabilistic couplings [3].

In this paper, we are also interested in such a less constrained decomposition, where one of the matrix factors, for example, is allowed to change over the experimental parameter : . The components of the factor from one slice to another are all similar up to a diffeomorphism, that is up to local compression and dilations. This can be useful, for example, in ocular-artifact removal in EEG [17] where the coupled latent signals are related to different eye blinks or saccades, or in chromatography [2] where the latent components are time elution responses of chemical compounds on different chromatographic experiments. In both exemples, the patterns feature domain variations, that may differ at any given time but are similar after alignment through delay, local dilations and compressions.

Finding the diffeomorphisms, that is, the transformations of the arguments (time or space) of the latent curves, leading to an alignment is known in statistics as curve registration [15] and in signal processing as time warping [18]. In curve registration one may be interested in computing the structured average [10], i.e. an aligned mean curve, which serves as a template for trend analysis. In this paper, we are facing a different problem than in curve registration since the curves themselves are unknown latent functions. By merging both curve registration and CP decompositions, we expect that the factors obtained from the joint decomposition of each slice will be retrieved with an increased accuracy when compared with other methods which do not include fully the diffeomorphism coupling information, as in Shift-PARAFAC [6] and PARAFAC2 [13, 11].

In this work we propose to modify the well-known alternating least squares (ALS) algorithm for CP decomposition [4] to include a curve registration step on the factor containing domain variation. Closely related to our work, warped factor analysis (WFA) has been proposed in [8] where curve registration is explicitly carried out using a piecewise linear model for the diffeomorphism. In WFA, the template curve (i.e.

the structural average which is used as reference) is contained directly in the data, which is a fundamental difference with the proposed approach. In our work we extend WFA (i) to a generalized diffeomorphism model, and (ii) to have a less arbitrary template curve estimated from all latent patterns by searching for a structural average curve. To retrieve this structural average curve and the optimal diffeomorphisms, we follow an alternating approach similar to 

[22].

Notation:

Vectors are denoted in bold symbols , matrices as bold capital symbols . The -th entry of matrix is denoted , its -th column  or and the -th row . The transposition operator is denoted as .  is the composition operator: .

2 Curve registered decomposition models

In this section we present the curve registered decomposition model through a Bayesian estimation perspective. We present it in three steps: Section 2.1 develops the measurement model and its corresponding likelihood. Section 2.2 presents the registered CP derived from the maximum a posteriori (MAP) estimator of all unknown parameters (i.e. both measurement and coupling models). Finally Section 2.3

introduces a parametric model for the diffeomorphisms.

2.1 Measurement model: low-rank matrix decomposition model

Without loss of generality, we consider the data block to be a 3-way array, , such that 2-way measurement arrays () of size are available. Moreover, we suppose that each matrix is given by a rank- factorization plus a measurement noise term:

(1)

where the rank is supposed to be known and much smaller than the dimensions , and . The factor matrices , and are the unknown latent patterns to be retrieved and are noise matrices assumed to be independent from one another and with independent elements. Note that the factor matrix is shared across data slices . The elements 

of the noise matrices are assumed to be independent zero-mean normally distributed with a variance

: . Without further knowledge on a relationship relating factors , a natural way to retrieve the latent factors is through maximum likelihood estimation. This corresponds to the minimization of the cost function w.r.t. the factor matrices:

(2)

where stands for the Frobenius norm. Minimizing (2) actually corresponds to computing a low rank matrix factorization of the stacked matrices . Therefore, there is no guarantee that the retrieved patterns will be physically interpretable, since the model is not uniquely identifiable due to rotational ambiguity.

2.2 Registered CP from MAP formulation

In what follows, factors are supposed to be similar in shape but with variations on their domain. For example, consider that factors relate to time and that they are sampled versions of continuous-time signals: . We assume that the sampling grid points , with , are the same for all measurement matrices and we consider a normalized time period so that . For any of the underlying continuous signals, domain variation can be expressed as

(3)

where the functions representing the variation are diffeomorphisms from to . They are non-decreasing functions with and . Note that the signals play the role of common unknown reference shapes, and are zero mean white Gaussian processes independent for all different  and . This perturbation in the coupling model may be understood in two ways: 1) As some prior knowledge that the coupling relationship between factors is not exactly a warping. 2) As a variable splitting that makes the underlying optimization problem easier to solve. Indeed, if additional constraints are imposed on factors , for instance nonnegativity, we will show below that the estimation process can be cast as constrained least squares problem.

For discrete time samples and assuming are known, this approach implies that

are independent Gaussian random variables

, where is a known variance. With this prior, criterion (2) can be modified to obtain the following MAP cost function:

(4)

where the coupling term is introduced by the prior. The minimum of over all parameters yield the proposed model, coined Registered CP. The main difference with (2) is that the additional constraints are expected to solve the rotational ambiguity intrinsic to matrix factorizations.

It is worth noting that:

  • CP model: If are identity and if , then the model becomes a CP model obtained by stacking matrices along a third dimension.

  • Indeterminacy: An indeterminacy remains in determining canonical and , since for any given one can apply a common warping to all and obtain a different : . In other words, diffeomorphisms can only be obtained up to a common diffeomorphism.

  • Linear interpolation: In theory , and are functions of continuous time. In practice we work with discrete time. This means exact time transformations

    are not actually computed. Rather, transformed functions are obtained through linear interpolation.

2.3 Parametric model for the diffeomorphisms

In their non-parametric continuous-time form, the diffeomorphisms cannot be handled numerically. While it is possible to use dynamic programming to process these diffeomorphisms as non-parametric functions [22, 16], this is typically very sensitive to the noise and time consuming, specially if the dataset is large. Therefore, to simplify, we assume that these functions can be modeled with a parametric form, with a small number of parameters. Multiple parametrized forms for these functions exist. Here we focus on exponential maps.

Exponential maps:

Since

are also cumulative distribution functions, they can be defined through their derivatives, which are probability density functions,

i.e. they are positive and sum to one. We can define easily such functions by applying the exponential map to any function defined on . This approach is commonly found in curve registration [15, 9] and it is also referred as the log-derivative approach [12]. It leads to the following diffeomorphism:

(5)

The main purpose of this representation is that we can parametrize the functions without imposing monotonicity constraints. In particular, we can assume that all are linear combinations of functions :

(6)

where is the vector of parameters characterizing the diffeomorphism. The following particular cases are of interest:

B-splines: Function can be a B-splines with a fixed number of knots and degree.

Linear: If a simple linear function is used, with and , then the diffeomorphisms are

(7)

Constant: If we use a -th order B-splines basis then we obtain the parametrization used implicitly in [8].

3 Algorithm

This section describes the alternating algorithm approach to obtain the Registered CP model.

3.1 Multiway array decomposition algorithm

Given a previous update or guess of , one can minimize w.r.t. in an alternating approach using standard linear least squares, while factor can be retrieved by solving the following least squares problem:

(8)

where stands for taken at sampled times points using linear interpolation

3.2 Shape Alignment using exponential maps

From this point onwards, the diffeomorphisms are assumed to be well modelled as the previously introduced exponential maps . Given previous update of latent factors , what needs to be estimated are both the values of and the underlying . Thus, the following optimization problem needs to be solved for all :

(9)

Since estimating both the structured mean and is cumbersome, as suggested in [22], an alternating strategy is used. The following can be used independently as a very simple alignment algorithm summarized in Algorithm 1:

  1. Structure mean estimation : Given the values of , the structured averages are computed as the solutions of linear systems, namely

    (10)

    where is the interpolation matrix obtained by linear interpolation from the sampling grid to the warped sampling grid .

  2. Warping parameters estimation: Given , the criterion (9) becomes one dimensional problems. And even through it is highly non-convex in the general case, good values of can be computed using a grid search. Multiple strategies can then be used to refine the search space once convergence is achieved and we used in particular the Golden Search method [14]. In both cases, the cost of one evaluation is rather low since computing (9) requires a linear interpolation and multiplications, but evaluating the cost on a grid can be time consuming.

This algorithm should converge to a local minimum of the alignment cost function since the cost is reduced at each iteration and for each block of parameters.

  Input: Initial target , initial warping parameters , similar-shaped functions , regularization parameters .
  while residual is too large do
     Structure mean estimation: set as either the
        1. that minimized the residuals (first iteration)
        2. the solution to (10)
        3. initial target (inside a larger optimization scheme)
     Warping parameters estimation:
     if Residuals are higher than some threshold (coarse estimation) then
        Compute criterion (9) on a grid to define an interval surrounding the optimum.
     else
        Find the optimal in interval using Golden Search.
     end if
  end while
  Output: Estimated warping parameters and structured mean .
Algorithm 1 Alignment algorithm under parametrized diffeomorphisms.

3.3 Detailed 3-way algorithm

Joining the alternating least squares update of factors , , and with the alignment algorithm (Algorithm 1) leads to Algorithm 2, which is given below along with some implementation details. It can be easily adapted for constrained Registered CP by replacing the least squares solver with a constrained one: e.g., for nonnegative least squares, one can use the algorithm described in [5].

  Input: Data matrices , initial guesses , , , initial values.
  while Stopping criterion is not met do
      Solve and normalize column-wise with the norm
      , solve
      , solve optimization problem (8) and normalize column-wise with the norm,
      Use Algorithm 1 to align the previously estimated , and
      If necessary, increase the regularization parameters
  end while
  Output: Estimated factors , and , coupling parameters and .
Algorithm 2 Alternating least squares algorithm for Registered CP under parametrized diffeomorphisms.

Initialization: Due to the highly non-convex behavior of the cost function w.r.t. , a good initialization method is required. As a reasonable option, we used the factors given by a standard CP model fitting. Moreover, the initial values of are also very important, since large values put too much emphasis on the regularization terms, which implies factors not change much and the algorithm mostly fits and . Empirically, we used the following values for the values of at the first and second iterations:

(11)

where is the initial value of , is the estimate of after the first iteration, is a matrix containing stacked and SNR refers to the expected Signal to Noise ratio of the whole tensor data. We used in all following iterations.

Normalization: Columns of are normalized with norm, while the columns of are normalized with norm.

Case for all : It may happen that all components in have the same warping, for instance when the variability generating process affects the data uniformly across the sensors. Such an hypothesis is actually exploited also in [8] and is an underlying hypothesis of PARAFAC2. Formally, with parametrized diffeomorphisms, this means that for all . Then the alignment algorithm can be slightly modified to improve estimation accuracy since the number of parameters is reduced.

4 Experiments on simulated nonnegative data

In this section, the Registered CP model is tested on simulated nonnegative data and compared with similar state-of-the-art models, namely the Shift PARAFAC model and the PARAFAC2 model. Many data alignment models have been proposed in the literature, but only those two models align the factors directly inside the optimization process.

Simulation settings: After setting the rank , factors and

are drawn entry-wise from uniform distributions over

. A latent factor is generated column-wise using the exponential map, which mode is randomly determined but so that all modes do not overlap. The variances are also randomly determined. Then, are chosen using affine functions of the variable with random slope depending on the variable. Thus each component has its own warping range. Finally, the are generated from using exponential maps of parameters . Additive Gaussian noise variance is determined from a user-defined SNR using .

In the following experiment, the total reconstruction error on

(12)

is monitored over experiments. is the best permutation that matches columns of the estimated with the true . Note that in (12), the matrices are normalized column-wise using the norm. The rank is set to and data dimensions are . The Registered CP algorithm is initialized by the result of 100 iterations of standard alternating least squares.

Figure 1 shows for several SNR values and the various mentioned algorithms. Although PARAFAC2 algorithm should not perform well since it relies on the assumption that for all , it outperforms both the Shift-PARAFAC and the Registered CP model at high SNR values. However, on average, the Registered CP performs the best for medium and low SNR values. All algorithm feature a high variability in their outputs, thus indicating a high sensibility to the initialization. The fact that PARAFAC2 uses the best of several initializations is probably the reason why it performs best at high SNR.

Figure 1: : RCP - registered CP, SP - shift PARAFAC, P2 - PARAFAC2.

In order to study the dependence of on regularization parameters for Registered CP, in a second experiment, instead of the values suggested in equation (11), we fixed values or gridded over a multiplicative coefficient in front of the initial values:

(13)

Figure 2 shows the obtained results for realizations. It can be observed that finding a good set of regularization parameters is important to obtain better results on average. The good performance of the uncoupled matrix factorization algorithm (regularization set to 0) is due to the nonnegativity constraints applied on all factors. Nevertheless, using the Registered CP model, estimation performances on the are improved at both and . Variability however seems to increase alongside the amount of regularization.

Figure 2: : Left - , right - .

5 Conclusion

A new coupled tensor decomposition model is introduced, namely the Registered CP model, where factors on one mode are similar up to time contraction or dilatation. A specific class of diffeomorphisms is used to generate a decomposition algorithm that can identify both the factors and the latent coupling parameters. Simulations on synthetic data show encouraging results, but the Registered CP model is yet to be tested on actual data sets. Furthermore, in future works, the class of allowed diffeomorphisms should be enlarged.

References

  • [1] H. Becker, L. Albera, P. Comon, R. Gribonval, F. Wendling, and I. Merlet. Brain source imaging: from sparse to tensor models. IEEE Sig. Proc. Magazine, 32(6):100–112, November 2015.
  • [2] R. Bro, C. A. Andersson, and H. A. L. Kiers. PARAFAC2-Part II. modeling chromatographic data with retention time shifts. J. Chemometr., 13(3-4):295–309, 1999.
  • [3] R. Cabral Farias, J. E. Cohen, and P. Comon. Exploring multimodal data fusion through joint decompositions with flexible couplings. IEEE Trans. Signal Process., 64(18):4830–4844, 2016.
  • [4] P. Comon, X. Luciani, and A. L. F. De Almeida. Tensor decompositions, alternating least squares and other tales. J. Chemometr., 23(7-8):393–405, 2009.
  • [5] N. Gillis and F. Glineur. Accelerated multiplicative updates and hierarchical als algorithms for nonnegative matrix factorization. Neural Comput., 24(4):1085–1105, 2012.
  • [6] R. A. Harshman, S. Hong, and M. E. Lundy. Shifted factor analysis—Part I: Models and properties. J. Chemometr., 17(7):363–378, 2003.
  • [7] Richard A Harshman. PARAFAC2: Mathematical and technical notes. UCLA working papers in phonetics, 22(3044):122215, 1972.
  • [8] S. Hong. Warped factor analysis. J. Chemometr., 23(7-8):371–384, 2009.
  • [9] G. M. James.

    Curve alignment by moments.

    Ann. Appl. Stat., pages 480–501, 2007.
  • [10] A. Kneip and T. Gasser. Statistical tools to analyze data representing a sample of curves. The Annals of Statistics, pages 1266–1305, 1992.
  • [11] F. Marini and R. Bro. Scream: A novel method for multi-way regression problems with shifts and shape changes in one mode. Chemometr. Intell. Lab., 129:64–75, 2013.
  • [12] J. S. Marron, J. O. Ramsay, L. M. Sangalli, and A. Srivastava. Functional data analysis of amplitude and phase variation. Statistical Science, 30(4):468–484, 2015.
  • [13] M. Mørup, L. K. Hansen, S. M. Arnfred, L.-H. Lim, and K. H. Madsen. Shift-invariant multilinear decomposition of neuroimaging data. NeuroImage, 42(4):1439–1450, 2008.
  • [14] W. H. Press, S. A. Teukolsky, W. T. Vetterling, and B. P. Flannery. Numerical recipes 3rd edition: The art of scientific computing, volume 3. Cambridge university press Cambridge, 2007.
  • [15] J. O. Ramsay and X. Li. Curve registration. J. R. Stat. Soc. Series B Stat. Methodol., 60(2):351–363, 1998.
  • [16] B. Rivet and J. E. Cohen. Modeling time warping in tensor decomposition. In IEEE Sensor Array and Multichannel Signal Processing Workshop (SAM 2016), pages 1–5. IEEE, 2016.
  • [17] B. Rivet, M. Duda, A. Guérin-Dugué, C. Jutten, and P. Comon. Multimodal approach to estimate the ocular movements during EEG recordings: a coupled tensor factorization method. In Conf. Proc. IEEE Eng. Med. Biol. Soc., pages 6983–6986. IEEE, 2015.
  • [18] H. Sakoe and S. Chiba. Dynamic programming algorithm optimization for spoken word recognition. IEEE Trans. Acoust., Speech, Signal Process., 26(1):43–49, 1978.
  • [19] N. Seichepine, S. Essid, C. Févotte, and O. Cappé. Soft nonnegative matrix co-factorization. IEEE Trans. Signal Process., 62(22):5940–5949, 2014.
  • [20] N. D. Sidiropoulos, G. B. Giannakis, and R. Bro. Blind PARAFAC receivers for DS-CDMA systems. IEEE Trans. Signal Process., 48(3):810–823, 2000.
  • [21] A. Smilde, R. Bro, and P. Geladi. Multi-way analysis: applications in the chemical sciences. John Wiley & Sons, 2005.
  • [22] A. Srivastava, W. Wu, S. Kurtek, E. Klassen, and J. S. Marron. Registration of functional data using Fisher-Rao metric. arXiv preprint arXiv:1103.3817, 2011.