Heuristics for Efficient Sparse Blind Source Separation

by   Christophe Kervazo, et al.

Sparse Blind Source Separation (sparse BSS) is a key method to analyze multichannel data in fields ranging from medical imaging to astrophysics. However, since it relies on seeking the solution of a non-convex penalized matrix factorization problem, its performances largely depend on the optimization strategy. In this context, Proximal Alternating Linearized Minimization (PALM) has become a standard algorithm which, despite its theoretical grounding, generally provides poor practical separation results. In this work, we propose a novel strategy that combines a heuristic approach with PALM. We show its relevance on realistic astrophysical data.


page 1

page 2

page 3


unrolling palm for sparse semi-blind source separation

Sparse Blind Source Separation (BSS) has become a well established tool ...

Minimum-volume Multichannel Nonnegative matrix factorization for blind source separation

Multichannel blind source separation aims to recover the latent sources ...

Sparse component separation from Poisson measurements

Blind source separation (BSS) aims at recovering signals from mixtures. ...

Formulating Beurling LASSO for Source Separation via Proximal Gradient Iteration

Beurling LASSO generalizes the LASSO problem to finite Radon measures re...

Lorentzian Peak Sharpening and Sparse Blind Source Separation for NMR Spectroscopy

In this paper, we introduce a preprocessing technique for blind source s...

Blind Source Separation: Fundamentals and Recent Advances (A Tutorial Overview Presented at SBrT-2001)

Blind source separation (BSS), i.e., the decoupling of unknown signals t...

Sifting Common Information from Many Variables

Measuring the relationship between any pair of variables is a rich and a...

1 Introduction

1.1 Blind Source Separation problem

In the BSS [1] framework, the data are composed of observations, each of which has samples. These observations are supposed to be some linear combinations of sources. The objective of BSS is to retrieve the sources as well as the mixing coefficients. In matrix form, the goal is therefore to find two matrices (of size ) and (of size ), called respectively the source and the mixing matrices, such that: , where (of size ) is the observation matrix that is corrupted with some unkwown noise . Since it requires tackling an ill-posed unsupervised matrix factorization problem, further assumptions are needed, including the statistical independance of the sources (ICA - [1]), the non-negativity of and [2]. In this work, we will focus on the sparsity of the sources [3, 4, 5, 6]. In this framework, sparse BSS will aim at finding a (local) minimum of:


The first data fidelity term promotes a faithful reconstruction of the data. The use of the Froebenius norm stems from the assumption of a Gaussian noise . The second term ensures that the columns of are all in the ball. This avoids degenerated and ( and ). The last term involving the Hadamard product enforces a sparsity constraint in a transformed domain (here, will be the starlet transform [7]). controls the trade-off between the data fidelity and the sparsity terms. It can be decomposed into where is a diagonal matrix of the regularization parameters and is a matrix used to introduce individual penalization coefficients in the context of reweighted [8].

1.2 Sparse BSS in practice

Since sparse BSS requires solving a penalized matrix factorization problem, the separation quality strongly depends on the optimization strategy. Different algorithmic frameworks have been used so far: projected Alternate Least-Square (ALS - [2]), PALM [10] and Block-Coordinate Descent (BCD - [9], which is not studied here due to a high computational burden):

  • PALM algorithm: PALM is an iterative algorithm, which alternates at each iteration between a proximal gradient step on and . PALM is proved to converge to a local minimum of (1) under mild conditions [10].

  • GMCA algorithm: the Generalized Morphological Component Analysis (GMCA - [3]) algorithm is based on projected ALS. At each iteration (k), the gradient step appearing in PALM is replaced by a multiplication by a pseudo-inverse. Compared to PALM, GMCA cannot be proved to converge. Furthermore, even when stabilizing the output of GMCA is not really minimizing (1).

In practice, the solution of sparse BSS methods is highly sensitive to the initial point and the values of the regularization parameters, which are generally tricky to tune without any first guess of the solution. Thanks to heuristics, GMCA benefits from an automatic thresholding strategy (see Sec.2.1

) which has been empirically shown to improve its robustness with respect to the initialization and give good estimations of

and . Such heuristics do not exist in PALM.

1.3 Contributions

While PALM is theoretically well rooted and yields rather fast minimization schemes (in contrast to BCD), it generally provides poor separation results. We show how PALM-based implementations can benefit from the information provided by heuristic approaches which are in practice more robust. The robustness of the proposed combined strategy is demonstrated on realistic astrophysical data.

2 Complexity of introducing heuristics in PALM

Building on the automatic thresholding strategy of GMCA, the goal of this part is to try to derive one for PALM.

2.1 Automatic parameter choice in GMCA

In GMCA, the threshold choice is performed computing the Median Absolute Deviation (MAD) for each currently estimated source . The corresponding threshold (which changes during the iterations) is set to a multiple of this value:


In this equation and in the remaining of this section,

was supposed to be the identity matrix without loss of generality.

is a positive number and the operator is computed row-by-row with for . Using this strategy, the threshold choice can be interpreted as a dense noise removal.

2.2 Introducing heuristics in PALM

Let us assume that the thresholds in PALM are computed the same way as in GMCA through Eq. (2) and that during the iterations the algorithm finds estimates that are close to both the true matrices and . Then, due to the assumption of sparsity:


Therefore, using the MAD enables a thresholding of a projection of the noise , which yields a similar interpretation as in GMCA. However, this interpretation requires that the algorithm finds good estimates of and during the iterations (contrary to GMCA, which does not require to find a good estimate of for the interpretation to be verified). In practice, using directly this strategy within a PALM algorithm therefore does not yield good estimates because when initialized with a random initialization the algorithm never becomes close to such good estimates.

3 Combining GMCA and PALM: a hybrid strategy

In this part, we propose to combine the best of GMCA and PALM in a two step approach. The algorithm comprehends a warm-up stage, in which GMCA is performed, followed by a refinement stage during which PALM is performed retaining as much information as possible coming from the warm-up stage.

3.1 Motivation and full description of the algorithm

Our approach is motivated by several remarks: i) PALM theoretical background: in particular, the 2-step algorithm is thus proved to converge and, once the weights estimated by the warm-up stage, it truly looks for a critical point of Eq. (1); ii) GMCA robustness with regards to initialization, which will lead to a robust 2-step algorithm; iii) Benefit from GMCA solution:

  • Since both and are close to and , they can be used to derive the thresholds using the MAD according to the previous section.

  • should already give a good approximation of the most prominent peaks. This can be exploited in the refinement stage through the introduction of reweighted L1 [8] using the reweighting matrix in problem (1): , with a small constant, the coefficient of corresponding to the ith line and jth column and the ith line of .

These remarks lead to the following 2-step algorithm:

Input : (data matrix) Random initialization and Warm-up stage: , = GMCA(,,) Refinement stage: , = PALM(,,) The initialization, thresholding strategy and reweighting information come from the warm-up stage.

4 Experiment on realistic data

4.1 Data description and experimental protocol

The goal of this part is to apply our algorithm on realistic data to show its efficiency. The sources come from simulations obtained from real data of Cassiopeia A supernova remnant. They each consists in a 2D image of resolution 128 128 pixels, supposed to be approximately sparse in the starlet domain. The mixing is performed through a

matrix drawn randomly following a standard normal distribution and modified to have unit columns. Its condition number is

. There is observations. To increase the realism of the data and further test the algorithm, we tried three relatively low SNR values: 10, 15 and 20 dB. is set to 3, which corresponds to a classical hypothesis in terms of Gaussian noise removal.

4.2 Empirical results

10 dB 15dB 20 dB
2 step 11.57 16.92 22.09
PALM 9.38 10.94 11.01
GMCA 8.87 12.15 15.87
EFICA -6.92 5.11 9.41
RNA -6.68 -5.49 6.27
Table 1: Average (10 different initializations) for 3 SNR values and 5 algorithms.

The mean of of the mixing matrix criterion [11] is displayed in Table 1. The 2-step approach always achieve better results than the two classical BSS algorithms with which we performed the comparison, namely Relative Newton Algorithm (RNA) and Eficient FastICA (EFICA). It also outerpeforms both GMCA and a PALM using directly the MAD heuristic, being always better by at least 2 dB than the best of them.
In addition to the results displayed in Fig. 1

, the standard deviation of

over different initializations is almost 0, which shows the robustness of the algorithm.


In this work, we introduce a 2-step strategy combining PALM with robust heuristic methods such as GMCA. Beyond improving the robustness of PALM-based implementations with respect to initialization, the regularization parameters can be automatically set in the proposed approach. Numerical experiments on realistic data demonstrate a high separation quality and good robustness on mixings with low SNR.


This work is supported by the European Community through the grant LENA (ERC StG - contract no. 678282).


  • [1] Pierre Comon and Christian Jutten.

    Handbook of Blind Source Separation: Independent component analysis and applications

    Academic Press, 2010.
  • [2] Nicolas Gillis and François Glineur. Accelerated multiplicative updates and hierarchical ALS algorithms for nonnegative matrix factorization. Neural Computation, 24(4):1085–1105, 2012.
  • [3] Jérôme Bobin, Jean-Luc Starck, Yassir Moudden, and Mohamed Jalal Fadili. Blind source separation: The sparsity revolution. Advances in Imaging and Electron Physics, 152(1):221–302, 2008.
  • [4] Michael Zibulevsky and Barak A Pearlmutter. Blind source separation by sparse decomposition in a signal dictionary. Neural Computation, 13(4):863–882, 2001.
  • [5] Alexander M Bronstein, Michael M Bronstein, Michael Zibulevsky, and Yehoshua Y Zeevi. Sparse ICA for blind separation of transmitted and reflected images. International Journal of Imaging Systems and Technology, 15(1):84–91, 2005.
  • [6] Yuanqing Li, Shun-Ichi Amari, Andrzej Cichocki, Daniel WC Ho, and Shengli Xie. Underdetermined blind source separation based on sparse representation. IEEE Transactions on Signal Processing, 54(2):423–437, 2006.
  • [7] Jean-Luc Starck, Fionn Murtagh, and Jalal M Fadili. Sparse Image and Signal Processing: Wavelets, Curvelets, Morphological Diversity. Cambridge University Press, 2010.
  • [8] Emmanuel J Candes, Michael B Wakin, and Stephen P Boyd. Enhancing sparsity by reweighted minimization. Journal of Fourier analysis and applications, 14(5-6):877–905, 2008.
  • [9] Paul Tseng. Convergence of a block coordinate descent method for nondifferentiable minimization. Journal of Optimization Theory and Applications, 109(3):475–494, 2001.
  • [10] Jérôme Bolte, Shoham Sabach, and Marc Teboulle. Proximal alternating linearized minimization for nonconvex and nonsmooth problems. Mathematical Programming, 146(1-2):459–494, 2014.
  • [11] Jerome Bobin, Jeremy Rapin, Anthony Larue, and Jean-Luc Starck. Sparsity and adaptivity for the blind separation of partially correlated sources. IEEE Transanctions on Signal Processing, 63(5):1199–1213, 2015.