1 Introduction
1.1 Blind Source Separation problem
In the BSS [1] framework, the data are composed of m observations, each of which has t samples. These observations are supposed to be linear combinations of n sources. The objective of BSS is to retrieve the sources as well as the mixing coefficients. In matrix form, the goal is therefore to find two matrices S (of size n × t) and A (of size m × n), called respectively the source and the mixing matrices, such that X = AS + N, where X (of size m × t) is the observation matrix, corrupted with some unknown noise N. Since this requires tackling an ill-posed unsupervised matrix factorization problem, further assumptions are needed, such as the statistical independence of the sources (ICA [1]) or the nonnegativity of A and S [2]. In this work, we focus on the sparsity of the sources [3, 4, 5, 6]. In this framework, sparse BSS aims at finding a (local) minimum of:
\min_{A,S} \; \frac{1}{2}\|X - AS\|_F^2 + \iota_{\{\forall i,\; \|A^i\|_2 \le 1\}}(A) + \|\Lambda \odot (S \Phi^T)\|_1 \quad (1)
The first, data-fidelity term promotes a faithful reconstruction of the data; the use of the Frobenius norm stems from the assumption of a Gaussian noise N. The second term ensures that the columns of A all lie in the ℓ2 unit ball, which avoids degenerate solutions with ‖A‖ → ∞ and ‖S‖ → 0. The last term, involving the Hadamard product ⊙, enforces a sparsity constraint in a transformed domain (here, Φ will be the starlet transform [7]). Λ controls the trade-off between the data-fidelity and sparsity terms. It can be decomposed as Λ = Θ W, where Θ is a diagonal matrix of the regularization parameters and W is a matrix used to introduce individual penalization coefficients in the context of reweighted ℓ1 [8].
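As a concrete illustration, the objective of Eq. (1) can be evaluated numerically. The following is a minimal sketch (the function name, the Φ = identity default, and the feasibility check are illustrative assumptions, not code from the paper):

```python
import numpy as np

def sparse_bss_objective(X, A, S, Lam, Phi=None):
    """Evaluate the penalized objective of Eq. (1) (illustrative sketch).

    X (m x t): data, A (m x n): mixing, S (n x t): sources,
    Lam: entrywise regularization weights, Phi: sparsifying transform
    (identity if None).
    """
    # Data-fidelity term: the Gaussian-noise assumption leads to the
    # Frobenius norm
    fidelity = 0.5 * np.linalg.norm(X - A @ S, 'fro') ** 2
    # Transformed-domain sparsity with entrywise (Hadamard) weights
    SPhi = S if Phi is None else S @ Phi.T
    sparsity = np.sum(np.abs(Lam * SPhi))
    # Oblique constraint: every column of A must lie in the l2 unit ball
    feasible = bool(np.all(np.linalg.norm(A, axis=0) <= 1 + 1e-12))
    return fidelity + sparsity, feasible
```

With a noiseless X = AS and unit-norm columns, the objective reduces to the weighted ℓ1 term alone, which makes the role of each term easy to check.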
1.2 Sparse BSS in practice
Since sparse BSS requires solving a penalized matrix factorization problem, the separation quality strongly depends on the optimization strategy. Different algorithmic frameworks have been used so far: projected Alternating Least-Squares (ALS [2]), PALM [10] and Block-Coordinate Descent (BCD [9], which is not studied here due to its high computational burden):

GMCA algorithm: the Generalized Morphological Component Analysis (GMCA [3]) algorithm is based on projected ALS. At each iteration (k), the gradient step appearing in PALM is replaced by a multiplication by a pseudo-inverse. In contrast to PALM, GMCA cannot be proved to converge. Furthermore, even when it stabilizes, the output of GMCA does not truly minimize (1).
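One projected-ALS iteration in the spirit of GMCA can be sketched as follows (a simplified illustration, not the reference implementation; the fixed per-source thresholds passed as an argument are an assumption, since GMCA updates them at each iteration):

```python
import numpy as np

def soft_threshold(x, lam):
    """Entrywise soft-thresholding (proximal operator of the l1 norm)."""
    return np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)

def gmca_iteration(X, A, thresholds):
    """One projected-ALS step in the spirit of GMCA [3] (sketch).

    The PALM gradient steps are replaced by pseudo-inverses: S is
    re-estimated by least squares and soft-thresholded, then A is
    re-estimated and its columns projected onto the l2 unit ball.
    """
    # Least-squares source update, then sparsity via soft-thresholding
    S = soft_threshold(np.linalg.pinv(A) @ X, thresholds[:, None])
    # Least-squares mixing update
    A = X @ np.linalg.pinv(S)
    # Project each column onto the l2 unit ball (avoids degenerate scalings)
    norms = np.maximum(np.linalg.norm(A, axis=0), 1.0)
    return A / norms, S
```

Because each update uses a pseudo-inverse instead of a proximal gradient step, no descent property holds, which is why the resulting scheme has no convergence guarantee.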
In practice, the solution of sparse BSS methods is highly sensitive to the initial point and to the values of the regularization parameters, which are generally tricky to tune without any first guess of the solution. Thanks to heuristics, GMCA benefits from an automatic thresholding strategy (see Sec. 2.1) which has been empirically shown to improve its robustness with respect to the initialization and to give good estimations of A and S. No such heuristics exist for PALM.

1.3 Contributions
While PALM is theoretically well rooted and yields rather fast minimization schemes (in contrast to BCD), it generally provides poor separation results. We show how PALM-based implementations can benefit from the information provided by heuristic approaches, which are more robust in practice. The robustness of the proposed combined strategy is demonstrated on realistic astrophysical data.
2 Complexity of introducing heuristics in PALM
Building on the automatic thresholding strategy of GMCA, the goal of this part is to derive a similar strategy for PALM.
2.1 Automatic parameter choice in GMCA
In GMCA, the threshold choice is performed by computing the Median Absolute Deviation (MAD) of each currently estimated source S_i^(k). The corresponding threshold (which changes during the iterations) is set to a multiple of this value:
\Lambda_i^{(k)} = k \, \mathrm{mad}\big(S_i^{(k)}\big) \quad (2)
In this equation and in the remainder of this section, Φ is supposed to be the identity matrix, without loss of generality. k is a positive number and the mad operator is computed row-by-row, with mad(S_i) = median(|S_i − median(S_i)|) for each estimated source S_i. Using this strategy, the threshold choice can be interpreted as a dense noise removal.

2.2 Introducing heuristics in PALM
Let us assume that the thresholds in PALM are computed in the same way as in GMCA through Eq. (2), and that during the iterations the algorithm finds estimates that are close to both the true matrices A and S. Then, due to the sparsity assumption:
S^{(k)} - \frac{1}{L}\,\big(A^{(k)}\big)^T\big(A^{(k)} S^{(k)} - X\big) \simeq S + \frac{1}{L} A^T N \quad (3)
Therefore, using the MAD enables a thresholding of a projection of the noise N, which yields an interpretation similar to the one in GMCA. However, this interpretation requires the algorithm to find good estimates of A and S during the iterations (contrary to GMCA, which does not require a good estimate of S for the interpretation to hold). In practice, directly using this strategy within a PALM algorithm therefore does not yield good estimates: when started from a random initialization, the algorithm never comes close to such estimates.
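The interplay between Eqs. (2) and (3) can be checked numerically. The following is a minimal sketch (the matrix sizes, sparsity level, noise scale and k = 3 are illustrative assumptions): when the current estimates equal the true A and S, the PALM gradient-step argument reduces to S plus a projection of the noise, so a row-by-row MAD threshold acts as a dense noise removal.

```python
import numpy as np

def mad(x):
    """Median absolute deviation: median(|x - median(x)|)."""
    return np.median(np.abs(x - np.median(x)))

rng = np.random.default_rng(0)
m, n, t = 8, 3, 5000
A = rng.normal(size=(m, n))
A = A / np.linalg.norm(A, axis=0)          # unit l2 columns
S = rng.normal(size=(n, t)) * (rng.random((n, t)) < 0.05)  # sparse sources
N = 0.01 * rng.normal(size=(m, t))
X = A @ S + N

L = np.linalg.norm(A.T @ A, 2)             # Lipschitz constant of the S-gradient
step_arg = S - (A.T @ (A @ S - X)) / L
# With exact estimates, the step argument is S + (A^T N) / L, as in Eq. (3)
assert np.allclose(step_arg, S + (A.T @ N) / L)

# Row-by-row MAD thresholds (Eq. (2)): on the zero entries of S, the step
# argument is pure projected noise, and k = 3 removes most of it
thresholds = np.array([3 * mad(step_arg[i]) for i in range(n)])
noise_only = step_arg[S == 0]
assert np.mean(np.abs(noise_only) < thresholds.max()) > 0.9
```

The sketch only verifies the interpretation under perfect estimates; as discussed above, a randomly initialized PALM never reaches this regime, which is what motivates the hybrid strategy of the next section.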
3 Combining GMCA and PALM: a hybrid strategy
In this part, we propose to combine the best of GMCA and PALM in a two-step approach. The algorithm comprises a warm-up stage, in which GMCA is performed, followed by a refinement stage in which PALM is performed while retaining as much information as possible from the warm-up stage.
3.1 Motivation and full description of the algorithm
Our approach is motivated by several remarks: i) PALM's theoretical background: in particular, the 2-step algorithm is thus proved to converge and, once the weights are estimated by the warm-up stage, it truly looks for a critical point of Eq. (1); ii) GMCA's robustness with regard to initialization, which leads to a robust 2-step algorithm; iii) the benefits of the GMCA solution:

Since both warm-up estimates of A and S are close to the true matrices, they can be used to derive the thresholds using the MAD, according to the previous section.

The warm-up source estimate S' should already give a good approximation of the most prominent peaks. This can be exploited in the refinement stage through the introduction of reweighted ℓ1 [8] using the reweighting matrix W in problem (1): W_{i,j} = 1/(ε + |S'_{i,j}| / ‖S'_i‖_∞), with ε a small constant, S'_{i,j} the coefficient of S' corresponding to the i-th line and j-th column, and S'_i the i-th line of S'.
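The reweighting rule above can be sketched as follows (a minimal sketch in the spirit of the reweighted ℓ1 scheme of [8]; the default ε and the per-line normalization by the largest coefficient are illustrative assumptions):

```python
import numpy as np

def reweighting_matrix(S_warmup, eps=1e-3):
    """Reweighted-l1 weights derived from the warm-up source estimate.

    Large warm-up coefficients (prominent source peaks) receive small
    weights and are therefore penalized less in the refinement stage.
    """
    mag = np.abs(S_warmup)
    # Normalize each line by its largest coefficient so that eps is
    # scale-free, then invert: W_ij is small where |S_ij| is large
    scale = np.maximum(mag.max(axis=1, keepdims=True), 1e-12)
    return 1.0 / (eps + mag / scale)
```

In this way the refinement stage preserves the peaks already identified by GMCA while still shrinking the small, noise-dominated coefficients.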
These remarks lead to the following 2step algorithm:
Input: X (data matrix)
Random initialization of A^(0) and S^(0)
Warm-up stage: A^(w), S^(w) = GMCA(X, A^(0), S^(0))
Refinement stage: A^*, S^* = PALM(X, A^(w), S^(w))
The initialization, thresholding strategy and reweighting information of the refinement stage come from the warm-up stage.
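The refinement stage can be sketched as follows (a minimal sketch of the proximal alternating linearized minimization of [10], with Φ = identity, fixed weights inherited from the warm-up stage, and a fixed iteration count; step sizes from the Lipschitz constants are standard but the guards against zero norms are implementation assumptions):

```python
import numpy as np

def soft(x, lam):
    """Entrywise soft-thresholding (proximal operator of the l1 norm)."""
    return np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)

def palm_refinement(X, A, S, Lam, n_iter=100):
    """PALM refinement stage (sketch), warm-started at (A, S)."""
    for _ in range(n_iter):
        # Proximal gradient step on S, step size 1 / ||A^T A||_2
        Ls = max(np.linalg.norm(A.T @ A, 2), 1e-12)
        S = soft(S - (A.T @ (A @ S - X)) / Ls, Lam / Ls)
        # Proximal gradient step on A, then projection of each column
        # onto the l2 unit ball
        La = max(np.linalg.norm(S @ S.T, 2), 1e-12)
        A = A - ((A @ S - X) @ S.T) / La
        A = A / np.maximum(np.linalg.norm(A, axis=0), 1.0)
    return A, S
```

Warm-started close to a good solution with weights fixed once and for all, each iteration is a genuine proximal gradient step on Eq. (1), which is what restores the convergence guarantee lost by GMCA.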
4 Experiment on realistic data
4.1 Data description and experimental protocol
The goal of this part is to apply our algorithm to realistic data to show its efficiency. The sources come from simulations based on real data of the Cassiopeia A supernova remnant. Each of them consists of a 2D image of 128 × 128 pixels, supposed to be approximately sparse in the starlet domain. The mixing is performed through a matrix A drawn randomly from a standard normal distribution and modified to have unit columns. Its condition number is . There are observations. To increase the realism of the data and further test the algorithm, we tried three relatively low SNR values: 10, 15 and 20 dB. k is set to 3, which corresponds to a classical hypothesis in terms of Gaussian noise removal.

4.2 Empirical results
Table 1: Mean mixing matrix criterion (in dB).

          10 dB   15 dB   20 dB
2-step    11.57   16.92   22.09
PALM       9.38   10.94   11.01
GMCA       8.87   12.15   15.87
EFICA      6.92    5.11    9.41
RNA        6.68    5.49    6.27
The mean of the mixing matrix criterion [11] is displayed in Table 1. The 2-step approach always achieves better results than the two classical BSS algorithms with which we performed the comparison, namely the Relative Newton Algorithm (RNA) and Efficient FastICA (EFICA). It also outperforms both GMCA and a PALM algorithm directly using the MAD heuristic, being always better by at least 2 dB than the best of them.
In addition to the results displayed in Fig. 1, the standard deviation of the mixing matrix criterion over different initializations is almost 0, which shows the robustness of the algorithm.

Conclusion
In this work, we introduce a 2-step strategy combining PALM with robust heuristic methods such as GMCA. Beyond improving the robustness of PALM-based implementations with respect to initialization, the regularization parameters can be automatically set in the proposed approach. Numerical experiments on realistic data demonstrate a high separation quality and good robustness on mixtures with low SNR.
Acknowledgment
This work is supported by the European Community through the grant LENA (ERC StG  contract no. 678282).
References

[1] Pierre Comon and Christian Jutten. Handbook of Blind Source Separation: Independent Component Analysis and Applications. Academic Press, 2010.
[2] Nicolas Gillis and François Glineur. Accelerated multiplicative updates and hierarchical ALS algorithms for nonnegative matrix factorization. Neural Computation, 24(4):1085–1105, 2012.
[3] Jérôme Bobin, Jean-Luc Starck, Yassir Moudden, and Mohamed Jalal Fadili. Blind source separation: The sparsity revolution. Advances in Imaging and Electron Physics, 152(1):221–302, 2008.
[4] Michael Zibulevsky and Barak A. Pearlmutter. Blind source separation by sparse decomposition in a signal dictionary. Neural Computation, 13(4):863–882, 2001.
[5] Alexander M. Bronstein, Michael M. Bronstein, Michael Zibulevsky, and Yehoshua Y. Zeevi. Sparse ICA for blind separation of transmitted and reflected images. International Journal of Imaging Systems and Technology, 15(1):84–91, 2005.
[6] Yuanqing Li, Shun-Ichi Amari, Andrzej Cichocki, Daniel W. C. Ho, and Shengli Xie. Underdetermined blind source separation based on sparse representation. IEEE Transactions on Signal Processing, 54(2):423–437, 2006.
[7] Jean-Luc Starck, Fionn Murtagh, and Jalal M. Fadili. Sparse Image and Signal Processing: Wavelets, Curvelets, Morphological Diversity. Cambridge University Press, 2010.
[8] Emmanuel J. Candès, Michael B. Wakin, and Stephen P. Boyd. Enhancing sparsity by reweighted ℓ1 minimization. Journal of Fourier Analysis and Applications, 14(5-6):877–905, 2008.
[9] Paul Tseng. Convergence of a block coordinate descent method for nondifferentiable minimization. Journal of Optimization Theory and Applications, 109(3):475–494, 2001.
[10] Jérôme Bolte, Shoham Sabach, and Marc Teboulle. Proximal alternating linearized minimization for nonconvex and nonsmooth problems. Mathematical Programming, 146(1-2):459–494, 2014.
[11] Jérôme Bobin, Jérémy Rapin, Anthony Larue, and Jean-Luc Starck. Sparsity and adaptivity for the blind separation of partially correlated sources. IEEE Transactions on Signal Processing, 63(5):1199–1213, 2015.