I. Introduction
I-A. Functional brain connectivity
The analysis of neural connectivity plays a crucial role in understanding the general functioning of the brain. In the past two decades such analysis has become possible thanks to tremendous progress in the fields of neuroimaging and mathematical modeling. Today, a multiplicity of imaging modalities exists, making it possible to monitor brain dynamics at different spatial and temporal scales.
Given multiple simultaneously recorded time series reflecting neural activity in different brain regions, a functional (task-related) connection (sometimes also called information flow or (causal) interaction in this paper) between two regions is commonly inferred if a significant time-lagged influence between the corresponding time series is found. Different measures have been proposed for quantifying this influence, most of them formulated either in terms of the cross-spectrum (e.g., coherence, phase slope index [1]) or of autoregressive models (e.g., Granger causality [2], directed transfer function [3], partial directed coherence [4], [5]).

I-B. Volume conduction problem in EEG and MEG
In electroencephalography (EEG) and magnetoencephalography (MEG), sensors are placed outside the head and the problem of volume conduction arises. That is, rather than measuring activity of only one brain site, each sensor captures a linear superposition of signals from all over the brain. This mixing introduces instantaneous correlations in the data, which can cause traditional analyses to detect spurious connectivity [6].
I-C. Existing source connectivity analyses
Only recently have methods been proposed that qualify for EEG/MEG connectivity analysis because they account for volume conduction effects. These methods can roughly be divided as follows.
One type of method aims at providing meaningful connectivity estimates between sensors. The idea here is that only the real part of the cross-spectrum and related quantities is affected by instantaneous effects. Thus, by using only the imaginary part, many traditional coupling measures can be made robust against volume conduction [6, 1].
Another group of methods attempts to invert the mixing process in order to apply standard measures to the obtained source estimates. These methods can be further divided into (i) source-localization approaches (where sources are obtained as solutions to the EEG/MEG inverse problem), (ii) methods using statistical assumptions, and (iii) combined methods. The first approach is pursued, for example, in [7, 8]. Methods in the second category can be appealing, since they avoid an explicit inversion of the physical forward model. Instead, both the sources and the (de)mixing transformation are estimated. To make such a decomposition unique, assumptions have to be formulated, the choice of which is not straightforward. We now briefly review some possibilities for such assumptions.
Principal component analysis (PCA) and independent component analysis (ICA) are the most prominent linear decomposition techniques for multivariate data. Unfortunately, their assumptions contradict either the goal of EEG/MEG connectivity analysis (independence of the sources in ICA, although under some circumstances this assumption can be justified [9]) or even the physics underlying EEG/MEG generation (orthogonality of the loadings in PCA). Nevertheless, both concepts have been successfully used in more sophisticated ways to find meaningful EEG/MEG decompositions.
For example, an interesting use of ICA is proposed in [10]. The authors of that paper do not assume independence of the source traces, but rather argue that this property holds for the residuals of an MVAR model if no instantaneous correlations exist in the data. Hence, in their MVAR-ICA approach they apply ICA to the residuals of a sensor-space MVAR model.
In this work, we first propose a single-step procedure to estimate all parameters (i.e., the mixing matrix and the MVAR coefficients) of the linear mixing model of MVAR sources [10] based on temporal-domain convolutive ICA, instead of the combination of MVAR parameter fitting and demixing by instantaneous ICA. Furthermore, the approach enables us to integrate a sparsity assumption on brain connectivity, i.e., on the interactions between underlying brain sources, via the Group Lasso penalty. The additional sparsity prior can avoid overfitting in practical applications and yields more interpretable estimates of brain connectivity. We remark that it is hard to incorporate such sparsity priors into MVAR-ICA, since there the MVAR model is fit to the sensor signals, whose interactions (i.e., MVAR coefficients) are not at all sparse due to volume conduction.

The remainder of the paper is organized as follows. In Section II, our procedure is explained step by step. The correlated sources model assumed in this paper is defined in Section II-B. The identification procedure, called connected sources analysis (CSA) and based on convolutive ICA, is introduced in Section II-C, followed by its sparse version, sparse connected sources analysis (SCSA), with the Group Lasso prior (Section II-D). The relations of our methods to existing approaches such as MVAR-ICA [10] and CICAAR [11] are elucidated in detail in Section II-E. Finally, the optimization algorithms for CSA and SCSA are explained in Section II-F. We implemented two versions of SCSA, one based on L-BFGS and one based on an EM algorithm, which is slower but numerically more stable. Section III provides our experimental results on simulated data sequences emulating realistic EEG recordings. The plausibility of our correlated sources model is discussed together with future research directions in the context of computational neuroscience (Section IV), followed by the concluding remarks (Section V).
II. Connected Sources Analysis with Sparsity Prior
II-A. MVAR for modeling causal interactions
Autoregressive (AR) models are frequently used to define directed "Granger-causal" relations between time series. The original procedure by Granger involves the comparison of two models for predicting a time series x: one containing past values of both x and a second time series y, and one containing past values of x only [2]. If the inclusion of y leads to a lower prediction error, (Granger-causal) information flow from y to x is inferred. Since this may lead to spurious detection of causality if both x and y are driven by a common confounder, it is advisable to include the set of all other observable time series in both models.
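Granger's comparison of a restricted and a full prediction model can be sketched in a few lines of numpy. The coupling coefficients, model order and the log-ratio criterion below are illustrative choices, not taken from the text:

```python
import numpy as np

rng = np.random.default_rng(0)
T, P = 2000, 2

# Simulate two coupled AR(2) series: y is driven by past values of x.
x = np.zeros(T); y = np.zeros(T)
for t in range(P, T):
    x[t] = 0.6 * x[t-1] - 0.3 * x[t-2] + rng.standard_normal()
    y[t] = 0.5 * y[t-1] + 0.4 * x[t-1] + rng.standard_normal()

def ar_residual_var(target, predictors, P=2):
    """Least-squares AR fit; returns the residual variance of `target`."""
    rows = [np.concatenate([s[t-P:t] for s in predictors])
            for t in range(P, len(target))]
    X = np.asarray(rows)
    b, *_ = np.linalg.lstsq(X, target[P:], rcond=None)
    return np.var(target[P:] - X @ b)

# x -> y: adding x's past should reduce y's prediction error ...
gc_xy = np.log(ar_residual_var(y, [y]) / ar_residual_var(y, [y, x]))
# ... while y -> x should show (almost) no improvement.
gc_yx = np.log(ar_residual_var(x, [x]) / ar_residual_var(x, [x, y]))
print(gc_xy > gc_yx)  # True
```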
It has been pointed out that pairwise analysis can be replaced by fitting one multivariate autoregressive (MVAR) model to the whole dataset, and that Granger-causal inference can be performed based on the estimated MVAR model coefficients (e.g., [12, 5]). Several connectivity measures are derived from the MVAR coefficients [3, 4], but probably the following definition is most straightforward from Granger's argument that the cause should always precede the effect. We say that time series x_j has a causal influence on time series x_i if the present and past of the combined time series x_i and x_j predict the future of x_i better than the present and past of x_i alone. In the bivariate case this is equivalent to saying that, for at least one time lag p, the MVAR coefficient linking x_j to x_i at the p-th lag is nonzero (significantly different from zero). In the multivariate case, Granger causality also includes indirect causes that are not contained in nonvanishing coefficients.

II-B. Correlated sources model
In this paper we propose a method for demixing the EEG/MEG signal into causally interacting sources. We start from the same model as in [10]: the sensor measurement is assumed to be generated as a linear instantaneous mixture of sources, which follow an MVAR model
(1)  x(t) = A s(t)

(2)  s(t) = \sum_{p=1}^{P} B^{(p)} s(t-p) + \varepsilon(t)
Here, x(t) is the EEG/MEG signal at time t, A is a mixing matrix representing the volume conduction effect, and s(t) is the demixed (source) signal. The sources at time t are modeled as a linear combination of their past values plus an innovation term \varepsilon(t), according to an MVAR model with coefficient matrices B^{(p)}, p = 1, ..., P. In standard MVAR analysis, the innovation is a temporally and spatially uncorrelated Gaussian sequence. In contrast, we assume here that it is i.i.d. in time and that its components follow non-Gaussian distributions, in order to apply blind source separation (BSS) techniques based on higher-order statistics [10, 11].

For simplicity, we deal with the case that the numbers of sensors and sources are equal and the mixing matrix A is invertible. When there are fewer sources than sensors, the problem reduces to the current setting after preprocessing by PCA [10]. Under our model assumptions, the innovation sequence can be obtained by finite impulse response (FIR) filtering of the observations, i.e.
(3)  \varepsilon(t) = s(t) - \sum_{p=1}^{P} B^{(p)} s(t-p)

(4)  \varepsilon(t) = \sum_{p=0}^{P} W^{(p)} x(t-p),
where the filter coefficients W^{(p)} are determined by the mixing matrix A and the MVAR parameters as
(5)  W^{(0)} = A^{-1},   W^{(p)} = -B^{(p)} A^{-1}   (p = 1, ..., P).
Thanks to the non-Gaussianity assumption on the innovations \varepsilon(t), we can use BSS techniques based on higher-order statistics to identify the inverse filter given by the W^{(p)}. Since we would like to impose sparse connectivity as plausible prior information later on, it is preferable to apply temporal-domain convolutive ICA algorithms. The obtained FIR coefficients directly identify the mixing matrix A and the MVAR model of the same order P.
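The correlated sources model and the FIR relation can be checked numerically. The following numpy sketch assumes the reconstruction W^{(0)} = A^{-1}, W^{(p)} = -B^{(p)} A^{-1} of Eq. (5); dimensions, coefficient scales and seeds are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(1)
M, P, T = 3, 2, 500

# Random MVAR coefficients and a random invertible mixing matrix.
B = [0.3 * rng.standard_normal((M, M)) for _ in range(P)]
A = rng.standard_normal((M, M))

# Innovations drawn from the sech-density p(e) = sech(e)/pi via inverse CDF.
u = rng.uniform(0, 1, (T, M))
eps = np.log(np.tan(np.pi * u / 2))

s = np.zeros((T, M))
for t in range(T):
    s[t] = eps[t] + sum(B[p] @ s[t - 1 - p] for p in range(P) if t - 1 - p >= 0)
x = s @ A.T                                    # x(t) = A s(t), Eq. (1)

# FIR filter of Eq. (5): W(0) = A^{-1}, W(p) = -B(p) A^{-1}.
Ainv = np.linalg.inv(A)
W = [Ainv] + [-B[p] @ Ainv for p in range(P)]

# Applying the filter to the observations recovers the innovations, Eq. (4).
eps_hat = np.array([sum(W[p] @ x[t - p] for p in range(P + 1)) for t in range(P, T)])
print(np.allclose(eps_hat, eps[P:]))  # True
```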
II-C. Identification by convolutive ICA
We use temporal-domain convolutive ICA for inferring volume conduction effects and causal interactions between the extracted brain signals. The model parameters can be identified under the mild assumptions that the innovations are non-Gaussian and (spatially and temporally) independent. For EEG and MEG data, a super-Gaussian distribution is preferred to a sub-Gaussian one, assuming that ongoing activity of brain networks is triggered by spontaneous local bursts. We here adopt the super-Gaussian sech-distribution that was proposed in [11]. The likelihood of the data under the model is then
(6)  L = \prod_{t} |\det W^{(0)}| \prod_{i=1}^{M} \frac{1}{\pi \cosh(\varepsilon_i(t))},

where \varepsilon(t) = \sum_{p=0}^{P} W^{(p)} x(t-p). The cost function to be minimized is the negative log-likelihood
(7)  \mathcal{L}(\{W^{(p)}\}) = -T' \log|\det W^{(0)}| + \sum_{t} \sum_{i=1}^{M} \log\cosh(\varepsilon_i(t)) + \mathrm{const},

where T' denotes the number of time points entering the sums.
The solution of Eq. (7) yields estimators of the mixing matrix A and of the MVAR coefficients via Eq. (5). We call this procedure Connected Sources Analysis (CSA).
We remark that the temporal-domain formulation of convolutive ICA has the obvious indeterminacies of permutations and sign flips. However, once we fix a rule to choose one among all candidates, the cost function can be considered convex.
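The CSA cost, i.e. the negative log-likelihood under the sech-density, can be sketched as follows. The simulation parameters are arbitrary, and the comparison merely illustrates that the true inverse filter scores better than a random one:

```python
import numpy as np

rng = np.random.default_rng(2)
M, P, T = 2, 1, 2000

B1 = np.array([[0.5, 0.3], [0.0, 0.5]])       # one MVAR lag, stable
A = rng.standard_normal((M, M))
eps = np.log(np.tan(np.pi * rng.uniform(size=(T, M)) / 2))  # sech innovations
s = np.zeros((T, M))
for t in range(1, T):
    s[t] = B1 @ s[t - 1] + eps[t]
x = s @ A.T

def csa_cost(W, x):
    """Negative log-likelihood of Eq. (7) for a FIR filter W = [W0, ..., WP]."""
    P = len(W) - 1
    e = np.array([sum(W[p] @ x[t - p] for p in range(P + 1)) for t in range(P, len(x))])
    Tp = len(e)
    return (-Tp * np.log(abs(np.linalg.det(W[0])))
            + np.log(np.cosh(e)).sum() + Tp * e.shape[1] * np.log(np.pi))

Ainv = np.linalg.inv(A)
W_true = [Ainv, -B1 @ Ainv]                   # filter of Eq. (5)
W_rand = [rng.standard_normal((M, M)) for _ in range(2)]
print(csa_cost(W_true, x) < csa_cost(W_rand, x))  # True
```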
II-D. Sparse connectivity as regularization
In practice, we usually have to consider long-range time lags to explain the temporal structure of the data sequences. However, this leads to too many parameters to be estimated reliably. Maximum-likelihood estimation may then easily overfit, especially if the number of samples is small. For this reason, it is advisable to adopt a regularization scheme. Several authors have suggested that the complexity of MVAR models can be reduced by shrinking MVAR coefficients towards zero. In [12] and [13], MVAR-based functional brain connectivity is estimated from functional magnetic resonance imaging (fMRI) recordings using an \ell_1-norm-based (Lasso) penalty, which has the property of shrinking some coefficients exactly to zero. In [5] it is pointed out that, by using a so-called Group Lasso penalty, whole connections between time series can be pruned at once. In this approach, all coefficients modeling the information flow from one time series to another are grouped together and can only be pruned jointly. From a practical standpoint such sparsification is very appealing, since fewer connections are much easier to interpret. But assuming sparse connectivity in fMRI data might also be justified from a neurophysiological point of view, since under appropriate experimental conditions only a few macroscopic brain areas are expected to show significant interaction. This reasoning also applies to EEG and MEG data.
We note that, besides the penalty-based approach, other strategies for obtaining sparse connectivity graphs exist. For example, post-hoc sparsification of dense estimates can be achieved by means of statistical testing [14, 5]. However, due to the compelling built-in regularization, we here adopt Group Lasso sparsification.
Before applying our regularization to the cost function of the correlated sources model, it is important to note that the sparsity assumption is only reasonable for the MVAR coefficients B^{(p)}, but not for the filter matrices W^{(p)}, which combine the MVAR coefficients and the instantaneous demixing. Hence, in order to apply sparsifying regularization, one has to split the parameters into demixing and MVAR parts again, as in the original model Eq. (1). Since the off-diagonal elements of the B^{(p)} correspond to interactions between sources, we propose to put a Group Lasso penalty on them, analogously to [5]. That is, we penalize the sum of the \ell_2-norms of the groups (B^{(1)}_{ij}, ..., B^{(P)}_{ij}), i \neq j.
Writing the FIR coefficients W^{(p)} as functions of A^{-1} and B = (B^{(1)}, ..., B^{(P)}) according to Eq. (5), the regularized cost function is

(8)  \mathcal{L}_{\mathrm{SCSA}}(A^{-1}, B) = \mathcal{L}(\{W^{(p)}(A^{-1}, B)\}) + \lambda \sum_{i \neq j} \sqrt{\sum_{p=1}^{P} (B^{(p)}_{ij})^2},

\lambda being a positive constant. The solution to Eq. (8) for a given choice of \lambda is called the Sparsely-Connected Sources Analysis (SCSA) estimate.
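The Group Lasso term of Eq. (8) collects the P coefficients of each off-diagonal connection into one group; a minimal sketch (with hypothetical coefficient values):

```python
import numpy as np

def group_lasso_penalty(B):
    """Sum of l2-norms over off-diagonal coefficient groups.
    B has shape (P, M, M); the group for connection j -> i collects
    the coefficients B[0, i, j], ..., B[P-1, i, j]."""
    P, M, _ = B.shape
    norms = np.sqrt((B ** 2).sum(axis=0))       # (M, M) matrix of group norms
    off_diag = ~np.eye(M, dtype=bool)
    return norms[off_diag].sum()

B = np.zeros((3, 4, 4))
B[:, 1, 0] = [0.5, 0.2, 0.1]   # one active connection 0 -> 1
B[:, 2, 2] = [0.9, 0.0, 0.0]   # autocorrelation: on the diagonal, not penalized
print(group_lasso_penalty(B))  # only the 0 -> 1 group contributes
```

Because the whole group enters through its norm, a connection is either kept with all its lags or pruned entirely.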
II-E. Relation to other methods
The proposed method extends previously suggested MVAR-based sparse causal discovery approaches [5, 12] by a linear demixing, which is appropriate for EEG/MEG connectivity analysis. Although the correlated sources model Eq. (1) leads to an MVAR model of the observation sequence [10], sparsity of the coefficients cannot be expected after mixing by volume conduction effects. Our method is most directly comparable to MVAR-ICA [10], which uses the same model Eq. (1) but estimates its parameters differently. More precisely, the authors of MVAR-ICA suggest to first fit an MVAR model in sensor space. The demixing can then be obtained by performing instantaneous ICA on the MVAR innovations, i.e., a dedicated contrast function (Infomax) is used to model the independence of the innovations. The obtained sources follow an MVAR model with time-lagged effects (interactions) but, ideally, no instantaneous correlations (as caused by volume conduction).
It also turns out that the model Eq. (1) is very similar to the convolutive ICA (cICA) model [15, 16, 17, 11]. The only difference is that Eq. (1) employs an FIR filter to extract the innovations, while an infinite impulse response (IIR) filter is usually used in the cICA literature (see, e.g., [11]). This discrepancy is explained by the different philosophies associated with the two methods. While in our approach the innovations arise as residuals of a finite-length source-MVAR model, cICA understands them as sources of a finite-length convolutional mixture. Nevertheless, our unregularized cost function can be regarded as a maximum-likelihood approach to an IIR version of convolutive ICA. This also leads us to a new view of convolutive ICA as performing an instantaneous demixing into correlated sources. Hence, it is possible to conduct source connectivity analysis using cICA (see Fig. 1 for an illustration).
Compared to MVAR-ICA and time-domain implementations of convolutive ICA such as CICAAR [11], our formulation has the advantage that sparse connectivity can easily be modeled by an additional penalty. This is not possible for CICAAR, because CICAAR estimates the MVAR coefficients only indirectly, through their inverse filters. These are generally non-sparse, even if the true connectivity structure is sparse. Inverting the inverse coefficients is also generally not possible (recall that convolutive ICA is equivalent to an infinite-length source-MVAR model). It is furthermore not possible to introduce sparse regularization for MVAR-ICA, since this method carries out the MVAR estimation step in sensor space, where no sparsity can be assumed.
By varying the regularization parameter, our method is able to interpolate between a fully correlated source model (comparable to convolutive ICA) and a model which allows no crosstalk between sources. Interestingly, the latter extreme can be seen as a variant of traditional instantaneous ICA in which independence is measured in terms of mutual predictability with a Granger-type criterion.
II-F. Optimization
II-F1. CSA
The gradient of the unregularized cost function Eq. (7) is obtained as

(9)  \frac{\partial \mathcal{L}}{\partial W^{(p)}} = -\delta_{p0}\, T' (W^{(0)})^{-\top} + \sum_{t} \tanh(\varepsilon(t))\, x(t-p)^{\top},

where tanh is applied component-wise and \delta_{p0} = 1 for p = 0 and 0 otherwise.

We plug the gradient into a limited-memory Broyden-Fletcher-Goldfarb-Shanno (L-BFGS) optimizer [18] (we use the implementation by Naoaki Okazaki, http://www.chokkan.org/software/liblbfgs/) and observe that the algorithm always converges to the global optimum, while only the signs and the order of the components may depend on the initialization. We use a fixed default initialization.
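The analytic gradient of the logcosh likelihood term can be validated against finite differences. The sketch below uses the reconstructed form of Eq. (9) on random data and checks a single entry; all sizes and seeds are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(3)
M, P, Tn = 2, 1, 50
x = rng.standard_normal((Tn, M))
W = [rng.standard_normal((M, M)) for _ in range(P + 1)]

def cost(W):
    e = np.array([sum(W[p] @ x[t - p] for p in range(P + 1)) for t in range(P, Tn)])
    return -len(e) * np.log(abs(np.linalg.det(W[0]))) + np.log(np.cosh(e)).sum()

def grad(W):
    """Analytic gradient: -T' * (W0)^{-T} for p = 0, plus tanh(e) x(t-p)^T."""
    e = np.array([sum(W[p] @ x[t - p] for p in range(P + 1)) for t in range(P, Tn)])
    g = []
    for p in range(P + 1):
        gp = sum(np.outer(np.tanh(e[i]), x[P + i - p]) for i in range(len(e)))
        if p == 0:
            gp -= len(e) * np.linalg.inv(W[0]).T
        g.append(gp)
    return g

# Finite-difference check of entry (0, 1) of W(0).
h = 1e-6
Wp = [w.copy() for w in W]; Wp[0][0, 1] += h
num = (cost(Wp) - cost(W)) / h
print(abs(num - grad(W)[0][0, 1]) < 1e-3)  # True
```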
II-F2. SCSA via a modified L-BFGS algorithm
With sparse regularization, two additional difficulties emerge compared to the unregularized cost function. First, under the factorization Eq. (5), the guaranteed convergence to the minimum observed for CSA is unlikely to be retained. Furthermore, the function Eq. (8) is not differentiable when one of the group norms becomes zero, which is expected to be the case at the optimum.
For tackling these difficulties we propose a modified version of the L-BFGS algorithm, which allows joint nonlinear optimization of A^{-1} and B while taking special care of the non-differentiability of the regularizer. The gradient of Eq. (8) is obtained as
(10)  \frac{\partial \mathcal{L}_{\mathrm{SCSA}}}{\partial B^{(p)}_{ij}} = -\Big[\frac{\partial \mathcal{L}}{\partial W^{(p)}} A^{-\top}\Big]_{ij} + \lambda \frac{B^{(p)}_{ij}}{\sqrt{\sum_{q=1}^{P} (B^{(q)}_{ij})^2}}   (i \neq j)

and

(11)  \frac{\partial \mathcal{L}_{\mathrm{SCSA}}}{\partial A^{-1}} = \frac{\partial \mathcal{L}}{\partial W^{(0)}} - \sum_{p=1}^{P} (B^{(p)})^{\top} \frac{\partial \mathcal{L}}{\partial W^{(p)}},

with \partial\mathcal{L}/\partial W^{(p)} given by Eq. (9).
Our modified L-BFGS algorithm checks before each gradient evaluation whether some group norms are already (close to) zero. If any of the norms equals zero, the gradient is not defined uniquely but as a set (the subdifferential). Nevertheless, it is straightforward to compute the element of the subdifferential with minimum norm, whose sign inversion is always a descent direction. Care must be taken because, in practice, we would never find any of the group norms exactly equal to zero. We therefore truncate the elements of B belonging to groups whose norms fall below some threshold to zero before computing the minimum-norm subgradient. If the minimum is indeed attained at the truncated point, the minimum-norm subgradient will be zero; otherwise, the subgradient will drive the solution out of zero. Further care must be taken in practice to prevent the solution from oscillating in and out of some zeros.
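The minimum-norm subgradient for a single coefficient group can be sketched as follows. The thresholding rule is the standard one for a grouped \ell_2-penalty; variable names are illustrative:

```python
import numpy as np

def min_norm_subgrad(g_smooth, b, lam, tol=1e-8):
    """Minimum-norm element of the subdifferential of
    f(b) = <smooth part> + lam * ||b||_2 for one coefficient group b,
    where g_smooth is the gradient of the smooth part at b."""
    if np.linalg.norm(b) > tol:                       # differentiable point
        return g_smooth + lam * b / np.linalg.norm(b)
    n = np.linalg.norm(g_smooth)
    if n <= lam:                                      # zero is optimal for this group
        return np.zeros_like(b)
    return (1 - lam / n) * g_smooth                   # shortest subgradient

b = np.zeros(3)
print(min_norm_subgrad(np.array([0.3, 0.0, 0.0]), b, lam=0.5))  # all zeros: stay pruned
print(min_norm_subgrad(np.array([2.0, 0.0, 0.0]), b, lam=0.5))  # nonzero: leave zero
```

When the returned vector vanishes, the truncated point is (locally) optimal; a nonzero return drives the group away from zero, as described above.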
We find that, using the outlined optimization procedure, sparse solutions can be found in a shorter time if the solution of the unregularized cost function is used as the initializer. This starting point can be obtained using the inverse transformation of Eq. (5), which is given by
(12)  A^{-1} = W^{(0)}

(13)  B^{(p)} = -W^{(p)} (W^{(0)})^{-1}.
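A quick numerical check that Eqs. (12)-(13) invert Eq. (5), under the reconstructed relation W^{(0)} = A^{-1}, W^{(p)} = -B^{(p)} A^{-1}:

```python
import numpy as np

rng = np.random.default_rng(4)
M, P = 3, 2
A = rng.standard_normal((M, M))
B = [0.4 * rng.standard_normal((M, M)) for _ in range(P)]

Ainv = np.linalg.inv(A)
W = [Ainv] + [-B[p] @ Ainv for p in range(P)]          # forward map, Eq. (5)

Ainv_hat = W[0]                                        # Eq. (12)
A_hat = np.linalg.inv(Ainv_hat)
B_hat = [-W[p + 1] @ A_hat for p in range(P)]          # Eq. (13)
print(np.allclose(A_hat, A), all(np.allclose(B_hat[p], B[p]) for p in range(P)))
```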
II-F3. SCSA via an EM algorithm
Using joint optimization of A^{-1} and B, the heuristic pruning of connections might in some cases lead to suboptimal solutions with respect to the composite cost function. For this reason, we present an alternative optimization scheme which does not require any heuristic step. The idea is to alternate between the estimation of the two unknowns. Doing so can be justified as an application of the Expectation-Maximization (EM) algorithm (see [19]).

Estimation of A^{-1} given B (here called the E-step) amounts to solving an unconstrained nonlinear optimization problem. Importantly, this problem is also convex, in contrast to the joint approach to SCSA parameter fitting. The convexity follows from the concavity of log|det A^{-1}| and the convexity of log cosh(.) composed with linear functions of A^{-1} for constant B (and from the fact that a sum of convex functions is convex). The great advantage of convex problems is that they feature a unique (local and global) minimum. In our case, the objective is smooth, so the minimum is guaranteed to be found by the L-BFGS algorithm, making use of the gradient in Eq. (11).
Optimization with respect to B for fixed A^{-1} (M-step) is more involved, since the non-differentiable Group Lasso regularizer remains, and smooth optimization methods like L-BFGS are unlikely to find the exact solution. However, this problem is not as difficult as the joint optimization problem, since it is convex: it is composed of a sum of log cosh terms (the loss function) and the Group Lasso term (the regularizer), which is a sum of \ell_2-norms and thus convex. Hence we can solve this problem using the Dual Augmented Lagrangian (DAL) procedure [20], which has recently been introduced as a method for minimizing arbitrary convex loss functions with additional Lasso or Group Lasso penalties. Application of DAL requires the loss function and its gradient, the convex conjugate (Legendre transform) of the loss function, as well as the gradient and Hessian of the conjugate loss. Let s(t) = A^{-1} x(t) be the demixed sources and \hat{s}(t) = \sum_{p=1}^{P} B^{(p)} s(t-p) be their autoregressive approximations. The loss function in terms of \hat{s} is defined as

(14)  f(\hat{s}) = \sum_{t} \sum_{i=1}^{M} \log\cosh(s_i(t) - \hat{s}_i(t)).
The gradient is

(15)  \frac{\partial f}{\partial \hat{s}_i(t)} = -\tanh(s_i(t) - \hat{s}_i(t)).
Let \alpha_i(t) denote the dual variables associated with the Legendre transform. The conjugate loss function is defined on the interval (-1, 1) and evaluates to

(16)  f^{*}(\alpha) = \sum_{t} \sum_{i=1}^{M} \Big[ \alpha_i(t) \operatorname{artanh}(\alpha_i(t)) + \tfrac{1}{2} \log(1 - \alpha_i(t)^2) \Big].
The gradient of the conjugate loss is given by

(17)  \frac{\partial f^{*}}{\partial \alpha_i(t)} = \operatorname{artanh}(\alpha_i(t)).
The Hessian is diagonal with elements

(18)  \frac{\partial^2 f^{*}}{\partial \alpha_i(t)^2} = \frac{1}{1 - \alpha_i(t)^2}.
Having defined the E- and M-steps, we have turned a nonconvex estimation problem into a sequence of two convex problems, which can both be solved exactly. A final estimate of the model parameters is obtained by alternating between E- and M-steps until convergence.
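The closed form of the conjugate logcosh loss (Eqs. (16)-(18)) can be verified by brute-force maximization of the Legendre transform; a small sketch for a single dual variable:

```python
import numpy as np

def conj_logcosh(alpha):
    """Closed-form convex conjugate of f(y) = log(cosh(y)), valid for |alpha| < 1."""
    return alpha * np.arctanh(alpha) + 0.5 * np.log(1 - alpha ** 2)

alpha = 0.7
y = np.linspace(-10, 10, 1_000_001)
numeric = np.max(alpha * y - np.log(np.cosh(y)))   # sup_y (alpha*y - f(y))
print(abs(numeric - conj_logcosh(alpha)) < 1e-6)   # True

# Gradient and Hessian of the conjugate, matching Eqs. (17)-(18):
# d/d alpha   = arctanh(alpha)
# d^2/d alpha^2 = 1 / (1 - alpha^2)
```

The supremum is attained at y = artanh(alpha), which is why the conjugate is only defined on (-1, 1): tanh maps the reals onto that interval.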
II-G. Treating source autocorrelations
Diagonal parts of the MVAR matrices B^{(p)} model the sources' autocorrelations and should preferably not be pruned. However, in some cases numerical stability can be increased if these variables are also penalized, especially if the model order and the number of sources are large. For this reason, we use a slight variation of the cost function Eq. (8) in practice, which includes

(19)  \lambda' \sum_{i=1}^{M} \sqrt{\sum_{p=1}^{P} (B^{(p)}_{ii})^2}   (\lambda' > 0)
as an additional penalty term. The augmented objective function can be minimized using the techniques presented in Section II-F.
III. Performance under realistic conditions
We conducted the following simulations in order to assess the performance of the proposed source connectivity analysis compared to those of existing approaches.
III-A. Data generation
We simulated seven time series (pseudo-sources) according to an MVAR model of order four. Seven out of the forty-two possible interactions were modeled by allowing the corresponding off-diagonal MVAR coefficients to be nonzero. The innovations were drawn from the sech-distribution (note that the assumption of non-Gaussianity is crucial for recovering mixed sources).
The pseudo-sources were mapped to 118 EEG channels using the theoretical spread of seven randomly placed dipoles. The spread was computed using a realistic forward model [21] built from anatomical MR images of the "Montreal head" [22]. See Fig. 2 for an example illustrating the data generation.
In reality, measurements are never noise-free, and the following model holds rather than Eq. (1):

(20)  x(t) = A s(t) + \eta(t).
Since none of the methods compared here (see below) explicitly models a noise term, it is important to evaluate their robustness to this model violation. To this end, we constructed additional variants of the pseudo-EEG dataset by adding six different types of noise \eta(t). The six variants (N1-N6) are summarized in Table I. They differ in their degree of spatial and temporal correlation as follows. In variants N1 and N4, noise terms were drawn independently for each sensor, i.e., they have no spatial correlation. For variants N2 and N5, noise terms were drawn independently for each source; in this case, source and noise contributions to the EEG share the same covariance structure induced by the mixing matrix A. For the last variants, N3 and N6, spatially independent noise sources were simulated at all nodes of a grid covering the whole brain; here, in contrast to the previous model, noise contributions are not collinear to the sources. We further distinguish between noise sources with and without temporal structure. In variants N1-N3, noise terms were drawn independently from a normal distribution at each time instant. In variants N4-N6, the temporal structure was determined by a univariate AR model of order 20. Note that, since no time-delayed dependencies between noise sources were modeled, no additional Granger-causal effects were introduced by the noise. We used a signal-to-noise ratio (SNR) of 2 in all experiments, where the SNR is defined as
(21)  \mathrm{SNR} = \frac{\| A [s(1), \ldots, s(T)] \|_F}{\| [\eta(1), \ldots, \eta(T)] \|_F},

where \| \cdot \|_F denotes the Frobenius norm of a matrix.
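Scaling the noise to a prescribed SNR in the Frobenius-norm sense of Eq. (21) is a one-liner; the sketch below uses random stand-ins for the projected sources and the noise:

```python
import numpy as np

rng = np.random.default_rng(5)
n_sensors, T = 118, 1000
signal = rng.standard_normal((n_sensors, T))   # stands in for A [s(1) ... s(T)]
noise = rng.standard_normal((n_sensors, T))    # stands in for [eta(1) ... eta(T)]

target_snr = 2.0
# Scale the noise so that ||signal||_F / ||c * noise||_F equals the target SNR.
c = np.linalg.norm(signal) / (target_snr * np.linalg.norm(noise))
x = signal + c * noise
print(np.isclose(np.linalg.norm(signal) / np.linalg.norm(c * noise), 2.0))  # True
```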
Finally, PCA was applied to the pseudo-EEG to reduce the dimensionality to seven (the original number of sources) by keeping just the seven strongest PCA components. One hundred datasets with different realizations of MVAR coefficients, innovations and noise were constructed for each noise category.
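The PCA-based dimensionality reduction to the seven strongest components can be sketched with an SVD; sensor count, sample count and noise level below are illustrative:

```python
import numpy as np

rng = np.random.default_rng(6)
n_sensors, n_sources, T = 118, 7, 1000

# Rank-7 pseudo-EEG plus a little sensor noise.
A = rng.standard_normal((n_sensors, n_sources))
S = rng.standard_normal((n_sources, T))
X = A @ S + 0.01 * rng.standard_normal((n_sensors, T))

# Keep the seven strongest principal components.
Xc = X - X.mean(axis=1, keepdims=True)
U, sv, Vt = np.linalg.svd(Xc, full_matrices=False)
X_red = U[:, :n_sources].T @ Xc          # 7 x T reduced data
print(X_red.shape)  # (7, 1000)
```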
TABLE I. The six noise variants, classified by their spatial and temporal correlation structure.

                                                          independent in time   correlated in time
independent across sensors                                        N1                   N4
correlated across sensors (collinear to sources)                  N2                   N5
correlated across sensors (arbitrary spatial structure)           N3                   N6
III-B. Methods
We tested the ability of ICA, MVAR-ICA, CICAAR and the two proposed methods, CSA and SCSA, to reconstruct the seven sources and their connectivity structure. Although the goal of instantaneous ICA is fundamentally different from that of source connectivity analysis, it was also included in the comparison: even if independence of the sources is not fulfilled, ICA might still provide components that are as independent as possible, the connectivity of which might then be analyzed. The ICA variant used here is based on temporal decorrelation [23, 24, 25, 26] (implemented by fast approximate joint diagonalization [27]). The number of temporal lags was set to 100.
MVAR-ICA, CICAAR, CSA and SCSA were tested with the number of temporal lags set to four, the true MVAR model order for CSA, SCSA and MVAR-ICA. CICAAR has the disadvantage here that it may generally require extended temporal filters for reconstructing sources following the model Eq. (1); however, due to computation-time constraints, the same maximum lag was used for this method as well. For MVAR-ICA and CICAAR, we used implementations provided by the respective authors. These implementations adopt the Bayesian Information Criterion (BIC) for selecting the appropriate number of time lags; the same criterion was used to select the model order in CSA and SCSA. The regularization constant of SCSA was set by cross-validation. SCSA estimates of A^{-1} and B were obtained either jointly, using the modified L-BFGS algorithm, or alternately, using 20 additional EM steps. These variants are named SCSA and SCSA-EM here, respectively.
III-C. Performance measures
The most important performance criterion is the reconstruction of the mixing matrix, since all other relevant quantities can essentially be derived from it. All considered methods provide an estimate of the demixing, which can be inverted to yield an estimated mixing matrix. The columns of the mixing matrix correspond to spatial field patterns of the estimated sources, but these patterns can generally only be determined up to sign, scale and order. For this reason, optimal pairing of true and estimated patterns was performed as described in [28]. The similarity measure between patterns was slightly modified compared to the one used in [28]: we used the goodness-of-fit achieved by a linear least-squares regression of one pattern onto the other. For a true pattern p and an estimated pattern \hat{p}, the optimal regression coefficient is
(22)  c = \frac{\hat{p}^{\top} p}{\hat{p}^{\top} \hat{p}}
and the goodness-of-fit (GOF) is

(23)  \mathrm{GOF} = 1 - \frac{\| p - c \hat{p} \|^2}{\| p \|^2}.
Having found the optimal pairing, the columns of the estimated mixing matrix were permuted and scaled, using the optimal regression coefficients, to approximate the true mixing matrix as well as possible. The goodness-of-fit with respect to the whole matrix was used to evaluate the quality of the different decompositions. Additionally, using the optimally matched mixing patterns, dipole scans were conducted, and the deviation of the obtained dipole locations from the true ones was measured. A typical example of a mixing pattern estimated by SCSA and the corresponding reconstructed dipole is shown in Fig. 2.
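The pattern-matching score of Eqs. (22)-(23) is insensitive to the sign and scale indeterminacies of the decomposition, as a small sketch shows (the example patterns are hypothetical):

```python
import numpy as np

def gof(p_true, p_est):
    """Goodness-of-fit of Eq. (23): regress the estimated pattern onto the
    true one (Eq. (22)) so that sign and scale indeterminacies drop out."""
    c = (p_est @ p_true) / (p_est @ p_est)          # optimal scaling, Eq. (22)
    return 1.0 - np.sum((p_true - c * p_est) ** 2) / np.sum(p_true ** 2)

p = np.array([1.0, -2.0, 0.5])
print(gof(p, -3.0 * p))                          # 1.0: sign/scale flips do not matter
print(gof(p, np.array([0.0, 0.0, 1.0])) < 1.0)   # imperfect match scores lower
```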
Finally, causal discovery according to [5] was carried out on the demixed sources. The exact technique used was MVAR estimation with ridge regression. For the MVAR parameters estimated by ridge regression, an approximate multivariate Gaussian distribution can be derived, which was used to test the coefficients for being significantly different from zero. An influence from one source to another was inferred if the p-value of one of the corresponding coefficients fell below the critical value. As a third performance criterion, the area under curve (AUC) score for correctly discovering the interaction structure was calculated by varying the significance threshold and comparing estimated and true connectivity matrices at each threshold. Note that this way of connectivity estimation was pursued here in order to treat all methods equally; this was necessary since not all methods provide built-in connectivity estimates. For SCSA, interaction analysis could as well have been done by directly examining the MVAR coefficients. Note further that, in ridge-regression-based testing, the non-Gaussianity of the source MVAR innovations is used only indirectly, through the demixing matrix, but not for the actual MVAR estimation. For this reason, the MVAR coefficients directly estimated by SCSA may be preferred over a subsequent ridge regression step when using SCSA in practice.

III-D. Results
Fig. 3 shows how well the mixing matrix was approximated by the different approaches. One boxplot is drawn for the noiseless case (N0) and for each of the six noisy variants (N1-N6, see Table I). The plots show the median performance over 100 repetitions, as well as the lower and upper quartiles and the extremal values; outliers (red crosses) were removed. As a result of the simulations, SCSA typically achieves the smallest reconstruction error, followed by CSA, CICAAR, MVAR-ICA and ICA. In many cases these differences are also significant, as indicated by the notches in the boxes.
Correct (de)mixing matrix estimation affects both the localization error achievable by applying inverse methods to the estimated patterns and the error of any connectivity analysis performed on the demixed sources. As a result of its good mixing matrix approximation, SCSA also achieves smaller dipole localization errors than all other methods, except in one scenario (shown in Fig. 4). The same holds when it comes to estimating the connectivity between sources (Fig. 5).
Interestingly, the higher numerical stability we observed for the EM variant of SCSA compared to joint parameter estimation only sometimes leads to superior performance. This may be related to our observation that the difference between the implementations becomes large only for excessively large amounts of regularization, which are not optimal in terms of the cross-validation criterion. Another reason might be that the instability of the MVAR coefficients around zero does not play a crucial role in our current evaluation, since all performance measures used here were derived solely from the demixing matrix.
Regarding the influence of noise, the relative degradation of performance in the presence of noise is similar for all methods. Generally, noise that is collinear to the sources (N2/N5) seems to be less problematic than noise that is uncorrelated across sensors (N1/N4) or noise with arbitrary spatial correlation structure (N3/N6). Judging from the mixing matrix approximation and dipole localization errors, the temporal structure of the noise does not seem to affect the performance much. However, small errors in the (de)mixing matrix can have quite a negative effect on the connectivity estimation, as can be seen in the right part of Fig. 5.
The time each method consumed on average for processing one dataset is shown in Fig. 6. Most methods finish rather quickly, while the EM implementation of SCSA is in the medium range and CICAAR requires the longest time. For SCSA, however, there is still room for improvement, since its regularization parameter is currently selected by cross-validation, which could be replaced by a cheaper model selection criterion.
IV. Discussion
Let us recall the assumptions we make to identify individual brain sources and to estimate their interactions. While ICA yields a unique decomposition by assuming statistical independence, such an assumption is inconsistent with the study of brain interactions. However, all neural interactions require a minimum delay that is well within the temporal resolution of electrophysiological measurements of brain activity. Hence, it makes sense to assume independent innovation processes and to model all interactions explicitly using the AR matrices. Relative to ICA we pay a price for this: in our case, independence is exploited only on the reduced information contained in the residuals of the model. In principle, this can lead to less stable estimates. To increase stability, we have included sparsity assumptions, based on the idea that only a few brain connections can be strong enough to be observable in EEG data, especially in the presence of artifacts and background noise.
We emphasize that we assume a linear dynamical model with non-Gaussian innovation processes, i.e., the innovation process itself is the only cause of non-Gaussianity. Real brain networks are, of course, more complicated. However, the question of whether nonlinear dynamical models may improve the results, or are even essential for a correct decomposition, is beyond the scope of this paper and will be addressed in future work. Similarly, we assumed the total number of sources to be less than or equal to the number of channels; the significance of this limitation decreases when a large number of channels is used.
V Conclusion
Analysing functional brain connectivity is a hard problem, since volume conduction effects in EEG/MEG measurements can give rise to spurious connectivity. In this work we have introduced SCSA, a novel connectivity analysis method that overcomes these problems in an elegant and numerically appealing manner using the Group Lasso. In detail, the EEG is modeled as a linear mixture of correlated sources, and the demixing and the MVAR model describing the correlated sources are estimated jointly. For this we assume that the innovations driving the source MVAR process are super-Gaussian distributed and (spatially and temporally) independent. To avoid overfitting, we regularize the model using the Group Lasso penalty. In this manner we achieve a data-driven interpolation between two extremes: a source model with full correlations, i.e., convolutive ICA, and conventional ICA, which does not allow for cross-talk between the extracted sources. In between, our method extracts a sparse connectivity model. We demonstrate the usefulness of SCSA on simulated data and compare it to a number of existing algorithms, with excellent results.
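The generative model summarized above can be sketched in a few lines: sources follow a sparse MVAR process driven by i.i.d. Laplacian (super-Gaussian) innovations, and the observations are an instantaneous linear mixture. All numerical values below (model order, coupling strengths, dimensions) are illustrative assumptions, not the settings used in the experiments.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_scsa_model(n_sources=3, n_channels=5, n_samples=1000, order=2):
    """Simulate the assumed generative model: sources follow a stable MVAR
    process with sparse off-diagonal coupling, driven by i.i.d. Laplacian
    innovations; the observed signals are a linear mixture of the sources."""
    # Diagonal AR matrices, scaled down per lag to keep the process stable.
    B = [0.9 ** (k + 1) * np.diag(rng.uniform(0.2, 0.5, n_sources))
         for k in range(order)]
    # One off-diagonal entry models a time-lagged interaction source 0 -> 1.
    B[0][1, 0] = 0.4
    # Laplacian innovations have positive excess kurtosis (super-Gaussian).
    eps = rng.laplace(size=(n_samples, n_sources))
    s = np.zeros((n_samples, n_sources))
    for t in range(order, n_samples):
        s[t] = eps[t] + sum(B[k] @ s[t - k - 1] for k in range(order))
    A = rng.standard_normal((n_channels, n_sources))  # instantaneous mixing
    x = s @ A.T                                       # observed "EEG"
    return x, s, A, B

x, s, A, B = simulate_scsa_model()
```

SCSA's task is the inverse of this simulation: given only `x`, recover the demixing (the inverse of `A` up to scaling and permutation) and the sparse AR matrices `B` jointly.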
Future work will study the link between methods for compensating for non-stationarity in data, such as Stationary Subspace Analysis (SSA, [29]), and our novel connectivity assessment. In addition, we aim to localize the extracted components using distributed source models to enhance physiological interpretability (e.g., [30], [31]).
Acknowledgement
This work was partly supported by the Bundesministerium für Bildung und Forschung (BMBF), Fkz 01GQ0850, and by the European ICT Programme Projects FP7-224631 and 216886. We thank Germán Gómez Herrero and Mads Dyrholm for making the source code of their algorithms available, and Nicole Krämer for discussions.
References
 [1] G. Nolte, A. Ziehe, V. V. Nikulin, A. Schlögl, N. Krämer, T. Brismar, and K.-R. Müller, “Robustly estimating the flow direction of information in complex physical systems,” Phys. Rev. Lett., vol. 100, p. 234101, Jun 2008.
 [2] C. Granger, “Investigating causal relations by econometric models and cross-spectral methods,” Econometrica, vol. 37, pp. 424–438, 1969.
 [3] M. J. Kaminski and K. J. Blinowska, “A new method of the description of the information flow in the brain structures,” Biol Cybern, vol. 65, pp. 203–210, 1991.
 [4] L. A. Baccalá and K. Sameshima, “Partial directed coherence: a new concept in neural structure determination,” Biol Cybern, vol. 84, pp. 463–474, Jun 2001.
 [5] S. Haufe, G. Nolte, K.-R. Müller, and N. Krämer, “Sparse causal discovery in multivariate time series,” in Proceedings of the NIPS’08 Causality Workshop, 2009.
 [6] G. Nolte, O. Bai, L. Wheaton, Z. Mari, S. Vorbach, and M. Hallett, “Identifying true brain interaction from EEG data using the imaginary part of coherency,” Clin Neurophysiol, vol. 115, pp. 2292–2307, Oct 2004.
 [7] A. G. Guggisberg, S. M. Honma, A. M. Findlay, S. S. Dalal, H. E. Kirsch, M. S. Berger, and S. S. Nagarajan, “Mapping functional connectivity in patients with brain lesions,” Ann. Neurol., vol. 63, pp. 193–203, Feb 2008.
 [8] L. Astolfi, F. Cincotti, D. Mattia, C. Babiloni, F. Carducci, A. Basilisco, P. M. Rossini, S. Salinari, L. Ding, Y. Ni, B. He, and F. Babiloni, “Assessing cortical functional connectivity by linear inverse estimation and directed transfer function: simulations and application to real data,” Clin Neurophysiol, vol. 116, pp. 920–932, Apr 2005.
 [9] L. Astolfi, H. Bakardjian, F. Cincotti, D. Mattia, M. G. Marciani, F. De Vico Fallani, A. Colosimo, S. Salinari, F. Miwakeichi, Y. Yamaguchi, P. Martinez, A. Cichocki, A. Tocci, and F. Babiloni, “Estimate of causality between independent cortical spatial patterns during movement volition in spinal cord injured patients,” Brain Topogr, vol. 19, pp. 107–123, 2007.
 [10] G. GómezHerrero, M. Atienza, K. Egiazarian, and J. L. Cantero, “Measuring directional coupling between EEG sources,” NeuroImage, vol. 43, pp. 497–508, Nov 2008.
 [11] M. Dyrholm, S. Makeig, and L. K. Hansen, “Model selection for convolutive ICA with an application to spatiotemporal analysis of EEG,” Neural Comput, vol. 19, pp. 934–955, Apr 2007.
 [12] P. A. ValdésSosa, J. M. SánchezBornot, A. LageCastellanos, M. VegaHernández, J. BoschBayard, L. MelieGarcía, and E. CanalesRodríguez, “Estimating brain functional connectivity with sparse multivariate autoregression,” Philosophical Transactions of the Royal Society B, vol. 360, pp. 969–981, 2005.
 [13] J. M. SánchezBornot, E. MartínezMontes, A. LageCastellanos, M. VegaHernández, and P. A. ValdésSosa, “Uncovering sparse brain effective connectivity: A voxelbased approach using penalized regression,” Statistica Sinica, vol. 18, no. 4, 2008.
 [14] D. Marinazzo, M. Pellicoro, and S. Stramaglia, “Kernel method for nonlinear Granger Causality,” Phys. Rev. Lett., vol. 100, p. 144103, 2008.
 [15] H. Attias and C. E. Schreiner, “Blind source separation and deconvolution: the dynamic component analysis algorithm,” Neural Comput, vol. 10, pp. 1373–1424, Aug 1998.
 [16] L. Parra and C. Spence, “Convolutive blind source separation of non-stationary sources,” IEEE Trans. Speech Audio Processing, vol. 8, no. 3, pp. 320–327, 2000.
 [17] J. Anemüller, T. J. Sejnowski, and S. Makeig, “Complex independent component analysis of frequency-domain electroencephalographic data,” Neural Netw, vol. 16, pp. 1311–1323, Nov 2003.
 [18] J. Nocedal, “Updating quasi-Newton matrices with limited storage,” Mathematics of Computation, vol. 35, no. 151, pp. 773–782, 1980. [Online]. Available: http://www.jstor.org/stable/2006193
 [19] R. Neal and G. E. Hinton, “A view of the EM algorithm that justifies incremental, sparse, and other variants,” in Learning in Graphical Models. Kluwer Academic Publishers, 1998, pp. 355–368.
 [20] R. Tomioka and M. Sugiyama, “Dual augmented Lagrangian method for efficient sparse reconstruction,” IEEE Signal Proc Let, vol. 16, no. 2, pp. 1067–1070, 2009.
 [21] G. Nolte and G. Dassios, “Analytic expansion of the EEG lead field for realistic volume conductors,” Phys. Med. Biol., vol. 50, pp. 3807–3823, 2005.
 [22] C. J. Holmes, R. Hoge, L. Collins, R. Woods, A. Toga, and A. C. Evans, “Enhancement of MR images using registration for signal averaging,” J. Comput. Assist. Tomogr., vol. 22, no. 2, pp. 324–333, 1998.
 [23] L. Molgedey and H. G. Schuster, “Separation of a mixture of independent signals using time delayed correlations,” Phys. Rev. Lett., vol. 72, pp. 3634–3637, Jun 1994.
 [24] A. Belouchrani, K. Abed-Meraim, J. F. Cardoso, and E. Moulines, “A blind source separation technique using second-order statistics,” IEEE Trans Signal Proc, vol. 45, no. 2, pp. 434–444, Aug 1997. [Online]. Available: http://dx.doi.org/10.1109/78.554307
 [25] A. Ziehe and K.-R. Müller, “TDSEP – an efficient algorithm for blind separation using time structure,” in Proc. Int. Conf. on Artificial Neural Networks (ICANN ’98), 1998, pp. 675–680.
 [26] A. Ziehe, K.-R. Müller, G. Nolte, B.-M. Mackert, and G. Curio, “Artifact reduction in magnetoneurography based on time-delayed second-order correlations,” IEEE Trans Biomed Eng, vol. 47, no. 1, pp. 75–87, Jan 2000.
 [27] A. Ziehe, P. Laskov, G. Nolte, and K.-R. Müller, “A fast algorithm for joint diagonalization with non-orthogonal transformations and its application to blind source separation,” J. Mach. Learn. Res., vol. 5, pp. 777–800, 2004. [Online]. Available: http://portal.acm.org/citation.cfm?id=1016784
 [28] P. Tichavský and Z. Koldovský, “Optimal pairing of signal components separated by blind techniques,” IEEE Signal Proc Let, vol. 11, no. 2, pp. 119–122, 2004.
 [29] P. von Bünau, F. C. Meinecke, F. Kiraly, and K.-R. Müller, “Estimating the stationary subspace from superimposed signals,” Physical Review Letters, vol. 103, p. 214101, 2009.
 [30] S. Haufe, V. Nikulin, A. Ziehe, K.-R. Müller, and G. Nolte, “Combining sparsity and rotational invariance in EEG/MEG source reconstruction,” NeuroImage, vol. 42, no. 2, pp. 726–738, 2008.
 [31] S. Haufe, V. V. Nikulin, A. Ziehe, K.-R. Müller, and G. Nolte, “Estimating vector fields using sparse basis field expansions,” in Advances in Neural Information Processing Systems 21, 2009.