1 Introduction
The human brain is composed of roughly 100 billion neurons, and brain functions are carried out by complex firing and interactions among these neurons, accompanied by electromagnetic, hemodynamic, and metabolic changes
[1]. As the electromagnetic activity is directly related to neural firing, it reflects the real-time dynamical processes of the brain and can be directly measured by electroencephalography (EEG) and magnetoencephalography (MEG). Both EEG and MEG yield a much higher temporal resolution, up to a few milliseconds, than other brain imaging modalities such as functional magnetic resonance imaging (fMRI), positron emission tomography (PET), and single-photon emission computed tomography (SPECT) [1, 2, 3, 4]. However, one limitation of EEG/MEG is the low spatial resolution, as the corresponding measurements are acquired on the scalp with little information regarding neural activations inside the brain. Reconstructing brain source signals from EEG/MEG measurements is known as EEG/MEG source localization or EEG/MEG source imaging (ESI) [5]. ESI techniques have been used in several clinical and brain research applications, such as the study of language mechanisms, cognitive processes, and sensory function with a brain-computer interface [6], the localization of the primary sensory cortex in evoked potentials for surgical candidates [7], and the localization of the irritative zone in focal epilepsy [8, 9].

In general, the number of EEG/MEG sensors is much smaller than the number of brain sources, and hence the ESI problem is highly ill-posed. In order to find a reasonable solution, it is necessary to impose certain neurophysiologically plausible assumptions as regularizations [5, 10]. One seminal approach enforces a unique solution by imposing the ℓ2 norm, referred to as the minimum norm estimate (MNE) [11], with variants such as the dynamic statistical parametric mapping (dSPM) method [12] and standardized low-resolution brain electromagnetic tomography (sLORETA) [13]. The ℓ2-norm-based methods usually lead to an over-diffuse solution. To overcome this drawback, Uutela et al. [14] adopted the ℓ1 norm, known as the minimum current estimate (MCE), for sparse source reconstruction in ESI.

However, the aforementioned approaches can potentially lead to inconsistency in the temporal direction, as source activities are estimated independently at each time point. To ensure temporal smoothness, a number of methods have been proposed. The mixed norm estimate (MxNE) [15] uses a mixed-norm regularization, imposing an ℓ2 norm along the time course to ensure temporal smoothness and an ℓ1 norm in the source space to ensure spatial sparsity. Later, irMxNE [16], based on an iteratively reweighted optimization scheme to solve the MxNE surrogate problem, was proposed for source reconstruction and achieved better precision and stability. Gramfort et al. proposed a time-frequency mixed-norm estimate (TF-MxNE) algorithm that recovers sparse coefficients over a time-frequency dictionary for a better estimation of non-stationary and transient source signals [17]. The spatio-temporal unifying tomography (STOUT) algorithm [18] combined TF-MxNE in the time-frequency domain with sparse basis field expansions in the spatial domain to obtain a temporally and spatially smooth solution. Recently, a supervised method [19] was proposed to achieve spatial sparsity and within-class temporal consistency using a graph regularization derived from label information together with a sparsity-inducing norm.
In this paper, we are particularly interested in seeking a precise solution to the ESI problem in the presence of noise in both the sensor (channel) and source spaces. Due to the ubiquity of noise, the reconstructed source solution usually contains spurious sources that explain the measured noise. To properly deal with noise, certain patterns, such as a classification/clustering structure, can reasonably be assumed to guide precise source reconstruction. For example, the source activation patterns can form several representative microstates if they are evoked by multiple exterior stimuli. In [19], the microstates are assumed to be known in the form of classification labels. However, classification labels are generally difficult to acquire accurately. Moreover, it might be infeasible to find a one-to-one matching between the underlying source activation patterns and the correct label information when the underlying source signal follows a smooth manifold structure with uncertain transitions between brain states.
To overcome the above challenges, we propose a novel probabilistic model with a specially designed hierarchical prior that can effectively model both the unknown microstates and the manifold structure for ESI. Specifically, a set of landmarks is introduced to characterize the brain source signal. These landmarks can be considered as the unknown microstates characterizing the source activation patterns. Moreover, the unknown manifold pattern is further represented by a graph structure whose vertices are the landmarks. By modeling both the landmarks and the graph structure in a hierarchical prior, we can obtain effective estimates of the reconstructed source signal, the landmarks, and the graph structure via the probabilistic hierarchical model.
The contributions of this paper can be summarized as follows: (1) We formulate a novel probabilistic model for ESI, in which the denoised microstates and the manifold structure of the source activation pattern are represented as a hierarchical prior. (2) An efficient optimization method is proposed to solve the proposed model, and the convergence of the algorithm is provably guaranteed. (3) Extensive experiments on both simulated and real data show that our proposed method outperforms the benchmark methods, with especially strong results under high levels of noise.
The remainder of this paper is organized as follows. In Section 2, we describe a probabilistic hierarchical model for ESI with three building blocks. We then present an efficient algorithm with guaranteed convergence in Section 3. Both synthetic and real data are examined in Section 4 to demonstrate the performance of the proposed approach in comparison with the benchmarks. Finally, conclusions are given in Section 5.
2 Probabilistic Model with Hierarchical Prior
We propose a novel probabilistic hierarchical model for the inverse problem of EEG/MEG data using specially designed priors. The model is built from three important ingredients: the sparsity of the source signal, the denoising of the source signal, and the learning of relationships among the landmarks of the source signals. In the following, we first present the probabilistic formulations of the three ingredients individually and then combine them to form the proposed hierarchical model.
2.1 Sparsity prior in brain source imaging
The EEG/MEG data measure the electromagnetic field at a set of N sensors (or channels) over T time points. We denote the EEG/MEG measurements as Y ∈ R^{N×T}. The linear mapping from the brain sources to the sensors on the scalp is often referred to as the lead field matrix, obtained from the quasi-static approximation of Maxwell's equations [20], and denoted by L ∈ R^{N×D}, where D is the number of distributed sources used to represent the discretized 3D head model. Specifically, each column of L represents the electrical propagation weight vector from a particular source location to the EEG/MEG electrodes, and each row of L represents the weights of the contributions from all sources to one channel. Given Y and L, the objective is to find the source activation (source signal), denoted as S ∈ R^{D×T}, where each column corresponds to the electrical potentials at the D source locations for one of the T time points. The measured EEG/MEG data can be described as a linear function of the sources with additive noise,

Y = L S + E.   (1)
With proper whitening [21], the noise term E can be assumed to follow a Gaussian distribution with mean zero and identity covariance, i.e., each column of E is distributed as N(0, I). Therefore, the probability of Y given S is expressed as

p(Y | S) ∝ exp( −(1/2) ‖Y − L S‖²_F ).   (2)
Generally in EEG/MEG, the number of sources is much larger than the number of electrodes, i.e., D ≫ N, which implies that (1) is an underdetermined system and results in the ill-posedness of the ESI problem. In order to properly recover S from Y, prior knowledge is often needed and formulated as a regularization term. A popular choice is the ℓ1 penalty [22], which yields a sparse solution and can be expressed through the Laplacian distribution. In particular, we assume each entry of S is i.i.d. drawn from a Laplacian distribution with mean zero and a positive scale parameter λ, i.e.,

p(S) = ∏_{i,j} (λ/2) exp( −λ |S_{ij}| ).   (3)
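Under the Gaussian likelihood (2) and the Laplacian prior (3), the maximum a posteriori estimate of S minimizes a least-squares data-fit term plus an ℓ1 penalty. A minimal numerical sketch, with hypothetical problem sizes and a randomly drawn stand-in for the lead field, is:

```python
import numpy as np

rng = np.random.default_rng(0)
n_sensors, n_sources, n_times = 8, 32, 5     # hypothetical sizes: N << D
L = rng.standard_normal((n_sensors, n_sources))          # stand-in lead field
S_true = np.zeros((n_sources, n_times))
S_true[[3, 17], :] = rng.standard_normal((2, n_times))   # two sparse active sources
Y = L @ S_true + 0.01 * rng.standard_normal((n_sensors, n_times))  # Eq. (1): Y = LS + E

def neg_log_posterior(S, lam=0.1):
    """-log p(S | Y) up to constants: Gaussian likelihood (2) + Laplacian prior (3)."""
    return 0.5 * np.sum((Y - L @ S) ** 2) + lam * np.abs(S).sum()
```

The sparse ground truth attains a much lower negative log-posterior than, e.g., the all-zero solution, which is the property the ℓ1 regularization exploits.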
To find source estimates with spatially sparse but smooth active regions, we introduce a spatial smoothness basis dictionary; the definition of the spatial basis matrix can be found in [23]. Introducing the spatial basis matrix A, we write

S = A S̃,   (4)

where the (i, j)-th element of A is defined as

A_{ij} = exp( −d²_{ij} / σ ) if j ∈ N_k(i), and 0 otherwise,   (5)

where d_{ij} is the distance between sources i and j, σ is a spatial smoothness scalar controlling the smoothness level, and N_k(i) is the set of k nearest neighboring sources of source i based on geodesic distance. With (4), the forward model (1) can be written as Y = L̃ S̃ + E with L̃ = L A. Without introducing confusion, we still use L and S instead of L̃ and S̃, and the final reconstructed signal is left-multiplied by A to obtain the source signal.
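The Gaussian spatial basis of (5) can be sketched as follows. Euclidean distance between source coordinates stands in for the geodesic distance used in the paper, and the row normalization is an assumption on our part:

```python
import numpy as np

def spatial_basis(coords, k=5, sigma=10.0):
    """Sketch of the spatial-smoothness basis A in Eq. (5):
    A[i, j] = exp(-d_ij^2 / sigma) for j among the k nearest neighbours of i
    (Euclidean distance used here in place of geodesic distance), with each
    row normalised to sum to one (an assumed convention)."""
    n = coords.shape[0]
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    A = np.zeros((n, n))
    for i in range(n):
        nbrs = np.argsort(d[i])[:k + 1]          # includes the source itself (d = 0)
        A[i, nbrs] = np.exp(-d[i, nbrs] ** 2 / sigma)
    return A / A.sum(axis=1, keepdims=True)
```

Each row of A then spreads a point activation over its spatial neighborhood, so a sparse S̃ yields spatially smooth active patches after left-multiplication by A.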
2.2 Source signal denoising prior over latent landmarks
In EEG/MEG data, noise is inevitable in both the sensor and source spaces. To effectively deal with various kinds of noise, we cast source-signal denoising as a probabilistic density estimation problem. The denoised signals are then constrained by a graph structure, which will be detailed in Section 2.3.
We formulate source-signal denoising as a hierarchical prior in a probabilistic framework. Intuitively, source-signal denoising becomes simple if we know the true distribution of the clean source signals. In addition to the sparsity prior (3), we assume that the true distribution of the clean source signals depends on a set of K latent variables W = [w_1, …, w_K], which we call landmarks for ease of reference. Informally, the true distribution of a clean source signal y, considered as a random variable, is denoted by p*(y). For any given noisy source signal, we can measure the difference between p*(y) and the posterior distribution obtained by applying the Bayes rule to (1) and (3). Instead of directly calculating the difference of two distributions based on a known true distribution, we model source-signal denoising as a prior for ESI by further modeling the latent landmarks in a probabilistic hierarchical model.

Given the K landmarks in W, we employ kernel density estimation (KDE) on W and use the estimate as an approximation to the true distribution. The basic idea of KDE is to smooth each point by a kernel function and sum all these functions to obtain a final density estimate. A typical choice of kernel is a Gaussian function, defined as G_h(y) = (2πh²)^{−d/2} exp( −‖y‖² / (2h²) ), where d is the dimension of the input y. Applying KDE to estimate the density over the landmarks leads to the following density function,

p(y | W) = (1/K) Σ_{k=1}^{K} G_h(y − w_k),   (6)
where h is the bandwidth parameter of the Gaussian kernel. Under the Gaussian kernel, a landmark is the mean of the denoised source signals assigned to it, so it is reasonable for the landmarks to inherit the sparsity property of S. As a result, we consider the landmarks to follow the same Laplacian distribution as S but with a different parameter γ, i.e.,

p(W) = ∏_{i,k} (γ/2) exp( −γ |W_{ik}| ).   (7)
The denoising property of this prior will be further discussed in Section 2.4.
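A minimal sketch of the KDE prior (6): the density of a candidate source signal is the average of Gaussian kernels centered at the landmark columns of W (the normalizing constant is included for completeness):

```python
import numpy as np

def kde_density(y, W, h=1.0):
    """Gaussian KDE over the K landmark columns of W, as in Eq. (6):
    p(y | W) = (1/K) * sum_k N(y; w_k, h^2 I)."""
    n = W.shape[0]
    sq = np.sum((W - y[:, None]) ** 2, axis=0)        # ||y - w_k||^2 for each k
    const = (2 * np.pi * h ** 2) ** (-n / 2)
    return const * np.mean(np.exp(-sq / (2 * h ** 2)))
```

By construction the density is highest near the landmarks, which is what makes pulling a noisy source signal toward high-density regions act as denoising.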
2.3 Landmark prior via graph structure learning
Recall that the landmarks are introduced to model the true distribution of the source signals. Instead of assuming independence among these landmarks, we attempt to model the pairwise relationships between landmarks through a constrained graph.
We assume the landmarks have an underlying graph structure, denoted by G, where each vertex of G corresponds to one landmark. Denote by w_i the i-th landmark, which serves as the feature vector of vertex i. We assume that, conditioned on G, each feature vector is drawn from the following distribution,

p(w_i | G) ∝ exp( −α Σ_j G_{ij} ‖w_i − w_j‖² ),   (8)

where α is a positive parameter. The form of distribution (8) enforces smoothness among features [24] in the sense that a larger G_{ij} results in a smaller distance between w_i and w_j. Since the feature vectors are assumed to be i.i.d., we obtain the joint distribution

p(W | G) ∝ exp( −α Σ_{i,j} G_{ij} ‖w_i − w_j‖² ).   (9)
In order for (9) to be a plausible probability measure, G is required to be nonnegative and symmetric, i.e., G_{ij} ≥ 0 and G_{ij} = G_{ji}, respectively.
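The smoothness prior (8)-(9) penalizes Σ_{i,j} G_{ij} ‖w_i − w_j‖², which equals a quadratic form in the graph Laplacian. A quick numerical check of this standard identity, with random symmetric nonnegative weights:

```python
import numpy as np

rng = np.random.default_rng(1)
K, d = 6, 4
G = rng.random((K, K))
G = (G + G.T) / 2.0                              # symmetric, nonnegative weights
np.fill_diagonal(G, 0.0)
W = rng.standard_normal((K, d))                  # rows are landmark feature vectors w_i

# pairwise smoothness term of Eqs. (8)-(9)
pairwise = sum(G[i, j] * np.sum((W[i] - W[j]) ** 2)
               for i in range(K) for j in range(K))
Lap = np.diag(G.sum(axis=1)) - G                 # graph Laplacian D - G
assert np.isclose(pairwise, 2.0 * np.trace(W.T @ Lap @ W))
```

This Laplacian form is what makes the W-subproblem later reduce to a (regularized) quadratic program.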
In general, the graph is unknown, so it is challenging to model a density function over all graphs that have the landmarks as their vertices. To make structure learning pragmatic, constraints on the set of graphs are required. To overcome this challenge, we directly model a specific type of graph learned from data, namely spanning trees [25]. Informally, given a connected undirected graph, let T denote the set of spanning trees, and formulate the entries of G as binary variables, with G_{ij} = 1 if edge (i, j) belongs to the tree and G_{ij} = 0 otherwise. To this end, we express the parametric formulation of p(G) as an indicator function of the set of trees, given by

(10)

where the constraints follow [25]. The third constraint states that a spanning tree contains exactly K − 1 edges, and the fourth constraint imposes the acyclicity and connectivity properties of a tree. Equation (10) states that any tree is uniformly sampled from T, and any other graph has probability zero.
2.4 The proposed probabilistic hierarchical model
Given the likelihood function (2) and the priors (3), (6), (7), and (9), we are ready to use the Bayes rule to formulate the joint conditional distribution of S, W, and G given Y, which is proportional to

(11)

To obtain estimates of S, W, and G from the data, we employ maximum a posteriori estimation by minimizing the negative logarithm of the conditional probability, given by

(12)
The ESI framework is illustrated in Fig. 1. To better understand the proposed model (12), we explore several of its properties as follows:
2.4.1 Graph structure learning via minimumcost spanning tree
Problem (12) with respect to G, namely the graph structure learning, is equivalent to a minimum-cost spanning tree (MST) problem [25]. By substituting (10) into (12), we obtain the following optimization problem with respect to G:

min_{G ∈ T} Σ_{i,j} G_{ij} c_{ij},   (13)

which is a minimum-cost spanning tree problem with the cost of edge (i, j) defined as c_{ij} = α ‖w_i − w_j‖².
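Problem (13) can be solved with any off-the-shelf MST routine. A sketch using SciPy, with random landmarks and squared-distance edge costs, is:

```python
import numpy as np
from scipy.sparse.csgraph import minimum_spanning_tree

rng = np.random.default_rng(2)
K = 5
W = rng.standard_normal((4, K))                              # K landmark columns
cost = np.sum((W[:, :, None] - W[:, None, :]) ** 2, axis=0)  # c_ij = ||w_i - w_j||^2
tree = minimum_spanning_tree(cost)    # SciPy's Kruskal-based MST on the cost graph
```

A spanning tree over K vertices always has exactly K − 1 edges, matching the cardinality constraint in (10).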
2.4.2 Source signal denoising via landmarks
Based on the kernel density estimate (6), we obtain the probability of any point that is not in the set of landmarks. In particular, we can introduce an assignment matrix Q, each entry of which represents the probability of assigning a source signal to a landmark. We can obtain the following useful results (some proofs are omitted in this version):
Proposition 1.
Let be a positive value and the feasible set . We have
(14) 
where two functions are defined as
(15)  
(16) 
Proposition 2.
Given and , minimizing with respect to has a closed form solution
(17) 
Proposition 3.
Given and , has the closed form solution
(18) 
Propositions 2 and 3 state that the source signal is automatically adjusted according to the weighting strategy and the latent landmarks. Suppose that s is a noisy source signal in S and the landmarks capture a high-density region according to the KDE. Proposition 2 tells us that the weight can be very small if s is far away from a landmark, because the distance between the source signal and the landmark is very large. Proposition 3 implies that the source signal will converge to the high-density region of the most relevant (highly weighted) landmarks. Mapping a noisy source signal to a robust point inside the high-density region of the landmarks is preferable for denoising, since the landmarks are representative points of the true distribution of the clean source signals.
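The closed forms (17) and (18) are omitted above; the qualitative behaviour Proposition 2 describes can be illustrated with a hypothetical softmax-style weighting (our stand-in, not necessarily the paper's exact formula), in which the weight q[t, k] decays with the distance between source signal s_t and landmark w_k:

```python
import numpy as np

def assignment_weights(S, W, h=1.0):
    """Hypothetical soft-assignment sketch: q[t, k] is proportional to
    exp(-||s_t - w_k||^2 / (2 h^2)), normalised over the K landmarks."""
    d2 = np.sum((S.T[:, None, :] - W.T[None, :, :]) ** 2, axis=-1)  # (T, K) distances
    q = np.exp(-d2 / (2.0 * h ** 2))
    return q / q.sum(axis=1, keepdims=True)
```

A time point close to landmark 0 puts almost all of its weight there, so distant (likely spurious) landmarks barely influence the denoised signal.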
3 The proposed optimization approach
According to Proposition 1, the estimates of S, W, and G in (12) can be equivalently obtained by introducing the assignment matrix Q and solving the following problem
(19) 
where is defined as
(20) 
We consider an alternating convex search (ACS) method [26] to solve the proposed model (19). The ACS algorithm iterates as follows,
(21) 
where r denotes the iteration number; we omit it when the context is clear. Below we describe how to solve each subproblem in detail.
We rewrite the subproblem in (21) as
(22) 
where the Cholesky factor of the quadratic term is used, and it is straightforward that this factor is invertible. By a change of variables, we see that the subproblem is equivalent to solving T independent strictly convex subproblems:
(23) 
The subproblem in (21) can be expressed as
(24) 
Denote by 1 the vector of all ones and define Δ = Diag(G1) − G, which is the graph Laplacian matrix defined on graph G. We then obtain an equivalent form of (24) as follows,
(25) 
This is an (unconstrained) strictly convex problem, which admits a unique solution. To solve (25), we consider the Cholesky decomposition of the quadratic-term matrix. Since the graph Laplacian is positive semidefinite and the diagonal weighting matrix is positive definite according to Proposition 2, their sum is positive definite, which guarantees that the Cholesky factor is invertible. As a result, the solution is given by:
(26) 
Both (23) and (26) are ℓ1-regularized quadratic programs, which are strictly convex, and hence each admits a unique solution. Furthermore, they can be solved efficiently by many well-developed methods such as Homotopy [27], ADMM LASSO [28], and FISTA [29]. In this paper, we adopt the Homotopy solver for both problems.
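Since both subproblems are ℓ1-regularized least squares, any proximal-gradient solver applies. A minimal ISTA sketch (a simpler cousin of FISTA [29], shown here instead of the Homotopy solver actually used in the paper) is:

```python
import numpy as np

def ista(A, b, lam=0.1, n_iter=500):
    """ISTA for min_x 0.5 * ||A x - b||^2 + lam * ||x||_1, the generic form of
    subproblems (23) and (26); step size 1/L with L the squared spectral norm of A."""
    step = 1.0 / np.linalg.norm(A, 2) ** 2
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        g = x - step * (A.T @ (A @ x - b))                        # gradient step
        x = np.sign(g) * np.maximum(np.abs(g) - step * lam, 0.0)  # soft-thresholding
    return x
```

FISTA adds a momentum step on top of this iteration, while Homotopy instead tracks the solution path as the penalty varies.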
The subproblem in (21) for the graph boils down to an MST problem, as shown in (13), and can be solved efficiently by Kruskal's algorithm [30]. In addition, the subproblem in (21) for the assignment matrix has the closed-form solution given by (17).
In summary, the overall ACS algorithm for solving the proposed model (19) is given in Algorithm 1, with the following initializations: S is solved by FISTA [29] for the inverse problem, W is obtained by applying the k-means method [31] to the initialized S, G is the MST computed by Kruskal's algorithm, and Q is updated using (17). The convergence of this algorithm is established in Theorem 1.
Theorem 1.
Suppose the iterate in the r-th iteration is an optimal solution of problem (19) with each subproblem solved exactly, and let f^(r) denote the corresponding objective function value. We have:

(1) the objective function is coercive;

(2) f^(r+1) ≤ f^(r), i.e., the objective function value is monotonically decreasing;

(3) the sequence {f^(r)} converges as r → ∞;

(4) the sequence of iterates has a convergent subsequence.
4 Numerical Experiments
We test the proposed algorithm on both simulated data and real EEG/MEG data to illustrate its effectiveness. The simulation study is based on a real head model. Although noise is known to exist in both the electrodes and the brain source space, the extent to which different noise levels affect inverse solvers has so far been unclear or little studied. We consider various levels of signal-to-noise ratio (SNR) in both the sensor and source spaces to validate the proposed algorithm in comparison with popular algorithms from the ESI literature [11, 13, 14, 15]. The SNR is defined by SNR = 10 log10(P_s / P_n), where P_s is the power of the signal and P_n is the power of the noise.
4.1 Competing algorithms and error metrics
We compare our method with four benchmark algorithms, namely MNE [11], MCE [14], sLORETA [13], and MxNE [15]. For the MCE solver, we employ the Homotopy algorithm due to its superior performance observed in our previous work on ESI [19]. In addition, we find that incorporating spatial smoothness into the MCE approach improves the reconstruction results, so we adopt this improved version of MCE in the experiments. As for MxNE [15], we impose an ℓ1 norm on the source space and an ℓ2 norm across time, which is one form of MxNE, thus leading to ℓ21 regularization with an assumption of fixed source orientations and spatial smoothness. In this paper, we use ℓ21 instead of MxNE for a precise description. We find that Algorithm 1 is insensitive to most of its parameters, so in our experiments we fix them and tune only the remaining two; the sparsity penalty parameters for MCE and ℓ21 are set correspondingly.
We quantitatively evaluate the performance of each competing algorithm based on the following metrics:

Data fitting (DF). Let y_t denote the t-th column of the EEG data Y, ȳ the mean of Y along the time axis, and ŷ_t = L s_t the fitted value for any reconstructed source signal s_t at time t. The DF metric is defined as DF = 1 − Σ_t ‖y_t − ŷ_t‖² / Σ_t ‖y_t − ȳ‖².

Reconstructed error (RE) in the source space is defined as RE = ‖Ŝ − S‖_F / ‖S‖_F. However, RE cannot completely characterize the reconstruction performance due to ESI's inherent difficulty of having a highly correlated lead field matrix. In particular, one often observes that a reconstructed source is located in the neighborhood of the true location, thus leading to a very large RE.

Localization error (LE) measures the geodesic distance between two source locations on the cortex meshes using Dijkstra's shortest path algorithm. As the two hemispheres of the brain are disconnected, we calculate LE separately for each hemisphere.

Area under curve (AUC) evaluates how well the reconstructed activation separates truly active sources from inactive ones.

For the four metrics, better reconstruction results are expected if DF and AUC are closer to 1 and RE and LE are closer to 0.
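The DF and RE metrics above can be sketched directly; the exact per-column form of DF in the paper may differ slightly, so this is the R²-style version implied by the definition:

```python
import numpy as np

def data_fit(Y, L, S_hat):
    """DF: 1 - ||Y - L S_hat||_F^2 / ||Y - mean(Y)||_F^2 (closer to 1 is better)."""
    resid = np.sum((Y - L @ S_hat) ** 2)
    total = np.sum((Y - Y.mean(axis=1, keepdims=True)) ** 2)
    return 1.0 - resid / total

def reconstruction_error(S_true, S_hat):
    """RE: relative Frobenius-norm error of the reconstruction (closer to 0 is better)."""
    return np.linalg.norm(S_hat - S_true) / np.linalg.norm(S_true)
```

A perfect noiseless reconstruction gives DF = 1 and RE = 0, while the trivial all-zero solution gives RE = 1, which is why the two metrics are read together.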
4.2 Simulation study based on a real head model
As in many brain imaging problems, we do not know the underlying ground truth. As a result, we rely on synthetic data from numerical simulations to characterize the performance of the various algorithms. In this paper, we consider a real head model with simulated brain source signals. The head model was reconstructed from T1-MRI images of a male subject scanned at 26 years old at Massachusetts General Hospital. We used a 128-channel BioSemi EEG cap layout and coregistered it with the subject's head surfaces (the electrode layout and coregistration can be found in Appendix Fig. 6.1). Brain structure segmentation and cortical surface reconstruction were conducted using FreeSurfer. Coregistration of the head surface and EEG electrodes was conducted using Brainstorm [35] and then verified using the coregistration GUI in MNE-Python [36]. The conductivities of the brain, skull, and scalp were set to 0.3 S/m, 0.006 S/m, and 0.3 S/m, respectively. The source space contained 1026 sources in each hemisphere, 2052 sources in total.
In the simulation, we randomly selected two activation locations, one from each hemisphere, so that LE could be calculated without confusion. We then generated two active source signals via a 5th-order autoregressive (AR) model at these locations (see Fig. 6.2). In the simulated experiments, we considered three states with different activation locations, with the sampling frequency set to 100 Hz and the time window set to 2 s. A detailed description of the simulated signals can be found in Ref. [37]. Each of the three states corresponds to a multichannel time series of length 200, so we obtain a time series of length 600 in total. We consider additive white noise at various SNR levels in both the channel and source spaces, denoted SNR_channel and SNR_source, respectively, and examine three noise levels in each space. The impact of the noise on the noiseless signal can be seen in Fig. 6.3 and Fig. 6.4.

The results of the proposed algorithm in comparison with the benchmarks are summarized in Tables 1-3 according to different sensor noise levels. Each reported value is the mean over 10 repeated experiments. The results show that the proposed approach outperforms the benchmark algorithms in most cases, particularly in terms of LE and AUC. Specifically, in Table 1, for a higher SNR_channel (less noise in the channel space), all the algorithms perform very well in terms of LE; the proposed algorithm is marginally better than sLORETA (2.92 mm versus 3.64 mm). Although MCE has the best performance on DF, it has the largest LE among all the algorithms. This phenomenon suggests that a good DF does not necessarily lead to good performance in terms of RE, LE, and AUC. In Table 2, we notice that sLORETA and our algorithm are robust to noise, while MCE and MNE deteriorate markedly when SNR_channel is decreased from 30 dB to 20 dB. Table 3 shows significant improvements of the proposed approach over the state of the art in terms of LE and AUC: the mean LE of our algorithm is between 13 mm and 15 mm, comparing favorably with 30-37 mm for sLORETA, 58-68 mm for ℓ21, and 45-65 mm for MNE and MCE, and the AUC of our algorithm is about 0.9 while the AUC values of the other algorithms are all below 0.8. Finally, we illustrate LE, RE, and AUC using boxplots in Fig. 2 for two cases, SNR_channel = 20 dB, SNR_source = 10 dB (top panel) and SNR_channel = 10 dB, SNR_source = 10 dB (bottom panel). The boxplots demonstrate that the proposed algorithm achieves the best results with the least variance across random realizations.
Methods   | SNR_source = inf dB       | SNR_source = 30 dB        | SNR_source = 10 dB
          | DF     RE    LE     AUC   | DF     RE    LE     AUC   | DF     RE    LE     AUC
MNE       | 0.952  0.62  8.72   0.927 | 0.943  0.67  9.70   0.920 | 0.901  0.70  16.13  0.887
MCE       | 0.983  0.21  12.86  0.902 | 0.982  0.21  12.82  0.910 | 0.833  4.46  17.90  0.720
sLORETA   | 0.418  1.44  3.64   0.936 | 0.401  1.40  4.39   0.922 | 0.396  1.66  8.26   0.881
ℓ21       | 0.972  0.33  5.77   0.965 | 0.972  0.30  5.52   0.960 | 0.928  0.44  10.88  0.905
proposed  | 0.969  0.15  2.92   0.981 | 0.968  0.16  3.34   0.980 | 0.903  0.30  8.25   0.911
Methods   | SNR_source = inf dB       | SNR_source = 30 dB        | SNR_source = 10 dB
          | DF     RE    LE     AUC   | DF     RE    LE     AUC   | DF     RE    LE     AUC
MNE       | 0.901  0.85  16.45  0.854 | 0.902  0.85  18.28  0.862 | 0.901  0.88  22.59  0.847
MCE       | 0.945  0.71  29.23  0.826 | 0.945  0.77  32.46  0.829 | 0.943  0.75  33.93  0.817
sLORETA   | 0.395  1.55  8.71   0.891 | 0.406  1.67  10.01  0.896 | 0.396  1.88  10.24  0.876
ℓ21       | 0.926  0.41  9.75   0.913 | 0.927  0.42  10.49  0.924 | 0.928  0.50  15.37  0.885
proposed  | 0.908  0.31  5.14   0.958 | 0.909  0.326 4.95   0.966 | 0.903  0.32  7.25   0.927
Methods   | SNR_source = inf dB       | SNR_source = 30 dB        | SNR_source = 10 dB
          | DF     RE    LE     AUC   | DF     RE    LE     AUC   | DF     RE    LE     AUC
MNE       | 0.741  2.80  50.87  0.753 | 0.736  2.93  46.35  0.751 | 0.738  3.12  52.95  0.745
MCE       | 0.833  4.46  64.97  0.720 | 0.833  4.58  60.63  0.733 | 0.831  4.55  63.98  0.721
sLORETA   | 0.416  2.49  33.69  0.795 | 0.395  2.63  31.49  0.785 | 0.401  2.92  36.22  0.789
ℓ21       | 0.825  2.89  62.37  0.766 | 0.825  2.91  58.83  0.771 | 0.828  3.43  67.82  0.754
proposed  | 0.736  0.53  14.64  0.908 | 0.736  0.52  13.65  0.898 | 0.734  0.52  15.28  0.896
We illustrate the source reconstruction results under different noise levels and activation locations in Fig. 3. The top row of Fig. 3 corresponds to SNR_channel = 20 dB / SNR_source = 30 dB with the true activation in the rostral middle frontal area. In this high-SNR case, all the methods achieve satisfactory results, except for the over-diffuse solutions from the MNE and sLORETA reconstructions. We then increase the noise to SNR_channel = 10 dB / SNR_source = 30 dB and generate the ground-truth source activation in the caudal middle frontal area. The reconstruction results are given in the middle row of Fig. 3, which shows that the activation pattern from the proposed algorithm is the closest to the ground truth. MCE and ℓ21 can correctly identify some locations, but with many spurious activations on the cortex surface. On the other hand, MNE and sLORETA give over-diffuse solutions, although the locations with the largest magnitudes on both hemispheres align well with the true activation locations. The bottom row of Fig. 3 shows the case of larger noise, namely SNR_channel = 10 dB and SNR_source = 10 dB, where the proposed algorithm still yields a sparse and accurate reconstruction.
In summary, numerical results on synthetic EEG data show that the proposed approach is comparable to the benchmark algorithms at lower noise levels. When the noise level is high, we demonstrate significant improvements over the benchmark algorithms. The observation confirms that source signal denoising and graph structure play an important role in identifying activation patterns under a high level of noise in both sensor and source spaces.
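The AR-model source time courses used in this simulation can be sketched generically; the paper's actual AR(5) coefficients are not given, so the ones below are placeholders chosen only for stability:

```python
import numpy as np

def ar_signal(coefs, T, noise_std=0.1, seed=0):
    """Generate one source time course from an AR(p) model:
    x[t] = sum_i coefs[i] * x[t-1-i] + eps[t], with Gaussian innovations."""
    rng = np.random.default_rng(seed)
    p = len(coefs)
    x = np.zeros(T + p)
    for t in range(p, T + p):
        x[t] = np.dot(coefs, x[t - p:t][::-1]) + noise_std * rng.standard_normal()
    return x[p:]                                  # drop the zero warm-up samples

# placeholder AR(5) coefficients (assumed, not from the paper)
sig = ar_signal([0.5, -0.1, 0.05, -0.02, 0.01], T=200)
```

At a 100 Hz sampling rate, a length-200 segment corresponds to the 2 s time window used for each simulated state.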
4.3 Real data experiment
We conduct experiments on a real dataset that is publicly available through the MNE-Python package [36]. The EEG/MEG data were collected while the subject was given auditory and visual stimuli: checkerboard patterns were presented to the left or right eye, interspersed with high-frequency noise to the left or right ear. The recording device was a whole-head Elekta Neuromag Vector View 306 MEG system with 102 triple-sensor elements (two orthogonal planar gradiometers and one magnetometer per location). EEG data from a 60-channel cap were recorded simultaneously. Standard preprocessing steps, including band-pass filtering, bad-epoch rejection, and bad-channel selection, were conducted before applying the ESI algorithms [17]. There are 7498 sources distributed over the cortical surfaces.

The epochs under study are from the left auditory stimuli (LAS) and the left visual stimuli (LVS); there are 66 epochs for LAS and 73 for LVS. The time window used for all the epochs of LVS and LAS is from 0.03 s to 0.16 s after the stimulus events. The average time courses aligned with the stimulus events for the EEG, gradiometers, and magnetometers are illustrated in Fig. 6.6 for LAS and Fig. 6.8 for LVS in the Appendix. From the time series plots in Fig. 6.6, the event-related potential (ERP) activation pattern for LAS is very clear from 0.08 s to 0.12 s, and the related topomaps for the time points 0.08 s, 0.10 s, and 0.12 s are illustrated in the rightmost plot of Fig. 4. Following the same parameter settings as in the simulation study, we perform source localization on averaged epochs in a moving window of 15 epochs (overlap = 5) for LAS and LVS. Fig. 4 presents the source reconstruction results on the cortex for LAS at 0.08 s (top), 0.10 s (middle), and 0.12 s (bottom). At 0.08 s, all the algorithms show that the right hemisphere has more strongly activated sources than the left hemisphere, which is consistent with the topomaps in Fig. 4, where the right hemisphere has a stronger electric potential. At 0.10 s, the left hemisphere demonstrates stronger activation. At 0.12 s, we observe that the activation in the left hemisphere is stronger than in the right hemisphere in both the reconstructed source space and the sensor-space topomaps. The other competing methods suffer from over-diffuse or spurious activations in their reconstructions, similar to our observations on the synthetic data.
To visualize and validate the existence of a spanning tree in the real data, we project the source solutions of all the averaged epochs from the different moving windows for LAS and LVS onto the first 3 principal components in Fig. 5. The intersection of the scattered points from LAS and LVS corresponds to the patterns before the actual ERP activations, after which the activations under the different stimulus conditions progress along different branches of the spanning tree.
Compared with all the benchmark algorithms, our approach provides a sparse source reconstruction with fewer spuriously activated sources incurred by noise, and the activation patterns are more consistent across time.
5 Conclusions
In this paper, we presented a novel probabilistic ESI model. Particularly important is the introduction of a hierarchical graph prior that blends the flexibility of maintaining consistency across time with allowing distinct representative landmark activation patterns over the studied period. An efficient algorithm based on alternating convex search was proposed with provable convergence. We conducted extensive numerical experiments on both synthetic and real data and demonstrated that the proposed algorithm can robustly localize the activated sources with satisfactory precision under large noise in both the channel and source spaces, where traditional algorithms often fail. Specifically, in the synthetic experiments, our algorithm outperformed all the benchmark algorithms and yielded significant improvements when the noise levels in the channel and source spaces were high. This study also fills a gap in the literature by investigating the impact of noise separately in the channel and source spaces. On the real data, our algorithm rendered more consistent reconstructions while allowing variations across the time axis; compared with the other algorithms, the source signal reconstructed by ours is less contaminated by spurious noise originating from spontaneous brain source activations or channel noise.
References
 [1] B. He, A. Sohrabpour, E. Brown, and Z. Liu. Electrophysiological source imaging: A noninvasive window to brain dynamics. Annual review of biomedical engineering, 20:171–196, 2018.
 [2] B. Babadi, G. ObregonHenao, C. Lamus, M. S. Hämäläinen, E. N. Brown, and P. L. Purdon. A subspace pursuitbased iterative greedy hierarchical solution to the neuromagnetic inverse problem. NeuroImage, 87:427–443, 2014.
 [3] A. Custo, S. Vulliemoz, F. Grouiller, D. Van De Ville, and C. Michel. EEG source imaging of brain states using spatiotemporal regression. Neuroimage, 96:106–116, 2014.
 [4] C. Li, D. Jacobs, T. Hilton, M. del Campo, Y. Chinvarun, P. L. Carlen, and B. L. Bardakjian. Epileptogenic source imaging using cross-frequency coupled signals from scalp EEG. IEEE Transactions on Biomedical Engineering, 63(12):2607–2618, 2016.
 [5] C. M. Michel, M. M. Murray, G. Lantz, S. Gonzalez, L. Spinelli, and R. G. de Peralta. EEG source imaging. Clinical neurophysiology, 115(10):2195–2222, 2004.
 [6] B. J. Edelman, B. Baxter, and B. He. EEG source imaging enhances the decoding of complex right-hand motor imagery tasks. IEEE Transactions on Biomedical Engineering, 63(1):4–14, 2016.
 [7] R. B. Willemse, A. Hillebrand, H. E. Ronner, W. P. Vandertop, and C. J. Stam. Magnetoencephalographic study of hand and foot sensorimotor organization in 325 consecutive patients evaluated for tumor or epilepsy surgery. NeuroImage: Clinical, 10:46–53, 2016.
 [8] P. Mégevand, L. Spinelli, M. Genetti, V. Brodbeck, S. Momjian, K. Schaller, C. M. Michel, S. Vulliemoz, and M. Seeck. Electric source imaging of interictal activity accurately localises the seizure onset zone. J Neurol Neurosurg Psychiatry, 85(1):38–43, 2014.
 [9] B. Erem, D. E. Hyde, J. M. Peters, F. H. Duffy, and S. K. Warfield. Dynamic electrical source imaging (dESI) of seizures and interictal epileptic discharges without ensemble averaging. IEEE transactions on medical imaging, 36(1):98–110, 2017.
 [10] R. Grech, T. Cassar, J. Muscat, K. P. Camilleri, S. G. Fabri, M. Zervakis, P. Xanthopoulos, V. Sakkalis, and B. Vanrumste. Review on solving the inverse problem in EEG source analysis. Journal of neuroengineering and rehabilitation, 5(1):1, 2008.
 [11] M. S. Hämäläinen and R. J. Ilmoniemi. Interpreting magnetic fields of the brain: minimum norm estimates. Medical biological engineering and computing, 32(1):35–42, 1994.
 [12] A. M. Dale, A. K. Liu, B. R. Fischl, R. L. Buckner, J. W. Belliveau, J. D. Lewine, and E. Halgren. Dynamic statistical parametric mapping: combining fMRI and MEG for high-resolution imaging of cortical activity. Neuron, 26(1):55–67, 2000.
 [13] R. D. Pascual-Marqui. Standardized low-resolution brain electromagnetic tomography (sLORETA): technical details. Methods Find Exp Clin Pharmacol, 24(Suppl D):5–12, 2002.
 [14] K. Uutela, M. Hämäläinen, and E. Somersalo. Visualization of magnetoencephalographic data using minimum current estimates. NeuroImage, 10(2):173–180, 1999.
 [15] A. Gramfort, M. Kowalski, and M. Hämäläinen. Mixed-norm estimates for the M/EEG inverse problem using accelerated gradient methods. Physics in medicine and biology, 57(7):1937, 2012.
 [16] D. Strohmeier, Y. Bekhti, J. Haueisen, and A. Gramfort. The iterative reweighted mixednorm estimate for spatiotemporal MEG/EEG source reconstruction. IEEE transactions on medical imaging, 35(10):2218–2228, 2016.
 [17] A. Gramfort, D. Strohmeier, J. Haueisen, M. S. Hämäläinen, and M. Kowalski. Time-frequency mixed-norm estimates: Sparse M/EEG imaging with non-stationary source activations. NeuroImage, 70:410–422, 2013.
 [18] S. Castaño-Candamil, J. Höhne, J. D. Martínez-Vargas, X.-W. An, G. Castellanos-Domínguez, and S. Haufe. Solving the EEG inverse problem based on space–time–frequency structured sparsity constraints. NeuroImage, 118:598–612, 2015.
 [19] F. Liu, J. Rosenberger, Y. Lou, R. Hosseini, J. Su, and S. Wang. Graph regularized EEG source imaging with in-class consistency and out-class discrimination. IEEE Transactions on Big Data, 3(4):378–391, 2017.
 [20] C. H. Wolters, L. Grasedyck, and W. Hackbusch. Efficient computation of lead field bases and influence matrix for the FEM-based EEG and MEG inverse problem. Inverse problems, 20(4):1099, 2004.
 [21] D. A. Engemann and A. Gramfort. Automated model selection in covariance estimation and spatial whitening of MEG and EEG signals. NeuroImage, 108:328–342, 2015.
 [22] J. Mairal, F. Bach, J. Ponce, and G. Sapiro. Online dictionary learning for sparse coding. In Proceedings of the 26th annual international conference on machine learning, pages 689–696. ACM, 2009.
 [23] S. Haufe, R. Tomioka, T. Dickhaus, C. Sannelli, B. Blankertz, G. Nolte, and K.-R. Müller. Large-scale EEG/MEG source localization with spatial flexibility. NeuroImage, 54(2):851–859, 2011.
 [24] X. Zhu, Z. Ghahramani, and J. D. Lafferty. Semi-supervised learning using Gaussian fields and harmonic functions. In Proceedings of the 20th International conference on Machine learning (ICML-03), pages 912–919, 2003.
 [25] M. Cheung. Minimum-cost spanning trees. http://people.orie.cornell.edu/dpw/orie6300/fall2008/Recitations/rec09.pdf, 2008.
 [26] J. Gorski, F. Pfeuffer, and K. Klamroth. Biconvex sets and optimization with biconvex functions: a survey and extensions. Mathematical methods of operations research, 66(3):373–407, 2007.
 [27] M. S. Asif and J. Romberg. Sparse recovery of streaming signals using l1-homotopy. IEEE Transactions on Signal Processing, 62(16):4209–4223, 2014.
 [28] S. Boyd, N. Parikh, E. Chu, B. Peleato, and J. Eckstein. Distributed optimization and statistical learning via the alternating direction method of multipliers. Foundations and Trends in Machine Learning, 3(1):1–122, 2011.
 [29] A. Beck and M. Teboulle. A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM journal on imaging sciences, 2(1):183–202, 2009.
 [30] J. B. Kruskal. On the shortest spanning subtree of a graph and the traveling salesman problem. Proceedings of the American Mathematical society, 7(1):48–50, 1956.
 [31] S. Lloyd. Least squares quantization in PCM. IEEE transactions on information theory, 28(2):129–137, 1982.
 [32] A. Molins, S. M. Stufflebeam, E. N. Brown, and M. S. Hämäläinen. Quantification of the benefit from integrating MEG and EEG data in minimum l2norm estimation. Neuroimage, 42(3):1069–1077, 2008.
 [33] M. Zhu, W. Zhang, D. L. Dickens, and L. Ding. Reconstructing spatially extended brain sources via enforcing multiple transform sparseness. NeuroImage, 86:280–293, 2014.
 [34] A. Sohrabpour, Y. Lu, G. Worrell, and B. He. Imaging brain source extent from EEG/MEG by means of an iteratively reweighted edge sparsity minimization (IRES) strategy. NeuroImage, 142:27–42, 2016.
 [35] F. Tadel, S. Baillet, J. C. Mosher, D. Pantazis, and R. M. Leahy. Brainstorm: a user-friendly application for MEG/EEG analysis. Computational intelligence and neuroscience, 2011:8, 2011.
 [36] A. Gramfort, M. Luessi, E. Larson, D. A. Engemann, D. Strohmeier, C. Brodbeck, L. Parkkonen, and M. S. Hämäläinen. MNE software for processing MEG and EEG data. Neuroimage, 86:446–460, 2014.
 [37] S. Haufe and A. Ewald. A simulation framework for benchmarking EEG-based brain connectivity estimation methodologies. Brain topography, pages 1–18, 2016.