1 Introduction
1.1 Motivation
The motivation for a new classification method comes from MRI study of early normal brain development (Almli et al., 2007, Evans, 2006) designed to create a database of healthy controls comprised of T1weigthed (T1w), T2weighted (T2w) and protondensity weighted (pdw) images.
MR dataset of early brain development (from birth to 4.5 years of age) poses a challenge to using automatic classification procedures such as FAST (Zhang2001), ANIMAl+INSECT (Collins et al., 1999) and SPM (AshFr2005). These methods rely on young adult brain atlases that do not adequately reflect dynamic changes in brain anatomy through early childhood (Almli et al., 2007, Fonov et al., 2011).
Furthermore, there is a number of factors inherent to early developing brain MR images that hamper automated classification process. They include high tissue intensity variation, low contrasttonoise ratio between tissue types and partial volume artifact meaning that a brain voxel posesses a signal average of two or three different tissue classes.
However, the development of segmentation techniques in newborn and infant brain MRI over the past decade has shown that atlasbased segmentation methodology for adult brain MR images can be applicable to child brain MRI when tuned to the specific properties of the dataset under study. There are three major child MRI segmentation frameworks, ExpectationMaximization (EM)
(Van Leemput et al., 1999), Registrationbased method (Collins et al., 1999) and Adaptive Label Fusion (Weisenfeld and Warfield, 2009).A barrier to implementation of these methods is the dependency on a pediatric brain atlas with accurate measures of spatial boundary uncertainty for tissue classes that the early childhood dataset does not possess. Up to date, the creation of such standard atlas that would cover the entire developmental epoch still remains a significant challenge since it requires semiautomated segmentation of large datasets.
Another methodological issue with EMbased and Adaptive Label Fusion methods in applications to infant brain MRI is using global intensity statistics that requires for each tissue type to produce similar intensities in different parts of the image.
1.2 Handling intraclass variability
It is important to find a strategy to cope with a highly variable signal intensity since the same tissue type having different intensities in different parts of the brain is prone to misclassification. An effective remedy to this problem was offered by Pappas (Pappas, 1992)
. The author devised an adaptive clustering algorithm that estimates the mean intensity function (representing the pure spacedependent signal) and the corresponding posterior probabilities of the classes for all pixels in a 2D image with a varying window size. In this way, adaptability of the mean intensity function from the initially global estimates of the class means to the local characteristics of the image was achieved. However, the implementation of this adaptive clustering method to a 3D image is computationally expensive due to the slow process of successive updates of mean class intensities for each pixel within the window making the extension of this technique to 3D MRI segmentation unfeasible.
To tackle high intraclass intensity variability im EM framework, Prastawa et al. (Prastawa et al., 2005) used a technique developed by (Van Leemput et al., 1999)
incorporating bias field correction step into EM algorithm. The segmentation results were refined by pruning the training samples used to estimate the probability distributions of classes for each voxel by Parzen windowing
(Wells et al., 1996). The successful EM classification of child brain MRI is achieved at a high computational cost associated with overcoming restrictions of the global Gaussian mixture modelling.
(Tohka et al., 2004) included of the partial volume effect (PVE) model (Choi et al., 1991) in EM framework that estimated mixing tissue proportions within each voxel and improved detection GM, WM and the CSF in small structures and areas between sulcal CSF and the surrounding GM. The PVE algorithm has been successfully applied to single or multichannel adult brain scans. In this work, we investigate the performance of PVE technique in segmentation of young child’s brain MRI.
Another approach is local segmentation by brain splitting introduced by (Xue et al., 2007)
. The brain was subdivided into regions with different statistical properties of WM samples using kmeans clustering followed by a Voronoi tesselation. The EM segmentation performed on each Voronoi region prevented overestimation of cortical GM. However, the number of subvolumes in Xue’s brain splitting algorithm is arbitrary.
The label fusion technique handles tissue intensity variability by means of constructing a probabilistic atlas from a small cohort of newborn brain MRI and incorporating it into the fused segmentation model as a spatial prior (Warfield et al., 2000). This introduces dependency on a brain atlas that our MR dataset of early brain development does not possess.
1.3 The proposed framework
We propose a novel local atlasindependent framework inspired by modern trends in Computer Vision such as Kernel Fisher Discriminant Analysis (KFDA) for pattern recognition and structural similarity index (SSIM;
(Wang et al., 2004)) for perceptual image quality evaluation. KFDA is related to kernelbased classifiers such as Support Vector Machines (SVM) introduced by
(Vapnik, 1998). KFDA has shown competitive performance compared to other stateoftheart classifiers such as SVM and AdaBoost (Mika et al., 1999, Rätsch et al., 2001).In the proposed framework, the KFDA discriminant criterion is modified by including a regularization term that penalizes intensity differences in the neighbourhood of a brain voxel. A regularized solution in the abstract highdimensional feature space yields connected, and therefore more meaningful, spatial tissue patterns.
We build on the global version of the multichannel KFDAbased classification algorithm introduced in (Portman and Evans, 2013) and further refine the algorithm to improve its performance. Namely, we propose a local approach that handles withinclass intensity variations by optimal partitioning of the brain into overlapping subdomains having different average intensities.
We aim to classify agespecific pediatric templates into two major tissue classes (GM, WM) and the CSF. The classified representative templates can then be used for the classification of a subject brain MRI of a developmental age similar to the age range of the template.
In the absence of a “ground truth”, we assess the quality of classification via SSIM that predicts image quality as perceived by the Human Visual System (HVS) (Wang and Bovik, 2009) which is regarded as an optimal information extractor that seeks to identify objects in the image. As such, the HVS must automatically identify structural distortions and compensate for the nonstructural distortions (e.g., image corruption by noise). The SSIM is an objective image quality metric that simulates this functionality and computes the degree of structural similarity between reference and distorted images. It has been shown that the SSIM is wellmatched to the subject ratings of image databases, and therefore, it is a good approximation to the perceived image quality (Wang et al., 2003). In this work, the SSIM finds a new role as
Given the objectivity and effectiveness of the SSIM we apply it for comparison of Partial Volume Estimation (Tohka et al., 2004) and KFDAbased classification algorithms, as well as for automatic monitoring of the quality of classification. That is, we compute the structural closeness of classified 3D brain images containing GM, WM and CSF mean intensity values with their counterparts seen in an MR image type of a highest contrast and regarded as references. In a test example of a brain template for ages 8 to 11 months shown in Figure LABEL:fig1 we rely on T1w as the most informative of all MR image types.
1.4 Paper organization
In the following we will describe the algorithm step by step, namely, optimal brain partitioning, modified KFDAbased classification guided by SSIM, and image stitching. A brief background on KFDA is provided in A. Finally, we will report on classification results for brain template for ages 8 to 11 months, compare the performance of PVE and KFDA methods and discuss possible improvements.
2 Method
2.1 Optimal brain partitioning
We start with a mathematical formalization of the brain splitting problem. Let
be a discrete random variable (r.v.) representing the bins
of the histogram containing interior brain voxels with marginal distribution , be the voxeltovoxel image partition containingvoxels with uniform distribution
and be a random clustering over into clusters containing voxels at the partitioning step .For each of subvolumes at the partitioning step (referred to as , for simplicity) we aim to find the partition that maximizes defined by
(2.1) 
In equation (2.1), is the marginal probability distribution of the histogram bins, is a joint probability distribution of r.v. and and is the conditional probability of the r.v. taking the value given that the r.v. is the jth cluster .
(2.2)  
(2.3)  
(2.4) 
Here, is the number of shared voxels between the cluster and the histogram bin , and is the number of voxels in . Substituting 2.3 and 2.4 for and into equation (2.1) we obtain
(2.5) 
The can be rewritten in terms of entropy functions as follows
(2.6) 
When is a single intensity voxel and takes values from a set then there is no uncertainty that this voxel belongs to one of the bins. So, is equal to 0 implying that
(2.7) 
Brain partitioning algorithm.

Given brain volumes (or nodes of a binary tree at the level ) initially partition into subvolumes by cutting each of the volumes into and that contain three subsequent slices and the rest of the brain slices, respectively, with sagittal, coronal and axial planes. For each direction (sagittal, coronal, axial) compute 2.5 between the histogram bins and the initial clusters.

For each subvolume and each direction, create a set of new clusters by moving the cutting plane one slice at a time (until only three slices remain in ). Find the clustering that maximizes over a set of all possible clusters in all directions.

Given the total number of brain subvolumes that possess from the top level down to the current level of the binary tree compute the total acquired in the partitioning process. is the weighted sum of the associated with the subdomains .
(2.8) The weights are relative volumes of the image subdomains, where is the number of voxels comprising the entire 3D brain volume.

Compute signaltonoise ratio (SNR) averaged over newly obtained
brain subvolumes to control growth of the MIR. Construct the SNR curve with respect to the number of subvolumes and normalize it to the range of MIR values. 
Move down to the level of the binary tree and repeat the procedure from step 1 until the following stopping criterion is satisfied.
Stopping criterion: Find the abscissa of the intercept between the MIR and SNR curves that provides an optimal number of subvolumes.
Note: With the growing number of partitioned brain subvolumes the acquisition of information increases. Therefore, the is an increasing function with respect to the number of subvolumes.
As the subdomains decrease in size in the process of brain splitting, the SNR decreases. The subvolume sizes should allow samples of brain voxels large enough to distinguish between the constituent tissue classes. In order to control the brain subvolume sizes and deduce an optimal number of subvolumes we compute the SNR.
In this way, the balance between the ability of the proposed method to classify image subdomains and the SNR is established.
Algorithm implementation.
Figure 2.1 shows the graphs of the SNR and MIR curves as functions of the number of subdomains in the T1w template for ages 44 to 60 months. The abscissa of the intercept yields the optimal number of 22 subdomains for this template. This number falls between 16 and 32 subdomains of the levels 4 and 5 of the binary tree. To determine these 22 brain subvolumes we followed a simple decision rule. We sorted the 32 subdomains in descending order according to their size and selected the first 22 subdomains with a larger size and therefore a higher SNR.
To maintain the continuity of the classified image subdomains across their boundaries, we added two slices to each planar boundary of each subdomain thus creating an overlap of 4 slices between adjacent brain subdomains.
Since the partitioning algorithm is intensitybased, the brain regions and their total number vary for different brain scans. Partitioning of brain MRI with a higher intensity variation yields a larger number of brain regions as demonstrated in an example below. The T1w infant brain template for ages 8 to 11 months was subdivided into 40 subvolumes (Figure 2.2.a).
Local assessment of the contrasttonoise ratio (CNR) between the CSF and G+WM showed that CNR significantly varies across brain subdomains as seen in Figure 2.2.b. It tends to be lower near the base of the brain and higher in the central part of the brain.
In a similar fashion, we can compute GM/WM CNR in image subdomains of the brain template. Local dependency of CNR graphs provides a better understanding of natural MR signal intensity variation throughout an individual brain shedding light on subvolumes with poor CSF and/or WM signal detection. Since the proposed method is kernelbased the kernel parameters (e.g., bandwidth of the Gaussian kernel) can be tuned over a certain range of values in accordance with CNR of brain subvolumes to ensure successful segmentation.
2.2 Modified KFDA optimality criterion
We reap the benefits of the KFDA approach to brain tissue classification of infant brain MRI. To mention a few, KFDA allows a simple closed form solution, improves classification by taking into account all MR signal observations (samples) in the input space, and automatically identifies PV voxels that lie near the boundary between the tissue classes. The competitiveness of the KFDA method compared to other stateoftheart classifiers (Mika et al., 1999) has motivated our exploration of KFDA to find high accuracy segmentation solutions in child brain MRI.
We implement a binary formulation of KFDA in the feature space provided in A. The classical kernel Fisher discriminant given by (A.10) does not assume spatial correlations between neighbouring intensity voxels in .To increase robustness to misclassification caused by the presence of noise in MR images we introduce a spatial regularization term that penalizes local kernelprojected intensity differences in the feature space . Since the graph that defines the local relationships between the brain voxels is a 3D grid with each interior node having 26 neighbours we define the neighbourhood matrix as follows
Then
(2.10) 
where is a set of edges .
Let be the kernel projection of the input data onto the optimal direction in . can be rewritten as due to the expansion of in spanned by the mapped training samples .
We modified the KFDA optimality criterion by adding the penalty term of the form , where is the kernel matrix of size
(2.11) 
In this setup, the penalty function forces misclassified voxels closer to another class cluster centroid.
The problem (2.11) can be reformulated as
(2.12) 
The constrained optimization problem (2.12) can be solved algebraically using the method of Lagrange multipliers. We constructed the Lagrangian function
(2.13) 
and by computing the gradient of the Lagrangian with respect to we obtained an eigenvalue problem
(2.14) 
Then the solution to (2.11) is a leading eigenvector of .
2.3 Objective assessment of classification quality via SSIM
A modified version of the KFDA criterion given by (2.11) depends on a regularization coefficient that influences the quality of classification. In order to find the “best” value of we constructed a sequence of values and computed the Structural Similarity Index (SSIM) (Wang et al., 2002, 2004) between the classified and structural MR input data for each value in the sequence. In this application, the SSIM quantifies the degree of structural similarity between spatial tissue patterns seen in ideal (reference) and distorted (classified) images.
We implemented the SSIM to evaluate the performance of our classification algorithm in the absence of ground truth. We evaluated how well GM, WM and CSF boundaries are captured by our algorithm versus the boundaries visually extracted from input T1w images.
For each partitioned brain subvolume we solved a 2class separation problem and created subvolumes with the mean T1w intensity values for the two tissue types (G+WM and CSF, and GW and WM). These are the classified dependent image subdomains to be compared with T1w reference. We computed the SSIM between each classified and T1w brain slices and then found the mean SSIM taken over all slices of the brain subvolume. Thus, we obtained mean Structural Similarity Indices (MSSIM) and chose the value corresponding to the largest MSSIM. In this way, we were able to automatically control the quality of classification.
Given below is a formula for the mean SSIM
where is the total number of interior brain voxels, and are local image patches^{2}^{2}2a sliding window that moves across the entire brain slice pixel by pixel. For the MSSIM the background patches have been excluded. and , , are the luminance, contrast and structure comparison measures defined by
Here, (), () and
represent the local mean, standard deviation and crosscorrelation estimates of the local image patches
and , respectively, and are small constants that have been derived from the properties of the visual system (Wang et al., 2002).Figure 2.3 illustrates the dependency of the CSF pattern identified by KFDA on the value of the regularization parameter . The value of reveals most of the connected CSF tissue surrounding sulci, thus making the sulcus contours visible. The computed MSSIMs between the classified subdomains and the corresponding T1w template for values from the sequence show that the CSF structure corresponding to is most similar to its counterpart in T1w reference. Therefore, the regularization term builds more of the connected CSF component by forcing initially identified G+WM voxels into the CSF class if predominant neighbouring intensities are the CSF samples.
2.4 KFDA implementation
To proceed with the KFDA implementation, we solve an eigenvalue problem (2.14) in a highdimensional feature space whose dimension is defined by the number of brain voxels in the image subdomain . Given a vectorvalued image function with labels for the two tissue classes (CSF and G+WM, or GM and WM) as an input
where are voxel coordinates of the interior brain subvolume, the input data are implicitly and nonlinearly transformed to
Next, having chosen a kernel function and computed a leading eigenvector of the matrix given in equation (2.14) we find an optimal projection of the mapped data in
We performed KFDA in two steps.
Step1: Classification into the CSF and G+WM.
Through extensive experimentation we determined that the sigmoidal kernel function (with and being userspecified parameters) is the best choice for delineation of the CSF. Figure 2.4 shows a test example of a template brain subvolume for ages 8 to 11 months that consists of 7 axial slices with initial CSF and G+WM labels obtained using global PVE. The values of the image vectorfunction are plotted in the 3D intensity space shown in Figure 2.4.b. Each point in the 3D input space carries specific stereotaxic coordinates of a brain voxel. The initial CSF and G+WM clusters are coloured in blue and red, respectively.
An optimal projection of the input data with initial classification shown in red (G+WM) and blue (CSF) is given in Figure 2.4.c. The nonlinear data transformation via sigmoidal kernel preserves the sparsity of the input CSF pattern and the density of the G+WM pattern. In this Figure, the Xaxis represents columnwise enumeration of the interior brain voxels from to and the Yaxis represents the projected values . When calculated with the offset they are positive for one class and negative for another. Peaks of both class clusters correspond to the voxels that lie deep inside the tissue volumes and the voxels next to the decision line lie close to the boundary between the CSF and G+WM.
Voxel categorization. Since KFDA computes an optimal decision surface between the CSF and G+WM it easily identifies PV voxels that lie near or on the boundary between the classes.
A closeup look at the distribution of the kernelprojected data reveals overlapping CSF and G+W voxels and class outliers (Figure 2.5.a). We identifed the following voxel categories,

G+WM overlapping voxels (in the negative CSF range) in black,

CSF overlapping voxels (in the positive G+WM range) in green,

CSF outliers in cyan,

G+WM outliers in yellow,

CSF and G+WM tissue prototypes in blue and red, respectively.
A visualization of the overlapping set (coloured in black and green) in the stereotaxic space given in Figure 2.5.b shows that the overlapping voxels are located within the boundary regions between G+WM and the CSF. We recognize a particularly problematic brain area around the sulcus where the CSF is usually poorly detected due to the presence of PVE. Therefore, it is reasonable to assume that the overlapping voxels contain a mix of both tissue classes. The outliers located away from the decision boundary and their initial class centroids are most likely to change their initial membership and the rest of the brain voxels constitute tissue prototypes that truly represent CSF and G+WM tissues.
Voxel categorization is useful since we can use the predictive power of KFDA to assign the most likely class membership to overlapping voxels (containing PVE). For this purpose, we treat the overlapping voxels as a testing set and tissue prototypes as a training set. For the classification of the overlapping set we implemented
nearest neighbours(KNN) classifier. It determines the class membership based on the class majority rule in the vicinity of each voxel defined by Euclidean distances to
nearest tissue prototypes in . The outliers comprise a separate testing set and they are classified using Mahalanobis distance.It is tempting to classify all kernelprojected samples into 2 classes based on their sign (positive or negative). However, this usually leads to an overestimate of the CSF. To monitor the quality of classification produced by predictive labelling of the testing set we devised the following SSIMguided classification procedure.
An algorithm for SSIMguided computation of the decision surface.

Classify the outlier set using Mahalanobis distance

Compute the resulting classified image with tissue class means and the

Classify the overlapping set using KNN classifier with different values of the number of neighbours

For each value of the parameter , compute the resulting classified image and the . Choose the classification that corresponds to the maximal

Compare and and return the classification that corresponds to a larger value.
This algorithm is general enough to handle a variety of class intensity distributions. For a test example provided by Figure 2.4 there are only a few class outliers and their assignment to different classes will not make a visible difference to the initial classification. Due to a large overlapping set, application of the KNN classifier contributes most to the final KFDA classification of the test example. However, there are cases when there is a large number of the class outliers and very few overlapping voxels. In this case, Mahalanobis classification of the outlier set suffices to apply.
Figures 2.6.(ab) demonstrate steps 14 of the algorithm, and Figure 2.6.c shows the separating surface in the input space corresponding to the KNN classification with an optimal value of the parameter shown in Figure 2.6.b. The discriminating boundary between the CSF and G+WM clusters is a plane. The CSF cluster identified by the SSIMguided algorithm contains 1917 voxels, a significant increase over the initial CSF volume consisting of 515 voxels. It is shown in stereotaxic coordinates in Figure 2.6.d.
Step2: Classification into GM and WM.
Having delineated the CSF we classify G+WM into GM and WM. Experiments with various kernel functions showed that the Gaussian radial basis function
is the best choice to model the nonlinear structure of WM and GM clusters. The G+WM of the test example with initial GM and WM labels is given in Figures 2.7.(ab).The initial classification represents a significant underestimate of WM. The small WM structure in the temporal lobe of low signal intensity is visible but not detected by the PVE method.We performed KFDA of the labeled G+WM input data and displayed the projected GM and WM distributions in as shown in Figure 2.7.c. Here, WM and GM peaks correspond to the voxels located deep inside the GM and WM. Unlike the CSF and G+WM case, GM and WM distributions contain only GM overlapping voxels that fall in the positive range of WM and class outliers (Figure 2.7.d). The tissue prototypes that form the training set are colorcoded in red for WM and blue for GM.
We then applied the algorithm for SSIMguided computation of the decision surface and identified 819 WM voxels in the feature space in addition to the initial 482 WM voxels shown in red in Figure 2.7.e. The corresponding nonlinear decision surface that attempts to mimic the shape of the GM cluster in the input space is displayed in Figure 2.7.f. Figure 2.7.f also demonstrates the importance of providing multichannel data for the input. This test example possesses a wide range of T1w WM intensities and a narrow range of T2w WM intensities. Without the T2w imaging data it would not be possible to detect myelinated WM with a lower signal intensity.
SSIM comparison of KFDA and PVE classification results shows that KFDA yields spatial tissue patterns that are structurally closer to their counterparts in T1w reference subdomain (Figure 2.8).
2.5 Stitching of brain subdomains
We applied a Simulated Annealing technique to stitch the local classifications together into a cohesive global picture of the classified brain. First of all, for each brain slice we collected the constituent subimages. Each pair of overlapping subimages contained a joint image region of size or that is to be optimally estimated from two sets of observations , or , as illustrated in Figures 2.10.ab.
We define an undirected graph to represent a lattice structure of the image , where are nodes of the graph and are pairs of neighbouring nodes or edges. For every pixel within there is a hidden node. The node is a random variable taking values from the state space . The nodes are divided into 2 sets (Figure 2.9),

 the observed random variable with a label at a pixel coordinate ,

 a hidden node.
Given neighbouring node interactions it is natural to model as a realization of a Markov Random Field with respect to the graph . The hidden nodes and are connected in a lattice way if and . Each possible label coupling and label weights are modelled by the edge potential and the node potential , respectively. The posterior probability density of the joint label configuration becomes
(2.15) 
where the first product is over all pairs of neighbouring nodes and the second one over all nodes. is a normalizing constant that sums the probabilities over all possible configurations of the variables to 1.
To find labelling of the joint region with the highest probability we estimate and for every node from two sets of observations (e.g., ). is defined on and for every it is given in the form of a matrix with entries
Here, , the space of all possible combinations of label pairs. The node potential assigns weight to every possible value of the node as follows
Here, we also take into account nodal locations, namely, the rightmost nodes in and the leftmost nodes in (similarly, the uppermost nodes in and the lowermost nodes in ) are more likely to have values observed in and (or and ), respectively. In this way, we ensure the continuity of label propagation from nonoverlapping regions to the joint region . The additional weight function is defined as
With this setup we implemented the iterative Simulated Annealing (SA) algorithm (Grenander, 1996) to find the most probable joint region for each pair of overlapping classified subimages in a brain slice. We initialized as a region containing the first two columns/rows of / and the last two columns/rows of /. Such initialization provides a good approximation to the optimal solution and speeds up the convergence of the algorithm.
An example of SA application is shown in Figure 2.10.c. The leftmost columns of and appear slightly different in presence of the CSF and GM.The rightmost columns of and only differ in the value of a single central pixel. The optimal labelling preserves the label configuration of its first and last columns as they appear in their respective overlapping regions and .
(a) Left and right image region overlap, (b) upper and lower image region overlap, (c) Maximum a posteriori estimate of the label configuration
in presence of observations , (d) Sequental steps of the stitching algorithm in direction from top left to bottom right.We started with the upper left subimage and stitched its neigbouring subimages vertically and horizontally. Then for each of the neigbouring subimages we identified overlapping ones in both horizontal and vertical directions, and glued them together using the SA algorithm. By progressive stitching of constituent image parts we assembled the entire brain slice as illustrated in Figure 2.10.d. The proposed SA algorithm yields seamless estimates of the joint regions.
3 Results and Discussion
We applied the local KFDAbased method to segment the brain template for ages 8 to 11 months initially classified into GM, WM and the CSF using PVE. The proposed approach leads to a significant improvement in the CSF detection throughout the brain almost doubling the initial CSF volume (see Figures 3.1 and 3.2). The initial CSF cluster determined by PVE consisted of 26717 voxels and has increased to 53681 voxels with application of KFDA.
Domain  PVE  KFDA  Domain  PVE  KFDA 
1  0.9286  0.9206  21  0.8481  0.8848 
2  0.8963  0.9000  22  0.9496  0.9595 
3  0.8357  0.8500  23  0.8890  0.8975 
4  0.8682  0.8768  24  0.9226  0.9197 
5  0.9014  0.8958  25  0.8769  0.8776 
6  0.8776  0.8709  26  0.9155  0.9496 
7  0.8623  0.8676  27  0.9502  0.9628 
8  0.8850  0.8753  28  0.8642  0.8675 
9  0.8337  0.8530  29  0.8978  0.8897 
10  0.8874  0.9254  30  0.8795  0.8765 
11  0.8375  0.8349  31  0.8261  0.8525 
12  0.8129  0.8545  32  0.8733  0.8709 
13  0.9027  0.9034  33  0.8722  0.8746 
14  0.8928  0.8978  34  0.9743  0.9759 
15  0.9399  0.9382  35  0.9135  0.9254 
16  0.9033  0.983  36  0.8501  0.8581 
17  0.9433  0.9416  37  0.8952  0.8927 
18  0.8768  0.8828  38  0.8191  0.8379 
19  0.8828  0.8878  39  0.8198  0.8456 
20  0.8688  0.8807  40  0.9566  0.9652 
Total MSSIM  
0.8668  0.87588 
The local MSSIM assessment of the performance of KFDA and PVE given in Table 3.1 shows that KFDA outperforms PVE in 28 subdomains out of 40 and overall. KFDA ”loses” to PVE in lower contrast subdomains 5, 15 and 17 and 9 other subdomains within the middle CNR range presumably due to overestimation of the CSF in these subvolumes.
We also applied our segmentation method to the 3D brain template for ages 44 to 60 months with initialization obtained by a label transfer from the brain template for ages 4.5 to 8.5 years. Figure 3.3 illustrates the performance of KFDA using classified brain slices.
Domain  Initial  PVE  KFDA  Domain  Initial  PVE  KFDA 
1  0.7809  0.7846  0.8243  12  0.8444  0.8856  0.8718 
2  0.8051  0.8454  0.8389  13  0.7904  0.8247  0.8320 
3  0.8405  0.8405  0.8759  14  0.8142  0.8304  0.868 
4  0.7815  0.8212  0.8422  15  0.8560  0.8804  0.8825 
5  0.8230  0.8671  0.8601  16  0.8004  0.8231  0.8467 
6  0.7422  0.7422  0.8146  17  0.7785  0.7785  0.8292 
7  0.7071  0.7384  0.7830  18  0.7755  0.8234  0.8479 
8  0.8266  0.8864  0.8676  19  0.7600  0.7600  0.8373 
9  0.9257  0.9482  0.9400  20  0.7213  0.7299  0.8058 
10  0.7925  0.8511  0.8348  21  0.8347  0.8571  0.8747 
11  0.7529  0.8373  0.8128  22  0.7416  0.8071  0.8251 
means  0.7952  0.8256  0.8458 
Local MSSIM assessment of the quality of classifications by initial label transfer, KFDA and PVE classifications suggests that KFDA improves initial classification and outperforms PVE in all but 7 brain subvolumes (Table 3.2). However, in high contrast subvolumes KFDA tends to overestimate WM and CSF leading to poorer structural similarity estimates. Overall, in comparison to PVE, KFDA detects brain tissue classes throughout the entire brain with a higher accuracy.
This example demonstrates the capability of the proposed method to identify small brain structures hardly visible in low contrast subdomains of reference images.
4 Summary
In this paper, we developed a powerful KFDAbased framework that overcomes methodological limitations of existing segmentation approaches in infant brain MRI such as global modelling of tissue intensity distributions and dependency on probabilistic brain atlas. The proposed method

uses a general class separability criterion since it does not impose tissue intensity distribution models on the input data and captures a wide range of tissue cluster nonlinearities,

avoids a buildup of various techniques for refinement of segmentation results and accommodation of intraclass intensity variability,

takes advantage of multichannel brain imaging data and avoids inconsistencies that arise when segmenting each image type separately,

classifies MR brain data locally and improves detection of spatial tissue patterns in low contrast subdomains,

adapts initial classification to the tissue intensity distributions within an individual brain scan,

allows comparison of classification algorithms and automatic monitoring of the quality of classification in the absence of the ground truth via the objective image quality metric SSIM.
Our framework is semisupervised since we used tissue label transfer from an older brain MRI to a younger brain anatomy for initialization. The initialization provides the best guess on the spatial locations of tissue classes that is updated in accordance with tissue intensity histograms of brain imaging data.
We explored the potential of KFDA in applications to brain tissue classification of infant brain MRI, in particular, the NIH pediatric database. We observed that even with poor initial estimates of the class clusters in the brain template for ages 8 to 11 months KFDA compensates for the underestimates and detects most of the tissues visible in MRI. Overall, application of the KFDAbased method yields a more accurate quantification of brain tissue volumes from infant brain MRI.
The proposed method is mathematically rich and offers avenues for the further advancement of KFDAbased methodology. To mention a few,

The classification problem in the absence of ground truth can be stated in a mathematically rigorous way, namely, as an optimization of MSSIM over a set of overlapping voxels in the stereotaxic space. The Gibbs field model of the unknown label configuration defined on this set with the Gibbs interaction energy in the form of the negative of MSSIM increment will allow optimal estimation of labelling via SA. The classification results obtained in this way do not correspond to the global maximum of MSSIM.

A more competitive perception quality measure can be used for the evaluation of the classification quality. The information contentweighted SSIM (Wang and Qiang, 2011) is an advanced version of SSIM that assigns local information content weights to image components containing more information. Such a metric will emphasize tissue boundaries in the classified image and measure the structural similarity between classified and T1w(reference) images more accurately.

A more general kernel function in the form of a linear combination of Gaussians with different bandwidths can be explored.
The proposed method is applicable to brain tissue classification of multichannel brain MRI for ages 8 months and older.
5 Acknowledgements
This project was funded in whole or in part by the Montreal Neurological Institute in the form of a postdoctoral fellowship, the National Institute of Child Health and Human Development, the National Institute on Drug Abuse, the National Institute of Mental Health, and the National Institute of Neurological Disorders and Stroke (Contract #s N01HD023343, N01MH90002, and N01NS92314, 2315, 2316, 2317, 2319 and 2320). This manuscript expresses the views of the authors and may not reflect the opinions or views of the NIH.
Appendix A Fundamentals of KFDA
Originally, Kernel Fisher Discriminant Analysis was proposed for a binary classification problem (Mika et al., 1999) and later it has been extended to multiclass classification. Presented below is a classical binary KFDA.
Let be an input set, an arbitrary subset in and let be class label set. We refer to an input pair , where and as a sample. The assumption is that all samples are drawn randomly and independently from unknown probability distribution over .
Let and be samples from two different classes (referred to as negative and positive for a reason that will become clear later) with means and .
KFDA finds a nonlinear direction of maximal information discrimination in the input space by first mapping the data and nonlinearly into a highdimensional feature space through an implicit mapping
and computing an optimal separating hyperplane there that corresponds to a nonlinear separating surface in the input space. The power of KFDA lies in ability to capture complex nonlinear structures of clusters in the input space.
To understand a kernel ”trick” associated with KFDA we first formulate a linear discriminant optimality criterion. In the context of our classification problem in 3D input space, we aim to find the direction of input data projection such that

withinclass variance is minimized, where
Overall, the objective is to maximize a linear discriminant
To find an optimal nonlinear direction of data projection, we first nonlinearly transform the data with the implicit mapping
to the abstract feature space and compute the linear discriminant in this feature space. Then, in we have(A.1)  
(A.2)  
(A.3) 
Thus, the linear discriminant in is
(A.4) 
Since is a highdimensional space whose dimension would be equal to the total number of interior brain voxels in applications to MR brain images it is impossible to solve A.4 directly. We seek a formulation of the optimality criterion in terms of dotproducts of the mapped samples since by Mercer’s theorem (ShaweTaylor and Cristianini, 2004) we can compute these dotproducts without explicit knowledge of the mapping via kernel functions. The kernel is a symmetric function that defines the dot product for . Possible choices for are Gaussian radial basis function(RBF), , polynomial kernels
.Therefore, we make use of the kernel trick and rewrite (A.4) in computable form. Let the total number of samples be . Given that any solution lies in the span of training samples (Saitoh, 1988)
(A.5) 
Using equations (A.5) and (A.1) we obtain
(A.6) 
where is an vector with entries . Similarly, we find
(A.7) 
By using equations (A.2), (A.6) and (A.7) the numerator of in (A.4) can be rewritten as
(A.8) 
By using equations (A.1), (A.5) and (A.3) the denominator of in (reffisher) becomes
(A.9) 
where and are kernel matrices of sizes and , respectively, with entries ,
is the identity matrix of size
and is the matrix with all entries . Combining (A.8) and (A.9) we aim to find a vector in that maximizes(A.10) 
Here, is a small positive regularization parameter added to the variance entries of the withinclass variancecovariance matrix to ensure its positivedefiniteness since N may be singular. The decision boundary between the classes in the input space is a set of points that satisfies
(A.11) 
Since depends on the kernel choice, the classification result will also be kerneldependent. As such, it is important to choose the kernel function that best captures a nonlinear behaviour of class clusters based on their natural occurrences in the input space.
References
 Seeded region growing. IEEE Trans. Pattern Analysis Machine Intelligence 16(6), pp. 641–647.
 The nih mri study of normal brain development (objective2): newborns, infants, toddlers, and preschoolers. NeuroImage 35 (1), pp. 308–325. Cited by: §1.1.
 BigBrain: an ultrahighresolution 3d human brain mode. Science 340(1472), pp. 1472–1475.
 Multimodal image coregistration and partitioning a unified framework. NeuroImage 6, pp. 209–217.
 On evaluating brain tissue classifiers without a ground truth. NeuroImage 36, pp. 1207–1224.
 Partial volume tissue classification of multichannel magnetic resonance imagesa mixel model. IEEE Transactions on Medical Imaging 10 (3), pp. 395–407. Cited by: §1.2.
 A fully automatic and robust brain mri tissue classification method. Medical Image Analysis 7 (4), pp. 513–527.
 Automatic 3d intersubject registration of mr volumetric data in standardized talairach space. J Comput Assist Tomogr. 18 (2), pp. 192–205.
 ANIMAL+ insect: improved cortical structure segmentation. In Information Processing in Medical Imaging, pp. 210–223. Cited by: §1.1.
 The nih mri study of normal brain development. NeuroImage 30, pp. 184–202. Cited by: §1.1.
 Brain templates and atlases. NeuroImage 62, pp. 911–922.
 An mribased probabilistic atlas of neuroanatomy. In Magnetic Resonance Scanning and Epilepsy Plenum, NATO ASI Series, pp. 263–274.
 Unbiased average ageappropriate atlases for pediatric studies. NeuroImage 54 (1), pp. 313–327. Cited by: §1.1.
 Elements of pattern theory. The Johns Hopkins University Press, Baltimore, Maryland. Cited by: §2.5.

Optimal kernel selection in kernel fisher discriminant analysis.
In
ICML 2006 Proceedings of the 23rd international Conference on Machine learning
, pp. 465–472.  The effect of map boundary on estimates of landscape resistance to animal movement, online publication. PLoS ONE 5(7). Cited by: item 4.
 A dynamic 4d probabilistic atlas of the developing brain. NeuroImage 54 (4), pp. 2750–2763.
 Automated 3d extraction of inner and outer surfaces of cerebral cortex from mri. NeuroImage 12, pp. 340–356.
 Fisher discriminant analysis with kernels. In Neural Networks for Signal Processing IX: Proceedings of the 1999 IEEE Signal Processing Society Workshop, pp. 41–48. Cited by: Appendix A, §1.3, §2.2.
 Segmentation of brain mri in young children. Academic Radiology 14 (11), pp. 1350–1366.
 New variants of a method of mri scale standardization. IEEE Transactions on Medical Imaging 19 (2), pp. 143–150.
 An adaptive clustering algorithm for image segmentation. IEEE Transactions on Signal Processing 40 (4), pp. 901–914. Cited by: §1.2.
 Applying circuit theory for corridor expansion and management at regional scales: tiling, pinch points, and omnidirectional connectivity, online publication. PLoS ONE 9(1). Cited by: item 4.
 Novel vectorvalued approach to automatic brain tissue classification. In Medical Computer Vision, Recognition Techniques and Applications in Medical Imaging, Berlin Heidelberg, pp. 70–81. Cited by: §1.3.
 Local semisupervised approach to brain tissue classification in mri with high intensity variation. to appear in IEEE Transactions on Medical Imaging.
 Automatic segmentation of mr images of the developing newborn brain. Medical Image Analysis (MedIA) 9 (5), pp. 457–466. Cited by: §1.2.
 Soft margins for adaboost. Machine Learning 42, pp. 287–320. Cited by: §1.3.
 Medical image segmentation based on mutual information maximization. In 7th Int. Con. on Medical Image Computing and Computer Assisted Intervention, Lecture Notes in Computer Science, Vol. 3216, pp. 135–142.
 Theory of reproducing kernels and its applications. Longman Scientific & Technical, Harlow, England. Cited by: Appendix A.
 Lesion identification using unified segmentationnormalisation models and fuzzy clustering. NeuroImage 41(43), pp. 1253–1266.
 Kernel methods for pattern analysis. Cambridge University Press, Cambridge. Cited by: Appendix A.
 A nonparametric method for automatic correction of intensity nonuniformity in mri data. IEEE Transactions on Medical Imaging 17 (1), pp. 87–97.
 Advances in functional and structural mr image analysis and implementation as fsl. NeuroImage 23 (S1), pp. 208–219.
 Fast robust automated brain extraction. Human brain mapping 17 (3), pp. 143–155.
 Fast and robust parameter estimation for statistical partial volume models in brain mri. NeuroImage 23 (1), pp. 84–97. Cited by: §1.2, §1.3.
 Automated modelbased tissue classification of mr images of the brain. IEEE Transactions on medical imaging 18 (10), pp. 897–908. Cited by: §1.1, §1.2.
 Statistical learning theory. Wiley, New York. Cited by: §1.3.
 Image quality assessment: from error visibility to structural similarity. IEEE Transactions on Image Processing 13 (4), pp. 600–612. Cited by: §1.3, §2.3.
 Mean squared error: love it or leave it? a new look at signal fidelity measures. Signal Processing Magazine, IEEE 26 (1), pp. 98–117. Cited by: §1.3.
 Information content weighting for perceptual image quality assessment. IEEE Transactions on Image Processing 20 (5), pp. 1185–1198. Cited by: item 2.
 A universal image quality index. IEEE Signal Processing Letters 9 (3), pp. 81–84. Cited by: §2.3, §2.3.
 Multiscale structural similarity for image quality assessment. In Conference Record of the 37th Asilomar Conference on Signals, Systems and Computers, Vol. 2, pp. 1398–1402. Cited by: §1.3.

Adaptive, template moderated, spatially varying statistical classification
. Medical Image Analysis 4 (1), pp. 43–55. Cited by: §1.2.  Automatic segmentation of newborn brain mri. Neuroimage 47 (2), pp. 564–572. Cited by: §1.1.
 Adaptive segmentation of mri data. IEEE Transactions on Medical Imaging 15 (4), pp. 429–441. Cited by: §1.2.
 Automatic segmentation and reconstruction of the cortex from neonatal mri. NeuroImage 38 (3), pp. 461–477. Cited by: §1.2.
 Morphometric analysis of white matter lesions in mr images: method and validation. IEEE Transactions on Medical Imaging 13 (4), pp. 716–724.
Comments
There are no comments yet.