## 1 Introduction

The delineation of primary tumors or metastases from diagnostic imaging is a crucial but challenging problem in any clinical applications, such as tumor detection, diagnosis, treatment planning for radiofrequency ablation or surgical interventions, and monitoring of treatment response which is done with predefined criterions. Segmentation of tumors in 3D is a pre-requisite for precise clinical measurements that are done during clinical examinations. One example of such an application in liver oncology is the Response Evaluation Criteria in Solid Tumors (RECIST) that measures the size of solid tumors as the maximum diameter Eisenhauer09. Ideally, this metric would be made in a volumetric fashion in 3D, instead of repeating the delimitation on each axial plane. However manual segmentation is not applicable in clinical routine as it takes too much time and lacks reproducibility across raters Pinto10. Furthermore, to precisely follow the progression of a tumor throughout longitudinal scans, it remains difficult to identify the accurate boundaries of the tumors to quantify the difference in volume. For these reasons, automated tumor segmentation would represent an ideal alternative if it was efficient, reproducible, and permitted accurate delineation of tumor boundaries and tumor volumetry. Further, clinical utility of automated tumor segmentation would be enhanced if it permitted longitudinal comparison of tumor volumetry in subsequent visits.

Different methods were previously presented to achieve semi-automated or automated segmentation of metastases in different organs and pathologies, such as in prostate, lung or for neurological disorders. Approaches using discrete Markov Random Fields (MRFs; Li09) and Conditional Random Fields (CRFs; Lafferty01

) were used to model complex dependencies within sample distributions, offering improved segmentation accuracy compared to independent and identically distributed (i.i.d.) classifiers such as Support Vector Machines (SVM). Other works by Li et al.

Li06 located tumor boundaries uses machine learning to classify regions, while Freiman et al. Freiman08 used a Bayesian classifier to segment various organs with medical images and subsequently applied morphological operators to correct for segmentation discrepancies. Furthermore, a study comparing three semi-automated methods was presented in Zhou10 using contrast-enhanced CT. In the study by Smeets et al. Smeets10, a supervised statistical pixel classification method was trained from a dataset of liver tumors. The technique then applied a level-set adaptation approach to isolate the region of interest. In Subbanna13, a hierarchical probabilistic Gabor filter was used in combination with an MRF segmentation framework to delineate brain tumors in MRI volumes, where a Bayesian framework provides a coarse probabilistic texture-based segmentation of active and edema regions. But despite significant intra- and inter-rater variabilities and prolonged time for manual segmentation, very few of these automated approaches are currently used in clinical practice. Completely automated methods may be prone to suboptimal precision and not be as robust as semi-supervised approaches. Furthermore, they often require high computational requirements.State-of-the-art tumor segmentation frequently combines efficient classification techniques with low level segmentation methods. From such perspective, tumor detection is addressed as a classification problem where one aims at separating normal from diseased tissues at the voxel level, while imposing smoothness criterions in the process. In Bauer11, SVM classification using multispectral intensities and textures is combined with hierarchical CRFs to segment brain tumors in MR images. In Hame12

, a semi-automated lesion segmentation method is presented using Markov Random Fields and non-parametric distributions which are defined locally to estimate the tumor shape. Alternatively, the use of fully-connected pairwise models can improve the performance over traditional MRF models that are based on a finite number of links. Still, pairwise models remain limited to depict higher-order relationships, which could produce results of limited accuracy when modelling the global appearance of an object. On the other hand, higher-order relationships can incorporate a reliable labelling output within inhomogeneous regions, as shown in

Sabuncu10, while prior information modelled as co-occurrences between various classes has demonstrated significant promise for segmentation Vineet12. This idea has been applied for the classification of pathology using discrete hidden MRFs Charras-Garrido12.A major drawback from these methods is that conditional probabilities are obtained directly from the high-dimensional image space, which is not ideal as they fail to express the underlying representation of the dataset and assimilates all measures to Euclidean distances

Roweis00. They also assume linear discrimination (SVMs) when in most cases, sets are not linearly separable in the image space. In contrast, manifold learning techniques intrinsically consider the nonlinear distribution of the data, and enable important evaluation of test cases in a learned population by using a mapping distance. Recently, various approaches have used manifolds to discriminate the presence of white matter brain lesions Kadoury12 or track organ motion or discover regional variations within images Bhatia12. In Gerber10, manifold learning techniques were used to capture the non-linear variability of brain development from a dataset of MR images. However, techniques such as Laplacian eigenmaps are sensitive to outliers and unable to cope with pathological or abnormal tissues

Yang14. To the best of our knowledge, prior information captured by a discriminant embedding has yet to be exploited in a fully automated graphical model segmentation framework. Preliminary results demonstrated promise but was validated on a limited dataset and could not properly capture inter-tumor inhomogeneities Kadoury13.In this paper, we propose a segmentation approach for tumors using discriminant manifold embeddings, where the low-dimensional representation are integrated in a graphical model used for segmentation. The contributions are two-fold. First, a discriminant graph-embedding with Grassmannian kernels is generated from prior data to learn the nonlinear intensity distributions of normal liver tissue and pathological regions. Second, we suggest to employ a recently proposed mean-field CRF inference approach where potentials are computed directly from the low-dimensional embedding, capturing the underlying structure. Unary and pairwise potentials assess the proximity to manifold regions and the dissimilarity between pairs of segments, respectively. Higher-order potentials ensure regional consistency to efficiently discriminate tumors from normal tissue. These potentials are integrated in an existing higher-order conditional random field Kohli09. We validated the manifold constrained segmentation method on metastatic liver tumors in CT images, from two separate clinical datasets which included diagnositic images of patients which were subsequently treated by radio frequency ablation.

We now describe the proposed framework, with the outline illustrated in Fig. 1

. The first phase consists of creating a non-linear, low-dimensional embedding from patches extracted from training datasets of annotated tumor images and liver organ tissue, surrounding the tumor region. These patches are embedded in a Grassmannian manifold, where within and between similarity graphs are created from the manifold points in order to incorporate a discriminatory measure in the learning process. Once the learning phase is completed, a new patient image can be processed using an integrated and interconnected CRF graph to perform the unsupervised segmentation of tumor regions. This graph involves costs related to manifold support, prior geometrical dependencies and cliques described in the discriminant manifold domain, within regions of interest of the tumor. Pairwise potentials measures the similarity between connected voxels in manifold space, while high-order cliques evaluates the variance of data points within manifold space. The energy function is minimized with primal-dual optimization procedure. A careful selection of the intrinsic dimensionality and parameter settings is performed to properly model the non-linear space.

The paper is structured as follows. The developed method, including the training and segmentation steps is presented in Section 2. The experimental results are reported in Section 3, and Section 4 concludes the paper with a discussion.

## 2 Materials and Methods

In this section, we present the imaging data used in this study and the overall framework of the metastatic liver tumor segmentation. First, the training phase learns the discriminant Grassmannian manifolds using databases of labelled tumor and normal liver tissue in order to determine the mapping function for unseen cases. Then, the segmentation process of a new CT examination uses the manifold potentials in a conditional random field segmentation process to delineate the region of interest.

### 2.1 Imaging data

To train and test the proposed tumor segmentation framework, two distinct CT imaging datasets were used. The first clinical dataset was the public dataset from the 2008 MICCAI segmentation challenge, which was used by several other groups for tumor segmentation Deng08. The database includes 4 training CT datasets, which have 10 ground-truth tumor segmentations. The testing database has 13 CT datasets, totalling 20 tumors with ground-truth segmentations.

The second clinical dataset was composed of a total of 30 images provided by radiology departments from two separate clinical institutions, totalling 40 tumors. The first subset included 12 contrast-enhanced CT examinations, totalling 15 tumors as two patients had multiples tumors. The images were acquired with a 64-slice CT scanner with axial dimensions of 512 by 512, 1mm slice thickness, in-plane resolution of 0.6mm and the number of slices ranging between 112 and 330. Cases were referred for follow-up or for subsequent treatment such as for chemo-embolization or radio frequency ablation (RFA). The second subset included 18 contrast-enhanced CT examinations acquired also with a 64-slice CT scanner (in-plane resolution of 512 by 512), with a slice thickness varying between 0.9 and 1.2mm, in-plane resolution of 0.8mm and the number of slices ranging between 87 and 274. In this database subset, tumors were limited to liver metastases from colorectal cancer. Tumors sizes varied between 2 and 7 cm in maximum diameter in this database and were manually segmented by an experimented radiologist. Every patient had a single tumor, except 5 patients, who had 3 tumors (2 patients) and 2 tumors (3 patients). The total number of metastatic tumors was 25 tumors. The second dataset was randomly separated into 10 training and 30 testing cases.

### 2.2 Learning Discriminant Grassmannian Manifolds

Manifold learning algorithms are based on the premise that data are often of artificially high dimension and can be embedded in a lower dimensional space. However the presence of outliers and multi-class information can on the other hand affect the discrimination and/or generalization ability of the manifold. We present here the mechanism to learn the optimal separation between normal and pathological tissue by using a discriminant graph-embedding based on Grassmannian manifolds proposed by Harandi11, where it is adapted to the segmentation process. Here, we describe the Grassmannian kernels applied to the input data points, which in turn are embedded into a discriminant manifold domain that incorporates within and between similarity graphs.

#### 2.2.1 Grassmannian kernels

In the framework, each sample point (representing an image patch) in a Grassmannian manifold is described by the set of -dimensional domains in , described by a orthonormal matrix of . It is then possible to see if two Grassmannian manifold points are similar, by mapping one to another using the orthogonal matrix. In this work, the similarity between two manifold points () is measured as a combination of two Grassmannian kernels defined in the Hilbert Space such that:

(1) |

with . The projection kernel defined as determines the largest correlation based on the cosine measure between principal vectors of two sets. The second denotes canonical correlation Grassmanian kernel:

(2) |

subject to and . This kernel is positive definite since such that as shown in Harandi11

, and well-defined since singular values of

are equal to , with , indicating orthonormal matrices of order . This makes the kernel invariant to various representation of the subspaces. While the projection kernel based on principal angles is able to identify the predominant differences from a pair of image sets, the canonical correlation kernel acts as a similarity metric to identify features which offer the best correlation between two sets of images. This later kernel is also invariant to various representations of the subspaces. With each kernel representing different components of the image distributions, the combined kernel is able to cover a wider range of features to efficiently characterize tumors with respect to typical features apparent in normal liver tissue, and identify the discriminant features to separate both classes.#### 2.2.2 Graph architecture

In order to effectively discover the low-dimensional embedding, it is necessary to maintain the local structure of the data in the new embedding. The structure is an undirected similarity graph, using a collection of nodes *V* connected by edges, and a matrix *W* with symmetric values describing relationships between the nodes. The relationship between the diagonal *D* and the Laplacian *L* matrices is modelled by as . Here, each element of *D* describes the sum of the weight matrix *W*. By representing an architecture as a set nodes and edges, the goal of discriminant graph embedding is to increase the discriminatory power of embedded data by mapping the high-dimensional distribution data to another low-dimensional space. In the context of embedding, similarities between pairs of vertices remain the same in either space by solving a general eigen-analysis problem Yan07.

Here, labelled points are generated from the underlying manifold , where denotes the label (tumor or normal). The task at hand is to maximize the discrimination between tumor and normal liver tissue by mapping the underlying data into a vector space , while preserving similarities between data points in the high-dimensional space. Discriminant graph-embedding based on locally linear embedding (LLE) Roweis00 uses graph-preserving criterions to maintain these similarities, which are included in a sparse and symmetric matrix, denoted as . The embedding cost function is minimized using the following formulation:

(3) | ||||

with as identity. Therefore, the LLE algorithm is described as a direct graph embedding problem.

#### 2.2.3 Within and between similarity graphs

In our work, the manifold embedding is constructed with a within-class graph , describing the similarity of regions with the same type (tumor and normal liver tissue), and a between-class penalty graph , used to discriminate normal and tumor tissue. When constructing the discriminant LLE graph, elements are partitioned between these two classes. The intrinsic graph is first created by assigning edges only to vertices of the same class (tumor or normal). The local reconstruction coefficient matrix is obtained by minimizing:

(4) |

with as the neighborhood of size , within the same region as point (e.g. tumor region). Each sample is therefore reconstructed only from images of the same region. The local reconstruction coefficients are incorporated in the within-class similarity graph, such that the matrix is defined as:

(5) |

Conversely, the between-class similarity matrix depicts the statistical properties which are used to maximize the distance between classes and used as a high-order constraint. Distances between normal and pathological samples are computed as:

(6) |

with containing neighbors having different class labels from the th sample. The objective is to transform points to a new manifold of dimensionality , i.e. , by mapping connected tumor or normal samples in as close as possible to the class cluster, while moving tumor and normal areas of as far away from one another. This results in optimizing the following minimization and maximization objective functions:

(7) |

(8) |

Here, Eq.(7) is used to penalize neighbouring points belonging to the same type if they are mapped far away within . Conversely, Eq.(8) penalizes embedded points of distinct classes, in the case where these are mapped close to each other within , as shown in Fig. 2. Using the structure, it can be shown that an implicit representation of manifold points can be obtained by finding the similarity between points using the Grassmannian kernel , an appropriate mapping function can be obtained using a supervised manifold learning approach, which will be described in the following section.

#### 2.2.4 Supervised manifold learning

The optimal projection matrix, mapping new points to the manifold created in the training phase, is obtained by simultaneously maximizing class separability and preserving interclass manifold property, as described by the objective functions in Eqs.(7)-(8). Assuming points on the manifold are known as similarity measures given by the Grassmannian kernel defined in Eq.(1), a linear solution can be defined, i.e., for the

largest eigenvectors with

. Defining the coefficient and kernel vectors, the output can be described as . By replacing the linear solution in the minimization and maximization of the between- and within-class graphs, the optimal projection matrix is acquired from the optimization of the function as proposed by Harandi Harandi11. The algorithm generates implicitly a mapping using the points on the Grassmannian manifold. The discriminant mapping will enable the conservation of the global geometrical structure of the inherent distribution. The minimization of Eq.(7) becomes:(9) | ||||

given Eq. (3). Here, and . Similarly, the maximization of the between-class graph is defined as , with . By combining and , the optimal projection matrix is acquired from the optimization of the function:

(10) |

which is obtained by the eigenvalue decomposition method and

is a Lagrangian multiplier that regularizes the final mapping function. Hence for any test point , a manifold representation is obtained using the kernel function based on and mapping .### 2.3 Fully-Connected Graph Segmentation

Once the Grassmannian manifolds are obtained from the training phase, the segmentation problem is performed using a higher-order CRF model where clique potentials are inferred from the low-dimensional embeddings. In this sub-section, we describe the segmentation framework with the minimization of the energy term used in the process, as well as the definition of the energy terms defined in the manifold space.

We use higher-order conditional random fields Kohli09; Bauer11, where instead of using direct voxel intensities which is a highly dimensional problem, clique potentials are derived from the trained discriminant embeddings in order to perform an unsupervised tumor segmentation. Potentials describing smoothness terms restrain traditional techniques in segmenting small structures in specific regions of interest, such as hyper-vascular tumors and irregularly shaped metastases. In fact in previous works, models generated using regular CRFs based on pairwise potentials yield very smooth results which do not follow the actual region of interest. Instead of using direct pixel intensities, we propose to employ potential measures obtained from the learned models to incorporate regional consistency in the segmentation process. A voting scheme will help to drive the delineation step as a smooth constraint.

We propose to incorporate manifold potentials defined in the discriminative high-dimensional feature space into the typical CRF model. The well known Gibbs formulation of this higher order CRF is adapted for this purpose and defined as:

(11) |

The random variables

denotes whether labels belong to the object of interest (tumor or not) in label space , denotes edges connecting pairs of nodes and denotes the set of segments, composed of multiple embedded patches. The CRF model defines the energy function based on the sum of unary and pairwise potentials , as well as higher-order functions , which will be described in the following section, where denotes the manifold projection of data point and is the label (tumor or normal) of node . The nodes in the CRF model corresponds to the manifold embedded segments of . Here, image segmentsare obtained from superpixels on each 2D slice, which consists of a normalized-cut segmentation method, followed by an iterative k-means clustering to create an image parcelled in equal-sized patches (average size of 12 pixels) that do not overlap

Mori05. Then, segments are mapped using the projection matrix such that . Features are selected implicitly by the manifold from the low-dimensional space to obtain the best discrimination between the classes. Here, we utilize a fully-connected CRF model using a mean-field approximation of the original CRF such that the distribution is composed of a set of independent marginals minimizing KL-divergence.#### 2.3.1 Manifold-based potentials

Here, we describe the three potentials that constitute the objective function of the CRF defined in Eq.(11).

*Unary potentials* in the CRF model, represented by the term of the energy function, represents the likelihood to assign a patch with a specific label. Typically, the unary potential is determined by the grayscale value of the voxel and the appearance model for each region of interest. However, intensity alone is not sufficient to discriminate between similar groups and often produces erroneous tumor delineations. Instead, we propose to utilize elaborate potential functions based on manifold representations of the classifier , which evaluates the likelihood that the extracted patch belongs to the region of interest based on its embedded location within the Grassmannian manifold. The likelihood is calculated based on the negative value of the log function.

*Pairwise potentials* are expressed as non-parametric manifold dissimilarities, extending the Gaussian kernel formulation in feature space which uses mean-field approximations of fully-connected CRF models Krahenbuhl11. Instead of forcing Euclidean features to fulfill this task, pairwise potentials are conditioned on the input data such that , where describes how compatible labels are assigned between nodes and . The distance between points and , under label , is the conditional distribution of the label , with:

(12) |

Assuming the Frobenius distance in manifold space can offer a non-parametric estimation of the dissimilarity measure, the pairwise potential can be defined by imposing a range over which valuable information can be inferred when applying a Gaussian window:

(13) |

with the parameter weighting the pairwise relations.

*Higher-order potentials* are quality sensitive functions that define the label inconsistency in regions (different labels are assigned to a set of neighboring segments), by adding edges between nodes that are not immediate neighbours. We use the strategy in Vineet12 where for a given clique grouping segments, different embeddings are generated with the manifold projection matrix . The variance of the embedded coordinates of for all data points is the used as a quality measure for all consistent segments in . The higher-order potential is defined as
when , which counts the number of segments in not taking the dominant label , is lower than a threshold . In the case the count is over , then , where produces a penalizing term using , which defines the flexibility of the clique potential. This term will favour joining similar segments together, while penalizing other segments which are less likely to constitute a single region. As proposed by Kohli09, the penalty term measures how the output unary potential varies on all voxels of a segment in order to assess the reliability of the potential. Here, the model parameters which are assigned to the potentials are obtained using cross-validation on the training data.

#### 2.3.2 Energy minimization

We minimize the higher order CRF defined earlier in Eq.(11) by decomposing the problem into sub-modular terms. Kohli et al. Kohli09 showed that robust higher order functions can be found by expanding or replacing the clique function using auxiliary binary terms. Optimal displacements are calculated for these algorithms in order to minimize the clique potentials. Therefore, the high-order displacement terms are transformed into sub-modular quadratic functions, and are optimized based on graph cuts. The minima of the energy function represents the solution for the tumor segmentation based on a labelling process which assigns a class at voxel of the image, generating tumor and normal tissue segmentations.

Finally we apply a Primal-Dual algorithm called FastPD KomodakisPT11

which can efficiently solve the problem in a discrete domain by formulating the duality theory in linear programming. Compared to other methods which do not provide optimal solutions or require long computational times to approximate the global minimum, the advantage of FastPD lies in its generality and efficient computational speed. It also guarantees that the generated solution will be the best approximation of the true global optimum without the condition of linearity.

### 2.4 Validation methodology

To evaluate the performance of the algorithm in a controlled setting and help determine parameter values for the proposed framework, a dataset created from synthetic phantoms using uniform agar gel (8% agar w/v) as shown in Fig. 3(a), were used for an initial evaluation. Phantoms provide a medium to generate artificial data in a controlled settings, therefore providing gold-standard segmentation for reference. While they do not yield images similar to real patient images, it was shown in previous studies to provide significant information on the behavior of image processing algorithms Li13. A total of nine phantoms were created, where each model included a tumor region composed of diluted dye agent in the middle of the phantom model. The size of the tumor (between 1 and 5cm), as well as the contrast of the tumor varied in each synthetic model. A CT image for each phantom was obtained with a 512 by 512 resolution and 1mm slice thickness. Gaussian noise with a s.d. of 1.5 was added to the images, simulating point granularity during image acquisition. This way, nine tumors were artificially simulated at different levels of contrast (8, 13 and 20 Hounsfield units) (Fig. 3(b)), with three different tumor diameters (1, 3 and 5 cm).

For the first clinical dataset, experienced radiologists generated ground-truth segmentations for the test data, and the evaluation was conducted by comparing the automated segmentations to the interobserver variations obtained by ground-truth segmentations. Measures such as (1) volumetric overlap error (%), (2) the average symmetric surface distance (mm), (3) the root mean square (RMS) symmetric surface distance (mm), (4) the relative absolute volume difference (%), and (5) the maximum symmetry surface distance (mm) were computed for each segmentation based on the 2008 MICCAI liver tumor segmentation challenge Deng08

. A score between 0 and 100 was assigned to each measure based on the scoring system designed for the tumor segmentation challenge to compute an overall score based on human segmentation. For a perfect segmentation, a metric will get a score of 100 when the value is an exact match (error=0). A reference point with a score 90 for each metric is determined from the segmentation performed by independent users. It represents the score a human observer can get by manual segmentation. The values of the five metrics for the reference point are as follows: (1) volumetric overlap error = 12.94 %, (2) relative absolute volume difference = 9.64%, (3) average surface distance = 0.40mm, (4) RMS distance = 0.72 mm, (5) maximum symmetry surface distance = 4.0 mm. The score of each metric for a segmentation is then obtained using linear interpolation or extrapolation between the two points specified above (exact match and reference point).

To set similar conditions to the experiments from the MICCAI challenge, the operator could identify a coarse region of interest (120x120) around the tumor area in a single 2D axial slice, in order to allow a more narrow search region that surrounds the target tumor. The modified search area provides the method with a better sample tissue representation, which can help the discrimination process between normal and pathological tissue. The final segmentation obtained with the selected search region was used for comparison to the ground-truth segmentation.

For the second clinical dataset, the same five quantitative measurements defined previously were obtained by comparing the automated segmentations with the ground-truth manual segmentations, which were obtained by two independent experienced radiologists. No user interaction was required as the datasets were already processed to segment the liver from CT images zhang2010automatic. The manifold learning and segmentation algorithm were programmed in C++, running on a 3.4 GHz Intel Core i7 CPU with 32 GB of RAM. The average training time took approximately 6h, which included the patch extraction and learning the manifolds offline.

## 3 Results

In this section, we present the experiments performed on both synthetic and clinical datasets of CT images with liver tumors. Results obtained from both qualitative and quantitative measures were compared to manual ground-truth segmentations, and confronted to state-of-the-art tumor segmentation approaches.

### 3.1 Artificial data

The Grassmannian manifold was trained with ground-truth datasets and the optimal neighborhood size was found at for within-class similarity graphs (), and for between-class neighborhoods (). The optimal manifold dimensionality was set at , when the trend of the nonlinear residual reconstruction error curve stabilized for the entire training set. Fig. 4a shows the ROC curves when using different types of kernels (, illustrating the increased accuracy using the combined kernel (), which suggests each are extracting complementary features from the training data. A 3D embedding of the computed manifold is represented in Fig. 4b, showing the group of training image patches provided by the tumor and normal liver patches embedded in a single continuous manifold structure.

Segmentation accuracy results where then obtained for all the synthetic datasets using a leave-one-out cross validation, studying the effect of contrast and tumor size from the segmentations of nine tumors. The results from this evaluation are presented in Table 1. Unsurprisingly, these show that the method’s performance in segmentation accuracy increases as the contrast level of the tumor increases as well. The accuracy with a contrast level of 8 Hounsfield units (HU) yields the lowest scores, while a contrast level of 20 HU generated the highest accuracy in segmentation. Typical levels of contrast for tumors in patients treated with RFA is of 10 HU. Results presented here can be used as a reliability indicator of the segmentation result of real tumors. Another factor which was evaluated in this experiment was the effect of the tumor size on the segmentation results. As the tumor tends to be larger, it provides the algorithm with increased contextual information that can be integrated within the higher-order clique potentials. On the other hand, these artificial tumors are constituted of primarily homogenous regions, while real patient data often presents heterogeneous portions, which is typically more challenging.

### 3.2 Challenge dataset

Table 2 presents the evaluation results from all 20 test cases provided by the tumor segmentation challenge dataset, while the distribution of scores compared to human raters is shown in the box-plots diagram in Fig. 6

. In the evaluation, the training data was not included. The average time for segmenting a tumor was 53sec, given an input ROI. Segmentations were compared to an active contours approach with local Gaussian distributions

Wang09 and to a texture classification method Bauer11. The average and RMS surface distances ( and mm) of the proposed method were significantly lower () than distances generated by Wang09 and Bauer11. Typical results of metastatic liver tumors segmented on CT images are shown in Fig. 5. As it was observed in a previous study, the gallbladder represents an important cause of wrong classification of perihepatic metastasis detection since it exhibits similar intensity levels to many metastases, while being close to the location of the liver. The gallbladder can generate a convex-shaped indentation in the liver capsule, changing the morphology of the liver capsule.The average score (using the challenge scoring system) for overlap error between automated and manual segmentation was . Boxplots show the proposed method performs similarly to interobserver variability in all quantitative measurements. As demonstrated in Hame12, segmentation results were not as accurate for larger tumors compared to smaller sized tumors. By analyzing results for tumors with higher volumes, the average score of was obtained, while the overlap error and volume error were similar to average results. On the other hand, surface distance errors were higher, with a average distance of 3.06mm. This can be explained by the less-uniform appearance of larger tumors, which may include hyper-vascular or necrotic regions. Furthermore, larger tumors are often located close to the liver edges, causing misclassification with surrounding organs as well.

### 3.3 Clinical dataset

Table 3 shows the performance of the method with 10 training and 30 test cases from the clinical dataset. The average segmentation time per tumor was of 1min 42sec, since the automatic process included the entire liver region. Results were also compared to an active contours approach with local Gaussian distributions Wang09 and to a texture classification method Bauer11. The average and RMS surface distances ( and mm) of the proposed method were significantly lower () to distance measures generated by Wang09 and Bauer11. In terms of overlap error, the proposed approach reduces this error by compared to the top ranking texture classification. Typical problems occurred in the periphery of the tumors and in cases of rim-enhancing liver metastases. These cases offer a density which was not observed in the training set, but could be compensated with additional data in the manifold. Fig. 7 shows sample segmentation results.

By assessing the gain in accuracy when adding the higher-order terms in the energy formulation, the overlap error is reduced by 2.2, which is a statistically significant difference () to the second-order MRF model. In order to evaluate the robustness of the method, we performed additional experiments by measuring segmentation accuracy with 4 different levels of Gaussian noise added to the input images. Fig. 8 demonstrates that the proposed methodology possesses increased tolerance to noise compared to the other methods. Furthermore, the inclusion of the higher-order term improves the robustness to standard pairwise CRFs. Finally, segmentation accuracy was assessed by computing the amount of voxels classified in the wrong group within the zones surrounding the tumor contours, instead of the entire volume. A comparison between different segmentation methods is shown in Fig. 9. Hence, the evolution of the voxelwise classification errors decreases as the size of the evaluated patches around the tumor increases. The patch sizes did not affect computation times.

## 4 Discussion

We proposed a new, accurate and adaptable method for liver tumor segmentation. This was achieved through a higher-order fully-connected graphical model that was optimized using potential functions defined in a discriminant Grassmannian manifold. This increased the ability to isolate diseased liver regions from normal tissue as compared to state of the art CRF or SVM techniques. A thorough validation on subjects with metastatic liver tumors was undertaken to evaluate the performance of the method, leading to promising results. Based on the comparative measurements to ground-truth delineations, the resulting segmentations showed to be accurate for metastatic liver tumors, while yielding similar results to interobserver variability. The method’s performance was tested on two distinct collections of tumors. For the clinical dataset provided by two hospitals, the proposed framework achieved slightly better results than other methods used for liver tumor segmentation. The average overlap error was improved compared to recently published method using local Gaussian distribution fitting. The evaluation with the challenge dataset illustrated the fact the approach yielded encouraging results, even on images with little contrast, noisy data or in cases where the tumor boundaries were not as clear.

The boundaries of the tumors represent the greatest source of discrepancy between manual and automated methods, as it remains difficult to set hard limits for the segmentation area. In some case, undersegmentation was present when the transition in voxel intensities between tumor and normal liver tissue deviates significantly from the learned manifold. In this case, the discriminant framework would classify these points as normal since the projection matrix tends to map unknown sample points closer to the physiological normal class. On the other hand, in the experiments presented in this paper, the manifolds were representative of the data distribution in order to model intensity variations that were present in the testing dataset. While we did not experiment testing the method on the clinical dataset using a model trained on the challenge dataset, it is not expected to penalize the performance of the segmentation algorithm as both datasets are similar in tumor type and imaging modality. However testing on non-enhanced CT images using manifold embeddings trained on contrast-enhanced CT is a current limitation. The method was also able to handle various shapes of liver tumors, such as ellipsoid or elongated shapes. Another source of possible difference is the distinction between necrotic and active tumor regions, which produces a greater level variability among raters in the challenge dataset.

The method relies on a set of parameters which may affect the performance of the algorithm, such as the intrinsic dimension of the discriminant manifold. A careful choice for this parameter can significantly optimize the performance of the method. Our experiments chose this parameter carefully by selecting the dimension yielding the lowest residual variance in reconstruction error. The within and in-between similarity graphs also relies on the size of their respective neighbourhoods, which affects the compactness or spread of the points on the manifold. Another set of important parameters are the weights applied to combined kernel functions, which controls the importance of both the projection and canonical correlation kernels. These kernels are often application specific and can be determined once during the training phase. A limitation of the method is the relatively high computational cost for learning the discriminant manifold embeddings during the learning process, which can take a few hours of computation. However this process is performed offline and can potentially be parallelized.

In the datasets used in this study, liver metastases were a result of primary colorectal cancer. The results for lesions mainly caused by other types of cancer might differ. The evaluation on the multiple raters segmenting the data has shown that, overall, the proposed method significantly reduces both inter- and intra-observer variability of volumetric measurements. This introduces a certain level coherence in the delineation of liver metastases, and can be of significant value for the application of the RECIST criterion, in order to obtain repeatable results through multiple patient scans. For liver metastases, a limitation of the evaluation is that the intra-observer variability for the delineation of the tumor between raters can be higher to the difference between unsupervised and manual approaches. Another observation was that as the lesion increases in size, the variability between the segmentations is reduced, mainly because smaller differences in the segmentations have a lower relative influence.

An improvement of the approach will be to perform a series of experiments to determine the accuracy of the resulting model when providing a more localized initialization by automatically detect position of potential metastases with the segmented liver shape and limiting the number of potential false-positives generated by the algorithm. Furthermore, the model can be improved to enable the extraction of multiple sub-regions within the tumor (necrotic, active tumor), as well as include the capability of adaptation to unconventional tumor configurations, such as H-shaped tumors and more diffuse or infiltrating metastases. Integration of advanced graphical models in the local mesh adaptation process can potentially increase the delineation of detailed tumor boundaries. Translating the algorithm to other modalities, such as MRI and 3D ultrasound for interventional use, is also envisioned for interventional guidance of RFA procedures. Finally, a multi-site evaluation of this technique would be beneficial towards wide-spread clinical adoption.

The proposed framework based on discriminant manifolds is general, and can be extended to other applications in vision (segmentation) and medical imaging (Alzheimer’s diagnosis, clinical decision support systems), to accommodate for modelling object tissue variations of similar shape and size, where disease detection can be of significant clinical value. The proposed method promises to facilitate and accelerate quantitative image analysis for clinical diagnosis with MRI or CT. Encoding prior knowledge relating to shape representation is a natural extension of the proposed formulation which can allow to ensure consistency in the final shape segmentation. Finally, the approach can also be extended to other pathologies such as glioblastoma multiforme by segmenting high-grade gliomas in MRI. In this case, the manifold potentials could be trained to characterize different necrotic, active and edema regions in the tumor from multi-parametric MRI, and help to automatically discriminate between normal and diseased brain tissue. Indeed, the framework easily allows to add multiple classes in the discriminant learning process, not only a two-class problem.

Acknowledgments
The authors would like to acknowledge Dr. Nadine Abi-Jaoudeh from the National Institutes of Health for providing liver CT images. Research funding was supported in part by the Canada Research Chairs and the NSERC Discovery grant program and the Fonds ed recherch du Quebec - Santé (Cancer Award 26993).

References