Discriminative Parameter Estimation for Random Walks Segmentation: Technical Report

06/05/2013 ∙ by Pierre-Yves Baudin, et al.

The Random Walks (RW) algorithm is one of the most efficient and easy-to-use probabilistic segmentation methods. By combining contrast terms with prior terms, it provides accurate segmentations of medical images in a fully automated manner. However, one of the main drawbacks of using the RW algorithm is that its parameters have to be hand-tuned. We propose a novel discriminative learning framework that estimates the parameters using a training dataset. The main challenge we face is that the training samples are not fully supervised. Specifically, they provide a hard segmentation of the images, instead of a probabilistic segmentation. We overcome this challenge by treating the optimal probabilistic segmentation that is compatible with the given hard segmentation as a latent variable. This allows us to employ the latent support vector machine formulation for parameter estimation. We show that our approach significantly outperforms the baseline methods on a challenging dataset consisting of real clinical 3D MRI volumes of skeletal muscles.




1 Literature Survey on State-of-the-Art Algorithms in Muscle Segmentation

In the following, we present the principal methods addressing the segmentation of skeletal muscles.

Muscle Segmentation using Simplex meshes with medial representations

A skeletal muscle segmentation method was presented in [8] based on simplex meshes [6]. Considering a 3D surface model, simplex meshes are discrete meshes where each vertex has exactly 3 neighbors. Having a constant connectivity allows one to simply parametrize the location of one vertex with respect to its neighbors, and thus parametrize deformations of the shape – translation, rotation, scaling – in a local manner. Indeed, the location of a vertex can be expressed as a linear combination of the locations of its three neighbors plus a local elevation term parallel to the local normal. As a result, many local measurements – including curvature and cell surface – can be computed efficiently, and global energy terms enforcing local constraints come up naturally.

Here, the authors impose local smoothing via curvature averaging, which does not tend to shrink the surface the way first-order operators typically do. Prior knowledge is imposed by constraining the local scale changes of the elevation parameter with respect to a reference shape: given the surface of the triangle formed by the three neighbors of a vertex and the reference shape parameters, the new location of the considered vertex is expressed relative to its reference configuration. Depending on how the scaling is defined, this formulation is either similitude invariant or invariant through rigid transformations only. The model is attached to the target image through either gradient-norm maximization in the direction of the gradient at the location of the vertices, or maximization of similarities between the reference and the target images at the vertices' locations.
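The local parametrization described above can be sketched as follows. This is a minimal illustration with our own (hypothetical) parameter names, not the exact formulation of [8]: the position of a vertex is a barycentric combination of its three neighbors plus an elevation along the local normal.

```python
import numpy as np

def simplex_vertex_position(p1, p2, p3, eps1, eps2, h):
    """Position of a simplex-mesh vertex from its three neighbors.

    eps1, eps2 (and eps3 = 1 - eps1 - eps2) are the barycentric
    parameters; h is the elevation along the local normal.
    """
    eps3 = 1.0 - eps1 - eps2
    base = eps1 * p1 + eps2 * p2 + eps3 * p3   # in-plane component
    n = np.cross(p2 - p1, p3 - p1)             # local normal direction
    n = n / np.linalg.norm(n)
    return base + h * n                        # add the elevation term
```

With the three neighbors in a common plane and equal barycentric weights, the vertex sits above the triangle centroid at height h, which makes the role of the elevation parameter explicit.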

A medial representation – similar to the M-reps [13] – is combined with the simplex parametrization to exploit the specific tubular shapes of the skeletal muscles. Medial vertices are added to the model, constrained to remain on the medial axis of the tubular objects. This is achieved by connecting the new vertices to the surface vertices through spring-like forces. This constrains the global structure to resemble its initial reference shape, thus acting as a global shape prior. This medial axis representation also allows efficient collision handling. The model is fit to the image through an iterative process of successive local evolutions. Such models appear to always yield a valid solution, sometimes at the price of excessive regularization or a lack of adaptability to the specifics of the target image.

Muscle Segmentation using Deformable Models and Shape Matching

A shape prior for muscle segmentation in 3D images was presented in [9], derived from a computer animation technique called shape matching, which is used to efficiently approximate large soft-tissue elastic deformations. This method was applied to muscle segmentation with some success. In this approach, discrete meshes are used to parametrize the moving surface. Consider the vector containing the initial positions of the control points of the parametric surface. Clustering is performed on these points such that each cluster contains at least a certain number of vertices (set by the user). During segmentation, the evolution of the active surface is performed according to the following iterative procedure:

  1. Shift vertices according to the external force. The external “force” is computed by searching for the maximal gradient norm along the gradient direction.

  2. Regularize vertex positions:

    1. Compute a rigid registration for each cluster;

    2. Average the target positions proposed for each vertex.
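Step 2a – the per-cluster rigid registration – is typically computed in closed form via the Kabsch/Procrustes algorithm. A minimal sketch (our own implementation, not necessarily the one used in [9]):

```python
import numpy as np

def rigid_fit(src, tgt):
    """Best rotation R and translation t mapping src onto tgt (Kabsch)."""
    cs, ct = src.mean(axis=0), tgt.mean(axis=0)
    H = (src - cs).T @ (tgt - ct)           # cross-covariance matrix
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))  # guard against reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = ct - R @ cs
    return R, t
```

Each cluster's vertices are then pulled toward their rigidly registered reference positions, and step 2b averages the targets proposed by the clusters a vertex belongs to.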


Single-reference prior models are convenient in that they require only one annotated example of the objects of interest. However, when segmenting a class of objects whose shape varies widely, such an approach becomes too constraining and does not allow the model to adopt valid shapes that differ too much from the single reference.

Muscle segmentation using a hierarchical statistical shape model

A hierarchical prior model using Diffusion Wavelets was proposed in [7] to segment organs – including one calf muscle – in MRI. This model builds on the formulation of the ASM [4], using a different basis for the subspace of valid solutions. One of the main drawbacks of ASMs is that they often require a large amount of training data in order to obtain relevant decomposition modes. Indeed, non-correlated shape features – such as global and local shape configurations – are often modeled by the same deformation modes. Thus, desired shape behaviors are often mixed with unwanted shape behaviors when optimizing the shape parameters for segmenting a new image. The hierarchical approach decorrelates small- and large-scale shape behaviors. Moreover, the presented method also decorrelates long-range shape behaviors, thus ensuring that deformation modes are spatially localized.

We give a brief summary of this method. First, a graph is built on the set of landmarks: each landmark corresponds to a node, and the edge weights are determined through a statistical analysis of the mutual relations between the landmarks in the training set (cf. Shape Maps [12]). As a result, landmarks with independent behaviors will be connected by edges with a small weight, whereas nodes with strongly related behaviors – such as neighboring points – will be connected by large-weight edges.

Second, a Diffusion Wavelet decomposition of the graph is performed. This process involves computing the diffusion operator of the graph – the symmetric normalized Laplace-Beltrami operator – and computing and compressing its dyadic powers. The output of this decomposition is a hierarchical orthogonal basis for the graph, whose vectors correspond to different graph scales. When the vector of landmark positions is decomposed on this new basis, global deformations – i.e. global relations between all the nodes – are controlled by some of the coefficients, while local interactions between close-by nodes are controlled by others. Projecting all the training examples onto this new basis, a PCA is performed at each scale of the decomposition. Finally, during the segmentation process, the landmarks are positioned on the target image in an iterative manner: 1) the position of the landmarks is updated according to a local appearance model; 2) they are projected onto the hierarchical subspace defined previously.
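The diffusion operator at the heart of this construction is just a symmetric normalization of the graph's weight matrix; the Diffusion Wavelet construction then compresses its dyadic powers T, T², T⁴, … A sketch with our own variable names:

```python
import numpy as np

def diffusion_operator(W):
    """Symmetric normalized diffusion operator T = D^{-1/2} W D^{-1/2}."""
    d = W.sum(axis=1)                  # node degrees
    Dm12 = np.diag(1.0 / np.sqrt(d))
    return Dm12 @ W @ Dm12

# small toy graph: a weighted 4-cycle
W = np.array([[0, 1, 0, 2],
              [1, 0, 3, 0],
              [0, 3, 0, 1],
              [2, 0, 1, 0]], dtype=float)
T = diffusion_operator(W)
T2 = T @ T                             # first dyadic power, T^2
```

The operator is symmetric with spectrum in [-1, 1], so its powers progressively smooth signals on the graph, which is what yields the multi-scale basis.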

Muscle segmentation using a continuous region model

A region-based segmentation method, proposed in [5], was extended to multi-label segmentation in [2], and applied to skeletal muscle segmentation [1]. Before performing the PCA on the training samples, an Isometric Log-Ratio (ILR) transform is applied to the assignment vectors. The reason for using this transform is that multi-label segmentation requires valid probabilities at all times, which the previous method does not guarantee. Here, the PCA is performed in the ILR space and its output is projected back into the initial probability space. For a segmentation in the subspace of valid solutions spanned by the PCA in the ILR space, a functional is proposed that combines an intensity prior term, separating muscle voxels from background voxels, with an edge-map term of the target image, such that the energy is minimal when the boundaries of the model match the edges in the image.
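The ILR transform maps probability vectors (points on the simplex) to an unconstrained Euclidean space, so that PCA and back-projection always yield valid probabilities. A minimal sketch using a Helmert-type orthonormal basis (our own choice of basis; [1] may use a different one):

```python
import numpy as np

def helmert_basis(D):
    """Orthonormal basis (D x D-1) of the zero-sum (clr) hyperplane."""
    V = np.zeros((D, D - 1))
    for j in range(1, D):
        V[:j, j - 1] = 1.0 / j
        V[j, j - 1] = -1.0
        V[:, j - 1] *= np.sqrt(j / (j + 1.0))
    return V

def ilr(p, V):
    """Isometric log-ratio transform of a probability vector p."""
    clr = np.log(p) - np.log(p).mean()   # centered log-ratio
    return V.T @ clr

def ilr_inv(z, V):
    """Back-projection from ILR space to the probability simplex."""
    e = np.exp(V @ z)
    return e / e.sum()
```

Round-tripping a probability vector recovers it exactly, which is what makes performing PCA in ILR space safe.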

2 The Random Walks Algorithm

The Random Walks algorithm is graph-based: consider a graph G = (V, E), where V is a set of nodes – one per voxel in the 3D image – and E is a set of edges – one per pair of adjacent voxels. Let us also denote the set of labels – one per object to segment – as L.

The aim of the Random Walks Algorithm [10] is to compute an assignment probability of all voxels to all labels. These probabilities depend on: i) contrast between adjacent voxels, ii) manual – thus deterministic – assignments of some voxels, called seeds, and iii) prior assignment probabilities.

The probabilities, collected in a single vector, can be obtained by minimizing a functional that is a linear combination of Laplacian terms and prior terms.

The Laplacian matrices contain the contrast terms. Their entries are of the form:

L_ij = sum_k w_ik if i = j; −w_ij if voxels i and j are adjacent; 0 otherwise.

Here, w_ij designates the weight of edge (i, j). It is usually computed as follows:

w_ij = exp(−β (x_i − x_j)²),

where x_i is the intensity of voxel i.

In our experiments, we used three different Laplacians with this formulation, corresponding to three values of β: 50, 100 and 150 (with the image voxel values normalized by their empirical standard deviation).
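With the Gaussian weighting function and normalized intensities, computing all edge weights of an image (2D here, for brevity) takes a couple of vectorized operations; a sketch with our own variable names:

```python
import numpy as np

def edge_weights(img, beta):
    """Random-walker edge weights w_ij = exp(-beta * (x_i - x_j)**2).

    img is normalized by its empirical standard deviation first;
    returns weights of horizontal and vertical 4-neighborhood edges.
    """
    x = img / img.std()
    wh = np.exp(-beta * (x[:, 1:] - x[:, :-1]) ** 2)  # horizontal edges
    wv = np.exp(-beta * (x[1:, :] - x[:-1, :]) ** 2)  # vertical edges
    return wh, wv
```

Identical neighbors get weight 1 (the walker crosses freely), while contrasted neighbors get a weight near 0, which is what makes the segmentation respect image edges.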

We also implemented the lesser-used alternate formulation, which we employed in one additional Laplacian term, with another choice of parameter values, for comparison purposes, since the selected values do give good results on their own.

Since the objective function is quadratic in the probability vector, its minimum can be computed by solving a linear system. The quadratic term, composed of a sum of Laplacians and diagonal matrices due to the prior term, is very sparse and has a specific structure owing to the fact that only adjacent voxels are connected by an edge.

Given the size of the problem (several million variables), this system has to be solved with iterative methods, such as Conjugate Gradient. The specific structure of the problem and the existence of parallelized algorithms (such as multigrid Conjugate Gradient) allow for an efficient optimization. For instance, our own implementation takes less than 20s for the volumes considered here on a regular desktop machine.
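A toy version of the resulting sparse linear solve, assuming a single label on a 1D chain graph (our own simplification, with a hypothetical prior strength γ): the system (L + γI)p = γp₀ combines the chain Laplacian L with a prior p₀, and is solved with SciPy's Conjugate Gradient.

```python
import numpy as np
from scipy.sparse import diags, identity
from scipy.sparse.linalg import cg

n = 100
# Laplacian of a 1D chain graph with unit edge weights
L = diags([-np.ones(n - 1),
           np.r_[1.0, 2 * np.ones(n - 2), 1.0],
           -np.ones(n - 1)],
          offsets=[-1, 0, 1], format='csr')
gamma = 0.1                            # prior strength (hypothetical value)
p0 = np.zeros(n)
p0[: n // 2] = 1.0                     # prior probabilities for one label
A = L + gamma * identity(n, format='csr')
p, info = cg(A, gamma * p0)            # info == 0 on convergence
```

The solution stays in [0, 1] (maximum principle) and smooths the prior across the chain; the real setting swaps in the 3D Laplacian sum and a multigrid-preconditioned solver.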

3 Derivation of the Latent SVM Upper Bound

Given a dataset consisting of inputs and their hard segmentations, we would like to estimate the parameters such that the resulting inferred segmentations are accurate. Here, the accuracy is measured using a loss function. Formally, the soft segmentation predicted for the i-th training sample is the minimizer of the energy functional over all valid soft segmentations. We would like to learn the parameters such that the empirical risk – the sum of the losses between the predicted segmentations and the ground-truth hard segmentations – is minimized over all samples in the dataset.
The above objective function is highly non-convex in the parameters, which makes it prone to poor local minima. To alleviate this deficiency, the latent SVM formulation upper bounds the risk for a sample as follows:


The first inequality follows from the fact that the prediction has the minimum possible energy (see equation 9): its energy has to be less than or equal to the energy of any segmentation compatible with the hard segmentation. The second inequality holds because it replaces the loss-augmented energy of the prediction with the minimum loss-augmented energy.
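In standard latent SVM notation (the symbols below are our own, not necessarily the report's: Δ is the loss, E the energy with parameters w, ŷᵢ(w) the prediction, and C(yᵢ) the set of soft segmentations compatible with the hard segmentation yᵢ), the two inequalities read:

```latex
\Delta\big(y_i, \hat{y}_i(w)\big)
  \;\le\; \min_{y \in \mathcal{C}(y_i)} E(y; w)
          \;-\; E\big(\hat{y}_i(w); w\big)
          \;+\; \Delta\big(y_i, \hat{y}_i(w)\big)
  \;\le\; \min_{y \in \mathcal{C}(y_i)} E(y; w)
          \;-\; \min_{y} \big[\, E(y; w) - \Delta(y_i, y) \,\big]
```

The first step adds a non-negative quantity (the compatible minimum energy minus the prediction's energy); the second replaces the subtracted loss-augmented energy of the prediction with the smaller minimum over all segmentations.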

This inequality leads to the following minimization problem:


where a regularization term on the parameters prevents overfitting to the training data.

4 Dual-Decomposition Algorithm for the ACI

Briefly, dual decomposition allows us to iteratively solve a convex optimization problem whose objective decomposes into a sum of simpler terms. At each iteration, it solves a set of slave problems – one per term – in which each slave's objective is augmented with a linear term involving dual variables that satisfy a consistency constraint. The dual variables are initialized to zero, and updated at each iteration by a subgradient step that penalizes disagreement between the slave solutions.


The subgradient step is scaled by a learning rate that decreases with the iteration count. Under fairly general conditions, this iterative strategy converges to the globally optimal solution of the original problem. We refer the reader to [3, 11] for details.
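The scheme can be illustrated on a toy problem where each slave is a one-dimensional quadratic (our own example, not the segmentation slaves below): the slaves are solved in closed form and the subgradient update drives them toward agreement.

```python
import numpy as np

# minimize sum_k f_k(x) with f_k(x) = (x - a_k)^2 / 2; optimum is mean(a)
a = np.array([1.0, 3.0, 8.0])
lam = np.zeros_like(a)                 # dual variables, sum(lam) == 0
for t in range(1, 201):
    xk = a - lam                       # each slave solved in closed form:
                                       # argmin_x f_k(x) + lam_k * x
    eta = 1.0 / t                      # diminishing learning rate
    lam += eta * (xk - xk.mean())      # subgradient step; keeps sum(lam) == 0
```

After convergence, all slave solutions coincide with the global optimum of the original sum.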

In order to specify our slave problems, we divide the set of voxels into subsets such that each pair of neighboring voxels appears together in exactly one subset. Given such a division of voxels, our slave problems correspond to the following:


where each slave uses the Laplacian restricted to its subset of voxels. The slave prior energy functions modify the original prior by weighting each voxel by the reciprocal of the number of subsets that contain it.

The slave problems defined above can be shown to provide a valid reparameterization of the original problem: the sum of the slave energies equals the original energy.


By using small subsets, we can optimize each slave problem in every iteration using a standard quadratic programming solver. In our experiments, we used the Mosek solver. To the best of our knowledge, this is the first application of dual decomposition to solve a probabilistic segmentation problem under linear constraints.