Non-Rigid Volume to Surface Registration using a Data-Driven Biomechanical Model

05/29/2020 ∙ by Micha Pfeiffer, et al.

Non-rigid registration is a key component in soft-tissue navigation. We focus on laparoscopic liver surgery, where we register the organ model obtained from a preoperative CT scan to the intraoperative partial organ surface, reconstructed from the laparoscopic video. This is a challenging task due to sparse and noisy intraoperative data, real-time requirements and many unknowns - such as tissue properties and boundary conditions. Furthermore, establishing correspondences between pre- and intraoperative data can be extremely difficult since the liver usually lacks distinct surface features and the used imaging modalities suffer from very different types of noise. In this work, we train a convolutional neural network to perform both the search for surface correspondences as well as the non-rigid registration in one step. The network is trained on physically accurate biomechanical simulations of randomly generated, deforming organ-like structures. This enables the network to immediately generalize to a new patient organ without the need to re-train. We add various amounts of noise to the intraoperative surfaces during training, making the network robust to noisy intraoperative data. During inference, the network outputs the displacement field which matches the preoperative volume to the partial intraoperative surface. In multiple experiments, we show that the network translates well to real data while maintaining a high inference speed. Our code is made available online.

1 Introduction

Figure 1: Non-rigid registration using our CNN. Given the intraoperative point cloud of a partial liver surface and the preoperative liver mesh (a), our network estimates a displacement field, which deforms the preoperative volume to match the intraoperative surface (b). Displacement magnitude is encoded in shades of blue.

In navigated surgical interventions, the aim is to aid surgeons in finding structures of interest, such as tumors, vessels or planned resection lines. Often, a detailed, accurate and highly informative preoperative computed tomography (CT) scan is available and the challenge is to align this model with the intraoperative scene. Whenever deforming soft tissue is involved, this requires a non-rigid registration. However, usually only limited, sparse and noisy data can be acquired during the intervention. Furthermore, it is usually very difficult to find one-to-one correspondences between the pre- and intraoperative data, since they are subject to different types of noise and can look substantially different. Together with the large deformation, the only partially visible surface and many unknown parameters (such as the organ's elastic properties, acting forces and boundary conditions), this makes soft-tissue registration a challenging problem.

This work explores a deep-learning based approach to performing a non-rigid organ registration. We focus on laparoscopic liver surgery and register a given preoperative liver volume mesh to an intraoperative, partial surface of the same organ (obtained from a laparoscopic stereo video stream). We use a fully convolutional neural network (CNN), which analyses the two meshes and outputs a displacement field to register the preoperative organ volume to the intraoperative surface (see Fig. 1). To reach this aim, the network must learn to find a plausible solution for the surface correspondence problem, while at the same time learning about biomechanical deformation in order to constrain itself to physically realistic solutions. For training, a deformed intraoperative state of each preoperative mesh is simulated, using the Finite Element Method (FEM). We use synthetically generated, random organ-like meshes, which allows the trained network to generalize to new patients without the need to re-train for every liver. We call the approach Volume-To-Surface Registration Network (V2S-Net) and publish the code as well as the pretrained network online (https://gitlab.com/nct_tso_public/Volume2SurfaceCNN).

1.0.1 Related Work

The application of deep learning to simulate organ deformation has recently received a lot of attention, mainly due to the very low inference times [Mendizabal2019, MendizabalTagliabue2019, PellicerValero2020, Pfeiffer2019]. It has been shown that these data-driven models can achieve a high accuracy [Mendizabal2019, PellicerValero2020], can deal with partial surface information [Brunet2019, Pfeiffer2019] and can even learn to deform real organs after training on synthetic data only [Pfeiffer2019]. However, all of the mentioned methods require the use of boundary conditions similar to the FEM. These are very difficult to obtain in a real surgical setting, since, for example, forces are extremely difficult to measure and estimating the surface displacement would require known surface correspondences.

On the other hand, the machine learning community has developed a large number of data-driven feature descriptors to efficiently describe and register 3D structures [DeepLearning3DReview2019, 3DKeypointDescriptors2018, PointNet2017]. While this shows the ability of neural networks to interpret 3D data, in the case of organs, the lack of distinct features and the large organ deformation require the incorporation of soft-tissue mechanics in order to find correct correspondences.

In design, our method is similar to the work of Suwelack et al. [Suwelack2014], who propose to morph the preoperative volume mesh into the intraoperative partial surface by simulating electric charge on the surfaces and solving the problem using the FEM. Similar to our work, they use the two meshes as input and output a displacement field for the preoperative mesh. However, like many previous approaches, their method requires manual assignment of boundary conditions and parameters, can become unstable if wrong values are chosen and their displacement estimation is much slower than ours.

2 Methods

The aim of the proposed V2S-Net is to learn to recover a displacement field which deforms the input volume in such a way that it is well aligned with the partial surface, when given only the preoperative volume mesh and a partial intraoperative surface as input. We train the network on synthetic data, since a real data set consisting of volume meshes as well as known displacement fields is not available. In addition to simulating deformations, we also generate the preoperative 3D meshes randomly, to ensure that the network will translate directly to new, unseen liver meshes.

Figure 2: a) Random preoperative volume mesh, b) simulated intraoperative state (green) and partial surface (orange), c) signed distance field of the preoperative volume, d) distance field of the partial surface, e) known displacement U (magnitude), which the V2S-Net learns to predict from the two distance fields. Afterwards, this displacement field can be used to infer the position of internal structures, such as vessels and tumors.

2.1 Data Generation

2.1.1 Simulation

A random 3D surface mesh is generated by first creating an icosphere and then applying a set of extrusion, boolean, remeshing and smoothing operators. We use Gmsh [Gmsh2009] to fill the surface mesh with tetrahedral elements. The resulting volume mesh is considered to be our preoperative volume organ mesh (Fig. 2 a). Next, up to three random forces (max. magnitude 1.5 N) and a zero-displacement boundary condition are assigned to random areas of the surface of the mesh. The steady-state deformation is calculated by the Elmer [Elmer2013] finite element solver, using a neo-Hookean hyperelastic material model with a random Young's modulus (2 kPa to 5 kPa) and a Poisson's ratio of 0.35. The resulting deformed volume acts as the intraoperative state of the organ (Fig. 2 b). For every vertex in the preoperative volume mesh, we now know the displacement vector that needs to be applied in order to reach the intraoperative state, resulting in a per-vertex displacement field.
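The random load case described above can be illustrated with a short sketch. It only draws the parameters quoted in the text (up to three forces of at most 1.5 N, a Young's modulus between 2 kPa and 5 kPa, a Poisson's ratio of 0.35). The function name, the uniform sampling distributions and the choice of at least one force per sample are assumptions made for illustration; the actual mesh generation and the Elmer setup are not shown.

```python
import numpy as np

def sample_simulation_parameters(rng, max_forces=3, max_force_newton=1.5):
    """Draw one random load case (hypothetical helper, not the authors' code)."""
    # "Up to three" forces; drawing at least one is an assumption of this sketch.
    n_forces = int(rng.integers(1, max_forces + 1))
    # Random unit directions scaled by random magnitudes of at most 1.5 N.
    directions = rng.normal(size=(n_forces, 3))
    directions /= np.linalg.norm(directions, axis=1, keepdims=True)
    magnitudes = rng.uniform(0.0, max_force_newton, size=(n_forces, 1))
    return {
        "forces_newton": directions * magnitudes,
        # Young's modulus in Pa, i.e. 2 kPa to 5 kPa (uniform sampling assumed).
        "youngs_modulus_pa": float(rng.uniform(2000.0, 5000.0)),
        "poisson_ratio": 0.35,
    }

params = sample_simulation_parameters(np.random.default_rng(0))
```

Such a dictionary would then have to be translated into the solver's own input format, which depends on the simulation pipeline.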

We extract a random surface patch of this deformed volume mesh. To simulate intraoperative noise and differences in imaging modalities, we resample this patch, delete random vertices and add uniform random displacements to the position of every vertex. Furthermore, random parts of the patch are deleted to simulate areas where surrounding tissues (like the falciform ligament) occlude the surface from the perspective of the laparoscope. The result is our intraoperative partial surface mesh.
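Two of the corruption steps named above, deleting random vertices and adding uniform random displacements, can be sketched on a plain point array. The function name, the fraction of retained vertices and the noise amplitude are assumed values chosen for illustration; resampling and the removal of occluded regions are left out.

```python
import numpy as np

def corrupt_surface_patch(points, rng, keep_fraction=0.8, max_noise=0.005):
    """Randomly drop vertices and jitter the rest (illustrative sketch).

    points: (N, 3) array of vertex positions in metres.
    keep_fraction and max_noise are assumed values, not taken from the paper.
    Returns the noisy points and the indices of the vertices that were kept.
    """
    n_keep = max(1, int(round(keep_fraction * len(points))))
    kept = rng.choice(len(points), size=n_keep, replace=False)
    # Uniform noise in [-max_noise, max_noise] on every coordinate.
    noise = rng.uniform(-max_noise, max_noise, size=(n_keep, 3))
    return points[kept] + noise, kept

patch = np.random.default_rng(0).uniform(0.0, 0.1, size=(200, 3))
noisy_patch, kept_indices = corrupt_surface_patch(patch, np.random.default_rng(1))
```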

The use of random values sometimes leads to the generation of meshes for which the creation of tetrahedral elements fails or simulations for which the finite element solver does not find a solution. These samples are discarded. We also discard samples where the maximum displacement is larger than 20 cm or the amount of visible surface is less than 10%, since we assume these cases to be unrealistic.
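The two rejection criteria stated above translate into a simple predicate. The sketch assumes displacements are stored in metres and that the visible surface is expressed as a fraction of the total surface area; the function name is hypothetical.

```python
import numpy as np

def is_realistic_sample(displacements, visible_fraction,
                        max_displacement=0.20, min_visible=0.10):
    """Return True if a simulated sample should be kept.

    displacements: (N, 3) per-vertex displacement vectors in metres.
    visible_fraction: visible surface area divided by total surface area.
    A sample is rejected if its largest displacement exceeds 20 cm
    or if less than 10% of the surface is visible.
    """
    largest = np.linalg.norm(displacements, axis=1).max()
    return bool(largest <= max_displacement and visible_fraction >= min_visible)
```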

2.1.2 Voxelization

To pass the meshes to the network, we represent them in the form of distance fields on a regular grid. For this, we define a grid of points. At each point, we calculate the distance to the nearest surface point of the preoperative volume as well as to the nearest point of the intraoperative surface. For the preoperative mesh, we flip the sign of all grid points that lie within the organ volume, resulting in a signed distance field for the preoperative mesh and a distance field for the intraoperative mesh (see Fig. 2 c and d).
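A minimal sketch of this voxelization is given below. It approximates each surface by a set of sample points and uses a brute-force nearest-neighbour search, which is only practical for small grids; the grid size, the spatial extent and the way the inside mask is obtained are assumptions of the example (a sphere is used so that the inside test is trivial).

```python
import numpy as np

def make_grid(n, extent):
    """Return an (n, n, n, 3) array of grid point coordinates in [-extent, extent]^3."""
    axis = np.linspace(-extent, extent, n)
    return np.stack(np.meshgrid(axis, axis, axis, indexing="ij"), axis=-1)

def distance_field(grid, surface_points):
    """Unsigned distance from every grid point to the nearest surface sample."""
    flat = grid.reshape(-1, 3)
    # Brute-force pairwise distances; acceptable for a small demo grid.
    dist = np.linalg.norm(flat[:, None, :] - surface_points[None, :, :], axis=-1)
    return dist.min(axis=1).reshape(grid.shape[:-1])

def signed_distance_field(grid, surface_points, inside_mask):
    """Flip the sign of the distance for grid points inside the volume."""
    df = distance_field(grid, surface_points)
    return np.where(inside_mask, -df, df)

# Demo: a sphere of radius 0.5 stands in for the preoperative organ.
rng = np.random.default_rng(0)
sphere = rng.normal(size=(500, 3))
sphere = 0.5 * sphere / np.linalg.norm(sphere, axis=1, keepdims=True)
grid = make_grid(9, 1.0)
inside = np.linalg.norm(grid, axis=-1) < 0.5
sdf_pre = signed_distance_field(grid, sphere, inside)
df_intra = distance_field(grid, sphere[:100])  # a subset as "partial surface"
```

For real meshes, the inside test would need a point-in-mesh query and the nearest-point search would typically use a spatial index instead of the dense distance matrix.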

Additionally, we interpolate the target displacement field into the same grid with a Gaussian kernel, resulting in an interpolated vector field U. For points outside the preoperative volume, U is set to the zero vector.
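One way to realize such an interpolation is a normalized Gaussian weighting of the per-vertex displacements, sketched below. The normalization, the kernel width `sigma` and the function name are assumptions; grid points outside the volume are zeroed in this sketch.

```python
import numpy as np

def splat_displacements(grid_points, vertices, displacements, inside_mask, sigma):
    """Interpolate per-vertex displacements onto grid points (illustrative sketch).

    grid_points: (G, 3), vertices: (V, 3), displacements: (V, 3),
    inside_mask: (G,) boolean, True for grid points inside the volume.
    sigma: assumed standard deviation of the Gaussian kernel.
    """
    sq_dist = ((grid_points[:, None, :] - vertices[None, :, :]) ** 2).sum(axis=-1)
    weights = np.exp(-sq_dist / (2.0 * sigma ** 2))
    weight_sum = weights.sum(axis=1, keepdims=True)
    # Normalized weighted average; the small floor avoids division by zero.
    field = (weights @ displacements) / np.maximum(weight_sum, 1e-12)
    field[~inside_mask] = 0.0
    return field
```

With normalized weights, a constant displacement of all vertices is reproduced exactly at every inside grid point, which is a convenient sanity check for the kernel.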

We use the outlined process to generate samples, which are split into training data (90%) and validation data (10%). By flipping each training sample along each combination of the X-, Y- and Z-axes, we further increase the amount of training data by a factor of eight.
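The flip augmentation can be sketched as follows. The eight variants correspond to the eight subsets of the three axes, including the unflipped original. Note that mirroring a vector field also requires negating the vector component along each flipped axis; the array layout (channel-first displacement field) and the function names are assumptions of this sketch.

```python
import itertools
import numpy as np

def flip_sample(sdf_pre, df_intra, displacement, flip_axes):
    """Mirror one training sample along the given spatial axes.

    sdf_pre, df_intra: (X, Y, Z) distance fields.
    displacement: (3, X, Y, Z) field, channel c holds the component along axis c.
    """
    sdf, df, u = sdf_pre, df_intra, displacement.copy()
    for a in flip_axes:
        sdf = np.flip(sdf, axis=a)
        df = np.flip(df, axis=a)
        u = np.flip(u, axis=a + 1).copy()  # +1 skips the channel dimension
        u[a] *= -1.0  # mirrored vectors point the opposite way along this axis
    return sdf, df, u

def all_flips(sdf_pre, df_intra, displacement):
    """Return all 8 flipped variants (every subset of the three axes)."""
    combos = [c for r in range(4) for c in itertools.combinations(range(3), r)]
    return [flip_sample(sdf_pre, df_intra, displacement, c) for c in combos]
```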

2.2 Network Architecture and Training

To estimate the displacement field U, the network is given the full preoperative volume and the partial intraoperative surface, both encoded as the distance fields described above. Formally, we want our network to estimate the function that maps these two distance fields to U:

(1)

Similar to [Mendizabal2019] and [Pfeiffer2019], we choose a fully convolutional architecture with 3D convolutions, an encoder-decoder structure and skip connections to allow for an abstract low-resolution representation while preserving high-resolution details. The precise architecture is shown in the supplementary material. Following [Pfeiffer2019], we let the network output the displacement field at multiple resolutions. These are compared to the (downsampled) target displacement fields using the mean absolute error. We find that this process speeds up loss convergence during training. The final loss is a weighted sum of these errors:

(2)
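A NumPy sketch of such a multi-resolution loss is shown below. Average pooling as the downsampling operator, the number of resolution levels and the weights are assumptions chosen for illustration; the text only specifies that mean absolute errors at several resolutions are combined in a weighted sum.

```python
import numpy as np

def downsample(field, factor):
    """Average-pool a (3, X, Y, Z) field by an integer factor (assumed operator)."""
    c, x, y, z = field.shape
    blocks = field.reshape(c, x // factor, factor, y // factor, factor,
                           z // factor, factor)
    return blocks.mean(axis=(2, 4, 6))

def multi_resolution_loss(predictions, target, weights):
    """Weighted sum of mean absolute errors over all output resolutions.

    predictions: list of (3, X/f, Y/f, Z/f) arrays, one per resolution.
    target: (3, X, Y, Z) full-resolution displacement field.
    weights: one weight per resolution (hypothetical values).
    """
    total = 0.0
    for pred, w in zip(predictions, weights):
        factor = target.shape[1] // pred.shape[1]
        reference = downsample(target, factor) if factor > 1 else target
        total += w * np.abs(pred - reference).mean()
    return total
```

In a training framework the same computation would be written with differentiable tensor operations so that gradients flow to every output head.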

We train the network with a one cycle learning rate scheduler [superconvergence2017] and the AdamW optimizer [adamW2017] for 100 epochs, after which the mean registration error on the validation data has converged to roughly 6 mm.

3 Experiments and Results

Since our network is trained on randomly generated meshes, it is vital to test on real patient data. However, reference data for real laparoscopic interventions is very difficult to obtain, since interventional CT scans during laparoscopy are very limited. To nevertheless capture all necessary aspects of the registration process, we conduct three experiments: one experiment with simulated liver deformations, one with real human liver deformations under breathing motion and one qualitative experiment on data from a laparoscopic setting. In all experiments, the used preoperative liver meshes were extracted from patient CT scans automatically using a segmentation CNN [LiTS]. The process of estimating the displacement field (including upload to the GPU) takes roughly 130 ms.

3.1 Liver Deformations (In Silico)

We generate a synthetic dataset as described in section 2.1. However, instead of generating random meshes we use the mesh of a patient’s liver and simulate 1725 deformations and partial intraoperative surfaces. This new dataset is not used for training and thus allows us to test how our method translates to a real liver mesh: We apply the network to all samples and calculate the displacement error for every point in the estimated displacement fields (results in Fig. 3). When the visible surfaces are large enough, registration errors tend to be small, even when the target deformations become large. Smaller visible areas lead to more interesting results, for example when the network cannot infer which part of the volume should be matched to the partial surface (see Fig. 4).

Figure 3: Mean displacement error of each sample over the mean target displacement. As expected, the registration error tends to increase with target displacement and decrease with larger visible surfaces. With few exceptions, samples with a displacement error larger than 1 cm are those with very little visible surface.

Figure 4: From a real preoperative liver mesh (grey), a deformed intraoperative liver state (purple) is simulated. Given the preoperative mesh and a partial intraoperative surface (yellow), the V2S-Net estimates the deformed state (images 2 and 4). The magnitude of the error (red, color scale from 0 cm to 6.4 cm) indicates a successful registration for case a) (max. error 1.5 cm) and an unsuccessful registration for case b) (max. error 6.4 cm). When the randomly picked visible surface contains no distinct features (as is the case in b), or captures no areas of significant deformation, the network may estimate an incorrect but plausible deformation, resulting in the outliers in Fig. 3. This suggests that it is more important which areas of the liver are visible than the size of the visible surface.

3.2 Liver with Breathing Deformation (In Vivo)

During breathing motion, the human liver moves and deforms considerably. To assess whether our network translates to real human liver deformation, we evaluate on human breathing data. We extract liver meshes from two CT scans (one showing an inhaled and one an exhaled state) and let these represent the preoperative volume and the intraoperative surface. We search for clearly visible vessel bifurcations and mark their positions in both scans. Given the distance fields of the two meshes, the network then estimates the displacement of every voxel in the inhaled state. The resulting displacement field is interpolated to the positions of the marked bifurcation points (Gaussian interpolation kernel, radius 1 cm) and is used to displace them. We carry out this experiment for two patients. Average displacements and remaining errors after registration are shown in Table 1 and the result for Patient 1 is depicted in Fig. 5.
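The landmark evaluation described above can be sketched as follows: the voxel-wise displacement field is sampled at each marked bifurcation with a Gaussian kernel, the landmark is displaced, and the remaining distance to its position in the second scan is measured. Interpreting the 1 cm radius as the kernel's standard deviation, as well as the function names, are assumptions of this sketch.

```python
import numpy as np

def interpolate_at_points(voxel_centers, voxel_displacements, query_points, radius=0.01):
    """Gaussian-weighted average of voxel displacements at each query point.

    voxel_centers: (G, 3), voxel_displacements: (G, 3), query_points: (Q, 3), metres.
    radius is used as the Gaussian standard deviation (an assumption).
    """
    sq_dist = ((query_points[:, None, :] - voxel_centers[None, :, :]) ** 2).sum(axis=-1)
    weights = np.exp(-sq_dist / (2.0 * radius ** 2))
    weights /= np.maximum(weights.sum(axis=1, keepdims=True), 1e-12)
    return weights @ voxel_displacements

def landmark_errors(landmarks_pre, landmarks_target, voxel_centers, voxel_displacements):
    """Displace the landmarks with the estimated field; return per-landmark errors."""
    moved = landmarks_pre + interpolate_at_points(
        voxel_centers, voxel_displacements, landmarks_pre)
    return np.linalg.norm(moved - landmarks_target, axis=1)
```

The mean and maximum of the returned array correspond to the kind of average and maximum errors reported per patient.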

            Avg. displacement (mm)   Avg. error (mm)
Patient 1   21.4 (max: 31.1)         5.7 (max: 9.8)
Patient 2   28.3 (max: 32.7)         4.8 (max: 5.4)
Table 1: Displacement of marked bifurcations (due to inhaling) and remaining errors after registration.
Figure 5: Registration for Patient 1. The liver is deformed by breathing motion (inhaled and exhaled states, left). Four vessel bifurcations are marked in the first CT scan (black) and in the second CT scan (red). The network uses the surfaces to calculate a displacement field which registers the two liver states to each other (right). Applying this displacement field to the markers indicates a good approximation of the internal deformation. Results for Patient 2 can be found in the supplementary material.

3.3 Laparoscopic Liver Registration (In Vivo)

To validate whether our method works for our target task of navigated laparoscopy, we perform a qualitative evaluation on human in-vivo data acquired from a da Vinci (Intuitive Surgical) stereo laparoscope. We first reconstruct the intraoperative surface from the video stream. For this, a 10 second camera-sweep of the liver is performed at the beginning of the surgery. Since we lack positional information of the laparoscope, the OrbSLAM2 [ORBSLam2] algorithm is used to estimate the camera pose for each frame. Furthermore, the disparity between each pair of stereo frames is estimated (using a disparity CNN [disparityCNN2019]) and a semantic segmentation is performed on each left camera frame to identify pixels which show the liver (using a segmentation CNN [Pfeiffer2019Generating]). We reproject the pixel information into the scene using the disparity, camera calibration parameters and estimated camera pose, obtaining a 3D point cloud with color and segmentation information. While the camera moves, points which are close together and similar in hue are merged. After the sweep, we discard points if they have not been merged multiple times (i.e. points which were not seen from multiple viewpoints) or if they were not segmented as liver in at least 70% of the frames in which they were seen. Additionally, we use a moving least squares filter (radius 0.5 cm) to smooth the surface. After a manual rigid alignment of the preoperative volume and the intraoperative surface, the distance fields for the pre- and intraoperative data are calculated and our network is used to estimate a displacement of the liver volume. Qualitative results are shown in Fig. 1 and in the supplementary material.
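Two steps of this reconstruction lend themselves to a short sketch: the reprojection of a disparity map into 3D and the final point filter. The back-projection uses the standard rectified-stereo pinhole relations (depth = focal length × baseline / disparity) and yields points in the left camera frame; the transformation into the scene with the estimated camera pose is omitted. The function names and the interpretation of "merged multiple times" as at least two merges are assumptions.

```python
import numpy as np

def backproject(disparity, fx, fy, cx, cy, baseline):
    """Convert a disparity map (pixels) into 3D points in the left camera frame.

    fx, fy, cx, cy: pinhole intrinsics in pixels; baseline in metres.
    Pixels with non-positive disparity are skipped.
    """
    v, u = np.indices(disparity.shape)
    valid = disparity > 0
    z = np.zeros(disparity.shape, dtype=float)
    z[valid] = fx * baseline / disparity[valid]
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    return np.stack([x, y, z], axis=-1)[valid]

def keep_points(times_merged, frames_seen, frames_liver,
                min_merges=2, min_liver_ratio=0.7):
    """Boolean mask of points that survive the filter after the sweep.

    times_merged: how often each point was merged (min_merges=2 is an assumption).
    frames_seen: number of frames in which each point was observed.
    frames_liver: number of those frames in which it was segmented as liver.
    """
    liver_ratio = frames_liver / np.maximum(frames_seen, 1)
    return (times_merged >= min_merges) & (liver_ratio >= min_liver_ratio)
```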

4 Conclusion

In this work, we have shown that a CNN can learn to register a full liver surface to a deformed partial surface of the same liver. The network was never trained on a real organ or a real deformation, and yet it learned to solve the surface correspondence problem while respecting the underlying biomechanical constraints.

Since our method works directly on the surfaces, no assignment of boundary conditions and no search for correspondences is necessary. The breathing motion experiment shows that, even though the segmentation process creates some artifacts in the surfaces, the V2S-Net finds a valid solution. The network was able to find the displacement without the need for a prior rigid alignment, likely because the full intraoperative surface was used. In cases with less visible surface, we find that the solution depends on the prior rigid alignment, which should be investigated further in future work.

In our method, we outsource the complex and time consuming simulation of soft-tissue behavior to the data generation stage. This could be further exploited by adding additional information to the simulations, such as inhomogeneous material properties, surrounding and connecting tissues and more complex boundary conditions. Where these properties are measurable for a patient, they could be passed to the network as additional input channels.

Our results show that there may be cases where the solution is ambiguous, for example when too little information is given and the network must guess how the hidden side of the liver is deformed. This issue is likely inherent to the ill-posed laparoscopic registration problem in itself and not confined to our method. However, in contrast to conventional registration methods, neural networks can estimate how confident they are of a solution [BayesianDL2016]. This probabilistic output could be used to assess how a solution should be interpreted and could give additional information to the surgeon.

References