In navigated surgical interventions, the aim is to aid surgeons in finding structures of interest, such as tumors, vessels or planned resection lines. Often, a detailed, accurate and highly informative preoperative computed tomography (CT) scan is available and the challenge is to align this model with the intraoperative scene. Whenever deforming soft tissue is involved, this requires a non-rigid registration. However, usually only limited, sparse and noisy data can be acquired during the intervention. Furthermore, it is usually very difficult to find one-to-one correspondences between the pre- and intraoperative data, since they are subject to different types of noise and can look substantially different. Together with the large deformation, the only partially visible surface and many unknown parameters (such as the organ’s elastic properties, acting forces and boundary conditions), this makes soft-tissue registration a challenging problem.
This work explores a deep-learning based approach to performing a non-rigid organ registration. We focus on laparoscopic liver surgery and register a given preoperative liver volume mesh to an intraoperative, partial surface of the same organ (obtained from a laparoscopic stereo video stream). We use a fully convolutional neural network (CNN), which analyses the two meshes and outputs a displacement field to register the preoperative organ volume to the intraoperative surface (see Fig. 1). To reach this aim, the network must learn to find a plausible solution for the surface correspondence problem, while at the same time learning about biomechanical deformation in order to constrain itself to physically realistic solutions. For training, a deformed intraoperative state of each preoperative mesh is simulated, using the Finite Element Method (FEM). We use synthetically generated, random organ-like meshes, which allows the trained network to generalize to new patients without the need to re-train for every liver. We call the approach Volume-To-Surface Registration Network (V2S-Net) and publish the code as well as the pretrained network online (https://gitlab.com/nct_tso_public/Volume2SurfaceCNN).
1.0.1 Related Work
The application of deep learning to simulate organ deformation has recently received a lot of attention, mainly due to the very low inference times [Mendizabal2019, MendizabalTagliabue2019, PellicerValero2020, Pfeiffer2019]. It has been shown that these data-driven models can achieve a high accuracy [Mendizabal2019, PellicerValero2020], can deal with partial surface information [Brunet2019, Pfeiffer2019] and can even learn to deform real organs after training on synthetic data only [Pfeiffer2019]. However, all of the mentioned methods require the use of boundary conditions similar to the FEM. These are very difficult to obtain in a real surgical setting, since, for example, forces are extremely difficult to measure and estimating the surface displacement would require known surface correspondences.
On the other hand, the machine learning community has developed a large number of data-driven feature descriptors to efficiently describe and register 3D structures [DeepLearning3DReview2019, 3DKeypointDescriptors2018, PointNet2017]. While this shows the ability of neural networks to interpret 3D data, in the case of organs, the lack of distinct features and the large organ deformation require the incorporation of soft-tissue mechanics in order to find correct correspondences.
In design, our method is similar to the work of Suwelack et al. [Suwelack2014], who propose to morph the preoperative volume mesh into the intraoperative partial surface by simulating electric charge on the surfaces and solving the problem using the FEM. Similar to our work, they use the two meshes as input and output a displacement field for the preoperative mesh. However, like many previous approaches, their method requires manual assignment of boundary conditions and parameters, can become unstable if wrong values are chosen and their displacement estimation is much slower than ours.
The aim of the proposed V2S-Net is to learn to recover a displacement field which deforms the input volume in such a way that it is well aligned with the partial surface, when given only the preoperative volume mesh and a partial intraoperative surface as input. We train the network on synthetic data, since a real data set consisting of volume meshes as well as known displacement fields is not available. In addition to simulating deformations, we also generate the preoperative 3D meshes randomly, to ensure that the network will translate directly to new, unseen liver meshes.
2.1 Data Generation
A random 3D surface mesh is generated by first creating an icosphere and then applying a set of extrusion, boolean, remeshing and smoothing operators. We use Gmsh [Gmsh2009] to fill the surface mesh with tetrahedral elements. The resulting volume mesh is considered to be our preoperative volume organ mesh (Fig. 2 a). Next, up to three random forces (max. magnitude 1.5 N) and a zero-displacement boundary condition are assigned to random areas of the surface of the mesh. The steady-state deformation is calculated by the Elmer [Elmer2013] finite element solver, using a neo-Hookean hyperelastic material model with a random Young’s Modulus (2 kPa to 5 kPa) and Poisson’s ratio of 0.35. The resulting deformed volume acts as the intraoperative state of the organ (Fig. 2 b). For every vertex in the preoperative volume mesh, we now know the displacement vector that needs to be applied in order to reach the intraoperative state, resulting in the target displacement field.
We extract a random surface patch of this deformed volume mesh. To simulate intraoperative noise and difference in imaging modalities, we resample this patch, delete random vertices and add uniform random displacements to the position of every vertex. Furthermore, random parts of the patch are deleted to simulate areas where surrounding tissues (like the falciform ligament) occlude the surface from the perspective of the laparoscope. The result is our intraoperative partial surface mesh.
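The noise model described above can be illustrated with a minimal sketch. It operates on a plain point array rather than a mesh and omits the resampling step. The function name, the parameter values (`keep_fraction`, `noise_amplitude`) and the spherical occluder are illustrative assumptions, not values taken from the published code.

```python
import numpy as np

def simulate_partial_surface(points, keep_fraction=0.8, noise_amplitude=0.005,
                             occluder_center=None, occluder_radius=0.0, rng=None):
    """Return a thinned, noisy copy of an (N, 3) array of surface points (metres).

    All parameter values are hypothetical and only serve as an illustration.
    """
    rng = np.random.default_rng() if rng is None else rng
    points = np.asarray(points, dtype=float)

    # Delete random vertices: each point survives with probability keep_fraction.
    keep = rng.random(len(points)) < keep_fraction

    # Delete a connected region to mimic occlusion by surrounding tissue
    # (assumption: the occluded region is modelled as a sphere).
    if occluder_center is not None:
        dist = np.linalg.norm(points - np.asarray(occluder_center, dtype=float), axis=1)
        keep &= dist > occluder_radius

    kept = points[keep]

    # Add a uniform random displacement to every remaining vertex.
    noise = rng.uniform(-noise_amplitude, noise_amplitude, size=kept.shape)
    return kept + noise
```

Passing a seeded generator via `rng` makes the generated sample reproducible, which is convenient when a failed sample has to be inspected.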
The use of random values sometimes leads to the generation of meshes for which the creation of tetrahedral elements fails or simulations for which the finite element solver does not find a solution. These samples are discarded. We also discard samples where the maximum displacement is larger than 20 cm or the amount of visible surface is less than 10%, since we assume these cases to be unrealistic.
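The two plausibility criteria can be written as a small filter. The sketch below assumes that displacements are given in metres and that the visible and total surface areas have already been computed; the function and constant names are hypothetical.

```python
import math

MAX_DISPLACEMENT_M = 0.20     # 20 cm, as stated in the text
MIN_VISIBLE_FRACTION = 0.10   # 10% of the surface, as stated in the text

def is_plausible_sample(displacements, visible_area, total_area):
    """Return True if a simulated sample passes both plausibility checks.

    displacements: iterable of (dx, dy, dz) vectors in metres (assumed unit).
    """
    max_displacement = max(math.sqrt(dx * dx + dy * dy + dz * dz)
                           for dx, dy, dz in displacements)
    visible_fraction = visible_area / total_area
    return (max_displacement <= MAX_DISPLACEMENT_M
            and visible_fraction >= MIN_VISIBLE_FRACTION)
```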
To pass the meshes to the network, we represent them in the form of distance fields on a regular grid. For this, we define a grid of points. At each point, we calculate the distance to the nearest surface point of the preoperative volume as well as of the intraoperative surface. For the preoperative mesh, we flip the sign of all grid points that lie within the organ volume, resulting in a signed distance field for the preoperative mesh and a distance field for the intraoperative mesh (see Fig. 2 c and d).
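As a rough illustration of this representation, the sketch below computes a distance field by brute force against a set of sampled surface points and then applies the sign flip. It is a simplification under stated assumptions: the distance to the nearest sampled point only approximates the distance to the surface, the inside/outside mask is assumed to be computed elsewhere, and the grid size and extent are placeholders rather than the values used in the paper.

```python
import numpy as np

def distance_field(surface_points, grid_size=16, extent=1.0):
    """Unsigned distance from each grid node to the nearest surface sample.

    Returns the (G, G, G) distance field and the (G, G, G, 3) node coordinates.
    grid_size and extent are hypothetical placeholder values.
    """
    axis = np.linspace(-extent, extent, grid_size)
    gx, gy, gz = np.meshgrid(axis, axis, axis, indexing="ij")
    grid = np.stack([gx, gy, gz], axis=-1)            # (G, G, G, 3)
    flat = grid.reshape(-1, 3)                        # (G^3, 3)
    surface_points = np.asarray(surface_points, dtype=float)
    # Pairwise distances between grid nodes and surface samples (brute force).
    diff = flat[:, None, :] - surface_points[None, :, :]
    dist = np.linalg.norm(diff, axis=2).min(axis=1)
    return dist.reshape(grid_size, grid_size, grid_size), grid

def signed_distance_field(dist, inside_mask):
    """Flip the sign of all grid nodes that lie inside the organ volume."""
    return np.where(inside_mask, -dist, dist)
```

For large grids or dense meshes, a spatial search structure would replace the brute-force pairwise distances; the broadcasting version is kept here only because it is short.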
Additionally, we interpolate the target displacement field into the same grid with a Gaussian kernel, resulting in an interpolated vector field U. For points outside the preoperative volume, U is set to the zero vector.
We use the outlined process to generate our data set, which is split into training data (90%) and validation data (10%). By flipping each training sample along each combination of X-, Y- and Z-axes, we further increase the amount of training data by a factor of eight.
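The eightfold flip augmentation can be sketched as follows. The array layout (scalar fields of shape `(X, Y, Z)`, displacement field of shape `(3, X, Y, Z)`) is an assumption. The sketch also negates the displacement component along every flipped axis; the text does not spell this step out, but it is needed for the mirrored vectors to remain geometrically consistent.

```python
import itertools
import numpy as np

def flip_augment(sdf_pre, df_intra, displacement):
    """Yield all 8 mirrored variants of a sample (the first one is the original).

    sdf_pre, df_intra: (X, Y, Z) arrays; displacement: (3, X, Y, Z) float array.
    """
    for flips in itertools.product([False, True], repeat=3):
        sdf_f, df_f = sdf_pre, df_intra
        disp_f = np.array(displacement, dtype=float)   # work on a copy
        for axis, flipped in enumerate(flips):
            if not flipped:
                continue
            sdf_f = np.flip(sdf_f, axis=axis)
            df_f = np.flip(df_f, axis=axis)
            # Spatial axes of the vector field are offset by the channel axis.
            disp_f = np.flip(disp_f, axis=axis + 1).copy()
            # Mirroring space also mirrors the vector component along that axis.
            disp_f[axis] *= -1.0
        yield sdf_f, df_f, disp_f
```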
2.2 Network Architecture and Training
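To estimate the displacement field U, the network is given the full preoperative volume and the partial intraoperative surface. Formally, we want our network to estimate the function that maps the signed distance field of the preoperative volume and the distance field of the intraoperative surface to U.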
Similar to [Mendizabal2019] and [Pfeiffer2019], we choose a fully convolutional architecture with 3D convolutions, an encoder-decoder structure and skip connections to allow for an abstract low-resolution representation while preserving high-resolution details. The precise architecture is shown in the supplementary material. Following [Pfeiffer2019], we let the network output the displacement field at multiple resolutions. These are compared to the (downsampled) target displacement fields using the mean absolute error. We find that this process speeds up loss convergence during training. The final loss is a weighted sum of these errors.
We train the network with a one cycle learning rate scheduler [superconvergence2017] and the AdamW optimizer [adamW2017] for 100 epochs, after which the mean registration error on the validation data has converged to roughly 6 mm.
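The multi-resolution loss described above (mean absolute error at each output resolution, combined as a weighted sum) might look as follows. This is a NumPy sketch for illustration only: the average-pooling downsampling and the weight values in the usage below are assumptions, and an actual training loop would use the deep-learning framework's differentiable tensors instead.

```python
import numpy as np

def downsample(field, factor):
    """Average-pool a (3, X, Y, Z) field by an integer factor (assumed scheme)."""
    c, x, y, z = field.shape
    blocks = field.reshape(c, x // factor, factor, y // factor, factor,
                           z // factor, factor)
    return blocks.mean(axis=(2, 4, 6))

def multi_resolution_loss(predictions, target, weights):
    """Weighted sum of mean absolute errors over all output resolutions.

    predictions: list of (3, X_i, Y_i, Z_i) arrays, one per resolution.
    target: full-resolution (3, X, Y, Z) displacement field.
    weights: one hypothetical weight per resolution.
    """
    total = 0.0
    for prediction, weight in zip(predictions, weights):
        factor = target.shape[1] // prediction.shape[1]
        target_i = downsample(target, factor)
        total += weight * float(np.mean(np.abs(prediction - target_i)))
    return total
```

For example, with outputs at 8, 4 and 2 voxels per axis, `multi_resolution_loss([p8, p4, p2], target, [1.0, 0.5, 0.25])` penalises errors at the finest resolution most strongly.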
3 Experiments and Results
Since our network is trained on randomly generated meshes, it is vital to test on real patient data. However, reference data for real laparoscopic interventions is very difficult to obtain, since interventional CT scans during laparoscopy are very limited. To nevertheless capture all necessary aspects of the registration process, we conduct three experiments: one experiment with simulated liver deformations, one with real human liver deformations under breathing motion and one qualitative experiment on data from a laparoscopic setting. In all experiments, the used preoperative liver meshes were extracted from patient CT scans automatically using a segmentation CNN [LiTS]. The process of estimating the displacement field (including upload to the GPU) takes roughly 130 ms.
3.1 Liver Deformations (In Silico)
We generate a synthetic dataset as described in section 2.1. However, instead of generating random meshes we use the mesh of a patient’s liver and simulate 1725 deformations and partial intraoperative surfaces. This new dataset is not used for training and thus allows us to test how our method translates to a real liver mesh: We apply the network to all samples and calculate the displacement error for every point in the estimated displacement fields (results in Fig. 3). When the visible surfaces are large enough, registration errors tend to be small, even when the target deformations become large. Smaller visible areas lead to more interesting results, for example when the network cannot infer which part of the volume should be matched to the partial surface (see Fig. 4).
3.2 Liver with Breathing Deformation (In Vivo)
During breathing motion, the human liver moves and deforms considerably. To assess whether our network translates to real human liver deformation, we evaluate on human breathing data. We extract liver meshes from two CT scans (one showing an inhaled and one an exhaled state) and let these represent the preoperative volume and the intraoperative surface. We search for clearly visible vessel bifurcations and mark their positions in both scans. Given the distance fields of the two meshes, the network then estimates the displacement of every voxel in the inhaled state. The resulting displacement field is interpolated to the positions of the marked bifurcation points (Gaussian interpolation kernel, radius 1 cm) and is used to displace them. We carry out this experiment for two patients. Average displacements and remaining errors after registration are shown in Table 1 and the result for Patient 1 is depicted in Fig. 5.
| | Avg. displacement (mm) | Avg. error (mm) |
| Patient 1 | 21.4 (max: 31.1) | 5.7 (max: 9.8) |
| Patient 2 | 28.3 (max: 32.7) | 4.8 (max: 5.4) |
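The landmark evaluation of Section 3.2 can be sketched as a Gaussian-weighted interpolation of the voxel displacements, followed by a distance computation. The sketch assumes coordinates in metres and interprets the stated 1 cm radius as the standard deviation of the kernel, which is an assumption; the function names are hypothetical.

```python
import numpy as np

def interpolate_displacement(grid_points, grid_displacements, query, radius=0.01):
    """Gaussian-weighted average of the (N, 3) grid displacements at one point.

    radius is used as the kernel's standard deviation (assumed interpretation).
    The query is expected to lie close to the grid; far away from it all
    weights underflow to zero.
    """
    squared_dist = np.sum((grid_points - query) ** 2, axis=1)
    weights = np.exp(-squared_dist / (2.0 * radius ** 2))
    return (weights[:, None] * grid_displacements).sum(axis=0) / weights.sum()

def landmark_errors(landmarks_pre, landmarks_intra, grid_points,
                    grid_displacements, radius=0.01):
    """Remaining distance per landmark after applying the estimated field."""
    moved = np.array([
        p + interpolate_displacement(grid_points, grid_displacements, p, radius)
        for p in landmarks_pre
    ])
    return np.linalg.norm(moved - landmarks_intra, axis=1)
```

Averaging and taking the maximum of the returned array would give values of the kind reported in Table 1.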
3.3 Laparoscopic Liver Registration (In Vivo)
To validate whether our method works for our target task of navigated laparoscopy, we perform a qualitative evaluation on human in-vivo data acquired from a da Vinci (Intuitive Surgical) stereo laparoscope. We first reconstruct the intraoperative surface from the video stream. For this, a 10-second camera sweep of the liver is performed at the beginning of the surgery. Since we lack positional information of the laparoscope, the ORB-SLAM2 [ORBSLam2] algorithm is used to estimate the camera pose for each frame. Furthermore, the disparity between each pair of stereo frames is estimated (using a disparity CNN [disparityCNN2019]) and a semantic segmentation is performed on each left camera frame to identify pixels which show the liver (using a segmentation CNN [Pfeiffer2019Generating]). We reproject the pixel information into the scene using the disparity, camera calibration parameters and estimated camera pose, obtaining a 3D point cloud with color and segmentation information. While the camera moves, points which are close together and similar in hue are merged. After the sweep, we discard points if they have not been merged multiple times (i.e. points which were not seen from multiple viewpoints) or if they were not segmented as liver in at least 70% of the frames in which they were seen. Additionally, we use a moving least squares filter (radius 0.5 cm) to smooth the surface. After a manual rigid alignment of the preoperative volume and the intraoperative surface, the distance fields for the pre- and intraoperative data are calculated and our network is used to estimate a displacement of the liver volume. Qualitative results are shown in Fig. 1 and in the supplementary material.
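The point-filtering rule applied after the camera sweep can be expressed as a short predicate. The `MapPoint` structure and its field names are hypothetical bookkeeping, and "merged multiple times" is interpreted here as at least two merges, which is an assumption. The 70% threshold is taken from the text.

```python
from dataclasses import dataclass

@dataclass
class MapPoint:
    """Hypothetical per-point bookkeeping for the reconstructed point cloud."""
    position: tuple
    merge_count: int    # how often the point was merged with a new observation
    times_seen: int     # number of frames in which the point was observed
    times_liver: int    # number of those frames in which it was labelled liver

def keep_point(point, min_merges=2, min_liver_ratio=0.7):
    """Return True if the point survives the post-sweep filtering."""
    if point.merge_count < min_merges:
        return False    # not confirmed from multiple viewpoints
    if point.times_seen == 0:
        return False    # never observed, so no liver ratio can be computed
    return point.times_liver / point.times_seen >= min_liver_ratio

def filter_point_cloud(points):
    """Keep only points that were seen repeatedly and are mostly liver."""
    return [p for p in points if keep_point(p)]
```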
In this work, we have shown that a CNN can learn to register a full liver surface to a deformed partial surface of the same liver. The network was never trained on a real organ or a real deformation, and yet it learned to solve the surface correspondence problem while respecting the underlying biomechanical constraints.
Since our method works directly on the surfaces, no assignment of boundary conditions and no search for correspondences is necessary. The breathing motion experiment shows that, even though the segmentation process creates some artifacts in the surfaces, the V2S-Net finds a valid solution. The network was able to find the displacement without the need for a prior rigid alignment, likely because the full intraoperative surface was used. In cases with less visible surface, we find that the solution depends on the prior rigid alignment, which should be investigated further in future work.
In our method, we outsource the complex and time consuming simulation of soft-tissue behavior to the data generation stage. This could be further exploited by adding additional information to the simulations, such as inhomogeneous material properties, surrounding and connecting tissues and more complex boundary conditions. Where these properties are measurable for a patient, they could be passed to the network as additional input channels.
Our results show that there may be cases where the solution is ambiguous, for example when too little information is given and the network must guess how the hidden side of the liver is deformed. This issue is likely inherent to the ill-posed laparoscopic registration problem in itself and not confined to our method. However, in contrast to conventional registration methods, neural networks can estimate how confident they are of a solution [BayesianDL2016]. This probabilistic output could be used to assess how a solution should be interpreted and could give additional information to the surgeon.