Spatio-Temporal Deep Learning Methods for Motion Estimation Using 4D OCT Image Data

04/21/2020, by Marcel Bengs et al.

Purpose. Localizing structures and estimating the motion of a specific target region are common problems for navigation during surgical interventions. Optical coherence tomography (OCT) is an imaging modality with a high spatial and temporal resolution that has been used for intraoperative imaging and also for motion estimation, for example, in the context of ophthalmic surgery or cochleostomy. Recently, motion estimation between a template and a moving OCT image has been studied with deep learning methods to overcome the shortcomings of conventional, feature-based methods. Methods. We investigate whether using a temporal stream of OCT image volumes can improve deep learning-based motion estimation performance. For this purpose, we design and evaluate several 3D and 4D deep learning methods and propose a new deep learning approach. Also, we propose a temporal regularization strategy at the model output. Results. Using a tissue dataset without additional markers, our deep learning methods using 4D data outperform previous approaches. The best performing 4D architecture achieves an average correlation coefficient (aCC) of 98.58%, compared to 85.0% for previous approaches. Our temporal regularization strategy at the output further improves 4D model performance to an aCC of 99.06%. In addition, we demonstrate robustness towards image rotations and motion distortions. Conclusions. We propose 4D spatio-temporal deep learning for OCT-based motion estimation. On a tissue dataset, we find that using 4D information for the model input improves performance while maintaining reasonable inference times. Our regularization strategy demonstrates that additional temporal information is also beneficial at the model output.


1 Introduction

Optical coherence tomography (OCT) is an imaging modality based on the optical backscattering of light that allows for volumetric imaging with a high spatial and temporal resolution siddiqui2018high . The imaging modality has been integrated into intraoperative microscopes lankenau2007combining with applications to neurosurgery finke2012automatic or ophthalmic surgery ehlers2014integrative . Moreover, OCT has been used for monitoring laser cochleostomy pau2008imaging .

While OCT offers a high spatial and temporal resolution, its field of view (FOV) is typically limited to a few millimeters or centimeters kraus2012motion . Therefore, during intraoperative imaging, the current region of interest (ROI) can be lost quickly due to tissue or surgical tool movement, which requires constant tracking of the ROI and corresponding adjustment of the FOV. Performing the adjustment manually can disrupt the surgical workflow, which is why automated motion compensation is desirable. In addition, some surgical procedures such as laser cochleostomy also require adjustment of a surgical tool in case patient motion occurs zhang2014optical . Due to the small scale of the cochlea, accurate adjustment is critical to avoid damaging surrounding tissue bergmeier2017workflow . Both motion compensation for the adjustment of the OCT's FOV and the adjustment of surgical tools require accurate motion estimation.

One approach is to use an external tracking system for motion estimation. For example, Vienola et al. used a scanning laser ophthalmoscope for motion estimation in the context of FOV adjustment vienola2012real . Also, external tracking systems have been used in the context of cochleostomy eilers2009navigated ; du2013robustness . Alternatively, the OCT images can be used directly for motion estimation, as OCT already offers a high spatial resolution. For example, Irsch et al. estimated the tissue surface distance from A-scans for axial FOV adjustment irsch2018motion . Also, Laves et al. used conventional features such as SIFT lowe1999object and SURF bay2006surf with 2D maximum intensity projections for motion estimation in the context of volume-of-interest stabilization with OCT laves2017feature . Another approach for high-speed OCT tracking relied on phase correlation for fast motion estimation from OCT images schluter2019feasibility . These approaches rely on hand-crafted features, which can be error-prone, and the overall motion estimation accuracy is often limited laves2019deep . Therefore, deep learning methods have been proposed for motion estimation from OCT data. For example, Gessert et al. proposed using 3D convolutional neural networks (CNNs) for estimating a marker's pose from single 3D OCT volumes gessert2018deep . For estimating the motion between two subsequent OCT scans, Laves et al. adopted a deep learning-based optical flow method ilg2017flownet using 2.5D OCT projections laves2019deep . Similarly, Gessert et al. proposed a deep learning approach for motion estimation where the parameters for a motion compensation system are directly learned from 3D OCT volumes gessert2019two .

So far, deep learning-based motion estimation with OCT has relied on an initial template volume and a moving image, following the concept of registration-based motion estimation, for example, using phase correlation schluter2019feasibility . This can be problematic if the motion between the original template and the current state is very large, as the overlap between the images becomes small. Modern OCT systems could overcome this problem by acquiring entire sequences of OCT volumes along the motion trajectory, as very high acquisition rates have been achieved wang2016heartbeat . Therefore, more information can be made available between an initial state and the current state, which could be useful for motion estimation. While deep learning approaches using two images could follow the trajectory with pair-wise comparisons, we hypothesize that processing an entire sequence of OCT volumes at once might provide more consistency and improved motion estimation performance.

In this paper, we compare several deep learning methods and investigate whether using 4D spatio-temporal OCT data can improve deep learning-based motion estimation performance, see Figure 1. Using 4D data with deep learning methods is challenging in terms of architecture design due to the immense computational and memory requirements of high-dimensional data processing. In general, only a few approaches have studied 4D deep learning. Examples include applications to functional magnetic resonance imaging zhao2018modeling ; bengs2019a and computed tomography clark2019convolutional ; van2019stacked . This work focuses on studying the properties of deep learning-based motion estimation and the challenging problem of learning from high-dimensional 4D spatio-temporal data. First, we design a 4D convolutional neural network (CNN) that takes an entire sequence of volumes as the input. Second, we propose a mixed 3D-4D CNN architecture for more efficient processing, which performs spatial 3D processing first, followed by full 4D processing. Third, we also make use of temporal information at the model output by introducing a regularization strategy that forces the model to predict motion states for previous time steps within the 4D sequence. For comparison, we consider a deep learning approach using a template and a moving volume as the input gessert2019two , which is common for motion estimation laves2019deep . In contrast to previous deep learning approaches gessert2018deep ; gessert2019two , we do not use an additional marker and estimate motion for a tissue dataset. We evaluate our best performing method with respect to robustness towards image rotations and motion distortions. In summary, our contributions are threefold. First, we provide an extensive comparison of different deep learning architectures for estimating motion from high-dimensional 4D spatio-temporal data. Second, we propose a novel architecture that significantly outperforms previous deep learning methods. Third, we propose a novel regularization strategy, demonstrating that additional temporal information is also beneficial at the model output.

Figure 1: Our approach for motion estimation in comparison to previous methods. The approach is illustrated for 2D OCT images for simplicity. Note, we perform all experiments with 3D volumetric OCT images and thus 4D spatio-temporal data.

2 Methods

2.1 Experimental Setup

For evaluation of our motion estimation methods, we employ a setup which allows for automatic data acquisition and annotation, see Figure 2. We use a commercially available swept-source OCT device (OMES, OptoRes) with a scan head, a second scanning stage with two mirror galvanometers, lenses for beam focusing, and a robot (ABB IRB 120). The OCT device is able to acquire a single volume in 1.2 ms. A chicken breast sample is attached with needles to a holder of the robot. Our OCT setup allows for shifting the FOV without moving the scan head by using the second galvanometer mirror stage and by changing the path length of the reference arm. Two stepper motors control the mirrors of the second scanning stage, which shift the FOV in the lateral directions. A third stepper motor changes the path length of the reference arm to translate the FOV in the axial dimension. For the evaluation of our methods, we consider volumes of fixed size with a correspondingly fixed FOV.

Figure 2: The experimental setup for data acquisition and annotation, with the chicken breast sample, robot, OCT scan head, galvos, and lens labeled. The chicken breast sample is attached with needles to a holder of the robot. The OCT device itself is not shown.

2.2 Data Acquisition

We consider the task of motion estimation of a given ROI with respect to its initial position. To assess our methods on various tissue regions, we consider 40 randomly chosen ROIs of a chicken breast sample, each with the same size as the OCT's FOV.

For motion estimation, only the relative movement between the FOV and the ROI is relevant; hence, moving the ROI with a steady FOV is equivalent to moving the FOV with a steady ROI. This can be exploited for the generation of both OCT data and ground-truth labels. By keeping the ROI steady and moving the FOV by a defined shift in stepper motor space, we simulate relative ROI movement. At the same time, the defined shift provides a ground-truth motion, as we can transform the shift in motor space to the actual motion in image space using a hand-eye calibration.

Initially, the FOV completely overlaps with the target ROI. After acquiring an initial template image volume $v_{t_0}$ of the ROI, we use the stepper motors to translate the FOV by a shift $s_{t_1}$ such that the target ROI only partially overlaps with the FOV. Now, we acquire an image volume $v_{t_1}$ for the corresponding translation $s_{t_1}$. This step can be repeated multiple times, resulting in a sequence of shifted volumes $v_{t_1}, \dots, v_{t_n}$ and known relative translations $s_{t_1}, \dots, s_{t_n}$ between the initial ROI and a translated one. Note, each translation is relative to the initial position of an ROI. The procedure is illustrated in Figure 3.

In this way, we formulate a supervised learning problem where we try to learn the relative translation $y_{t_n}$ of an ROI experiencing motion with respect to its initial position, given a sequence of volumes $v_{t_0}, \dots, v_{t_n}$.

For the generation of a single motion trajectory, we consider a sequence of five target translations, i.e., target motor shifts $s_{t_1}, \dots, s_{t_5}$. To generate a smooth motion pattern, we randomly generate the final shift $s_{t_5}$ and use spline interpolation between the initial position $s_{t_0}$, a randomly generated connection point $s_c$, and $s_{t_5}$. We sample the intermediate target shifts from the spline function. This results in various patterns where the FOV drifts away from the ROI. By using different distances between $s_{t_0}$ and $s_{t_5}$, we simulate different magnitudes of motion and obtain various motor shift distances between subsequent volumes. Example trajectories are shown in Figure 4. We use a simple calibration between galvo motor steps and image coordinates to transform the shifts from stepper motor space to image space, resulting in shifts in millimeters.
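As an illustration, a minimal sketch of this trajectory sampling could look as follows; the spline type, magnitude bounds, and calibration scale are placeholder assumptions, as the exact values are not part of this description.

```python
import numpy as np
from scipy.interpolate import make_interp_spline

rng = np.random.default_rng(seed=42)

def sample_motor_trajectory(max_shift=100.0, n_targets=5):
    # Sample one smooth FOV drift in stepper-motor space; max_shift
    # (in motor steps) is a hypothetical magnitude bound.
    s_t0 = np.zeros(3)                                  # FOV on the ROI
    s_end = rng.uniform(-max_shift, max_shift, size=3)  # final shift s_t5
    s_c = (s_t0 + s_end) / 2 + rng.uniform(-max_shift / 4, max_shift / 4, 3)
    # Quadratic spline through start, connection point, and end point.
    spline = make_interp_spline([0.0, 0.5, 1.0],
                                np.stack([s_t0, s_c, s_end]), k=2)
    # Sample the intermediate target shifts s_t1..s_t5 from the spline.
    return spline(np.linspace(1.0 / n_targets, 1.0, n_targets))

shifts_motor = sample_motor_trajectory()      # (5, 3) motor-step shifts
# A simple calibration maps motor steps to image space in millimeters;
# the per-axis scale below is a placeholder, not the calibrated value.
mm_per_step = np.array([1e-3, 1e-3, 1e-3])
shifts_mm = shifts_motor * mm_per_step
```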

For data acquisition, we use the following three steps. First, we use the robot to randomly choose an ROI; the initial state of the three motors then corresponds to an FOV completely overlapping with the ROI. Second, we randomly generate a sequence of five target motor states, as described above, which shifts the FOV out of the ROI. Third, at each of the target motor states, an OCT volume is acquired.

Overall, for each ROI, we acquire OCT volumes for 200 motion patterns, where each pattern consists of five target translations and five OCT volumes.

Moreover, we evaluate how the estimation performance is affected by relative rotations between the volumes of a sequence. Note, our current scanning setup is designed for translational motion, as rotation is difficult to perform using galvo mirrors. Therefore, we add rotations in a post-processing step by rotating the acquired volumes of a sequence around the axial axis. We define a maximal rotation angle $\theta_{\max}$ and transform each volume of a sequence with $R_z(\theta_i)$ while $\theta_i \leq \theta_{\max}$. Note, $R_z$ is the rotation matrix for rotations around the depth axis. First, we consider rotations as noise that is applied to the image data. Second, we incorporate the rotation into our motion and adapt the ground truth with respect to the rotation.
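A minimal sketch of this rotation augmentation, assuming a (T, X, Y, Z) axis order with Z as depth and a linear ramp-up of the angle over the sequence (the exact angle schedule is not specified above), could look as follows:

```python
import numpy as np
from scipy.ndimage import rotate

def rotate_sequence(volumes, theta_max):
    # volumes: (T, X, Y, Z) array with Z as the depth axis (assumed order).
    out = np.empty_like(volumes)
    T = volumes.shape[0]
    for i, vol in enumerate(volumes):
        theta = theta_max * i / max(T - 1, 1)  # assumed linear ramp-up
        # axes=(0, 1): rotate within the lateral plane, i.e., around the
        # depth axis, corresponding to R_z above.
        out[i] = rotate(vol, angle=theta, axes=(0, 1),
                        reshape=False, order=1, mode='nearest')
    return out
```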

Last, we also consider the effect of fast and irregular motion, such as high-frequency tremors, which may cause distortions within an image. This effect is unlikely to occur with our current setup, as our high acquisition frequency prevents common motion artifacts zawadzki2007correction . Nevertheless, we perform experiments with simulated motion artifacts due to their relevance for slower OCT systems. We follow the findings of previous works xu2012alignment ; kraus2012motion ; zawadzki2007correction and model motion distortions as lateral and axial shifts between the B-scans of an OCT volume that has been acquired without motion distortions. In this way, we are able to augment our data with defined motion distortions in a post-processing step. To simulate different intensities of motion distortions, we introduce a factor $p$ that defines the probability that a B-scan is shifted. Also, we compare shifting the B-scans by one or two pixels randomly along the spatial dimensions.
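A sketch of this distortion augmentation could look as follows, assuming B-scans stacked along the first axis; note that np.roll wraps at the borders, which is a simplification.

```python
import numpy as np

def distort_volume(volume, p, max_px=1, rng=None):
    # volume: (B, X, Z) array of B-scans stacked along the first axis.
    # Each B-scan is shifted with probability p by up to max_px pixels
    # along the lateral and axial dimensions.
    rng = rng or np.random.default_rng()
    out = volume.copy()
    for b in range(volume.shape[0]):
        if rng.random() < p:
            dx, dz = rng.integers(-max_px, max_px + 1, size=2)
            out[b] = np.roll(out[b], shift=(int(dx), int(dz)), axis=(0, 1))
    return out
```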

Figure 3: Our data acquisition strategy. For motion estimation, only the relative movement is relevant; hence, we use a fixed ROI and move the FOV step-wise by $s_{t_i}$. This results in a sequence of OCT volumes with the corresponding relative translation between the initial volume and the last volume of a sequence.
Figure 4: Shown are 30 example trajectories for the translations in the spatial dimensions; each trajectory consists of a sequence of five target shifts (circles).

2.3 Deep Learning Models

All our deep learning architectures consist of an initial processing block and a baseline block. For the baseline block, we adapt the idea of densely connected convolutional networks (DenseNet) huang2017densely . Our baseline block consists of three DenseNet blocks connected by average pooling layers. Each DenseNet block consists of 2 layers with a growth rate of 10. After the final DenseNet block, we use a global average pooling (GAP) layer connected to a three-dimensional linear regression output layer. Note, the output $\hat{y}_{t_5}$ of the architecture is the relative translation between volumes $v_{t_0}$ and $v_{t_5}$ in all spatial directions. Using this baseline block, we evaluate five different initial processing concepts for motion estimation based on 4D OCT data, shown in Figure 5.
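The following PyTorch sketch illustrates the baseline block in its 3D variant; kernel sizes and the normalization/activation placement are assumptions, as only the block structure, layer count, and growth rate are specified above.

```python
import torch
import torch.nn as nn

class DenseBlock3D(nn.Module):
    # A 3D DenseNet block: each layer's output is concatenated to its
    # input (2 layers, growth rate 10, as described above).
    def __init__(self, in_ch, n_layers=2, growth=10):
        super().__init__()
        self.layers = nn.ModuleList()
        ch = in_ch
        for _ in range(n_layers):
            self.layers.append(nn.Sequential(
                nn.BatchNorm3d(ch), nn.ReLU(inplace=True),
                nn.Conv3d(ch, growth, kernel_size=3, padding=1)))
            ch += growth
        self.out_channels = ch

    def forward(self, x):
        for layer in self.layers:
            x = torch.cat([x, layer(x)], dim=1)
        return x

class Baseline3D(nn.Module):
    # Three dense blocks joined by average pooling, then global average
    # pooling and a 3D-translation regression head.
    def __init__(self, in_ch):
        super().__init__()
        blocks, ch = [], in_ch
        for _ in range(3):
            block = DenseBlock3D(ch)
            blocks += [block, nn.AvgPool3d(2)]
            ch = block.out_channels
        self.features = nn.Sequential(*blocks[:-1])  # no pool after last block
        self.head = nn.Linear(ch, 3)  # relative translation (x, y, z)

    def forward(self, x):
        x = self.features(x)
        x = x.mean(dim=(2, 3, 4))  # global average pooling
        return self.head(x)
```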

First, we follow the idea of a two-path architecture for OCT-based motion estimation gessert2019two . This architecture individually processes two OCT volumes up to a concatenation point by a two-path CNN with shared weights. At the concatenation point, the outputs of the two paths are stacked into the channel dimension and subsequently processed jointly by a 3D CNN architecture. In this work, we use three CNN layers for the initial two-path part and our DenseNet baseline block with 3D convolutions (DenseNet3D) for processing after the concatenation point. In the first instance, we only consider the initial volume and the last volume of a sequence to estimate the relative translation. We refer to this architecture as Two-Path-3D.

Second, we use Two-Path-3D and consider predicting the relative translation between the initial and last volume based on the sum of the relative translations between two subsequent volumes of a sequence. In this way, the network obtains information from the entire sequence. The network receives the input pairs $[v_{t_0}, v_{t_1}]$, $[v_{t_1}, v_{t_2}]$, $[v_{t_2}, v_{t_3}]$, $[v_{t_3}, v_{t_4}]$, $[v_{t_4}, v_{t_5}]$ and the estimates are added to obtain the final network prediction $\hat{y}_{t_5}$. Note, we train our network end-to-end based on the relative translation $y_{t_5}$ between the initial and the last volume and the network prediction $\hat{y}_{t_5}$. We refer to this architecture as S-Two-Path-3D.
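Conceptually, the summation over pairwise estimates can be sketched as follows, where two_path_model is assumed to map a volume pair to an (N, 3) translation estimate:

```python
import torch

def s_two_path_forward(two_path_model, volumes):
    # volumes: list of (N, 1, X, Y, Z) tensors [v_t0, ..., v_t5].
    # Sum the pairwise estimates of the shared two-path model over
    # subsequent volumes to obtain the template-to-last prediction.
    y_hat = torch.zeros(volumes[0].shape[0], 3, device=volumes[0].device)
    for v_prev, v_next in zip(volumes[:-1], volumes[1:]):
        y_hat = y_hat + two_path_model(v_prev, v_next)
    return y_hat  # trained end-to-end against the target shift y_t5
```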

Third, we extend the idea of a two-path architecture to processing an entire sequence of volumes, instead of using only two volumes as the network's input. For this purpose, we extend the two-path architecture to a multi-path architecture, where the number of paths is equal to the number of volumes used. Note, similar to the two-path CNN, the multi-path part consists of three layers with shared weights, followed by our DenseNet baseline block with 3D convolutions (DenseNet3D). We refer to this architecture as Five-Path-3D.

Fourth, we use a 4D convolutional neural network, which employs 4D spatio-temporal convolutions and hence jointly learns features from the temporal and spatial dimensions. The input of this network is four-dimensional (three spatial and one temporal dimension), using a sequence of volumes. This method consists of an initial convolutional part with three layers, followed by our DenseNet block using 4D convolutions throughout the entire network. We refer to this architecture as Dense4D.
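Since common frameworks such as PyTorch provide no native 4D convolution layer, one way to realize it is to decompose the 4D kernel into one 3D convolution per temporal offset and sum the temporally shifted responses. The following sketch illustrates this idea; it is one possible realization, not necessarily the implementation used in the paper.

```python
import torch
import torch.nn as nn

class Conv4d(nn.Module):
    # Minimal 4D convolution assembled from 3D convolutions.
    # Input/output: (N, C, T, X, Y, Z), zero-padded in space and time.
    def __init__(self, in_ch, out_ch, k=3):
        super().__init__()
        self.k = k
        # Only the center slice carries a bias so it is not added k times.
        self.convs = nn.ModuleList(
            nn.Conv3d(in_ch, out_ch, kernel_size=k, padding=k // 2,
                      bias=(i == k // 2))
            for i in range(k))

    def forward(self, x):
        N, C, T, X, Y, Z = x.shape
        # Merge the temporal axis into the batch for the 3D convolutions.
        frames = x.transpose(1, 2).reshape(N * T, C, X, Y, Z)
        out = 0
        for i, conv in enumerate(self.convs):
            r = conv(frames).reshape(N, T, -1, X, Y, Z).transpose(1, 2)
            shift = i - self.k // 2   # temporal offset of this kernel slice
            shifted = torch.zeros_like(r)
            if shift >= 0:
                shifted[:, :, :T - shift] = r[:, :, shift:]
            else:
                shifted[:, :, -shift:] = r[:, :, :T + shift]
            out = out + shifted
        return out
```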

Fifth, we combine the idea of 4D spatio-temporal CNNs and multi-path architectures. At first, we split the input sequence and use a multi-path 3D CNN to individually process each volume of the sequence. However, instead of concatenating the outputs along the feature dimension at the output of the multi-path CNN, we reassemble the temporal dimension by concatenating the outputs along a temporal dimension. Then, we employ our DenseNet4D baseline block. We refer to this architecture as Five-Path-4D.
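Combining the shared 3D encoder with the 4D block from the previous sketch, the mixed 3D-4D concept can be outlined as follows; channel counts are placeholders and the single Conv4d layer stands in for the full DenseNet4D baseline block:

```python
class FivePath4D(nn.Module):
    # A weight-shared 3D CNN encodes each volume, the per-volume feature
    # maps are re-stacked along a temporal axis, and a 4D spatio-temporal
    # block processes the reassembled sequence.
    def __init__(self, feat_ch=16):
        super().__init__()
        self.encoder = nn.Sequential(   # shared across all paths
            nn.Conv3d(1, feat_ch, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv3d(feat_ch, feat_ch, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv3d(feat_ch, feat_ch, 3, padding=1), nn.ReLU(inplace=True))
        self.temporal = Conv4d(feat_ch, feat_ch)  # stands in for DenseNet4D
        self.head = nn.Linear(feat_ch, 3)

    def forward(self, seq):             # seq: (N, T, 1, X, Y, Z)
        feats = [self.encoder(seq[:, t]) for t in range(seq.shape[1])]
        x = torch.stack(feats, dim=2)   # reassembled time axis: (N,C,T,X,Y,Z)
        x = torch.relu(self.temporal(x))
        x = x.mean(dim=(2, 3, 4, 5))    # global average pooling
        return self.head(x)             # predicted relative translation
```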

2.4 Training and Evaluation

We train our models to estimate the relative motion of an ROI using OCT volumes. Hence, we minimize the mean squared error (MSE) loss function between the defined target motions $y_{t_5}$ and our predicted motions $\hat{y}_{t_5}$:

$L_{\mathrm{MSE}} = \frac{1}{N} \sum_{i=1}^{N} \lVert y_{t_5}^{i} - \hat{y}_{t_5}^{i} \rVert_2^2$ (1)

Our goal is to estimate the relative motion between an initial volume $v_{t_0}$ and a final volume $v_{t_5}$, corresponding to the target shift $y_{t_5}$. Given the nature of our acquisition setup, the intermediate shifts $y_{t_1}, \dots, y_{t_4}$ are also available. As these shifts represent additional motion information, we hypothesize that they could improve model training by enforcing more consistent estimates and thus regularize the problem.

We incorporate the additional motion information by forcing our models to also predict the relative shifts of the previous volumes $v_{t_3}$ and $v_{t_4}$. Thus, we also consider the relative translations $y_{t_3}$ and $y_{t_4}$ and extend the network output by also predicting $\hat{y}_{t_3}$ and $\hat{y}_{t_4}$. Note, the additional outputs $\hat{y}_{t_3}$ and $\hat{y}_{t_4}$ are only considered during training and are not required for application.

For optimization, we propose and evaluate the following loss function, introducing the parameters $\alpha$ and $\beta$ for weighting the additional temporal information, which acts as a regularization term:

$L = \lVert y_{t_5} - \hat{y}_{t_5} \rVert_2^2 + \alpha \lVert y_{t_4} - \hat{y}_{t_4} \rVert_2^2 + \beta \lVert y_{t_3} - \hat{y}_{t_3} \rVert_2^2$ (2)
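A sketch of this loss in PyTorch, assuming predictions and targets for the last three time steps are stacked along one dimension, could look as follows (the default weights are example values):

```python
import torch.nn.functional as F

def regularized_loss(pred, target, alpha=0.75, beta=0.5):
    # pred, target: (N, 3, 3) tensors holding the translations for the
    # time steps [t3, t4, t5] along dim 1.
    loss = F.mse_loss(pred[:, 2], target[:, 2])                 # y_t5
    loss = loss + alpha * F.mse_loss(pred[:, 1], target[:, 1])  # y_t4
    loss = loss + beta * F.mse_loss(pred[:, 0], target[:, 0])   # y_t3
    return loss
```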

We train all our models for 150 epochs, using Adam for optimization with a batch size of 50. To evaluate our models on previously unseen tissue regions, we randomly choose five independent ROIs each for testing and validation. For training, we use the remaining 30 ROIs.
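A minimal training sketch with these settings, using random tensors with placeholder sizes in place of the OCT sequences, could look as follows:

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

# Random stand-ins for the OCT sequences and target shifts; the sequence
# length and volume size below are placeholders.
data = TensorDataset(torch.randn(200, 6, 1, 16, 16, 16),  # sequences
                     torch.randn(200, 3))                 # target shifts y_t5
train_loader = DataLoader(data, batch_size=50, shuffle=True)

model = FivePath4D()                       # from the sketch above
optimizer = torch.optim.Adam(model.parameters())
loss_fn = nn.MSELoss()

for epoch in range(150):                   # 150 epochs, as stated
    for seq, y_t5 in train_loader:
        optimizer.zero_grad()
        loss = loss_fn(model(seq), y_t5)
        loss.backward()
        optimizer.step()
```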

Figure 5: Our proposed network architectures. The networks receive volumes from a stream of volumes to predict the motion $y_{t_5}$ between the volumes $v_{t_0}$ and $v_{t_5}$. Note, for the multi-path architectures, the weights are shared across the paths.

3 Results

First, we compare the different methods and report the mean absolute error (MAE), the relative mean absolute error (rMAE), and the average correlation coefficient (aCC) for our experiments in Table 1. The MAE is given in mm, based on the calibration between galvo motor steps and image coordinates. The rMAE is calculated by dividing the MAE by the targets' standard deviation. We state the number of parameters and inference times for all models in Table 2. For all experiments, we test our results for significant differences in the median of the rMAE using a Wilcoxon signed-rank test. Overall, using a sequence of volumes improves performance significantly, and Five-Path-4D performs best with a high aCC of 98.58%. Comparing Five-Path-4D to Two-Path-3D, the rMAE is reduced by a factor of approximately 2.6. Moreover, employing the two-path architecture on subsequent volumes and adding the estimates (S-Two-Path-3D) performs significantly better than directly using the initial and the last volume (Two-Path-3D) of a motion sequence.
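The reported metrics can be computed as in the following sketch for (N, 3) arrays of predicted and target translations in mm; the Wilcoxon test from SciPy can then compare two models' paired per-sample errors:

```python
import numpy as np
from scipy.stats import wilcoxon

def evaluate(pred, target):
    # Per-axis MAE, rMAE (MAE over the targets' standard deviation), and
    # the correlation coefficient averaged over the spatial axes (aCC).
    mae = np.abs(pred - target).mean(axis=0)
    rmae = mae / target.std(axis=0)
    acc = np.mean([np.corrcoef(pred[:, d], target[:, d])[0, 1]
                   for d in range(3)])
    return mae, rmae, acc

# Paired significance test between two models' per-sample errors, e.g.:
#   stat, p = wilcoxon(abs_err_model_a, abs_err_model_b)
```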

Second, we extend the comparison of our models and present the MAE over different motion magnitudes, shown in Figure 6. The error increases with increasing motion magnitude for all models. Comparing the different models shows that the error increases only slightly for Five-Path-4D, compared to the other models.

Third, Table 3 shows how rotations during evaluation affect the performance of our best performing model, Five-Path-4D. First, we consider rotations as noise during motion and do not transform the target shifts. Second, we consider rotations as part of the motion and transform the target shifts accordingly. For small rotation angles, performance is robust and hardly reduced. For larger rotation angles, lateral estimation performance is affected when rotations are considered as noise, while performance remains similar when rotations are considered as part of the motion.

Fourth, Table 4 demonstrates how motion distortions affect performance. We evaluate different magnitudes of motion distortions. The results show that performance is hardly reduced when only few motion distortions are present. However, as we increase the amount of motion distortions, performance is notably affected; yet, performance is recovered when distortions are also considered during training.

Fifth, we address the temporal regularization strategy; see Table 5 for our best performing model, Five-Path-4D. We report performance metrics for various weighting factors $\alpha$ and $\beta$. Our results demonstrate that using the regularization strategy improves performance. Fine-tuning the weights improves performance significantly, with a high aCC of 99.06% for a weighting of $\alpha = 0.75$ and $\beta = 0.5$.

Model MAE (x) MAE (y) MAE (z) rMAE aCC (%)
Two-Path-3D
S-Two-Path-3D
Five-Path-3D
Dense4D
Five-Path-4D
Table 1: Comparison of the different models for motion estimation. Our errors refer to the translation between the template and the last volume of a motion sequence. Errors are given in mm.
Number of Parameters Inf. Time
Two-Path-3D ms
S-Two-Path-3D ms
Five-Path-3D ms
Dense4D ms
Five-Path-4D ms
Table 2: Number of parameters and inference times for all models.
Figure 6: MAE for increasing motion magnitudes. Results are shown for four motion groups, covering increasing magnitudes of motion.
MAE (x) MAE (y) MAE (z) rMAE aCC (%)

Noise

Motion

Table 3: Evaluation of the performance for different rotation angles during motion. We evaluate the rotation as noise or as part of the motion, where the ground truth is rotated accordingly. The rotation angle refers to the relative rotation between the initial template volume and the last volume of a sequence. Results are shown for the architecture Five-Path-4D. The errors refer to the translation between the template and the last volume of a motion sequence and are given in mm.

Type ($p$) MAE (x) MAE (y) MAE (z) rMAE aCC (%)
E-1 50%
E-1 25%
E-1 10%
E-2 50%
E-2 25%
E-2 10%
T/E-2 50%

Table 4: Results for Five-Path-4D when motion distortions are applied during evaluation; $p$ refers to the probability that a B-scan is shifted. We evaluate shifting the B-scans by one (E-1) or two pixels (E-2) during evaluation. Also, we consider motion distortions of two pixels during both training and evaluation (T/E-2). Our errors refer to the translation between the template and the last volume of a motion sequence. Errors are given in mm.
$\alpha$ $\beta$ MAE (x) MAE (y) MAE (z) rMAE aCC (%)
0 0
1 0
0.75 0
0.5 0
0.25 0
1 1
0.75 0.75
0.75 0.5
Table 5: Evaluation of the temporal loss regularization using different weighting factors $\alpha$, $\beta$. Results are shown for the architecture Five-Path-4D with respect to predicting the motion between the template and the last volume of a sequence. Errors are given in mm.

4 Discussion

Motion estimation is a relevant problem for intraoperative OCT applications, for example in the context of motion compensation irsch2018motion and surgical tool navigation zhang2014optical . While previous approaches for motion estimation relied on a template and a moving image, we learn a motion vector from an entire sequence of OCT volumes. This leads to the challenging problem of 4D spatio-temporal deep learning.

We design three new CNN models that address 4D spatio-temporal processing in different ways. While Five-Path-3D is an immediate extension of the previous two-path approach gessert2019two , our Five-Path-4D and Dense4D models perform full 4D data processing. For a fair comparison, we also consider pairwise motion estimation along the sequence using Two-Path-3D, aggregated to a final estimate. Our results in Table 1 show that the two-path method using only the start and the end volume performs worse than the other methods. This indicates that two volumes alone do not provide enough information for motion estimation when the motion between them is large.

When using a full sequence of volumes, the Five-Path-3D CNN performs significantly worse than the other deep learning approaches. This indicates that stacking multiple volumes in the model's feature channel dimension is not optimal for temporal processing, which has also been observed for spatio-temporal problems in the natural image domain tran2015learning . This is further supported by pair-wise processing with S-Two-Path-3D, which shows a significantly higher performance than the feature stacking approach and a higher performance than Dense4D. Our proposed 4D architecture outperforms all other approaches, including the previous deep learning concepts using two volumes laves2019deep ; gessert2019two and pair-wise processing. Thus, we demonstrate the effective use of full 4D spatio-temporal information with a new deep learning model.

Next, we also consider the effect of different motor shift distances for our problem. Note, faster movements lead to larger distances between subsequent volumes of a sequence and to reduced overlap, making motion estimation harder as there are fewer features for finding correspondences. The results in Figure 6 show the performance for different distances between volumes. As expected, we observe a steady error increase with larger distances for all models. For the approaches using just two volumes, the increase is substantial, while it remains moderate for the 4D spatio-temporal models. Thus, 4D data is also beneficial for various magnitudes of motion, and we demonstrate that the models effectively deal with different spatial distances between time steps.

Moreover, Table 3 shows how rotations affect performance for our best performing method when applied during evaluation. When rotations are considered as noise, performance is notably reduced only for large rotations. However, when rotations are considered as part of the motion, performance remains similar even for larger rotations. As rotations were not present in the training data, the results indicate that our models are robust with respect to rotations.

Furthermore, we consider the problem of potential motion artifacts. The OCT device we employ is able to acquire an OCT volume in 1.2 ms. According to Zawadzki et al., motion artifacts are not present for sufficiently short volume acquisition times zawadzki2007correction . However, to ensure that our methods are applicable to slower OCT devices as well, we consider the effect of fast and irregular motion that may cause image distortions. We model motion distortions as lateral or axial shifts between the B-scans of an OCT volume, similar to previous works xu2012alignment ; kraus2012motion ; zawadzki2007correction . The results in Table 4 demonstrate that motion distortions applied only during evaluation can affect performance. This highlights the importance of fast volumetric imaging when 4D data is used for motion estimation. However, when motion artifacts are also considered during training, performance can be recovered. These results indicate that using deep learning with 4D data is a viable approach, even if the data is affected by fast and irregular motion distortions.

As temporal information appears to be beneficial at the model input, we also consider its usage at the model output. Here, we introduce a regularization strategy which forces the model to learn consecutive motion steps. We also introduce weighting factors for fine-tuning our approach. Our results in Table 5 demonstrate that the regularization method is effective. While a weighting equal to one does not lead to an immediate performance improvement, using a weighting of $\alpha = 0.75$ and $\beta = 0.5$ improves performance notably, up to an aCC of 99.06%. As a result, providing more information on the trajectory during training appears to be helpful for 4D motion estimation.

While our 4D deep learning methods significantly improve performance, their more costly 4D convolution operations also affect inference times, which is important when real-time processing is required. Inference times in comparison to model size are shown in Table 2. While Five-Path-4D significantly outperforms S-Two-Path-3D in terms of motion estimation performance, S-Two-Path-3D allows for faster predictions. Thus, there is a trade-off between performance and inference time for the different architectures. However, with an inference rate of 107 Hz, our 4D deep learning methods are already a viable approach for real-time motion estimation, which could be improved in the future by using more powerful hardware or additional low-level software optimization.

5 Conclusion

We investigate deep learning methods for motion estimation using 4D spatio-temporal OCT data. We design and evaluate several 4D deep learning methods and compare them to previous approaches using a template and a moving volume. We demonstrate that our novel 3D-4D deep learning method significantly improves estimation performance on a tissue dataset, compared to the previous deep learning approach of using two volumes. We observe that large motion is handled well by the 4D deep learning methods. Also, we demonstrate the effectiveness of using additional temporal information at the network's output by introducing a regularization strategy that forces the 4D model to learn an extended motion pattern. These results should be considered for future applications such as motion compensation or the adjustment of surgical tools during interventions. Also, our 4D spatio-temporal methods could be extended to other problems such as ultrasound-based motion estimation.

Compliance with ethical standards

Funding: This work was partially funded by Forschungszentrum Medizintechnik Hamburg (grants 04fmthh16).
Conflict of interest: The authors declare that they have no conflict of interest.
Ethical approval: This article does not contain any studies with human participants or animals performed by any of the authors.
Informed consent: Not applicable

References

  • (1) Bay, H., Tuytelaars, T., Van Gool, L.: Surf: Speeded up robust features. In: ECCV, pp. 404–417. Springer (2006)
  • (2) Bengs, M., Gessert, N., Schlaefer, A.: 4d spatio-temporal deep learning with 4d fmri data for autism spectrum disorder classification. In: International Conference on Medical Imaging with Deep Learning (2019)
  • (3) Bergmeier, J., Fitzpatrick, J.M., Daentzer, D., Majdani, O., Ortmaier, T., Kahrs, L.A.: Workflow and simulation of image-to-physical registration of holes inside spongy bone. International Journal of Computer Assisted Radiology and Surgery 12(8), 1425–1437 (2017)
  • (4) Clark, D., Badea, C.: Convolutional regularization methods for 4d, x-ray ct reconstruction. In: Medical Imaging 2019: Physics of Medical Imaging, vol. 10948, p. 109482A. International Society for Optics and Photonics (2019)
  • (5) Du, X., Assadi, M.Z., Jowitt, F., Brett, P.N., Henshaw, S., Dalton, J., Proops, D.W., Coulson, C.J., Reid, A.P.: Robustness analysis of a smart surgical drill for cochleostomy. The International Journal of Medical Robotics and Computer Assisted Surgery 9(1), 119–126 (2013)
  • (6) Ehlers, J.P., Srivastava, S.K., Feiler, D., Noonan, A.I., Rollins, A.M., Tao, Y.K.: Integrative advances for oct-guided ophthalmic surgery and intraoperative oct: microscope integration, surgical instrumentation, and heads-up display surgeon feedback. PloS one 9(8), e105224 (2014)
  • (7) Eilers, H., Baron, S., Ortmaier, T., Heimann, B., Baier, C., Rau, T.S., Leinung, M., Majdani, O.: Navigated, robot assisted drilling of a minimally invasive cochlear access. In: 2009 IEEE International Conference on Mechatronics, pp. 1–6. IEEE (2009)
  • (8) Finke, M., Kantelhardt, S., Schlaefer, A., Bruder, R., Lankenau, E., Giese, A., Schweikard, A.: Automatic scanning of large tissue areas in neurosurgery using optical coherence tomography. The International Journal of Medical Robotics and Computer Assisted Surgery 8(3), 327–336 (2012)
  • (9) Gessert, N., Gromniak, M., Schlüter, M., Schlaefer, A.: Two-path 3d cnns for calibration of system parameters for oct-based motion compensation. In: Medical Imaging 2019: Image-Guided Procedures, Robotic Interventions, and Modeling, vol. 10951, p. 1095108. International Society for Optics and Photonics (2019)
  • (10) Gessert, N., Schlüter, M., Schlaefer, A.: A deep learning approach for pose estimation from volumetric oct data. Medical image analysis 46, 162–179 (2018)
  • (11) Huang, G., Liu, Z., Van Der Maaten, L., Weinberger, K.Q.: Densely connected convolutional networks. In: CVPR, pp. 4700–4708 (2017)
  • (12) Ilg, E., Mayer, N., Saikia, T., Keuper, M., Dosovitskiy, A., Brox, T.: Flownet 2.0: Evolution of optical flow estimation with deep networks. In: CVPR, pp. 2462–2470 (2017)
  • (13) Irsch, K., Lee, S., Bose, S.N., Kang, J.U.: Motion-compensated optical coherence tomography using envelope-based surface detection and kalman-based prediction. In: Advanced Biomedical and Clinical Diagnostic and Surgical Guidance Systems XVI, vol. 10484, p. 104840Q. International Society for Optics and Photonics (2018)
  • (14) Kraus, M.F., Potsaid, B., Mayer, M.A., Bock, R., Baumann, B., Liu, J.J., Hornegger, J., Fujimoto, J.G.: Motion correction in optical coherence tomography volumes on a per a-scan basis using orthogonal scan patterns. Biomedical optics express 3(6), 1182–1199 (2012)
  • (15) Lankenau, E., Klinger, D., Winter, C., Malik, A., Müller, H.H., Oelckers, S., Pau, H.W., Just, T., Hüttmann, G.: Combining optical coherence tomography (oct) with an operating microscope. In: Advances in medical engineering, pp. 343–348. Springer (2007)
  • (16) Laves, M.H., Ihler, S., Kahrs, L.A., Ortmaier, T.: Deep-learning-based 2.5 d flow field estimation for maximum intensity projections of 4d optical coherence tomography. In: Medical Imaging 2019: Image-Guided Procedures, Robotic Interventions, and Modeling, vol. 10951, p. 109510R. International Society for Optics and Photonics (2019)
  • (17) Laves, M.H., Schoob, A., Kahrs, L.A., Pfeiffer, T., Huber, R., Ortmaier, T.: Feature tracking for automated volume of interest stabilization on 4d-oct images. In: Medical Imaging 2017: Image-Guided Procedures, Robotic Interventions, and Modeling, vol. 10135, p. 101350W. International Society for Optics and Photonics (2017)
  • (18) van de Leemput, S.C., Prokop, M., van Ginneken, B., Manniesing, R.: Stacked bidirectional convolutional lstms for deriving 3d non-contrast ct from spatiotemporal 4d ct. IEEE transactions on medical imaging (2019)
  • (19) Lowe, D.G.: Object recognition from local scale-invariant features. In: ICCV, vol. 99, pp. 1150–1157 (1999)
  • (20) Pau, H., Lankenau, E., Just, T., Hüttmann, G.: Imaging of cochlear structures by optical coherence tomography (oct). temporal bone experiments for an oct-guided cochleostomy technique. Laryngo-rhino-otologie 87(9), 641–646 (2008)
  • (21) Schlüter, M., Otte, C., Saathoff, T., Gessert, N., Schlaefer, A.: Feasibility of a markerless tracking system based on optical coherence tomography. In: Medical Imaging 2019: Image-Guided Procedures, Robotic Interventions, and Modeling, vol. 10951, p. 1095107. International Society for Optics and Photonics (2019)
  • (22) Siddiqui, M., Nam, A.S., Tozburun, S., Lippok, N., Blatter, C., Vakoc, B.J.: High-speed optical coherence tomography by circular interferometric ranging. Nature photonics 12(2), 111 (2018)
  • (23) Tran, D., Bourdev, L., Fergus, R., Torresani, L., Paluri, M.: Learning spatiotemporal features with 3d convolutional networks. In: ICCV, pp. 4489–4497 (2015)
  • (24) Vienola, K.V., Braaf, B., Sheehy, C.K., Yang, Q., Tiruveedhula, P., Arathorn, D.W., de Boer, J.F., Roorda, A.: Real-time eye motion compensation for oct imaging with tracking slo. Biomedical optics express 3(11), 2950–2963 (2012)
  • (25) Wang, T., Pfeiffer, T., Regar, E., Wieser, W., van Beusekom, H., Lancee, C.T., Springeling, G., Krabbendam-Peters, I., van der Steen, A.F., Huber, R., van Soest, G.: Heartbeat oct and motion-free 3d in vivo coronary artery microscopy. JACC: Cardiovascular Imaging 9(5), 622–623 (2016)
  • (26) Xu, J., Ishikawa, H., Wollstein, G., Kagemann, L., Schuman, J.S.: Alignment of 3-d optical coherence tomography scans to correct eye movement using a particle filtering. IEEE transactions on medical imaging 31(7), 1337–1345 (2012)
  • (27) Zawadzki, R.J., Fuller, A.R., Choi, S.S., Wiley, D.F., Hamann, B., Werner, J.S.: Correction of motion artifacts and scanning beam distortions in 3d ophthalmic optical coherence tomography imaging. In: Ophthalmic Technologies XVII, vol. 6426, p. 642607. International Society for Optics and Photonics (2007)
  • (28) Zhang, Y., Pfeiffer, T., Weller, M., Wieser, W., Huber, R., Raczkowsky, J., Schipper, J., Wörn, H., Klenzner, T.: Optical coherence tomography guided laser cochleostomy: Towards the accuracy on tens of micrometer scale. BioMed research international 2014 (2014)
  • (29) Zhao, Y., Li, X., Zhang, W., Zhao, S., Makkie, M., Zhang, M., Li, Q., Liu, T.: Modeling 4d fmri data via spatio-temporal convolutional neural networks (st-cnn). In: International Conference on Medical Image Computing and Computer-Assisted Intervention, pp. 181–189. Springer (2018)