Dueling Deep Q-Network for Unsupervised Inter-frame Eye Movement Correction in Optical Coherence Tomography Volumes

by Yasmeen M. George, et al.
NYU Langone Medical Center

In optical coherence tomography (OCT) volumes of the retina, the sequential acquisition of the individual slices makes this modality prone to motion artifacts, with misalignments between adjacent slices being the most noticeable. Any distortion in OCT volumes can bias structural analysis and influence the outcome of longitudinal studies. In addition, the speckle noise that is characteristic of this imaging modality leads to inaccuracies when traditional registration techniques are employed. Also, the lack of a well-defined ground truth makes supervised deep-learning techniques ill-suited to the problem. In this paper, we tackle these issues by using deep reinforcement learning to correct inter-frame movements in an unsupervised manner. Specifically, we use a dueling deep Q-network to train an artificial agent to find the optimal policy, i.e. a sequence of actions, that best improves the alignment by maximizing the sum of reward signals. Instead of relying on the ground truth of transformation parameters to guide the rewarding system, for the first time, we use a combination of intensity-based image similarity metrics. Further, to avoid agent bias towards speckle noise, we ensure the agent can see the retinal layers as part of the interacting environment. For quantitative evaluation, we simulate eye movement artifacts by applying 2D rigid transformations to individual B-scans. The proposed model achieves an average of 0.985 and 0.914 for normalized mutual information and correlation coefficient, respectively. We also compare our model with the elastix intensity-based medical image registration approach, and our model achieves a significant improvement for both noisy and denoised volumes.




1 Introduction

Optical coherence tomography (OCT) provides clinicians with real-time, high-resolution images of ocular structures, which are of great use in diagnosing and monitoring retinal diseases, evaluating progression, and assessing response to therapy [fujimoto2016development]. OCT has gone through three generations since it was invented in 1991 [huang1991optical], and over the past few decades different commercially available OCT instruments have been developed. Each device is characterized by several parameters such as lateral and axial resolution, penetration depth, and imaging speed. Fast acquisition of OCT volumes, i.e. a high A-scan rate, is very important to reduce the effect of retinal motion. Yet, it is limited by the camera read-out rate of the OCT scanner [sanchez2019review]. Another type of motion artifact is axial eye motion, whose frequency varies from 3 to 12 Hz and whose exact mechanism is not clearly known. In addition, there are involuntary fixational eye movements, which are usually categorized as high-frequency tremors, rapid microsaccades, or slow drifts, depending on their frequency and magnitude. These types of motion during OCT volume scanning result in deformed 3D data of the retina [spaide2015image]. Correcting the distorted data is essential to improve OCT image quality and therefore to support better diagnoses.

Retinal motion varies in amplitude, direction, and frequency, making the combined motion difficult to predict [sanchez2019review]. Moreover, retinal motion may differ significantly between individuals, hindering the development of generalized theoretical or learning models for retinal motion prediction. This issue can be tackled by using advanced deep learning (DL) models with a large OCT dataset that covers different types of retinal motion. In this regard, ground truth for inter-frame misalignment is needed, which is very expensive and difficult to annotate manually. Further, unsupervised learning models trained on OCT retinal cubes are strongly biased towards speckle noise, which misleads the final outcome of the alignment. In this work, we propose an unsupervised inter-frame movement correction approach that works well even in the presence of speckle noise. The approach is based on deep reinforcement learning, in which an artificial agent is trained using a deep Q-network to find a strategy of sequential actions that best improves the alignment between 2D slices in OCT data.

Related Work. Eye movement correction methods in the literature can be divided into feature-based and intensity-based approaches. Feature-based registration methods use landmarks of the image, such as vasculature, vessel intersections, and retinal layers, to correct OCT data misalignment, while intensity-based approaches rely on similarity measures between images, such as correlation and mutual information [sanchez2019review, baghaie2015state, baghaie2017involuntary]. The literature also presents the use of deep reinforcement learning for medical image registration [liao2017artificial, krebs2017robust], where the agent uses ground truth transformation parameters for training.

We summarize our contributions as follows:

  • For the first time, we propose a dueling deep Q-network for OCT inter-frame image alignment (DDQN-OCT) that does not require landmarks or ground truth transformation parameters.

  • We use a combination of intensity-based image similarity metrics to guide the rewarding system for training the artificial agent in an unsupervised fashion.

  • The proposed approach does not require the removal of speckle noise, which is a common preprocessing step for all 2D and 3D OCT registration methods.

  • Our approach significantly improves on the elastix intensity-based medical image registration approach.

2 Methodology

2.1 Reinforcement Learning Framework

We formulate the inter-frame movement correction in OCT volumes as a 2D rigid registration problem that matches two adjacent B-scans in the fast scanning plane: a reference B-scan and its neighboring B-scan. This is accomplished by finding the optimal spatial transformation that aligns the neighboring B-scan with the reference. The 2D rigid transformation has 3 parameters: two translations and one rotation. We use deep reinforcement learning (DRL) to solve this optimization problem in an unsupervised manner. Figure 1 shows the framework of the proposed approach. Specifically, an artificial agent learns by interacting with an environment to maximize the cumulative reward throughout the agent's lifetime. At every iteration $t$, given a state $s_t \in S$ that represents the difference image of the two adjacent B-scans, the agent selects an action $a_t \in A$ that is associated with a scalar reward signal $r_t$. Here $S$ is the set of states that the agent can see and $A$ is the set of discrete actions that the agent can take. $A$ consists of 6 candidate transformations, each of which changes one of the three transformation parameters by a fixed step in either direction. During training, the agent learns the optimal policy (i.e. a strategy of sequential actions) that maps the current state to the action that best improves the alignment by maximizing the sum of reward signals seen over the agent's lifetime. The optimal action-selection policy is identified by learning an action-value function (i.e. the $Q$-function) that measures the quality of taking an action $a$ given a state $s$, as defined by Watkins et al. [Watkins1992]. The $Q$-function can be solved using the Bellman iterative approach [bellman2013dynamic] as in Equation 1.


where $\gamma$ is the discount factor for future rewards, and $s'$ and $a'$ are the next state and action.
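As a minimal sketch of the setup described above, the six actions can be modelled as a positive or negative step on exactly one of the three rigid parameters, and the one-step Bellman target takes the standard Q-learning form r + γ · max over a' of Q(s', a'). The names `STEP`, `ACTIONS`, `apply_action`, and `bellman_target`, the step size of 1, and the default discount factor are all assumptions made for illustration, not values taken from the paper.

```python
# Hypothetical sketch: STEP is an assumed step size, not a value from the paper.
STEP = 1.0

# Six discrete actions: +/- STEP on exactly one of (translation 1, translation 2, rotation).
ACTIONS = [
    (+STEP, 0.0, 0.0), (-STEP, 0.0, 0.0),
    (0.0, +STEP, 0.0), (0.0, -STEP, 0.0),
    (0.0, 0.0, +STEP), (0.0, 0.0, -STEP),
]


def apply_action(params, action_index):
    """Return the rigid transformation parameters after taking one action."""
    delta = ACTIONS[action_index]
    return tuple(p + d for p, d in zip(params, delta))


def bellman_target(reward, next_q_values, gamma=0.9, terminal=False):
    """One-step target r + gamma * max_a' Q(s', a'); no bootstrapping at episode end."""
    if terminal:
        return reward
    return reward + gamma * max(next_q_values)
```

For example, `apply_action((0.0, 0.0, 0.0), 3)` moves only the second translation, and `bellman_target` would be evaluated on the Q-values predicted for the next state.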

Figure 1: Deep reinforcement learning framework for eye motion correction in OCT volumes

2.2 Dueling Deep Q-Network for Optimal Policy Estimation

In this paper, we follow Mnih et al. [mnih2015human], who proposed a deep Q-network (DQN) to approximate the action-value function using a deep neural network (DNN), as in Equation 2.



Here, the agent receives a bonus reward when it finds the best alignment, which is reached when the distance between the reference B-scan and the aligned B-scan falls within a distance threshold. The immediate reward for a state-action pair is calculated using Equation 3.


Here, the reward compares the transformation values before and after the selected action is applied, using the dissimilarity metric.
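The reward logic can be sketched as follows, assuming the immediate reward is the drop in dissimilarity produced by the action (the form used in the cited supervised work) and that the bonus is added once the dissimilarity falls within the threshold. The function name `step_reward`, the default `threshold` and `bonus` values, and the choice to add rather than substitute the bonus are illustrative assumptions.

```python
def step_reward(dissim_before, dissim_after, threshold=0.05, bonus=10.0):
    """Return (reward, done) for one agent step.

    The reward is positive when the action reduces the dissimilarity.
    threshold and bonus are hypothetical values chosen for illustration.
    """
    reward = dissim_before - dissim_after
    done = dissim_after <= threshold  # best alignment reached
    if done:
        reward += bonus
    return reward, done
```

An action that worsens the alignment yields a negative reward, which is what discourages the agent from moving away from the reference.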

Finding a good measure of dissimilarity is crucial for the success of the agent's learning process. Existing work in this area relies on the ground truth transformation parameters to compute it [liao2017artificial, krebs2017robust]. In our work, we propose the use of intensity-based image similarity metrics to train the agent in an unsupervised manner, which has not been proposed in the literature before. The proposed dissimilarity metric is based on two statistical measures, namely the correlation coefficient (CC) and the structural similarity index measure (SSIM), as in Equation 4. CC measures the degree to which a change in one image is associated with a change in the other, while SSIM considers perceptual likeness and structural information.


where $\mu_x$ and $\sigma_x^2$ are the average and variance of the reference image x, and $\mu_y$ and $\sigma_y^2$ are the average and variance of the transformed image y. Also, $\sigma_{xy}$ refers to the covariance between x and y, and $c_1$ and $c_2$ are stabilization variables.
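To make the two measures concrete, the sketch below computes the correlation coefficient and a single-window (global) SSIM from the mean, variance, and covariance terms defined above, using the standard SSIM expression (2 μ_x μ_y + c1)(2 σ_xy + c2) / ((μ_x² + μ_y² + c1)(σ_x² + σ_y² + c2)). Images are passed as flat lists of intensities in [0, 1]. How the paper weights the two measures is not reproduced here: the equal-weight `dissimilarity` function, its `weight` parameter, and the c1 and c2 defaults are assumptions for illustration only.

```python
from statistics import fmean


def _moments(x, y):
    """Means, variances, and covariance of two equally sized intensity lists."""
    mx, my = fmean(x), fmean(y)
    vx = fmean([(a - mx) ** 2 for a in x])
    vy = fmean([(b - my) ** 2 for b in y])
    cov = fmean([(a - mx) * (b - my) for a, b in zip(x, y)])
    return mx, my, vx, vy, cov


def correlation_coefficient(x, y):
    """Pearson correlation; returns 0.0 if either image is constant."""
    _, _, vx, vy, cov = _moments(x, y)
    denom = (vx * vy) ** 0.5
    return cov / denom if denom > 0 else 0.0


def global_ssim(x, y, c1=0.01 ** 2, c2=0.03 ** 2):
    """SSIM over the whole image as one window (c1, c2 assume a [0, 1] range)."""
    mx, my, vx, vy, cov = _moments(x, y)
    return ((2 * mx * my + c1) * (2 * cov + c2)) / (
        (mx ** 2 + my ** 2 + c1) * (vx + vy + c2)
    )


def dissimilarity(x, y, weight=0.5):
    """Assumed combination: 1 minus a weighted average of CC and SSIM."""
    similarity = weight * correlation_coefficient(x, y) + (1 - weight) * global_ssim(x, y)
    return 1.0 - similarity
```

Practical SSIM implementations usually average the index over local sliding windows; the single-window form is used here only to keep the sketch short.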

We also adopt the action-state value function split proposed by Wang et al. [wang2015dueling], called dueling DQN, in which the action-value function is decomposed into an action-independent state-value function and an action-dependent advantage function. This has been shown to provide robust state value estimates. The architecture of our DNN is shown in Figure 1. It consists of 4 convolutional layers, each followed by batch normalization and ReLU activation, except the first convolutional layer, which does not have batch normalization. The convolutional layers have 32, 32, 64, and 64 filters with kernel sizes of 5, 5, 4, and 3, respectively, and a stride of 2 for all layers. These are followed by a 2D max-pooling layer of size 2 and a fully-connected layer with 512 nodes. The output of the fully-connected layer is then passed to two branches for the action-dependent and action-independent value functions, with 6 and 1 nodes, respectively. Finally, a fully-connected layer connects the sum of the two branches to the output layer with 6 nodes, each corresponding to one of the actions in the action set. The input to the network is computed by taking the difference between the reference B-scan and the transformed one for each of the previous 4 steps. This is to obtain more stable search trajectories and to prevent the agent from oscillating. The loss function is calculated using the mean squared error, as shown in Equation 5.
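The two-branch combination and the loss can be illustrated with plain Python lists. The sketch follows the aggregation from the dueling DQN paper by Wang et al., which subtracts the mean advantage before adding the state value; setting `center=False` gives the plain sum of the two branches. The function names are hypothetical, and the learned fully-connected layer that the text places after the sum is omitted.

```python
def dueling_q_values(state_value, advantages, center=True):
    """Combine a scalar state value with per-action advantages into Q-values.

    center=True subtracts the mean advantage (Wang et al.);
    center=False is the plain sum of the two branches.
    """
    if center:
        mean_adv = sum(advantages) / len(advantages)
        advantages = [a - mean_adv for a in advantages]
    return [state_value + a for a in advantages]


def td_mse_loss(predicted_q, targets):
    """Mean squared error between predicted Q(s, a) and the TD targets."""
    return sum((t - q) ** 2 for q, t in zip(predicted_q, targets)) / len(predicted_q)
```

With 6 advantage outputs and 1 state-value output, `dueling_q_values` returns one Q-value per candidate transformation, and the agent picks the action with the largest one.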



3 Data and Implementation Details

Dataset. The dataset contains 10,370 OCT macular scans from both eyes of 1678 individuals, acquired on a Cirrus SD-OCT scanner (Zeiss; Dublin, CA, USA) over multiple visits. The dataset has 427 healthy scans from 109 individuals; the other scans cover different ocular conditions, including glaucoma, optic neuropathy, plateau iris, and others. The scans have 200×200×1024 (A-scans × B-scans × depth) voxels per cube, covering a volume of 6×6×2 mm³. This is an observational study that was conducted in accordance with the tenets of the Declaration of Helsinki and the Health Insurance Portability and Accountability Act.
Training and Validation. The OCT volumes are divided into training (7290 scans), validation (1608), and testing (1472) subsets, while ensuring that eyes belonging to the same patient are not split across subsets. Twenty B-scans are randomly selected from each volume and normalized to have pixel values from 0 to 1. A random window around the retinal layers is then selected, with spacings of 4 and 2 along the two image axes, respectively. Then, a random rigid transformation is applied to each cropped window separately. The simulated transformation parameters range from -5 to 5 for the two translations and the rotation, in order to cover all possible eye movements. The cropped B-scan and its corresponding transformed B-scan represent the environment that the agent interacts with throughout its lifetime (i.e. one episode). During training, we use an experience replay memory to store transitions (state, action, reward, next state), from which a batch of size 256 is randomly selected. Each input sample stacks the difference images from the 4 previous action steps taken by the agent. The network is trained using the Adam optimizer for a fixed number of epochs, each consisting of a fixed number of steps, with a cap on the number of steps per episode.
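The training data flow can be sketched as a bounded experience replay buffer plus a sampler for the simulated rigid parameters. The class name `ReplayMemory`, the function `random_rigid_params`, the uniform sampling distribution, and the extra `done` flag stored with each transition are assumptions for illustration, and the buffer capacity must be supplied by the caller.

```python
import random
from collections import deque


class ReplayMemory:
    """Fixed-capacity buffer of (state, action, reward, next_state, done) tuples."""

    def __init__(self, capacity):
        # A bounded deque drops the oldest transition once capacity is reached.
        self.buffer = deque(maxlen=capacity)

    def push(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size, rng=random):
        """Draw a random batch without replacement."""
        return rng.sample(list(self.buffer), batch_size)

    def __len__(self):
        return len(self.buffer)


def random_rigid_params(rng=random, limit=5.0):
    """Sample two translations and one rotation uniformly from [-limit, limit]."""
    return tuple(rng.uniform(-limit, limit) for _ in range(3))
```

Sampling past transitions at random, rather than training on consecutive steps, reduces the correlation between samples in a batch; a batch would be drawn with `memory.sample(256)` once the buffer holds enough transitions.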

4 Experimental Results and Discussion

The proposed DDQN-OCT model is implemented using Python and TensorFlow on a single V100 GPU. The exploration rate for the artificial agent starts at 1 and decreases linearly to 0.1 at epoch #20, followed by another linear decrease until epoch #100, as shown in Figure 2-(a). Also, Figure 2-(b) and (c) plot the training and validation loss curves and the image distances, respectively, which show very good convergence for the model without overfitting.

Figure 2: Artificial agent training graphs. (a) Exploration and exploitation rates, (b) Training loss, and (c) Image dissimilarity measures
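The two-phase exploration schedule can be written as a piecewise-linear function of the epoch. The start value of 1, the value of 0.1 at epoch 20, and the end of the decay at epoch 100 come from the text; the final value `eps_end` is not stated there, so the default below is an assumption, as is the function name.

```python
def exploration_rate(epoch, eps_start=1.0, eps_mid=0.1, eps_end=0.01,
                     mid_epoch=20, last_epoch=100):
    """Piecewise-linear epsilon schedule; eps_end is an assumed value."""
    if epoch <= 0:
        return eps_start
    if epoch <= mid_epoch:
        # Phase 1: fast linear decay from eps_start to eps_mid.
        frac = epoch / mid_epoch
        return eps_start + frac * (eps_mid - eps_start)
    if epoch < last_epoch:
        # Phase 2: slower linear decay from eps_mid to eps_end.
        frac = (epoch - mid_epoch) / (last_epoch - mid_epoch)
        return eps_mid + frac * (eps_end - eps_mid)
    return eps_end
```

At each step the agent would take a random action with probability `exploration_rate(epoch)` and the highest-valued action otherwise.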

To visualize how our method works, we test the DDQN-OCT model, trained on noisy scans, on both noisy and denoised B-scans. We denoise the B-scans using the generative adversarial network (GAN) model proposed in [halupka2018retinal]. Instead of cropping a random window, we resize the whole B-scan to match the network input shape (see Figure 3). The figure shows the agent's results on a randomly selected noisy B-scan (left column) and denoised B-scan (right column). The figure has 4 rows; each row displays the reference B-scan, the aligned B-scan, and the agent screen, at steps 3, 5, 9, and 11, respectively. As the figure shows, the agent reaches the best alignment after 11 steps. The ground truth transformation parameters and the agent's current transformation are displayed in yellow.

For quantitative evaluation, we compute the normalized mutual information (NMI), the cross-correlation coefficient (CC), the agent score (i.e. cumulative reward), and the execution time for each sample in our test set, which contains 29,440 B-scans. We then compute the average, standard deviation, and lower, middle, and upper quartiles for each statistical measure across the test set, as reported in Table 1. The proposed model achieves an average of 0.985 and 0.914 for NMI and CC, respectively. Furthermore, to quantify the impact of speckle noise on the agent's training, we retrain the model using denoised B-scans. The results show an increase of 4% in the correlation measure (CC) and a decrease of 2% in NMI (Table 1).

Figure 3: Visual results for the proposed DDQN-OCT model. (a) Noisy B-scan, (b) Denoised B-scan
Statistical measure      NMI              CC               Episode score    Time (sec)

Noisy B-scans
Average ± Std            0.985 ± 0.064    0.914 ± 0.128    3.343 ± 3.036    0.445 ± 0.498
Lower quartile (25%)     0.990            0.826            0.218            0.040
Median (50%)             1.00             1.00             5.455            0.061
Upper quartile (75%)     1.00             1.00             5.562            1.027

Denoised B-scans
Average ± Std            0.969 ± 0.094    0.957 ± 0.076    3.621 ± 2.623    0.607 ± 0.787
Lower quartile (25%)     0.986            0.945            0.280            0.058
Median (50%)             0.989            0.984            5.254            0.105
Upper quartile (75%)     1.00             1.00             5.347            1.195
Table 1: Statistical measures for our proposed DDQN-OCT model
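For readers who want to reproduce an NMI-style score, the sketch below uses one common histogram-based definition, 2 · I(X; Y) / (H(X) + H(Y)), which equals 1 for identical images and 0 for statistically independent ones. The paper does not state which normalization or bin count it uses, so the definition, the function name, and `bins=32` are assumptions; inputs are flat lists of intensities in [0, 1].

```python
import math
from collections import Counter


def _entropy(counts, total):
    """Shannon entropy (in nats) of a histogram given as raw counts."""
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def normalized_mutual_information(x, y, bins=32):
    """NMI = 2 * I(X; Y) / (H(X) + H(Y)) for intensities in [0, 1]."""
    # Quantize intensities into integer bins; 1.0 falls into the last bin.
    qx = [min(int(v * bins), bins - 1) for v in x]
    qy = [min(int(v * bins), bins - 1) for v in y]
    n = len(qx)
    hx = _entropy(Counter(qx).values(), n)
    hy = _entropy(Counter(qy).values(), n)
    hxy = _entropy(Counter(zip(qx, qy)).values(), n)
    if hx + hy == 0:
        return 1.0  # both images are constant
    return 2.0 * (hx + hy - hxy) / (hx + hy)
```

Because the score depends only on the joint intensity histogram, it is insensitive to monotonic intensity changes, which is one reason mutual information is widely used for image registration.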

As a comparative study, we evaluate the performance of the elastix intensity-based medical image registration approach described in [klein2009elastix]. We use the same test set and evaluate on noisy and denoised B-scans separately. Results are reported in Table 2. Compared with the elastix registration approach, our DDQN-OCT model shows a significant improvement of more than 50% and 10% for NMI and CC, respectively. The table also shows that the elastix approach works much better on denoised scans than on noisy scans, with an improvement of 10% and 4% for NMI and CC, respectively. We also record the execution time for all our experiments, as shown in Tables 1 and 2: our model takes much less time than the elastix approach, with an average of about 0.5 seconds per image.

Statistical measure      NMI              CC               Time (sec)

Noisy B-scans
Average ± Std            0.344 ± 0.140    0.814 ± 0.083    38.322 ± 14.978
Lower quartile (25%)     0.281            0.762            35.545
Median (50%)             0.306            0.826            36.581
Upper quartile (75%)     0.331            0.875            37.867

Denoised B-scans
Average ± Std            0.448 ± 0.103    0.847 ± 0.072    6.840 ± 0.392
Lower quartile (25%)     0.386            0.802            6.580
Median (50%)             0.421            0.854            6.851
Upper quartile (75%)     0.465            0.900            7.063
Table 2: Statistical measures for the literature approach: elastix registration

We also compare the proposed dueling DQN with other DQN architectures that have recently been presented in the literature. For example, Van Hasselt et al. [van2016deep] proposed double DQN, which decouples action selection from action evaluation by the target network, reducing the observed overestimation and improving performance. Also, in [alansary2019evaluating], a combination of the double and dueling approaches was shown to outperform the original DQN. In this experiment, we train four DQN variants, namely DQN, Double DQN, Dueling DQN, and Double Dueling DQN. For evaluation, we apply a 2D rigid transformation to random crops from our test set, which contains 29,440 B-scans (within 10 pixels and 10 degrees for a crop size of 84×84). Performance measures are reported in Table 3, which shows only slight performance differences between the variants. Double DQN has the best performance, with NMI and CC of 0.98 and 0.97, respectively. The original DQN has the lowest performance, which aligns with the results presented in [wang2015dueling, van2016deep, alansary2019evaluating].

Furthermore, we compare our unsupervised rewarding approach with the supervised one as in [liao2017artificial] (i.e. using ground truth transformation parameters). Surprisingly, the unsupervised rewarding approach outperforms supervised training by roughly 4%.

Model                    NMI              CC               Episode score     Time (sec)

Unsupervised Training
DQN                      0.969 ± 0.083    0.958 ± 0.077    3.796 ± 2.687     0.335 ± 0.469
Dueling DQN              0.973 ± 0.085    0.959 ± 0.080    9.475 ± 4.341     0.249 ± 0.265
Double DQN               0.978 ± 0.070    0.965 ± 0.066    9.738 ± 4.487     0.198 ± 0.216
Double Dueling DQN       0.974 ± 0.081    0.960 ± 0.075    9.594 ± 4.010     0.248 ± 0.264

Supervised Training
Dueling DQN              0.934 ± 0.147    0.902 ± 0.121    5.979 ± 13.399    0.388 ± 0.218
Table 3: Statistical measures for different variants of DQN with supervised and unsupervised training using 29,440 B-scans for evaluation
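The difference between the DQN and double DQN variants compared in Table 3 lies in how the training target is formed. In the hedged sketch below, `next_q_online` and `next_q_target` stand for the Q-values that the online and target networks predict for the next state; these names, the function names, and the default discount factor are illustrative assumptions.

```python
def dqn_target(reward, next_q_target, gamma=0.9):
    """Standard DQN: the target network both selects and evaluates the next action."""
    return reward + gamma * max(next_q_target)


def double_dqn_target(reward, next_q_online, next_q_target, gamma=0.9):
    """Double DQN: the online network selects, the target network evaluates."""
    best_action = max(range(len(next_q_online)), key=next_q_online.__getitem__)
    return reward + gamma * next_q_target[best_action]
```

When the two networks disagree on the best action, the double DQN target is never larger than the DQN target, which is how it tempers overestimated action values.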


5 Conclusion

Registration is a critical step in the automated analysis of medical images for monitoring patients, and access to an accurate unsupervised registration method is of immense value, given the costly practice of curating the annotated images needed by supervised methods. In this paper, we lay out a novel framework for unsupervised 2D rigid registration of medical images, in particular OCT volumes of the retina, which takes advantage of intensity-based techniques, resulting in state-of-the-art performance. To this end, we present an artificial agent that is able to efficiently align two B-scans by finding the 2D transformation parameters. The agent is trained using a dueling deep Q-network in an unsupervised manner, where a combination of intensity-based image similarity measures is used to guide the rewarding system. The proposed DDQN-OCT model markedly outperforms the elastix intensity-based medical image registration approach. The proposed framework also shows strong potential for other applications.