Deep Reinforcement Learning for L3 Slice Localization in Sarcopenia Assessment

by Othmane Laousy, et al.

Sarcopenia is a medical condition characterized by a reduction in muscle mass and function. A quantitative diagnosis technique consists of localizing the CT slice passing through the middle of the third lumbar area (L3) and segmenting muscles at this level. In this paper, we propose a deep reinforcement learning method for accurate localization of the L3 CT slice. Our method trains a reinforcement learning agent by incentivizing it to discover the right position. Specifically, a Deep Q-Network is trained to find the best policy to follow for this problem. Visualizing the training process shows that the agent mimics the scrolling of an experienced radiologist. Extensive experiments against other state-of-the-art deep learning based methods for L3 localization show the superiority of our technique, which performs well even with a limited amount of data and annotations.



1 Introduction

Sarcopenia corresponds to muscle atrophy which may be due to ageing, inactivity, or disease. The decrease of skeletal muscle is a good indicator of the overall health state of a patient [11]. In oncology, it has been shown that sarcopenia is linked to outcome in patients treated by chemotherapy [14, 5], immunotherapy [20], or surgery [9]. There are multiple definitions of sarcopenia [7, 23] and consequently multiple ways of assessing it. On CT imaging, the method used is based on muscle mass quantification. Muscle mass is most commonly assessed at a level passing through the middle of the third lumbar vertebra area (L3), which has been found to be representative of the body composition [29]. After manual selection of the correct CT slice at the L3 level, segmentation of muscles is performed to calculate the skeletal muscle area [8]. In practice, the evaluation is tedious, time-consuming, and rarely done by radiologists. An automated measurement of muscle mass could therefore be of great help for introducing sarcopenia assessment into daily clinical practice.

Muscle segmentation and quantification on a single slice have been thoroughly addressed in multiple works using simple 2D U-Net like architectures [4, 6]. Few works, however, focus on L3 slice detection. The main challenges of this task stem from the inherent diversity in patients' anatomy, the strong resemblance between vertebrae, and the variability of CT fields of view as well as their acquisition and reconstruction protocols.

The most straightforward approach to address L3 localization is by investigating methods for multiple vertebrae labeling in 3D images using detection [25] or even segmentation algorithms [22]. Such methods require a substantial volume of annotations and are computationally inefficient when dealing with the entire 3D CT scan. In fact, even if our input is 3D, a one-dimensional output, the $z$-coordinate of the slice, is sufficient to solve the L3 localization problem.

In terms of L3 slice detection, the closest methods leverage deep learning [3, 13] and focus on training simple convolutional neural networks (CNN). These techniques use maximal intensity projection (MIP), where the objective is to project voxels with maximal intensity values into a 2D plane. Frontal view MIP projections contain enough information to differentiate the body and the bone structure of the vertebrae. On the sagittal view, restricted MIP projections are used to focus solely on the spinal area. In [3] the authors tackle this problem through regression, training the CNN with parts of the MIP that contain the L3 vertebra only. More recently, in [13] a UNet-like architecture (L3UNet-2D) is proposed to draw a 2D confidence map over the position of the L3 slice.
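As an illustration of the projection described above, the following is a minimal sketch of a frontal MIP in plain Python. The axis ordering `volume[z][y][x]`, with `y` taken as the anterior-posterior direction, and the function name `frontal_mip` are assumptions made for this example, not details taken from the cited works.

```python
def frontal_mip(volume):
    """Maximum intensity projection of a volume indexed as volume[z][y][x].

    Assumption: y is the anterior-posterior axis. For every (z, x) position
    we keep the largest voxel value found along y, so each CT slice
    collapses to a single row of the resulting 2D image.
    """
    return [
        [max(row[x] for row in slice_2d) for x in range(len(slice_2d[0]))]
        for slice_2d in volume
    ]


# Toy volume: 2 slices (z), 2 rows (y), 3 columns (x).
toy_volume = [
    [[0, 5, 1], [2, 3, 9]],
    [[7, 0, 0], [1, 8, 4]],
]
print(frontal_mip(toy_volume))  # [[2, 5, 9], [7, 8, 4]]
```

Each row of the output corresponds to one slice, which is why a position along the slice axis of the volume maps directly to a row index in the projected image.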

In this paper, we propose a reinforcement learning algorithm for accurate detection of the L3 slice in CT scans, automating the process of sarcopenia assessment. The main contribution of our paper is a novel formulation for the problem of L3 localization, exploiting different deep reinforcement learning (DRL) schemes that boost the state of the art for this challenging task, even in scarce-data settings. Moreover, we demonstrate that 2D approaches for vertebrae detection provide state-of-the-art results compared to 3D landmark detection methods, while simplifying the problem and reducing both the search space and the amount of annotations needed. To the best of our knowledge, this is the first time that a reinforcement learning algorithm is explored for vertebrae slice localization, reporting performance similar to that of medical experts and opening new directions for this challenging task.

2 Background

Reinforcement Learning is a fundamental tool of machine learning which allows dealing efficiently with the exploration/exploitation trade-off [24]. Given state-reward pairs, a reinforcement learning agent can pick actions to reach unexplored states or increase its accumulated future reward. Those principles are appealing for medical applications because they imitate a practitioner's behavior and self-learn from experience based on ground truth. One of the main issues of this class of algorithms is its sample complexity: a large amount of interaction with the environment is needed before obtaining an agent close to an optimal state [26]. However, those techniques were recently combined with deep learning approaches, which efficiently addressed this issue [17] by incorporating priors based on neural networks. In the context of high-dimensional computer vision applications, this approach allowed RL algorithms to obtain outstanding accuracy [12] in a variety of tasks and applications.

In medical imaging, model-free reinforcement learning algorithms are widely used for landmark detection [10] as well as localization tasks [16]. In [1], a Deep Q-Network (DQN) that automates the view planning process on brain and cardiac MRI was proposed. This framework takes as input a single plane and updates its angle and position during the training process until convergence. Moreover, in [27] the authors again present a DQN framework for the localization of different anatomical landmarks, introducing multiple agents that act and learn simultaneously. DRL has also been studied for object or lesion localization. More recently, in [19] the authors propose a DQN framework for the localization of different organs from CT scans, achieving performance comparable to supervised CNNs. This framework uses a 3D volume as input with different actions to generate bounding boxes for these organs. Our work is the first to explore and validate an RL scheme on MIP representations for single slice detection, using the discrete and 2D nature of the problem.

3 Reinforcement Learning Strategy

In this paper, we formulate the slice localization problem as a Markov Decision Process (MDP), which contains a set of states $\mathcal{S}$, actions $\mathcal{A}$, and rewards $\mathcal{R}$.

States $\mathcal{S}$: For our formulation, the environment that we explore and exploit is a 2D image representing the frontal MIP projection of the 3D CT scans. This projection allows us to reduce our problem's dimensionality from a 3D volume, whose height varies from scan to scan, to a single 2D image. The reinforcement learning agent is self-taught by interacting with this environment, executing a set of actions, and receiving a reward linked to the action taken. An input example is shown in Figure 1. We define a state $s \in \mathcal{S}$ as a fixed-size image taken from the environment. We consider the middle of the image to be the slice's current position on the $z$-axis. To highlight this, we assign a line of maximum intensity pixel value to the middle of each image provided as input to our DQN.

Actions $\mathcal{A}$: We define a set of two discrete actions. The first corresponds to a positive translation (going up by one slice) and the second to a negative translation (going down by one slice). These two actions allow us to explore the entirety of our environment. Experiments with a stop action did not improve the agent's accuracy, since adding more actions increases the complexity of the task to learn.

Rewards $\mathcal{R}$: In reinforcement learning, designing a good reward function is crucial for learning the goal to achieve. To measure the quality of taking an action, we use the distance along $z$ between the current slice and the annotated slice $g$. The reward for non-terminating states is computed with:

\[ R_a(s, s') = \operatorname{sign}\big( D(p, g) - D(p', g) \big) \]

where $s$ and $s'$ denote the current and next state and $g$ the ground truth annotation. Moreover, the positions $p$ and $p'$ are the $z$-coordinates of the current and next state respectively, and $D$ is the Euclidean distance between two coordinates along the $z$-axis. The reward is non-sparse and binary, $r \in \{-1, +1\}$, and helps the agent differentiate between good and bad actions, a good action being one that brings the agent closer to the correct slice. A terminating state is assigned its own fixed reward.
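The binary reward described above can be sketched as follows. This is an illustrative reading rather than the authors' implementation: the function names are hypothetical, and the values passed for `terminal_reward` and `border_reward` in the usage lines are placeholders, not the paper's settings.

```python
def sign(x):
    """Return -1, 0 or +1 depending on the sign of x."""
    return (x > 0) - (x < 0)


def step_reward(current_z, next_z, target_z, terminal_reward, border_reward):
    """Reward for moving from current_z to next_z when the L3 slice is at target_z.

    terminal_reward and border_reward are hypothetical parameters: the first is
    given when the agent reaches the annotated slice, the second when the agent
    hits an image border and therefore does not move.
    """
    if next_z == target_z:
        return terminal_reward
    if next_z == current_z:
        return border_reward
    # +1 when the move reduces the distance to the target, -1 otherwise.
    return sign(abs(current_z - target_z) - abs(next_z - target_z))


# Placeholder reward values, chosen only to make the three cases visible.
print(step_reward(10, 11, 20, terminal_reward=2.0, border_reward=-2.0))  # 1
print(step_reward(10, 9, 20, terminal_reward=2.0, border_reward=-2.0))   # -1
```

Because the agent moves by exactly one slice, the distance to the target always changes by one, so the sign term is never zero for a real move.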

Starting States: An episode starts by randomly sampling a slice along the $z$-axis. The agent then executes a set of actions and collects rewards until the episode terminates, which happens when it has achieved its goal of finding the right slice. When reaching the upper or lower border of an image, the current state is assigned to the next state (i.e., the agent does not move), and a dedicated fixed reward is assigned to this action.

Final States: During training, a terminal state is defined as a state in which the agent has reached the right slice. The terminal reward is assigned in this case, and the episode is terminated. During testing, an episode terminates when oscillations occur. We adopted the same approach as [1] and chose, among the oscillating states, the action with the lowest $Q$-value, which has been found to be closest to the right slice, since the DQN outputs higher $Q$-values to actions when the current slice is far from the ground truth.
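One way to read the test-time stopping rule is sketched below. It assumes that an oscillation is detected as soon as the greedy agent revisits a position, and that the final prediction is the one of the two oscillating positions whose greedy $Q$-value is lowest. The callable `q_values` stands in for the trained network, and `toy_q` is a hand-made example, not a learned model.

```python
def localize(q_values, start_z, num_slices):
    """Greedy rollout that stops at the first revisited position.

    q_values(z) -> (q_up, q_down) is a hypothetical stand-in for the DQN.
    Terminates because there are only num_slices distinct positions.
    """
    z = start_z
    visited = {}  # position -> Q-value of the greedy action chosen there
    path = []     # positions in the order they were first visited
    while z not in visited:
        q_up, q_down = q_values(z)
        visited[z] = max(q_up, q_down)
        path.append(z)
        step = 1 if q_up >= q_down else -1
        # At a border the agent stays in place.
        z = min(max(z + step, 0), num_slices - 1)
    # The agent oscillates between the revisited position and the last new one.
    candidates = {z, path[-1]}
    return min(candidates, key=lambda pos: visited[pos])


def toy_q(z, target=20):
    """Toy Q-function: values grow with the distance to the target slice."""
    d = abs(z - target)
    if z < target:
        return (d + 1.0, float(d))
    if z > target:
        return (float(d), d + 1.0)
    return (0.5, 0.4)


print(localize(toy_q, start_z=17, num_slices=40))  # 20
```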

3.1 Deep Q-Learning

To find the optimal policy of the MDP, a state-action value function $Q(s, a)$ can be learned. In Q-Learning, the expected value of the accumulated discounted future rewards can be estimated recursively using the Bellman optimality equation:

\[ Q_{i+1}(s, a) = \mathbb{E}\big[ r + \gamma \max_{a'} Q_i(s', a') \big] \]

where $\gamma$ is the discount factor. In practice, since the state space is not easily exploitable, we can take advantage of neural networks as universal function approximators to approximate $Q(s, a)$ [18]. We utilize an experience replay technique that consists in storing the agent's experience at each time step in a replay memory $M$. To break the correlation between consecutive samples, we uniformly sample a batch of experiences from $M$. The Deep Q-Network (DQN) iteratively optimizes its parameters $\theta$ by minimizing the following loss function:

\[ L(\theta) = \mathbb{E}_{(s, a, r, s') \sim U(M)} \Big[ \big( r + \gamma \max_{a'} Q(s', a'; \theta^{-}) - Q(s, a; \theta) \big)^2 \Big] \]

with $\theta$ and $\theta^{-}$ being the parameters of the policy and the target network respectively. To stabilize rapid policy changes due to the distribution of the data and the variations in Q-values, the DQN uses $Q(\theta^{-})$, a fixed version of $Q(\theta)$ that is updated periodically. For our experiments, we update $\theta^{-}$ every 50 iterations.
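To make the roles of the replay memory and the target network concrete, here is a minimal framework-free sketch of the loss computation. The class and function names are hypothetical, the `done` flag is an addition of this sketch to handle terminal states, and `policy_q` and `target_q` are plain callables standing in for the two networks. In an actual training loop the target callable would be refreshed from the policy network periodically, every 50 iterations in the setting described above.

```python
import random
from collections import deque


class ReplayMemory:
    """Fixed-capacity buffer of (state, action, reward, next_state, done) tuples."""

    def __init__(self, capacity):
        self.buffer = deque(maxlen=capacity)  # oldest experiences are dropped first

    def push(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size, rng=random):
        # Uniform sampling breaks the correlation between consecutive steps.
        return rng.sample(list(self.buffer), batch_size)


def td_targets(batch, target_q, gamma):
    """r + gamma * max_a' Q(s', a'; theta^-), without bootstrap on terminal states."""
    targets = []
    for _, _, reward, next_state, done in batch:
        bootstrap = 0.0 if done else gamma * max(target_q(next_state))
        targets.append(reward + bootstrap)
    return targets


def mean_squared_td_error(batch, policy_q, target_q, gamma):
    """Average squared difference between the TD target and Q(s, a; theta)."""
    targets = td_targets(batch, target_q, gamma)
    errors = [
        (y - policy_q(state)[action]) ** 2
        for (state, action, _, _, _), y in zip(batch, targets)
    ]
    return sum(errors) / len(errors)


memory = ReplayMemory(capacity=100)
memory.push(state=0, action=1, reward=1.0, next_state=1, done=False)
memory.push(state=1, action=0, reward=-1.0, next_state=2, done=True)
batch = memory.sample(2, rng=random.Random(0))
loss = mean_squared_td_error(
    batch,
    policy_q=lambda s: [0.0, 0.0],   # stand-in for Q(.; theta)
    target_q=lambda s: [1.0, 2.0],   # stand-in for Q(.; theta^-)
    gamma=0.9,
)
```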

3.2 Network Architecture

Our Deep Q-Network takes as input the state $s$ and passes it through a convolutional network. The network contains four convolution layers separated by parametric ReLU in order to break the linearity of the network, and four linear layers with LeakyReLU. Contrary to [1], we chose not to add the history of previously visited states in our case. We opted for this approach since there is a single path that leads to the right slice. This approach allows us to simplify our problem even more. Ideally, our agent should learn, just by looking at the current state, whether to go up or down when the current slice is respectively below or above the L3 slice. An overview of our framework is presented in Figure 1.

Figure 1: The implemented Deep Q-Network architecture for L3 slice localization. The network takes as input a single-channel image. The output is the Q-values corresponding to each of the two actions.

We also explore dueling DQNs from [28]. Dueling DQNs rely on the concept of an advantage, which measures the benefit that each action can provide. The advantage is defined as $A(s, a) = Q(s, a) - V(s)$, with $V(s)$ being our state value function. This algorithm uses the advantage to distinguish the contribution of each action from the state's baseline value. Dueling DQNs were shown to provide more robust agents that are wiser in choosing the next best action. For our dueling DQN, we use the same architecture as the one in Figure 1 but change the second to last fully connected layer to compute state values on one side, and action values on the other.
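As a small illustration of how the two streams can be recombined into Q-values, the sketch below uses the mean-subtracted aggregation proposed in [28]. Whether this exact variant is the one used here is an assumption, and the function name is hypothetical.

```python
def dueling_q_values(state_value, advantages):
    """Combine V(s) and A(s, a) as Q(s, a) = V(s) + A(s, a) - mean_a A(s, a).

    Subtracting the mean advantage keeps the decomposition identifiable,
    following the aggregation described in the dueling architecture paper.
    """
    mean_advantage = sum(advantages) / len(advantages)
    return [state_value + a - mean_advantage for a in advantages]


# Two actions (up, down): the state is worth 2.0 and "up" is the better move.
print(dueling_q_values(2.0, [1.0, -1.0]))  # [3.0, 1.0]
```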

3.3 Training

Since our agent is initially unaware of the possible states and rewards in the environment, the exploration step is implemented first. After a few iterations, the agent can start exploiting what it has learned. In order to balance exploration and exploitation, we use an $\epsilon$-greedy strategy. This strategy consists of defining an exploration rate $\epsilon$, which is initialized and then decayed during training, allowing the agent to become greedy and exploit the environment. Experiences are drawn in batches from the experience replay memory. The entire framework was developed with the PyTorch library [21] using an NVIDIA GTX 1080Ti GPU. Training spans a fixed number of episodes and takes on the order of hours. A visualization of the training process is provided as supplementary material.
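The exploration strategy can be sketched as below. The multiplicative decay with a lower bound is an assumption of this example, as are the function names and the numeric values in the usage lines, which are placeholders rather than the training settings.

```python
import random


def epsilon_greedy(q_values, epsilon, rng):
    """Pick a random action with probability epsilon, else the greedy one."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda action: q_values[action])


def decay_epsilon(epsilon, rate, floor):
    """Hypothetical schedule: shrink epsilon by a factor, never below floor."""
    return max(floor, epsilon * rate)


rng = random.Random(0)
epsilon = 1.0  # placeholder starting value: fully exploratory
for _ in range(3):
    action = epsilon_greedy([0.1, 0.9], epsilon, rng)
    epsilon = decay_epsilon(epsilon, rate=0.5, floor=0.1)
```

With `epsilon` equal to zero the rule reduces to the purely greedy policy used at test time.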

4 Experiments and Results

4.1 Dataset

A diverse dataset of 1000 CT scans has been retrospectively collected for this study. CT scans were acquired on 4 different CT models from 3 manufacturers (Revolution HD from GE Healthcare, Milwaukee, WI; Brilliance 16 from Philips Healthcare, Best, Netherlands; and Somatom AS+ & Somatom Edge from Siemens Healthineers, Erlangen, Germany). Exams were either abdominal, thoracoabdominal, or thoraco-abdominopelvic CT scans acquired with or without contrast media injection. Images were reconstructed using an abdominal kernel with either filtered back-projection or iterative reconstruction. Slice thickness and the number of slices both varied across scans. The heterogeneity of our dataset highlights the challenges of the problem from a clinical perspective.

Experienced radiologists manually annotated the dataset, indicating the position of the middle of the L3 slice. Before computing the MIP, all of the CT scans are resampled to a common spacing along the $z$-axis. This normalisation step harmonises our network's input, especially since the agent performs actions along the $z$-axis. After the MIP, we apply a threshold window in HU (Hounsfield units), allowing us to eliminate artifacts and foreign metal bodies while keeping the skeleton structure. The MIPs are finally normalized to [0,1]. From the entire dataset, we randomly selected 100 patients for testing and the rest for training and validation. For the testing cohort, annotations of L3 from a second experienced radiologist have been provided to measure the interobserver performance.
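The intensity preprocessing of the MIP can be illustrated with the short sketch below. It assumes that thresholding means clipping values to a window before rescaling to [0,1]; the window bounds in the usage line are arbitrary placeholders, and the function name is hypothetical.

```python
def window_and_normalize(mip, low_hu, high_hu):
    """Clip each pixel to [low_hu, high_hu], then rescale linearly to [0, 1].

    Assumption: values outside the window are clipped rather than removed,
    so very dense objects such as metal saturate at 1.0.
    """
    span = high_hu - low_hu
    return [
        [(min(max(value, low_hu), high_hu) - low_hu) / span for value in row]
        for row in mip
    ]


# Placeholder window bounds, for illustration only.
print(window_and_normalize([[-200, 250, 3000]], low_hu=0, high_hu=1000))
# [[0.0, 0.25, 1.0]]
```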

4.2 Results and Discussion

Our method is compared with other techniques from the literature. The error is calculated as the distance in millimeters (mm) between the predicted L3 slice and the one annotated by the experts. In particular, we performed experiments with the L3UNet-2D [13] approach and the winning SC-Net [22] method of the VerSe 2020 challenge. Even if SC-Net is trained on more accurate annotations with 3D landmarks as well as vertebrae segmentations, and addresses a different problem, we applied it to our testing cohort. The comparison of the different methods is summarised in Table 1. SC-Net reports 12 CT scans with an error above the threshold used for the last column of Table 1. Moreover, L3UNet-2D [13] reports a mean error of 4.24 mm ± 6.97 mm when the method is trained on the entire training set, giving only 7 scans with an error above that threshold for the L3 detection. Our proposed method gives the lowest mean error among the automated methods, 3.77 mm ± 4.71 mm. Finally, we evaluated our technique's performance with a dueling DQN strategy, which reported higher errors than the proposed one. This observation could be linked to the small action space that is designed for this study. Dueling DQNs were proven to be powerful in cases with larger action spaces, in which the computation of the advantage function makes a difference.

Method             # training samples   Mean     Std      Median   Max      # scans above error threshold
Interobserver      -                    2.04     4.36     1.30     43.19    1
SC-Net [22]        -                    6.78     13.96    1.77     46.98    12
L3UNet-2D [13]     900                  4.24     6.97     2.19     40       7
Ours (Duel-DQN)    900                  4.30     5.59     3        38       8
Ours               900                  3.77     4.71     2.0      24       9
L3UNet-2D [13]     100                  145.37   161.91   32.8     493      68
Ours               100                  5.65     5.83     4        26       19
L3UNet-2D [13]     50                   108.7    97.33    87.35    392.02   86
Ours               50                   6.88     5.79     6.5      26       11
L3UNet-2D [13]     10                   242.85   73.07    240.5    462      99
Ours               10                   8.97     8.72     7        56       33
Table 1: Quantitative evaluation of the different methods using different numbers of training samples (Mean, Std, Median and Max in mm).
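For reference, the summary statistics of Table 1 can be computed from per-scan slice positions along the lines of the sketch below. The function names, the slice spacing argument, the use of the population standard deviation, and the threshold value in the usage line are all assumptions made for illustration; the numbers are toy values, not results from the study.

```python
import statistics


def localization_errors_mm(predicted, annotated, spacing_mm):
    """Absolute distance in mm between predicted and annotated slice indices."""
    return [abs(p - a) * spacing_mm for p, a in zip(predicted, annotated)]


def summarize(errors, threshold_mm):
    """Mean, std, median, max, and number of scans above an error threshold."""
    return {
        "mean": statistics.mean(errors),
        "std": statistics.pstdev(errors),  # assumption: population std
        "median": statistics.median(errors),
        "max": max(errors),
        "above_threshold": sum(e > threshold_mm for e in errors),
    }


# Toy slice indices for three scans, with a hypothetical 1 mm slice spacing.
errors = localization_errors_mm([10, 20, 35], [12, 20, 30], spacing_mm=1.0)
summary = summarize(errors, threshold_mm=4.0)
```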

For the proposed reinforcement learning framework, trained on the full training set, 9 CTs had a detection error above the threshold used for the last column of Table 1. These scans were analysed by a medical expert, who indicated that some of them present a lumbosacral transitional vertebrae (LSTV) anomaly [15]. Transitional vertebrae cases are common and observed in 15-35% of the population [2], highlighting once again the challenges of this task. In such cases, the localization of the L3 vertebra for sarcopenia assessment is ambiguous for radiologists and consequently for the reinforcement learning agent. In fact, the only error above the threshold in the interobserver comparison corresponds to an LSTV case where each radiologist chose a different vertebra as a basis for sarcopenia assessment. Although the interobserver performance remains better than that of the algorithms, our method reports the lowest mean error among them, which shows its potential.

Qualitative results are displayed in Figure 2 as well as in the supplementary materials. The yellow line represents the medical expert's annotation and the blue one the prediction of the different employed models. One can notice that all of the different methods converge to the correct L3 region with our method reporting great performance. It is important to note that for sarcopenia assessment, an automatic system does not need to be at the exact middle of the slice; a few millimeters around will not skew the end result since muscle mass in the L3 zone does not change significantly. Concerning prediction times for the RL agent, they depend on the initial slice that is randomly sampled on the MIP. Computed inference time for a single step is approximately 0.03 seconds.

Figure 2: Qualitative comparison of different localization methods for two patients (one per row). From left to right: interobserver, SC-Net, L3UNet-2D, and ours. The yellow line represents the ground truth and the blue one the prediction.

To highlight the robustness of our network on a low number of annotated samples, we performed different experiments using 100, 50, and 10 CTs, corresponding respectively to 10%, 5% and 1% of our dataset. We tested those 3 agents on the same test set of 100 patients and report results in Table 1. Our experiments show the robustness of reinforcement learning algorithms compared to traditional CNN based ones [13] in the case of small annotated datasets. One can observe that the traditional methods fail to be trained properly with a small number of annotations, reporting mean errors higher than 100 mm for all three experiments. Learning a valid policy from a low number of annotations is one of the strengths of reinforcement learning. In our case, training with less data increases the reported detection error; however, trained on only 10 CTs with the same number of iterations and memory size, our agent was able to learn a correct policy and achieve a mean error of 8.97 mm ± 8.72 mm. Traditional deep learning techniques rely on pairs of images and annotations in order to build a robust generalization. Thus, each pair is exploited only once by the learning algorithm. Reinforcement learning, however, relies on experiences, each experience being a tuple of state, action, reward, and next state. Therefore, a single CT scan can provide multiple experiences to the self-learning agent, making our method well suited for slice localization problems using datasets with a limited amount of annotations.
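The point about a single scan yielding many experiences can be made concrete with a small enumeration. This is a simplified sketch: states are reduced to slice indices, the terminal and border rewards are folded into a plain ±1, and the scan height and annotated position in the usage lines are made-up numbers.

```python
def enumerate_experiences(num_slices, target_z):
    """List every (position, action, reward, next_position) one scan can yield.

    Simplification: the reward is +1 when the move gets closer to target_z
    and -1 otherwise. The target slice itself is skipped because a training
    episode ends there.
    """
    experiences = []
    for z in range(num_slices):
        if z == target_z:
            continue
        for action, step in (("up", 1), ("down", -1)):
            next_z = min(max(z + step, 0), num_slices - 1)  # stay put at borders
            closer = abs(next_z - target_z) < abs(z - target_z)
            experiences.append((z, action, 1 if closer else -1, next_z))
    return experiences


# A hypothetical scan with 300 slices and the L3 annotation at slice 120.
experiences = enumerate_experiences(num_slices=300, target_z=120)
print(len(experiences))  # 598
```

Under these assumptions one annotated scan provides several hundred distinct transitions, whereas a supervised regressor sees it as a single image and label pair.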

5 Conclusion

In this paper, we propose a novel direction to address the problem of CT slice localization. Our experiments empirically show that reinforcement learning schemes work very well on small datasets and boost performance compared to classical convolutional architectures. One limitation of our work is that our agent always moves in the same single-slice steps regardless of its location, which slows down the process. In the future, we aim to explore different ways to adapt the action taken depending on the current location, with one possibility being to incentivize actions with higher increments. Future work also includes the use of reinforcement learning in multiple vertebrae detection with competitive or collaborative agents.


  • [1] A. Alansary, L. Le Folgoc, G. Vaillant, O. Oktay, Y. Li, W. Bai, J. Passerat-Palmbach, R. Guerrero, K. Kamnitsas, B. Hou, et al. (2018) Automatic view planning with multi-scale deep reinforcement learning agents. In International Conference on Medical Image Computing and Computer-Assisted Intervention, pp. 277–285.
  • [2] A. Apazidis, P. A. Ricart, C. M. Diefenbach, and J. M. Spivak (2011) The prevalence of transitional vertebrae in the lumbar spine. The Spine Journal 11 (9), pp. 858–862. ISSN 1529-9430.
  • [3] S. Belharbi, C. Chatelain, R. Hérault, S. Adam, S. Thureau, M. Chastan, and R. Modzelewski (2017) Spotting L3 slice in CT scans using deep convolutional network and transfer learning. Computers in Biology and Medicine 87, pp. 95–103.
  • [4] P. Blanc-Durand, J.-B. Schiratti, K. Schutte, P. Jehanno, P. Herent, F. Pigneur, O. Lucidarme, Y. Benaceur, A. Sadate, A. Luciani, O. Ernst, A. Rouchaud, M. Creze, A. Dallongeville, et al. (2020) Abdominal musculature segmentation and surface prediction from CT using deep learning for sarcopenia assessment. Diagnostic and Interventional Imaging 101 (12), pp. 789–794.
  • [5] F. Bozzetti (2017) Forcing the vicious circle: sarcopenia increases toxicity, decreases response to chemotherapy and worsens with chemotherapy. Annals of Oncology 28 (9), pp. 2107–2118. Note: A focus on esophageal squamous cell carcinoma. ISSN 0923-7534.
  • [6] J. Castiglione, E. Somasundaram, L. A. Gilligan, A. T. Trout, and S. Brady (2021) Automated segmentation of abdominal skeletal muscle in pediatric CT scans using deep learning. Radiology: Artificial Intelligence, pp. e200130.
  • [7] A. J. Cruz-Jentoft, G. Bahat, J. Bauer, Y. Boirie, O. Bruyère, T. Cederholm, C. Cooper, F. Landi, Y. Rolland, A. A. Sayer, et al. (2019) Sarcopenia: revised European consensus on definition and diagnosis. Age and Ageing 48 (1), pp. 16–31.
  • [8] B. A. Derstine, S. A. Holcombe, R. L. Goulson, B. E. Ross, N. C. Wang, J. A. Sullivan, G. L. Su, and S. C. Wang (2017) Quantifying sarcopenia reference values using lumbar and thoracic muscle areas in a healthy population. J Nutr Health Aging 21 (10), pp. 180–185.
  • [9] Y. Du, C. J. Karvellas, V. Baracos, D. C. Williams, and R. G. Khadaroo (2014) Sarcopenia is a predictor of outcomes in very elderly patients undergoing emergency surgery. Surgery 156 (3), pp. 521–527. ISSN 0039-6060.
  • [10] F. C. Ghesu, B. Georgescu, S. Grbic, A. Maier, J. Hornegger, and D. Comaniciu (2018) Towards intelligent robust detection of anatomical structures in incomplete volumetric data. Med Image Anal 48, pp. 203–213.
  • [11] L. A. Gilligan, A. J. Towbin, J. R. Dillman, E. Somasundaram, and A. T. Trout (2020) Quantification of skeletal muscle mass: sarcopenia as a marker of overall health in children and adults. Pediatric Radiology 50 (4), pp. 455–464. ISSN 1432-1998.
  • [12] J. Grill, F. Strub, F. Altché, C. Tallec, P. H. Richemond, E. Buchatskaya, C. Doersch, B. A. Pires, Z. D. Guo, M. G. Azar, et al. (2020) Bootstrap your own latent: a new approach to self-supervised learning. arXiv preprint arXiv:2006.07733.
  • [13] F. Kanavati, S. Islam, E. O. Aboagye, and A. Rockall (2018) Automatic L3 slice detection in 3D CT images using fully-convolutional networks. arXiv:1811.09244.
  • [14] J. Lee, C. Chang, J. Lin, M. Wu, F. Sun, Y. Jan, S. Hsu, and Y. Chen (2018) Skeletal muscle loss is an imaging biomarker of outcome after definitive chemoradiotherapy for locally advanced cervical cancer. Clinical Cancer Research 24 (20), pp. 5028–5036.
  • [15] J. Lian, N. Levine, and W. Cho (2018) A review of lumbosacral transitional vertebrae and associated vertebral numeration. European Spine Journal 27 (5), pp. 995–1004. ISSN 1432-0932.
  • [16] G. Maicas, G. Carneiro, A. P. Bradley, J. C. Nascimento, and I. Reid (2017) Deep reinforcement learning for active breast lesion detection from DCE-MRI. In Medical Image Computing and Computer Assisted Intervention - MICCAI 2017, M. Descoteaux, L. Maier-Hein, A. Franz, P. Jannin, D. L. Collins, and S. Duchesne (Eds.), Cham, pp. 665–673. ISBN 978-3-319-66179-7.
  • [17] V. Mnih, K. Kavukcuoglu, D. Silver, A. Graves, I. Antonoglou, D. Wierstra, and M. Riedmiller (2013) Playing Atari with deep reinforcement learning. arXiv preprint arXiv:1312.5602.
  • [18] V. Mnih, K. Kavukcuoglu, D. Silver, A. A. Rusu, J. Veness, M. G. Bellemare, A. Graves, M. Riedmiller, A. K. Fidjeland, G. Ostrovski, S. Petersen, C. Beattie, A. Sadik, I. Antonoglou, H. King, D. Kumaran, D. Wierstra, S. Legg, and D. Hassabis (2015) Human-level control through deep reinforcement learning. Nature 518 (7540), pp. 529–533. ISSN 1476-4687.
  • [19] F. Navarro, A. Sekuboyina, D. Waldmannstetter, J. C. Peeken, S. E. Combs, and B. H. Menze (2020) Deep reinforcement learning for organ localization in CT. In Medical Imaging with Deep Learning, pp. 544–554.
  • [20] N. Nishioka, J. Uchino, S. Hirai, Y. Katayama, A. Yoshimura, N. Okura, K. Tanimura, S. Harita, T. Imabayashi, Y. Chihara, N. Tamiya, Y. Kaneko, T. Yamada, and K. Takayama (2019) Association of sarcopenia with and efficacy of anti-PD-1/PD-L1 therapy in non-small-cell lung cancer. Journal of Clinical Medicine 8 (4). ISSN 2077-0383.
  • [21] A. Paszke, S. Gross, S. Chintala, G. Chanan, E. Yang, Z. DeVito, Z. Lin, A. Desmaison, L. Antiga, and A. Lerer (2017) Automatic differentiation in PyTorch. In NIPS Autodiff Workshop.
  • [22] C. Payer, D. Štern, H. Bischof, and M. Urschler (2020) Coarse to fine vertebrae localization and segmentation with SpatialConfiguration-Net and U-Net. In Proceedings of the 15th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications - Volume 5: VISAPP, Vol. 5, pp. 124–133.
  • [23] V. Santilli, A. Bernetti, M. Mangone, and M. Paoloni (2014) Clinical definition of sarcopenia. Clinical Cases in Mineral and Bone Metabolism 11 (3), pp. 177.
  • [24] R. S. Sutton and A. G. Barto (2018) Reinforcement learning: an introduction. MIT Press.
  • [25] A. Suzani, A. Seitel, Y. Liu, S. Fels, R. N. Rohling, and P. Abolmaesumi (2015) Fast automatic vertebrae detection and localization in pathological CT scans - a deep learning approach. In Medical Image Computing and Computer-Assisted Intervention – MICCAI 2015, N. Navab, J. Hornegger, W. M. Wells, and A. F. Frangi (Eds.), pp. 678–686.
  • [26] J. Tarbouriech, E. Garcelon, M. Valko, M. Pirotta, and A. Lazaric (2020) No-regret exploration in goal-oriented reinforcement learning. In International Conference on Machine Learning, pp. 9428–9437.
  • [27] A. Vlontzos, A. Alansary, K. Kamnitsas, D. Rueckert, and B. Kainz (2019) Multiple landmark detection using multi-agent reinforcement learning. In International Conference on Medical Image Computing and Computer-Assisted Intervention, pp. 262–270.
  • [28] Z. Wang, N. de Freitas, and M. Lanctot (2015) Dueling network architectures for deep reinforcement learning. CoRR abs/1511.06581. arXiv:1511.06581.
  • [29] D. Zopfs, S. Theurich, N. Große Hokamp, J. Naetlitz, L. Gerecht, J. Borggrefe, M. Schlaak, and D. Pinto dos Santos (2019) Single-slice CT measurements allow for accurate assessment of sarcopenia and body composition. European Radiology 30.