Log In Sign Up

Decoding the Rejuvenating Effects of Mechanical Loading on Skeletal Maturation using in Vivo Imaging and Deep Learning

by   Pouyan Asgharzadeh, et al.

Throughout the process of aging, deterioration of bone macro- and micro-architecture, as well as material decomposition result in a loss of strength and therefore in an increased likelihood of fractures. To date, precise contributions of age-related changes in bone (re)modeling and (de)mineralization dynamics and its effect on the loss of functional integrity are not completely understood. Here, we present an image-based deep learning approach to quantitatively describe the dynamic effects of short-term aging and adaptive response to treatment in proximal mouse tibia and fibula. Our approach allowed us to perform an end-to-end age prediction based on μCT images to determine the dynamic biological process of tissue maturation during a two week period, therefore permitting a short-term bone aging prediction with 95% accuracy. In a second application, our radiomics analysis reveals that two weeks of in vivo mechanical loading are associated with an underlying rejuvenating effect of 5 days. Additionally, by quantitatively analyzing the learning process, we could, for the first time, identify the localization of the age-relevant encoded information and demonstrate 89% load-induced similarity of these locations in the loaded tibia with younger bones. These data suggest that our method enables identifying a general prognostic phenotype of a certain bone age as well as a temporal and localized loading-treatment effect on this apparent bone age. Future translational applications of this method may provide an improved decision-support method for osteoporosis treatment at low cost.


page 5

page 8

page 10

page 12

page 13

page 15

page 28

page 29


Combining Experimental and Observational Data for Identification and Estimation of Long-Term Causal Effects

We consider the task of identifying and estimating the causal effect of ...

Micro-level Reserving for General Insurance Claims using a Long Short-Term Memory Network

Detailed information about individual claims are completely ignored when...

On the effect of age perception biases for real age regression

Automatic age estimation from facial images represents an important task...

Image-based Treatment Effect Heterogeneity

Randomized controlled trials (RCTs) are considered the gold standard for...

SkyCam: A Dataset of Sky Images and their Irradiance values

Recent advances in Computer Vision and Deep Learning have enabled astoni...

A Study on Deep Learning Based Sauvegrain Method for Measurement of Puberty Bone Age

This study applies a technique to expand the number of images to a level...

G2Φnet: Relating Genotype and Biomechanical Phenotype of Tissues with Deep Learning

Many genetic mutations adversely affect the structure and function of lo...

1 Introduction

Bone is a hierarchical, dynamic, living tissue whose primary function is mechanical integrity, providing protection to internal organs and enabling mobility. Internal and external stimuli cause continuous (re)modeling making bone a highly dynamic structure. Both, bone stiffness and strength depend on properties such as mass and shape, the distribution of mass (geometry and architecture), micro-architecture and microscopic material property distribution Seeman and Delmas (2006). Human and animal studies show that skeletal maturation and aging affect both, bone micro-architecture Szulc et al. (2006); Riggs et al. (2008); Willie et al. (2013) and tissue material properties Roschger et al. (2008); Koehne et al. (2014). Formation and resorption dynamics in trabecular Birkhold et al. (2014a) and cortical bone Birkhold et al. (2014b) are altered with aging in a site-specific manner Birkhold et al. (2017, 2016). As a result, with increasing age a net bone loss occurs Szulc et al. (2006); Ahlborg et al. (2003), often resulting in osteoporosis and a subsequent increase in fragility fracture risk Kanis (2002). The rules governing age-related alterations in bone composition, organization, and elasticity across structural hierarchies are, however, to date not completely understood.

Given the fact that osteoporosis causes worldwide more than 8.9 million fractures per year Hernlund et al. (2013), it is essential to develop a precise and comprehensive analysis of phenotypic changes and abnormalities at all relevant length scales. Assessing the onset of osteoporosis and disease progression is therefore challenging. Within clinical practice, dual energy X-ray absorptiometry (DXA) and biochemical markers remain the standard methods of monitoring osteoporotic patients receiving pharmacological treatments. The T-score is derived from measurements of the areal bone mineral density (aBMD), which is obtained by DXA Kanis et al. (2008). DXA is a useful clinical tool, but has several limitations including restriction to a two-dimensional image, lack of distinction between trabecular and cortical bone, lack of information on bone microarchitecture, difficulties in edge detection and projection artefacts. Additionally, the predictive ability of this method is low Riggs and Melton (2002); Bolotin (2007) with less than half of all nonvertebral fractures occurring in postmenopausal women having an osteoporotic T-score Schuit et al. (2004). Biochemical markers are indirect indicators of the rates of formation and resorption of bone and give no insight into its quality or mechanical properties. Furthermore, like all biochemical markers they are subject to pre-analytical, analytical and post-analytical sources of variability and the results may be affected by a range of non-skeletal conditions. High-resolution peripheral quantitative computed tomography (HR-pQCT) is emerging as a powerful non-invasive bone imaging modality capable of assessing volumetric BMD, microarchitecture and strength, and distinguishing cancellous and cortical bone. Additionally, micro-finite element and homogenized finite element models based on HR-pQCT imaging are increasingly used to predict bone stiffness and strength Dall’Ara et al. (2012); Zysset et al. (2013). The Bone Microarchitecture International Consortium (BoMIC), which combined individual-level prospective data from eight cohorts (7254 individuals, mean age: years), recently reported that HR-pQCT parameters improved fracture prediction beyond femoral neck aBMD or fracture risk assessment tool (FRAX) scores alone Samelson et al. (2019).

Preclinical studies using micro-computed tomography (CT) aiming to assess bone maturation and (re)modeling have focused on selectively extracting mechanical or morphological features such as mineralization Ferguson et al. (2003) or bone volume Halloran et al. (2002); Ko et al. (2011); Moustafa et al. (2009) and their alterations Lukas et al. (2013); Birkhold et al. (2017). Although these approaches decode certain aspects of structural changes in bone, they neglect the underlying interplay and concurrency of (re)modeling and (de)mineralization. The measures extracted from these properties are selective and therefore not sufficient to predict fracture in diseases such as osteoporosis. To provide more precise descriptions of the disease phenotype, the diverse manifestations must be captured allowing one to distinguish healthy bones from diseased ones and young bones from aged ones to define disease onset and progression into sub-classes. This would permit a much more precise understanding of bone quality, as well as a better prediction of fracture risk and treatment outcome.

A major challenge in disease diagnosis is interpreting information-rich (imaging) data. This challenge is at the same time a great opportunity, as there exits nowadays artificial intelligence-based methods that have the capabilities and power to analyze relationships within rich datasets, e.g., relationships of particular dynamic biological phenomena. Artificial intelligence, for example, has been used to diagnose Alzheimer disease based on Magnetic Resonance Imaging (MRI)

Suk et al. (2014) or to analyze skin lesion for diagnosing malignancy Esteva et al. (2017). Similar to the previous two examples, one can also use artificial intelligence to analyze (re)modeling of bone using X-ray images (eg. CT, HR-pQCT). As scatter and attenuation information of CT images contain information about material composition, distribution and amount, they potentially contain all structural information that is needed to asses bone maturation Mosekilde (1988). Despite the fact that recent studies can extract from 2D and 3D X-ray image data more features describing bone quality through assessment of vBMD and microstructure  Burghardt et al. (2010); Bouxsein et al. (2010); Mader et al. (2013); Birkhold et al. (2015); Rüegsegger et al. (1976); Macdonald et al. (2011), information on bone (re)modelling rates are only obtained through invasive histomophometry analysis of iliac crest bone biopsies. Fortunately, recent advances in artificial intelligence towards deep learning now enable further data analysis by utilizing high-throughput image data. Compared to traditional machine learning methods Shen et al. (2017); Litjens et al. (2017)

, deep learning methods do not only exhibit an improved prediction accuracy, but also provide the ability to visualize learned features, to link discovered features with clinical relevance. The first applications for bone age assessment in pediatrics using deep neural networks (DNN) showed already some success in classifying/predicting bone age from 2D X-ray images

Spampinato et al. (2017); Torres et al. (2017); Lee et al. (2017). Further, they provide confidence that DNN-based methods can also provide insights into the underlying processes of skeletal maturation and bone (re)modeling.

In this study, we present a deep learning approach applied to 2D projection X-ray images of bones as an end-to-end tool for site-specific, spatio-temporal assessment of bone tissue maturation and intervention effects. By simultaneous evaluating several relevant hierarchies, our method allows us to reconstruct continuous biological processes such as aging or adaptation of bone. We developed and evaluated our method on pre-clinical CT data of mouse bones and investigated bone adaptation in response to in vivo tibial compressive loading. To do so, we first evaluate, if our method allows identifying short-term, skeletal maturation-related changes in the proximal tibiae and fibulae based on CT images. Therefore, we analyze short-term (15 days) dynamic skeletal maturation processes in adult female mice bones. Second, we evaluate, if our model can be used to identify treatment results and relate these to load-induced surface (re)modeling ("rejuvenation")-effects. Furthermore, by analyzing the learning process of our DNN through saliency maps, we quantify the spatial localization of the network attention, permitting determination of where in the bone the "skeletal age" information is manifested.

2 Materials and Methods

2.1 In vivo Mechanical Loading

As reported in Birkhold et al. (2014a); Willie et al. (2013), mice (female C57Bl/6J, 26 weeks old at beginning of experiments, Jackson Laboratories, Sulzfeld, Germany) underwent two weeks of in vivo cyclic compressive loading of the left tibia. Loading was applied days/week (M-F) for weeks while mice were anesthetized ( cycles applied daily at Hz, delivering N loads, on the medial surface of the tibial mid-shaft, determined by prior in vivo strain gauging experiments; Fig. 1A). The right tibia served as physiologically loaded internal control. The first loading session occurred on the first day of in vivo imaging.

Figure 1:

Overall work flow and network architecture. (A) Experimental setup of applying load to the bone. (B) Imaged ROI and 3D raw data containing both right (control) and left (loaded) tibiae and fibulae. (C) Cropped 3D image containing only one tibia and fibula. Blue plane indicates an exemplary slice on which the padding step is applied. (D) Each slice (inside the colored stripes) is padded with border stripes of pixels (colored stripes: padding) in each direction: yellow and orange

posterior-anterior, red and green

lateral-medial directions. (E) Sum intensity projection of the 3D image in C after padding. (F) BAAM network architecture including convolutional layers (Con.), pooling layers (Pool), flattened layer (Flat), softmax layer and the four output classes.

2.2 In vivo Monitoring of Tissue Maturation and Adaptation

In total, CT image sets of both proximal tibiae and fibulae were collected during two weeks of tissue maturation (right limb) and adaptation (left limb). Both limbs, including the tibia and fibula, were scanned and combined within one imaging procedure (cf. Fig. 1B). In vivo CT was performed at an isotropic voxel size of (vivaCT40, Scanco Medical, Switzerland; kVp, mA, integration time, no frame averaging). The mice were scanned starting from the growth plate and was extended for slices () in the distal direction. To prevent motion artifacts and obtain reproducible scan regions and bone orientations, mice were anesthetized and kept during the scans in a fixed position using a custom-made mouse bed. In time intervals of days between imaging sessions, sets of images were acquired. Two weeks of aging in mice are approximately equal to one year of aging in humans Dutta and Sengupta (2016). Datasets were previously morphometrically analyzed using formation and resorption dynamics analysis software Birkhold et al. (2014a, b, 2015). Furthermore, previous analysis showed that repeated radiation (4 scans) did not effect bone microstructure Willie et al. (2013). The scanner was calibrated weekly against a hydroxyapative (HA) mineral phantom; and monthly for determining in-plane spatial resolution. Animal experiments were carried out according to the policies and procedures approved by the local legal representative (LAGeSo Berlin, G0333/09).

2.3 Image Preprocessing Pipeline

Raw data were reconstructed using standard filtered backprojection implemented in the software of the scanner. Resulting images were cropped to contain the tibia and fibula in independent image sets (Fig. 1C). These images varied considerably in size, location and orientation of the bone. Therefore, a preprocessing pipeline that standardizes the images is essential for training a deep learning model. The first step of this pipeline normalizes the sizes of the input images, in which, the algorithm determines the maximum extension in lateral-medial and anterior-posterior direction as well as the most distal bone part inside the image. Second, preserving their aspect ratios, a padding of the images in lateral-medial and anterior-posterior direction is performed (Fig. 1D). Therefore, a stripe of additional voxels with the same gray values is placed at the border of the image on the padded plane perpendicular to the padding direction. Next, images are cropped to the minimum -stack number (of all the images) from distal direction. The images are projected in medial-lateral direction onto the anterior-posterior / distal-proximal plane ensuing a image, which, due to the medial-lateral symmetry, nullifies the symmetry-related skeletal difference between the left and right tibia and fibula. After pre-processing, all 2D images are () by () pixels in size (Fig. 1E).

2.4 Assigning Datasets

Datasets were separated into three groups: 1) A training and validation set, 2) a test set to further eliminate the possibility of overfitting of the trained model, and 3) an application set. The training and validation sets consists of images from mice at all time points of the right control tibia. The test set contains images of two randomly selected mice separated from the training and validation set ( image per time point). The application set contains images of the left loaded tibia and fibula.

2.5 Data Augmentation

DNNs require a large amount of training data for stable convergence and high classification accuracy. We therefore performed data augmentation, where we synthetically increase the size of the training set with geometric transformations. As suggested by Krizhevsky et al. (2012); Simard et al. (2003), images of the training and validation sets are augmented by applying rotations ( to ; steps) and translations ( to ; steps in both directions) to increase training accuracy and further prevent overfitting. Maximum augmentation values were chosen to cover potential occurring deviations between images due to the imaging setup. This results in a total of images. Last, images are randomly separated into training and validation sets containing and of the augmented images, respectively.

2.6 Bone Age Assessment Model (BAAM) Network

A customized DNN consisting of layers with four convolutional, two pooling and one fully connected layer (Fig. 1F) was designed. The output layer consists of four classes: day , day , day , day . Its performance compared to other possible architectures was evaluated (Supplemental data A).

By passing an image through the convolutional layers, feature-maps of the image are produced at which the position of the encrypted patterns in kernels are accentuated. This leads to extraction of hidden features of the structure. The deeper the image goes through the network, the more complicated patterns are recognized by the network to perform a classification. At each convolutional layer, the feature map extraction is performed by activating the neurons with the rectified linear function and adding a set of bias terms. These weights and bias values are optimized at each training iteration to provide a higher accuracy.

Feature-maps are down-sampled after the and

convolutional layer with a window size and stride equal to

. The kernels in all convolutional layers are with a stride of . In the consecutive convolutional layers, and

feature maps are computed, respectively. We flatten the activation of the last convolutional layer into a vector and pass it to the ending fully connected layer with

and features from which the last one represents the

age classes. At last, a softmax function is applied on the flattened layer to calculate the probability distribution for each age class.

2.7 Training Algorithm

We trained the model with the Adam optimization algorithm Kingma and Ba (2014)

. The network is initialized with a truncated normal distribution function (standard deviation:

). Training is carried out for iterations with a batch size of images. At every step the model is applied on a batch of images of the validation set. The initial learning rate is set to , then an exponential decay at every step with a

rate is performed. Implementation, training, validation and testing of the network was performed using Google TensorFlow

Abadi et al. (2016) on a computer with a single Nvidia Geforce GTX 1070 GPU.

2.8 Network Performance Evaluation and Validation

Architecture and hyper parameters of BAAM network are chosen based on its accuracy in age prediction in validation and test sets. Based on a sensitivity analysis (A), the best performing DNN was chosen and its age prediction performance was validated by determining the accuracy of the network at three groups to i) verify the capability of BAAM to perform an end-to-end age prediction based on 3D CT image data, ii) eradicate the possibility of overfitting during prediction, iii) demonstrate the capability of transferring the age prediction capability from right to left tibia and fibula, and iv) demonstrate the robustness of the model to predict the age of mice that it has not been trained with: 1) randomly selected images of the validation set, 2) the images of the test set, which contains only images of mice it has not seen before, 3) all (n=) images of day 0 of the loaded left tibia and fibula of the application set. These samples represent physiologically loaded tibia and fibula, as at day of the experiment the left limb has not been subjected to loading treatment yet. Confusion matrices (CM) are determined for all three evaluations, further sensitivity was calculated for all time points.

2.9 Analysis of Key Skeletal Maturation Regions and Features

The trained BAAM network (after last training and validation step) applied to the only physiological loaded right tibia (each time point) is further evaluated to identify key regions and features describing the age of the bones. To determine the regions in the images that the network is focusing on for age prediction, saliency maps are calculated. Respectively, a backpropagation is calculated for computing the vanilla gradients

Simonyan et al. (2013); Erhan et al. (2009). The loss gradient is additionally backpropagated to the input data layer. By taking the L1-norm of the loss gradient of the input layer, the resulting heat map intuitively represents the importance of each pixel for age prediction. These maps convey the locations in the image at which the network focuses to predict the age of the bone. At last, saliency maps are normalized to a range to enable comparability between different images.

To determine the spatial localization of attention of the network in the process of age assessment, six subregions were defined within the proximal tibia and fibula. Therefore, first the tibia and fibula were manually segmented. Next, these two labels were further divided into regions with the same heights ( mm), o.e., proximal (T1, F1), middle (T2, F2), and distal (T3,F3), cf., Fig. 2

. The summation of intensity values of the saliency map in each region normalized to the summation of intensity values of the saliency map in the bone region (tibia and fibula) are defined as a measure to indicate the importance of each region for the age estimation (Attention,


Figure 2: Extracted labels for tibia (purple) and fibula (green) and the 6 regions with equal proximal-distal height on each label. T1-T3 and F1-F3 regions belong to tibia and fibula, accordingly.

2.10 Predicting Rejuvenation Effects on Skeletal Age

The bone age prediction model is further applied to the application set (Fig. 3). The predicted age of each bone at each time point is compared to the actual bone age to investigate rejuvenating effects of load-induced bone (re)modeling on skeletal age. Rejuvenation is defined as the delta age predicted at day 0 and day x divided by the delta time between day 0 and day x. Key regions and features describing the estimated rejuvenated age of the bones are determined (saliency maps).

Figure 3: First, the network is trained on physiologically loaded bones to predict age (top). Second, the trained network is applied on images of bones treated with additional loading to investigate the rejuvenation effects of treatment (bottom).

2.11 Correlations to 3D Bone Volume Changes

A set of 104 3D images were Gaussian filtered and binarized using a global threshold of 273/1000 (456 mg HA/cc), which was determined based on the grey value histogram of the whole ROI. Fibula and tibia were manually separated, automated segmentation was performed to separate trabecular and cortical bone regions of tibiae, as described earlier

Birkhold et al. (2015). Total tibia bone volume (tibia BV, ), tibia trabecular bone volume (tibia tb.BV, ), tibia cortical bone volume (tibia ct.BV, ), and fibula bone volume (fibula BV, ) are determined. Correlations between bone volumes and real/predicted age are determined.

2.12 Statistical Analysis

The effect of loading (left loaded tibia, right control tibia), region (tibia: T1, T2, T3; fibula: F1, F2, F3) and time point (day , , ,

) as well as interactions between terms was assessed using repeated measures ANOVAs. Differences between actual bone age and predicted bone age as well as between loaded and control bones were assessed by paired Student’s t-tests. Values are presented as mean

standard deviation and statistical significance was set at .

3 Results

3.1 Performance Evaluation and Bone Age Prediction

After training iterations, the accuracy of age prediction (training and validation sets) reached and with a loss of and , respectively. After iterations, the accuracy and loss values for training and validation sets were , , and , respectively (Fig. 4A-B). Therefore, the network was considered as trained after iterations.

Figure 4: Age prediction results and validation. (A) Accuracy of age prediction on training set (blue ) and validation set (black ) vs. training step. (B) Calculated loss value of age prediction for training set (blue ) and validation set (black ) vs. training step. Accuracy and loss of age prediction on the validation set is calculated at every

training step. (C) Confusion matrix of applying BAAM on validation set. (D) Confusion matrix of applying BAAM on test set. (E) Confusion matrix of applying BAAM on images with age of day

of application set. (C-E): Predicted age of samples are compared to their actual age. Values are normalized by the number of images per age class.

For randomly selected images of the validation set (, , and images of days , , and , respectively), the network correctly predicted all classes except for day , where were assigned correctly to day 10 and were assigned to day . The resulting confusion matrix for comparison of predicted and true age of the images is almost completely diagonal (Fig. 4C). In the test set, out of images were correctly predicted (Fig. 4D). Only one image of day 15 was wrongly classified as day 0. In the group of images of the left tibia acquired at day , out of images were predicted correctly, only one image was wrongly classified as day , resulting in accuracy (Fig. 4E). With these accuracy values above in all evaluations, we consider our network performing sufficient for age-prediction.

Accuracy in age prediction for validation and test sets of five similar, evaluated network architectures compared to BAAM are detailed in the supplemental data (A).

3.2 Decoding Bone Tissue Maturation Process

Saliency maps were calculated for all correctly classified images. One further image was excluded due to a different orientation of the bone. The different regions received different amounts of attention from the network during the process of age estimation (ANOVA; ; Fig. 5 A-F). Comparing the network attention devoted to different regions of the bone revealed, that at day 0 () and day 5 () T3 received significantly higher attention than all other regions (). At day 10, T2 and T3 () and at day 15 T1, T2 and T3 () received significantly higher attention than the other regions. In general, all tibial regions received at day 0 and day 15 higher attention than all fibular regions (). At day 5 and day 10, only T3 received a higher attention than all fibular regions ().

Figure 5: Maturation process.(A-F) Temporal changes in attention devoted by network to the six regions for age classification in control group. (A-C) Proximal tibia T1, T2, and T3, (D-F) Proximal fibula F1, F2, F3. Data shown as mean standard deviation. * indicates a significant difference. (G-J) 3D visualization of the same tibia and fibula with a longitudinal cut exposing bone micro-architecture for day 0, day 5, day 10 and day 15. The arrow indicates one area in which microstructural remodeling occurred over the 15 days. (K-N) Samples of projected images (gray scale image, top), saliency map (color coded from blue to yellow indicating low to high attention, middle) and the overlay of projected image and its saliency map (bottom). (K) day , (L) day , (M) day , (N) day .

The attention of the network to the different regions changed with time (ANOVA; ). In the tibia, the attention on T1 for age prediction is on day 0 () significantly higher than on day 5 (; ) and day 10 (, ). At day 15, only a trend could be identified (; ; Fig. 5A). T2 receives similar attention at all time points; , , , and . In regions T3 a continuous decrease in attention during maturation takes place (), leading to a significant reduced attention from day 10 onward (). In the fibula, attention to the F1 regions remains small throughout the 15 days (, , , ). The attention to F2 significantly increases with maturation (, ). It also significantly increases in each time step except the time step from day 10 to 15 (day , ). In region F3, attention jumps from at day 0 to at day 5 (). Afterwards a slight drop from (day 10) to occurs at day 15 (). A cross-sectional cut through a 3D representation of one bone at each time-point shows similarities and changes of the structures (Fig. 5G-J). Representative saliency maps are shown in (Fig. 5 K-N). Qualitative analysis of the attention devoted to tibia and fibula reveals that the network focuses on clusters of pixels for its analysis.

3.3 Identifying Rejuvenation Effects of in vivo Loading

To study potential rejuvenation effects of mechanical loading on bones, images of bones subjected to in vivo loading are analyzed (application set). At day , of bones were classified with their actual age. For the loaded bones at day 5, 10, and 15, predicted age differed noticeably from actual age (Fig. 6A). After days of loading, of the images were classified as being 5 days older, as being 5 days younger, and only were identified with their actual age. After days of loading, of the bones were classified as younger than their actual age ( as day and as day ). A total of were identified with their actual age and were classified as to be older. After days of loading, of the bones were classified as younger than their actual age. For , BAMM predicted that the images belong to mice that were 15 days younger, appeared to be 10 days younger and were classified as days younger than their actual age. Only of the bones were classified by its actual age, i.e., day .

Figure 6: Rejuvenation process. (A) Confusion matrix of applying the BAAM network on application set consisting of images of bones treated with extra loading. Predicted age of samples are compared to their actual age. Values are normalized by the number of images per age class. (B) Rejuvenation effect of extra loading treatment vs loading duration. Normalized number of rejuvenated images vs loading period: green. Normalized number of images classified with their actual age vs loading period: gray. Normalized number of images classified older than their actual age vs loading period: blue. The lines are degree polynomial fits to the scattered data. (C) Mean rejuvenation per age group with standard deviation as highlighted area. (D-G) Attention in proximal tibia (T1-T3) and fibula (F1-F3) in control and loaded bones. (D) Day 0. (E) Day 5. (F) Day 10. (G) Day 15. Values are shown as meanstandard deviation. * indicates a significant differences between loaded and control bones (student’s t-test, ).

Investigating the rejuvenation effects during the course of loading (Fig. 6B, green line) reveals that with increasing duration of loading (day day day ) higher percentage of images are classified younger than their actual age (). On the other hand, , , and of samples at day , and , respectively were classified older than their actual age. The amount of samples categorized with their actual age increases from day to day but decreases as loading treatment continues (, and for day , , and , respectively). Resulting mean assigned age was at day 5, at day 10 and at day 15. This results in rejuvenation effects of at day 5, at day 10 () and at day 15 (; Fig. 6C).

Loading affected the attention of the DNN network (ANOVA; ). Additionally, the distribution of attention between regions was affected by loading (ANOVA; ). Comparison of attention in different regions between loaded and control limb reveals that – with the exception of F1 () – there is no significant difference in attention in different regions between the left and right limbs at day 0 (Fig. 6D). After 5 days of loading; there is no significant difference in attention in T1-T3 and F1-F3 (Fig. 6E). After 10 and 15 days of loading; however, the attention in T3 regions is significantly higher in loaded compared to control limbs ( vs. , and vs. , at days 10 and 15, accordingly, (Fig. 6F-G).

3.4 Decoding Rejuvenation Effects of in vivo Loading

To analyze the importance of each region within the process of rejuvenation, saliency maps of images of day 15 of loaded bones predicted as days 0, 5, 10, and 15 were compared to the distribution of attention in control image of each of these specific time points (Fig. 7A-B).

Figure 7: Rejuvenation process. (A, B) Attention at bones subjected to 15 days of loading and classified as day 0, 5, 10, or 15 compared to attention of these groups in the control limbs. (A) Attention to T1-T3 regions (B) Attention to F1-F3 regions. Data shown as mean standard deviation. * indicates a significant difference (student’s t-test, ). (C-F) Samples of projected images (gray scale image, top), saliency map (color coded from blue to yellow indicating low to high attention, middle) and the overlap of projected image and its saliency map (bottom) from loaded group. (C) day . (D) day . (E) day . (F) day .

After 15 days of loading, the region T1, T2, and T3 received , , and T3 of the attention, respectively. F1, F2, adn F3 received , , and of the attention, respectively. In all tibial regions, no significant difference between bones predicted younger and control bones of these specific ages were found for any of the four time points. The single bone (imaged at day 15) predicted as day 0 received attention at T1 (control: ), at T2 (control: ) and at T3 (control: ). The bones predicted as day 5 received at T1 (control: ; ), at T2 (control: ; ) and at T3 (control: ; ). The bones predicted as day 10 received at T1 (control: ; ), at T2 (control: ; ) and at T3 (control: ; ). The bones predicted as day 15 received at T1 (control: ; ) and at T3 (control: ; ). Only for the bones, which were predicted as a loaded bone of age day 15, i.e., no effect of the loading was observed, a significant difference with respect to the control bones being imaged at day 15 could be identified at the T2 region (, vs. control: ; ).

The single bone predicted as day 0 received attention at F1 (control: ), at F2 (control: ) and at F3 (control: ). The bones predicted as day 5 received significant less attention at all fibular regions: F1 (control: ; ), F2 (control: ; ) and F3 (control: ; ). The bones predicted as day 10 received at F1 and F3 comparable attention as the control ones of day 10 ( control: ; ; ; control: ; ). F2 received less attention than the control bones of this age (; control: ; ). The bones predicted as day 15 received at F1 and F3 attention that is comparable to the control ones of day 15 ( control: ; ; ; control: ; ) and F2 received more attention than the control bones of this age (; control: ; ; Fig. 7B). Saliency maps for selected images from each of the four time points are given in Fig. 7C-F.

In summary, distribution of attention was in of the after 15 days analyzed loaded left tibia comparable to the right control of the time point (day 5, 10 or 15) they were predicted to resemble. In the fibula, of comparison cases showed no significant difference (9 comparison cases per tibia/fibula = 3 regions 3 time points). This further confirms that the load-induced bone (re)modeling in these mice generates similarities between older loaded bones and younger control bones, therefore a rejuvenation effect of loading can be concluded.

3.5 Correlations between Rejuvenation and Morphological Changes

The tibia and fibula total bone volume and cortical bone volume did not change significantly within the 15 days (Table 1). Tibial trabecular bone volume decreases significantly from day 0 () to day 15 (; ). Loading significantly affected total tibia bone volume (all time points; ), cortical tibia bone volume (all time points; ) and trabecular tibia bone volume (day 0 and 15; , Table 2). A significant, but weak correlation was found between Tibia BV in loaded limbs and real age, Tibia ct.BV and real age, and Tibia ct.BV and estimated age (Table 3). Additionally, trabecular bone volume (Tibia tb.BV) in control limb correlated significantly, but weak with real age. Trabecular bone volume (Tibia tb.BV) in loaded limb correlated weak, but significantly with estimated age (Table 3).

Morphometry Control Bones
day 0 day 5 day 10 day 15
Fibula BV []
Tibia BV []
Tibia tb.BV []
Tibia ct.BV []
Table 1: Morphometry parameters of tibia and fibula determined based on in vivo CT images at day 0, 5, 10, 15 of right control fibula and tibia subjected only to physiological loading. Data shown as mean standard deviation.
Morphometry Loaded Bones
day 0 day 5 day 10 day 15
Fibula BV []
Tibia BV []
Tibia tb.BV []
Tibia ct.BV []
Table 2: Morphometry parameters of tibia and fibula determined based on in vivo CT images at day 0, 5, 10, 15 of left fibula and tibia subjected to additional axial compression loading. Data shown as mean standard deviation. indicates a significant difference between loaded and control bones (student’s t-test, ).
Correlations Bone Volume vs. Time
Loaded Control
Real age Predicted age Real age
Fibula BV [] 0.01 0.59 0.02 0.29 0.00 0.96
Tibia BV [] 0.15 0.01 0.05 0.11 0.00 0.65
Tibia tb.BV [] 0.00 0.98 0.10 0.12
Tibia ct.BV [] 0.14 0.11 0.02 0.01 0.43
Table 3: Regression analysis: Correlations between morphology (bone volume) and time points (real age and predicted age). Weak correlation are shown in bold, indicates significance of correlations ().

4 Discussion

It is known that bone fracture resistance increases with skeletal maturation and aging, as well as in response to certain therapy. Whole bone fracture resistance is determined by bone quantity, which encompasses geometric, microarchitectural, and material properties (i.e., trabecular architecture, mineralization, crosslinking, microcracks). Little is known about the interplay between all of these properties/factors contributing to compromised or recovered bone quality. Most previous studies have focused on individual features, while lacking global optimization, as individual contributions of static (e.g. trabecular bone volume or bone mineral density distribution) or dynamic (e.g. bone formation rate) features. Here, we propose a deep learning approach to tackle this challenge by creating a model that utilizes the complete content of X-ray attenuation images. We developed a CT image-based method enabling an end-to-end age prediction with high accuracy, that allows identifying bone sub-regions relevant for this classification. We utilized the developed method to study skeletal tissue maturation and the localized rejuvenation effects of in vivo dynamic controlled loading on mouse bone tissue age.

Our results show that complex processes, such as bone tissue maturation and adaptation, can be reconstructed by in vivo imaging-based deep learning and quantitative analysis of learned features. Even subtle changes, as occurring during one remodeling cycle of mice bones Birkhold et al. (2015), could be identified, as our model predicts four time points between 26 and 28 week old mice based on in vivo CT images with an accuracy of more than . This classification is presumably linked to bone mass, shape, micro-architecture and material properties alteration through different length scales, as all these processes change in a site-specific manner during growth, maturation and aging. These changes take place by modeling, remodeling, mineralization and demineralization processes Parfitt (2001); Birkhold et al. (2014a); Buenzli et al. (2018); Lukas et al. (2013); Somerville et al. (2004); Lynch et al. (2011) and are directly or indirectly encoded in the image created by interaction of photons with matter, as the resulting attenuation is, besides bone mass in the photon beam, dependent on the local atomic number, and therefore calcium content.

Additionally, we show a link between the age prediction dynamics to age-related trabecular bone loss occurring between the age of 26 and 28 weeks Willie et al. (2013), which is in accordance with other studies reporting a loss in trabecular bone volume in C57Bl/6 mice from 26 weeks onward Glatt et al. (2007). Therefore, these mice are expected to be in a phase of starting trabecular bone loss. However, in the proximal fibula, we observed an increase of cortical bone volume in the same mice between 19 to 22 weeks Moustafa et al. (2009). Unfortunately, this and other studies investigated the fibula bone volume changes only in young mice during skeletal maturation Moustafa et al. (2009); Ko et al. (2011). In adult humans, higher deterioration of the tibia than the fibula with aging has been reported McNeil et al. (2009). Here we detected no significant changes in the fibula bone volume, therefore also no correlations between age prediction and fibula bone volume. However, it has to be taken into consideration, that our volume of interest was centered on the proximal tibia.

Here, we further showed, that localization of aging-related information in the bone identified by the model can be extracted in a quantitative manner. This localization of age-information effects, here quantified by attention distribution of the network extracted using a saliency map post-processing, differed between regions and was affected by time. The age information seems to be more manifested in tibia than in fibula, as at all time points more than of the attention of the network is focused on the tibia. In line with this, greatest structural changes could be identified in the trabecular region of the tibia, with a loss of bone volume. This trabecular loss has been previously quantified in more detail Birkhold et al. (2014a). In general, the most distal part of the proximal tibia (T3) received the greatest attention, but over time there was decreasing in attention. This is in line with previous described structural changes in the tibia, where a strong loss of trabeculae during adulthood Ferguson et al. (2003); Willie et al. (2013) and a nearly complete disappearance of trabeculae after 78 weeks Birkhold et al. (2014a) is reported. The second highest attention was received by the region located closest to the growth plate (T1), where longitudinal growth, and therefore modeling and primary (fast) mineralization occur. This growth persists with aging, although it slows down after puberty Halloran et al. (2002). Therefore, it can be assumed, that the network focuses on the regions with the most changes in bone volume and density occurring over the monitored 15 days. In the fibula the attention was the lowest in F1. However, it must be considered, that the most proximal part of the fibula is located below the tibia (see Fig. 5G-J). Attention to F2 and F3 varied with time, which might be linked to fibular (re)modeling. In general, attention was located in clusters (see Fig. 5K-N) not individual trabeculae, which let us conclude that mineralization (change in grey values) and (re)modeling together affect the age estimation.

Bones receiving additional loading were estimated to be younger. Already after day 5 of loading, predicted age starts to diverge from actual age, which might suggest a restructuring in response due to the new local loading conditions. From day 10 onward, the bones appear to be significantly younger (4 and 5 days younger after 10 and 15 days of loading, respectively). This might reflect an increase in bone strength after restructuring. However, this needs to be investigated in detail by analyzing orientation of individual trabeculae or in-silico modeling of bone strength. Previous studies showed in adult C57BL/6 mice adaptive adjustments in the shape of formation and resorption sites at trabecular Birkhold et al. (2014a); Lambers et al. (2015) and cortical sites Birkhold et al. (2017), mineralization dynamics Lukas et al. (2013) as well as material properties on a macro- and micro-scale Bergström et al. (2017) in response to loading. These adaptations have been shown to differ between different bone sites, such as endocortical and periosteal surface Birkhold et al. (2016) or at metaphyseal versus diaphyseal sites Birkhold et al. (2017); Bergström et al. (2017)

, leading to bone shape adaptation, as e.g. second moment of inertia at proximal metaphysis changes with loading in 22 week old mice

Carriero et al. (2018). All this information is potentially analyzed in a combined manner by the deep neural network. Here, we established a link to bone volume changes and found a significant correlation between predicted age and cortical as well as trabecular bone volume. Future studies will investigate further correlations, e.g., by quantifying structural changes in sites identified as age-determining.

To investigate, if the observed restructuring leads to a younger appearance of the bones, we compared the distribution of attention on loaded bones after 15 days, classified younger, to the control bones of the classified ages. The similarities of distribution of attention, together with the comparability of correlations in 1) predicted age and trabecular bone volume in loaded bones and 2) real age and trabecular bone volume in control limbs, lets us speculate, that loading induces tibial restructuring towards a younger appearing bone. This rejuvenation is strongly manifested in the dynamic trabecular structure. This is also the only bone compartment, where we detected an age-related short-term loss of bone volume in the control limb and a constant bone volume with time in the loaded limb. This finding is further supported by previous studies investigating trabecular bone gain in response to loading Lynch et al. (2011); Willie et al. (2013); Sugiyama et al. (2010) by comparing loaded proximal tibia of adult mice with internal nonloaded control limbs.

In the cortical compartment of the loaded tibia, a bone volume gain was found, which made this bone compartment most similar to the control bones of day 0. This finding is supported by Sugiyama et al., who found similar load-induced cortical volume gain in the proximal tibia of adult mice Sugiyama et al. (2010). In a previous study, we showed that adaptive bone formation in the lateral metaphyseal region is greater than in the medial region while medial adaptive resorption is greater than in the lateral region. A slight positive anterior and posterior remodeling balance was reported. Further, adaptation occurs by increasing periosteal formation Birkhold et al. (2017). This suggest a structural adaptation of cortical bone for the sake of increasing its moment of inertia as a response to the bending force. In the fibula, regardless of the observed rejuvenation, in out of comparisons, significant differences in attention distribution between loaded bones classified younger and the control bones of the classified age was observed. Therefore, since the fibula is in general assumed to be mechano-sensitive Moustafa et al. (2009), load-induced changes seem to trigger a fibula restructuring that does not lead to a younger appearance of the bone. However, it must be considered, that the imaging region was optimized for tibial analysis. Future studies might include a greater fibula region into the analysis as fibula adaptation is in general not well studied.

To conclude, loading seems to change the appearance of bones towards a younger age. In our study, this effect is greatest in tibiae than in fibulae and mainly manifested in the trabecular region, which is the compartment most affected by bone loss in early osteoporosis Osterhoff et al. (2016). Since fragility fractures in relatively young individuals are mainly vertebral compression fractures, therefore "trabecular fractures" Svedbom et al. (2014), an adaptation towards a younger trabecular bone might be in combination with an macroscopic adapted cortical shell contributing to reducing fracture risk in early osteoporosis.

Current methods assessing bone maturation and aging mainly focus on specific dynamic features Burghardt et al. (2010); Bouxsein et al. (2010); Mader et al. (2013); Birkhold et al. (2015); Rüegsegger et al. (1976); Macdonald et al. (2011). Recently developed machine learning methods, however, consume the complete image content while performing a classification task. Recently, first applications to preclinical and clinical image data of bone showed, that a machine learning classification into healthy and disease state is achievable Sharma et al. (2016); Singh et al. (2017)

. Going from there the next step, we developed a framework allowing for a tracking of dynamic bone changes in physiological aging/maturation conditions. Additionally, by training with physiologically loaded bones and applying the network on in vivo loaded bones, we demonstrate loading causes bone (re)modeling and we further analyzed the dynamics of this bone (re)modeling, without the need for transfer learning

Greenspan et al. (2016); Van Opbroek et al. (2015).

Compared to DNN developed for skeletal bone age assessment in pediatrics Spampinato et al. (2017); Torres et al. (2017); Lee et al. (2017), we therefore went further towards a prognostic tool. Saliency maps were previously suggested to qualitatively visualize the importance of different bones in maturation of pediatric hands Lee et al. (2017). Here, we show, for the first time, a quantitative analysis of the distribution of learned features by employing saliency maps Simonyan et al. (2013). This allows to identify the localization of age-relevant information in bone. Future investigations may analyze the dynamics of aging and treatment-induced regeneration in a site-specific manner.

Our study has also limitations. First, the chosen samples may not represent a larger population. However, we validated our method in a three-fold manner and received accuracy above . Additionally, structural parameters derived from CT and histomorphometric indices of bone formation, which we have measured previously in these mice, are similar to those reported in similarly aged female mice Holguin et al. (2013); Lynch et al. (2011). Adhering to the principles of the three Rs, specifically reduction of animal number, motivated us to re-examine these datasets, rather than perform new studies on additional mice. Second, skeletal aging in mice differs to that in humans Jilka (2013) and thus, translation of these results to human bone behavior requires further investigation. Since age-related bone loss in humans resembles to a certain extend findings in mice Ahlborg et al. (2003); Szulc et al. (2006), it can be speculated that a future application to human bone might resolve similar patterns.

Given the proven performance on reconstructing bone tissue maturation and adaptation processes, we expect our study to pave the way for future studies to investigate a wide variety of biological processes involving continuous morphological changes. This may include developmental and aging stages, the progression of diseases Yang et al. (2017); Pflanz et al. (2017) or the response to treatments, and dynamic processes that have often been reduced to binary classification problems. Automated computational analysis, as shown here, could reveal morphological changes at much earlier stages than recognized previously. Therefore, the effects of loading duration on overall bone adaptation might be another future application, as recently shorter loading durations have been suggested Yang et al. (2017); Sun et al. (2018). Furthermore, as features can be used to classify biological structures based on morphology Asgharzadeh et al. (2018), a combination of the proposed deep learning and saliency maps approach can lead to more detailed insights into the contribution of individual localized biological processes on the overall adaptation "state" of the bone.

Here, we described a method for estimating bone age as a surrogate for bone quality in mice imaged with CT. Our method can be rapidly translated to clinical applications by examining clinical CT (eg. HR-pQCT at a voxel size of 61 microns), which is increasingly used in clinical trials to investigate vBMD and microstructural changes in response to pharmacological treatments Cheung et al. (2014); Tsai et al. (2015).

5 Conclusions

We combined an experimental study with longitudinal imaging and a deep learning framework. This allowed for a bone (tissue) age prediction and an identification of bone sites that primarily contribute to age classification. Thus for the first time, we could directly track short-time aging in bone in a temporal manner. By quantitatively analyzing saliency maps of learned features, we could show that the metaphyseal parts closest and most distant to the growth plate are highly contributing to the temporal age information encoded in bone images during tissue maturation. We could further show that loading triggers dynamic processes leading to a younger appearance of the bone. More specifically we could temporally quantify these rejuvenating effects, as bone receiving 15 days of loading treatment was classified 5 days younger than the contra-lateral internal control. We demonstrated that our loading regime induces structural correspondence between younger and older tibiae, while in fibulae, despite causing (re)modeling, no rejuvenation effect could be detected. One possible biological interpretation of these findings is that loading recovers the age-related bone loss in the tibia - therefore it rejuvenates, whereas the fibula, which is at the age investigated in the present study does not incur bone loss, and therefore is adapting to the loading in a non-physiological manner (in terms of rejuvenation-related strengthening through bone (re)modeling). These findings and the introduced method provide an ideal framework to further improve our understanding of skeletal aging in mice as well as in humans. It further demonstrates that machine-learning based characterization can help to better monitor and understand dynamic changes in bones due to aging and disease and may help to optimize treatments for bone disease such as age-related bone loss and osteoporosis.


This study was partially supported by the Cluster of Excellence EXC 2075 “Data-Integrated Simulation Science (SimTech)” and by Shriners Hospital for Children-Canada.

Conflict of Interest Statement

Annette I. Birkhold (none), Bettina M. Willie (none), Pouyan Asgharzadeh (none), Oliver Röhrle (none).

Author’s Roles

Study design: AB and PA. Data collection: AB, BW. Contributing software tools: PA. Data analysis: PA, AB. AB and PA wrote the manuscript with the help of BW and OR. All authors take responsibility for the integrity of the data analysis.


  • Seeman and Delmas (2006) E. Seeman, P. D. Delmas, Bone quality—the material and structural basis of bone strength and fragility, N. Engl. J. Med. 354 (2006) 2250–2261.
  • Szulc et al. (2006) P. Szulc, E. Seeman, F. Duboeuf, E. Sornay-Rendu, P. D. Delmas, Bone fragility: failure of periosteal apposition to compensate for increased endocortical resorption in postmenopausal women, J. Bone Miner. Res. 21 (2006) 1856–1863.
  • Riggs et al. (2008) B. L. Riggs, L. J. Melton, R. A. Robb, J. J. Camp, E. J. Atkinson, L. McDaniel, S. Amin, P. A. Rouleau, S. Khosla, A population-based assessment of rates of bone loss at multiple skeletal sites: evidence for substantial trabecular bone loss in young adult women and men, J. Bone Miner. Res. 23 (2008) 205–214.
  • Willie et al. (2013) B. M. Willie, A. I. Birkhold, H. Razi, T. Thiele, M. Aido, B. Kruck, A. Schill, S. Checa, R. P. Main, G. N. Duda, Diminished response to in vivo mechanical loading in trabecular and not cortical bone in adulthood of female c57bl/6 mice coincides with a reduction in deformation to load, Bone 55 (2013) 335–346.
  • Roschger et al. (2008) P. Roschger, E. Paschalis, P. Fratzl, K. Klaushofer, Bone mineralization density distribution in health and disease, Bone 42 (2008) 456–466.
  • Koehne et al. (2014) T. Koehne, E. Vettorazzi, N. Küsters, R. Lüneburg, B. Kahl-Nieke, K. Püschel, M. Amling, B. Busse, Trends in trabecular architecture and bone mineral density distribution in 152 individuals aged 30–90years, Bone 66 (2014) 31–38.
  • Birkhold et al. (2014a) A. I. Birkhold, H. Razi, G. N. Duda, R. Weinkamer, S. Checa, B. M. Willie, The influence of age on adaptive bone formation and bone resorption, Biomaterials 35 (2014a) 9290–9301.
  • Birkhold et al. (2014b) A. I. Birkhold, H. Razi, G. N. Duda, R. Weinkamer, S. Checa, B. M. Willie, Mineralizing surface is the main target of mechanical stimulation independent of age: 3d dynamic in vivo morphometry, Bone 66 (2014b) 15–25.
  • Birkhold et al. (2017) A. I. Birkhold, H. Razi, G. N. Duda, S. Checa, B. M. Willie, Tomography-based quantification of regional differences in cortical bone surface remodeling and mechano-response, Calcif. Tissue Int. 100 (2017) 255–270.
  • Birkhold et al. (2016) A. I. Birkhold, H. Razi, G. N. Duda, R. Weinkamer, S. Checa, B. M. Willie, The periosteal bone surface is less mechano-responsive than the endocortical, Sci. Rep. 6 (2016) 23480.
  • Ahlborg et al. (2003) H. G. Ahlborg, O. Johnell, C. H. Turner, G. Rannevik, M. K. Karlsson, Bone loss and bone size after menopause, N. Engl. J. Med. 349 (2003) 327–334.
  • Kanis (2002) J. A. Kanis, Diagnosis of osteoporosis and assessment of fracture risk, The Lancet 359 (2002) 1929–1936.
  • Hernlund et al. (2013) E. Hernlund, A. Svedbom, M. Ivergård, J. Compston, C. Cooper, J. Stenmark, E. V. McCloskey, B. Jönsson, J. A. Kanis, Osteoporosis in the european union: medical management, epidemiology and economic burden, Arch. Osteoporos. 8 (2013) 136.
  • Kanis et al. (2008) J. Kanis, N. Burlet, C. Cooper, P. Delmas, J.-Y. Reginster, F. Borgstrom, R. Rizzoli, European guidance for the diagnosis and management of osteoporosis in postmenopausal women, Osteoporos. Int. 19 (2008) 399–428.
  • Riggs and Melton (2002) B. L. Riggs, L. J. Melton, Bone turnover matters: the raloxifene treatment paradox of dramatic decreases in vertebral fractures without commensurate increases in bone density, J. Bone Miner. Res. 17 (2002) 11–14.
  • Bolotin (2007) H. Bolotin, Dxa in vivo bmd methodology: an erroneous and misleading research and clinical gauge of bone mineral status, bone fragility, and bone remodelling, Bone 41 (2007) 138–154.
  • Schuit et al. (2004) S. Schuit, M. Van der Klift, A. Weel, C. De Laet, H. Burger, E. Seeman, A. Hofman, A. Uitterlinden, J. Van Leeuwen, H. Pols, Fracture incidence and association with bone mineral density in elderly men and women: the rotterdam study, Bone 34 (2004) 195–202.
  • Dall’Ara et al. (2012) E. Dall’Ara, D. Pahr, P. Varga, F. Kainberger, P. Zysset, Qct-based finite element models predict human vertebral strength in vitro significantly better than simulated dexa, Osteoporos. Int. 23 (2012) 563–572.
  • Zysset et al. (2013) P. K. Zysset, E. Dall’Ara, P. Varga, D. H. Pahr, Finite element analysis for prediction of bone strength, Bonekey Rep. 2 (2013) 386.
  • Samelson et al. (2019) E. J. Samelson, K. E. Broe, H. Xu, L. Yang, S. Boyd, E. Biver, P. Szulc, J. Adachi, S. Amin, E. Atkinson, C. Berger, L. Burt, R. Chapurlat, T. Chevalley, S. Ferrari, D. Goltzman, D. A. Hanley, M. T. Hannan, S. Khosla, C. Liu, M. Lorentzon, D. Mellstrom, B. Merle, M. Nethander, R. Rizzoli, E. Sornay-Rendu, B. Van Rietbergen, D. Sundh, A. Kin On Wong, C. Ohlsson, S. Demissie, D. P. Kiel, M. L. Bouxsein, Cortical and trabecular bone microarchitecture as an independent predictor of incident fracture risk in older women and men in the bone microarchitecture international consortium (bomic): a prospective study, Lancet Diabetes Endocrinol. 7 (2019) 34–43.
  • Ferguson et al. (2003) V. L. Ferguson, R. A. Ayers, T. A. Bateman, S. J. Simske, Bone development and age-related bone loss in male c57bl/6j mice, Bone 33 (2003) 387–398.
  • Halloran et al. (2002) B. P. Halloran, V. L. Ferguson, S. J. Simske, A. Burghardt, L. L. Venton, S. Majumdar, Changes in bone structure and mass with advancing age in the male c57bl/6j mouse, J. Bone Miner. Res. 17 (2002) 1044–1050.
  • Ko et al. (2011) C.-Y. Ko, D. H. Seo, H. S. Kim, Deterioration of bone quality in the tibia and fibula in growing mice during skeletal unloading: gender-related differences, J. Biomech. Eng. 133 (2011) 111003.
  • Moustafa et al. (2009) A. Moustafa, T. Sugiyama, L. K. Saxon, G. Zaman, A. Sunters, V. J. Armstrong, B. Javaheri, L. E. Lanyon, J. S. Price, The mouse fibula as a suitable bone for the study of functional adaptation to mechanical loading, Bone 44 (2009) 930–935.
  • Lukas et al. (2013) C. Lukas, D. Ruffoni, F. M. Lambers, F. A. Schulte, G. Kuhn, P. Kollmannsberger, R. Weinkamer, R. Müller, Mineralization kinetics in murine trabecular bone quantified by time-lapsed in vivo micro-computed tomography, Bone 56 (2013) 55–60.
  • Suk et al. (2014) H.-I. Suk, S.-W. Lee, D. Shen, A. D. N. Initiative, et al., Hierarchical feature representation and multimodal fusion with deep learning for ad/mci diagnosis, NeuroImage 101 (2014) 569–582.
  • Esteva et al. (2017) A. Esteva, B. Kuprel, R. A. Novoa, J. Ko, S. M. Swetter, H. M. Blau, S. Thrun, Dermatologist-level classification of skin cancer with deep neural networks, Nature 542 (2017) 115.
  • Mosekilde (1988) L. Mosekilde, Age-related changes in vertebral trabecular bone architecture—assessed by a new method, Bone 9 (1988) 247–250.
  • Burghardt et al. (2010) A. J. Burghardt, H. R. Buie, A. Laib, S. Majumdar, S. K. Boyd, Reproducibility of direct quantitative measures of cortical bone microarchitecture of the distal radius and tibia by hr-pqct, Bone 47 (2010) 519–528.
  • Bouxsein et al. (2010) M. L. Bouxsein, S. K. Boyd, B. A. Christiansen, R. E. Guldberg, K. J. Jepsen, R. Müller, Guidelines for assessment of bone microstructure in rodents using micro–computed tomography, J. Bone Miner. Res. 25 (2010) 1468–1486.
  • Mader et al. (2013) K. S. Mader, P. Schneider, R. Müller, M. Stampanoni, A quantitative framework for the 3d characterization of the osteocyte lacunar system, Bone 57 (2013) 142–154.
  • Birkhold et al. (2015) A. I. Birkhold, H. Razi, R. Weinkamer, G. N. Duda, S. Checa, B. M. Willie, Monitoring in vivo (re) modeling: a computational approach using 4d microct data to quantify bone surface movements, Bone 75 (2015) 210–221.
  • Rüegsegger et al. (1976) P. Rüegsegger, U. Elsasser, M. Anliker, H. Gnehm, H. Kind, A. Prader, Quantification of bone mineralization using computed tomography, Radiology 121 (1976) 93–97.
  • Macdonald et al. (2011) H. M. Macdonald, K. K. Nishiyama, J. Kang, D. A. Hanley, S. K. Boyd, Age-related patterns of trabecular and cortical bone loss differ between sexes and skeletal sites: a population-based hr-pqct study, J. Bone Miner. Res. 26 (2011) 50–62.
  • Shen et al. (2017) D. Shen, G. Wu, H.-I. Suk, Deep learning in med. image anal., Annual review of Biomed. Eng. 19 (2017) 221–248.
  • Litjens et al. (2017) G. Litjens, T. Kooi, B. E. Bejnordi, A. A. A. Setio, F. Ciompi, M. Ghafoorian, J. A. Van Der Laak, B. Van Ginneken, C. I. Sánchez, A survey on deep learning in med. image anal., Med. Image Anal. 42 (2017) 60–88.
  • Spampinato et al. (2017) C. Spampinato, S. Palazzo, D. Giordano, M. Aldinucci, R. Leonardi, Deep learning for automated skeletal bone age assessment in x-ray images, Med. Image Anal. 36 (2017) 41–51.
  • Torres et al. (2017) F. Torres, M. A. Bravo, E. Salinas, G. Triana, P. Arbeláez,

    Bone age detection via carpogram analysis using convolutional neural networks,

    in: 13th International Conference on Medical Information Processing and Analysis, volume 10572, International Society for Optics and Photonics, 2017, p. 1057217.
  • Lee et al. (2017) H. Lee, S. Tajmir, J. Lee, M. Zissen, B. A. Yeshiwas, T. K. Alkasab, G. Choy, S. Do, Fully automated deep learning system for bone age assessment, J. Digit. Imaging 30 (2017) 427–441.
  • Dutta and Sengupta (2016) S. Dutta, P. Sengupta, Men and mice: relating their ages, Life Sci. 152 (2016) 244–248.
  • Krizhevsky et al. (2012) A. Krizhevsky, I. Sutskever, G. E. Hinton, Imagenet classification with deep convolutional neural networks, in: Adv. Neural. Inf. Process. Syst., 2012, pp. 1097–1105.
  • Simard et al. (2003) P. Y. Simard, D. Steinkraus, J. C. Platt, Best practices for convolutional neural networks applied to visual document analysis, in: null, IEEE, 2003, p. 958.
  • Kingma and Ba (2014) D. P. Kingma, J. Ba, Adam: A method for stochastic optimization, arXiv preprint (2014) arXiv:1412.6980.
  • Abadi et al. (2016) M. Abadi, P. Barham, J. Chen, Z. Chen, A. Davis, J. Dean, M. Devin, S. Ghemawat, G. Irving, M. Isard, M. Kudlur, J. Levenberg, R. Monga, S. Moore, D. G. Murray, B. Steiner, P. Tucker, V. Vasudevan, P. Warden, M. Wicke, Y. Yu, X. Zheng, G. Brain, Tensorflow: a system for large-scale machine learning., in: OSDI, volume 16, 2016, pp. 265–283.
  • Simonyan et al. (2013) K. Simonyan, A. Vedaldi, A. Zisserman, Deep inside convolutional networks: Visualising image classification models and saliency maps, arXiv preprint (2013) arXiv:1312.6034.
  • Erhan et al. (2009) D. Erhan, Y. Bengio, A. Courville, P. Vincent, Visualizing higher-layer features of a deep network, University of Montreal 1341 (2009) 1.
  • Parfitt (2001) A. M. Parfitt, Skeletal heterogeneity and the purposes of bone remodeling: implications for the understanding of osteoporosis, in: Osteoporosis (Second Edition), Elsevier, 2001, pp. 433–447.
  • Buenzli et al. (2018) P. R. Buenzli, C. Lerebours, A. Roschger, P. Roschger, R. Weinkamer, Late stages of mineralization and their signature on the bone mineral density distribution, Connect. Tissue Res. 59 (2018) 74–80.
  • Somerville et al. (2004) J. Somerville, R. M. Aspden, K. Armour, K. Armour, D. M. Reid, Growth of c57bl/6 mice and the material and mechanical properties of cortical bone from the tibia, Calcif. Tissue Int. 74 (2004) 469–475.
  • Lynch et al. (2011) M. E. Lynch, R. P. Main, Q. Xu, T. L. Schmicker, M. B. Schaffler, T. M. Wright, M. C. van der Meulen, Tibial compression is anabolic in the adult mouse skeleton despite reduced responsiveness with aging, Bone 49 (2011) 439–446.
  • Glatt et al. (2007) V. Glatt, E. Canalis, L. Stadmeyer, M. L. Bouxsein, Age-related changes in trabecular architecture differ in female and male c57bl/6j mice, J. Bone Miner. Res. 22 (2007) 1197–1207.
  • McNeil et al. (2009) C. J. McNeil, G. H. Raymer, T. J. Doherty, G. D. Marsh, C. L. Rice, Geometry of a weight-bearing and non-weight-bearing bone in the legs of young, old, and very old men, Calcif. Tissue Int. 85 (2009) 22–30.
  • Lambers et al. (2015) F. M. Lambers, G. Kuhn, C. Weigt, K. M. Koch, F. A. Schulte, R. Müller, Bone adaptation to cyclic loading in murine caudal vertebrae is maintained with age and directly correlated to the local micromechanical environment, J. Biomech. 48 (2015) 1179–1187.
  • Bergström et al. (2017) I. Bergström, J. G. Kerns, A. Törnqvist, C. Perdikouri, N. Mathavan, A. Koskela, H. Henriksson, J. Tuukkanen, G. Andersson, H. Isaksson, Compressive loading of the murine tibia reveals site-specific micro-scale differences in adaptation and maturation rates of bone, Osteoporos. Int. 28 (2017) 1121–1131.
  • Carriero et al. (2018) A. Carriero, A. Pereira, A. Wilson, S. Castagno, B. Javaheri, A. Pitsillides, M. Marenzana, S. Shefelbine, Spatial relationship between bone formation and mechanical stimulus within cortical bone: Combining 3d fluorochrome mapping and poroelastic finite element modelling, Bone Rep. 8 (2018) 72–80.
  • Sugiyama et al. (2010) T. Sugiyama, J. S. Price, L. E. Lanyon, Functional adaptation to mechanical loading in both cortical and cancellous bone is controlled locally and is confined to the loaded bones, Bone 46 (2010) 314–321.
  • Osterhoff et al. (2016) G. Osterhoff, E. F. Morgan, S. J. Shefelbine, L. Karim, L. M. McNamara, P. Augat, Bone mechanical properties and changes with osteoporosis, Injury 47 (2016) S11–S20.
  • Svedbom et al. (2014) A. Svedbom, M. Ivergård, E. Hernlund, R. Rizzoli, J. A. Kanis, Epidemiology and economic burden of osteoporosis in switzerland, Arch. Osteoporos. 9 (2014) 187.
  • Sharma et al. (2016) G. B. Sharma, D. D. Robertson, D. A. Laney, M. J. Gambello, M. Terk, Machine learning based analytics of micro-MRI trabecular bone microarchitecture and texture in type 1 Gaucher disease, J. Biomech. 49 (2016) 1961–1968.
  • Singh et al. (2017) A. Singh, M. K. Dutta, R. Jennane, E. Lespessailles, Classification of the trabecular bone structure of osteoporotic patients using machine vision, Comput. Biol. Med. 91 (2017) 148–158.
  • Greenspan et al. (2016) H. Greenspan, B. Van Ginneken, R. M. Summers, Guest editorial deep learning in medical imaging: Overview and future promise of an exciting new technique, IEEE Trans. Med. Imaging 35 (2016) 1153–1159.
  • Van Opbroek et al. (2015) A. Van Opbroek, M. A. Ikram, M. W. Vernooij, M. De Bruijne, Transfer learning improves supervised image segmentation across imaging protocols, IEEE Trans. Med. Imaging 34 (2015) 1018–1030.
  • Lee et al. (2017) H. Lee, S. Tajmir, J. Lee, M. Zissen, B. A. Yeshiwas, T. K. Alkasab, G. Choy, S. Do, Fully Automated Deep Learning System for Bone Age Assessment, J. Digit. Imaging 30 (2017) 427–441.
  • Holguin et al. (2013) N. Holguin, M. D. Brodt, M. E. Sanchez, A. A. Kotiya, M. J. Silva, Adaptation of tibial structure and strength to axial compression depends on loading history in both c57bl/6 and balb/c mice, Calcif. Tissue Int. 93 (2013) 211–221.
  • Jilka (2013) R. L. Jilka, The relevance of mouse models for investigating age-related bone loss in humans, Journals of Gerontology Series A: Biomed. Sci.s and Medical Sciences 68 (2013) 1209–1217.
  • Yang et al. (2017) H. Yang, L. Albiol, W.-L. Chan, D. Wulsten, A. Seliger, M. Thelen, T. Thiele, L. Spevak, A. Boskey, U. Kornak, S. Checa, B. Willie M., Examining tissue composition, whole-bone morphology and mechanical behavior of gorabprx1 mice tibiae: A mouse model of premature aging, J. Biomech. 65 (2017) 145–153.
  • Pflanz et al. (2017) D. Pflanz, A. I. Birkhold, L. Albiol, T. Thiele, C. Julien, A. Seliger, E. Thomson, I. Kramer, M. Kneissel, G. N. Duda, U. Kornak, S. Checa, B. Willie M., Sost deficiency led to a greater cortical bone formation response to mechanical loading and altered gene expr., Sci. Rep. 7 (2017) 9435.
  • Yang et al. (2017) H. Yang, R. E. Embry, R. P. Main, Effects of loading duration and short rest insertion on cancellous and cortical bone adaptation in the mouse tibia, PloS one 12 (2017) e0169519.
  • Sun et al. (2018) D. Sun, M. D. Brodt, H. M. Zannit, N. Holguin, M. J. Silva, Evaluation of loading parameters for murine axial tibial loading: Stimulating cortical bone formation while reducing loading duration, J. Orthop. Res. 36 (2018) 682–691.
  • Asgharzadeh et al. (2018) P. Asgharzadeh, B. Özdemir, R. Reski, A. I. Birkhold, O. Röhrle, Feature-based classification of protein networks using confocal microscopy imaging and machine learning, PAMM 18 (2018) 1–2.
  • Cheung et al. (2014) A. M. Cheung, S. Majumdar, K. Brixen, R. Chapurlat, T. Fuerst, K. Engelke, B. Dardzinski, A. Cabal, N. Verbruggen, S. Ather, et al., Effects of odanacatib on the radius and tibia of postmenopausal women: improvements in bone geometry, microarchitecture, and estimated bone strength, J. Bone Miner. Res. 29 (2014) 1786–1794.
  • Tsai et al. (2015) J. N. Tsai, A. V. Uihlein, S.-A. M. Burnett-Bowie, R. M. Neer, Y. Zhu, N. Derrico, H. Lee, M. L. Bouxsein, B. Z. Leder, Comparative effects of teriparatide, denosumab, and combination therapy on peripheral compartmental bone density, microarchitecture, and estimated strength: the data-hrpqct study, J. Bone Miner. Res. 30 (2015) 39–45.

Appendix A Sensitivity Analysis of Network Architecture and Hyper Parameters

Various networks with different architectures and hyper parameters have been trained on the training set and applied on validation and test sets. The best performing network based on the accuracy of age prediction on validation and test sets has been selected to analyze the application set. Although, there exist unlimited variations of these elements, only a subset could provide meaningful results for the aim of decoding the aging and rejuvenation processes. Therefore, we designed and tested different networks variations of network depth, Network layout (convolutional (C), pooling (P) and fully connected (F)) and the size of kernel in each convolutional layer. Here we present the comparison of network performances for five selected designed network in which the key parameters are tweaked to reach the highest prediction accuracy. The five networks have following details:

  1. Network 1:

    • Depth: 7.

    • Layout: C, C, P, C, C, P, F.

    • Kernel sizes: 8, 16, 16, 32, 64, 64, 64.

    • Number of iterations: 7000.

  2. Network 2:

    • Depth: 7.

    • Layout: C, C, P, C, C, P, F.

    • Kernel sizes: 16, 32, 32, 64, 128, 128, 128.

    • Number of iterations: 7000.

  3. Network 3:

    • Depth: 10.

    • Layout: C, C, P, C, C, P, C, C, P, F.

    • Kernel sizes: 4, 4, 4, 8, 8, 8, 16, 32, 32, 32.

    • Number of iterations: 7000.

  4. Network 4:

    • Depth: 8.

    • Layout: C, C, P, C, C, P, F, F.

    • Kernel sizes: 4, 8, 8, 16, 32, 32, 32, 32.

    • Number of iterations: 7000.

  5. Network 5:

    • Depth: 4.

    • Layout: C, C, P, F.

    • Kernel sizes: 4, 8, 8, 8.

    • Number of iterations: 7000.

Analyzing the accuracy and loss values of age prediction during training of different networks (Fig. 8) shows that despite a correlation between depth of the its accuracy, going infinitely deeper does not necessary lead to better results. Evidently, the network with lowest number of layers (network 5) has the weakest performance by reaching accuracy with loss value of (Fig. 8A-B gray line). While, BAAM, network 1, network 2 and network 3 with depth of , , and respectively have similarly the best performances ( accuracy: Fig. 8A-B black, green, red and yellow lines respectively). Therefore, after a certain depth, no extra performance is gained. The reached accuracy and loss values for all the networks after training iterations is shown in Table 4.

Figure 8: Network performances during training. (A) Accuracy of age prediction for training set vs training iterations for sample networks and BAAM. (B) Smoothed loss value of age prediction for training set vs training iterations for sample networks and BAAM.
Network Depth Validation Accuracy Loss Test Accuracy
Table 4: Network performances. The accuracy and loss values of the networks with different architecture at the end of training iterations.

Next, comparing the prediction results of different networks for validation set based on their confusion matrices further demonstrates that networks 1-4 (Fig. 9A-D) have similar good performance and network 5 (Fig. 9E) fails to predict day images correctly. It is further seen that BAAM (Fig. 9F) provides the most diagonal confusion matrix hence is the best performing network.

Figure 9: Results of age prediction for validation set. (A)-(E): Networks 1-5 accordingly. (F): BAAM. In A-F Predicted age of samples are compared to their actual age. Values are normalized by the number of images per age class.

At last, the age prediction performance of the networks on the test set demonstrates the superiority of BAAM to other designed networks. While, networks - achieve , , , and correct predictions out of images of test set respectively (Fig. 10A-E), BAAM reaches the highest number of correct age prediction with out (Fig. 10F).

Figure 10: Results of age prediction for test set. (A)-(E): Networks 1-5 accordingly. (F): BAAM. In A-F Predicted age of samples are compared to their actual age. Values are normalized by the number of images per age class.

The analysis of overall achievements of presented networks as examples of variance in characteristics of DNNs lead to designing the BAAM network . It over-performs the sample presented networks in age prediction for training, validation and test sets. Therefore, it is further utilized to decode the aging and rejuvenation processes.