Local Causal Structure Learning and its Discovery Between Type 2 Diabetes and Bone Mineral Density

Type 2 diabetes (T2DM), one of the most prevalent chronic diseases, affects the glucose metabolism of the human body, which decreases the quality of life and places a heavy burden on social medical care. Patients with T2DM are more likely to suffer bone fragility fractures, as diabetes affects bone mineral density (BMD). However, discovering the determinant factors of BMD through medical studies is expensive and time-consuming. In this paper, we propose a novel algorithm, Prior-Knowledge-driven local Causal structure Learning (PKCL), to discover the underlying causal mechanism between BMD and its factors from clinical data. Since data in medicine are limited but prior knowledge is abundant, PKCL makes full use of the prior knowledge to mine the local causal structure for the target relationship. By combining medical prior knowledge with the discovered causal relationships, PKCL can achieve more reliable results without long-running medical statistical experiments. Extensive experiments are conducted on a newly provided clinical data set. The experimental results of PKCL on the data are shown to correspond closely with existing medical knowledge, which demonstrates the superiority and effectiveness of PKCL. To illustrate the importance of prior knowledge, the result of the algorithm without prior knowledge is also investigated.


1 Introduction

Diabetes mellitus is one of the most common chronic diseases, characterized by high levels of blood glucose, and type 2 diabetes mellitus (T2DM) is its most frequent subtype. T2DM and its complications cause a variety of health problems and bring heavy economic burdens to individuals worldwide [1]. Osteoporosis is a common skeletal disease characterized by decreased bone density and deterioration of the normal bone microstructure, predisposing to an increased risk of bone fracture [2]. Osteoporosis leads to a decrease in physical function and impaired quality of life. Moreover, bone fracture due to osteoporosis causes increased disability and mortality, and a great economic burden on family and society [3].

Measurement of bone mineral density by dual X-ray absorptiometry (DXA) is the most commonly used approach to diagnose osteoporosis[4]. Decreased BMD reflects a reduction in bone strength that is closely linked to increased bone fracture risk. Osteoporosis-related bone fracture frequently occurs in patients with T2DM[2, 5]. Notably, although patients with T2DM have a higher risk of osteoporosis-related bone fracture than non-diabetic individuals, their BMD is not necessarily reduced[6, 7]. As suggested in a recent meta-analysis by Vestergaard, BMD is even increased in patients with T2DM compared with non-diabetic individuals[8].

Many factors affect BMD in diabetic conditions. Traditional large longitudinal prospective studies are helpful for unraveling the determinant factors of BMD in T2DM. However, such studies are so expensive in terms of cost and time that they can hardly reach a conclusion within a short period. In addition, studies on the determinants of BMD in T2DM require complicated data analysis and data processing due to the complexity and complications of T2DM. Existing methods for finding the relationship between risk factors and BMD mostly rely on experts' knowledge and manual analysis of clinical data, which is time-consuming and costly. Furthermore, they cannot identify the underlying causal mechanism between risk factors and BMD in T2DM.

To automatically identify the risk factors of BMD and discover the underlying causal mechanism among them, intelligent algorithms should be developed. Traditionally, Bayesian network (BN) structure learning algorithms can learn the causal mechanism from data. However, in the medical field, the number of clinical samples is not sufficient for a BN structure learning algorithm to discover the real underlying causal mechanism. Moreover, as BMD is affected by numerous factors, traditional BN structure learning algorithms cannot be applied to such a large number of factors. Considering that much existing medical knowledge is not exploited, this paper proposes a new BN structure learning algorithm (PKCL), which learns the underlying causal mechanism between BMD and its factors while incorporating rich existing prior knowledge. Because prior knowledge is incorporated when learning the BN structure, some of the parameters of the model are determined by the prior knowledge. Thus, PKCL can deal with a large number of factors. Benefiting from prior knowledge, PKCL provides insight into complicated diseases and offers useful information for clinical experiments. Our contributions are summarized in the following three aspects:

  1. Aiming at clinical data with scarce samples but abundant prior knowledge, a new framework is presented to learn a more accurate model.

  2. A structure learning algorithm, PKCL, is proposed to utilize the prior knowledge as well as the causal information to detect the causal relationships in clinical data.

  3. We summarize the prior knowledge of experts about BMD and its risk factors. Conditioned on that, we discover the underlying causal mechanism between BMD and its risk factors.

2 Related Work

It is accepted that patients with T2DM have a higher risk of osteoporosis-related bone fracture than those without diabetes[9, 10, 5]. Measurement of BMD is used as the gold standard for diagnosing osteoporosis. Nevertheless, whether BMD decreases in T2DM remains paradoxical according to current clinical studies.

A number of factors affect BMD in diabetic conditions, such as sex, body mass index (BMI), insulin, and glucose. The prevalence of higher BMD in T2DM is similar in men and women across racial and ethnic groups including Mexican American, white, and black people[11, 12, 13]. BMI is strongly associated with BMD in T2DM and might explain, in part, the higher BMD in T2DM compared with non-diabetic individuals[14]. Insulin resistance and hyperinsulinemia, which are characteristics of T2DM, have effects on bone metabolism. High levels of circulating insulin may contribute to high BMD, and there is evidence in preclinical models that altered insulin levels and insulin resistance affect bone remodeling via direct effects on osteoblasts, osteoclasts, and osteocytes, all of which express insulin receptors[15]. Hyperglycemia is associated with the accumulation of advanced glycation end-products (AGEs) in the bone matrix, and AGEs inhibit bone formation, an effect mediated at least in part by increased osteocyte sclerostin production[16, 17]. Given that the determinants of BMD are complicated, deriving causality will help to elucidate what drives bone mineral density in T2DM, which is beneficial for preventing and treating osteoporosis-related bone fracture in T2DM.

However, the selection of the most relevant risk factors has rarely been studied. The existing approaches mainly depend on the analysis and experience of experts, which is neither cost-effective nor time-efficient. In addition, they cannot analyze the risk factors of a complicated disease from a data perspective.

In recent studies, feature selection (FS) has been applied to several tasks including classification, regression, and clustering. A number of FS methods[18, 19, 20, 21], which exploit different criteria to select the most informative features, have been proposed in the literature. They can roughly be divided into three classes: filter, wrapper, and embedding methods[22]. However, these three classes cannot discover the underlying causal relationship between features and targets. Moreover, their FS criteria lack a theoretical proof of optimality. Markov Blanket (MB) algorithms have been shown to outperform traditional FS algorithms, as the MB is proved to be the optimal feature subset[23, 24, 25]. Moreover, MB algorithms can discover the underlying causal mechanisms of the selected features by combining causal feature selection and causal discovery.

Generally, MB discovery algorithms can be grouped into two main types: nontopology-based and topology-based. Nontopology-based MB algorithms exploit independence tests between feature variables and target variables to discover the MB heuristically. Koller and Sahami (KS)[26] first proposed an approximate algorithm to find the MB, which minimizes the cross-entropy loss by pruning out redundant variables in a backward way. Because KS is unsound, many nontopology-based algorithms have been proposed to improve on it. The Grow-Shrink algorithm (GS)[27] first tests variables, sorted by their mutual information with the target variable, and adds them to the MB set in the growth stage. Then the shrinking stage eliminates false-positive nodes from the resulting MB set. Based on GS, the incremental association MB algorithm (IAMB)[28] improves the performance of GS by re-sorting the variables each time the MB set changes. After that, numerous variants of IAMB have been proposed, including IAMBnPC, inter-IAMB, and KIAMB[29]. However, as the number of variables grows, the number of samples needed grows exponentially. If the sample data are insufficient, the performance of IAMB and its variants degrades.
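The grow and shrink phases described above can be sketched as follows. This is a minimal illustration rather than the reference GS implementation: the function names, the `is_independent` oracle, and the `assoc` scores are hypothetical stand-ins for a real statistical test and a mutual-information estimate. The toy oracle encodes the graph G -> A -> T -> C <- S with an isolated node N, and the inflated score of G is an assumed noisy estimate chosen so that the shrink phase has a false positive to remove.

```python
def grow_shrink(target, candidates, is_independent, assoc):
    """Return an estimate of the Markov blanket of `target`."""
    # GS ranks candidates once, by their association with the target.
    ordered = sorted(candidates, key=lambda v: assoc(v, target), reverse=True)
    blanket = []
    added = True
    while added:  # growth stage: add any variable still dependent on the target
        added = False
        for x in ordered:
            if x not in blanket and not is_independent(x, target, tuple(blanket)):
                blanket.append(x)
                added = True
                break
    for x in list(blanket):  # shrink stage: drop false positives
        rest = tuple(v for v in blanket if v != x)
        if is_independent(x, target, rest):
            blanket.remove(x)
    return set(blanket)


def toy_oracle(x, target, cond):
    """Hand-written independence oracle for G -> A -> T -> C <- S, N isolated."""
    if x in ("A", "C"):      # adjacent to T: never independent
        return False
    if x == "S":             # spouse: dependent only once the collider C is given
        return "C" not in cond
    if x == "G":             # grandparent: screened off by A
        return "A" in cond
    return True              # N is unrelated to T


toy_scores = {"G": 0.55, "A": 0.5, "C": 0.4, "S": 0.0, "N": 0.0}
blanket = grow_shrink("T", ["G", "A", "C", "S", "N"], toy_oracle,
                      lambda v, t: toy_scores[v])
```

In this toy run G enters during growth, because A is not yet in the conditioning set, and is removed again during shrinking.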

Because data are limited in real-world applications, topology-based methods have been proposed to improve data efficiency while keeping a reasonable time cost. Min-max MB (MMMB)[30] discovers the MB set by first finding the parents-and-children set and then finding the spouses, so that the required sample size depends only on the Directed Acyclic Graph (DAG) structure rather than on the number of variables. Although MMMB was later proved to be unsound[31], its two-step approach to discovering the MB set is the foundation of subsequent methods. HITON-MB[31] inherits the framework of MMMB and interleaves the two steps to exclude false positives from the parents-and-children (PC) sets as early as possible, which decreases the number of independence tests (ITs) needed later. However, both MMMB and HITON-MB are unsound due to the incorrectness of their PC discovery. The parents-and-children-based MB algorithm (PCMB)[29], the first sound topology-based MB algorithm, was then introduced by Peña et al.; it uses a double-check strategy to fix the errors in PC discovery. After that, the iterative parents-and-children-based MB (IPCMB) algorithm[32] was proposed based on PCMB to discover the PC set more efficiently. Recently, the simultaneous MB algorithm (STMB)[33] was developed to improve the time efficiency of MB algorithms by exploiting the coexistence property between descendants and spouses.

Although MB algorithms can discover the underlying causal mechanism between variables and targets, they cannot recognize the direction of the dependency. Through BN structure learning, a DAG over all nodes can be constructed using the local MB sets. One approach to learning the BN structure is constraint-based, which discovers the arcs between node pairs by conditional independence (CI) tests. However, the number of CI tests needed grows exponentially with the number of nodes. Moreover, as each CI test builds on the results of earlier ones, errors inevitably escalate. Another approach to learning the BN structure is score-based.
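To make the notion of a conditional independence test concrete, the following sketch computes the G² (likelihood-ratio) statistic for discrete data, one common choice in constraint-based learning. The paper does not state which test it uses, so this is an assumption; the function name, the row format, and the toy counts are hypothetical. The degrees of freedom are counted over the observed configurations of the conditioning set, and the p-value would come from a chi-square distribution with that many degrees of freedom.

```python
import math
from collections import Counter


def g2_statistic(rows, x, y, z=()):
    """Return (G2, dof) for testing X independent of Y given Z.

    rows: list of dicts mapping variable name -> discrete value.
    """
    n_xyz, n_xz, n_yz, n_z = Counter(), Counter(), Counter(), Counter()
    for r in rows:
        zv = tuple(r[v] for v in z)
        n_xyz[(r[x], r[y], zv)] += 1
        n_xz[(r[x], zv)] += 1
        n_yz[(r[y], zv)] += 1
        n_z[zv] += 1
    g2 = 0.0
    for (xv, yv, zv), n in n_xyz.items():
        # Expected count under independence within this stratum of Z.
        expected = n_xz[(xv, zv)] * n_yz[(yv, zv)] / n_z[zv]
        g2 += 2.0 * n * math.log(n / expected)
    x_levels = len({r[x] for r in rows})
    y_levels = len({r[y] for r in rows})
    dof = (x_levels - 1) * (y_levels - 1) * len(n_z)
    return g2, dof


def expand(counts, z_value):
    """Turn a {(x, y): count} table into one row per observation."""
    return [{"X": xv, "Y": yv, "Z": z_value}
            for (xv, yv), n in counts.items() for _ in range(n)]


# Toy counts: X and Y are exactly independent within each stratum of Z,
# but associated once Z is ignored.
rows = (expand({(0, 0): 4, (0, 1): 2, (1, 0): 2, (1, 1): 1}, 0)
        + expand({(1, 1): 4, (1, 0): 2, (0, 1): 2, (0, 0): 1}, 1))
g2_conditional, dof_conditional = g2_statistic(rows, "X", "Y", ("Z",))
g2_marginal, dof_marginal = g2_statistic(rows, "X", "Y")
```

Each additional conditioning variable multiplies the number of strata, which is why the sample requirement of such tests grows quickly with the size of the conditioning set.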

3 Notation and Definition

Let capital letters (such as $X$, $Y$) denote variables, lower-case letters (such as $x$, $y$) denote the values of random variables, and bold italic capitals (such as $\boldsymbol{X}$, $\boldsymbol{Y}$) denote variable sets.

   Definition 1 (Bayesian Network [34])

Formally, a Bayesian Network is a triplet $\langle N, G, P \rangle$, which denotes a joint probability distribution $P$ over a random variable set $N$ and can be represented by a DAG $G$ where each node corresponds to a random variable.

If there is an arc from $X$ to $Y$, that is, $X \rightarrow Y$, then $X$ is said to be a parent of $Y$ and $Y$ is a child of $X$. In addition, if $X$ is a parent or child of $Y$, they are said to be neighbors. Nodes $X$ and $Y$ are said to be spouses of each other if they have a common child and there is no arc between $X$ and $Y$. If there is a directed path from $X$ to $Y$ in $G$, then $Y$ is a descendant of $X$. The descendants and the parents of $X$ are represented as $De(X)$ and $Pa(X)$. Further, we use $Ne(X)$ and $Sp(X)$ to denote the neighbors and the spouses of node $X$ in $G$.

   Definition 2 (Markov Condition[34])

Every node in the BN is independent of its non-descendant nodes, given its parents. Thus, for a BN $\langle N, G, P \rangle$ that satisfies the Markov condition, the joint probability can be decomposed into a product of conditional probabilities:

$$P(X_1, \ldots, X_n) = \prod_{i=1}^{n} P(X_i \mid Pa(X_i)),$$

where $Pa(X_i)$ denotes the parents of $X_i$ in $G$.
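As a small worked illustration of this factorization, the sketch below evaluates the joint probability of a full assignment as a product of one conditional probability per node. The three-node network A -> C <- B and all of its probabilities are made-up values for illustration only, and the function and variable names are hypothetical.

```python
from itertools import product


def joint_probability(assignment, parents, cpts):
    """Multiply P(node | parents) over all nodes for one full assignment."""
    p = 1.0
    for node, pa in parents.items():
        parent_values = tuple(assignment[q] for q in pa)
        p *= cpts[node][parent_values][assignment[node]]
    return p


# Toy network A -> C <- B with binary variables (numbers are illustrative).
parents = {"A": (), "B": (), "C": ("A", "B")}
cpts = {
    "A": {(): {0: 0.7, 1: 0.3}},
    "B": {(): {0: 0.4, 1: 0.6}},
    "C": {
        (0, 0): {0: 0.9, 1: 0.1},
        (0, 1): {0: 0.6, 1: 0.4},
        (1, 0): {0: 0.5, 1: 0.5},
        (1, 1): {0: 0.1, 1: 0.9},
    },
}

p_all_ones = joint_probability({"A": 1, "B": 1, "C": 1}, parents, cpts)
total = sum(
    joint_probability(dict(zip("ABC", values)), parents, cpts)
    for values in product((0, 1), repeat=3)
)
```

Here `p_all_ones` is the product 0.3 × 0.6 × 0.9, and summing over all eight assignments recovers a total probability of one.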

   Definition 3 (V-Structure[34])

Three nodes $X$, $Y$, and $Z$ are said to form a V-structure if there are two arcs from $X$ and $Z$ to $Y$, and $X$ is not adjacent to $Z$.

$Y$ is said to be a collider if $Y$ has two incoming arcs from $X$ and $Z$, no matter whether $X$ and $Z$ are adjacent or not. On the condition that $X$ and $Z$ are not adjacent, we say $Y$ is an unshielded collider for the path from $X$ to $Z$.

   Definition 4 (Blocked Path[34])

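Any path $\pi$ from a node in $\boldsymbol{X}$ to a node in $\boldsymbol{Y}$ is said to be blocked by a variable set $\boldsymbol{Z}$ iff: 1) $\pi$ comprises a head-to-tail ($A \rightarrow M \rightarrow B$) or tail-to-tail ($A \leftarrow M \rightarrow B$) chain, and $M \in \boldsymbol{Z}$; or 2) $\pi$ comprises a head-to-head ($A \rightarrow M \leftarrow B$) chain, where $M \notin \boldsymbol{Z}$ and no descendant of $M$ is in $\boldsymbol{Z}$.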

   Definition 5 (d-Separation[34])

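If all paths from $\boldsymbol{X}$ to $\boldsymbol{Y}$ are blocked by $\boldsymbol{Z}$, then $\boldsymbol{Z}$ is said to d-separate $\boldsymbol{X}$ and $\boldsymbol{Y}$, denoted as $\boldsymbol{X} \perp_G \boldsymbol{Y} \mid \boldsymbol{Z}$.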
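Checking d-separation by enumerating paths is tedious, so a standard equivalent procedure is often used instead: restrict the DAG to the ancestors of the nodes involved, moralize it (connect parents that share a child and drop arc directions), delete the conditioning set, and test whether the two nodes are still connected. The sketch below follows that procedure for single nodes; the function names and the toy graph A -> C <- B, C -> D are illustrative choices.

```python
from itertools import combinations


def ancestors_of(nodes, parents):
    """Return `nodes` together with all of their ancestors."""
    seen = set(nodes)
    stack = list(nodes)
    while stack:
        n = stack.pop()
        for p in parents.get(n, ()):
            if p not in seen:
                seen.add(p)
                stack.append(p)
    return seen


def d_separated(x, y, z, parents):
    """True if the set z d-separates x and y in the DAG given by `parents`."""
    keep = ancestors_of({x, y} | set(z), parents)
    adjacency = {n: set() for n in keep}
    for child in keep:
        pas = list(parents.get(child, ()))
        for p in pas:  # undirected version of each arc
            adjacency[child].add(p)
            adjacency[p].add(child)
        for p, q in combinations(pas, 2):  # "marry" co-parents
            adjacency[p].add(q)
            adjacency[q].add(p)
    blocked = set(z)
    stack, seen = [x], {x}
    while stack:  # search for an undirected path that avoids z
        n = stack.pop()
        if n == y:
            return False
        for m in adjacency[n]:
            if m not in seen and m not in blocked:
                seen.add(m)
                stack.append(m)
    return True


# Toy DAG: A -> C <- B, C -> D (child -> list of parents).
toy_parents = {"C": ["A", "B"], "D": ["C"]}
```

In the toy graph, A and B are d-separated by the empty set, but conditioning on the collider C or on its descendant D connects them, in line with the blocked-path definition.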

   Definition 6 (Faithfulness Condition[34])

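Given a BN $\langle N, G, P \rangle$, $G$ and $P$ are faithful to each other iff all and only the conditional independencies true in $P$ are entailed by $G$. Formally, for any $X, Y$ in $N$ and $\boldsymbol{Z} \subseteq N \setminus \{X, Y\}$, $X \perp Y \mid \boldsymbol{Z}$ in $P$ iff $X \perp Y \mid \boldsymbol{Z}$ in $G$.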

   Definition 7 (Markov Blanket[34])

Formally, given the MB of a target node $T$, denoted as $MB(T)$, $T$ is independent of all remaining nodes $N \setminus (MB(T) \cup \{T\})$.

   Definition 8 (PCMasking[35])

Let denotes the PC set of variable X. and denote two subsets of and . and are PCMaksing for variable X if

and are called MaskingPCs.

   Theorem 1 (MB Uniqueness[34])

Given a BN $\langle N, G, P \rangle$, if $G$ and $P$ are faithful to each other, then for every $T \in N$, $MB(T)$ is unique and is the node set of the neighbors and spouses of $T$. In addition, the PC set $PC(T)$ is also unique.
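When the DAG is known, the Markov blanket in Theorem 1 can be read directly off the graph as parents, children, and the other parents of those children. The following sketch does exactly that; the function name and the toy graph are illustrative, and co-parents that are also adjacent to the target are simply absorbed by the set union.

```python
def markov_blanket(target, parents):
    """Parents, children, and co-parents of `target` in a DAG.

    parents: dict mapping each child to the list of its parents.
    """
    children = {c for c, pas in parents.items() if target in pas}
    co_parents = {p for c in children for p in parents[c]} - {target}
    return set(parents.get(target, ())) | children | co_parents


# Toy DAG: A -> T -> C <- S, C -> D.
toy_dag = {"T": ["A"], "C": ["T", "S"], "D": ["C"]}
toy_blanket = markov_blanket("T", toy_dag)
```

The grandchild D is not part of the blanket, because C already screens it off from T.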

Table 1: Description of the features.

| No. | Feature | Type | Description |
|---|---|---|---|
| 1 | Duration | Numeric | Duration of disease |
| 2 | Sex | Boolean | |
| 3 | Age | Numeric | |
| 4 | Height | Numeric | |
| 5 | Weight | Numeric | |
| 6 | BMI | Numeric | BMI = weight (kg) / height (m)^2 |
| 7 | FPG | Numeric | Fasting plasma glucose |
| 8 | HbA1c | Numeric | Glycated hemoglobin |
| 9 | Cr | Numeric | Serum creatinine |
| 10 | UA | Numeric | Serum uric acid |
| 11 | Ca | Numeric | Calcium |
| 12 | P | Numeric | Phosphorus |
| 13 | ALP | Numeric | Alkaline phosphatase |
| 14 | TG | Numeric | Triglyceride |
| 15 | TC | Numeric | Total cholesterol |
| 16 | Alb | Numeric | Albumin |
| 17 | ALT | Numeric | Alanine aminotransferase |
| 18 | AST | Numeric | Aspartate aminotransferase |
| 19 | GGT | Numeric | Gamma glutamyltransferase |
| 20 | 25-OH-VD | Numeric | 25-hydroxyvitamin D |
| 21 | UALB/Ucr | Numeric | Urine albumin creatinine ratio |
| 22 | BMD1 | Numeric | Lumbar spine (L1–L4) |
| 23 | BMD2 | Numeric | Distal radius |
| 24 | BMD3 | Numeric | Femoral neck |
| 25 | BMD4 | Numeric | Ward's triangle |
| 26 | BMD5 | Numeric | Greater trochanter |
| 27 | BMD6 | Numeric | Total hip |
| 28 | OC | Numeric | Osteocalcin |
| 29 | CTX | Numeric | C-terminal telopeptide of type I collagen |
| 30 | PINP | Numeric | N-terminal propeptide of type 1 procollagen |
| 31 | SBP | Numeric | Systolic blood pressure |
| 32 | DBP | Numeric | Diastolic blood pressure |
| 33 | LDL-C | Numeric | Low-density lipoprotein cholesterol |
| 34 | HDL-C | Numeric | High-density lipoprotein cholesterol |

4 Methods

In this section, we propose a BN structure learning algorithm driven by prior knowledge. Section 4.1 presents the overall structure of PKCL, and Sections 4.2 and 4.3 describe its two stages.

4.1 Overview

In real-world applications, the number of samples is limited while the number of features is large. If a structure learning (SL) algorithm is applied directly to such a limited data set, the output DAG can hardly reflect the real underlying causal mechanism among the variables. Meanwhile, existing SL algorithms ignore the significance of experts' prior knowledge, which leads to poor performance. Motivated to combine SL with experts' prior knowledge, we propose an SL algorithm that learns the BN structure and adds the prior knowledge simultaneously to build a global structure.

The PKCL algorithm works in two phases: the local stage and the global stage. The pseudo-code of PKCL is given in Algorithm 1. In the local stage (lines 1-4 of Algorithm 1), PKCL first discovers the neighbors of the target variables and then detects the MaskingPCs to eliminate their effect. After that, it finds the spouses of the target variables using the neighbor sets. Thus, the skeleton of the BN is constructed. The details of this stage are discussed in Section 4.2.

In the global stage (lines 6-9 of Algorithm 1), PKCL leverages the MB sets learned in the local stage to learn the global BN structure, with prior knowledge incorporated to guide the global learning phase. Specifically, it learns the causal direction between feature variables and target variables by combining the constraint-based method and the score-based method. Moreover, during learning, it automatically adds causal directions according to the prior knowledge. The details of this stage are discussed in Section 4.3.

Input: Data D on node set , Target node set , Prior rule set
Output: Directed acyclic graph

1:{Local stage}
2:for all   do
3:      CCMB()
4:end for
5:{Global stage}
7:if not  then
8:     ScoreSearch()
9:end if
Algorithm 1 PKCL

4.2 The Local Stage

In this stage, we present the cross-check and complement MB discovery algorithm (CCMB)[35]. CCMB is a topology-based MB discovery algorithm. Unlike previous algorithms, it discovers the MB set of a node while repairing incorrect conditional independence (CI) tests by eliminating the PCMasking phenomenon.

The pseudo-code of CCMB is given in Algorithm 2. Specifically, it works in the following three steps.

Step 1 (Algorithm 3): Discover the neighbors of node $T$. The pseudo-code of this step is given in Algorithm 3. For each target node, the algorithm works in three phases: first, find the potential neighbor set of $T$; then score and rank the potential neighbors to choose the best one from this set; and finally prune out the false variables.

Step 2 (lines 4-11 of Algorithm 2): Prune out the MaskingPCs of node $T$. This step is the key point that makes CCMB outperform other MB algorithms. CCMB exploits a cross-check method (lines 4-8) to discover the MaskingPCs and appends them to the PCMasking table in the format $(T, X)$, where $T$ denotes the target variable and $X$ denotes the cross-checked variable. Specifically, if $X$ is a neighbor of $T$ while $T$ is not a neighbor of $X$, the cross-check method takes $T$ and $X$ as MaskingPCs because of the asymmetry between them.

Step 3 (Algorithm 4): Discover the spouses of node $T$. The pseudo-code of this step is given in Algorithm 4. If $X$ is a neighbor of $T$ and $Y$ is a neighbor of $X$, then find a node subset conditioned on which $Y$ is independent of $T$.
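The cross-check in Step 2 relies on the fact that the neighbor relation should be symmetric: if one node reports another as a neighbor, the reverse should hold too. A minimal sketch of that asymmetry check is given below. It assumes the neighbor sets have already been computed for every node; the function name and the toy sets are hypothetical, and the sketch covers only the detection of asymmetric pairs, not the subsequent repair performed by CCMB.

```python
def find_asymmetric_pairs(neighbors):
    """Return (target, candidate) pairs whose neighbor relation is one-sided.

    neighbors: dict mapping each node to the set of neighbors found for it.
    """
    pairs = []
    for target, found in neighbors.items():
        for candidate in sorted(found):
            # candidate was found for target, but target was not found for candidate
            if target not in neighbors.get(candidate, set()):
                pairs.append((target, candidate))
    return pairs


# Toy neighbor sets: Y was found as a neighbor of T, but not the other way round.
toy_neighbors = {"T": {"X", "Y"}, "X": {"T"}, "Y": set()}
masking_candidates = find_asymmetric_pairs(toy_neighbors)
```

Each returned pair corresponds to one row of a table in the (target, cross-checked variable) format described in Step 2.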

Input: Data D on node set , Target node
Output: The Markov boundary of

1:{Step 1: Find the Neighbors}
2: FindNeighbors()
3:{Step 2: Eliminate PCMasking phenomenon}
4:for all  do
5:     if  FindNeighbors() and  then
7:     end if
8:end for
9:for all  do
11:end for
12:{Step 3: Find the spouse set}
13: FindSpouse()
Algorithm 2 CCMB(D,T)

Input: Data D on node set N, Target node
Output: The PC subset

1:{ Step 1: Find the potential neighbors}
2:for all  do
3:     ,
4:     while  do
5:         for all  do
7:              if  then
9:              end if
10:         end for
11:         for all  do
12:              if  and  then
14:              end if
15:         end for
16:         for all  do
18:         end for
22:         for all  do
23:              if  then
25:              end if
26:         end for
27:     end while
28:end for
29:return The Neighbors set of nodes in
Algorithm 3 FindNeighbors(D,):

Input: Data D on node set N, Target node T, MB sets
Output: The spouse set of

1:for all  do
2:     for all  do
3:         if  then
4:              find and
5:              if  then
7:              end if
8:         end if
9:     end for
10:end for
11:return (T)
Algorithm 4 FindSpouse(D,T,):

Input: Data D on node set , sets, Prior rules
Output: Directed acyclic graph G

2:for all  do
3:     for all  do
4:         for all  is the common child of and  do
5:              if possible without introducing cycles and satisfies prior knowledge R then
6:                  add XY and YZ to
7:              end if
8:         end for
9:     end for
10:end for
Algorithm 5 FindCollider():

4.3 The Global Stage

After the MB sets are discovered, the local information can be integrated to obtain the structure of the DAG. Traditionally, the next step is to determine the direction of each edge, and thus the underlying causal mechanism is learned. However, this way of learning the DAG depends entirely on the clinical data, which means that a great deal of knowledge in the medical field is ignored. As a result, some causal relationships learned only from clinical data conflict with medical knowledge, and some causal relationships that are already established in the medical literature cannot be learned.

PKCL learns the structure of a DAG over the nodes by leveraging the MB sets discovered in the local stage. Unlike other structure learning methods, the learning process of PKCL is guided by the prior knowledge of experts, which means that PKCL works in a more data-efficient way while maintaining superior performance. Specifically, PKCL first discovers the colliders to construct the overall DAG. If no collider is discovered, we use a heuristic method with the constraints of the MB sets and prior rules to construct the DAG of the underlying BN. Here, the heuristic algorithm we use is steepest-ascent hill-climbing with a TABU[36] list of the last 100 structures and a stopping criterion of 15 steps without improvement in the maximum score.
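The heuristic search described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the authors' implementation: `score` is a placeholder for whatever network score is used (the text does not name one), `allowed` is a hypothetical encoding of the arcs permitted by the MB sets, `required` holds arcs fixed by prior rules, and the search is assumed to start from the graph containing only the required arcs. The tabu list length of 100 and the patience of 15 steps follow the text.

```python
from collections import deque


def is_acyclic(nodes, arcs):
    """Kahn's algorithm: True if the arc set contains no directed cycle."""
    children = {n: [] for n in nodes}
    indegree = {n: 0 for n in nodes}
    for a, b in arcs:
        children[a].append(b)
        indegree[b] += 1
    queue = [n for n in nodes if indegree[n] == 0]
    visited = 0
    while queue:
        n = queue.pop()
        visited += 1
        for c in children[n]:
            indegree[c] -= 1
            if indegree[c] == 0:
                queue.append(c)
    return visited == len(nodes)


def moves(arcs, allowed, required):
    """Yield arc sets that are one add, delete, or reverse away from `arcs`."""
    for a, b in sorted(allowed):
        if (a, b) in arcs:
            if (a, b) in required:
                continue  # arcs fixed by prior knowledge are never changed
            yield arcs - {(a, b)}
            if (b, a) in allowed:
                yield (arcs - {(a, b)}) | {(b, a)}
        elif (b, a) not in arcs:
            yield arcs | {(a, b)}


def tabu_search(nodes, score, allowed, required, tabu_size=100, patience=15):
    """Steepest-ascent hill climbing with a tabu list of recent structures."""
    current = frozenset(required)
    best, best_score = current, score(current)
    tabu = deque([current], maxlen=tabu_size)
    stale = 0
    while stale < patience:
        candidates = [g for g in moves(current, allowed, required)
                      if g not in tabu and is_acyclic(nodes, g)]
        if not candidates:
            break
        current = max(candidates, key=score)  # best neighbor, even if worse
        tabu.append(current)
        current_score = score(current)
        if current_score > best_score:
            best, best_score, stale = current, current_score, 0
        else:
            stale += 1
    return best


# Toy problem: the score rewards arcs of a hidden structure A -> C <- B.
toy_nodes = ["A", "B", "C"]
toy_allowed = {(a, b) for a in toy_nodes for b in toy_nodes if a != b}
toy_truth = {("A", "C"), ("B", "C")}
toy_best = tabu_search(toy_nodes,
                       lambda g: len(g & toy_truth) - len(g - toy_truth),
                       toy_allowed, required={("A", "C")})
```

Moving to the best neighbor even when it scores lower, while forbidding recently visited structures, is what lets the search leave a local maximum; the best structure seen so far is kept separately and returned at the end.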

5 Experiments

5.1 Data Collection and discretization

In our study, the clinical data of patients with T2DM are collected in the Department of the First Affiliated Hospital of the University of Science and Technology of China. As some patient samples contain missing values and abnormal values, the data set is cleaned and completed during preprocessing. After that, a clinical data set of 500 patient samples is obtained, in which each sample has 34 features including anthropometric indexes of patients, biochemical indexes, lipid profile, and vitamin D levels from the assay of patients' blood samples. The 34 features are labeled from 1 to 34, and their description is shown in Table 1. The prior knowledge used in the experiment is that features 3, 5, 6, 7, and 8 are causes of the six BMDs. Before the experiments, the data are discretized using the Causal Explorer toolkit[37]. A detailed description is given in the appendix.

5.2 Quality analysis of selected features

The overall experiment comprises two stages. To demonstrate the superiority of PKCL, each stage of PKCL is analyzed. In the local stage, four traditional feature selection algorithms and four MB algorithms are also applied to the clinical data set. To evaluate the quality of the selected features, five classifiers are trained on the features selected by the nine algorithms and the prediction accuracy is computed. In the global stage, to demonstrate that the causal relationships learned with prior knowledge are more reasonable than those learned without it, the DAG is learned both with and without prior knowledge.

First, we randomly select 400 samples from the data set to run the CCMB algorithm, four other MB algorithms, and four traditional feature selection algorithms, namely IAMB[28], PCMB[29], MBOR, STMB[33], mRMR, Fisher, FCBF, and RFS. Then, in order to demonstrate the superiority of PKCL, five classifiers, i.e., Support Vector Machine (SVM), k-Nearest Neighbors (kNN), AdaBoost, Random Forest (RF), and Naive Bayes (NB), are trained with the selected features. In addition, the classifiers are also trained with the original features as a baseline, which shows whether the feature selection algorithms improve the prediction accuracy of the classifiers by extracting the informative features. The value of k is set to 10 in the kNN classifier. Lastly, the remaining 100 samples are used as testing data to evaluate the quality of the selected features.
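The evaluation protocol above (a random 400/100 split and a kNN classifier with k = 10) can be sketched in plain Python as follows. This is only an illustration of the protocol: the function names, the Euclidean distance, the majority vote, and the fixed random seed are assumptions, and no clinical data are reproduced here.

```python
import random
from collections import Counter


def split_indices(n_samples, n_train, seed=0):
    """Randomly split sample indices into a training and a testing part."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return indices[:n_train], indices[n_train:]


def knn_predict(train_x, train_y, query, k=10):
    """Majority vote among the k training points closest to `query`."""
    order = sorted(
        range(len(train_x)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(train_x[i], query)),
    )
    votes = Counter(train_y[i] for i in order[:k])
    return votes.most_common(1)[0][0]


def accuracy(train_x, train_y, test_x, test_y, k=10):
    """Fraction of test samples whose predicted label matches the true one."""
    hits = sum(knn_predict(train_x, train_y, q, k) == y
               for q, y in zip(test_x, test_y))
    return hits / len(test_y)


# Split 500 sample indices into 400 for training and 100 for testing.
train_idx, test_idx = split_indices(500, 400)
```

To compare feature selectors under this protocol, each sample would be restricted to the columns chosen by a given algorithm before calling `accuracy`, with the same split reused for every algorithm.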

The experimental results on the six BMDs are listed in Table 2. As Table 2 shows, when the label is BMD1, BMD2, or BMD5, all five classifiers achieve their best prediction accuracy using the features selected by PKCL. When the label is BMD3, BMD4, or BMD6, SVM, kNN, AdaBoost, and Random Forest also achieve their best prediction accuracy using the features selected by PKCL. Although Naive Bayes achieves its best prediction accuracy using the features selected by mRMR, its result with the features selected by PKCL is still competitive. Specifically, the five classifiers achieve a 0.9-19.2% improvement in prediction accuracy in comparison to the result of using all features, which is a significant improvement. In addition, the selected features are the input of the global stage of PKCL: the more informative the selected features are, the more reasonable the learned causal mechanism will be.

BMD1 SVM 55.3 65.283 60.427 63.75 72.354 69.423 67.941 68.014 64.57 76.219
KNN 58.393 69.6 60.646 67.087 74.127 70.086 72.467 69.929 66.522 80.175
Adaboost 55.915 67.009 60.444 64.187 68.64 66.923 68.02 67.546 63.123 72.875
Random Forest 59.013 69.038 60.921 68.167 75.773 74.146 72.909 70.628 63.151 78.903
Naive Bayes 59.475 68.923 60.581 67.396 74.339 68.131 72.458 71.278 65.027 75.242
BMD2 SVM 53.544 63.951 55.388 62.936 69.488 65.762 63.926 63.67 61.005 73.201
KNN 56.633 64.847 58.898 64.916 69.927 65.494 71.202 67.628 62.25 77.015
Adaboost 53.108 64.446 56.529 62.39 64.747 64.319 63.781 63.329 62.322 71.599
Random Forest 54.081 65.209 56.983 63.644 73.259 71.095 71.798 67.62 61.656 77.618
Naive Bayes 56.168 65.689 58.777 64.827 72.044 65.393 68.061 67.106 62.085 73.055
BMD3 SVM 54.093 63.519 57.711 62.894 69.368 67.011 65.427 65.038 62.501 74.378
KNN 56.891 68.333 59.328 65.395 72.031 68.046 71.813 67.889 63.685 78.917
Adaboost 53.4 63.928 57.85 63.386 66.364 65.656 65.904 65.571 61.473 71.084
Random Forest 57.376 67.778 60.181 65.916 72.127 71.206 72.25 69.043 61.777 76.949
Naive Bayes 56.704 66.208 59.913 65.701 73.765 65.993 69.108 68.93 61.975 72.93
BMD4 SVM 53.253 61.409 54.578 59.979 70.107 67.461 61.401 65.376 62.968 70.197
KNN 53.747 67.956 57.149 63.724 73.254 69.152 67.877 64.256 60.841 78.813
Adaboost 54.54 62.053 55.864 61.419 63.851 62.71 65.129 62.369 59.105 67.217
Random Forest 56.345 66.37 60.107 61.976 73.471 70.079 67.521 66.696 59.252 78.304
Naive Bayes 57.193 64.551 55.76 64.231 71.444 64.54 67.782 66.282 60.751 69.344
BMD5 SVM 50.583 61.595 56.123 59.385 66.177 64.351 61.391 64.528 59.907 73.295
KNN 52.876 68.313 59.261 63.652 72.242 67.172 71.41 65.663 63.704 74.675
Adaboost 51.036 65.207 57.615 59.669 67.302 62.788 63.952 61.715 61.853 70.117
Random Forest 53.231 64.202 59.92 65.132 73.453 69.624 68.88 65.835 60.399 75.876
Naive Bayes 53.374 65.446 59.815 64.959 72.456 63.077 68.431 66.953 62.478 73.045
BMD6 SVM 53.711 64.191 57.831 62.89 69.616 65.743 64.496 66.123 62.75 74.558
KNN 56.755 66.571 59.665 65.56 72.618 68.729 71.871 68.614 63.417 78.743
Adaboost 54.615 64.692 57.83 62.673 65.816 65.955 66.14 65.566 62.76 71.893
Random Forest 56.593 66.914 58.654 65.191 73.266 71.568 71.959 68.351 61.332 76.743
Naive Bayes 57.065 66.75 59.95 65.592 73.39 65.815 70.713 70.173 62.404 72.326
Table 2: The prediction accuracy (in %) on the test samples. The best result of each classifier is illustrated in bold. "All" denotes that no feature selection algorithm is applied.

5.3 Learning the DAG with prior knowledge

To illustrate the significance of prior knowledge, the DAG that does not incorporate prior knowledge is also learned. The overall DAG learned with prior knowledge and the overall DAG learned without prior knowledge are presented in the appendix. Here only the six BMDs of interest are analyzed. Figure 1 shows the local causal relationships between the six BMDs and the features. Each sub-figure presents the local causal relationships of one BMD, both with and without prior knowledge.

To gain insight into the differences between the DAG that incorporates prior knowledge and the DAG that does not, we first analyze the DAG incorporating prior knowledge, then analyze the DAG not incorporating prior knowledge, and finally analyze in detail the superiority of the former and the inaccuracy of the latter.

The DAG incorporating prior knowledge is analyzed as follows. As Figure 1 shows, no BMD has an effect on any feature, which means that BMDs reflect the combined effects of some risk factors and do not themselves affect any risk factor. In addition, features 1, 2, 3, 5, 6, 7, 8, 11, 12, 15, 28, 29, and 33 are the common causes of all six BMDs, which means these features have an underlying effect on the decrease of bone mineral density. Features 9 and 30 are the causes of BMD1. Features 16 and 30 are the causes of BMD2. Features 9, 10, 20, and 34 are the causes of BMD3. Features 16 and 17 are the causes of BMD4. The effect of BMD1 is feature 22. The effects of BMD2 are features 7, 8, and 26. The effect of BMD3 is feature 15. The effects of BMD4 are features 6, 23, and 25. The effects of BMD5 are features 25 and 32. BMD6 has no effect on any feature.

The DAG not incorporating prior knowledge is analyzed as follows. Features 1, 2, and 3 are the common causes of all six BMDs. Features 11, 13, 15, 21, 28, 30, 33, and 34 are the causes of BMD1. Features 4, 10, 11, 12, 14, 15, 16, 18, 19, 20, 25, 28, 29, 30, and 32 are the causes of BMD2. Features 6 and 28 are the causes of BMD3. Features 7, 10, 13, 16, 17, 20, 21, 26, 28, and 30 are the causes of BMD4. Features 4, 8, 10, 21, 23, 30, 31, and 33 are the causes of BMD5. Features 12, 19, 20, 30, 32, and 34 are the causes of BMD6.

The differences between the DAG incorporating prior knowledge and the DAG not incorporating it are as follows. For BMD1, the arcs from features 13, 21, and 34 to BMD1 and the arc between BMD1 and feature 19 are removed, while new arcs are added from features 5, 6, 8, 9, 12, 29, and 33 to BMD1. For BMD3, the arc from BMD3 to feature 15 is removed, while new arcs are added from features 5, 7, 8, 9, 10, 11, 12, 15, 20, 29, 33, and 34 to BMD3. For BMD4, the arcs from features 7, 10, 13, 20, 21, 26, and 30 to BMD4 and the arcs from BMD4 to features 6, 23, and 25 are removed, while new arcs are added from features 5, 6, 7, 8, 9, 11, 12, 15, 29, and 33 to BMD4. For BMD5, the arcs from features 4, 8, 10, 21, 23, 30, and 31 to BMD5 and the arcs from BMD5 to features 25 and 31 are removed, while new arcs are added from features 5, 6, 7, 11, 12, 15, 28, and 29 to BMD5. For BMD6, the arcs from features 12, 19, 20, 30, 32, and 34 to BMD6 are removed, while new arcs are added from features 5, 6, 7, 8, 11, 12, 15, 28, 29, and 33 to BMD6.

Figure 1: The causal relationships of the six BMDs: (a) BMD1, (b) BMD2, (c) BMD3, (d) BMD4, (e) BMD5, (f) BMD6. Each sub-figure illustrates the local causal relationships of one BMD. The left side of each sub-figure shows the local relationships learned with prior knowledge, and the right side shows those learned without prior knowledge. The red circle in the center represents the BMD, and the cyan circles at the edge represent features.

5.4 Discussion

As analyzed above, PKCL can discover the underlying causal mechanism between BMDs and their related risk factors. Here we discuss some causal relationships that have already been established in the clinical field, which demonstrates the effectiveness of PKCL. In addition, the new causal mechanisms found by PKCL provide insight into the relationship between BMDs and their factors, which may contribute to the prevention and treatment of diabetes-related osteoporosis.

A report shows that 1 out of 3 women and 1 out of 5 men over 50 years old will experience an osteoporotic fracture at some point in their life[38]. Patients with T2DM, one of the most common chronic diseases, suffer from an increased risk of osteoporosis-related bone fracture, which places a heavy burden on individuals. BMD is the gold standard for diagnosing osteoporosis. However, the causal chain linking BMD and T2DM is not clear.

In elderly diabetic individuals, AGEs may inhibit the phenotypic expression of osteoblasts and promote osteoblast apoptosis, thereby contributing to deficient bone formation[39, 40]. AGEs also increase osteoclast-induced bone resorption. The study by Zhou et al. indicated that increasing age is a more important risk factor for bone mineral loss in patients with T2DM than diabetes duration[41]. The report by Wang et al. indicated that adverse changes in the collagen network occur with aging and that such changes may lead to decreased toughness of the bone[42]. Moreover, the porosity of the bone increases significantly with aging and correlates with bone strength and stiffness[42]. Therefore, BMD correlates negatively with age. After bone mass reaches a peak in the third or fourth decade of life, vertebral bone mass and density decrease with aging in both females and males[43]. Moreover, AGEs accumulate in the bone with aging, increasing 4- to 10-fold by the age of 50[44]. As discussed below, increased levels of AGEs in bone tissues have been shown to be associated with diminished bone mechanical function and reduced cortical and trabecular bone strength. Additionally, age-related bone loss is associated with abnormalities in vitamin D status. Reduced serum levels of the vitamin D metabolites 25-hydroxyvitamin D [25(OH)D] and 1,25-(OH)2-D occur with aging in both sexes[45, 46]. Nutritional vitamin D deficiency may contribute to secondary hyperparathyroidism and bone loss with aging, since serum 25(OH)D levels correlate inversely with serum parathyroid hormone levels and positively with BMD[47].

As indicated in this study, hyperglycemia is another important factor determining BMD in patients with T2DM. This can be explained as follows. Firstly, diabetes has been shown to cause decreased osteoclastogenesis, reduced bone formation, and enhanced osteoblast apoptosis in a bone-loss mouse model[48]. Secondly, hyperglycemia leads to glycosuria, which results in a loss of calcium. Hypercalciuria, which presents as a raised glomerular filtration rate, reduces calcium reabsorption and impairs bone deposition in diabetic rats[49]. Hypercalciuria decreases the level of calcium in the bone, leading to poor bone quality[50]. Some reports indicate that hypercalciuria in patients with uncontrolled blood glucose could stimulate parathyroid hormone secretion, which may contribute to the development of osteopenia[51]. Thirdly, hyperglycemia is known to generate higher concentrations of AGEs in collagen[52]. AGEs have been shown to be associated with decreased strength in human cadaver femurs[10]. The combination of AGE accumulation in bone collagen and lower bone turnover may contribute to reduced bone strength for a given BMD in diabetes[53]. AGEs and oxidative stress produced by hyperglycemia may reduce beneficial enzymatic cross-linking, inhibit osteoblast differentiation, and induce osteoblast apoptosis[50].

Height, weight, sex, and obesity are also factors affecting BMD in T2DM. As a Korean population-based study reported, sex affects BMD. The difference in BMD distribution at the same skeletal site may be partially explained by distinctive endocrine and paracrine factors between the two sexes[54]. It has been suggested that bone loss in elderly men is mostly a result of decreased bone formation, whereas bone loss in postmenopausal women is a result of excessive bone resorption[55]. Sex hormones may account for this difference. Estrogen decreases rapidly in postmenopausal women. An accelerated phase of predominantly cancellous bone loss initiated by menopause is the result of the loss of the direct restraining effect of estrogen on bone turnover[56]. Estrogen acts on high-affinity estrogen receptors in osteoblasts and osteoclasts to restrain bone turnover[57]. Estrogen also regulates the production, by osteoblastic and marrow stromal cells, of cytokines involved in bone remodeling, such as interleukin (IL)-1, IL-6, tumor necrosis factor-α, prostaglandin E2, transforming growth factor-β, and osteoprotegerin[57, 58]. The net result of the loss of the direct action of estrogen is a marked increase in bone resorption that is not accompanied by an adequate increase in bone formation, resulting in bone loss. The accelerated phase of bone loss in women is thus due to the direct skeletal consequences of the rapid reduction in serum estrogen following menopause.

High body weight and obesity have been shown to be associated with high BMD in many observational studies[59]. Obesity may lead to increased BMD because it is associated with higher 17β-estradiol levels and higher mechanical load, which may protect bone[60, 61]. In contrast, visceral fat accumulation is associated with higher levels of pro-inflammatory cytokines, which may up-regulate receptor activator of nuclear factor-κB ligand (RANKL), leading to increased bone resorption and therefore decreased BMD[62, 63, 64].

Some studies have shown that weight loss, both intentional and unintentional, is associated with decreases in BMD. The study by Geoffroy et al. showed that more than 70% of patients have clinically significant BMD loss at 12 months after bariatric surgery[65]. This loss of bone density was observed at the femoral neck and femur[65]. In bivariate analysis, the significant reduction in BMD was related to the extent of reduction in BMI, to weight loss, and to loss of fat and lean mass[65]. A recent study in elderly women identified risk factors for hip BMD loss over four years and concluded that women who gain weight show attenuated BMD loss at the trochanter, femoral neck, and total hip[66].

6 Conclusion

In this paper, we propose a new BN algorithm (PKCL) that can find the underlying causal mechanism between six BMDs and their related factors. PKCL includes two stages: the local stage, which discovers the local MB sets, and the global stage, which learns the direction of the cause-effect relationships.

In addition, to demonstrate the effectiveness of PKCL, a clinical data set that includes the clinical indexes of patients with T2DM was collected and preprocessed. Experiments on this data set show that PKCL can discover causal relationships that have already been reported in the clinical literature. Moreover, PKCL can discover new causal relationships to assist clinical researchers in designing new experiments, which can save a lot of time and money. Unlike other BN algorithms, PKCL incorporates rich prior knowledge, which means it can achieve good performance even when the data set is small and the features are numerous. Furthermore, PKCL is not limited to the clinical domain but can be adapted to any domain for which prior knowledge is available. Future work on this subject will employ other probabilistic models [67, 68] and learning in the model space [69, 70] for this kind of problem.

Figure 2: The DAG before incorporating prior knowledge
Figure 3: The DAG after incorporating prior knowledge

7 Appendix

7.1 The process of discretization

The discretization method works as follows:

  1. Data is normalized so that each variable has mean 0 and standard deviation 1

  2. After normalization, association of each variable with the response variable is computed using either Wilcoxon rank sum test (for binary response variable) or Kruskal–Wallis non-parametric ANOVA (for multicategory response variable) at 0.05 alpha level


  3. If a variable is not significantly associated with the response variable, it is discretized as follows:

    • 0 for values less than -1 standard deviation

    • 1 for values between -1 and 1 standard deviation

    • 2 for values greater than 1 standard deviation

  4. If a variable is significantly associated with the response variable, it is discretized using a sliding threshold (into binary values) or a sliding window (into ternary values). The discretization threshold(s) are determined by the Chi-squared test so as to maximize association with the response variable[72].
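The steps above can be sketched in a few lines of Python. This is only an illustrative sketch under several assumptions that the text does not specify: the population standard deviation is used for normalization, candidate thresholds are the midpoints between consecutive distinct values, the Chi-squared statistic is the uncorrected Pearson statistic on a 2x2 table, and the response is binary. The significance test of step 2 and the ternary sliding window are not implemented here, and all function names are hypothetical.

```python
from statistics import mean, pstdev


def zscore(values):
    """Step 1: rescale a variable to mean 0 and standard deviation 1."""
    mu = mean(values)
    sigma = pstdev(values)  # assumption: population standard deviation
    if sigma == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


def ternary_by_sd(z):
    """Step 3: code 0 below -1 SD, 1 between -1 and 1 SD, 2 above 1 SD."""
    return [0 if v < -1 else 2 if v > 1 else 1 for v in z]


def chi2_2x2(binary, response):
    """Uncorrected Pearson Chi-squared statistic for two 0/1 sequences."""
    n = len(binary)
    counts = [[0, 0], [0, 0]]
    for b, r in zip(binary, response):
        counts[b][r] += 1
    stat = 0.0
    for i in (0, 1):
        for j in (0, 1):
            row_total = counts[i][0] + counts[i][1]
            col_total = counts[0][j] + counts[1][j]
            expected = row_total * col_total / n
            if expected > 0:
                stat += (counts[i][j] - expected) ** 2 / expected
    return stat


def best_binary_threshold(z, response):
    """Step 4 (binary case): slide a cut point over the observed values and
    keep the one with the largest Chi-squared statistic."""
    candidates = sorted(set(z))
    best_cut, best_stat = None, -1.0
    for low, high in zip(candidates, candidates[1:]):
        cut = (low + high) / 2
        binary = [1 if v > cut else 0 for v in z]
        stat = chi2_2x2(binary, response)
        if stat > best_stat:
            best_cut, best_stat = cut, stat
    return best_cut, best_stat


# Toy data, invented for illustration only.
codes = ternary_by_sd(zscore([1, 2, 3, 4, 5, 6, 7, 8, 9, 10]))
cut, stat = best_binary_threshold(zscore([1, 2, 3, 4, 5, 6, 7, 8]),
                                  [0, 0, 0, 0, 1, 1, 1, 1])
print(codes, cut, stat)
```

In the toy example the response is perfectly separated at the middle of the variable's range, so the selected cut point is the one that splits the two response groups.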

The discretization procedure can be instructed to compute necessary statistics only using training samples of the data to ensure unbiased estimation of error metrics on the testing data.
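A minimal sketch of what computing the statistics on training samples only could look like for the non-significant (ternary) case: the mean and standard deviation are estimated from the training split and then reused unchanged on the testing split. The function names and the toy numbers are assumptions made for illustration, and the population standard deviation is again an assumption.

```python
from statistics import mean, pstdev


def fit_standardizer(train_values):
    """Estimate mean and standard deviation from training samples only."""
    mu = mean(train_values)
    sigma = pstdev(train_values)  # assumption: population standard deviation
    if sigma == 0:
        sigma = 1.0  # avoid division by zero for constant variables
    return mu, sigma


def apply_ternary(values, mu, sigma):
    """Discretize values with statistics that were fitted elsewhere."""
    codes = []
    for v in values:
        z = (v - mu) / sigma
        codes.append(0 if z < -1 else 2 if z > 1 else 1)
    return codes


# Toy split, invented for illustration: the training values have mean 5 and SD 2.
train = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
test = [1.0, 5.0, 8.0]

mu, sigma = fit_standardizer(train)
train_codes = apply_ternary(train, mu, sigma)
test_codes = apply_ternary(test, mu, sigma)  # no test statistics are used
print(train_codes, test_codes)
```

Because the testing values never enter `fit_standardizer`, the error metrics computed on the discretized testing data are not influenced by the testing distribution.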

7.2 Two pictures of the overall DAG

Figure 2 is the learned DAG that does not incorporate prior knowledge. Figure 3 is the learned DAG that incorporates prior knowledge. The circles represent the features and the arcs represent the causal relationships. The numbers in the circles denote the features, and Table 1 lists the corresponding relationships.

7.3 Abbreviations and their descriptions

Table 3 lists the abbreviations that appear in the paper and their descriptions.

Abbreviations Descriptions
AGEs advanced glycation end-products
BMD bone mineral density
BN Bayesian networks
CCMB cross-check and complement MB discovery
CIs conditional independence tests
DAG Directed Acyclic Graph
DXA dual-energy X-ray absorptiometry
FS feature selection
GS The Grow-Shrink algorithm
IAMB The incremental association MB algorithm
IPCMB Iterative parent children-based MB
ITs independence tests
kNN k-Nearest Neighbors
KS Koller-Sahami
MB The Markov Blanket
MMMB Min-max MB
NB Naive Bayes
PC parents and children
PCMB Parent children-based MB algorithm
PKCL Prior-Knowledge-driven local Causal structure Learning
RF Random Forest
SVM Support Vector Machine
STMB Simultaneous MB algorithm
SL structure learning
T2DM Type 2 diabetes
Table 3: Abbreviations and their descriptions


  • [1] Bin Zhou, Yuan Lu, Kaveh Hajifathalian, James Bentham, Mariachiara Di Cesare, Goodarz Danaei, Honor Bixby, Melanie J Cowan, Mohammed K Ali, Cristina Taddei, et al. Worldwide trends in diabetes since 1980: a pooled analysis of 751 population-based studies with 4· 4 million participants. The Lancet, 387(10027):1513–1530, 2016.
  • [2] Maryam Ghodsi, Abbass Ali Keshtkar, Ensieh Nasli-Esfahani, Sudabeh Alatab, Mohammad Reza Mohajeri-Tehrani, et al. Mechanisms involved in altered bone metabolism in diabetes: a narrative review. Journal of Diabetes & Metabolic Disorders, 15(1):52, 2016.
  • [3] Jane A Cauley. Osteoporosis: fracture epidemiology update 2016. Current opinion in rheumatology, 29(2):150–156, 2017.
  • [4] JA Kanis, C-C Glüer, et al. An update on the diagnosis and assessment of osteoporosis with densitometry. Osteoporosis international, 11(3):192–202, 2000.
  • [5] Inbal Goldshtein, Allison Martin Nguyen, Anne E dePapp, Sofia Ish-Shalom, Julie M Chandler, Gabriel Chodick, and Varda Shalev. Epidemiology and correlates of osteoporotic fractures among type 2 diabetic patients. Archives of osteoporosis, 13(1):15, 2018.
  • [6] SC DeShields and TD Cunningham. Comparison of osteoporosis in us adults with type 1 and type 2 diabetes mellitus. Journal of endocrinological investigation, 41(9):1051–1060, 2018.
  • [7] Christian Muschitz, Alexandra Kautzky-Willer, Martina Rauner, Yvonne Winhöfer-Stöckl, and Judith Haschka. Diagnosis and management of patients with diabetes and co-existing osteoporosis (update 2019): Common guideline of the austrian society for bone and mineral research and the austrian diabetes society. Wiener klinische Wochenschrift, 131(Suppl 1):174–185, 2019.
  • [8] Peter Vestergaard. Discrepancies in bone mineral density and fracture risk in patients with type 1 and type 2 diabetes—a meta-analysis. Osteoporosis international, 18(4):427–444, 2007.
  • [9] Mohsen Janghorbani, Rob M Van Dam, Walter C Willett, and Frank B Hu. Systematic review of type 1 and type 2 diabetes mellitus and risk of fracture. American journal of epidemiology, 166(5):495–505, 2007.
  • [10] Denise E Bonds, Joseph C Larson, Ann V Schwartz, Elsa S Strotmeyer, John Robbins, Beatriz L Rodriguez, Karen C Johnson, and Karen L Margolis. Risk of fracture in women with type 2 diabetes: the women’s health initiative observational study. The Journal of clinical endocrinology & metabolism, 91(9):3404–3410, 2006.
  • [11] Lili Ma, Ling Oei, Lindi Jiang, Karol Estrada, Huiyong Chen, Zhen Wang, Qiang Yu, Maria Carola Zillikens, Xin Gao, and Fernando Rivadeneira. Association between bone mineral density and type 2 diabetes mellitus: a meta-analysis of observational studies. European journal of epidemiology, 27(5):319–332, 2012.
  • [12] Elsa S Strotmeyer, Jane A Cauley, Ann V Schwartz, Michael C Nevitt, Helaine E Resnick, Joseph M Zmuda, Douglas C Bauer, Frances A Tylavsky, Nathalie de Rekeneire, Tamara B Harris, et al. Diabetes is associated independently of body composition with bmd and bone volume in older white and black men and women: The health, aging, and body composition study. Journal of Bone and Mineral Research, 19(7):1084–1091, 2004.
  • [13] WH Linda Kao, Candace M Kammerer, Jennifer L Schneider, Richard L Bauer, and Braxton D Mitchell. Type 2 diabetes is associated with increased bone mineral density in mexican-american women. Archives of medical research, 34(5):399–406, 2003.
  • [14] Vikram V Shanbhogue, Deborah M Mitchell, Clifford J Rosen, and Mary L Bouxsein. Type 2 diabetes and the skeleton: new insights into sweet bones. The lancet Diabetes & endocrinology, 4(2):159–173, 2016.
  • [15] Keertik Fulzele, Ryan C Riddle, Douglas J DiGirolamo, Xuemei Cao, Chao Wan, Dongquan Chen, Marie-Claude Faugere, Susan Aja, Mehboob A Hussain, Jens C Brüning, et al. Insulin receptor signaling in osteoblasts regulates postnatal bone acquisition and body composition. Cell, 142(2):309–319, 2010.
  • [16] Ken-ichiro Tanaka, Toru Yamaguchi, Ippei Kanazawa, and Toshitsugu Sugimoto. Effects of high glucose and advanced glycation end products on the expressions of sclerostin and rankl as well as apoptosis in osteocyte-like mlo-y4-a2 cells. Biochemical and biophysical research communications, 461(2):193–199, 2015.
  • [17] J Compston. Type 2 diabetes mellitus and bone. Journal of internal medicine, 283(2):140–153, 2018.
  • [18] Bingbing Jiang, Chang Li, Maarten De Rijke, Xin Yao, and Huanhuan Chen. Probabilistic feature selection and classification vector machine. ACM Transactions on Knowledge Discovery from Data (TKDD), 13(2):1–27, 2019.
  • [19] Xingyu Wu, Bingbing Jiang, Kui Yu, Huanhuan Chen, and Chunyan Miao. Multi-label causal feature selection. In Proceedings of the 34th AAAI Conference on Artificial Intelligence, pages 505–511, 2020.
  • [20] Bingbing Jiang, Xingyu Wu, Kui Yu, and Huanhuan Chen. Joint semi-supervised feature selection and classification through bayesian approach. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 33, pages 3983–3990, 2019.
  • [21] Shan He, Huanhuan Chen, Zexuan Zhu, Douglas G Ward, Helen J Cooper, Mark R Viant, John K Heath, and Xin Yao. Robust twin boosting for feature selection from high-dimensional omics data with label noise. Information Sciences, 291:1–18, 2015.
  • [22] Huan Liu and Hiroshi Motoda. Computational methods of feature selection. CRC Press, 2007.
  • [23] Constantin F Aliferis, Alexander Statnikov, Ioannis Tsamardinos, Subramani Mani, and Xenofon D Koutsoukos. Local causal and markov blanket induction for causal discovery and feature selection for classification part i: Algorithms and empirical evaluation. Journal of Machine Learning Research, 11(Jan):171–234, 2010.
  • [24] Kui Yu, Lin Liu, Jiuyong Li, Wei Ding, and Thuc Duy Le. Multi-source causal feature selection. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2019.
  • [25] Kui Yu, Lin Liu, Jiuyong Li, and Huanhuan Chen. Mining markov blankets without causal sufficiency. IEEE transactions on neural networks and learning systems, 29(12):6333–6347, 2018.
  • [26] Daphne Koller and Mehran Sahami. Toward optimal feature selection. Technical report, Stanford InfoLab, 1996.
  • [27] Dimitris Margaritis and Sebastian Thrun. Bayesian network induction via local neighborhoods. In Proceedings of the Advances in Neural Information Processing Systems, pages 505–511, 2000.
  • [28] Ioannis Tsamardinos, Constantin F Aliferis, and Alexander R Statnikov. Algorithms for large scale Markov blanket discovery. In Proceedings of the Florida Artificial Intelligence Research Society Conference, pages 376–380, 2003.
  • [29] Jose M Pena, Roland Nilsson, Johan Björkegren, and Jesper Tegnér. Towards scalable and data efficient learning of Markov boundaries. International Journal of Approximate Reasoning, 45(2):211–232, 2007.
  • [30] Ioannis Tsamardinos, Constantin F Aliferis, and Alexander Statnikov. Time and sample efficient discovery of Markov blankets and direct causal relations. In Proceedings of the 9th ACM International Conference on Knowledge Discovery and Data Mining, pages 673–678, 2003.
  • [31] Aliferis CF, Tsamardinos I, and Statnikov A. HITON: a novel Markov Blanket algorithm for optimal variable selection. AMIA … Annual Symposium proceedings / AMIA Symposium. AMIA Symposium, pages 21–5, 2003.
  • [32] Shunkai Fu and Michel C Desmarais. Fast Markov blanket discovery algorithm via local learning within single pass. In Proceedings of the Conference of the Canadian Society for Computational Studies of Intelligence, pages 96–107, 2008.
  • [33] Tian Gao and Qiang Ji. Efficient Markov blanket discovery and its application. IEEE Transactions on Cybernetics, 47(5):1169–1179, 2017.
  • [34] Judea Pearl. Probabilistic reasoning in intelligent systems: Networks of plausible inference. 1988.
  • [35] Xingyu Wu, Bingbing Jiang, Kui Yu, Chunyan Miao, and Huanhuan Chen. Accurate markov boundary discovery for causal feature selection. IEEE Transactions on Cybernetics, 2019.
  • [36] Ioannis Tsamardinos, Laura E Brown, and Constantin F Aliferis. The max-min hill-climbing bayesian network structure learning algorithm. Machine learning, 65(1):31–78, 2006.
  • [37] Alexander Statnikov, Ioannis Tsamardinos, Laura E Brown, and Constantin F Aliferis. Causal explorer: A matlab library of algorithms for causal discovery and variable selection for classification. Causation and Prediction Challenge Challenges in Machine Learning, Volume 2, page 267, 2010.
  • [38] Stavroula A Paschou, Anastasia D Dede, Panagiotis G Anagnostis, Andromachi Vryonidou, Daniel Morganstein, and Dimitrios G Goulis. Type 2 diabetes and osteoporosis: a guide to optimal management. The Journal of Clinical Endocrinology & Metabolism, 102(10):3621–3634, 2017.
  • [39] Yasuyuki Katayama, Takuhiko Akatsu, Michiko Yamamoto, Nobuo Kugai, and Naokazu Nagata. Role of nonenzymatic glycosylation of type i collagen in diabetic osteopenia. Journal of Bone and Mineral Research, 11(7):931–937, 1996.
  • [40] Mani Alikhani, Zoubin Alikhani, Coy Boyd, Christine M MacLellan, Markos Raptis, Rongkun Liu, Nicole Pischon, Philip C Trackman, Louis Gerstenfeld, and Dana T Graves. Advanced glycation end products stimulate osteoblast apoptosis via the map kinase and cytosolic apoptotic pathways. Bone, 40(2):345–353, 2007.
  • [41] Yijun Zhou, Yan Li, Dan Zhang, Jiahe Wang, and Hongwu Yang. Prevalence and predictors of osteopenia and osteoporosis in postmenopausal chinese women with type 2 diabetes. Diabetes research and clinical practice, 90(3):261–269, 2010.
  • [42] X Wang, X Shen, X Li, and C Mauli Agrawal. Age-related changes in the collagen network and toughness of bone. Bone, 31(1):1–7, 2002.
  • [43] Ebbe N Ebbesen, Jesper S Thomsen, Henning Beck-Nielsen, Hans J Nepper-Rasmussen, and Lis Mosekilde. Age-and gender-related differences in vertebral bone mass, density, and strength. Journal of Bone and Mineral Research, 14(8):1394–1403, 1999.
  • [44] Svenja Illien-Jünger, Paolo Palacio-Mancheno, William F Kindschuh, Xue Chen, Grazyna E Sroga, Deepak Vashishth, and James C Iatridis. Dietary advanced glycation end products have sex-and age-dependent effects on vertebral bone microstructure and mechanical function in mice. Journal of Bone and Mineral Research, 33(3):437–448, 2018.
  • [45] R Dhaliwal, S Islam, M Mikhail, L Ragolia, and JF Aloia. Effect of vitamin d on bone strength in older african americans: a randomized controlled trial. Osteoporosis International, pages 1–10, 2020.
  • [46] BE Christopher Nordin and Howard A Morris. Osteoporosis and vitamin d. Journal of cellular biochemistry, 49(1):19–25, 1992.
  • [47] Sundeep Khosla, Elizabeth J Atkinson, L Joseph Melton III, and B Lawrence Riggs. Effects of age and estrogen status on serum parathyroid hormone levels and biochemical markers of bone turnover in women: a population-based study. The Journal of Clinical Endocrinology & Metabolism, 82(5):1522–1527, 1997.
  • [48] Hongbing He, Rongkun Liu, Tesfahun Desta, Cataldo Leone, Louis C Gerstenfeld, and Dana T Graves. Diabetes causes decreased osteoclastogenesis, reduced bone formation, and enhanced apoptosis of osteoblastic cells in bacteria stimulated bone loss. Endocrinology, 145(1):447–452, 2004.
  • [49] Ji-Yu Wang, Yan-Zhen Cheng, Shuang-Li Yang, Min An, Hua Zhang, Hong Chen, and Li Yang. Dapagliflozin attenuates hyperglycemia related osteoporosis in zdf rats by alleviating hypercalciuria. Frontiers in endocrinology, 10, 2019.
  • [50] Kun-Hong Li, Yen-Tze Liu, Yu-Wen Yang, Ying-Li Lin, Min-Ling Hung, and I-Ching Lin. A positive correlation between blood glucose level and bone mineral density in taiwan. Archives of osteoporosis, 13(1):78, 2018.
  • [51] B Lawrence Riggs, Sundeep Khosla, and L Joseph Melton III. Sex steroids and the construction and conservation of the adult skeleton. Endocrine reviews, 23(3):279–302, 2002.
  • [52] S Yamagishi, K Nakamura, and Hiroyoshi Inoue. Possible participation of advanced glycation end products in the pathogenesis of osteoporosis in diabetic patients. Medical hypotheses, 65(6):1013–1015, 2005.
  • [53] Ann V Schwartz. Efficacy of osteoporosis therapies in diabetic patients. Calcified tissue international, 100(2):165–173, 2017.
  • [54] Shu-Feng Lei, Fei-Yan Deng, Miao-Xin Li, Volodymyr Dvornyk, and Hong-Wen Deng. Bone mineral density in elderly chinese: effects of age, sex, weight, height, and body mass index. Journal of bone and mineral metabolism, 22(1):71–78, 2004.
  • [55] L Yan, A Prentice, B Zhou, H Zhang, X Wang, DM Stirling, A Laidlaw, Y Han, and A Laskey. Age-and gender-related differences in bone mineral status and biochemical markers of bone metabolism in northern chinese men and women. Bone, 30(2):412–415, 2002.
  • [56] Rongtao Cui, Lin Zhou, Zuohong Li, Qing Li, Zhiming Qi, and Junyong Zhang. Assessment risk of osteoporosis in chinese people: relationship among body mass index, serum lipid profiles, blood glucose, and bone mineral density. Clinical interventions in aging, 11:887, 2016.
  • [57] Sundeep Khosla, LJ Melton III, and BL Riggs. Osteoporosis: gender differences and similarities. Lupus, 8(5):393–396, 1999.
  • [58] RL Jilka. Cytokines, bone remodeling, and estrogen deficiency: a 1998 update. Bone, 23(2):75, 1998.
  • [59] Sue A Shapses and Deeptha Sukumar. Bone metabolism in obesity and weight loss. Annual review of nutrition, 32:287–309, 2012.
  • [60] Linda R Nelson and Serdar E Bulun. Estrogen production and action. Journal of the American Academy of Dermatology, 45(3):S116–S124, 2001.
  • [61] H Ohta, T Ikeda, T Masuzawa, K Makita, Y Suda, and S Nozawa. Differences in axial bone mineral density, serum levels of sex steroids, and bone metabolism between postmenopausal and age-and body size-matched premenopausal subjects. Bone, 14(2):111–116, 1993.
  • [62] Lorenz C Hofbauer and Michael Schoppet. Clinical implications of the osteoprotegerin/rankl/rank system for bone and vascular diseases. Jama, 292(4):490–495, 2004.
  • [63] BJ Smith, MR Lerner, SY Bu, EA Lucas, JS Hanas, SA Lightfoot, RG Postier, MS Bronze, and DJ Brackett. Systemic bone loss and induction of coronary vessel disease in a rat model of chronic inflammation. Bone, 38(3):378–386, 2006.
  • [64] Raquel MS Campos, Aline de Piano, Patrícia L da Silva, June Carnier, Priscila L Sanches, Flávia C Corgosinho, Deborah CL Masquio, Marise Lazaretti-Castro, Lila M Oyama, Cláudia MO Nascimento, et al. The role of pro/anti-inflammatory adipokines on bone metabolism in nafld obese adolescents: effects of long-term interdisciplinary therapy. Endocrine, 42(1):146–156, 2012.
  • [65] Marion Geoffroy, Isabelle Charlot-Lambrecht, Jan Chrusciel, Isabelle Gaubil-Kaladjian, Ana Diaz-Cives, Jean-Paul Eschard, and Jean-Hugues Salmon. Impact of bariatric surgery on bone mineral density: observational study of 110 patients followed up in a specialized center for the treatment of obesity in france. Obesity surgery, 29(6):1765–1772, 2019.
  • [66] Sigridur Lara Gudmundsdottir, Diana Oskarsdottir, Olafur S Indridason, Leifur Franzson, and Gunnar Sigurdsson. Risk factors for bone loss in the hip of 75-year-old women: a 4-year follow-up study. Maturitas, 67(3):256–261, 2010.
  • [67] Huanhuan Chen, Peter Tino, and Xin Yao. Probabilistic classification vector machines. IEEE Transactions on Neural Networks, 20(6):901–914, 2009.
  • [68] Huanhuan Chen, Peter Tiňo, and Xin Yao. Efficient probabilistic classification vector machine with incremental basis function selection. IEEE Transactions on Neural Networks and Learning Systems, 25(2):356–369, 2013.
  • [69] Huanhuan Chen, Peter Tiňo, Ali Rodan, and Xin Yao. Learning in the model space for cognitive fault diagnosis. IEEE Transactions on Neural Networks and Learning Systems, 25(1):124–136, 2013.
  • [70] Huanhuan Chen, Fengzhen Tang, Peter Tino, and Xin Yao. Model-based kernel for efficient time series analysis. In Proceedings of the 19th ACM SIGKDD international conference on Knowledge discovery and data mining, pages 392–400, 2013.
  • [71] Myles Hollander, Douglas A Wolfe, and Eric Chicken. Nonparametric statistical methods, volume 751. John Wiley & Sons, 2013.
  • [72] A Agresti. Categorical data analysis. 2nd edition. Wiley-Interscience, Hoboken, NJ, 2002.