Learning Dynamic and Personalized Comorbidity Networks from Event Data using Deep Diffusion Processes

01/08/2020 ∙ by Zhaozhi Qian, et al. ∙ 0

Comorbid diseases co-occur and progress via complex temporal patterns that vary among individuals. In electronic medical records, we only observe onsets of diseases, but not their triggering comorbidities; that is, the mechanisms underlying temporal relations between diseases need to be inferred. Learning such temporal patterns from event data is crucial for understanding disease pathology and predicting prognoses. To this end, we develop deep diffusion processes (DDP) to model "dynamic comorbidity networks", i.e., the temporal relationships between comorbid disease onsets expressed through a dynamic graph. A DDP comprises events modelled as a multi-dimensional point process, with an intensity function parameterized by the edges of a dynamic weighted graph. The graph structure is modulated by a neural network that maps patient history to edge weights, enabling rich temporal representations for disease trajectories. The DDP parameters decouple into clinically meaningful components, which serves the dual purpose of accurate risk prediction and intelligible representation of disease pathology. We illustrate these features in experiments using cancer registry data.

1 Introduction

Diseases share a rich biological structure that unfolds over time, such that the onset of a disease at a future time may be triggered by sequences of past events (Khan et al., 2018). The causal structure of relationships between diseases can be represented by networks that will often be dynamic in nature. For instance, we would expect that a bacterial infection might induce a second infection, which in combination with the former leads to other complications triggered at different times. The strength of edges between network nodes changes over time, depending on the entire patient history. Beyond disease progression, these dynamics are also prevalent in economics, finance and sociology (Ahmed and Xing, 2009; Myers and Leskovec, 2010; Namaki et al., 2011).

In most of these cases, the underlying network dynamics are unknown, and what we observe are sequences of events spreading over the network. To infer the latent network dynamics from observed sequences, one needs to take into account both when and what events occurred in the past, since both carry information on the mechanisms involved in disease instantiation and progression. Multi-dimensional point processes are natural candidates for this problem; they explicitly model the time period between events as random variables, and allow them to modulate the intensity function, a stochastic model for the time of the next event given previous events. However, traditional parametric models are not expressive enough to capture network dynamics, i.e., the networks they learn are static in nature. On the other hand, existing neural point process models do not entail a well-defined network structure due to their complex parameterization.

Figure 1: An Exemplary Realization of DDP. Each node corresponds to a disease ICD-10 code (D64.9: Anemia, N18.9: Renal failure, I63.9: Cerebral infarction, E11.9: Diabetes, I70.209: Atherosclerosis, M81.0: Osteoporosis, and I25.10: Heart disease). Edge weights are depicted via their thickness. The left panels show the evolution of the disease network, and the rightmost panel shows the intensity functions of three selected nodes. The onset of heart disease triggers spikes in the intensity functions of diabetes and anemia, making them more likely in the future. The subsequent onset of diabetes elevates the risk of anemia (i.e., thicker edge), which consequently occurs later. Edge weights are modulated by a neural network over time.

In this paper, we develop the deep diffusion process (DDP), a deep probabilistic model for diffusion over comorbidity networks based on mutually-interacting point processes, as illustrated in Figure 1. We model the DDP intensity function as a combination of contextualized background risk and networked disease interaction. The disease interaction further consists of three components: (1) static pairwise interactions, (2) time-decay, and (3) dynamic influence factors. The first two components are standard in point process models, whereas the last component makes use of a deep neural network to (dynamically) update the disease's influence on future events. The introduction of neural networks not only adds to model capacity, but also enables principled predictions based on clinically interpretable parameters which map the patient history onto personalized comorbidity networks. This brings us closer to understanding the underlying disease mechanisms, which, as we hypothesize, leads to better out-of-sample and out-of-domain performance. In our experiments, we provide encouraging results in this direction, with better performance of our model on medical data from a different domain.

2 Related Work

Recently, the problem of inferring latent networks from event data has attracted a lot of attention. In this Section, we highlight previous approaches based on point process formalism, and techniques specifically used in medicine that are relevant to our problem setup.

2.1 Point Processes for Event Streams

2.1.1 Parametric Models

Gomez-Rodriguez et al. (2012) introduced one of the earliest algorithms for discovering latent networks from sequences of events with a transmission process influenced only by the most recent event. The cHawkes model (Choi et al., 2015) removes the Markovian assumption by modelling the event stream as a Hawkes Process where past events temporarily raise the probability of future events. The resulting network captures the pairwise interaction between any two events. However, the influence of the combinations of previous events and their timing is not accounted for in the network structure. As a result, the learnt network is constant for all event streams at all times. Hence, we refer to it as a static population-level network.

2.1.2 Neural Network Based Models

Several recent publications have focused on expanding the flexibility of point process models by using recurrent neural networks (RNN). Du et al. (2016) model the inter-arrival time between consecutive events as a univariate point process and annotate each event with a marker to indicate the event type. Importantly, the marker and the arrival time of the next event are conditionally independent given the history. The independence assumption imposes limitations on the expressiveness of the model, as there is only one underlying intensity function for all types of events. Neural Hawkes (Mei and Eisner, 2017) models the intensity function directly as a continuous-time LSTM. The resulting model has much better flexibility and has achieved state-of-the-art performance on a variety of prediction tasks. However, the model does not generate a well-defined network between events and it lacks interpretability in general, as the hidden states of the RNN do not correspond to clinically meaningful variables.

More recently, the RPPN model (Xiao et al., 2019) incorporates a temporal attention mechanism to improve the interpretability of neural point processes. The model requires a separate attention function for each possible event type in order to connect the observed past with the unobserved future. This may not be an issue when all types of events occur relatively often, but in the medical domain, the majority of diseases have low prevalence in the population (for example, heart disease is perceived to be very common but it actually occurs in only 1.07% of adults according to official statistics (NHS, 2019a), and rare diseases often have prevalence lower than 0.01%), which means the attention functions for these diseases may not be adequately trained due to scarcity of data. Lamprier (2019) also considered applying neural networks to information diffusion modelling, although the model does not allow the same type of event to occur more than once. In the medical setting, recurrence of previous diseases carries important information about the patient's health condition. It is also of interest to predict the future recurrence of existing morbidities.


Model
Background Temporal Dependence
Poisson Process 0
Hawkes Process
cHawkes Same as above
Neural Hawkes 0
RPPN 0
DDP
Table 1: Point process intensity functions. Subscripts denote the event type. The symbols in the table stand for the context vector, the time-decay kernel, the outputs of the RNN units, the output of each event's attention function, and the softplus and sigmoid functions.

2.2 Medical Disease Networks

Within the medical community, understanding disease networks and prevalent comorbidities (i.e., any two or more diseases that occur in one person at the same time) is a longstanding area of research. Many rule-based scoring models are based on the clinical and empirical understanding of symptoms and prevalence of specific comorbidities. For example, the Charlson Comorbidity Index (Charlson et al., 1987) was proposed as early as 1987 to predict the ten-year mortality for a patient by summing up the risk indices associated with various comorbid conditions. The index remains the preferred approach in the medical community to represent comorbidity history (Quan et al., 2011).

Recent works have also investigated the construction of data-driven dynamic disease networks (Hu et al., 2019; Lee et al., 2019; Beck et al., 2016; Hidalgo et al., 2009). However, without exception, the networks in these works are constructed in two steps. First, certain pairs of diseases are linked together based on population-level statistics such as risk ratio or temporal correlation. Next, the disease pairs are pieced together into longer trajectories or networks. Since all the information used in this process is at the population level, the resulting graph is not personalized. Furthermore, constructing the network by combining pieces usually implies strong independence assumptions, e.g., the Markovian assumption, which rarely hold in disease progression (a real example is given in the appendix). Therefore, the dynamic aspect of disease progression is not adequately represented.

The main contribution of this work is to augment the above approaches by modelling the disease network itself as an individualized dynamic graph. This allows us to model more complex temporal interactions between diseases as well as provide personalized predictions.

3 Deep Diffusion Process

3.1 Dynamic Network Representation

Consider a dynamic network consisting of a set of vertices annotated with binary labels, and a set of directed edges that carry weights. The vertices correspond to the set of all possible event types. At any time, a vertex has label 1 if an event of its type has occurred and 0 otherwise. The edge set contains an edge from an observed vertex to another vertex if the former modulates the latter's chance of occurrence. The edge weights represent the strength of such modulation effects between events (see Figure 1).

While we do not observe the network directly at each time, we do have access to individual trajectories through the network, available as a sequence of events and corresponding time points,

(1)

where each entry pairs a time of occurrence with the associated event type. From the event sequence, one can immediately derive the vertex labels at any time up to the last observed event. Since the vertex set is fixed a priori depending on the problem scope, the remaining unknown components of the graph are the vertex labels at future times as well as the weighted edges at all times.

Determining the future vertex label is known as event prediction, whereas uncovering weighted edges corresponds to network inference. Our goal is to devise a model that addresses both problems simultaneously.
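The event-sequence representation and the derivation of vertex labels can be sketched in a few lines. This is a minimal illustration rather than the authors' implementation: the `Event` class, the `vertex_labels` helper, and the toy times are hypothetical, and the ICD-10 codes are borrowed from Figure 1 only as sample event types.

```python
from dataclasses import dataclass
from typing import Dict, Sequence


@dataclass(frozen=True)
class Event:
    time: float  # time of occurrence
    code: str    # event type, here an ICD-10 code


def vertex_labels(events: Sequence[Event], vocabulary: Sequence[str], t: float) -> Dict[str, int]:
    """Label a vertex 1 if an event of its type occurred at or before time t, else 0."""
    seen = {e.code for e in events if e.time <= t}
    return {code: int(code in seen) for code in vocabulary}


# Hypothetical trajectory: heart disease at t=1.0, diabetes at t=2.5.
vocabulary = ["I25", "E11", "D64"]
trajectory = [Event(1.0, "I25"), Event(2.5, "E11")]

labels_early = vertex_labels(trajectory, vocabulary, t=2.0)
labels_late = vertex_labels(trajectory, vocabulary, t=3.0)
```

Labels for times beyond the last observed event cannot be read off this way, which is the event prediction problem described above.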

3.2 Preliminaries on Point Process

Before formally introducing the DDP model, we first recapitulate several key concepts of point processes.

Lying at the core of point processes is the intensity function, which, multiplied by the length of an infinitesimal time window, gives the probability of an event of a given type occurring in that window given the history of past events, i.e.,

(2)

As we can see in Table 1, different point process models have different parameterizations of the intensity function ranging from the simplest Poisson process to the complex Neural Hawkes and RPPN. However, once the intensity function is given, many interesting properties can be readily derived. For example, the likelihood of an observed sequence is given by

(3)

Here the likelihood is a function of the collection of all free parameters in the model. As another example, the probability of an event of a given type happening at a specific time is given by

(4)

and the occurrence time of the next event is given by

(5)
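For reference, the textbook forms of the quantities in (2) to (5) for a multi-dimensional point process can be written as below. The notation is an assumption introduced here for illustration and need not match the authors' symbols: $\lambda_d$ is the intensity of event type $d$, $\mathcal{H}_t$ the history before time $t$, $(t_i, d_i)$ the $i$-th of $n$ observed events, $[0, T]$ the observation window, and $\theta$ the model parameters. The likelihood is shown in its logarithmic form, and the last line reads "occurrence time of the next event" as its conditional expectation.

```latex
% Assumed notation; standard point-process identities, not recovered from the source.
\begin{align}
\lambda_d(t \mid \mathcal{H}_t)\, dt
  &= \Pr\big(\text{type-}d \text{ event in } [t, t + dt) \mid \mathcal{H}_t\big) \\
\log L(\theta)
  &= \sum_{i=1}^{n} \log \lambda_{d_i}(t_i \mid \mathcal{H}_{t_i})
     - \int_{0}^{T} \lambda(s \mid \mathcal{H}_s)\, ds,
  \qquad \lambda(t \mid \mathcal{H}_t) = \sum_{d} \lambda_d(t \mid \mathcal{H}_t) \\
\Pr(d \mid t, \mathcal{H}_t)
  &= \frac{\lambda_d(t \mid \mathcal{H}_t)}{\lambda(t \mid \mathcal{H}_t)} \\
\hat{t}_{n+1}
  &= \int_{t_n}^{\infty} t\, \lambda(t \mid \mathcal{H}_t)
     \exp\!\Big(-\int_{t_n}^{t} \lambda(s \mid \mathcal{H}_s)\, ds\Big)\, dt
\end{align}
```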

The model can be trained in multiple ways. In general, one can maximize the likelihood function (3) via stochastic gradient descent. If the integral term does not have a closed form, it can be approximated by Monte Carlo sampling as done in Mei and Eisner (2017). In addition, it is also possible to train the model by minimizing the prediction loss based on (4) and (5) (Du et al., 2016).
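The Monte Carlo approximation of the integral term can be sketched as follows. As a stand-in for the model, the code uses a plain multi-dimensional Hawkes intensity with an exponential kernel; the function names, the parameters `mu`, `alpha`, `omega`, and the toy events are all illustrative assumptions, not the paper's implementation.

```python
import math
import random
from typing import Sequence, Tuple

Event = Tuple[float, int]  # (time of occurrence, event type index)


def intensity(t: float, d: int, events: Sequence[Event],
              mu: Sequence[float], alpha: Sequence[Sequence[float]], omega: float) -> float:
    """Hawkes-style intensity of type d at time t: background plus decaying impulses."""
    value = mu[d]
    for t_i, d_i in events:
        if t_i < t:  # only strictly earlier events contribute
            value += alpha[d_i][d] * omega * math.exp(-omega * (t - t_i))
    return value


def mc_log_likelihood(events: Sequence[Event], horizon: float,
                      mu: Sequence[float], alpha: Sequence[Sequence[float]],
                      omega: float, n_samples: int = 2000, seed: int = 0) -> float:
    """Log-likelihood on [0, horizon] with the integral estimated by uniform sampling."""
    rng = random.Random(seed)
    n_types = len(mu)
    log_term = sum(math.log(intensity(t, d, events, mu, alpha, omega)) for t, d in events)
    total = 0.0
    for _ in range(n_samples):
        s = rng.uniform(0.0, horizon)
        total += sum(intensity(s, k, events, mu, alpha, omega) for k in range(n_types))
    integral_estimate = horizon * total / n_samples
    return log_term - integral_estimate


events = [(1.0, 0), (2.0, 1)]
mu = [0.5, 0.2]
no_interaction = [[0.0, 0.0], [0.0, 0.0]]

# With no interaction the model is a Poisson process and the integral is exact.
poisson_ll = mc_log_likelihood(events, horizon=4.0, mu=mu, alpha=no_interaction, omega=1.0)
coupled_ll = mc_log_likelihood(events, horizon=4.0, mu=mu,
                               alpha=[[0.0, 0.8], [0.3, 0.0]], omega=1.0)
```

In the Poisson case the estimate matches the closed form, which makes it a convenient sanity check before moving to intensities without an analytic integral.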

3.3 Model Specification

This section presents the Deep Diffusion Process — a deep probabilistic model for inferring network dynamics while accurately predicting future events.

3.3.1 Intensity Function

Our objective is to enrich the intensity function in order to capture the time-dependent disease-to-disease relationships. To this end, we decompose the overall intensity function into two additive components:

(6)

The first term captures the occurrence of events due to static exogenous risk factors. For example, it can model the increased risk of heart attack among obese patients. The second term models the impact from past events. Each historical event adds an “impulse” to the intensity function of the target event type, depending on the type of the historical event, its timing, and, most importantly, the event history at the time. This decomposition allows modeling the impact of exogenous factors and that of past events separately.

Next, we introduce the parametric form to capture the impact from past events as follows:

(7)

The pairwise parameter captures the instantaneous impact of one event type on another; a matrix collects these parameters for every ordered pair of event types. The time-decay function captures the decaying aspect of previous events' influence. It is a non-negative function defined on the non-negative real line that integrates to one. One common choice is the exponential kernel.
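One way to write the decomposition in (6) and the impulse in (7) is given below. All symbols are assumptions chosen for illustration: $f$ for the exogenous risk factors, $\mu_d$ for the background term, $\alpha_{d_i,d}$ for the static pairwise parameter, $\kappa$ for the time-decay kernel, $\gamma_i$ for the dynamic influence factor of the $i$-th event (described in the following paragraph), and $\omega > 0$ for a decay rate. Combining the three components as a product is also an assumption, consistent with the description of how they modulate one another.

```latex
% Assumed notation and assumed product form; a sketch, not the recovered equations.
\begin{align}
\lambda_d(t \mid \mathcal{H}_t)
  &= \mu_d(f) + \sum_{i:\, t_i < t} \phi_d\big(d_i,\, t - t_i,\, \mathcal{H}_{t_i}\big) \\
\phi_d\big(d_i,\, t - t_i,\, \mathcal{H}_{t_i}\big)
  &= \alpha_{d_i, d}\; \kappa(t - t_i)\; \gamma_i \\
\kappa(\tau)
  &= \omega\, e^{-\omega \tau}, \qquad \tau \ge 0,
  \qquad \int_{0}^{\infty} \kappa(\tau)\, d\tau = 1
\end{align}
```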

Figure 2: Schematic of the DDP Architecture.

The last component is the (dynamic) influence factor, which depends on the full patient history. It is learned by an RNN applied to the sequence of past events. As shown in Figure 2, each event is encoded through an embedding, which together with the time gap is fed into the recurrent layer as follows:

(8)

where the result is the LSTM output. In our implementation, we used a standard LSTM with the time gap as an additional input dimension, as shown in (8). However, we note that any continuous-time RNN (e.g., phased LSTM (Neil et al., 2016)) is applicable. The influence factor is then given by

(9)

which involves parameters to be learned and the sigmoid function.
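The data flow from embedding and time gap to influence factor can be sketched in plain Python. Two simplifications are assumed here: a single tanh recurrent update stands in for the LSTM, and the influence factor is taken to be a sigmoid of a linear map of the hidden state. The weights, dimensions, and function names are hypothetical.

```python
import math
from typing import List, Sequence


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def recurrent_step(h_prev: Sequence[float], embedding: Sequence[float], time_gap: float,
                   w_in: Sequence[Sequence[float]], w_rec: Sequence[Sequence[float]]) -> List[float]:
    """One tanh recurrent update (a stand-in for the LSTM layer)."""
    x = list(embedding) + [time_gap]  # time gap appended as an extra input dimension
    h_new = []
    for row_in, row_rec in zip(w_in, w_rec):
        pre = sum(w * v for w, v in zip(row_in, x))
        pre += sum(w * v for w, v in zip(row_rec, h_prev))
        h_new.append(math.tanh(pre))
    return h_new


def influence_factor(h: Sequence[float], w_out: Sequence[float], b_out: float) -> float:
    """Squash a linear read-out of the hidden state into (0, 1)."""
    return sigmoid(sum(w * v for w, v in zip(w_out, h)) + b_out)


# Toy sizes: 2-d event embedding plus time gap, 2-d hidden state.
w_in = [[0.5, -0.2, 0.1], [0.3, 0.4, -0.1]]
w_rec = [[0.1, 0.0], [0.0, 0.1]]
w_out, b_out = [1.0, -1.0], 0.0

h = [0.0, 0.0]
gammas = []
for embedding, gap in [([1.0, 0.0], 0.0), ([0.0, 1.0], 1.5)]:
    h = recurrent_step(h, embedding, gap, w_in, w_rec)
    gammas.append(influence_factor(h, w_out, b_out))
```

Because the hidden state carries the whole history, the same event type can receive a different influence factor depending on what preceded it and when.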

For training, we use a loss function comprising the likelihood function in (3) and the cross-entropy loss for event type prediction, i.e.,

(10)

where the weighting coefficient is a hyperparameter that trades off the two objectives and is determined from a validation set. The loss function in (10) encapsulates our dual objective of a faithful representation of the observed event sequence and the ability to predict the next event.
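A combined objective of this kind can be sketched as below. The exact weighting and sign conventions of (10) are not recoverable here, so the code assumes the negative log-likelihood is added to a cross-entropy term scaled by a hypothetical hyperparameter `beta`.

```python
import math
from typing import Sequence


def cross_entropy(type_probs: Sequence[float], target: int) -> float:
    """Negative log-probability assigned to the event type that actually occurred."""
    return -math.log(type_probs[target])


def combined_loss(log_likelihood: float, type_probs: Sequence[float],
                  target: int, beta: float) -> float:
    """Assumed form: negative log-likelihood plus beta times the cross-entropy."""
    return -log_likelihood + beta * cross_entropy(type_probs, target)


# Toy numbers: sequence log-likelihood of -3.0, predicted type distribution [0.25, 0.75].
loss = combined_loss(log_likelihood=-3.0, type_probs=[0.25, 0.75], target=1, beta=2.0)
```

Setting `beta` to zero recovers pure likelihood training; larger values put more weight on next-event prediction.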

3.4 Dynamic Network Inference

The pairwise parameter matrix, the time-decay function and the influence factors jointly define the network structure at any given time. The pairwise parameter matrix is the baseline that encodes static pairwise relationships. The larger the value of an entry, the more influence the source event will have on the target event on average. The time-decay function further modulates the link strength based on the time gap between the occurrence of events.

The influence factor modulates this baseline and enables the network structure to adapt to the observed event sequence. Based on the full history of past events, the influence factor may strengthen or diminish the impact of one particular event and thus modifies all its outgoing links. Therefore, at any given time, the directed edge from a past event to another event type will have weight

(11)

It is worth highlighting that the resulting graph is dynamic in two aspects. First, the influence factor is updated for each event based on the full event history up to that point. Depending on the combination and the timing of historical events, the influence factor for subsequent events will differ, leading to a different graph structure. Secondly, the time-decay function decreases the edge weight as time moves on.
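The two sources of dynamics can be made concrete with a small sketch. It assumes the edge weight is the product of the static pairwise parameter, an exponential time-decay kernel, and the influence factor of the source event, and that contributions from repeated occurrences of the same source type add up. The names `alpha`, `gammas`, `omega`, and the toy values are hypothetical.

```python
import math
from typing import Dict, Sequence, Tuple


def exp_kernel(gap: float, omega: float) -> float:
    """Exponential time-decay kernel; integrates to one over non-negative gaps."""
    return omega * math.exp(-omega * gap) if gap >= 0 else 0.0


def dynamic_edges(history: Sequence[Tuple[float, str]], gammas: Sequence[float],
                  alpha: Dict[str, Dict[str, float]], t: float,
                  omega: float) -> Dict[Tuple[str, str], float]:
    """Edge weights at time t from events strictly before t (assumed product form)."""
    edges: Dict[Tuple[str, str], float] = {}
    for (t_i, source), gamma_i in zip(history, gammas):
        if t_i >= t:
            continue
        for target, a in alpha.get(source, {}).items():
            weight = a * exp_kernel(t - t_i, omega) * gamma_i
            edges[(source, target)] = edges.get((source, target), 0.0) + weight
    return edges


alpha = {"I25": {"E11": 2.0, "D64": 1.0}}
history = [(0.0, "I25")]
gammas = [0.5]  # influence factor of the single historical event

edges_soon = dynamic_edges(history, gammas, alpha, t=1.0, omega=1.0)
edges_later = dynamic_edges(history, gammas, alpha, t=3.0, omega=1.0)
```

Comparing `edges_soon` with `edges_later` shows the decay over time, while changing `gammas` shows how the history-dependent factor rescales all outgoing links of an event.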

It is often desirable for the graph to have a sparse structure, i.e., a zero pairwise parameter for many pairs of events. We can introduce a regularization term for the pairwise parameter matrix to encourage sparsity, as proposed in Choi et al. (2015).

Lastly, we note that sometimes it is required to construct a population-level static graph instead of a dynamic graph to capture the high-level event interaction. Static edge weights can be found by averaging out the dynamic components in equation 11 as follows

(12)

where the expectation represents the average influence factor of the source event.
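A sketch of this averaging step follows. It assumes the expectation is estimated by the empirical mean of influence factors pooled over patients, and that the static weight is the pairwise parameter scaled by that mean; `alpha`, `observed`, and `static_edges` are hypothetical names.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, Iterable, Tuple


def static_edges(alpha: Dict[str, Dict[str, float]],
                 observed: Iterable[Tuple[str, float]]) -> Dict[Tuple[str, str], float]:
    """Scale each pairwise parameter by the mean influence factor of its source type."""
    by_source = defaultdict(list)
    for source, gamma in observed:
        by_source[source].append(gamma)
    return {
        (source, target): a * mean(by_source[source])
        for source, targets in alpha.items()
        if source in by_source  # skip source types never observed
        for target, a in targets.items()
    }


alpha = {"I25": {"E11": 2.0}, "N18": {"D64": 1.0}}
observed = [("I25", 0.2), ("I25", 0.6)]  # pooled (source type, influence factor) pairs

population_graph = static_edges(alpha, observed)
```

Source types with no observed influence factors are left out rather than assigned an arbitrary average.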

4 Experiments

In this section, we utilize data from a large-scale cancer registry to evaluate DDP (implementation details are provided in the appendix). Throughout our experiments, we evaluate DDP with respect to three aspects: (a) its ability to extract interpretable disease networks that are sensible in the light of current medical literature (Section 4.2), (b) its accuracy in predicting disease pathways (Section 4.3), and (c) its generalizability to out-of-domain datasets (Section 4.4).

4.1 Data Description

Data source.

We used national registry data for a cohort of colorectal cancer patients diagnosed between 2011 and 2015. The data comprises 268,000 observations of 100 common diagnoses for 54,000 patients. Each patient is associated with up to 15 comorbidities. The earliest diagnoses date back to the 1990s, which gives us a fairly broad timescale to study the progression of colorectal cancer. The dataset also records 5 features for each patient. In addition to the primary dataset described above, we have also considered data for 25,000 patients with stomach cancer. It is well understood that patients with stomach cancer are exposed to different risk factors from patients with colorectal cancer (Miller, 1982; Drasar and Irving, 1973). Therefore, we use this dataset as an out-of-domain test set to validate transferability and robustness of DDP.

4.2 Colorectal cancer comorbidity networks

Figure 3: (a) Illustration of the Jaccard index between two exemplary graphs. (b) Heterogeneity of comorbidity networks increases over time in colorectal cancer pathways. (c) Onset of a comorbidity modulates future pathways.

Heterogeneity of disease pathways among patients can be quantified by measuring the distance between their comorbidity networks. A commonly used metric for comparing graphs is the Jaccard index (Real and Vargas, 1996), defined as the number of edges shared by the two networks divided by the number of edges in the union of their edge sets; the corresponding Jaccard distance is one minus this index. (An illustrative example is given in Figure 3 (a).) Within a population, the average Jaccard distance between any two individual networks measures the heterogeneity of the population, i.e.,

(13)

where the normalization depends on the size of the population. The larger the average distance, the more spread out the population. Since the disease networks are time-varying, we compute the average distance based on the networks at a given time to reflect the heterogeneity at that moment.
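The heterogeneity measure can be sketched directly from edge sets. The code treats each network as a set of directed edges at a fixed time; how weighted edges are turned into a set (for example by thresholding) is left open here, and the function names and toy graphs are illustrative.

```python
from itertools import combinations
from typing import FrozenSet, Sequence, Tuple

Edge = Tuple[str, str]


def jaccard_index(e1: FrozenSet[Edge], e2: FrozenSet[Edge]) -> float:
    """Shared edges divided by edges in the union; two empty graphs count as identical."""
    union = e1 | e2
    if not union:
        return 1.0
    return len(e1 & e2) / len(union)


def jaccard_distance(e1: FrozenSet[Edge], e2: FrozenSet[Edge]) -> float:
    return 1.0 - jaccard_index(e1, e2)


def heterogeneity(edge_sets: Sequence[FrozenSet[Edge]]) -> float:
    """Average Jaccard distance over all unordered pairs of individual networks."""
    pairs = list(combinations(edge_sets, 2))
    if not pairs:
        raise ValueError("need at least two networks")
    return sum(jaccard_distance(a, b) for a, b in pairs) / len(pairs)


# Three toy patient networks over diseases a, b, c, d.
patient_1 = frozenset({("a", "b"), ("b", "c")})
patient_2 = frozenset({("a", "b")})
patient_3 = frozenset({("c", "d")})

spread = heterogeneity([patient_1, patient_2, patient_3])
```

The pairwise distances here are 0.5, 1 and 1, so the population heterogeneity is their mean.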

Patient pathways get more heterogeneous over time.

In Figure 3 (b), we track this heterogeneity over time (referenced to the date of initial diagnosis). We can readily see that the comorbidity networks learned by DDP become increasingly heterogeneous as time progresses. This reflects the fact that as a patient gets older, more comorbidities will occur and the subsequent disease pathway will become more complex. The increase in heterogeneity also highlights the need for modeling comorbidities with a personalized method, since any one-size-fits-all approach will under-appreciate the diversity occurring later in the pathway. The ability of DDP to accurately predict the heterogeneity of colorectal cancer pathways is assessed in Section 4.3.

Figure 3 (c) shows how the network dynamically adapts to “influencers”, diseases which trigger a large variety of comorbidities and complications. The figure displays the change in heterogeneity before and after a disease onset relative to the population average. A positive value means an increase in heterogeneity relative to the population, a negative value a decrease, and zero no change. Time is normalized so that the disease of interest always occurs at time 0. The red, blue and orange lines represent chronic obstructive pulmonary disease (COPD), Type 2 diabetes mellitus and heart failure respectively. It is well-established in the medical literature that all three types of diseases have complex heterogeneous comorbidity pathways (Fabbri et al., 2008; Stratton et al., 2000; Paulus and Tschöpe, 2013). This is clearly reflected in Figure 3 (c), where the onset of these diseases triggers an immediate and persistent increase in the heterogeneity of comorbidity networks. On the other hand, the red line represents varicose veins of lower extremities, a mild condition that often does not need treatment (NHS, 2019b). It is thus unsurprising to see that patients with this condition usually have less heterogeneous disease networks than the average.

The above analysis shows DDP's ability to adjust the subsequent comorbidity pathway based on the occurrence of individual diseases. This high-resolution view can help medical researchers better understand the taxonomy of diseases. There is a growing interest in the literature in rethinking the classification or even the definition of a disease based on its evolving pathways (Ahlqvist et al., 2018).

Figure 4: The dynamic comorbidity network learned by DDP for an individual patient at three time steps, together with the corresponding intensity function. Nodes for diseases that have not occurred are colored in gray, and diseases already diagnosed are assigned a distinct color. Edge thickness corresponds to the disease likelihood at the given time step. In the upper left panel, we plot the Jaccard distance of the patient's network with respect to the average population as a function of time (on a logarithmic scale). The static comorbidity network obtained by counting disease co-occurrences and using the counts as graph edges is depicted on the right panel.

Individual-level comorbidity networks. Since we study colorectal and stomach cancer, it is natural to look at comorbidities related to the digestive system. Figure 4 depicts the evolution of the dynamic comorbidity network of five common gastrointestinal disorders (for one patient’s pathway) as inferred by DDP — the comorbidities included: diverticular diseases (ICD-10 code K57), intestinal disorders (K63), benign neoplasm in the colon and rectum (D12), diverticular diseases (K57), and ulcerative colitis (K52). The intensity function corresponding to the patient’s trajectory is shown in the bottom panel. Each edge’s thickness in the network corresponds to the likelihood of the disease designating the receiving node to occur at a given time step. The individual patient’s dynamic network is contrasted with a static network (upper right panel) constructed directly from raw data by counting the co-occurrences of each pair of comorbidities and weighting edges accordingly.

As we can see in Figure 4, the DDP comorbidity network is fairly dense at each time step, which suggests that the diseases are related. In fact, numerous medical publications have examined associations between these diseases. For example, it has been established that a lack of dietary fiber intake underlies the onset of diverticular diseases, intestinal disorders, and tumours of the colon and rectum (Painter and Burkitt, 1971; Burkitt et al., 1972). Furthermore, there is strong epidemiological evidence of associations between tumours of the colon and rectum, diverticular diseases, and ulcerative colitis (Ekbom et al., 1990; Burkitt, 1971).

Figure 4 shows that the dynamics of the inferred individualized comorbidity network cannot be deduced from the approach presented in Beck et al. (2016). That is, at each time step the RNN component of the DDP adapts the weights on the network edges to reflect the impact of previous diagnoses on the odds of future ones. Moreover, the weights of the network edges reflect the timing at which future comorbidities are expected to occur. For instance, D12 occurs only 23 days after diagnosis, which was correctly anticipated by the DDP network as it assigned a large weight to the edge connecting the (pre-existing comorbidity) K62 and D12 at diagnosis time. In contrast, K57 occurred more than 8 months after diagnosis, which was also anticipated by the DDP model, which assigned a smaller weight to the edges flowing into the K57 node.

By measuring the average (Jaccard) distance between the comorbidity network of the patient at hand and those of the overall patient population (upper left panel of Figure 4), we can see that the patient’s network diverges from the typical population-level pathway as time progresses. This emphasizes the importance of the personalization aspect of the DDP model in predicting patient prognosis in later stages of the disease as we will show in the next Section.

4.3 Predicting colorectal cancer pathways

Prediction targets and evaluation metric.

Each individual patient has a unique disease pathway, and a good representation of their health trajectory should enable differentiating these pathways. We evaluate how well DDP discerns the future disease pathways by predicting the next event. Given a disease history, the models try to predict the probability of having a disease at a given time, namely the time of the next disease onset available in the dataset. We chose to predict the incidental risk at a given time due to the nature of our data. For most chronic diseases including cancer, the diagnosis occurs much later than the actual onset. Hence the true disease onset time as well as the time between disease onsets are never observed. By focusing on predicting the diseases at known diagnosis times, evaluation becomes more objective and less prone to unknown variation. We calculate the Area Under ROC (AUC) score for predicting prevalent comorbidities.
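For a single disease, the AUC can be computed from predicted risks and observed outcomes at the known diagnosis times. The sketch below uses the pairwise-ranking definition of AUC (the probability that a positive case is scored above a negative one, with ties counted as one half); the function name and the toy labels and scores are illustrative.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Toy example: 1 means the disease was diagnosed at the prediction time.
labels = [1, 0, 1, 0]
scores = [0.9, 0.8, 0.4, 0.1]
auc = roc_auc(labels, scores)
```

The quadratic loop is fine for illustration; a rank-based implementation would be preferable at the scale of a registry dataset.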

ICD-10 code   DDP             Neural Hawkes   cHawkes         Charlson        RETAIN
I50           0.74 ± 0.0114   0.72 ± 0.0127   0.69 ± 0.0136   0.68 ± 0.0111   0.73 ± 0.0123
N39           0.64 ± 0.0085   0.62 ± 0.0085   0.59 ± 0.0083   0.58 ± 0.0079   0.65 ± 0.0085
A41           0.72 ± 0.0091   0.72 ± 0.0092   0.71 ± 0.0091   0.60 ± 0.0098   0.70 ± 0.0101
D12           0.69 ± 0.0053   0.67 ± 0.0055   0.66 ± 0.0055   0.59 ± 0.0055   0.66 ± 0.0059
E86           0.72 ± 0.0225   0.72 ± 0.0235   0.69 ± 0.0222   0.52 ± 0.0222   0.58 ± 0.0222
I25           0.79 ± 0.0081   0.77 ± 0.0089   0.77 ± 0.0087   0.63 ± 0.0085   0.77 ± 0.0084
K63           0.68 ± 0.0061   0.64 ± 0.0064   0.64 ± 0.0063   0.60 ± 0.0060   0.65 ± 0.0065
K83           0.69 ± 0.0217   0.68 ± 0.0225   0.66 ± 0.0209   0.63 ± 0.0200   0.62 ± 0.0224
Table 2: AUC (± 95% confidence intervals) performance for all baselines. Best performance is highlighted in bold font.

ICD-10 code   DDP             Neural Hawkes   cHawkes         Charlson        RETAIN
I50           0.73 ± 0.0089   0.69 ± 0.0091   0.65 ± 0.0102   0.67 ± 0.0195   0.71 ± 0.0093
N39           0.65 ± 0.0063   0.56 ± 0.0066   0.59 ± 0.0064   0.62 ± 0.0142   0.63 ± 0.0066
A41           0.69 ± 0.0065   0.59 ± 0.0070   0.65 ± 0.0070   0.62 ± 0.0142   0.66 ± 0.0072
D12           0.68 ± 0.0065   0.56 ± 0.0068   0.66 ± 0.0065   0.57 ± 0.0147   0.63 ± 0.0078
E86           0.65 ± 0.0127   0.52 ± 0.0125   0.62 ± 0.0121   0.55 ± 0.0321   0.56 ± 0.0122
I25           0.78 ± 0.0049   0.66 ± 0.0058   0.75 ± 0.0054   0.59 ± 0.0117   0.75 ± 0.0053
K63           0.65 ± 0.0077   0.58 ± 0.0079   0.60 ± 0.0078   0.57 ± 0.0164   0.63 ± 0.0083
K83           0.69 ± 0.0126   0.60 ± 0.0135   0.65 ± 0.0123   0.57 ± 0.0284   0.63 ± 0.0128
Table 3: Out-of-domain AUC performance for all baselines. Best performance is highlighted in bold font.

Benchmarks. We compare the performance of DDP with Neural Hawkes (Mei and Eisner, 2017), cHawkes (Choi et al., 2015), the Charlson Score (Charlson et al., 1987), and RETAIN (Choi et al., 2016), which is a recurrent neural network with a temporal attention mechanism akin to RPPN (Xiao et al., 2019). Section 2 contains a detailed review of these models.

Results. The results are shown in Table 2. In five out of eight cases DDP achieved the best performance. In the remaining three cases, the performance of DDP is comparable to that of Neural Hawkes or RETAIN. However, in these three cases, DDP not only offers competitive predictive accuracy but also infers the comorbidity network, which cannot be straightforwardly inferred from the parameters of Neural Hawkes and RETAIN. This not only provides more elaborate interpretability but, as we show in Section 4.4, also enables better generalization to out-of-domain data. In addition, we can readily see that DDP always outperforms cHawkes and the Charlson Score. This suggests that the history-independent triggering mechanics of cHawkes and the Charlson Score do not adequately capture the disease complexity.

4.4 Transferability to other types of cancer

Finally, we applied all baselines originally trained on the primary dataset (colorectal cancer) to the out-of-domain dataset (stomach cancer) without re-training. All other aspects of the experimental setup remain the same as in the previous section. The results are shown in Table 3. DDP outperforms all the benchmarks on all eight diagnoses in the out-of-domain samples, with margins over Neural Hawkes ranging from 0.04 to 0.13 AUC.

We performed a post-hoc analysis to better understand what the DDP model has learned. First, we performed Chi-squared tests to test whether the prevalence of individual diseases or the occurrence of disease pairs is different in the two datasets. In both cases, the test concluded that the distributions are different (p-value < 0.001). This finding suggests that disease networks constructed from population-level statistics, such as those reviewed in Section 2.2, will tend to be different for the two datasets. Next, we randomly sampled a subset of patients from each of the two datasets and calculated the Jaccard distance within and between the datasets. Student's t-test concluded that the average distance between the two groups is smaller than the distance within each group (p-value < 0.01). This indicates that the graph heterogeneity across datasets is smaller than that within them. In other words, the disease network learned by DDP applies to both sets of patients and is generalizable across cancer sites.

5 Conclusion

In this paper, we developed a novel diffusion process (DDP) that utilizes deep neural networks to enable both accurate prediction of disease trajectories and interpretable representations of disease pathways. Combining the findings in Section 4, we can see that DDP can offer a more nuanced understanding of disease progression mechanisms, more accurate prediction of patient pathways, and better generalizability across different diseases. By taking into account the full disease history, the learned DDP comorbidity networks are well equipped to deal with individual-level disease trajectories in a data-driven fashion, improving over existing one-size-fits-all clinical guidelines. The transferability of DDP is a major advantage in medical applications where the data for certain sub-populations are still scarce (e.g., in rare diseases). Moreover, the comorbidity networks uncovered by DDP may enable researchers to formulate hypotheses about the causal relations between diseases.

References

  • E. Ahlqvist, P. Storm, A. Käräjämäki, M. Martinell, M. Dorkhan, A. Carlsson, P. Vikman, R. B. Prasad, D. M. Aly, P. Almgren, et al. (2018) Novel subgroups of adult-onset diabetes and their association with outcomes: a data-driven cluster analysis of six variables. The lancet Diabetes & endocrinology 6 (5), pp. 361–369. Cited by: §4.2.
  • A. Ahmed and E. P. Xing (2009) Recovering time-varying networks of dependencies in social and biological studies. Proceedings of the National Academy of Sciences 106 (29), pp. 11878–11883. Cited by: §1.
  • M. K. Beck, A. B. Jensen, A. B. Nielsen, A. Perner, P. L. Moseley, and S. Brunak (2016) Diagnosis trajectories of prior multi-morbidity predict sepsis mortality. Scientific reports 6, pp. 36624. Cited by: §2.2, §4.2.
  • D. P. Burkitt, A. Walker, and N. S. Painter (1972) Effect of dietary fibre on stools and transit-times, and its role in the causation of disease. The Lancet 300 (7792), pp. 1408–1411. Cited by: §4.2.
  • D. P. Burkitt (1971) Epidemiology of cancer of the colon and rectum. Cancer 28 (1), pp. 3–13. Cited by: §4.2.
  • M. E. Charlson, P. Pompei, K. L. Ales, and C. R. MacKenzie (1987) A new method of classifying prognostic comorbidity in longitudinal studies: development and validation. Journal of chronic diseases 40 (5), pp. 373–383. Cited by: §2.2, §4.3.
  • E. Choi, M. T. Bahadori, J. Sun, J. Kulas, A. Schuetz, and W. Stewart (2016) Retain: an interpretable predictive model for healthcare using reverse time attention mechanism. In Advances in Neural Information Processing Systems, pp. 3504–3512. Cited by: §4.3.
  • E. Choi, N. Du, R. Chen, L. Song, and J. Sun (2015) Constructing disease network and temporal progression model via context-sensitive hawkes process. In 2015 IEEE International Conference on Data Mining, pp. 721–726. Cited by: §2.1.1, §3.4, §4.3.
  • B. Drasar and D. Irving (1973) Environmental factors and cancer of the colon and breast. British Journal of Cancer 27 (2), pp. 167. Cited by: §4.1.
  • N. Du, H. Dai, R. Trivedi, U. Upadhyay, M. Gomez-Rodriguez, and L. Song (2016) Recurrent marked temporal point processes: embedding event history to vector. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 1555–1564. Cited by: §2.1.2, §3.2.
  • A. Ekbom, C. Helmick, M. Zack, and H. Adami (1990) Ulcerative colitis and colorectal cancer: a population-based study. New England journal of medicine 323 (18), pp. 1228–1233. Cited by: §4.2.
  • L. Fabbri, F. Luppi, B. Beghé, and K. Rabe (2008) Complex chronic comorbidities of copd. European Respiratory Journal 31 (1), pp. 204–212. Cited by: §4.2.
  • M. Gomez-Rodriguez, J. Leskovec, and A. Krause (2012) Inferring networks of diffusion and influence. ACM Transactions on Knowledge Discovery from Data (TKDD) 5 (4), pp. 21. Cited by: §2.1.1.
  • C. A. Hidalgo, N. Blumm, A. Barabási, and N. A. Christakis (2009) A dynamic network approach for the study of human phenotypes. PLoS computational biology 5 (4), pp. e1000353. ISSN 1553-7358. Cited by: §2.2.
  • J. X. Hu, M. Helleberg, A. B. Jensen, S. Brunak, and J. Lundgren (2019) A large-cohort, longitudinal study determines precancer disease routes across different cancer types. Cancer research 79 (4), pp. 864–872. Cited by: §2.2.
  • A. Khan, S. Uddin, and U. Srinivasan (2018) Comorbidity network for chronic disease: a novel approach to understand type 2 diabetes progression. International journal of medical informatics 115, pp. 1–9. Cited by: §1.
  • S. Lamprier (2019) A recurrent neural cascade-based model for continuous-time diffusion. In International Conference on Machine Learning, pp. 3632–3641. Cited by: §2.1.2.
  • D. Lee, M. Kim, and H. Shin (2019) Inference on chains of disease progression based on disease networks. PloS One 14 (6), pp. e0218871. ISSN 1932-6203. Cited by: §2.2.
  • H. Mei and J. M. Eisner (2017) The neural hawkes process: a neurally self-modulating multivariate point process. In Advances in Neural Information Processing Systems, pp. 6754–6764. Cited by: §2.1.2, §3.2, §4.3.
  • A. Miller (1982) Risk factors from geographic epidemiology for gastrointestinal cancer. Cancer 50 (11 Suppl), pp. 2533–2540. Cited by: §4.1.
  • S. Myers and J. Leskovec (2010) On the convexity of latent social network inference. In Advances in neural information processing systems, pp. 1741–1749. Cited by: §1.
  • A. Namaki, A. Shirazi, R. Raei, and G. Jafari (2011) Network analysis of a financial market based on genuine correlation and threshold method. Physica A: Statistical Mechanics and its Applications 390 (21-22), pp. 3835–3841. Cited by: §1.
  • D. Neil, M. Pfeiffer, and S. Liu (2016) Phased lstm: accelerating recurrent network training for long or event-based sequences. In Advances in neural information processing systems, pp. 3882–3890. Cited by: §3.3.1.
  • NHS (2019a) Condition prevalence. Note: https://digital.nhs.uk/data-and-information/data-tools-and-services/data-services/general-practice-data-hub/condition-prevalence Accessed: 2019-09-30. Cited by: footnote 1.
  • NHS (2019b) Treatment for varicose veins. Note: https://www.nhs.uk/conditions/varicose-veins/treatment/ Accessed: 2019-09-30. Cited by: §4.2.
  • N. S. Painter and D. P. Burkitt (1971) Diverticular disease of the colon: a deficiency disease of western civilization. British medical journal 2 (5759), pp. 450. Cited by: §4.2.
  • W. J. Paulus and C. Tschöpe (2013) A novel paradigm for heart failure with preserved ejection fraction: comorbidities drive myocardial dysfunction and remodeling through coronary microvascular endothelial inflammation. Journal of the American College of Cardiology 62 (4), pp. 263–271. Cited by: §4.2.
  • H. Quan, B. Li, C. M. Couris, K. Fushimi, P. Graham, P. Hider, J. Januel, and V. Sundararajan (2011) Updating and validating the charlson comorbidity index and score for risk adjustment in hospital discharge abstracts using data from 6 countries. American journal of epidemiology 173 (6), pp. 676–682. Cited by: §2.2.
  • R. Real and J. M. Vargas (1996) The probabilistic basis of jaccard’s index of similarity. Systematic biology 45 (3), pp. 380–385. Cited by: §4.2.
  • I. M. Stratton, A. I. Adler, H. A. W. Neil, D. R. Matthews, S. E. Manley, C. A. Cull, D. Hadden, R. C. Turner, and R. R. Holman (2000) Association of glycaemia with macrovascular and microvascular complications of type 2 diabetes (ukpds 35): prospective observational study. Bmj 321 (7258), pp. 405–412. Cited by: §4.2.
  • S. Xiao, J. Yan, M. Farajtabar, L. Song, X. Yang, and H. Zha (2019) Learning time series associated event sequences with recurrent point process networks. IEEE transactions on neural networks and learning systems. Cited by: §2.1.2, §4.3.
