I Introduction
Online healthcare forums and communities [3, 1, 5] such as the Breast Cancer Community have greatly changed the way patients seek health-related information and have become an important part of patients' lives. The communications and interactions between patients in online forums can provide valuable information about a patient's emotional well-being and health-management behaviors that conventional clinical data collected from hospital information systems and electronic health records (EHR) are unable to capture. The synergies between patients' online communication and their health status make possible a unique and wide range of research topics in health informatics [6, 33, 34] that rely on both patients' interactions in online forums and their health stage records.
However, the health stage information in online health communities has some unique challenges and characteristics. First, though some patients share their disease history, as shown in Figure 1, such information is not provided or is simply missing for many others. For instance, over 36% of active users who registered within the last two years have not yet shared their disease history in the Breast Cancer Community. Second, different subforums under specific topics are often correlated with specific disease stages. For example, in the online breast cancer forum, the patients who are active in the "Chemotherapy - Before, During, and After" subforum typically look for information related to their chemotherapy treatment. Third, as patients' health conditions progress over time, they often move from one set of subforums to others that are more related to their new health stages. Therefore, for each patient, these transitions among subforums can lead to an interconnected subforum activity network that evolves over time, which can be highly entangled with the progression of the patient's health status, as shown in Figure 2.
The ability to accurately infer users' missing health stage information is crucial, as this could enable healthcare organizations to better support patients by pinpointing the most valuable information for each at their particular health stage [11]. To infer the missing user health stage information, the correspondence between users' forum activities and their health stage history needs to be accurately identified and modeled. Naturally, the networked and time-evolving forum activity data can be formulated as a dynamic sequence of user activity transition graphs that change over time. In addition, the target user health stage history can be formulated as a sequence that needs to be inferred. Thus, without loss of generality, a new generic task is presented here where the goal is to learn the mapping from a sequence of graph-structured data to a target sequence. In this paper, we limit our scope to the domain of online health forums and focus on health stage sequence prediction based on online health forum data.
However, capturing the high-level mapping between the evolution of the user activity networks and the changes in the corresponding user's health stage can be very difficult due to the following challenges: 1) Difficulty in modeling the forum data, which is dynamic, networked, and multi-attributed. A user's activities in the various subforums can change dynamically over time and these activity transitions naturally bridge different subforums. 2) Difficulty in learning the association between a sequence of user activity networks and the corresponding sequence of health stages. The sequence of user activity networks contains complicated graph-structured information that dynamically evolves over time. Developing end-to-end learning between such dynamic, complex data and a specific sequence is highly difficult. 3) Lack of interpretability of the health stage sequence inference process. The sequence of user activity networks has a two-level hierarchical structure, namely node (i.e., subforum) to network level, and network to health stage level. It is thus a major objective to incorporate this hierarchical structural information into the development of an interpretable health stage inference process.
In this paper, we formally define the generic learning problem of health stage sequence inference using online forum data and propose the first framework to address the aforementioned challenges effectively. The contribution of this paper is fourfold: 1) we define the health stage inference problem in online health forums and formulate user activities as transition graphs that are capable of modeling users' dynamic transitions between subforums and their complex relationships; 2) we propose a novel deep neural encoder-decoder framework for learning the mapping between complex dynamic graph sequence inputs and the target output sequence; 3) we propose a new dynamic graph hierarchical attention mechanism that captures both time-level and node-level attention, thus providing model transparency throughout the whole inference process; 4) experiments on an online health forum dataset demonstrate that our proposed models outperform conventional sequence inference methods. In addition, our qualitative analyses and case studies provide interpretable insights into the learning results of the proposed model and its variations.
II Related Work
Our model draws inspiration from the research fields of online health community analysis, dynamic graph learning, attention mechanisms, and neural encoderdecoder models.
II-A Online Health Communities Analysis
A number of studies have focused on the analysis and utilization of online health community data. Popular social media platforms are well suited to aggregate-level pattern-mining tasks [35, 25]. However, their power is limited for discovering individual-level health stages and health network patterns due to the privacy issues involved and data scarcity. There have been several analyses of breast cancer forum data [6, 33] and, more recently, machine learning models have been used for longitudinal analysis [34] and some binary classification tasks [11]. However, we are the first to propose a general framework that can achieve health stage sequence inference using online forum data.

II-B Dynamic Graph Representation Learning
As an emerging topic in the graph representation learning domain, dynamic graph learning has attracted a great deal of attention from researchers in recent years [23, 36, 8]. However, these graph embedding techniques typically focus on learning representations of graphs, such as node embeddings, but in many real-world applications the aim is to learn some high-level knowledge from the graph data, as in graph classification tasks [26, 27] and graph-to-sequence tasks [28, 30]. An end-to-end learning model is thus needed to learn the mapping between the whole sequence of graph data and the target output sequence, instead of merely focusing on learning node representations.
II-C Attention Mechanism
The attention mechanism was first proposed in [2] and has been widely used for machine translation tasks [17, 32]. The attention mechanism has also been introduced in the graph representation learning domain [24, 31]. However, there is little to no work that focuses specifically on studying the unique hierarchical structure that is naturally present in dynamic graphs.
II-D Neural Encoder-Decoder Models
Neural encoder-decoder models [4, 2] have been widely extended to model the mapping of general object inputs to their corresponding sequences [7, 22]. Recent advances in graph deep learning and graph convolutional networks have enabled various graph deep learning models to handle challenges in the domains of graph generation [9, 20, 14] and graph-to-sequence learning [29]. However, there have been no reports of work that explores dynamic-graph-to-sequence learning, where the natural sequential order contained in a dynamic graph and its sequences might be advantageous for neural encoder-decoder models.
III Problem Formulation
III-A User Forum Activities as a Dynamic Graph
An activity transition network is formulated naturally as follows. User activities are first partitioned into a series of time windows. We then formulate a node for each subforum, with a transition from one subforum to another deemed to occur if the most active subforum (based on visiting time or number of postings) switches from the former to the latter, creating a directed 'edge' between them. Each node (i.e., subforum) also records the user's activity in that subforum to build the activity transition network. Such time-ordered activity transition networks can be formally defined as dynamic graphs, also known as temporal networks in the network science literature [13], that capture the complex dynamic characteristics and time-evolving features of graphs, as defined in the following.
Definition 1.
(Dynamic graph). A dynamic graph $\mathcal{G} = (G^{(1)}, \dots, G^{(T)})$ is an ordered sequence of $T$ separate graphs on the same set of $n$ nodes, with each snapshot graph $G^{(t)} = (A^{(t)}, X^{(t)})$ characterized by a weighted adjacency matrix $A^{(t)} \in \mathbb{R}^{n \times n}$ and a set of node features $X^{(t)} \in \mathbb{R}^{n \times D}$ for a given time window, where $D$ represents the total number of node features.
We can now formulate the activity transition networks as a dynamic graph, illustrated in Figure 2. Here, the dynamic graph contains a sequence of snapshot graphs that characterize user activities in the online forum for a given time period, where $G^{(t)}$ represents the $t$-th snapshot graph for simplicity. Each node represents a subforum devoted to a specific topic and the edges capture the user's movement between different subforums in a given time window, shown as blue boxes. Each node contains a set of features that represents the topics covered by the specific subforum. By formulating user activities as dynamic graphs, the mapping between the evolution of the user activity and the changes in the user's health stages is preserved.
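The construction described above can be sketched in a few lines of Python. This is a minimal illustration only: the one-month windowing, the switch-based edge rule, and all names here are assumptions for exposition, not the authors' exact implementation.

```python
from collections import defaultdict

def build_transition_graphs(activities, window_days=30):
    """Turn a user's timestamped subforum activities into a sequence of
    directed transition graphs, one per time window.

    `activities` is a list of (day, subforum) pairs sorted by day.
    Each graph is a dict mapping a directed edge (src, dst) to the
    number of times the user's active subforum switched src -> dst.
    """
    # Partition activities into consecutive time windows.
    windows = defaultdict(list)
    for day, forum in activities:
        windows[day // window_days].append(forum)

    graphs = []
    for w in sorted(windows):
        seq = windows[w]
        edges = defaultdict(int)
        # A directed edge is recorded whenever the active subforum switches.
        for prev, curr in zip(seq, seq[1:]):
            if prev != curr:
                edges[(prev, curr)] += 1
        graphs.append(dict(edges))
    return graphs
```

For example, a user who posts in subforum A, then B, then (in the next month) moves on to C would yield two snapshot graphs, one with edge (A, B) and one with edge (B, C).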
III-B Learning a Sequence from a Dynamic Graph
As we can see from Figure 2, there is a clear mapping between the evolution of the user activity dynamic graph and changes in the corresponding user's health stage. Motivated by this observation, we can formulate such problems as a general dynamic-graph-to-sequence problem as follows:
Given a dynamic graph $\mathcal{G} = (G^{(1)}, \dots, G^{(T)})$ as input data, the goal is to predict the target sequence $S = (s_1, \dots, s_L)$, where $s_i \in V$ is the $i$-th token of the output sequence in vocabulary $V$; and $T$ and $L$ are the input graph sequence length and output sequence length, respectively. Formally, this problem is equivalent to learning a translation mapping from the input dynamic graph to a sequence as $f : \mathcal{G} \to S$.
The translation mapping problem between source objects and target sequences has been widely studied, including both graph-to-sequence [29] and sequence-to-sequence [21, 4] formulations. However, dynamic-graph-to-sequence translation is more complex and poses several unique challenges, namely: 1) Difficulty in comprehensively modeling the dynamic, multi-attributed, network-structured data, as both complex relationships and dynamically evolving characteristics need to be captured; 2) The temporal dependency of snapshot graphs in the dynamic graph needs to be modeled and constrained by the learning model; and 3) The learned translation mapping is often obscure and hard to explain or verify. This is because the original low-level representation (i.e., the node level at a specific time) is aggregated into the high-level representation (i.e., the dynamic graph as a whole), making it much more difficult to backtrack and explain the correspondence.
IV Dynamic Graph-to-Sequence Model
IV-A Dynamic Graph Encoder
The base model of our graph convolutional network for each snapshot graph is inspired by Graph2Seq [29], which was originally proposed for addressing static graph-to-sequence learning problems. The Graph2Seq model employs an inductive node embedding algorithm that generates bidirectional node embeddings by aggregating information from a node's local forward and backward neighborhood within $K$ hops for a static graph. We extend this idea to dynamic graphs by applying such graph convolutions to each snapshot graph within the dynamic graph input. Specifically, suppose the total number of hops is $K$; then the hidden representation of the $v$-th node in the snapshot graph $G^{(t)}$ after applying the first graph convolutional layer will be computed as follows:

$h_{\vdash,v}^{(t,1)} = \sigma\big(W_{\vdash}^{(1)} \cdot \mathrm{CONCAT}\big(x_v^{(t)}, \mathrm{MEAN}(\{x_u^{(t)} : u \in \mathcal{N}_{\vdash}(v)\})\big)\big)$  (1)

$h_{\dashv,v}^{(t,1)} = \sigma\big(W_{\dashv}^{(1)} \cdot \mathrm{CONCAT}\big(x_v^{(t)}, \mathrm{MEAN}(\{x_u^{(t)} : u \in \mathcal{N}_{\dashv}(v)\})\big)\big)$  (2)

$h_v^{(t,1)} = \mathrm{CONCAT}\big(h_{\vdash,v}^{(t,1)}, h_{\dashv,v}^{(t,1)}\big)$  (3)
where $\mathcal{N}_{\vdash}(v)$ represents the set of forward neighbor nodes of node $v$, whereas $\mathcal{N}_{\dashv}(v)$ represents the set of backward neighbor nodes; $W_{\vdash}^{(1)}$ and $W_{\dashv}^{(1)}$ are learnable parameters for the first convolution layer; $x_v^{(t)}$ is the feature vector of node $v$ in a snapshot graph at time step $t$; $\sigma$ represents the activation function of the network (e.g., ReLU); the MEAN function takes the element-wise mean of the set of vectors in the equation; and CONCAT concatenates two row vectors into a single row vector. Likewise, for hop $k$, the hidden representation $h_v^{(t,k)}$ of the $v$-th node in the snapshot graph can be computed from the hidden representations produced by layer $k-1$. Finally, after applying $K$ layers of convolutions, the final hidden representation of the $v$-th node in the snapshot graph is output as $z_v^{(t)} = h_v^{(t,K)}$.
In order to capture the high-level representation of graphs for end-to-end graph learning, it is essential to aggregate the node-level embeddings into a graph-level embedding that conveys the information of the entire graph. To achieve this, we adopt the max pooling operation proposed by [29, 26] as the base aggregation function, which feeds the node embeddings to a fully-connected layer and then applies max pooling element-wise for each snapshot graph to yield a sequence of graph-level representations $(e^{(1)}, \dots, e^{(T)})$. To model the graph's dynamic changes and long-term dependencies throughout the $T$ steps, we utilize Long Short-Term Memory (LSTM) networks [10] as a graph embedding sequence encoder to learn the embedding of the entire dynamic graph.

IV-B Sequence Decoder with Dynamic Graph Hierarchical Attention
Once the dynamic graph encoder takes the sequence of snapshot graphs and aggregates node embeddings to generate a sequence of graph-level embeddings that capture the entire dynamic graph's global characteristics, the LSTM layer outputs the final hidden state $h_{\mathrm{enc}}$ of the encoder to summarize all the graph-level embeddings. Then, in the sequence decoding phase, we utilize a conventional sequence decoder [16] and set its initial cell state to $h_{\mathrm{enc}}$ in order to decode the target sequence $S$.
However, there are two issues with this simple sequence decoder: 1) the effectiveness of the sequence decoder depends on the length of the dynamic graph sequence; and 2) the predicted user health stage sequence needs to be interpretable in terms of the dynamic graph sequence at both the time level and the node level. To handle these questions pertaining to model interpretability, we propose a novel dynamic graph hierarchical attention mechanism that includes node-to-graph and graph-to-sequence attention, which is capable of enhancing the interpretability of node embedding aggregation and capturing the hierarchical structure of user online forum activities over time more effectively.
IV-B1 Node-to-Graph Attention
Once the node embeddings of a graph have been computed, an average or max pooling operation [29, 26] is typically employed as the base aggregation function to obtain the graph-level embedding for the current graph. Although this works well in those settings, it does not work properly in our case, since not all node embeddings contribute equally to the representation of the graph. For example, although a patient may view multiple subforums within a given time period, only a few important subforums will be correlated with the patient's specific health stage. Therefore, it is vital to identify the nodes (subforums) that contribute most to representing the embedding of the current graph. Inspired by [19], we adopt feed-forward attention to aggregate the node embeddings and form the graph-level embeddings. Figure 4 shows an example of how the node-to-graph attention is computed for a snapshot graph. For a given snapshot graph $G^{(t)}$ at step $t$, the node-to-graph attention is given as follows:

$\alpha_v^{(t)} = \dfrac{\exp\big(a(z_v^{(t)})\big)}{\sum_{u=1}^{n} \exp\big(a(z_u^{(t)})\big)}, \qquad e^{(t)} = \sum_{v=1}^{n} \alpha_v^{(t)} z_v^{(t)}$

where the function $a(\cdot)$ is a learnable function that depends on the node embeddings $z_v^{(t)}$; and $e^{(t)}$ denotes the aggregated graph-level embedding for the snapshot graph at step $t$. In this formulation, the attention weights $\alpha_v^{(t)}$ explicitly model the importance of each node when constructing the graph-level representation of $G^{(t)}$. Clearly, we can utilize the attention weight of each node to pinpoint which nodes (subforums) are highly related to the current health stage. We discuss the interpretability of our node-to-graph attention in detail in the experimental section.
IV-B2 Graph-to-Sequence Attention
Once the graph-level embedding $e^{(t)}$ has been obtained for each snapshot graph $G^{(t)}$, the whole sequence of graph embeddings is fed into the LSTM sequence encoder, which generates the global hidden embedding $h_{\mathrm{enc}}$ that characterizes the entire dynamic graph. Following the conventional encoder-decoder setup, $h_{\mathrm{enc}}$ is set as the initial hidden state for the sequence decoder from which to generate the target sequence of health stages.
Although the hidden vector $h_{\mathrm{enc}}$ theoretically contains all the information needed for generating the target sequence, the encoder's hidden representation $h^{(t)}$ at each encoding step also contains valuable information about the snapshot graph at that time step. To reward such snapshot graphs, we use the attention mechanism and introduce graph-to-sequence attention to measure the importance of each snapshot graph to the target sequence. Specifically, as shown in Figure 4, the graph-to-sequence attention takes the sequence of hidden states $h^{(1)}, \dots, h^{(T)}$ for the graphs in the dynamic graph sequence as additional inputs to the decoder. This forces the decoder to consider both the current hidden state and the attention alignments between each generated token $s_i$ and the whole encoder sequence.
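One decoding step of this attention can be sketched as follows, assuming a dot-product alignment score (a common Luong-style choice; the exact score function is an illustrative assumption here).

```python
import numpy as np

def graph_to_seq_attention(dec_state, enc_states):
    """Graph-to-sequence attention for one decoding step (sketch).

    dec_state: (H,) current decoder hidden state.
    enc_states: (T, H) encoder hidden states, one per snapshot graph.
    Returns the attention weights over the T snapshot graphs and the
    resulting context vector used alongside the decoder state.
    """
    scores = enc_states @ dec_state            # alignment per snapshot graph
    scores = scores - scores.max()             # numerical stability
    alpha = np.exp(scores) / np.exp(scores).sum()   # softmax over time steps
    context = alpha @ enc_states               # context vector over time
    return alpha, context
```

Plotted per output token, the `alpha` vectors form exactly the grayscale heatmap columns discussed in the interpretability analysis.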
V Experiments
We evaluated the performance of our proposed model utilizing a real-world online health forum, namely the Breast Cancer Community. All the experiments were conducted on a 64-bit machine with an Intel(R) Xeon(R) W-2155 CPU @ 3.30GHz, 32GB of memory, and an NVIDIA TITAN Xp GPU.
Table I: Model performance (mean ± SD over 20 runs).

Model  BLEU-1  BLEU-2  BLEU-3  BLEU-4  ROUGE
NMT(seq2seq) (w/o att)  55.5±2.38  38.4±0.91  27.1±0.90  19.2±0.87  71.6±1.04
NMT(seq2seq) (w/ att)  57.8±1.86  40.4±1.21  29.0±1.28  20.1±1.06  72.9±0.86
Graph2Seq (w/o att)  57.5±1.72  41.5±0.94  29.8±0.72  20.3±0.85  75.8±1.20
Graph2Seq (w/ att)  58.2±2.19  41.1±1.38  30.1±0.83  21.0±0.51  76.2±0.96
DynGraph2Seq (w/o att)  60.9±1.53  43.7±1.00  31.5±0.63  22.1±0.48  79.3±0.80
DynGraph2Seq (w/ att)  62.3±1.46  44.7±1.29  32.0±0.94  22.5±1.13  80.8±0.36
V-A Experimental Settings
Online Breast Cancer Community Dataset: The Breast Cancer Community [3] is one of the largest online forums designed for patients to share information related to breast cancer. The forum data collected for this study covers an 8-year period from the beginning of 2010 to the end of 2017. To create user subforum activity transition graph sequences, we defined user activities as posting new topics or replying to existing topics, and the time window was set to one month. After removing common words and stop words, we extracted the 100 most frequent keywords from the forum content to construct the feature vectors for the subforums. We randomly selected 70% of the users who provided their health stage history for training, another 10% for validation, and the remaining 20% for testing. The predicted health stage sequences were validated against the real health stage history extracted from the users' signatures. The vocabulary of the health stages consists of 'Dx' (short for the Oncotype DX test, an initial diagnosis that analyzes how a cancer is likely to behave and respond to treatment), 'Chemotherapy', 'Targeted', 'Hormonal', 'Radiation', and 'Surgery'.
V-A1 Evaluation Metrics
We used BLEU scores [18] and the ROUGE-1 score [15] as evaluation metrics to determine the closeness of the model-predicted health stage history to the ground truth.
V-A2 Comparison Methods
NMT(seq2seq): The Neural Machine Translation model [16] is a widely used state-of-the-art sequence-to-sequence model for machine translation tasks. Since the NMT model can only handle simple sequence inputs, we simplified the input data by concatenating the transition sequences of user activity for each month together in time order. The subforum features are omitted in this formulation.

Graph2Seq: The Graph2Seq model [29] is a general-purpose encoder-decoder model for static graph-to-sequence learning. Since the model cannot handle dynamic graphs, we simplified the input by aggregating all the edges that appeared in the dynamic graph into a single static graph.
V-A3 Hyperparameter Settings
We used the Adam optimizer [12] with a learning rate of 0.001 and a batch size of 50 for model training; greedy search was used for the sequence decoders. Hyperparameters were selected based on the highest scores achieved on the validation set.
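The greedy search used for decoding can be sketched as follows. The `step_fn` interface is a hypothetical stand-in for one step of the LSTM decoder, which in practice would consume the previous token and hidden state and emit a distribution over the health-stage vocabulary.

```python
def greedy_decode(step_fn, state, start, end, max_len=10):
    """Greedy sequence decoding (sketch).

    step_fn(state, token) -> (new_state, {token: prob}) is an assumed
    one-step decoder interface. At each step the highest-probability
    token is taken; decoding stops at the end token or max_len.
    """
    out, tok = [], start
    for _ in range(max_len):
        state, probs = step_fn(state, tok)
        tok = max(probs, key=probs.get)   # greedy: pick the argmax token
        if tok == end:
            break
        out.append(tok)
    return out
```

Greedy search is the cheapest decoding strategy; beam search would keep several candidate prefixes per step instead of only the single best one.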
V-B Performance
Table I shows the model performance of the baseline and proposed models. The scores were obtained from 20 individual runs and are presented in mean ± standard deviation (SD) format. In general, our proposed DynGraph2Seq framework significantly outperformed both the Seq2Seq and Graph2Seq baselines across the various model settings and evaluation metrics. The DynGraph2Seq framework with the proposed dynamic graph hierarchical attention achieved the best score on all the metrics, outperforming the baseline models by 7%-17% on the BLEU scores and 6%-13% on the ROUGE scores. The baseline Graph2Seq model also achieved good scores, but was not as competitive as our proposed model, largely because the Graph2Seq model failed to capture the dynamic characteristics of user activity with only static graph inputs. The Seq2Seq model performed badly due to its inability to model the complex relationships between the subforums with simple sequence inputs.
V-C Interpretability Analysis
Figure 5 shows an example of the dynamic graph hierarchical attention learned by DynGraph2Seq. The left part of the figure shows the graph-to-sequence attention learned by the model, where each column is a grayscale heatmap representing the amount of attention paid to each snapshot graph when the model predicted a specific health stage. The darker the color, the greater the attention. We can see that much attention was paid to the graphs around the months labeled in the figure; the graphs for each labeled month are shown on the right. Interestingly, the graphs in the first two months attracted more attention from the model because those were the months when the patient first became active in the breast cancer online forum. The last two labeled snapshot graphs correspond approximately to the time when the user engaged in extensive activities across a wide variety of subforums.
To understand why these particular snapshot graphs were important, we went one step deeper by examining the node-to-graph level attention. The red spots on the nodes shown on the right side of Figure 5 represent the amount of attention paid to each node (i.e., subforum). Again, the darker the red spot, the greater the attention. Here the attention becomes even more interesting and interpretable. For example, when constructing the representation of the May 2012 snapshot graph, the subforum assigned the most attention is actually entitled "Radiation Therapy - Before, During and After", which is strongly correlated with the health stage 'Radiation'. Likewise, we further discovered that the subforum entitled "Not Diagnosed but Worried" has a strong correlation with 'Dx', and the subforum entitled "DCIS (Ductal Carcinoma In Situ)" is a strong indicator for 'Surgery'. These observed correspondences confirm that the proposed dynamic graph hierarchical attention mechanism greatly enhances the interpretability of the model.
VI Conclusion
In this paper, we formulated the task of health stage inference using online health forum data as a dynamic-graph-to-sequence learning problem and proposed a novel DynGraph2Seq architecture that handles this new type of learning problem effectively. Our DynGraph2Seq model consists of a novel dynamic graph encoder and an interpretable sequence decoder that learn the mapping between a sequence of time-evolving user activity graphs and a sequence of target health stages. In addition, we developed a dynamic graph hierarchical attention mechanism to facilitate multi-level interpretability. Our comprehensive experiments and analyses for health stage prediction demonstrate both the effectiveness and the interpretability of the proposed models.
Acknowledgement
This work was supported by National Science Foundation grants #1755850, #1841520, and #1907805, a Jeffress Trust Award, and an NVIDIA GPU Grant.
References
 [1] American Cancer Society. Note: http://www.cancer.org. Cited by: §I.
 [2] (2014) Neural machine translation by jointly learning to align and translate. arXiv preprint arXiv:1409.0473. Cited by: §II-C, §II-D.
 [3] Breast Cancer Community. Note: https://community.breastcancer.org/. Cited by: §I, §V-A.
 [4] (2014) Learning phrase representations using RNN encoder-decoder for statistical machine translation. arXiv preprint arXiv:1406.1078. Cited by: §II-D, §III-B.
 [5] eHealth Forum. Note: http://ehealthforum.com. Cited by: §I.
 [6] (2014) Characterizing the sublanguage of online breast cancer forums for medications, symptoms, and emotions. In AMIA Annual Symposium Proceedings, Vol. 2014, pp. 516. Cited by: §I, §II-A.
 [7] (2016) Tree-to-sequence attentional neural machine translation. arXiv preprint arXiv:1603.06075. Cited by: §II-D.
 [8] (2018) Dyngraph2vec: capturing network dynamics using dynamic graph representation learning. arXiv preprint arXiv:1809.02657. Cited by: §II-B.
 [9] (2018) Deep graph translation. arXiv preprint arXiv:1805.09980. Cited by: §II-D.
 [10] (1997) Long short-term memory. Neural Computation 9 (8), pp. 1735–1780. Cited by: §IV-A.
 [11] (2010) Cancer stage prediction based on patient online discourse. In Proceedings of the 2010 Workshop on Biomedical Natural Language Processing, pp. 64–71. Cited by: §I, §II-A.
 [12] (2014) Adam: a method for stochastic optimization. arXiv preprint arXiv:1412.6980. Cited by: §V-A3.
 [13] (2017) The fundamental advantages of temporal networks. Science 358 (6366), pp. 1042–1046. Cited by: §III-A.
 [14] (2018) Learning deep generative models of graphs. arXiv preprint arXiv:1803.03324. Cited by: §II-D.
 [15] (2004) ROUGE: a package for automatic evaluation of summaries. Text Summarization Branches Out. Cited by: §V-A1.
 [16] (2017) Neural machine translation (seq2seq) tutorial. https://github.com/tensorflow/nmt. Cited by: §IV-B, §V-A2.
 [17] (2015) Effective approaches to attention-based neural machine translation. arXiv preprint arXiv:1508.04025. Cited by: §II-C.
 [18] (2002) BLEU: a method for automatic evaluation of machine translation. In ACL 2002, pp. 311–318. Cited by: §V-A1.
 [19] (2015) Feed-forward networks with attention can solve some long-term memory problems. arXiv preprint arXiv:1512.08756. Cited by: §IV-B1.
 [20] (2018) GraphVAE: towards generation of small graphs using variational autoencoders. arXiv preprint arXiv:1802.03480. Cited by: §II-D.
 [21] (2014) Sequence to sequence learning with neural networks. In NIPS 2014, pp. 3104–3112. Cited by: §III-B.
 [22] (2015) Improved semantic representations from tree-structured long short-term memory networks. arXiv preprint arXiv:1503.00075. Cited by: §II-D.
 [23] (2018) Representation learning over dynamic graphs. arXiv preprint arXiv:1803.04051. Cited by: §II-B.
 [24] (2017) Graph attention networks. arXiv preprint arXiv:1710.10903. Cited by: §II-C.
 [25] (2018) Multi-instance domain adaptation for vaccine adverse event detection. In WWW 2018, pp. 97–106. Cited by: §II-A.
 [26] (2018) Multiple structure-view learning for graph classification. IEEE Transactions on Neural Networks and Learning Systems 29 (7), pp. 3236–3251. Cited by: §II-B, §IV-A, §IV-B1.
 [27] (2019) Scalable global alignment graph kernel using random features: from node embedding to graph embedding. In KDD 2019, pp. 1418–1428. Cited by: §II-B.
 [28] (2018) SQL-to-text generation with graph-to-sequence model. arXiv preprint arXiv:1809.05255. Cited by: §II-B.
 [29] (2018) Graph2Seq: graph to sequence learning with attention-based neural networks. arXiv preprint arXiv:1804.00823. Cited by: §II-D, §III-B, §IV-A, §IV-B1, §V-A2.
 [30] (2018) Exploiting rich syntactic information for semantic parsing with graph-to-sequence model. arXiv preprint arXiv:1808.07624. Cited by: §II-B.
 [31] (2018) Graph R-CNN for scene graph generation. arXiv preprint arXiv:1808.00191. Cited by: §II-C.
 [32] (2016) Hierarchical attention networks for document classification. In NAACL 2016, pp. 1480–1489. Cited by: §II-C.
 [33] (2014) Does sustained participation in an online health community affect sentiment? In AMIA Annual Symposium Proceedings, Vol. 2014, pp. 1970. Cited by: §I, §II-A.
 [34] (2017) Longitudinal analysis of discussion topics in an online breast cancer community using convolutional neural networks. Journal of Biomedical Informatics 69, pp. 1–9. Cited by: §I, §II-A.
 [35] (2016) Hierarchical incomplete multi-source feature learning for spatiotemporal event forecasting. In KDD 2016, pp. 2085–2094. Cited by: §II-A.
 [36] (2018) Dynamic network embedding by modeling triadic closure process. In AAAI 2018. Cited by: §II-B.