Online healthcare forums and communities [3, 1, 5] such as the Breast Cancer Community have greatly changed the way patients seek health-related information and have become an important part of patients’ lives. The communications and interactions between patients in online forums can provide valuable information about a patient’s emotional well-being and behaviors related to the management of their health that conventional clinical data collected from hospital information systems and electronic health records (EHR) is unable to capture. The synergies between the information on patients’ online communication and health status make possible a unique and wide range of research topics on health informatics [6, 33, 34] that rely on both patients’ interactions in online forums as well as their health stage records.
However, the health stage information in online health communities has some unique challenges and characteristics. First, though some patients share their disease history, as shown in Figure 1, such information is not provided or is simply missing for many others. For instance, over 36% of the active users who registered within the last two years have not yet shared their disease history in the Breast Cancer Community. Second, different subforums under specific topics are often correlated with specific disease stages. For example, in the online breast cancer forum, the patients who are active in the “Chemotherapy - Before, During, and After” subforum typically look for information related to their chemotherapy treatment. Third, as the patients’ health conditions progress over time, they often move from one set of subforums to others that are more related to their new health stages. Therefore, for each patient, these transitions among subforums can lead to an inter-connected subforum activity network that evolves over time, which could be highly entangled with the progress of the patient’s health status, as shown in Figure 2.
The ability to accurately infer users’ missing health stage information is crucial, as this could enable health care organizations to better support patients by pinpointing the most valuable information for each patient at their particular health stage [11]. To infer the missing user health stage information, the correspondence between the users’ forum activities and their health stage history needs to be accurately identified and modeled. Naturally, the networked and time-evolving forum activity data can be formulated as a dynamic sequence of user activity transition graphs that change over time. In addition, the target user health stage history can be formulated as a sequence that needs to be inferred. Thus, without loss of generality, a new generic task is presented here where the goal is to learn the mapping from a sequence of graph-structured data to a target sequence. In this paper, we limit our scope to the domain of online health forums and focus on health stage sequence prediction based on online health forum data.
However, capturing the high-level mapping between the evolution of the user activity networks and the changes in the corresponding user’s health stage can be very difficult due to the following challenges: 1) Difficulty in modeling the forum data, which is dynamic, networked, and multi-attributed. A user’s activities in the various subforums can change dynamically over time and these activity transitions naturally bridge different subforums. 2) Difficulty in learning the association between a sequence of user activity networks and the corresponding sequence of health stages. The sequence of user activity networks contains complicated graph-structured information that dynamically evolves over time. Developing end-to-end learning between such dynamic complex data and a specific sequence is highly difficult. 3) Lack of interpretability of the health stage sequence inference process. The sequence of user activity networks has a two-level hierarchical structure, namely node (i.e., subforum) to network level, and network to health stage level. It is thus a major objective to incorporate this hierarchical structural information into the development of an interpretable health stage inference process.
In this paper, we formally define the generic learning problem of health stage sequence inference using online forum data and propose the first framework to address the aforementioned challenges effectively. The contribution of this paper is four-fold: 1) we define the health stage inference problem in online health forums and formulate the user activities as transition graphs that are capable of modeling users' dynamic transitions between subforums and their complex relationships; 2) we propose a novel deep neural encoder-decoder framework for learning the mapping between complex dynamic graph sequence inputs and the target output sequence; 3) we propose a new dynamic graph hierarchical attention mechanism that captures both time-level and node-level attention, thus providing model transparency throughout the whole inference process; 4) experiments on an online health forum dataset demonstrate that our proposed models outperform conventional sequence inference methods. In addition, our qualitative analyses and case studies provide interpretable insights into the learning results of the proposed model and its variations.
II Related Work
Our model draws inspiration from the research fields of online health community analysis, dynamic graph learning, attention mechanisms, and neural encoder-decoder models.
II-A Online Health Communities Analysis
A number of studies have focused on the analysis and utilization of online health community data. Popular social media platforms are well suited to aggregate-level pattern mining tasks [35, 25]. However, their power is limited for discovering individual-level health stages and health network patterns due to the privacy issues involved and data scarcity. There have been several analyses of breast cancer forum data [6, 33] and, more recently, machine learning models have been used for longitudinal analysis [34] and some binary classification tasks [11]. However, we are the first to propose a general framework that can achieve health stage sequence inference using online forum data.
II-B Dynamic Graph Representation Learning
As an emerging topic in the graph representation learning domain, dynamic graph learning has attracted a great deal of attention from researchers in recent years [23, 36, 8]. However, these graph embedding techniques typically focus on learning representations of the graphs, such as node embedding, but in many real-world applications the aim is to learn some high-level knowledge from the graph data, such as graph classification tasks [26, 27] and graph to sequence tasks [28, 30]. An end-to-end learning model is thus needed to learn the mapping between the whole sequence of graph data and the target output sequence, instead of merely focusing on learning node representations.
II-C Attention Mechanism
The attention mechanism was first proposed by [2] and has been widely used for machine translation tasks [17, 32]. The attention mechanism has also been introduced in the graph representation learning domain [24, 31]. However, little to no work focuses specifically on the unique hierarchical structure that is naturally present in dynamic graphs.
II-D Neural Encoder-Decoder Models
The neural encoder-decoder models [4, 2] have been widely extended to model the mapping of general object inputs to their corresponding sequences [7, 22]. Recent advances in graph deep learning and graph convolutional networks have enabled various graph deep learning models to handle challenges in the domains of graph generation [9, 20, 14] and graph-to-sequence learning [29]. However, there have been no reports of work that explores dynamic graph to sequence learning, where the natural sequential order contained in a dynamic graph and its sequences might be advantageous for neural encoder-decoder models.
III Problem Formulation
III-A User Forum Activities as a Dynamic Graph
An activity transition network is formulated naturally as follows. User activities are first partitioned into a series of time windows. We then create a node for each subforum; a transition from one subforum to another is deemed to occur if the most active subforum (based on visiting time or number of postings) switches from the former to the latter, creating a directed edge between them. Each node (i.e., subforum) also records the user activity in that subforum, which completes the activity transition network. Naturally, such time-ordered activity transition networks can be formally defined as dynamic graphs, also known as temporal networks in the network science literature [13], that capture the complex dynamic characteristics and time-evolving features of graphs, as defined in the following.
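The construction described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: the function name `build_snapshots`, the `(window, subforum)` event format, and the rule that an edge links the most active subforum of the previous non-empty window to that of the current window are all choices made for this example. Activity is measured here by the number of postings.

```python
from collections import Counter, defaultdict


def build_snapshots(events, subforums):
    """Build one snapshot graph per time window (hypothetical helper).

    events: iterable of (window, subforum) pairs, one per post or reply;
            windows must sort chronologically, e.g. "2012-05".
    subforums: fixed, ordered list of subforum names (the node set).
    """
    index = {name: i for i, name in enumerate(subforums)}
    counts = defaultdict(Counter)
    for window, subforum in events:
        counts[window][subforum] += 1

    n = len(subforums)
    snapshots = []
    previous_top = None
    for window in sorted(counts):
        window_counts = counts[window]
        # Node-level activity record: number of postings per subforum.
        activity = [window_counts[name] for name in subforums]
        # Most active subforum; ties go to the earliest entry in `subforums`.
        top = max(subforums, key=lambda name: window_counts[name])
        adjacency = [[0] * n for _ in range(n)]
        # Assumed rule: a directed edge is added when the most active
        # subforum switches between consecutive non-empty windows.
        if previous_top is not None and top != previous_top:
            adjacency[index[previous_top]][index[top]] = 1
        snapshots.append(
            {"window": window, "adjacency": adjacency, "activity": activity}
        )
        previous_top = top
    return snapshots
```

Other transition rules (for example, several transitions inside one window, or weights based on visiting time) would change only the edge-construction step.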
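Definition 1 (dynamic graph). A dynamic graph $\mathcal{G} = (G_1, G_2, \dots, G_T)$ is an ordered sequence of $T$ separate graphs on the same set of $N$ nodes, with each snapshot graph $G_t = (A_t, X_t)$ characterized by a weighted adjacency matrix $A_t \in \mathbb{R}^{N \times N}$ and a set of node features $X_t \in \mathbb{R}^{N \times F}$ for a given time window $t$, where $F$ represents the total number of node features.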
We can now formulate the activity transition networks as a dynamic graph, illustrated in Figure 2. Here, the dynamic graph $\mathcal{G}$ contains a sequence of $T$ snapshot graphs that characterize user activities in the online forum for a given time period, where $G_t$ denotes the snapshot graph at time window $t$ for simplicity. Each node represents a subforum devoted to a specific topic, and the edges capture the user’s movement between different subforums within a given time window (shown as blue boxes). Each node contains a set of features that represents the topics covered by the specific subforum. By formulating user activities as dynamic graphs, the mapping between the evolution of the user activity and the changes in the user’s health stages is preserved.
III-B Learning Sequence from Dynamic Graph
As we can see from Figure 2, there is a clear mapping between the evolution of the user activity dynamic graph and changes in the corresponding user’s health stage. Motivated by this observation, we can formulate such problems as a general dynamic graph to sequence problem as follows:
Given a dynamic graph $\mathcal{G} = (G_1, \dots, G_T)$ as input data, the goal is to predict the target sequence $Y = (y_1, \dots, y_M)$, where $y_j \in \mathcal{V}$ is the $j$-th token of the output sequence in vocabulary $\mathcal{V}$; and $T$ and $M$ are the input graph sequence length and output sequence length, respectively. Formally, this problem is equivalent to learning a translation mapping from the input dynamic graph to a sequence as $f: \mathcal{G} \rightarrow Y$.
The translation mapping problem between some source objects and target sequences has been widely studied, including both graph-to-sequence [29] and sequence-to-sequence [21, 4] formulations. However, dynamic-graph-to-sequence translation is more complex and poses several unique challenges, namely 1) Difficulty in comprehensively modeling the dynamic multi-attributed network-structured data, as both complex relationships and dynamically evolving characteristics need to be captured; 2) The temporal dependency of snapshot graphs in the dynamic graph needs to be modeled and constrained by the learning model; and 3) The learned translation mapping is often obscure and hard to explain or verify. This is because the original low-level representation (i.e., the node level at a specific time) is aggregated into the high-level representation (i.e., the dynamic graph as a whole), making it much more difficult to backtrack and explain the correspondence.
IV Dynamic Graph-To-Sequence Model
IV-A Dynamic Graph Encoder
The base model of our graph convolutional network for each snapshot graph is inspired by Graph2Seq [29], which was originally proposed for addressing static graph-to-sequence learning problems. The Graph2Seq model employs an inductive node embedding algorithm that generates bi-directional node embeddings by aggregating information from a node's local forward and backward neighborhoods within $K$ hops for a static graph. We extend this idea to dynamic graphs by applying such graph convolution to each snapshot graph within the dynamic graph input. Specifically, suppose the total number of hops is $K$; then the hidden representation of the $i$-th node in the snapshot graph $G_t$ after applying the first graph convolutional layer is computed as follows:

$$h^{(1)}_{i,t,\vdash} = \sigma\Big(W^{(1)}_{\vdash} \cdot \mathrm{CONCAT}\big(x_{i,t},\ \mathrm{MEAN}(\{x_{j,t} : j \in \mathcal{N}_{\vdash}(i)\})\big)\Big),$$
$$h^{(1)}_{i,t,\dashv} = \sigma\Big(W^{(1)}_{\dashv} \cdot \mathrm{CONCAT}\big(x_{i,t},\ \mathrm{MEAN}(\{x_{j,t} : j \in \mathcal{N}_{\dashv}(i)\})\big)\Big),$$

where $\mathcal{N}_{\vdash}(i)$ represents the set of forward neighbor nodes of node $i$, whereas $\mathcal{N}_{\dashv}(i)$ represents the set of backward neighbor nodes; $W^{(1)}_{\vdash}$ and $W^{(1)}_{\dashv}$ are learnable parameters for the first convolution layer; $\sigma$ is a nonlinear activation function; $x_{i,t}$ is the feature vector of node $i$ in the snapshot graph at time step $t$; the function MEAN takes the element-wise mean of the set of vectors in the equation; and CONCAT concatenates the two row vectors into a single row vector.

Likewise, for hop $k$ ($1 < k \le K$), the hidden representation of the $i$-th node in the snapshot graph $G_t$ can be computed via the hidden representations computed from layer $k-1$. Finally, after applying $K$ layers of convolutions, the final hidden representation of the $i$-th node in the snapshot graph $G_t$ is output as $z_{i,t}$.
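To make the bi-directional aggregation concrete, the following sketch applies a single layer to one snapshot graph: each node concatenates its own feature vector with the element-wise mean of its forward (or backward) neighbors and passes the result through a linear map and a nonlinearity. It is a simplified illustration under assumptions, not the authors' code: the names `conv_layer`, `W_fwd`, and `W_bwd` are hypothetical, ReLU is assumed as the activation, edge weights are ignored, and an empty neighborhood is replaced by a zero vector.

```python
def mean(vectors, dim):
    """Element-wise mean of a list of vectors (zeros if the list is empty)."""
    if not vectors:
        return [0.0] * dim
    return [sum(v[d] for v in vectors) / len(vectors) for d in range(dim)]


def matvec(W, x):
    """Matrix-vector product for a row-major matrix W."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def relu(v):
    return [max(0.0, a) for a in v]


def conv_layer(adjacency, features, W_fwd, W_bwd):
    """One bi-directional mean-aggregation layer on a single snapshot graph.

    adjacency[i][j] != 0 means a directed edge i -> j.
    Returns, for every node, a (forward, backward) pair of hidden vectors.
    """
    n, dim = len(features), len(features[0])
    hidden = []
    for i in range(n):
        forward = [features[j] for j in range(n) if adjacency[i][j] != 0]
        backward = [features[j] for j in range(n) if adjacency[j][i] != 0]
        # List concatenation plays the role of CONCAT.
        h_fwd = relu(matvec(W_fwd, features[i] + mean(forward, dim)))
        h_bwd = relu(matvec(W_bwd, features[i] + mean(backward, dim)))
        hidden.append((h_fwd, h_bwd))
    return hidden
```

Stacking this layer $K$ times, feeding each layer's output as the next layer's node features, gives each node a receptive field of $K$ hops in both directions.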
In order to capture the high-level representation of graphs for end-to-end graph learning, it is essential to aggregate node-level embeddings into a graph-level embedding that conveys the entire graph information. To achieve this, we adopt the max pooling operation proposed by [29, 26] as the base aggregation function, which feeds the node embeddings to a fully-connected layer and then applies the max pooling method element-wise for each snapshot graph to yield a sequence of graph-level representations $(g_1, \dots, g_T)$. To model the dynamic changes of the graph and long-term dependencies throughout the $T$ steps, we utilize Long Short-Term Memory (LSTM) networks [10] as a graph embedding sequence encoder to learn the entire dynamic graph-level embedding.
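The base aggregation step can be illustrated as follows: every node embedding passes through the same fully-connected layer, and the graph-level vector keeps the element-wise maximum over nodes. The function name `max_pool_graph` and the ReLU activation inside the fully-connected layer are assumptions made for this sketch.

```python
def max_pool_graph(node_embeddings, W, b):
    """Graph-level embedding by element-wise max pooling (illustrative).

    node_embeddings: list of node vectors for one snapshot graph.
    W, b: weights and bias of a shared fully-connected layer.
    """
    projected = [
        [max(0.0, sum(w * x for w, x in zip(row, h)) + bias)
         for row, bias in zip(W, b)]
        for h in node_embeddings
    ]
    # zip(*projected) iterates over embedding dimensions across all nodes.
    return [max(column) for column in zip(*projected)]
```

Applying this function to each snapshot in time order yields the sequence of graph-level vectors that the LSTM encoder consumes.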
IV-B Sequence Decoder with Dynamic Graph Hierarchical Attention
Once the dynamic graph encoder takes the sequence of snapshot graphs and aggregates node embeddings to generate a sequence of graph-level embeddings that capture the entire dynamic graph’s global characteristics, the LSTM layer outputs the final hidden state $s_T$ of the encoder to summarize all the graph-level embeddings. Then, in the sequence decoding phase, we utilize a conventional sequence decoder [16] and set the initial cell state of the decoder as $s_T$ in order to decode the target sequence $Y$.
However, there are two issues with this simple sequence decoder: 1) the effectiveness of the sequence decoder depends on the length of the dynamic graph sequence; and 2) the predicted user’s health stage sequence needs to be interpretable based on the dynamic graph sequence at both the time level and the node level. To handle these issues, we propose a novel dynamic graph hierarchical attention mechanism that includes node-to-graph and graph-to-sequence attention. This mechanism enhances the interpretability of node embedding aggregation and captures the hierarchical structure of user online forum activities over time more effectively.
IV-B1 Node-to-Graph Attention
Once the node embeddings of a graph have been computed, an average or max pooling operation [29, 26] is typically employed as the base aggregation function to obtain the graph-level embedding for the current graph. Although this works well in those individual settings, it does not work properly in our case since not all node embeddings contribute equally to the representation of the graph. For example, although a patient may view multiple subforums within a given time period, only a few important subforums will be correlated with the specific health stage of the patient. Therefore, it is vital to identify these important nodes (subforums) that contribute most to representing the embedding of the current graph. Inspired by [19], we adopt feed-forward attention to aggregate the node embeddings and formulate the graph-level embeddings. Figure 4 shows an example of how the node-to-graph attention is computed for a snapshot graph $G_t$. For a given snapshot graph $G_t$ at step $t$, the node-to-graph attention is given as follows:

$$e_{i,t} = a(z_{i,t}), \qquad \alpha_{i,t} = \frac{\exp(e_{i,t})}{\sum_{j=1}^{N} \exp(e_{j,t})}, \qquad g_t = \sum_{i=1}^{N} \alpha_{i,t}\, z_{i,t},$$

where the function $a$ is a learnable function that depends on the node embeddings $z_{i,t}$; and $g_t$ denotes the aggregated graph-level embedding for the snapshot graph at step $t$. In this formulation, the attention weights $\alpha_{i,t}$ explicitly model the importance of each node when constructing the graph-level representation of $G_t$. Clearly, we can utilize the attention weight information for each node to pinpoint which nodes (subforums) are highly related to the current health stage. We will discuss the interpretability of our node-to-graph attention in detail in the experimental section.
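A minimal sketch of this attention-based aggregation is given below. Each node embedding is scored by a small learnable function, the scores are normalized with a softmax, and the graph-level embedding is the resulting weighted sum. The scoring function used here, a single `tanh` unit with parameters `w` and `b`, is an assumption for illustration only; the text merely requires a learnable function of the node embedding.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of scalars."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def node_to_graph_attention(node_embeddings, w, b=0.0):
    """Aggregate node embeddings into one graph embedding (illustrative).

    Returns (graph_embedding, attention_weights); the weights sum to 1 and
    can be inspected to see which subforums dominate the snapshot.
    """
    scores = [
        math.tanh(sum(wi * hi for wi, hi in zip(w, h)) + b)
        for h in node_embeddings
    ]
    weights = softmax(scores)
    dim = len(node_embeddings[0])
    graph_embedding = [
        sum(a * h[d] for a, h in zip(weights, node_embeddings))
        for d in range(dim)
    ]
    return graph_embedding, weights
```

With all-zero scoring parameters the weights are uniform and the result reduces to mean pooling, which makes the contrast with a fixed pooling operator easy to see.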
IV-B2 Graph-to-Sequence Attention
Once the graph-level embedding $g_t$ has been obtained for each snapshot graph $G_t$, the whole sequence of graph embeddings $(g_1, \dots, g_T)$ is fed into the sequence encoder, which generates the global hidden embedding $s_T$ that characterizes the entire sequence of dynamic graph information. Following the conventional encoder-decoder setup, $s_T$ is set as the initial hidden state for the sequence decoder from which to generate the target sequence of the health stages.
Although the hidden vector $s_T$ theoretically contains all the information needed for generating the target sequence, the encoder’s hidden representation $s_t$ at each step also contains valuable information about the snapshot graph $G_t$ at that time step during the sequence encoding. To exploit this information, we use the attention mechanism and introduce graph-to-sequence level attention to measure the importance of each snapshot graph with respect to the target sequence. Specifically, as shown in Figure 4, the graph-to-sequence attention takes the sequence of hidden states $(s_1, \dots, s_T)$, one for each graph in the dynamic graph sequence, as additional inputs to the decoder. This forces the decoder to consider both the current hidden state and the attention alignments between each generated token $y_j$ and the hidden states $s_t$ of the whole sequence $(s_1, \dots, s_T)$.
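For reference, a standard way to write such an alignment, in the style of the attention mechanisms of [2, 17], is shown below. This is a generic form given as an assumption, not necessarily the exact variant used in the model: $d_j$ denotes the decoder hidden state when generating the $j$-th output token, $s_t$ the encoder hidden state for the $t$-th snapshot graph, $\mathrm{score}$ any alignment function (for example, a dot product), and $c_j$ the context vector passed to the decoder.

```latex
\alpha_{j,t} = \frac{\exp\big(\mathrm{score}(d_j, s_t)\big)}
                    {\sum_{t'=1}^{T} \exp\big(\mathrm{score}(d_j, s_{t'})\big)},
\qquad
c_j = \sum_{t=1}^{T} \alpha_{j,t}\, s_t
```

The weights $\alpha_{j,t}$ are the quantities a heatmap over snapshot graphs and predicted health stages would display.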
We evaluated the performance of our proposed model utilizing a real-world online health forum, namely the breast cancer community. All the experiments were conducted on a 64-bit machine with Intel(R) Xeon(R) W-2155 CPU 3.30GHz processor, 32GB memory and an NVIDIA TITAN Xp GPU.
| Method | BLEU-1 | BLEU-2 | BLEU-3 | BLEU-4 | ROUGE |
|---|---|---|---|---|---|
| NMT (seq2seq) (w/o att) | 55.5 ± 2.38 | 38.4 ± 0.91 | 27.1 ± 0.90 | 19.2 ± 0.87 | 71.6 ± 1.04 |
| NMT (seq2seq) (w/ att) | 57.8 ± 1.86 | 40.4 ± 1.21 | 29.0 ± 1.28 | 20.1 ± 1.06 | 72.9 ± 0.86 |
| Graph2Seq (w/o att) | 57.5 ± 1.72 | 41.5 ± 0.94 | 29.8 ± 0.72 | 20.3 ± 0.85 | 75.8 ± 1.20 |
| Graph2Seq (w/ att) | 58.2 ± 2.19 | 41.1 ± 1.38 | 30.1 ± 0.83 | 21.0 ± 0.51 | 76.2 ± 0.96 |
| DynGraph2Seq (w/o att) | 60.9 ± 1.53 | 43.7 ± 1.00 | 31.5 ± 0.63 | 22.1 ± 0.48 | 79.3 ± 0.80 |
| DynGraph2Seq (w/ att) | 62.3 ± 1.46 | 44.7 ± 1.29 | 32.0 ± 0.94 | 22.5 ± 1.13 | 80.8 ± 0.36 |
V-A Experimental Settings
Online Breast Cancer Community Dataset: The Breast Cancer Community [3] is one of the largest online forums designed for patients to share information related to breast cancer. The forum data collected for this study covers an 8-year period from the beginning of 2010 to the end of 2017. To create user subforum activity transition graph sequences, we defined a user activity as posting a new topic or replying to an existing topic, and the time window was set to one month. After removing common words and stop words, we extracted the 100 most frequent keywords from the forum content to construct the feature vectors for the subforums. We randomly selected 70% of the users who provided their health stage history for training, another 10% for validation, and the remaining 20% for testing. The predicted health stage sequences were validated against the real health stage history extracted from the users’ signatures. The vocabulary of the health stages consists of ‘Dx’ (short for the Oncotype DX test, an initial diagnosis that analyzes how a cancer is likely to behave and respond to treatment), ‘Chemotherapy’, ‘Targeted’, ‘Hormonal’, ‘Radiation’, and ‘Surgery’.
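The keyword-based node features described above could be built along the following lines. The sketch is illustrative only: the tokenizer, the tiny `STOP_WORDS` set, the function name `subforum_features`, and the use of raw keyword counts as feature values are all assumptions, since the text specifies only that the 100 most frequent keywords are used.

```python
import re
from collections import Counter

# Illustrative stop-word list; a real pipeline would use a fuller one.
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "i", "my", "is", "in"}


def tokenize(text):
    """Lowercase alphabetic tokens with stop words removed."""
    return [t for t in re.findall(r"[a-z]+", text.lower())
            if t not in STOP_WORDS]


def subforum_features(posts_by_subforum, top_k=100):
    """Return (vocabulary, {subforum: keyword-count vector})."""
    totals = Counter()
    per_subforum = {}
    for name, posts in posts_by_subforum.items():
        counts = Counter(tok for post in posts for tok in tokenize(post))
        per_subforum[name] = counts
        totals.update(counts)
    # The top_k most frequent keywords over the whole forum form the vocabulary.
    vocabulary = [word for word, _ in totals.most_common(top_k)]
    features = {
        name: [counts[word] for word in vocabulary]
        for name, counts in per_subforum.items()
    }
    return vocabulary, features
```

Each subforum then carries a fixed-length vector, so every snapshot graph shares the same feature dimension.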
V-A1 Evaluation Metrics
We evaluated the predicted health stage sequences using BLEU [18] and ROUGE [15] scores.
V-A2 Comparison Methods
NMT (seq2seq): The Neural Machine Translation (NMT) model [16] is a widely used state-of-the-art sequence-to-sequence model for machine translation tasks. Since the NMT model can only handle simple sequence inputs, we simplified the input data by concatenating the transition sequences of user activity for each month together in time order. The subforum features are omitted in such formulations.
Graph2Seq: The Graph2Seq model [29] is a general-purpose encoder-decoder model for static graph-to-sequence learning. Since the model cannot handle dynamic graphs, we simplified the input by aggregating all the edges that appeared in the dynamic graph together into a single static graph.
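The two input simplifications used for the baselines can be sketched as follows. Both helpers are hypothetical and assume each month is given as a list of `(source, target)` subforum transitions; the `"src->dst"` token format and the use of occurrence counts as static edge weights are illustrative choices, not details taken from the experiments.

```python
from collections import Counter


def to_token_sequence(monthly_edges):
    """Sequence baseline input: transitions concatenated in time order."""
    return [f"{src}->{dst}" for edges in monthly_edges for src, dst in edges]


def to_static_graph(monthly_edges):
    """Static-graph baseline input: all monthly edges merged into one graph.

    The weight of an edge is the number of times it appears across months.
    """
    weights = Counter()
    for edges in monthly_edges:
        weights.update(edges)
    return dict(weights)
```

The first helper keeps the order of transitions but discards graph structure and node features; the second keeps structure but discards when each transition happened.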
V-A3 Hyper-parameter Settings
We used the Adam optimizer [12] with a learning rate of 0.001 and a batch size of 50 for model training; greedy search was used for the sequence decoders. Hyper-parameters were selected based on the highest scores achieved on the validation set.
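Greedy search, as mentioned above, simply emits the most probable token at every decoding step. The sketch below shows the control flow only; the `step` callback, the `<s>` and `</s>` markers, and the `max_len` cap are assumptions standing in for a trained decoder.

```python
def greedy_decode(step, start="<s>", end="</s>", max_len=20):
    """Greedy sequence decoding (illustrative).

    step(previous_token, history) must return a dict mapping each candidate
    token to its probability; in a real model it would wrap the decoder.
    """
    tokens = []
    previous = start
    for _ in range(max_len):
        distribution = step(previous, tokens)
        previous = max(distribution, key=distribution.get)
        if previous == end:
            break
        tokens.append(previous)
    return tokens
```

Beam search would keep several partial sequences per step instead of one, at a higher decoding cost.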
Table I shows the model performance of the baseline and proposed models. The scores were obtained from 20 individual runs and are presented in mean ± standard deviation (SD) format. In general, our proposed DynGraph2Seq framework significantly outperformed both the Seq2Seq and Graph2Seq baselines for the various model settings and evaluation metrics. The basic DynGraph2Seq framework with the proposed dynamic graph hierarchical attention achieved the best score on all the metrics, outperforming the baseline models by 7% - 17% on the BLEU scores and 6% - 13% on the ROUGE scores. The baseline Graph2Seq model also achieved good scores, but was not as competitive as our proposed model. This was largely because the Graph2Seq model failed to capture the dynamic characteristics of user activity with only static graph inputs. The Seq2Seq model performed worst due to its inability to model the complex relationships between the subforums with simple sequence inputs.
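The reporting format and the relative improvements quoted above can be reproduced with a few lines of arithmetic. The helper names are hypothetical, and the use of the sample standard deviation is an assumption, since the text does not say which estimator was used. The figures in the usage lines are the means reported in Table I.

```python
from statistics import mean, stdev


def summarize(scores):
    """Format repeated-run scores as 'mean ± SD' (sample SD assumed)."""
    return f"{mean(scores):.1f} ± {stdev(scores):.2f}"


def relative_gain(ours, baseline):
    """Relative improvement of `ours` over `baseline`, in percent."""
    return 100.0 * (ours - baseline) / baseline


# Means from Table I: first BLEU column against the strongest baseline,
# last BLEU column against the weakest baseline, and likewise for ROUGE.
bleu_low = relative_gain(62.3, 58.2)
bleu_high = relative_gain(22.5, 19.2)
rouge_low = relative_gain(80.8, 76.2)
rouge_high = relative_gain(80.8, 71.6)
```

Rounded to whole percentages, these four values appear consistent with the 7% - 17% and 6% - 13% ranges stated in the text.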
V-C Interpretability Analysis
Figure 5 shows an example of the dynamic graph hierarchical attention learned by DynGraph2Seq. The left part of the figure shows the graph-to-sequence attention learned by the model, where each column is a grayscale heatmap representing the amount of attention being paid to each snapshot graph when the model predicted a specific health stage. The darker the color, the greater the attention being paid. We can see that much attention was paid to the graphs around the months labeled in the figure. The graphs for each labeled month are shown on the right. Interestingly, the graphs in the first two months attracted more attention from the model because those were the months when the patient first became active in the breast cancer online forum. The last two labeled snapshot graphs relate approximately to the time when the user engaged in extensive activities in a wide variety of subforums.
To understand why these particular snapshot graphs were important, we went one step deeper by examining the node-to-graph level attention. The red spots on the nodes shown on the right side of Figure 5 represent the amount of attention being paid to each node (i.e., subforum). Again, the darker the red spot, the greater the attention being paid. At this level the attention becomes even more interpretable. For example, when constructing the representation of the May-2012 snapshot graph, the subforum assigned the most attention is entitled “Radiation Therapy - Before, During and After”, which is strongly correlated with the health stage ‘Radiation’. Likewise, we further discovered that the subforum entitled “Not Diagnosed but Worried” has a strong correlation with ‘Dx’, and the subforum entitled “DCIS (Ductal Carcinoma In Situ)” is a strong indicator for ‘Surgery’. These observed correspondences confirm that the proposed dynamic graph hierarchical attention mechanism greatly enhances the interpretability of the model.
In this paper, we formulated the task of health stage inference using online health forum data as a dynamic graph-to-sequence learning problem and proposed a novel DynGraph2Seq architecture that can handle this new type of learning problem effectively. Our DynGraph2Seq model consists of a novel dynamic graph encoder and an interpretable sequence decoder to learn the mapping between a sequence of time-evolving user activity graphs and a sequence of target health stages. In addition, we developed a dynamic graph hierarchical attention mechanism to facilitate multi-level interpretability. Our comprehensive experiments and analyses for health stage prediction demonstrate both the effectiveness and the interpretability of the proposed models.
This work was supported by National Science Foundation grants #1755850, #1841520, and #1907805, a Jeffress Trust Award, and an NVIDIA GPU Grant.
[1] American Cancer Society. http://www.cancer.org
[2] (2014) Neural machine translation by jointly learning to align and translate. arXiv preprint arXiv:1409.0473.
[3] Breast Cancer Community. https://community.breastcancer.org/
[4] (2014) Learning phrase representations using RNN encoder-decoder for statistical machine translation. arXiv preprint arXiv:1406.1078.
[5] EHealth Forum. http://ehealthforum.com
[6] (2014) Characterizing the sublanguage of online breast cancer forums for medications, symptoms, and emotions. In AMIA Annual Symposium Proceedings, Vol. 2014, pp. 516.
[7] (2016) Tree-to-sequence attentional neural machine translation. arXiv preprint arXiv:1603.06075.
[8] (2018) Dyngraph2vec: capturing network dynamics using dynamic graph representation learning. arXiv preprint arXiv:1809.02657.
[9] (2018) Deep graph translation. arXiv preprint arXiv:1805.09980.
[10] (1997) Long short-term memory. Neural Computation 9 (8), pp. 1735–1780.
[11] (2010) Cancer stage prediction based on patient online discourse. In Proceedings of the 2010 Workshop on Biomedical Natural Language Processing, pp. 64–71.
[12] (2014) Adam: a method for stochastic optimization. arXiv preprint arXiv:1412.6980.
[13] (2017) The fundamental advantages of temporal networks. Science 358 (6366), pp. 1042–1046.
[14] (2018) Learning deep generative models of graphs. arXiv preprint arXiv:1803.03324.
[15] (2004) ROUGE: a package for automatic evaluation of summaries. Text Summarization Branches Out.
[16] (2017) Neural machine translation (seq2seq) tutorial. https://github.com/tensorflow/nmt
[17] (2015) Effective approaches to attention-based neural machine translation. arXiv preprint arXiv:1508.04025.
[18] (2002) BLEU: a method for automatic evaluation of machine translation. In ACL 2002, pp. 311–318.
[19] (2015) Feed-forward networks with attention can solve some long-term memory problems. arXiv preprint arXiv:1512.08756.
[20] (2018) GraphVAE: towards generation of small graphs using variational autoencoders. arXiv preprint arXiv:1802.03480.
[21] (2014) Sequence to sequence learning with neural networks. In NIPS 2014, pp. 3104–3112.
[22] (2015) Improved semantic representations from tree-structured long short-term memory networks. arXiv preprint arXiv:1503.00075.
[23] (2018) Representation learning over dynamic graphs. arXiv preprint arXiv:1803.04051.
[24] (2017) Graph attention networks. arXiv preprint arXiv:1710.10903.
[25] (2018) Multi-instance domain adaptation for vaccine adverse event detection. In WWW 2018, pp. 97–106.
[26] (2018) Multiple structure-view learning for graph classification. IEEE Transactions on Neural Networks and Learning Systems 29 (7), pp. 3236–3251.
[27] (2019) Scalable global alignment graph kernel using random features: from node embedding to graph embedding. In KDD 2019, pp. 1418–1428.
[28] (2018) SQL-to-text generation with graph-to-sequence model. arXiv preprint arXiv:1809.05255.
[29] (2018) Graph2Seq: graph to sequence learning with attention-based neural networks. arXiv preprint arXiv:1804.00823.
[30] (2018) Exploiting rich syntactic information for semantic parsing with graph-to-sequence model. arXiv preprint arXiv:1808.07624.
[31] (2018) Graph R-CNN for scene graph generation. arXiv preprint arXiv:1808.00191.
[32] (2016) Hierarchical attention networks for document classification. In NAACL, pp. 1480–1489.
[33] (2014) Does sustained participation in an online health community affect sentiment? In AMIA Annual Symposium Proceedings, Vol. 2014, pp. 1970.
[34] Longitudinal analysis of discussion topics in an online breast cancer community using convolutional neural networks. Journal of Biomedical Informatics 69, pp. 1–9.
[35] (2016) Hierarchical incomplete multi-source feature learning for spatiotemporal event forecasting. In KDD 2016, pp. 2085–2094.
[36] (2018) Dynamic network embedding by modeling triadic closure process. In AAAI.