1. Introduction
Recurrent Neural Networks (RNNs) have proven to be powerful in learning reusable parameters that produce hidden representations of sequences. They have been successfully applied to model sequential data and achieve state-of-the-art performance in numerous domains such as speech recognition (Graves et al., 2013b; Miao et al., 2015; Soltau et al., 2016), natural language processing (Kim et al., 2016; Bahdanau et al., 2014; Mikolov et al., 2010a, 2011), healthcare (Jagannatha and Yu, 2016; Che et al., 2018; Luo, 2017), recommendations (Zhou et al., 2018; Hidasi et al., 2015; Wu et al., 2016), and information retrieval (Palangi et al., 2016).
The majority of existing RNN models have been designed for traditional sequences, which are assumed to be independent and identically distributed (i.i.d.). However, many real-world applications generate linked sequences. For example, web documents, which are sequences of words, are connected via hyperlinks; genes, which are sequences of DNA or RNA, typically interact with each other. Figure 1 illustrates a toy example in which four sequences are connected via three links. On the one hand, linked sequences are inherently related. For example, linked web documents are likely to be similar (Glover et al., 2002), and interacting genes tend to share similar functionalities (Bebek, 2012). Hence, linked sequences are not i.i.d., which presents immense challenges to traditional RNNs. On the other hand, linked sequences offer link information in addition to sequential information. Link information has been shown to boost various analytical tasks such as social recommendations (Tang et al., 2013) and other analytical tasks (Wang et al., 2011; Hu et al., 2013; Tang and Liu, 2012). Thus, the availability of link information in linked sequences has great potential to enable more advanced Recurrent Neural Networks.
We have now established that (1) traditional RNNs are insufficient and dedicated efforts are needed for linked sequences; and (2) the link information in linked sequences offers unprecedented opportunities to advance traditional RNNs. In this paper, we study the problem of modeling linked sequences via RNNs. In particular, we aim to address two challenges: (1) how to capture link information mathematically, and (2) how to combine sequential and link information via Recurrent Neural Networks. To address these two challenges, we propose a novel Linked Recurrent Neural Network (LinkedRNN) for linked sequences. Our major contributions are summarized as follows:

We introduce a principled way to capture link information for linked sequences mathematically;

We propose a novel RNN framework LinkedRNN, which can model sequential and link information coherently for linked sequences; and

We validate the effectiveness of the proposed framework on real-world datasets across different domains.
The rest of the paper is organized as follows. Section 2 gives a formal definition of the problem we aim to investigate. In Section 3, we motivate and detail the framework LinkedRNN. The experimental design, datasets, and results are described in Section 4. Section 5 briefly reviews related work. Finally, we conclude our work and discuss future work in Section 6.
2. Problem Statement
Before giving a formal definition of the problem, we first introduce the notations used throughout the paper. We denote scalars by lowercase letters such as $i$ and $j$, vectors by bold lowercase letters such as $\mathbf{x}$ and $\mathbf{h}$, and matrices by bold uppercase letters such as $\mathbf{A}$ and $\mathbf{W}$. For a matrix $\mathbf{A}$, we denote the entry at its $i$-th row and $j$-th column as $\mathbf{A}_{ij}$, its $i$-th row as $\mathbf{A}_{i*}$, and its $j$-th column as $\mathbf{A}_{*j}$. In addition, $\{\cdot\}$ represents a set where the order of the elements does not matter, and superscripts denote the indexes of elements, e.g., $\{x^1, x^2, x^3\}$, which is equivalent to $\{x^3, x^1, x^2\}$. In contrast, $(\cdot)$ denotes a sequence of events where the order matters, and subscripts indicate the order of the events, e.g., $(x_1, x_2, x_3)$.
Let $\mathcal{S} = \{S^1, S^2, \ldots, S^N\}$ be the set of $N$ sequences. For linked sequences, two types of information are available. One is the sequential information of each sequence: we denote the sequential information of $S^i$ as $S^i = (\mathbf{x}^i_1, \mathbf{x}^i_2, \ldots, \mathbf{x}^i_{N_i})$, where $N_i$ is the length of $S^i$. The other is the link information: we use an adjacency matrix $\mathbf{A} \in \{0, 1\}^{N \times N}$, where $\mathbf{A}_{ij} = 1$ if there is a link between sequences $S^i$ and $S^j$, and $\mathbf{A}_{ij} = 0$ otherwise. In this work, we follow the transductive learning setting. Specifically, we assume that the sequences $S^1$ to $S^K$ are labeled, where $K < N$, and denote the labeled sequences as $\mathcal{S}_L = \{S^1, S^2, \ldots, S^K\}$. For a sequence $S^i \in \mathcal{S}_L$, we use $y^i$ to denote its label, where $y^i$ is a continuous number for the regression problem and one class symbol for the classification problem. Note that in this work we focus on unweighted and undirected links among sequences; extending the proposed framework to weighted and directed links is straightforward, and we leave it as future work. Although the proposed framework is designed for transductive learning, it can also be used for inductive learning, as discussed when we introduce the framework in the following section.
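The link information above is simply a symmetric 0/1 matrix. As a minimal sketch (pure Python, on a hypothetical four-sequence, three-link graph that is not necessarily the one in Figure 1), the adjacency matrix, the neighbor sets, and the labeled/unlabeled split can be built as follows; 0-based indices stand in for $S^1, \ldots, S^N$, and the venue labels are made up for illustration.

```python
from typing import Dict, List, Tuple


def build_adjacency(num_sequences: int, links: List[Tuple[int, int]]) -> List[List[int]]:
    """Symmetric 0/1 adjacency matrix: A[i][j] = 1 iff sequences i and j are linked."""
    A = [[0] * num_sequences for _ in range(num_sequences)]
    for i, j in links:
        A[i][j] = 1
        A[j][i] = 1  # links are undirected, so record both directions
    return A


def neighbors(A: List[List[int]], i: int) -> List[int]:
    """Indices of the sequences linked with sequence i (itself excluded)."""
    return [j for j, a in enumerate(A[i]) if a == 1 and j != i]


# Hypothetical toy graph: 4 sequences, 3 undirected links.
A = build_adjacency(4, [(0, 1), (0, 2), (2, 3)])
labels: Dict[int, str] = {0: "KDD", 1: "WWW"}  # the first K = 2 sequences are labeled
unlabeled = [i for i in range(len(A)) if i not in labels]
```

Storing a dense matrix mirrors the notation; for real citation or social graphs a sparse adjacency list would be the practical choice.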
With the above notations and definitions, we formally define the problem we target in this work as follows:
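Given a set of sequences $\mathcal{S}$ with sequential information $\{S^1, \ldots, S^N\}$ and link information $\mathbf{A}$, and a subset of labeled sequences $\mathcal{S}_L$, we aim to build an RNN model that leverages $\mathcal{S}$, $\mathbf{A}$, and $\mathcal{S}_L$ to learn representations of sequences and predict the labels of the unlabeled sequences in $\mathcal{S} \setminus \mathcal{S}_L$.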
3. The proposed framework
In addition to sequential information, link information is available for linked sequences, as shown in Figure 1. As mentioned above, the major challenges in modeling linked sequences are how to capture link information and how to combine sequential and link information coherently. To tackle these two challenges, we propose a novel recurrent neural network framework, LinkedRNN. An illustration of the proposed framework on the toy example of Figure 1 is shown in Figure 2. It mainly consists of two layers: the RNN layer captures the sequential information, and its output is the input of the link layer, where link information is captured. Next, we first detail each layer and then present the overall framework of LinkedRNN.
3.1. Capturing sequential information
Given a sequence $S^i = (\mathbf{x}^i_1, \mathbf{x}^i_2, \ldots, \mathbf{x}^i_{N_i})$, the RNN layer aims to learn a representation vector $\mathbf{h}^i$ that captures its complex sequential patterns via Recurrent Neural Networks. In the deep learning community, RNNs (Rumelhart et al., 1986a; Mikolov et al., 2010b) have been very successful at capturing sequential patterns in many fields (Mikolov et al., 2010b; Sutskever et al., 2011). Specifically, an RNN consists of recurrent units that take the previous state $\mathbf{h}_{t-1}$ and the current event $\mathbf{x}_t$ as input and output a current state $\mathbf{h}_t$ containing the sequential information seen so far (we drop the sequence index $i$ here for readability):
$$\mathbf{h}_t = \phi(\mathbf{U}\mathbf{h}_{t-1} + \mathbf{W}\mathbf{x}_t) \tag{1}$$
where $\mathbf{U}$ and $\mathbf{W}$ are learnable parameters and $\phi$ is an activation function that introduces nonlinearity. However, one major limitation of the vanilla RNN in Equation (1) is that it suffers from vanishing or exploding gradients, which make learning fail because error signals cannot be propagated effectively during backpropagation (Bengio et al., 1994). More advanced recurrent units, such as the long short-term memory (LSTM) model (Hochreiter and Schmidhuber, 1997a) and the Gated Recurrent Unit (GRU) (Cho et al., 2014), have been proposed to solve the vanishing gradient problem. Different from the vanilla RNN, these variants employ a gating mechanism to decide when and how much the state should be updated with the current information. In this work, due to its simplicity and effectiveness, we choose the GRU as our RNN unit. Specifically, in the GRU, the current state $\mathbf{h}_t$ is a linear interpolation between the previous state $\mathbf{h}_{t-1}$ and a candidate state $\tilde{\mathbf{h}}_t$:
$$\mathbf{h}_t = (1 - \mathbf{z}_t) \odot \mathbf{h}_{t-1} + \mathbf{z}_t \odot \tilde{\mathbf{h}}_t \tag{2}$$
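where $\odot$ is element-wise multiplication and $\mathbf{z}_t$ is the update gate, which controls how much the current state is updated. It is obtained through the following equation:
$$\mathbf{z}_t = \sigma(\mathbf{W}_z\mathbf{x}_t + \mathbf{U}_z\mathbf{h}_{t-1}) \tag{3}$$
where $\mathbf{W}_z$ and $\mathbf{U}_z$ are parameters and $\sigma$ is the sigmoid function, that is, $\sigma(x) = \frac{1}{1 + e^{-x}}$. In addition, the candidate state $\tilde{\mathbf{h}}_t$ is computed by Equation (4):
$$\tilde{\mathbf{h}}_t = g(\mathbf{W}_h\mathbf{x}_t + \mathbf{U}_h(\mathbf{r}_t \odot \mathbf{h}_{t-1})) \tag{4}$$
where $g$ is the tanh function, $g(x) = \frac{e^{x} - e^{-x}}{e^{x} + e^{-x}}$, and $\mathbf{W}_h$ and $\mathbf{U}_h$ are model parameters. $\mathbf{r}_t$ is the reset gate, which determines the contribution of the previous state to the candidate state, and is obtained as follows:
$$\mathbf{r}_t = \sigma(\mathbf{W}_r\mathbf{x}_t + \mathbf{U}_r\mathbf{h}_{t-1}) \tag{5}$$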
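To make the GRU equations (2) to (5) concrete, here is a minimal pure-Python sketch of one GRU step and its unrolling over a sequence. It is illustrative only: bias terms are omitted, matching the equations above; the dictionary keys (`Wz`, `Uz`, `Wh`, `Uh`, `Wr`, `Ur`) are hypothetical names for the gate parameters; and the random initialization in `init_params` is an assumption, not the authors' setup.

```python
import math
import random
from typing import Dict, List

Vector = List[float]
Matrix = List[List[float]]


def matvec(M: Matrix, v: Vector) -> Vector:
    """Matrix-vector product M v."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def add(a: Vector, b: Vector) -> Vector:
    return [x + y for x, y in zip(a, b)]


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def gru_step(x: Vector, h_prev: Vector, P: Dict[str, Matrix]) -> Vector:
    """One GRU step following Eqs. (2)-(5), without bias terms."""
    z = [sigmoid(v) for v in add(matvec(P["Wz"], x), matvec(P["Uz"], h_prev))]  # update gate, Eq. (3)
    r = [sigmoid(v) for v in add(matvec(P["Wr"], x), matvec(P["Ur"], h_prev))]  # reset gate, Eq. (5)
    reset_h = [ri * hi for ri, hi in zip(r, h_prev)]  # r_t * h_{t-1}, element-wise
    h_cand = [math.tanh(v) for v in add(matvec(P["Wh"], x), matvec(P["Uh"], reset_h))]  # Eq. (4)
    # Eq. (2): interpolate between the previous state and the candidate state.
    return [(1.0 - zi) * hp + zi * hc for zi, hp, hc in zip(z, h_prev, h_cand)]


def gru_sequence(xs: List[Vector], P: Dict[str, Matrix], hidden_dim: int) -> List[Vector]:
    """Run the GRU over (x_1, ..., x_N) from a zero initial state; return (h_1, ..., h_N)."""
    h = [0.0] * hidden_dim
    states = []
    for x in xs:
        h = gru_step(x, h, P)
        states.append(h)
    return states


def init_params(input_dim: int, hidden_dim: int, seed: int = 0) -> Dict[str, Matrix]:
    """Hypothetical small uniform initialization, for illustration only."""
    rng = random.Random(seed)

    def mat(rows: int, cols: int) -> Matrix:
        return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

    return {
        "Wz": mat(hidden_dim, input_dim), "Uz": mat(hidden_dim, hidden_dim),
        "Wr": mat(hidden_dim, input_dim), "Ur": mat(hidden_dim, hidden_dim),
        "Wh": mat(hidden_dim, input_dim), "Uh": mat(hidden_dim, hidden_dim),
    }


states = gru_sequence([[1.0, 0.0, 0.5], [0.2, 0.3, 0.1], [0.0, 1.0, 0.0]], init_params(3, 4), hidden_dim=4)
```

Because each state is a convex combination of the previous state and a tanh output, all states stay strictly inside $(-1, 1)$ when starting from a zero state; with all-zero weights the state never leaves zero.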
The output of the RNN layer is the input of the link layer. For a sequence $S^i$, the RNN layer learns a sequence of latent representations $(\mathbf{h}^i_1, \mathbf{h}^i_2, \ldots, \mathbf{h}^i_{N_i})$. There are various ways to obtain the final output $\mathbf{h}^i$ of $S^i$ from $(\mathbf{h}^i_1, \ldots, \mathbf{h}^i_{N_i})$. In this work, we investigate two popular ways:

As the last latent representation $\mathbf{h}^i_{N_i}$ is able to capture information from previous states, we can simply use it as the representation of the whole sequence. We denote this way of aggregation as $g_1$; specifically, we let $\mathbf{h}^i = \mathbf{h}^i_{N_i}$.

The attention mechanism can help the model automatically focus on relevant parts of the sequence to better capture long-range structure, and it has shown effectiveness in many tasks (Bahdanau et al., 2016; Luong et al., 2015; Chorowski et al., 2015). Thus, we define our second way of aggregation based on the attention mechanism as follows:
$$\mathbf{h}^i = \sum_{t=1}^{N_i} a^i_t \mathbf{h}^i_t \tag{6}$$
where $a^i_t$ is the attention score, which can be obtained as
$$a^i_t = \frac{\exp(e^i_t)}{\sum_{k=1}^{N_i} \exp(e^i_k)} \tag{7}$$
where $e^i_t$ is produced by a feed-forward layer applied to $\mathbf{h}^i_t$:
$$e^i_t = \mathrm{FFN}(\mathbf{h}^i_t) \tag{8}$$
Note that other attention mechanisms can also be used; we leave this as future work. We denote the aggregation described above as $g_2$.
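The two RNN-layer aggregations can be sketched as follows: `aggregate_last` takes the last hidden state, and `aggregate_attention` implements the attention-weighted sum of Eqs. (6) and (7). Since the exact form of the feed-forward layer in Eq. (8) is not spelled out, the scorer $e_t = \mathbf{w}^\top \tanh(\mathbf{W}\mathbf{h}_t + \mathbf{b})$ used here is an assumption (a common additive-attention choice), and all parameter names are hypothetical.

```python
import math
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def aggregate_last(states: List[Vector]) -> Vector:
    """Use the last hidden state h_{N_i} as the representation of the whole sequence."""
    return list(states[-1])


def score(h: Vector, W: Matrix, b: Vector, w: Vector) -> float:
    """Assumed feed-forward scorer for Eq. (8): e_t = w^T tanh(W h_t + b)."""
    hidden = [math.tanh(sum(wij * hj for wij, hj in zip(row, h)) + bi) for row, bi in zip(W, b)]
    return sum(wk * uk for wk, uk in zip(w, hidden))


def softmax(xs: List[float]) -> List[float]:
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def aggregate_attention(states: List[Vector], W: Matrix, b: Vector,
                        w: Vector) -> Tuple[Vector, List[float]]:
    """Eq. (7) attention weights, then the Eq. (6) weighted sum. Returns (h^i, weights)."""
    weights = softmax([score(h, W, b, w) for h in states])
    dim = len(states[0])
    rep = [sum(a * h[d] for a, h in zip(weights, states)) for d in range(dim)]
    return rep, weights


states = [[1.0, 2.0], [3.0, 4.0]]
W0, b0, w0 = [[0.0, 0.0]], [0.0], [1.0]  # an all-zero scorer gives equal attention weights
rep, weights = aggregate_attention(states, W0, b0, w0)
```

With an all-zero scorer the attention weights are uniform, so the attention aggregation reduces to the mean of the hidden states; a trained scorer shifts weight toward the more informative steps.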
For brevity, we use RNN to refer to the GRU in the rest of the paper.
3.2. Capturing link information
The RNN layer is able to capture the sequential information. However, in linked sequences, sequences are naturally related. Homophily theory suggests that linked entities tend to have similar attributes (McPherson et al., 2001), which has been validated in many real-world networks such as social networks (Krivitsky et al., 2009), web networks (Lin et al., 2006), and biological networks (Bebek, 2012). As indicated by homophily, a node is likely to share similar attributes and properties with the nodes it is connected to; in other words, a node is similar to its neighbors. With this intuition, we propose the link layer to capture link information in linked sequences.
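As shown in Figure 2, to capture link information, the link layer not only includes information from a node's own sequential information but also aggregates information from its neighbors. The link layer can contain multiple hidden layers; in other words, for one node, we can aggregate information from itself and its neighbors multiple times. Let $\mathbf{v}^i_k$ be the hidden representation of the sequence $S^i$ after $k$ aggregations. Note that when $k = 0$, $\mathbf{v}^i_0$ is the input of the link layer, i.e., $\mathbf{v}^i_0 = \mathbf{h}^i$. Then $\mathbf{v}^i_{k+1}$ can be updated as:
$$\mathbf{v}^i_{k+1} = \alpha\Big(\frac{1}{|\mathcal{N}(i)| + 1}\big(\mathbf{v}^i_k + \sum_{S^j \in \mathcal{N}(i)} \mathbf{v}^j_k\big)\Big) \tag{9}$$
where $\alpha$ is an element-wise activation function, $\mathcal{N}(i)$ is the set of neighbors linked with $S^i$, i.e., $\mathcal{N}(i) = \{S^j \mid \mathbf{A}_{ij} = 1, j \neq i\}$, and $|\mathcal{N}(i)|$ is the number of neighbors of $S^i$. We define $\mathbf{V}^k = [\mathbf{v}^1_k, \mathbf{v}^2_k, \ldots, \mathbf{v}^N_k]^\top$ as the matrix form of the representations of all sequences at the $k$-th layer. We modify the original adjacency matrix by setting $\mathbf{A}_{ii} = 1$ (i.e., adding self-loops). The aggregation in Eq. (9) can then be written in matrix form as:
$$\mathbf{V}^{k+1} = \alpha(\mathbf{D}^{-1}\mathbf{A}\mathbf{V}^k) \tag{10}$$
where $\mathbf{V}^k$ is the embedding matrix after $k$ aggregation steps, and $\mathbf{D}$ is the diagonal matrix whose entries are defined as:
$$\mathbf{D}_{ii} = \sum_{j=1}^{N} \mathbf{A}_{ij} \tag{11}$$
so that, with the self-loops, $\mathbf{D}_{ii} = |\mathcal{N}(i)| + 1$, matching the normalization in Eq. (9).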
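A minimal pure-Python sketch of the link layer follows. It adds self-loops, divides each row by its degree $\mathbf{D}_{ii}$ as in Eqs. (10) and (11), and applies an element-wise activation; tanh is used as the default activation here, which is an assumption since the activation is left open. The example uses an identity activation so the neighbor averaging of Eq. (9) can be checked by hand.

```python
import math
from typing import Callable, List

Matrix = List[List[float]]


def add_self_loops(A: List[List[int]]) -> List[List[int]]:
    """Return a copy of A with A_ii = 1 for every i."""
    n = len(A)
    return [[1 if i == j else A[i][j] for j in range(n)] for i in range(n)]


def link_layer_step(A_hat: List[List[int]], V: Matrix,
                    activation: Callable[[float], float] = math.tanh) -> Matrix:
    """One aggregation V^{k+1} = activation(D^{-1} A_hat V^k), with D_ii = sum_j A_hat_ij."""
    n, d = len(V), len(V[0])
    out = []
    for i in range(n):
        degree = sum(A_hat[i])  # equals |N(i)| + 1 because of the self-loop
        mean = [sum(A_hat[i][j] * V[j][c] for j in range(n)) / degree for c in range(d)]
        out.append([activation(x) for x in mean])
    return out


def link_layer(A: List[List[int]], H: Matrix, num_layers: int,
               activation: Callable[[float], float] = math.tanh) -> List[Matrix]:
    """Return [V^0, V^1, ..., V^M], where V^0 = H holds the RNN-layer outputs h^i as rows."""
    A_hat = add_self_loops(A)
    layers = [H]
    for _ in range(num_layers):
        layers.append(link_layer_step(A_hat, layers[-1], activation))
    return layers


def identity(x: float) -> float:
    return x


A = [[0, 1, 0],
     [1, 0, 0],
     [0, 0, 0]]           # sequences 0 and 1 are linked; sequence 2 is isolated
H = [[1.0], [3.0], [5.0]]  # toy 1-d RNN-layer outputs
layers = link_layer(A, H, num_layers=2, activation=identity)
```

Note that an isolated sequence simply keeps (an activated copy of) its own representation, which is how Eq. (9) treats sequences without links.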
3.3. Linked Recurrent Neural Networks
With the model components to capture sequential and link information, the procedure of the proposed framework LinkedRNN is presented below:
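$$\begin{aligned}
(\mathbf{h}^i_1, \ldots, \mathbf{h}^i_{N_i}) &= \mathrm{RNN}(S^i), \\
\mathbf{h}^i &= g(\mathbf{h}^i_1, \ldots, \mathbf{h}^i_{N_i}), \\
(\mathbf{v}^i_0, \mathbf{v}^i_1, \ldots, \mathbf{v}^i_M) &= \mathrm{Link}(\{\mathbf{h}^j\}_{j=1}^{N}, \mathbf{A}), \\
\mathbf{o}^i &= f(\mathbf{v}^i_0, \mathbf{v}^i_1, \ldots, \mathbf{v}^i_M)
\end{aligned} \tag{12}$$
where the input of the RNN layer is the sequential information $S^i$, and the RNN layer produces the sequence of latent representations $(\mathbf{h}^i_1, \ldots, \mathbf{h}^i_{N_i})$. This sequence is aggregated by a function $g$ to obtain the output $\mathbf{h}^i$ of the RNN layer, which serves as the input of the link layer. After $M$ layers, the link layer produces a sequence of latent representations $(\mathbf{v}^i_0, \ldots, \mathbf{v}^i_M)$, which is aggregated by a function $f$ into the final representation $\mathbf{o}^i$.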
The final representation of a sequence $S^i$ is obtained by aggregating the sequence of representations produced by the link layer. In this work, we investigate three ways to obtain the final representation:

As $\mathbf{v}^i_M$ is the output of the last layer, we can define the final representation as $\mathbf{o}^i = \mathbf{v}^i_M$; we denote this way as $f_1$.

Although the new representation $\mathbf{v}^i_M$ incorporates all the neighbor information, the signal in the representation of $S^i$ itself may be overwhelmed during the aggregation process, especially when $S^i$ has a large number of neighbors. Thus, to make the new representation focus more on $S^i$ itself, we use a feed-forward neural network to perform the combination, taking the concatenation of the representations from the last two layers as its input, i.e., $\mathbf{o}^i = \mathrm{FFN}([\mathbf{v}^i_{M-1}; \mathbf{v}^i_M])$. We refer to this aggregation method as $f_2$.

Each representation $\mathbf{v}^i_k$ may contain unique information that is not carried into later layers. Thus, similarly, we use a feed-forward neural network to combine all of them, i.e., $\mathbf{o}^i = \mathrm{FFN}([\mathbf{v}^i_0; \mathbf{v}^i_1; \ldots; \mathbf{v}^i_M])$. We refer to this aggregation method as $f_3$.
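The three link-layer aggregations can be sketched as follows for a single sequence whose link-layer outputs $(\mathbf{v}^i_0, \ldots, \mathbf{v}^i_M)$ are given as a list. The feed-forward network is assumed here to be a single linear layer followed by tanh, since the text only specifies "a feed-forward neural network"; the function names and the hand-picked weights are hypothetical and chosen so the outputs are easy to check.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def ffn(x: Vector, W: Matrix, b: Vector) -> Vector:
    """Hypothetical one-layer feed-forward network: tanh(W x + b)."""
    return [math.tanh(sum(w * v for w, v in zip(row, x)) + bias) for row, bias in zip(W, b)]


def concat(vectors: List[Vector]) -> Vector:
    return [value for vec in vectors for value in vec]


def final_last(layer_reps: List[Vector]) -> Vector:
    """o^i = v^i_M: the output of the last link layer."""
    return list(layer_reps[-1])


def final_last_two(layer_reps: List[Vector], W: Matrix, b: Vector) -> Vector:
    """o^i = FFN([v^i_{M-1}; v^i_M])."""
    return ffn(concat(layer_reps[-2:]), W, b)


def final_all(layer_reps: List[Vector], W: Matrix, b: Vector) -> Vector:
    """o^i = FFN([v^i_0; ...; v^i_M])."""
    return ffn(concat(layer_reps), W, b)


reps = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]  # v_0, v_1, v_2 of one sequence (M = 2)
W_two = [[0.0, 0.0, 1.0, 0.0], [0.0, 0.0, 0.0, 1.0]]  # selects the v_M half of the input
W_all = [[1.0, 0.0, 0.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0, 0.0, 0.0]]  # selects v_0
```

The input width of the network grows from $2d$ to $(M+1)d$ between the last two variants, which is the price of letting the final representation see every layer.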
To learn the parameters of the proposed framework LinkedRNN, we need to define a loss function that depends on the specific task. In this work, we investigate LinkedRNN in two tasks – classification and regression.
Classification. The final output of a sequence $S^i$ is $\mathbf{o}^i$. We can consider $\mathbf{o}^i$ as features and build a classifier on top of it. In particular, the predicted class probabilities are obtained through a softmax function:
$$\hat{\mathbf{y}}^i = \mathrm{softmax}(\mathbf{W}_c\mathbf{o}^i + \mathbf{b}_c) \tag{13}$$
where $\mathbf{W}_c$ and $\mathbf{b}_c$ are the coefficient and bias parameters, respectively, and $\hat{\mathbf{y}}^i$ is the predicted label distribution of the sequence $S^i$. The loss function used in this paper is the cross-entropy loss.
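As a minimal sketch of the classification head in Eq. (13), the code below computes softmax probabilities from a final representation, the cross-entropy loss over labeled sequences (averaged here, which is one common convention), and the predicted class as the most probable entry. The 3-class, 2-feature weights are hypothetical.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def softmax(xs: Vector) -> Vector:
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def predict_proba(o: Vector, W_c: Matrix, b_c: Vector) -> Vector:
    """Eq. (13): softmax(W_c o + b_c), one probability per class."""
    logits = [sum(w * x for w, x in zip(row, o)) + b for row, b in zip(W_c, b_c)]
    return softmax(logits)


def cross_entropy(prob_rows: List[Vector], labels: List[int]) -> float:
    """Mean negative log-probability of the true class over the labeled sequences."""
    return -sum(math.log(p[y]) for p, y in zip(prob_rows, labels)) / len(labels)


def predict_label(o: Vector, W_c: Matrix, b_c: Vector) -> int:
    """Predicted class: the index of the highest probability."""
    probs = predict_proba(o, W_c, b_c)
    return max(range(len(probs)), key=probs.__getitem__)


W_c = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]]  # hypothetical 3-class, 2-feature classifier
b_c = [0.0, 0.0, 0.0]
```

When every logit is equal, the predicted distribution is uniform and the cross-entropy equals the log of the number of classes, a useful sanity check at initialization.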
Regression. For the regression problem, we choose linear regression in this work. In other words, the regression label of the sequence $S^i$ is predicted as:
$$\hat{y}^i = \mathbf{w}^\top\mathbf{o}^i + b \tag{14}$$
where $\mathbf{w}$ and $b$ are the regression coefficients and the bias parameter, respectively. The square loss is adopted as the loss function:
$$\mathcal{L} = \sum_{S^i \in \mathcal{S}_L} (y^i - \hat{y}^i)^2 \tag{15}$$
Note that there are other ways to define loss functions for classification and regression; we leave the investigation of other loss functions as future work.
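The regression head of Eqs. (14) and (15) is sketched below, together with one illustrative gradient-descent step. The gradient step only updates the regression coefficients and bias; in the full model, gradients of the same loss would also flow into the link and RNN layers. The toy data and learning rate are hypothetical.

```python
from typing import List, Tuple

Vector = List[float]
Batch = List[Tuple[Vector, float]]  # pairs (o^i, y^i) for labeled sequences


def predict_value(o: Vector, w: Vector, b: float) -> float:
    """Eq. (14): linear regression on the final representation o."""
    return sum(wi * oi for wi, oi in zip(w, o)) + b


def square_loss(batch: Batch, w: Vector, b: float) -> float:
    """Eq. (15): sum of squared errors over the labeled sequences."""
    return sum((y - predict_value(o, w, b)) ** 2 for o, y in batch)


def gradient_step(batch: Batch, w: Vector, b: float, lr: float) -> Tuple[Vector, float]:
    """One gradient-descent step on the square loss with respect to w and b."""
    grad_w = [0.0] * len(w)
    grad_b = 0.0
    for o, y in batch:
        err = predict_value(o, w, b) - y  # dL/dy_hat = 2 * err
        for k, ok in enumerate(o):
            grad_w[k] += 2.0 * err * ok
        grad_b += 2.0 * err
    return [wk - lr * gk for wk, gk in zip(w, grad_w)], b - lr * grad_b


batch = [([1.0, 3.0], 0.5), ([0.0, 1.0], -1.0)]
w, b = [2.0, -1.0], 0.5
w2, b2 = gradient_step(batch, w, b, lr=0.05)
```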
Prediction. For an unlabeled sequence $S^j$ under the classification problem, its label is predicted as the class corresponding to the entry with the highest probability in $\hat{\mathbf{y}}^j$. For an unlabeled sequence $S^j$ under the regression problem, its label is predicted as $\hat{y}^j$.
Although the framework is designed for transductive learning, it can naturally be used for inductive learning. For a sequence $S^{\text{new}}$ that is unseen in the given linked sequences $\mathcal{S}$, we can easily obtain its representation $\mathbf{o}^{\text{new}}$ via Eq. (12) from its sequential information and its neighbors $\mathcal{N}(\text{new})$. Then, based on $\mathbf{o}^{\text{new}}$, its label can be predicted as in the prediction step described above.
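One way to realize this inductive step is sketched below, under an explicit assumption: the per-layer representations $\mathbf{V}^0, \ldots, \mathbf{V}^M$ of the existing sequences are cached from the trained model and are not updated when the new sequence arrives. The new sequence's representation at each layer is then the Eq. (9) average of its own previous representation and its neighbors' cached ones. The function name and data layout are hypothetical, and the RNN-layer output `h_new` is assumed to be computed already.

```python
import math
from typing import Callable, List

Vector = List[float]
Matrix = List[List[float]]


def inductive_link_representations(h_new: Vector, neighbor_ids: List[int],
                                   cached_layers: List[Matrix],
                                   activation: Callable[[float], float] = math.tanh) -> List[Vector]:
    """Return [v_0, ..., v_M] for an unseen sequence.

    h_new            -- RNN-layer output of the new sequence (its v_0).
    neighbor_ids     -- indices of the existing sequences it links to.
    cached_layers[k] -- cached V^k of the existing sequences, k = 0..M.
    Assumption: existing representations are frozen, so the new sequence
    does not change its neighbors' representations.
    """
    reps = [list(h_new)]
    for k in range(len(cached_layers) - 1):
        members = [reps[k]] + [cached_layers[k][j] for j in neighbor_ids]  # itself + neighbors, Eq. (9)
        mean = [sum(v[c] for v in members) / len(members) for c in range(len(h_new))]
        reps.append([activation(x) for x in mean])
    return reps


def identity(x: float) -> float:
    return x


cached = [[[1.0], [1.0], [9.0]],   # V^0 of three existing sequences
          [[2.0], [2.0], [9.0]]]   # V^1 (M = 1)
reps = inductive_link_representations([4.0], neighbor_ids=[0, 1], cached_layers=cached, activation=identity)
```

The resulting list can be passed to any of the final aggregations and then to the classification or regression head.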
4. Experiment
In this section, we present experimental details to verify the effectiveness of the proposed framework. Specifically, we validate it on datasets from two different domains. Next, we first describe the datasets used in the experiments and then compare the performance of the proposed framework with representative baselines. Lastly, we analyze the key components of LinkedRNN.
Table 1. Statistics of the datasets.
Description | DBLP | BOOHEE
# of sequences | 47,491 | 18,229
Network density (‰) | 0.13 | 0.012
Avg. length of sequences | 6.6 | 23.5
Max. length of sequences | 20 | 29
4.1. Datasets
In this study, we collect two types of linked sequences. One is from DBLP, where the data contains textual sequences of papers. The other is from the weight-loss website BOOHEE, where the data includes users' weight sequences. Some statistics of the datasets are shown in Table 1. Next, we introduce more details.
DBLP dataset. We constructed a paper citation network from the publicly available DBLP dataset (https://aminer.org/citation) (Tang et al., 2008). This dataset contains information on millions of papers from a variety of research fields. Specifically, each paper contains the following relevant information: paper ID, publication venue, the IDs of its references, and abstract. Following a practice similar to (Tang et al., 2015), we only select papers from conferences in the 10 largest computer science domains, including VCG, ACL, IP, TC, WC, CCS, CVPR, PDS, NIPS, KDD, WWW, ICSE, Bioinformatics, and TCS. We construct a sequence for each paper from its abstract and regard the citation relationships as the link information between sequences. Specifically, we first split each abstract into sentences and tokenize each sentence using the Python NLTK package. Then, we use Word2Vec (Mikolov et al., 2013) to embed each word into Euclidean space, and for each sentence, we take the mean of its word vectors as the sentence embedding. Thus, the abstract of each paper is represented by a sequence of sentence embeddings. We conduct the classification task on this dataset, i.e., paper classification, where the label of each sequence is the corresponding publication venue.
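The abstract-to-sequence step can be sketched as below. This is not the authors' pipeline: a regular-expression splitter stands in for NLTK's tokenizers, a tiny hand-made dictionary stands in for trained Word2Vec vectors, and skipping out-of-vocabulary words (and sentences with no known words) is our assumption.

```python
import re
from typing import Dict, List

Vector = List[float]


def split_sentences(text: str) -> List[str]:
    """Stand-in for NLTK sentence tokenization: split after '.', '!' or '?' followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def tokenize(sentence: str) -> List[str]:
    """Stand-in for NLTK word tokenization: lowercase alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", sentence.lower())


def abstract_to_sequence(abstract: str, word_vectors: Dict[str, Vector]) -> List[Vector]:
    """Represent an abstract as a sequence of sentence embeddings (mean of word vectors)."""
    sequence = []
    for sentence in split_sentences(abstract):
        vecs = [word_vectors[w] for w in tokenize(sentence) if w in word_vectors]
        if not vecs:
            continue  # assumption: drop sentences without any in-vocabulary word
        dim = len(vecs[0])
        sequence.append([sum(v[d] for v in vecs) / len(vecs) for d in range(dim)])
    return sequence


# Toy 2-d "word vectors" standing in for a trained Word2Vec model.
word_vectors = {"graphs": [1.0, 0.0], "are": [0.0, 0.0], "useful": [0.0, 1.0], "rnns": [2.0, 2.0]}
seq = abstract_to_sequence("Graphs are useful. RNNs!", word_vectors)
```

Each resulting sentence embedding plays the role of one event $\mathbf{x}^i_t$ in the sequence fed to the RNN layer.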
BOOHEE dataset. This dataset is collected from BOOHEE (https://www.boohee.com), one of the most popular weight management mobile applications. It contains millions of users who self-track their weights and interact with each other in the internal social network provided by the application. Specifically, users can follow friends, comment on friends' posts, and mention (@) friends in comments or posts. The weights recorded by users form sequences that contain weight dynamics, and the social networking behaviors result in three networks that correspond to following, commenting, and mentioning interactions, respectively. Previous work (Wang et al., 2017) has shown a social correlation in users' weight loss. Thus, we use these social networks as the link information for the weight sequence data. We preprocess the dataset to filter out sequences from suspicious spam users. Moreover, we change the time granularity of the weight sequences from days to weeks to remove daily fluctuation noise: we compute the mean of all recorded weights in one week and use it as the weight for that week. For the networks, we combine the three networks into one by adding them together and filter out weak ties. On this dataset, we conduct a regression task of weight prediction. We choose the most recent weight in a weight sequence as the target to predict (i.e., the ground truth of the regression problem). Note that for each user, we remove all social interactions formed after the most recent weight, to avoid using future link information for weight prediction.
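The two preprocessing steps described above, weekly averaging and network combination, can be sketched as follows. The day indexing, the representation of each network as edge-to-count dictionaries, and the `min_weight` threshold that defines a "weak tie" are hypothetical choices, since the text does not specify them. Weeks without any records are simply skipped in this sketch.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

Edge = Tuple[int, int]


def weekly_means(daily_records: List[Tuple[int, float]]) -> List[float]:
    """Average (day_index, weight) records per 7-day week; weeks without records are skipped."""
    buckets: Dict[int, List[float]] = defaultdict(list)
    for day, weight in daily_records:
        buckets[day // 7].append(weight)
    return [sum(buckets[w]) / len(buckets[w]) for w in sorted(buckets)]


def combine_networks(networks: List[Dict[Edge, int]], min_weight: int) -> Set[Edge]:
    """Add interaction counts of several undirected networks; keep only ties with total >= min_weight."""
    combined: Dict[Edge, int] = defaultdict(int)
    for network in networks:
        for (u, v), count in network.items():
            combined[(min(u, v), max(u, v))] += count  # (u, v) and (v, u) are the same tie
    return {edge for edge, total in combined.items() if total >= min_weight}


weights = weekly_means([(0, 70.0), (3, 72.0), (8, 71.5), (12, 70.5)])
following = {(1, 2): 1}
commenting = {(2, 1): 2, (3, 4): 1}
mentioning = {(1, 3): 1}
ties = combine_networks([following, commenting, mentioning], min_weight=2)
```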
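Table 2. Classification performance on DBLP under different training ratios (higher is better).
Measurement | Method | 10% training | 30% training | 50% training | 70% training
Micro-F1 | node2vec | 0.6641 | 0.6550 | 0.6688 | 0.6691
Micro-F1 | GCN | 0.7005 | 0.7093 | 0.7110 | 0.7180
Micro-F1 | RNN | 0.7686 | 0.7980 | 0.7978 | 0.8025
Micro-F1 | RNN-node2vec | 0.7940 | 0.8031 | 0.7933 | 0.8114
Micro-F1 | RNN-GCN | 0.7912 | 0.8230 | 0.8255 | 0.8284
Micro-F1 | LinkedRNN | 0.8146 | 0.8399 | 0.8463 | 0.8531
Macro-F1 | node2vec | 0.6514 | 0.6523 | 0.6513 | 0.6565
Macro-F1 | GCN | 0.6874 | 0.6992 | 0.7004 | 0.7095
Macro-F1 | RNN | 0.7452 | 0.7751 | 0.7754 | 0.7824
Macro-F1 | RNN-node2vec | 0.7734 | 0.7797 | 0.7702 | 0.7912
Macro-F1 | RNN-GCN | 0.7642 | 0.8014 | 0.8069 | 0.8104
Macro-F1 | LinkedRNN | 0.7970 | 0.8249 | 0.8331 | 0.8365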
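Table 3. Weight prediction performance (MSE, lower is better) on BOOHEE under different training ratios.
Method | 10% training | 30% training | 50% training | 70% training
node2vec | 8.8702 | 8.8517 | 7.4744 | 7.0390
GCN | 8.9347 | 8.6830 | 6.7949 | 6.7278
RNN | 8.6600 | 8.6048 | 7.0466 | 6.8033
RNN-node2vec | 8.4653 | 8.5944 | 7.0173 | 6.7796
RNN-GCN | 8.6286 | 8.5662 | 6.9967 | 6.7945
LinkedRNN | 7.1822 | 6.3882 | 6.8416 | 6.3517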
4.2. Representative baselines
To validate the effectiveness of the proposed framework, we construct three groups of representative baselines. The first group includes state-of-the-art network embedding methods, i.e., node2vec (Grover and Leskovec, 2016) and GCN (Kipf and Welling, 2016), which only capture link information. The second group is the GRU-based RNN model (Graves et al., 2013b), which is the basic model used in our framework to capture sequential information. Baselines in the third group combine models from the first and second groups and thus capture both sequential and link information. Next, we present more details about these baselines.

Node2vec (Grover and Leskovec, 2016). Node2vec is a state-of-the-art network embedding method. It learns representations of sequences by capturing only the link information, from a random-walk perspective.

GCN (Kipf and Welling, 2016). GCN is a graph convolutional network model. It is trained with both link and label information. Hence, it differs from node2vec, which is learned with only link information and is totally independent of the task.

RNN (Graves et al., 2013b). RNNs have been widely used for modeling sequential data and achieved great success in a variety of domains. However, they tend to ignore the correlation between sequences and only focus on sequential information. We construct this baseline to show the importance of correlation information. To make the comparison fair, we employ the same recurrent unit (GRU) in both the proposed framework and this baseline.

RNN-node2vec. The node2vec method is able to learn representations from the link information, and the RNN can do so from the sequential information. Thus, to obtain representations of sequences that contain both link and sequential information, we concatenate the two sets of embeddings obtained from node2vec and the RNN and combine them via a feed-forward neural network.

RNN-GCN. RNN-GCN applies the same strategy used to combine RNN and node2vec to combine RNN and GCN.
There are several notes about the baselines. First, node2vec does not use label information and is unsupervised; RNN and RNN-node2vec utilize label information and are supervised; and GCN and RNN-GCN use both label information and unlabeled data and are semi-supervised. Second, some sequences may not have link information, so baselines that only capture link information cannot learn representations for them; in this work, when representations from link information are unavailable, we use the representations learned from sequential information via the RNN instead. Third, we do not choose LSTM and its variants as baselines, since our current model is based on GRU, and LSTM and its variants could equally serve as base models.
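The combination used by the RNN-node2vec and RNN-GCN baselines, together with the fallback rule in the second note, can be sketched as below. We assume the link-based and sequence-based embeddings have the same dimension, read the fallback as substituting the RNN embedding for a missing link-based embedding, and model the feed-forward network as one linear layer plus tanh; the function names and toy values are hypothetical.

```python
import math
from typing import Dict, List

Vector = List[float]
Matrix = List[List[float]]


def ffn(x: Vector, W: Matrix, b: Vector) -> Vector:
    """Hypothetical one-layer feed-forward network: tanh(W x + b)."""
    return [math.tanh(sum(w * v for w, v in zip(row, x)) + bias) for row, bias in zip(W, b)]


def link_embedding_or_fallback(seq_id: int, link_emb: Dict[int, Vector],
                               seq_emb: Dict[int, Vector]) -> Vector:
    """Use the link-based embedding if the sequence has links; otherwise fall back to the RNN embedding."""
    return link_emb[seq_id] if seq_id in link_emb else seq_emb[seq_id]


def combined_representation(seq_id: int, link_emb: Dict[int, Vector], seq_emb: Dict[int, Vector],
                            W: Matrix, b: Vector) -> Vector:
    """Concatenate [RNN embedding; link embedding] and combine them with the feed-forward network."""
    x = seq_emb[seq_id] + link_embedding_or_fallback(seq_id, link_emb, seq_emb)  # list concatenation
    return ffn(x, W, b)


seq_emb = {0: [1.0, 2.0], 1: [3.0, 4.0]}   # RNN embeddings (toy, 2-d)
link_emb = {0: [5.0, 6.0]}                 # sequence 1 has no links, hence no link-based embedding
W = [[0.0, 0.0, 1.0, 0.0], [0.0, 0.0, 0.0, 1.0]]  # picks out the link half, for easy checking
b = [0.0, 0.0]
```

Unlike LinkedRNN, this late-fusion design learns the two embeddings separately and only mixes them in the final network.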
4.3. Experimental settings
Data split: For both datasets, we randomly select 30% of the sequences for testing. Then we fix the test set and choose $x\%$ of the remaining data for training and another portion for validation, which is used to select parameters for the baselines and the proposed framework. In this work, we vary $x$ as $\{10, 30, 50, 70\}$.
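A minimal sketch of this split protocol follows. Shuffling once with a fixed seed keeps the 30% test set identical across training ratios, which is what fixing the test set requires. The validation fraction `val_ratio` and the seed are hypothetical parameters, since the validation size is not stated.

```python
import random
from typing import List, Tuple


def split_indices(n: int, train_ratio: float, val_ratio: float,
                  test_ratio: float = 0.3, seed: int = 0) -> Tuple[List[int], List[int], List[int]]:
    """Fixed random test split, then train/validation subsets drawn from the remaining sequences."""
    if train_ratio + val_ratio > 1.0:
        raise ValueError("train_ratio + val_ratio must not exceed 1")
    order = list(range(n))
    random.Random(seed).shuffle(order)  # same seed -> same permutation -> same test set
    n_test = round(test_ratio * n)
    test, rest = order[:n_test], order[n_test:]
    n_train = round(train_ratio * len(rest))
    n_val = round(val_ratio * len(rest))
    return rest[:n_train], rest[n_train:n_train + n_val], test


splits = {x: split_indices(1000, train_ratio=x / 100, val_ratio=0.1) for x in (10, 30, 50, 70)}
```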
Parameter selection: In our experiments, we set the dimension of the representation vectors of sequences to 100. For node2vec, we use the validation data to select the best values of its return parameter $p$ and in-out parameter $q$ from the grid suggested by the authors (Grover and Leskovec, 2016), and use the default values for the remaining parameters. In addition, the learning rates of all methods are selected on the validation set.
Evaluation metrics: Since we perform classification on the DBLP data, we use Micro-F1 and Macro-F1 scores as the metrics for DBLP, which are widely used for classification problems (Yang et al., 2016; Grover and Leskovec, 2016); higher values mean better performance. We perform the regression problem of weight prediction on the BOOHEE data, so performance on BOOHEE is evaluated by the mean squared error (MSE); lower MSE indicates better prediction performance.
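For reference, the metrics can be computed as in the sketch below. Micro-F1 pools true positives, false positives, and false negatives over all classes (for single-label multi-class prediction it equals accuracy), while Macro-F1 averages the per-class F1 scores. Averaging over the union of true and predicted labels is our choice here; the toy labels are hypothetical.

```python
from typing import Hashable, Sequence, Tuple


def _f1(tp: int, fp: int, fn: int) -> float:
    """F1 = 2TP / (2TP + FP + FN), defined as 0 when all counts are zero."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def micro_macro_f1(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> Tuple[float, float]:
    labels = set(y_true) | set(y_pred)
    tp = {c: 0 for c in labels}
    fp = {c: 0 for c in labels}
    fn = {c: 0 for c in labels}
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # the predicted class gains a false positive
            fn[t] += 1  # the true class gains a false negative
    micro = _f1(sum(tp.values()), sum(fp.values()), sum(fn.values()))  # pool counts over classes
    macro = sum(_f1(tp[c], fp[c], fn[c]) for c in labels) / len(labels)  # mean of per-class F1
    return micro, macro


def mse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean squared error between true and predicted values."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


micro, macro = micro_macro_f1(["KDD", "KDD", "WWW", "ACL"], ["KDD", "WWW", "WWW", "ACL"])
```

Since Micro-F1 equals accuracy in this setting, guessing uniformly at random among 10 classes has an expected Micro-F1 of 0.1.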
4.4. Experimental Results
We first present the results on the DBLP data, shown in Table 2. For the proposed framework, the choice of aggregation functions for the link layer is discussed in more detail in the following subsections. From the table, we make the following observations:

As we can see in Table 2, in most cases, the performance tends to improve as the number of training samples increases.

A random guess obtains about 0.1 for both Micro-F1 and Macro-F1. The network embedding methods perform much better than random guessing, which clearly shows that link information is indeed helpful for prediction.

GCN achieves much better performance than node2vec. As mentioned before, GCN uses label information, so its learned representations are tailored to the given task, whereas node2vec learns task-independent representations, which may not be optimal.

The RNN approach outperforms GCN, and both use label information. This observation suggests that the content and sequential information are very helpful.

Most of the time, RNN-node2vec and RNN-GCN outperform the individual models. This observation indicates that both sequential and link information are important and that they contain complementary information.

The proposed framework LinkedRNN consistently outperforms all baselines, which demonstrates its effectiveness. In addition, compared with RNN-node2vec and RNN-GCN, the proposed framework jointly captures sequential and link information, which leads to a consistent performance gain.
We present the performance on BOOHEE in Table 3. Overall, we make observations similar to those on DBLP: (1) performance generally improves as the number of training samples increases; (2) the combined models outperform the individual ones most of the time; and (3) the proposed framework LinkedRNN obtains the best performance in three of the four settings (all except the 50% training ratio, where GCN achieves a lower MSE).
From these comparisons, we conclude that both sequential and link information in linked sequences are important and contain complementary information. Meanwhile, the strong performance of LinkedRNN on datasets from different domains demonstrates its effectiveness in capturing the sequential and link information present in the sequences.
4.5. Component Analysis
In the proposed framework LinkedRNN, we have investigated several ways to define the two aggregation functions. In this subsection, we investigate the impact of these aggregation functions on the performance of LinkedRNN by defining the following variants:

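LinkedRNN-11: the variant that uses the last hidden state in the RNN layer ($g_1$) and the output of the last link layer ($f_1$);

LinkedRNN-12: the variant that uses the last hidden state in the RNN layer ($g_1$) and the feed-forward combination of the last two link layers ($f_2$);

LinkedRNN-13: the variant that uses the last hidden state in the RNN layer ($g_1$) and the feed-forward combination of all link layers ($f_3$);

LinkedRNN-21: the variant that uses attention aggregation in the RNN layer ($g_2$) and the output of the last link layer ($f_1$);

LinkedRNN-22: the variant that uses attention aggregation in the RNN layer ($g_2$) and the feed-forward combination of the last two link layers ($f_2$);

LinkedRNN-23: the variant that uses attention aggregation in the RNN layer ($g_2$) and the feed-forward combination of all link layers ($f_3$).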
The results are shown in Figure 3. Note that we only show results on DBLP with one training ratio, since similar observations hold for the other settings. We observe that:

Generally, the variants of LinkedRNN that use attention aggregation in the RNN layer ($g_2$) obtain better performance than those that use the last hidden state ($g_1$). This demonstrates that aggregating the sequence of latent representations with the attention mechanism can boost performance.

Aggregating representations from more layers in the link layer typically can result in better performance.
4.6. Parameter Analysis
LinkedRNN uses the link layer to capture link information, and the link layer can have multiple layers. In this subsection, we study the impact of the number of layers on the performance of LinkedRNN. Figure 4 shows how the performance changes with the number of layers. Similar to the component analysis, we only report the results under one setting on DBLP, since we have similar observations for the others. In general, the performance first increases dramatically and then slowly decreases: one layer is not sufficient to capture the link information, while more layers may result in overfitting.
5. Related Work
In this section, we briefly review RNN-based deep methods that have been proposed to learn from sequential data effectively. RNNs (Rumelhart et al., 1986b) have been designed to model arbitrarily long sequences, but tremendous challenges prevent them from effectively capturing long-term dependencies. For example, the vanishing and exploding gradient issues make it very difficult to backpropagate error signals, and the dependencies in sequences are complex. In addition, these models are time-consuming to train because the training procedure is hard to parallelize. Thus, many researchers have attempted to develop advanced architectures to overcome these challenges. One of the most successful attempts is to add a gating mechanism to the recurrent unit. Two representative works are long short-term memory (LSTM) (Hochreiter and Schmidhuber, 1997b) and gated recurrent units (GRU) (Cho et al., 2014), which introduce more sophisticated recurrent units to capture long-term dependencies in sequences. For example, in LSTM, a recurrent unit maintains two states and three gates, which decide how much new memory should be added, how much existing memory should be forgotten, and how much of the memory content is exposed, respectively. RNNs equipped with such gating mechanisms can effectively mitigate the vanishing gradient issue and have demonstrated extraordinary performance in a variety of tasks, such as machine translation (Sutskever et al., 2014), speech recognition (Graves et al., 2013a), and medical event detection (Jagannatha and Yu, 2016). Moreover, several works have introduced additional gates into the LSTM unit to deal with irregularly sampled data (Neil et al., 2016; Baytas et al., 2017). These time-aware models can largely improve the training efficiency and effectiveness of RNNs.
Besides gating mechanisms, other directions for extending RNNs have also been heavily explored. Schuster and Paliwal (Schuster and Paliwal, 1997) described a new architecture called bidirectional recurrent neural networks (BRNNs), in which two hidden layers process the input in opposite directions. Although BRNNs are able to use all available input information and effectively boost prediction performance (Ma et al., 2017), their limitation is clear: they require information from the future (Lipton et al., 2015). Recently, Koutnik et al. (Koutnik et al., 2014) presented the clockwork RNN, which modifies the standard RNN architecture by partitioning the hidden layer into separate modules, so that each module processes the sequence at its own temporal granularity. A recent work by Chang et al. (Chang et al., 2017) tackles these major challenges together by introducing dilated recurrent skip connections, which largely reduce the number of model parameters and therefore enhance computational efficiency. In addition, such layers can be stacked so that dependencies at different scales are learned effectively at different layers. While a large body of research has focused on modeling the dependencies within sequences, limited efforts have been made to model the dependencies between sequences. In this paper, we tackle this novel challenge brought by the links among sequences and propose an effective model that has shown promising results.
6. Conclusion
RNNs have proven to be powerful in modeling sequences in many domains. Most existing RNN methods have been designed for sequences that are assumed to be i.i.d. However, in many real-world applications, sequences are inherently linked, and linked sequences present both challenges and opportunities to existing RNN methods, which calls for novel RNN methods. In this paper, we study the problem of designing RNN models for linked sequences. Motivated by homophily, we introduce a principled method to capture link information and propose a novel RNN framework, LinkedRNN, which jointly models sequential and link information. Experimental results on datasets from different domains demonstrate that (1) the proposed framework outperforms a variety of representative baselines; and (2) link information helps boost RNN performance.
There are several interesting directions for future work. First, our current model focuses on unweighted and undirected links; we will study weighted and directed links and the corresponding RNN models. Second, the current work focuses on classification and regression problems with certain loss functions; we will investigate other types of loss functions to learn the parameters of the proposed framework, as well as more applications of the framework. Third, since our model can be naturally extended to inductive learning, we will further validate its effectiveness in the inductive setting. Finally, in some applications the link information may evolve over time; thus, we plan to study RNN models that can capture the dynamics of links as well.
References
 Bahdanau et al. (2014) Dzmitry Bahdanau, Kyunghyun Cho, and Yoshua Bengio. 2014. Neural machine translation by jointly learning to align and translate. arXiv preprint arXiv:1409.0473 (2014).
 Bahdanau et al. (2016) Dzmitry Bahdanau, Jan Chorowski, Dmitriy Serdyuk, Philemon Brakel, and Yoshua Bengio. 2016. End-to-end attention-based large vocabulary speech recognition. In Acoustics, Speech and Signal Processing (ICASSP), 2016 IEEE International Conference on. IEEE, 4945–4949.
 Baytas et al. (2017) Inci M Baytas, Cao Xiao, Xi Zhang, Fei Wang, Anil K Jain, and Jiayu Zhou. 2017. Patient subtyping via time-aware LSTM networks. In Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 65–74.
 Bebek (2012) Gurkan Bebek. 2012. Identifying gene interaction networks. In Statistical Human Genetics. Springer, 483–494.
 Bengio et al. (1994) Yoshua Bengio, Patrice Simard, and Paolo Frasconi. 1994. Learning long-term dependencies with gradient descent is difficult. IEEE Transactions on Neural Networks 5, 2 (1994), 157–166.
 Chang et al. (2017) Shiyu Chang, Yang Zhang, Wei Han, Mo Yu, Xiaoxiao Guo, Wei Tan, Xiaodong Cui, Michael Witbrock, Mark A Hasegawa-Johnson, and Thomas S Huang. 2017. Dilated recurrent neural networks. In Advances in Neural Information Processing Systems. 76–86.
 Che et al. (2018) Zhengping Che, Sanjay Purushotham, Kyunghyun Cho, David Sontag, and Yan Liu. 2018. Recurrent neural networks for multivariate time series with missing values. Scientific Reports 8, 1 (2018), 6085.
 Cho et al. (2014) Kyunghyun Cho, Bart Van Merriënboer, Caglar Gulcehre, Dzmitry Bahdanau, Fethi Bougares, Holger Schwenk, and Yoshua Bengio. 2014. Learning phrase representations using RNN encoder-decoder for statistical machine translation. arXiv preprint arXiv:1406.1078 (2014).
 Chorowski et al. (2015) Jan K Chorowski, Dzmitry Bahdanau, Dmitriy Serdyuk, Kyunghyun Cho, and Yoshua Bengio. 2015. Attention-based models for speech recognition. In Advances in Neural Information Processing Systems. 577–585.
 Glover et al. (2002) Eric J Glover, Kostas Tsioutsiouliklis, Steve Lawrence, David M Pennock, and Gary W Flake. 2002. Using web structure for classifying and describing web pages. In Proceedings of the 11th international conference on World Wide Web. ACM, 562–569.
 Graves et al. (2013a) Alex Graves, Navdeep Jaitly, and Abdel-rahman Mohamed. 2013a. Hybrid speech recognition with deep bidirectional LSTM. In Automatic Speech Recognition and Understanding (ASRU), 2013 IEEE Workshop on. IEEE, 273–278.
 Graves et al. (2013b) Alex Graves, Abdel-rahman Mohamed, and Geoffrey Hinton. 2013b. Speech recognition with deep recurrent neural networks. In Acoustics, Speech and Signal Processing (ICASSP), 2013 IEEE International Conference on. IEEE, 6645–6649.
 Grover and Leskovec (2016) Aditya Grover and Jure Leskovec. 2016. node2vec: Scalable feature learning for networks. In Proceedings of the 22nd ACM SIGKDD international conference on Knowledge discovery and data mining. ACM, 855–864.
 Hidasi et al. (2015) Balázs Hidasi, Alexandros Karatzoglou, Linas Baltrunas, and Domonkos Tikk. 2015. Session-based recommendations with recurrent neural networks. arXiv preprint arXiv:1511.06939 (2015).
 Hochreiter and Schmidhuber (1997a) Sepp Hochreiter and Jürgen Schmidhuber. 1997a. Long short-term memory. Neural Computation 9, 8 (1997), 1735–1780.
 Hochreiter and Schmidhuber (1997b) Sepp Hochreiter and Jürgen Schmidhuber. 1997b. Long short-term memory. Neural Computation 9, 8 (1997), 1735–1780.
 Hu et al. (2013) Xia Hu, Lei Tang, Jiliang Tang, and Huan Liu. 2013. Exploiting social relations for sentiment analysis in microblogging. In Proceedings of the sixth ACM international conference on Web search and data mining. ACM, 537–546.
 Jagannatha and Yu (2016) Abhyuday N Jagannatha and Hong Yu. 2016. Bidirectional RNN for medical event detection in electronic health records. In Proceedings of the conference. Association for Computational Linguistics. North American Chapter. Meeting, Vol. 2016. NIH Public Access, 473.
 Kim et al. (2016) Yoon Kim, Yacine Jernite, David Sontag, and Alexander M Rush. 2016. Character-Aware Neural Language Models. In AAAI. 2741–2749.
 Kipf and Welling (2016) Thomas N Kipf and Max Welling. 2016. Semi-supervised classification with graph convolutional networks. arXiv preprint arXiv:1609.02907 (2016).
 Koutnik et al. (2014) Jan Koutnik, Klaus Greff, Faustino Gomez, and Juergen Schmidhuber. 2014. A clockwork RNN. arXiv preprint arXiv:1402.3511 (2014).
 Krivitsky et al. (2009) Pavel N Krivitsky, Mark S Handcock, Adrian E Raftery, and Peter D Hoff. 2009. Representing degree distributions, clustering, and homophily in social networks with latent cluster random effects models. Social Networks 31, 3 (2009), 204–213.
 Lin et al. (2006) Zhenjiang Lin, Michael R Lyu, and Irwin King. 2006. PageSim: a novel link-based measure of web page similarity. In Proceedings of the 15th international conference on World Wide Web. ACM, 1019–1020.
 Lipton et al. (2015) Zachary C Lipton, John Berkowitz, and Charles Elkan. 2015. A critical review of recurrent neural networks for sequence learning. arXiv preprint arXiv:1506.00019 (2015).
 Luo (2017) Yuan Luo. 2017. Recurrent neural networks for classifying relations in clinical notes. Journal of Biomedical Informatics 72 (2017), 85–95.
 Luong et al. (2015) Minh-Thang Luong, Hieu Pham, and Christopher D Manning. 2015. Effective approaches to attention-based neural machine translation. arXiv preprint arXiv:1508.04025 (2015).
 Ma et al. (2017) Fenglong Ma, Radha Chitta, Jing Zhou, Quanzeng You, Tong Sun, and Jing Gao. 2017. Dipole: Diagnosis prediction in healthcare via attention-based bidirectional recurrent neural networks. In Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 1903–1911.
 McPherson et al. (2001) Miller McPherson, Lynn Smith-Lovin, and James M Cook. 2001. Birds of a feather: Homophily in social networks. Annual Review of Sociology 27, 1 (2001), 415–444.
 Miao et al. (2015) Yajie Miao, Mohammad Gowayyed, and Florian Metze. 2015. EESEN: End-to-end speech recognition using deep RNN models and WFST-based decoding. In Automatic Speech Recognition and Understanding (ASRU), 2015 IEEE Workshop on. IEEE, 167–174.
 Mikolov et al. (2013) Tomas Mikolov, Kai Chen, Greg Corrado, and Jeffrey Dean. 2013. Efficient estimation of word representations in vector space. arXiv preprint arXiv:1301.3781 (2013).
 Mikolov et al. (2010a) Tomáš Mikolov, Martin Karafiát, Lukáš Burget, Jan Černockỳ, and Sanjeev Khudanpur. 2010a. Recurrent neural network based language model. In Eleventh Annual Conference of the International Speech Communication Association.
 Mikolov et al. (2010b) Tomáš Mikolov, Martin Karafiát, Lukáš Burget, Jan Černockỳ, and Sanjeev Khudanpur. 2010b. Recurrent neural network based language model. In Eleventh Annual Conference of the International Speech Communication Association.
 Mikolov et al. (2011) Tomáš Mikolov, Stefan Kombrink, Lukáš Burget, Jan Černockỳ, and Sanjeev Khudanpur. 2011. Extensions of recurrent neural network language model. In Acoustics, Speech and Signal Processing (ICASSP), 2011 IEEE International Conference on. IEEE, 5528–5531.
 Neil et al. (2016) Daniel Neil, Michael Pfeiffer, and Shih-Chii Liu. 2016. Phased LSTM: Accelerating recurrent network training for long or event-based sequences. In Advances in Neural Information Processing Systems. 3882–3890.
 Palangi et al. (2016) Hamid Palangi, Li Deng, Yelong Shen, Jianfeng Gao, Xiaodong He, Jianshu Chen, Xinying Song, and Rabab Ward. 2016. Deep sentence embedding using long short-term memory networks: Analysis and application to information retrieval. IEEE/ACM Transactions on Audio, Speech and Language Processing (TASLP) 24, 4 (2016), 694–707.
 Rumelhart et al. (1986a) David E Rumelhart, Geoffrey E Hinton, and Ronald J Williams. 1986a. Learning representations by back-propagating errors. Nature 323, 6088 (1986), 533.
 Rumelhart et al. (1986b) David E Rumelhart, Geoffrey E Hinton, and Ronald J Williams. 1986b. Learning representations by back-propagating errors. Nature 323, 6088 (1986), 533.
 Schuster and Paliwal (1997) Mike Schuster and Kuldip K Paliwal. 1997. Bidirectional recurrent neural networks. IEEE Transactions on Signal Processing 45, 11 (1997), 2673–2681.
 Soltau et al. (2016) Hagen Soltau, Hank Liao, and Hasim Sak. 2016. Neural speech recognizer: Acoustic-to-word LSTM model for large vocabulary speech recognition. arXiv preprint arXiv:1610.09975 (2016).
 Sutskever et al. (2011) Ilya Sutskever, James Martens, and Geoffrey E Hinton. 2011. Generating text with recurrent neural networks. In Proceedings of the 28th International Conference on Machine Learning (ICML-11). 1017–1024.
 Sutskever et al. (2014) Ilya Sutskever, Oriol Vinyals, and Quoc V Le. 2014. Sequence to sequence learning with neural networks. In Advances in Neural Information Processing Systems. 3104–3112.
 Tang et al. (2013) Jiliang Tang, Xia Hu, and Huan Liu. 2013. Social recommendation: a review. Social Network Analysis and Mining 3, 4 (2013), 1113–1133.
 Tang and Liu (2012) Jiliang Tang and Huan Liu. 2012. Unsupervised feature selection for linked social media data. In Proceedings of the 18th ACM SIGKDD international conference on Knowledge discovery and data mining. ACM, 904–912.
 Tang et al. (2015) Jian Tang, Meng Qu, Mingzhe Wang, Ming Zhang, Jun Yan, and Qiaozhu Mei. 2015. LINE: Large-scale information network embedding. In Proceedings of the 24th International Conference on World Wide Web. International World Wide Web Conferences Steering Committee, 1067–1077.
 Tang et al. (2008) Jie Tang, Jing Zhang, Limin Yao, Juanzi Li, Li Zhang, and Zhong Su. 2008. ArnetMiner: Extraction and Mining of Academic Social Networks. In KDD’08. 990–998.
 Wang et al. (2011) Xiaolong Wang, Furu Wei, Xiaohua Liu, Ming Zhou, and Ming Zhang. 2011. Topic sentiment analysis in twitter: a graph-based hashtag sentiment classification approach. In Proceedings of the 20th ACM international conference on Information and knowledge management. ACM, 1031–1040.
 Wang et al. (2017) Zhiwei Wang, Tyler Derr, Dawei Yin, and Jiliang Tang. 2017. Understanding and Predicting Weight Loss with Mobile Social Networking Data. In Proceedings of the 2017 ACM on Conference on Information and Knowledge Management. ACM, 1269–1278.
 Wu et al. (2016) Caihua Wu, Junwei Wang, Juntao Liu, and Wenyu Liu. 2016. Recurrent neural network based recommendation for time heterogeneous feedback. Knowledge-Based Systems 109 (2016), 90–103.
 Yang et al. (2016) Zichao Yang, Diyi Yang, Chris Dyer, Xiaodong He, Alex Smola, and Eduard Hovy. 2016. Hierarchical attention networks for document classification. In Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 1480–1489.
 Zhou et al. (2018) Meizi Zhou, Zhuoye Ding, Jiliang Tang, and Dawei Yin. 2018. Micro behaviors: A new perspective in e-commerce recommender systems. In Proceedings of the Eleventh ACM International Conference on Web Search and Data Mining. ACM, 727–735.