Linked Recurrent Neural Networks

08/19/2018 · Zhiwei Wang, et al. · Michigan State University

Recurrent Neural Networks (RNNs) have been proven to be effective in modeling sequential data and they have been applied to boost a variety of tasks such as document classification, speech recognition and machine translation. Most existing RNN models have been designed for sequences assumed to be identically and independently distributed (i.i.d.). However, in many real-world applications, sequences are naturally linked. For example, web documents are connected by hyperlinks, and genes interact with each other. On the one hand, linked sequences are inherently not i.i.d., which poses tremendous challenges to existing RNN models. On the other hand, linked sequences offer link information in addition to the sequential information, which enables unprecedented opportunities to build advanced RNN models. In this paper, we study the problem of RNNs for linked sequences. In particular, we introduce a principled approach to capture link information and propose a Linked Recurrent Neural Network (LinkedRNN), which models sequential and link information coherently. We conduct experiments on real-world datasets from multiple domains and the experimental results validate the effectiveness of the proposed framework.


1. Introduction

Recurrent Neural Networks (RNNs) have been proven to be powerful in learning reusable parameters that produce hidden representations of sequences. They have been successfully applied to model sequential data and achieve state-of-the-art performance in numerous domains such as speech recognition (Graves et al., 2013b; Miao et al., 2015; Soltau et al., 2016), natural language processing (Kim et al., 2016; Bahdanau et al., 2014; Mikolov et al., 2010a, 2011), healthcare (Jagannatha and Yu, 2016; Che et al., 2018; Luo, 2017), recommendation (Zhou et al., 2018; Hidasi et al., 2015; Wu et al., 2016) and information retrieval (Palangi et al., 2016).

The majority of existing RNN models have been designed for traditional sequences, which are assumed to be identically and independently distributed (i.i.d.). However, many real-world applications generate linked sequences. For example, web documents, which are sequences of words, are connected via hyperlinks; genes, which are sequences of DNA or RNA, typically interact with each other. Figure 1 illustrates a toy example of linked sequences with four sequences that are connected to one another via links. On the one hand, linked sequences are inherently related: linked web documents are likely to be similar (Glover et al., 2002) and interacting genes tend to share similar functionalities (Bebek, 2012). Hence, linked sequences are not i.i.d., which presents immense challenges to traditional RNNs. On the other hand, linked sequences offer link information in addition to the sequential information. It is evident that link information can be exploited to boost various analytical tasks such as social recommendation (Tang et al., 2013), sentiment analysis (Wang et al., 2011; Hu et al., 2013) and feature selection (Tang and Liu, 2012). Thus, the availability of link information in linked sequences has great potential to enable the development of advanced Recurrent Neural Networks.

Figure 1. An illustration of linked sequences. The four sequences are connected via links.

We have now established that (1) traditional RNNs are insufficient and dedicated efforts are needed for linked sequences; and (2) the availability of link information in linked sequences offers unprecedented opportunities to advance traditional RNNs. In this paper, we study the problem of modeling linked sequences via RNNs. In particular, we aim to address the following challenges: (1) how to capture link information mathematically and (2) how to combine sequential and link information via Recurrent Neural Networks. To address these two challenges, we propose a novel Linked Recurrent Neural Network (LinkedRNN) for linked sequences. Our major contributions are summarized as follows:

  • We introduce a principled way to capture link information for linked sequences mathematically;

  • We propose a novel RNN framework LinkedRNN, which can model sequential and link information coherently for linked sequences; and

  • We validate the effectiveness of the proposed framework on real-world datasets across different domains.

The rest of the paper is organized as follows. Section 2 gives a formal definition of the problem we aim to investigate. In Section 3, we motivate and detail the framework LinkedRNN. The experimental design, datasets and results are described in Section 4. Section 5 briefly reviews related work in the literature. Finally, we conclude and discuss future work in Section 6.

2. Problem Statement

Before we formally define the problem, we first introduce the notations used throughout the paper. We denote scalars by lower-case letters such as $a$ and $b$, vectors by bold lower-case letters such as $\mathbf{a}$ and $\mathbf{b}$, and matrices by bold upper-case letters such as $\mathbf{A}$ and $\mathbf{B}$. For a matrix $\mathbf{A}$, we denote the entry at its $i$-th row and $j$-th column as $\mathbf{A}_{ij}$, its $i$-th row as $\mathbf{A}_{i*}$ and its $j$-th column as $\mathbf{A}_{*j}$. In addition, $\{s^1, s^2, \ldots\}$ represents a set where the order of the elements does not matter and superscripts are used to index elements; for example, $\{s^1, s^2\}$ is equivalent to $\{s^2, s^1\}$. In contrast, $(x_1, x_2, \ldots)$ denotes a sequence of events where the order matters, and subscripts are used to indicate the order of the events in the sequence.

Let $\mathcal{S} = \{s^1, s^2, \ldots, s^N\}$ be the set of $N$ sequences. For linked sequences, two types of information are available. One is the sequential information of each sequence: we denote the sequential information of $s^i$ as $(x^i_1, x^i_2, \ldots, x^i_{n_i})$, where $n_i$ is the length of $s^i$. The other is the link information. We use an adjacency matrix $\mathbf{A} \in \{0,1\}^{N \times N}$ to denote the link information of the linked sequences, where $\mathbf{A}_{ij} = 1$ if there is a link between the sequences $s^i$ and $s^j$, and $\mathbf{A}_{ij} = 0$ otherwise. In this work, we follow the transductive learning setting. In detail, we assume that a part of the sequences, from $s^1$ to $s^K$, are labeled, where $K < N$. We denote the labeled sequences as $\mathcal{S}_L = \{s^1, \ldots, s^K\}$. For a labeled sequence $s^i$, we use $y^i$ to denote its label, where $y^i$ is a continuous number for the regression problem and a discrete symbol for the classification problem. Note that in this work, we focus on unweighted and undirected links among sequences; it is straightforward to extend the proposed framework to weighted and directed links, which we leave as future work. Although the proposed framework is designed for transductive learning, it can also be used for inductive learning, as discussed when we introduce the framework in the following section.

With the above notations and definitions, we formally define the problem we target in this work as follows:

Given a set of sequences $\mathcal{S}$ with sequential information, link information $\mathbf{A}$, and a subset of labeled sequences $\mathcal{S}_L$ with labels $\{y^1, \ldots, y^K\}$, we aim to build an RNN model by leveraging the sequential information, $\mathbf{A}$ and the labels, which learns representations of the sequences in order to predict the labels of the unlabeled sequences in $\mathcal{S} \setminus \mathcal{S}_L$.
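To make the setting concrete, the inputs can be represented as follows (a minimal sketch; all names and the toy topology are illustrative, not taken from the paper):

```python
import numpy as np

# N linked sequences; each sequence is an array of event vectors of dimension d.
# Lengths may differ across sequences (here: toy random data).
N, d = 4, 8
rng = np.random.default_rng(0)
sequences = [rng.standard_normal((length, d)) for length in (5, 3, 6, 4)]

# Link information: symmetric, unweighted adjacency matrix A (A[i, j] = 1 iff linked).
A = np.zeros((N, N))
A[0, 1] = A[1, 0] = 1
A[0, 2] = A[2, 0] = 1
A[2, 3] = A[3, 2] = 1

# Transductive setting: the first K sequences are labeled.
K = 2
labels = np.array([0, 1])          # class labels for classification, floats for regression
unlabeled = list(range(K, N))      # labels of these sequences are to be predicted
```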

3. The proposed framework

Figure 2. An illustration of the proposed framework LinkedRNN on the toy example shown in Figure 1. It consists of two major layers: the RNN layer captures sequential information and the link layer captures link information.

In addition to sequential information, link information is available for linked sequences, as shown in Figure 1. As mentioned above, the major challenges in modeling linked sequences are how to capture link information and how to combine sequential and link information coherently. To tackle these two challenges, we propose a novel Recurrent Neural Network, LinkedRNN. An illustration of the proposed framework on the toy example of Figure 1 is shown in Figure 2. It mainly consists of two layers. The RNN layer captures the sequential information. The output of the RNN layer is the input of the link layer, where link information is captured. Next, we first detail each layer and then present the overall framework of LinkedRNN.

3.1. Capturing sequential information

Given a sequence $s = (x_1, x_2, \ldots, x_n)$, the RNN layer aims to learn a representation vector that captures its complex sequential patterns via Recurrent Neural Networks. In the deep learning community, Recurrent Neural Networks (RNNs) (Rumelhart et al., 1986a; Mikolov et al., 2010b) have been very successful at capturing sequential patterns in many fields (Mikolov et al., 2010b; Sutskever et al., 2011). Specifically, an RNN consists of recurrent units that take the previous state $\mathbf{h}_{t-1}$ and the current event $\mathbf{x}_t$ as input and output a current state $\mathbf{h}_t$ containing the sequential information seen so far:

$\mathbf{h}_t = f(\mathbf{W}\mathbf{x}_t + \mathbf{U}\mathbf{h}_{t-1}) \qquad (1)$

where $\mathbf{U}$ and $\mathbf{W}$ are the learnable parameters and $f(\cdot)$ is an activation function which enables non-linearity. However, one major limitation of the vanilla RNN in Equation (1) is that it suffers from gradient vanishing or exploding issues, which can fail the learning procedure because error signals cannot be properly propagated during back-propagation (Bengio et al., 1994).

More advanced recurrent units, such as the long short-term memory (LSTM) unit (Hochreiter and Schmidhuber, 1997a) and the Gated Recurrent Unit (GRU) (Cho et al., 2014), have been proposed to solve the gradient vanishing problem. Different from the vanilla RNN, these variants employ a gating mechanism to decide when and how much the state should be updated with the current information. In this work, due to its simplicity and effectiveness, we choose the GRU as our recurrent unit. Specifically, in the GRU, the current state $\mathbf{h}_t$ is a linear interpolation between the previous state $\mathbf{h}_{t-1}$ and a candidate state $\tilde{\mathbf{h}}_t$:

$\mathbf{h}_t = (1 - \mathbf{z}_t) \odot \mathbf{h}_{t-1} + \mathbf{z}_t \odot \tilde{\mathbf{h}}_t \qquad (2)$

where $\odot$ is the element-wise multiplication and $\mathbf{z}_t$ is the update gate, which is introduced to control how much the current state should be updated. It is obtained through the following equation:

$\mathbf{z}_t = \sigma(\mathbf{W}_z \mathbf{x}_t + \mathbf{U}_z \mathbf{h}_{t-1}) \qquad (3)$

where $\mathbf{W}_z$ and $\mathbf{U}_z$ are the parameters and $\sigma(\cdot)$ is the sigmoid function, that is, $\sigma(x) = 1/(1 + e^{-x})$. In addition, the newly introduced candidate state $\tilde{\mathbf{h}}_t$ is computed by Equation (4):

$\tilde{\mathbf{h}}_t = \tanh\big(\mathbf{W}_h \mathbf{x}_t + \mathbf{U}_h (\mathbf{r}_t \odot \mathbf{h}_{t-1})\big) \qquad (4)$

where $\tanh(\cdot)$ is the hyperbolic tangent function and $\mathbf{W}_h$ and $\mathbf{U}_h$ are model parameters. $\mathbf{r}_t$ is the reset gate, which determines the contribution of the previous state to the candidate state and is obtained as follows:

$\mathbf{r}_t = \sigma(\mathbf{W}_r \mathbf{x}_t + \mathbf{U}_r \mathbf{h}_{t-1}) \qquad (5)$
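The GRU update above can be summarized in a short sketch (a didactic NumPy implementation of Eqs. (2)-(5), not the authors' code; biases and the initialization scheme are our assumptions):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class GRUCell:
    """Minimal GRU cell implementing Eqs. (2)-(5)."""
    def __init__(self, input_dim, hidden_dim, seed=0):
        rng = np.random.default_rng(seed)
        scale = 0.1
        # One (W, U, b) triple per gate / candidate state.
        self.Wz = rng.standard_normal((hidden_dim, input_dim)) * scale
        self.Uz = rng.standard_normal((hidden_dim, hidden_dim)) * scale
        self.bz = np.zeros(hidden_dim)
        self.Wr = rng.standard_normal((hidden_dim, input_dim)) * scale
        self.Ur = rng.standard_normal((hidden_dim, hidden_dim)) * scale
        self.br = np.zeros(hidden_dim)
        self.Wh = rng.standard_normal((hidden_dim, input_dim)) * scale
        self.Uh = rng.standard_normal((hidden_dim, hidden_dim)) * scale
        self.bh = np.zeros(hidden_dim)

    def step(self, x_t, h_prev):
        z = sigmoid(self.Wz @ x_t + self.Uz @ h_prev + self.bz)              # update gate, Eq. (3)
        r = sigmoid(self.Wr @ x_t + self.Ur @ h_prev + self.br)              # reset gate, Eq. (5)
        h_cand = np.tanh(self.Wh @ x_t + self.Uh @ (r * h_prev) + self.bh)   # candidate state, Eq. (4)
        return (1.0 - z) * h_prev + z * h_cand                               # interpolation, Eq. (2)

    def forward(self, x_seq):
        """Return all hidden states h_1, ..., h_n for a sequence of event vectors."""
        h = np.zeros(self.bz.shape[0])
        states = []
        for x_t in x_seq:
            h = self.step(x_t, h)
            states.append(h)
        return np.stack(states)
```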

The output of the RNN layer will be the input of the link layer. For a sequence $s^i$, the RNN layer learns a sequence of latent representations $(\mathbf{h}^i_1, \mathbf{h}^i_2, \ldots, \mathbf{h}^i_{n_i})$. There are various ways to obtain the final output $\mathbf{h}^i$ of the RNN layer from these latent representations. In this work, we investigate two popular ways:

  • Since the last latent representation captures information from all previous states, we can simply use it as the representation of the whole sequence, i.e., $\mathbf{h}^i = \mathbf{h}^i_{n_i}$. We refer to this as the last-state aggregation.

  • The attention mechanism can help the model automatically focus on relevant parts of the sequence to better capture its long-range structure, and it has shown effectiveness in many tasks (Bahdanau et al., 2016; Luong et al., 2015; Chorowski et al., 2015). Thus, we define our second way of aggregation based on the attention mechanism (see the sketch after this list) as follows:

    $\mathbf{h}^i = \sum_{t=1}^{n_i} \alpha_t \mathbf{h}^i_t \qquad (6)$

    where $\alpha_t$ is the attention score, which is obtained as

    $\alpha_t = \frac{\exp(e_t)}{\sum_{k=1}^{n_i} \exp(e_k)} \qquad (7)$

    where $e_t$ is computed by a feed-forward layer:

    $e_t = \mathrm{FF}(\mathbf{h}^i_t) \qquad (8)$

    Note that different attention mechanisms can be used; we leave their investigation as future work. We refer to the aggregation described above as the attention-based aggregation.
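A minimal sketch of the attention-based aggregation of Eqs. (6)-(8); the single-layer form of the scoring function FF is our assumption:

```python
import numpy as np

def attention_aggregate(H, Wa, ba, v):
    """Aggregate hidden states H (n x d) into a single vector via Eqs. (6)-(8).

    Wa (d x d), ba (d,), v (d,) parameterize the assumed one-layer scoring function.
    """
    e = np.tanh(H @ Wa.T + ba) @ v           # Eq. (8): feed-forward scoring of each state
    e = e - e.max()                          # numerical stability for the softmax
    alpha = np.exp(e) / np.exp(e).sum()      # Eq. (7): attention weights
    return alpha @ H                         # Eq. (6): weighted sum of hidden states

# The last-state aggregation is simply H[-1].
```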

For generality, we use RNN to denote the GRU in the rest of the paper.

3.2. Capturing link information

The RNN layer is able to capture the sequential information. However, in linked sequences, sequences are naturally related. Homophily theory suggests that linked entities tend to have similar attributes (McPherson et al., 2001), which has been validated in many real-world networks such as social networks (Krivitsky et al., 2009), web networks (Lin et al., 2006), and biological networks (Bebek, 2012). As indicated by homophily, a node is likely to share similar attributes and properties with the nodes it is connected to; in other words, a node is similar to its neighbors. With this intuition, we propose the link layer to capture link information in linked sequences.

As shown in Figure 2, to capture link information, for each node the link layer not only includes the node's own sequential representation but also aggregates information from its neighbors. The link layer can contain multiple hidden layers; in other words, for one node, we can aggregate information from itself and its neighbors multiple times. Let $\mathbf{h}_i^{(k)}$ be the hidden representation of the sequence $s^i$ after $k$ aggregations. Note that when $k = 0$, $\mathbf{h}_i^{(0)}$ is the input of the link layer, i.e., the output $\mathbf{h}^i$ of the RNN layer. Then $\mathbf{h}_i^{(k)}$ is updated as:

$\mathbf{h}_i^{(k+1)} = \sigma\Big( \frac{1}{|\mathcal{N}_i| + 1} \big( \mathbf{h}_i^{(k)} + \sum_{j \in \mathcal{N}_i} \mathbf{h}_j^{(k)} \big) \Big) \qquad (9)$

where $\sigma(\cdot)$ is an element-wise activation function, $\mathcal{N}_i$ is the set of neighbors linked with $s^i$, i.e., $\mathcal{N}_i = \{j \mid \mathbf{A}_{ij} = 1\}$, and $|\mathcal{N}_i|$ is the number of neighbors of $s^i$. We define $\mathbf{H}^{(k)}$ as the matrix form of the representations of all sequences at the $k$-th layer. We modify the original adjacency matrix to $\tilde{\mathbf{A}}$ by allowing $\tilde{\mathbf{A}}_{ii} = 1$. The aggregation in Eq. (9) can then be written in matrix form as:

$\mathbf{H}^{(k+1)} = \sigma\big( \mathbf{D}^{-1} \tilde{\mathbf{A}} \mathbf{H}^{(k)} \big) \qquad (10)$

where $\mathbf{H}^{(k+1)}$ is the embedding matrix after the $(k+1)$-th aggregation step, and $\mathbf{D}$ is the diagonal matrix whose entries are defined as:

$\mathbf{D}_{ii} = \sum_{j} \tilde{\mathbf{A}}_{ij} \qquad (11)$
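One link-layer aggregation step (Eqs. (9)-(11)) can be sketched in matrix form as follows; the choice of tanh as the activation is illustrative:

```python
import numpy as np

def link_layer_step(H, A, activation=np.tanh):
    """One aggregation step: H^(k+1) = activation(D^{-1} Ã H^(k)).

    H: (N x d) representations of all sequences at layer k.
    A: (N x N) unweighted, undirected adjacency matrix (no self-loops).
    """
    A_tilde = A + np.eye(A.shape[0])      # allow Ã_ii = 1 so a node keeps its own signal
    D = A_tilde.sum(axis=1)               # Eq. (11): D_ii = sum_j Ã_ij
    H_next = (A_tilde @ H) / D[:, None]   # row-normalized mean over the node and its neighbors
    return activation(H_next)
```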

3.3. Linked Recurrent Neural Networks

With the model components that capture sequential and link information, the procedure of the proposed framework LinkedRNN is summarized as follows:

$(\mathbf{h}^i_1, \ldots, \mathbf{h}^i_{n_i}) = \mathrm{RNN}(x^i_1, \ldots, x^i_{n_i}), \quad \mathbf{h}_i^{(0)} = \mathrm{agg}_{rnn}(\mathbf{h}^i_1, \ldots, \mathbf{h}^i_{n_i}), \quad \mathbf{H}^{(k)} = \sigma\big(\mathbf{D}^{-1}\tilde{\mathbf{A}}\mathbf{H}^{(k-1)}\big), \; k = 1, \ldots, K, \quad \mathbf{z}_i = \mathrm{agg}_{link}(\mathbf{h}_i^{(0)}, \ldots, \mathbf{h}_i^{(K)}) \qquad (12)$

where the input of the RNN layer is the sequential information of $s^i$, and the RNN layer produces the sequence of latent representations $(\mathbf{h}^i_1, \ldots, \mathbf{h}^i_{n_i})$. These latent representations are aggregated to obtain the output of the RNN layer, which serves as the input of the link layer. After $K$ layers, the link layer produces a sequence of latent representations $(\mathbf{h}_i^{(0)}, \ldots, \mathbf{h}_i^{(K)})$, which are aggregated into the final representation.
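Putting the pieces together, the forward pass of Eq. (12) can be sketched as follows, reusing the helpers sketched above; the number of link layers and the aggregation choices are hyperparameters, and the values here are illustrative:

```python
import numpy as np

def linked_rnn_forward(sequences, A, gru, att_params, num_link_layers=2):
    """Compute representations for all sequences (a sketch of Eq. (12))."""
    Wa, ba, v = att_params
    # RNN layer + attention aggregation: one vector per sequence.
    H0 = np.stack([attention_aggregate(gru.forward(seq), Wa, ba, v) for seq in sequences])
    # Link layers: repeatedly aggregate over each node and its neighbors.
    layers = [H0]
    for _ in range(num_link_layers):
        layers.append(link_layer_step(layers[-1], A))
    # Final representation: here, the output of the last link layer (other options follow).
    return layers[-1]
```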

The final representation $\mathbf{z}_i$ of the sequence $s^i$ is obtained by aggregating the sequence of representations produced by the link layer. In this work, we investigate several ways to obtain the final representation:

  • Since $\mathbf{h}_i^{(K)}$ is the output of the last layer, we can define the final representation simply as $\mathbf{z}_i = \mathbf{h}_i^{(K)}$; we refer to this as the last-layer aggregation.

  • Although the representation $\mathbf{h}_i^{(K)}$ incorporates all the neighbor information, the signal of the sequence itself may be overwhelmed during the aggregation process. This is especially likely to happen when there are a large number of neighbors. Thus, to make the final representation focus more on the sequence itself, we propose to use a feed-forward neural network to perform the combination, taking the concatenation of the representations from the last two layers as its input. We refer to this as the last-two-layer aggregation.

  • Each layer's representation can contain unique information that is not carried over to later layers. Thus, similarly, we use a feed-forward neural network to combine the concatenation of the representations from all layers $(\mathbf{h}_i^{(0)}, \ldots, \mathbf{h}_i^{(K)})$; we refer to this as the all-layer aggregation (a sketch of the feed-forward combination follows this list).
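The feed-forward combination used by the last two options can be sketched as follows (the single-layer tanh form is our assumption):

```python
import numpy as np

def combine_layers(layers, W, b):
    """Concatenate link-layer outputs (each N x d) and mix them with one feed-forward layer.

    For the last-two-layer option pass layers[-2:], for the all-layer option pass the full list.
    W: (d x (len(layers) * d)), b: (d,).
    """
    X = np.concatenate(layers, axis=1)   # N x (len(layers) * d)
    return np.tanh(X @ W.T + b)          # N x d final representations
```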

To learn the parameters of the proposed framework LinkedRNN, we need to define a loss function, which depends on the specific task. In this work, we investigate LinkedRNN on two tasks: classification and regression.

Classification. The final output of a sequence $s^i$ is $\mathbf{z}_i$. We can consider $\mathbf{z}_i$ as the feature vector and build the classifier on top of it. In particular, the predicted class distribution is obtained through a softmax function as:

$\hat{y}^i = \mathrm{softmax}(\mathbf{W}_c \mathbf{z}_i + \mathbf{b}_c) \qquad (13)$

where $\mathbf{W}_c$ and $\mathbf{b}_c$ are the coefficient and bias parameters, respectively, and $\hat{y}^i$ is the predicted label distribution of the sequence $s^i$. The corresponding loss function used in this paper is the cross-entropy loss.

Regression. For the regression problem, we choose linear regression in this work. In other words, the regression label of the sequence $s^i$ is predicted as:

$\hat{y}^i = \mathbf{w}^\top \mathbf{z}_i + b \qquad (14)$

where $\mathbf{w}$ and $b$ are the regression coefficients and the bias parameter, respectively. The squared loss is adopted as the loss function:

$\mathcal{L} = \sum_{i=1}^{K} (\hat{y}^i - y^i)^2 \qquad (15)$

Note that there are other ways to define loss functions for classification and regression. We leave the investigation of other forms of loss functions as future work.

Prediction. For an unlabeled sequence $s^j$ in the classification problem, its label is predicted as the class with the highest probability in $\hat{y}^j$. For an unlabeled sequence $s^j$ in the regression problem, its label is predicted directly as $\hat{y}^j$.
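A sketch of the two prediction heads and their losses (Eqs. (13)-(15)); the parameter shapes and numerical-stability details are our choices:

```python
import numpy as np

def softmax(logits):
    e = np.exp(logits - logits.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def classify(Z, Wc, bc):
    """Eq. (13): class probabilities for final representations Z (N x d)."""
    return softmax(Z @ Wc.T + bc)              # N x C

def cross_entropy(probs, y):
    """Cross-entropy loss over labeled sequences with integer labels y."""
    return -np.mean(np.log(probs[np.arange(len(y)), y] + 1e-12))

def regress(Z, w, b):
    """Eq. (14): linear regression predictions."""
    return Z @ w + b                           # N,

def squared_loss(pred, y):
    """Eq. (15): squared error over labeled sequences."""
    return np.sum((pred - y) ** 2)

# Prediction for unlabeled sequences: argmax of the class probabilities
# (classification), or the output of regress(...) itself (regression).
```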

Although the framework is designed for transductive learning, it can be naturally used for inductive learning. For a sequence $s$ that is unseen in the given linked sequences $\mathcal{S}$, given its sequential information and its neighbors in $\mathcal{S}$, we can obtain its representation $\mathbf{z}$ via Eq. (12). Then, based on $\mathbf{z}$, its label can be predicted following the prediction step described above.

4. Experiment

In this section, we present experiments to verify the effectiveness of the proposed framework. Specifically, we validate the proposed framework on datasets from two different domains. We first describe the datasets used in the experiments, then compare the performance of the proposed framework with representative baselines, and finally analyze the key components of LinkedRNN.

Description              | DBLP   | BOOHEE
# of sequences           | 47,491 | 18,229
Network density (‰)      | 0.13   | 0.012
Avg length of sequences  | 6.6    | 23.5
Max length of sequences  | 20     | 29
Table 1. Statistics of the datasets.

4.1. Datasets

In this study, we collect two types of linked sequences. One is from DBLP, where the data contains textual sequences of papers. The other is from the weight-loss website BOOHEE, where the data includes weight sequences of users. Some statistics of the datasets are shown in Table 1. Next, we introduce each dataset in more detail.

DBLP dataset. We constructed a paper citation network from the publicly available DBLP dataset (https://aminer.org/citation) (Tang et al., 2008). This dataset contains information for millions of papers from a variety of research fields. Specifically, each paper contains the following relevant information: paper id, publication venue, the ids of its references, and abstract. Following a similar practice to (Tang et al., 2015), we only select papers from conferences in the 10 largest computer science domains, including VCG, ACL, IP, TC, WC, CCS, CVPR, PDS, NIPS, KDD, WWW, ICSE, Bioinformatics and TCS. We construct a sequence for each paper from its abstract and regard the citation relationships as the link information between sequences. Specifically, we first split the abstract into sentences and tokenize each sentence using the Python NLTK package. Then, we use Word2Vec (Mikolov et al., 2013) to embed each word into Euclidean space, and for each sentence we treat the mean of its word vectors as the sentence embedding. Thus, the abstract of each paper is represented by a sequence of sentence embeddings. We conduct a classification task (paper classification) on this dataset; the label of each sequence is the corresponding publication venue.
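A sketch of this preprocessing pipeline, assuming NLTK for sentence and word tokenization and gensim (version 4 or later, where the embedding size argument is vector_size) for Word2Vec; the hyperparameter values are illustrative:

```python
import numpy as np
from nltk.tokenize import sent_tokenize, word_tokenize   # requires the NLTK 'punkt' data
from gensim.models import Word2Vec

def abstracts_to_sequences(abstracts, dim=100):
    """Turn each abstract into a sequence of sentence embeddings (mean of word vectors)."""
    # 1. Split each abstract into sentences and tokenize each sentence.
    tokenized = [[word_tokenize(s.lower()) for s in sent_tokenize(a)] for a in abstracts]

    # 2. Train Word2Vec on all sentences of the corpus.
    all_sentences = [sent for doc in tokenized for sent in doc]
    w2v = Word2Vec(all_sentences, vector_size=dim, window=5, min_count=1, workers=4)

    # 3. A sentence embedding is the mean of its word vectors; an abstract becomes
    #    the sequence of its sentence embeddings.
    sequences = []
    for doc in tokenized:
        sent_vecs = [np.mean([w2v.wv[w] for w in sent], axis=0) for sent in doc if sent]
        sequences.append(np.stack(sent_vecs))
    return sequences
```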

BOOHEE dataset. This dataset is collected from one of the most popular weight management mobile applications, BOOHEE (https://www.boohee.com). It contains millions of users who self-track their weights and interact with each other in the internal social network provided by the application. Specifically, they can follow friends, comment on friends' posts, and mention (@) friends in comments or posts. The weights recorded by users form sequences containing the weight dynamics, and the social networking behaviors result in three networks corresponding to the following, commenting, and mentioning interactions, respectively. Previous work (Wang et al., 2017) has shown a social correlation in users' weight loss. Thus, we use these social networks as the link information for the weight sequence data. We preprocess the dataset to filter out sequences from suspicious spam users. Moreover, we change the time granularity of the weight sequences from days to weeks to remove daily fluctuation noise. Specifically, we compute the mean of all weights recorded within one week and use it as the weight for that week. For the networks, we combine the three networks into one by adding them together and filtering out weak ties. On this dataset, we conduct a regression task of weight prediction: we choose the most recent weight in a weight sequence as the value to predict (i.e., the ground truth of the regression problem). Note that for each user, we remove all social interactions formed after the most recent weight, to avoid using future link information for weight prediction.
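A sketch of the weekly aggregation and the network combination, assuming pandas for the time resampling; the tie-strength threshold used to filter weak ties is illustrative:

```python
import numpy as np
import pandas as pd

def weekly_weight_sequence(records):
    """records: DataFrame with columns ['date', 'weight'] for one user.
    Returns the sequence of weekly mean weights."""
    s = (records.set_index(pd.to_datetime(records['date']))['weight']
                .resample('W').mean().dropna())
    return s.to_numpy()

def combine_networks(A_follow, A_comment, A_mention, min_strength=2):
    """Add the three interaction networks and keep only sufficiently strong ties."""
    A_sum = A_follow + A_comment + A_mention
    A = (A_sum >= min_strength).astype(float)   # filter weak ties (threshold is illustrative)
    np.fill_diagonal(A, 0)                      # no self-links in the original adjacency
    return A
```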

Measurement | Method       | 10%    | 30%    | 50%    | 70%
Micro-F1    | node2vec     | 0.6641 | 0.6550 | 0.6688 | 0.6691
            | GCN          | 0.7005 | 0.7093 | 0.7110 | 0.7180
            | RNN          | 0.7686 | 0.7980 | 0.7978 | 0.8025
            | RNN-node2vec | 0.7940 | 0.8031 | 0.7933 | 0.8114
            | RNN-GCN      | 0.7912 | 0.8230 | 0.8255 | 0.8284
            | LinkedRNN    | 0.8146 | 0.8399 | 0.8463 | 0.8531
Macro-F1    | node2vec     | 0.6514 | 0.6523 | 0.6513 | 0.6565
            | GCN          | 0.6874 | 0.6992 | 0.7004 | 0.7095
            | RNN          | 0.7452 | 0.7751 | 0.7754 | 0.7824
            | RNN-node2vec | 0.7734 | 0.7797 | 0.7702 | 0.7912
            | RNN-GCN      | 0.7642 | 0.8014 | 0.8069 | 0.8104
            | LinkedRNN    | 0.7970 | 0.8249 | 0.8331 | 0.8365
Table 2. Performance comparison on the DBLP dataset across training ratios.
Method       | 10%    | 30%    | 50%    | 70%
node2vec     | 8.8702 | 8.8517 | 7.4744 | 7.0390
GCN          | 8.9347 | 8.6830 | 6.7949 | 6.7278
RNN          | 8.6600 | 8.6048 | 7.0466 | 6.8033
RNN-node2vec | 8.4653 | 8.5944 | 7.0173 | 6.7796
RNN-GCN      | 8.6286 | 8.5662 | 6.9967 | 6.7945
LinkedRNN    | 7.1822 | 6.3882 | 6.8416 | 6.3517
Table 3. Performance comparison (MSE) on the BOOHEE dataset across training ratios.

4.2. Representative baselines

To validate the effectiveness of the proposed framework, we construct three groups of representative baselines. The first group includes state-of-the-art network embedding methods, i.e., node2vec (Grover and Leskovec, 2016) and GCN (Kipf and Welling, 2016), which only capture the link information. The second group is the GRU-based RNN model (Graves et al., 2013b), which is the basic model we use in our framework to capture sequential information. Baselines in the third group combine models from the first and second groups and capture both sequential and link information. Next, we present more details about these baselines.

  • Node2vec (Grover and Leskovec, 2016). Node2vec is a state-of-the-art network embedding method. It learns representations of sequences from a random-walk perspective, capturing only the link information.

  • GCN (Kipf and Welling, 2016). GCN is a graph convolutional network algorithm. It is trained with both link and label information; hence, it differs from node2vec, which is learned with only link information and is entirely independent of the task.

  • RNN (Graves et al., 2013b). RNNs have been widely used for modeling sequential data and have achieved great success in a variety of domains. However, they ignore the correlations between sequences and focus only on sequential information. We construct this baseline to show the importance of the correlation information. To make the comparison fair, we employ the same recurrent unit (GRU) in both the proposed framework and this baseline.

  • RNN-node2vec. Node2vec learns representations from the link information, and the RNN does so from the sequential information. Thus, to obtain representations of sequences that contain both link and sequential information, we concatenate the two sets of embeddings obtained from node2vec and the RNN and combine them via a feed-forward neural network.

  • RNN-GCN. RNN-GCN applies the same combination strategy as RNN-node2vec to combine the RNN and GCN.

There are several notes about the baselines. First, node2vec does not use label information and is unsupervised; RNN and RNN-node2vec utilize label information and are supervised; and GCN and RNN-GCN use both label information and unlabeled data and are semi-supervised. Second, some sequences may not have link information, and baselines that only capture link information cannot learn representations for these sequences; hence, when representations from link information are unavailable, we use the representations from the sequential information via the RNN instead. Third, we do not choose LSTM and its variants as baselines since our current model is based on the GRU; LSTM and its variants could likewise serve as the base model.

4.3. Experimental settings

Data split: For both datasets, we randomly select 30% of the data for testing. We then fix the test set and choose part of the remaining data for training and part for validation, which is used to select parameters for the baselines and the proposed framework. In this work, we vary the training ratio over {10%, 30%, 50%, 70%}.

Parameter selection: In our experiments, we set the dimension of the representation vectors of sequences to 100. For node2vec, we use the validation data to select the best values for the return parameter $p$ and the in-out parameter $q$ from the ranges suggested by the authors (Grover and Leskovec, 2016), and use the default values for the remaining parameters. In addition, the learning rate for each method is selected via the validation set.

Evaluation metrics: Since we perform classification on the DBLP data, we use Micro-F1 and Macro-F1 scores as the metrics for DBLP; they are widely used for classification problems (Yang et al., 2016; Grover and Leskovec, 2016), and higher values mean better performance. We perform the regression problem of weight prediction on the BOOHEE data; therefore, the performance on BOOHEE is evaluated by the mean squared error (MSE), where a lower value indicates better prediction performance.
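These metrics can be computed with scikit-learn, for example (the toy arrays stand in for the actual test-set predictions):

```python
from sklearn.metrics import f1_score, mean_squared_error

# Classification (DBLP): micro- and macro-averaged F1 over the test sequences.
y_true_cls, y_pred_cls = [0, 1, 2, 1], [0, 2, 2, 1]        # toy values
micro_f1 = f1_score(y_true_cls, y_pred_cls, average='micro')
macro_f1 = f1_score(y_true_cls, y_pred_cls, average='macro')

# Regression (BOOHEE): mean squared error of the predicted weights.
y_true_reg, y_pred_reg = [60.2, 71.5], [61.0, 70.3]        # toy values
mse = mean_squared_error(y_true_reg, y_pred_reg)
```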

4.4. Experimental Results

We first present the results on the DBLP data, shown in Table 2. For the proposed framework, the number of layers in the link layer is fixed in this comparison (its impact is analyzed in Section 4.6), and the choices of the aggregation functions are discussed in the following subsection. From the table, we make the following observations:

  • As we can see in Table 2, in most cases, the performance tends to improve as the number of training samples increases.

  • Random guessing obtains about 0.1 for both Micro-F1 and Macro-F1. The network embedding methods perform much better than random guessing, which clearly shows that the link information is indeed helpful for the prediction.

  • GCN achieves much better performance than node2vec. As mentioned before, GCN uses label information, so the learned representations are optimized for the given task, whereas node2vec learns representations independently of the given task, so its representations may not be optimal.

  • The RNN approach achieves higher performance than GCN, even though both use the label information. This observation suggests that the content and sequential information is very helpful.

  • Most of the time, RNN-node2vec and RNN-GCN outperform the individual models. This indicates that both sequential and link information are important and that they contain complementary information.

  • The proposed framework LinkedRNN consistently outperforms the baselines, which strongly demonstrates its effectiveness. In particular, compared to RNN-node2vec and RNN-GCN, the proposed framework jointly captures the sequential and link information coherently, which leads to significant performance gains.

We present the performance on BOOHEE in Table 3. Overall, we make observations similar to those on DBLP: (1) the performance improves as the number of training samples increases; (2) the combined models outperform the individual ones most of the time; and (3) the proposed framework LinkedRNN obtains the best performance.

From these comparisons, we conclude that both the sequential and the link information in linked sequences are important and that they contain complementary information. Meanwhile, the consistently strong performance of LinkedRNN on datasets from different domains demonstrates its effectiveness in capturing the sequential and link information present in linked sequences.

4.5. Component Analysis

In the proposed framework LinkedRNN, we have investigated several ways to define the two aggregation functions (one for the RNN layer and one for the link layer). In this subsection, we study their impact on the performance of LinkedRNN by defining the following variants, which enumerate the combinations of the two RNN-layer aggregations (last-state and attention-based) with the three link-layer aggregations (last-layer, last-two-layer and all-layer):

  • LinkedRNN11, LinkedRNN12 and LinkedRNN13: the variants that use the last-state aggregation in the RNN layer combined with the last-layer, last-two-layer and all-layer aggregations in the link layer, respectively;

  • LinkedRNN21, LinkedRNN22 and LinkedRNN23: the variants that use the attention-based aggregation in the RNN layer combined with the last-layer, last-two-layer and all-layer aggregations in the link layer, respectively.

The results are shown in Figure 3. Note that we only show results on DBLP under one training ratio, since we observe similar trends in the other settings. We make the following observations:

  • Generally, the variants of LinkedRNN that use the attention-based aggregation in the RNN layer obtain better performance than those that use the last-state aggregation. This demonstrates that aggregating the sequence of latent representations with the help of the attention mechanism can boost the performance.

  • Aggregating representations from more layers of the link layer typically results in better performance.

Figure 3. The impact of aggregation functions on the performance of the proposed framework.
Figure 4. The performance variation with the number of layers in the link layer.

4.6. Parameter Analysis

LinkedRNN uses the link layer to capture link information, and the link layer can contain multiple layers. In this subsection, we study the impact of the number of layers on the performance of LinkedRNN. The results are shown in Figure 4. As in the component analysis, we only report results for one setting on DBLP, since we observe similar trends elsewhere. In general, the performance first increases dramatically and then slowly decreases: one layer is not sufficient to capture the link information, while more layers may result in overfitting.

5. Related Work

In this section, we briefly review RNN-based deep methods that have been proposed to model sequential data effectively (Rumelhart et al., 1986b). Although the RNN was designed to model arbitrarily long sequences, tremendous challenges prevent it from effectively capturing long-term dependencies. For example, the gradient vanishing and exploding issues make it very difficult to back-propagate error signals, and the dependencies in sequences are complex. In addition, it is time-consuming to train these models, as the training procedure is hard to parallelize. Thus, many researchers have attempted to develop advanced architectures to overcome these challenges. One of the most successful attempts is to add a gating mechanism to the recurrent unit. The two representative works are the long short-term memory (LSTM) (Hochreiter and Schmidhuber, 1997b) and the gated recurrent unit (GRU) (Cho et al., 2014), where sophisticated activation functions are introduced to capture long-term dependencies in sequences. For example, in the LSTM, a recurrent unit maintains two states and three gates, which decide how much new memory should be added, how much existing memory should be forgotten, and how much of the memory content should be exposed, respectively. RNNs equipped with such gating mechanisms can effectively mitigate the gradient exploding and vanishing issues and have demonstrated extraordinary performance in a variety of tasks, such as machine translation (Sutskever et al., 2014), speech recognition (Graves et al., 2013a), and medical event detection (Jagannatha and Yu, 2016). Moreover, several works have introduced additional gates into the LSTM unit to handle irregularly sampled data (Neil et al., 2016; Baytas et al., 2017). These time-aware models can largely improve the training efficiency and effectiveness of RNNs.

Besides the gating mechanism, other directions for extending RNNs have also been heavily explored. Schuster et al. (Schuster and Paliwal, 1997) described a new architecture called bidirectional recurrent neural networks (BRNNs), in which two hidden layers process the input from opposite directions. Although BRNNs are able to use all available input information and effectively boost prediction performance (Ma et al., 2017), their limitation is obvious: they require information from the future (Lipton et al., 2015). Recently, Koutnik et al. (Koutnik et al., 2014) presented the clockwork RNN, which modifies the standard RNN architecture and partitions the hidden layer into separate modules so that each module can process the sequence at its own temporal granularity. One recent work by Chang et al. (Chang et al., 2017) tries to tackle these major challenges together by introducing dilated recurrent skip connections, which largely reduce the number of model parameters and therefore enhance computational efficiency; in addition, such layers can be stacked so that dependencies at different scales are learned effectively at different layers. While a large body of research has focused on modeling the dependencies within sequences, limited effort has been made to model the dependencies between sequences. In this paper, we tackle this novel challenge brought by the links among sequences and propose an effective model that shows promising results.

6. Conclusion

RNNs have been proven to be powerful in modeling sequences in many domains. Most existing RNN methods have been designed for sequences that are assumed to be i.i.d. However, in many real-world applications, sequences are inherently linked, and linked sequences present both challenges and opportunities for existing RNN methods, which calls for novel RNN models. In this paper, we study the problem of designing RNN models for linked sequences. Guided by homophily, we introduce a principled method to capture link information and propose a novel RNN framework, LinkedRNN, which jointly models sequential and link information. Experimental results on datasets from different domains demonstrate that (1) the proposed framework outperforms a variety of representative baselines; and (2) link information is helpful for boosting RNN performance.

There are several interesting directions to investigate in the future. First, our current model focuses on unweighted and undirected links; we will study weighted and directed links and the corresponding RNN models. Second, in the current work, we focus on classification and regression problems with particular loss functions; we will investigate other types of loss functions for learning the parameters of the proposed framework, as well as more applications of the framework. Third, since our model can be naturally extended to inductive learning, we will further validate its effectiveness in that setting. Finally, in some applications the link information may evolve over time; thus we plan to study RNN models that can also capture the dynamics of links.

References

  • Bahdanau et al. (2014) Dzmitry Bahdanau, Kyunghyun Cho, and Yoshua Bengio. 2014. Neural machine translation by jointly learning to align and translate. arXiv preprint arXiv:1409.0473 (2014).
  • Bahdanau et al. (2016) Dzmitry Bahdanau, Jan Chorowski, Dmitriy Serdyuk, Philemon Brakel, and Yoshua Bengio. 2016. End-to-end attention-based large vocabulary speech recognition. In Acoustics, Speech and Signal Processing (ICASSP), 2016 IEEE International Conference on. IEEE, 4945–4949.
  • Baytas et al. (2017) Inci M Baytas, Cao Xiao, Xi Zhang, Fei Wang, Anil K Jain, and Jiayu Zhou. 2017. Patient subtyping via time-aware LSTM networks. In Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 65–74.
  • Bebek (2012) Gurkan Bebek. 2012. Identifying gene interaction networks. In Statistical Human Genetics. Springer, 483–494.
  • Bengio et al. (1994) Yoshua Bengio, Patrice Simard, and Paolo Frasconi. 1994. Learning long-term dependencies with gradient descent is difficult. IEEE transactions on neural networks 5, 2 (1994), 157–166.
  • Chang et al. (2017) Shiyu Chang, Yang Zhang, Wei Han, Mo Yu, Xiaoxiao Guo, Wei Tan, Xiaodong Cui, Michael Witbrock, Mark A Hasegawa-Johnson, and Thomas S Huang. 2017. Dilated recurrent neural networks. In Advances in Neural Information Processing Systems. 76–86.
  • Che et al. (2018) Zhengping Che, Sanjay Purushotham, Kyunghyun Cho, David Sontag, and Yan Liu. 2018. Recurrent neural networks for multivariate time series with missing values. Scientific reports 8, 1 (2018), 6085.
  • Cho et al. (2014) Kyunghyun Cho, Bart Van Merriënboer, Caglar Gulcehre, Dzmitry Bahdanau, Fethi Bougares, Holger Schwenk, and Yoshua Bengio. 2014. Learning phrase representations using RNN encoder-decoder for statistical machine translation. arXiv preprint arXiv:1406.1078 (2014).
  • Chorowski et al. (2015) Jan K Chorowski, Dzmitry Bahdanau, Dmitriy Serdyuk, Kyunghyun Cho, and Yoshua Bengio. 2015. Attention-based models for speech recognition. In Advances in neural information processing systems. 577–585.
  • Glover et al. (2002) Eric J Glover, Kostas Tsioutsiouliklis, Steve Lawrence, David M Pennock, and Gary W Flake. 2002. Using web structure for classifying and describing web pages. In Proceedings of the 11th international conference on World Wide Web. ACM, 562–569.
  • Graves et al. (2013a) Alex Graves, Navdeep Jaitly, and Abdel-rahman Mohamed. 2013a. Hybrid speech recognition with deep bidirectional LSTM. In Automatic Speech Recognition and Understanding (ASRU), 2013 IEEE Workshop on. IEEE, 273–278.
  • Graves et al. (2013b) Alex Graves, Abdel-rahman Mohamed, and Geoffrey Hinton. 2013b. Speech recognition with deep recurrent neural networks. In Acoustics, Speech and Signal Processing (ICASSP), 2013 IEEE International Conference on. IEEE, 6645–6649.
  • Grover and Leskovec (2016) Aditya Grover and Jure Leskovec. 2016. node2vec: Scalable feature learning for networks. In Proceedings of the 22nd ACM SIGKDD international conference on Knowledge discovery and data mining. ACM, 855–864.
  • Hidasi et al. (2015) Balázs Hidasi, Alexandros Karatzoglou, Linas Baltrunas, and Domonkos Tikk. 2015. Session-based recommendations with recurrent neural networks. arXiv preprint arXiv:1511.06939 (2015).
  • Hochreiter and Schmidhuber (1997a) Sepp Hochreiter and Jürgen Schmidhuber. 1997a. Long short-term memory. Neural computation 9, 8 (1997), 1735–1780.
  • Hochreiter and Schmidhuber (1997b) Sepp Hochreiter and Jürgen Schmidhuber. 1997b. Long short-term memory. Neural computation 9, 8 (1997), 1735–1780.
  • Hu et al. (2013) Xia Hu, Lei Tang, Jiliang Tang, and Huan Liu. 2013. Exploiting social relations for sentiment analysis in microblogging. In Proceedings of the sixth ACM international conference on Web search and data mining. ACM, 537–546.
  • Jagannatha and Yu (2016) Abhyuday N Jagannatha and Hong Yu. 2016. Bidirectional RNN for medical event detection in electronic health records. In Proceedings of the conference. Association for Computational Linguistics. North American Chapter. Meeting, Vol. 2016. NIH Public Access, 473.
  • Kim et al. (2016) Yoon Kim, Yacine Jernite, David Sontag, and Alexander M Rush. 2016. Character-Aware Neural Language Models.. In AAAI. 2741–2749.
  • Kipf and Welling (2016) Thomas N Kipf and Max Welling. 2016. Semi-supervised classification with graph convolutional networks. arXiv preprint arXiv:1609.02907 (2016).
  • Koutnik et al. (2014) Jan Koutnik, Klaus Greff, Faustino Gomez, and Juergen Schmidhuber. 2014. A clockwork rnn. arXiv preprint arXiv:1402.3511 (2014).
  • Krivitsky et al. (2009) Pavel N Krivitsky, Mark S Handcock, Adrian E Raftery, and Peter D Hoff. 2009. Representing degree distributions, clustering, and homophily in social networks with latent cluster random effects models. Social networks 31, 3 (2009), 204–213.
  • Lin et al. (2006) Zhenjiang Lin, Michael R Lyu, and Irwin King. 2006. PageSim: a novel link-based measure of web page similarity. In Proceedings of the 15th international conference on World Wide Web. ACM, 1019–1020.
  • Lipton et al. (2015) Zachary C Lipton, John Berkowitz, and Charles Elkan. 2015. A critical review of recurrent neural networks for sequence learning. arXiv preprint arXiv:1506.00019 (2015).
  • Luo (2017) Yuan Luo. 2017. Recurrent neural networks for classifying relations in clinical notes. Journal of biomedical informatics 72 (2017), 85–95.
  • Luong et al. (2015) Minh-Thang Luong, Hieu Pham, and Christopher D Manning. 2015. Effective approaches to attention-based neural machine translation. arXiv preprint arXiv:1508.04025 (2015).
  • Ma et al. (2017) Fenglong Ma, Radha Chitta, Jing Zhou, Quanzeng You, Tong Sun, and Jing Gao. 2017. Dipole: Diagnosis prediction in healthcare via attention-based bidirectional recurrent neural networks. In Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 1903–1911.
  • McPherson et al. (2001) Miller McPherson, Lynn Smith-Lovin, and James M Cook. 2001. Birds of a feather: Homophily in social networks. Annual review of sociology 27, 1 (2001), 415–444.
  • Miao et al. (2015) Yajie Miao, Mohammad Gowayyed, and Florian Metze. 2015. EESEN: End-to-end speech recognition using deep RNN models and WFST-based decoding. In Automatic Speech Recognition and Understanding (ASRU), 2015 IEEE Workshop on. IEEE, 167–174.
  • Mikolov et al. (2013) Tomas Mikolov, Kai Chen, Greg Corrado, and Jeffrey Dean. 2013. Efficient estimation of word representations in vector space. arXiv preprint arXiv:1301.3781 (2013).
  • Mikolov et al. (2010a) Tomáš Mikolov, Martin Karafiát, Lukáš Burget, Jan Černockỳ, and Sanjeev Khudanpur. 2010a. Recurrent neural network based language model. In Eleventh Annual Conference of the International Speech Communication Association.
  • Mikolov et al. (2010b) Tomáš Mikolov, Martin Karafiát, Lukáš Burget, Jan Černockỳ, and Sanjeev Khudanpur. 2010b. Recurrent neural network based language model. In Eleventh Annual Conference of the International Speech Communication Association.
  • Mikolov et al. (2011) Tomáš Mikolov, Stefan Kombrink, Lukáš Burget, Jan Černockỳ, and Sanjeev Khudanpur. 2011. Extensions of recurrent neural network language model. In Acoustics, Speech and Signal Processing (ICASSP), 2011 IEEE International Conference on. IEEE, 5528–5531.
  • Neil et al. (2016) Daniel Neil, Michael Pfeiffer, and Shih-Chii Liu. 2016. Phased lstm: Accelerating recurrent network training for long or event-based sequences. In Advances in Neural Information Processing Systems. 3882–3890.
  • Palangi et al. (2016) Hamid Palangi, Li Deng, Yelong Shen, Jianfeng Gao, Xiaodong He, Jianshu Chen, Xinying Song, and Rabab Ward. 2016. Deep sentence embedding using long short-term memory networks: Analysis and application to information retrieval. IEEE/ACM Transactions on Audio, Speech and Language Processing (TASLP) 24, 4 (2016), 694–707.
  • Rumelhart et al. (1986a) David E Rumelhart, Geoffrey E Hinton, and Ronald J Williams. 1986a. Learning representations by back-propagating errors. nature 323, 6088 (1986), 533.
  • Rumelhart et al. (1986b) David E Rumelhart, Geoffrey E Hinton, and Ronald J Williams. 1986b. Learning representations by back-propagating errors. nature 323, 6088 (1986), 533.
  • Schuster and Paliwal (1997) Mike Schuster and Kuldip K Paliwal. 1997. Bidirectional recurrent neural networks. IEEE Transactions on Signal Processing 45, 11 (1997), 2673–2681.
  • Soltau et al. (2016) Hagen Soltau, Hank Liao, and Hasim Sak. 2016. Neural speech recognizer: Acoustic-to-word LSTM model for large vocabulary speech recognition. arXiv preprint arXiv:1610.09975 (2016).
  • Sutskever et al. (2011) Ilya Sutskever, James Martens, and Geoffrey E Hinton. 2011. Generating text with recurrent neural networks. In Proceedings of the 28th International Conference on Machine Learning (ICML-11). 1017–1024.
  • Sutskever et al. (2014) Ilya Sutskever, Oriol Vinyals, and Quoc V Le. 2014. Sequence to sequence learning with neural networks. In Advances in neural information processing systems. 3104–3112.
  • Tang et al. (2013) Jiliang Tang, Xia Hu, and Huan Liu. 2013. Social recommendation: a review. Social Network Analysis and Mining 3, 4 (2013), 1113–1133.
  • Tang and Liu (2012) Jiliang Tang and Huan Liu. 2012. Unsupervised feature selection for linked social media data. In Proceedings of the 18th ACM SIGKDD international conference on Knowledge discovery and data mining. ACM, 904–912.
  • Tang et al. (2015) Jian Tang, Meng Qu, Mingzhe Wang, Ming Zhang, Jun Yan, and Qiaozhu Mei. 2015. Line: Large-scale information network embedding. In Proceedings of the 24th International Conference on World Wide Web. International World Wide Web Conferences Steering Committee, 1067–1077.
  • Tang et al. (2008) Jie Tang, Jing Zhang, Limin Yao, Juanzi Li, Li Zhang, and Zhong Su. 2008. ArnetMiner: Extraction and Mining of Academic Social Networks. In KDD’08. 990–998.
  • Wang et al. (2011) Xiaolong Wang, Furu Wei, Xiaohua Liu, Ming Zhou, and Ming Zhang. 2011. Topic sentiment analysis in twitter: a graph-based hashtag sentiment classification approach. In Proceedings of the 20th ACM international conference on Information and knowledge management. ACM, 1031–1040.
  • Wang et al. (2017) Zhiwei Wang, Tyler Derr, Dawei Yin, and Jiliang Tang. 2017. Understanding and Predicting Weight Loss with Mobile Social Networking Data. In Proceedings of the 2017 ACM on Conference on Information and Knowledge Management. ACM, 1269–1278.
  • Wu et al. (2016) Caihua Wu, Junwei Wang, Juntao Liu, and Wenyu Liu. 2016. Recurrent neural network based recommendation for time heterogeneous feedback. Knowledge-Based Systems 109 (2016), 90–103.
  • Yang et al. (2016) Zichao Yang, Diyi Yang, Chris Dyer, Xiaodong He, Alex Smola, and Eduard Hovy. 2016. Hierarchical attention networks for document classification. In Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 1480–1489.
  • Zhou et al. (2018) Meizi Zhou, Zhuoye Ding, Jiliang Tang, and Dawei Yin. 2018. Micro behaviors: A new perspective in e-commerce recommender systems. In Proceedings of the Eleventh ACM International Conference on Web Search and Data Mining. ACM, 727–735.