A Neural Turing Machine for Conditional Transition Graph Modeling

07/15/2019
by   Mehdi Ben Lazreg, et al.

Graphs are an essential part of many machine learning problems such as analysis of parse trees, social networks, knowledge graphs, transportation systems, and molecular structures. Applying machine learning in these areas typically involves learning the graph structure and the relationship between the nodes of the graph. However, learning the graph structure is often complex, particularly when the graph is cyclic and the transitions from one node to another are conditioned, such as in graphs used to represent a finite state machine. To solve this problem, we propose to extend the memory based Neural Turing Machine (NTM) with two novel additions. We allow for transitions between nodes to be influenced by information received from external environments, and we let the NTM learn the context of those transitions. We refer to this extension as the Conditional Neural Turing Machine (CNTM). We show that the CNTM can infer conditional transition graphs by empirically verifying the model on two data sets: a large set of randomly generated graphs, and a graph modeling the information retrieval process during certain crisis situations. The results show that the CNTM is able to reproduce the paths inside the graph with accuracy ranging from 82.12% for graphs with 10 nodes to 65.25% for graphs with 100 nodes.

I Introduction

Many important machine learning tasks involve data modeled as graphs, such as classification and analysis of parse trees, social networks, knowledge graphs, transportation systems, and molecular structures. This typically involves learning the graph structure, including the relationship between the nodes, often based on partial graph observations. An example of partial observation is a family tree in which the connections: David is John's father, and Alice is John's sister, are given. The learning algorithm then needs to infer that David is Alice's father. Learning such relations is a challenging task, particularly when the graph is cyclic, transitions from one node to another are conditioned, and the observable data does not contain all the edges of the graph.

Over the years, several machine learning approaches have been introduced to model graph data, ranging from simple Bayesian networks [1] to recurrent neural networks (RNN) [2], and their more recent memory-augmented versions: the Neural Turing Machine (NTM) [3] and the Differentiable Neural Computer (DNC) [4]. RNNs have been used to learn functions over sequences for more than three decades. Recent developments of RNNs, including the sequence-to-sequence paradigm [5], GPT-2 [6], content-based attention mechanisms [7], and pointer networks [8], have gone a long way toward solving significant challenges in sequence learning. Further, the NTM introduced an interaction between the network and an external memory, which made it possible for RNNs to be applied in new domains such as learning functions over trees or graphs.

Despite impressive results shown by RNNs applied to learning family trees, sparse trees for natural language processing, and transportation systems, their application to network and graph data is still limited to simple cases. In this paper, we are interested in graphs where the transition from one node to another is conditioned by an external input. A real-world analogy to better understand conditional graphs is a model of the thought process of a person. Let's assume that the person is hungry. In our simple example, many states can follow, but we narrow it down to two possibilities. The first possibility is that he sits down for lunch. The other possibility is that he instead only has a small snack. The possible states are then either "eating lunch" or "eating a snack". Whether he goes to either of those states is conditioned. It depends on many aspects, many of which he does not control, such as the time of day and his hunger level. In this case, the person undergoes a conditional transition from "hungry" to both "eating lunch" and "eating a snack". Another example, which we take as a case study in this paper, is the information gathering process during a crisis situation. A crisis is a complex event in which many variables change over time. The information needed by crisis responders varies greatly from one crisis to another, and from one situation to another during the same crisis. Furthermore, a typical situation is that any new information provided will make the responders require even more information, e.g. receiving information that a fire has broken out leads to the needed information of where the fire is located. Such an information gathering process can be modelled as a graph which will directly influence the decisions and interventions to take depending on the status of the crisis.

Hence, the information gathering process depends on the status of the crisis and the information gathered so far. One might argue that such a graph can be represented using a simple finite state machine (FSM) in which each state represents the needed information and the inputs are the status of the crisis. However, the number of crisis types (natural, man-made, and technological) is constantly growing, and each crisis is dynamic and evolving; these factors make an FSM designed to model the information graph prohibitively large and exhausting to maintain and update. Nevertheless, if we assume that we have an FSM that represents the information graph of certain generic crises, the question becomes: can it be generalized to other crises, and to other situations within a specific crisis? In this case, the problem becomes that of link prediction, in the sense of inferring missing links from an observed FSM graph. For example, if we know that in a given state of crisis c1 we require information i (the next state in the FSM), then, in the same state of a similar crisis c2, it is highly likely that we require the same information i. To be more concrete, if there is a fire (crisis c1) and we are in a situation where we do not know the location (state s), we require information about the location (information i). If, on the other hand, there is a shooting (crisis c2) and we do not know the location (state s), we also need the location (information i).
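To make the notion of a conditional transition concrete, the following minimal Python sketch (our illustration, not code from the paper; all state and condition names are hypothetical) stores a few transitions of such an information-gathering FSM as a lookup keyed by (state, condition) pairs. The link prediction task described above then amounts to inferring the entries that are missing from this table.

```python
# Illustrative sketch only (not from the paper): a conditional transition
# graph stored as a lookup keyed by (state, condition) pairs, so the next
# state depends on both the current node and the external input.
# All state and condition names below are hypothetical.
conditional_transitions = {
    ("crisis_type_unknown", "fire"):            "location_unknown",
    ("crisis_type_unknown", "shooting"):        "location_unknown",
    ("location_unknown", "location_received"):  "affected_people_unknown",
    ("location_unknown", "no_answer"):          "location_unknown",  # cycle: keep asking
}

def next_state(state: str, condition: str):
    """Follow one conditional transition; None means the link is not observed."""
    return conditional_transitions.get((state, condition))

print(next_state("crisis_type_unknown", "fire"))  # -> "location_unknown"
```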

In this paper, we propose an extension of the memory-based Neural Turing Machine to model conditional transition graphs, which we call the Conditional Neural Turing Machine (CNTM). The aim is to allow the CNTM to change state, infer missing links in a conditional transition graph, and transition from one node to another based on input received from an external environment. First, to prove the concept, we test our model on a set of randomly generated conditional transition graphs. Then, to test our approach in practice, we consider the use case of a humanitarian crisis. We show how the iterative information gathering process during a crisis can be modeled as a conditional graph, and we use that graph to test our proposed model.

II Background

II-A State of the art

During the first years of artificial intelligence (AI), neural networks were considered an unpromising research direction. From the 1950s to the late 1980s, AI was dominated by symbolic approaches that attempted to explain how the human brain might function in terms of symbols, structures, and rules that could manipulate said symbols and structures [9]. It was considered by many that brain function could be implemented using a Turing machine. It was not until 1986, thanks to the work of Hinton, that neural networks, or the more commonly used term connectionism, regained traction by exhibiting the ability for distributed representation of concepts [10].

Despite this new capability, two significant criticisms were made against neural networks as tools capable of implementing intelligence. First, neural networks with fixed-size inputs were seemingly unable to solve problems with variable-size inputs like words and sentences. Second, neural networks seemed unable to perform symbol-level representation, i.e. to represent a state that has a combination of syntactic and semantic structure, such as language.

The first challenge was answered by advancements in RNNs, in particular the LSTM and GRU [11][12]. RNNs can now process variable-size inputs without needing to be constrained by a fixed frame rate. These advancements brought breakthroughs and state-of-the-art results in core problems such as translation, parsing, and video captioning.

The second criticism (i.e. the missing symbol-level representation) is still a pending issue. However, attempts to solve that problem started in the early 1990s. In 1990, Touretzky designed BoltzCONS [13], a neural network model capable of creating and manipulating composite symbol structures (implemented using a linked list). BoltzCONS shows that a neural network can exhibit compositionality and reference a complex structure via abbreviated tags, two properties that distinguish symbol processing from a low-level cognitive function such as pattern recognition. Later, Smolensky defined a general neural network method capable of value/variable binding [14]. The method permits a fully distributed representation of bindings and symbolic structures. At the same time, Pollack [15] designed a neural network architecture capable of automatically developing distributed representations of compositional recursive data structures such as lists and trees. In 1997, Hochreiter and Schmidhuber [12] developed the Long Short-Term Memory network (LSTM), mainly to solve the vanishing/exploding gradient problem, but the network also exhibits memory-like features such as copy and forget. In the early 2000s, Plate [16] worked on the same problem of distributed representation of compositional structures by using convolutions to associate items of these structures represented by vectors. Graves et al. [3] developed the Neural Turing Machine by giving a neural network an external memory and the capacity to learn how to access it, reading from and writing to it. The NTM reconciles the connectionist approach and the symbolic approach with the idea that brain functions can be implemented using a Turing machine. Several extensions of the NTM have been developed over the past few years, most notably the sparse NTM [17] and the DNC [4].

In this paper we extend the NTM to learn partially observed graphs. The link prediction problem consists of inferring missing links from an observed network or graph. It is based on constructing a network from observable data and trying to infer additional links that, while not present in the observed data, are likely to exist. For a graph G = (V, E), where V is the set of nodes and E is the set of edges, the probability of correctly choosing an edge at random in a sparse graph (which is the case in most application domains) is the ratio of |E| to the number of possible node pairs, which is small. This makes the problem more difficult as the graph grows bigger. The link prediction problem is a common problem in social networks where the objective is to predict whether two people are likely to connect (the friend suggestion feature in Facebook, for example) [18]. Beyond social networks, link prediction has applications in bioinformatics [19], e-commerce [20], and security [21]. Different approaches have been used for that purpose [22]: first, the non-Bayesian approach, which trains a binary classification model on a set of extracted features; second, the probabilistic approach, which models the joint probability among the entities in a network using Bayesian models; finally, the linear algebraic approach, which computes the similarity between the nodes in a network using similarity matrices.

None of the previously cited link prediction applications considers the case in which the edges are conditioned by an external input: so-called conditional graphs. A typical example of a conditional graph is the graph represented by an FSM. An FSM has a structure that exhibits syntactic and semantic meaning, is often cyclic, and has transitions between nodes that depend on an external input. If we only have an FSM that represents part of the system and we want to complete it by inferring new links that make it fully descriptive of the system, then the problem becomes challenging to model using traditional link prediction solutions because it introduces a new variable: the external input. A typical example is a graph where some links are missing or unknown, such as in the crisis information retrieval problem introduced earlier. However, an FSM can be represented by a Turing machine. We will use this feature to design a neural Turing machine that can infer the kind of links present in an FSM.

II-B Neural Turing Machine

An NTM is composed of a neural network, called the controller, and a two-dimensional matrix often referred to as the memory (Figure 1). The controller is a feed-forward or recurrent neural network that can read from and write to selected memory locations using read and write heads. Graves et al. [4] draw inspiration from the traditional Turing machine and use the term head to describe the weighting vector w_t the controller uses to access the selected memory locations. The read head and the write head have the property described in equation 1.

∑_i w_t(i) = 1,  0 ≤ w_t(i) ≤ 1    (1)
Fig. 1: An NTM block

Let M_t be the memory matrix at time t. In order to read values from M_t, we need an addressing mechanism that dictates from where the head should read. A read operation is defined as the weighted sum over the memory rows M_t(i):

r_t = ∑_i w_t(i) M_t(i)    (2)

The writing operation is composed of an erase operation and an add operation. The erase operation deletes certain elements from the memory using an erase vector e_t. The add operation replaces the deleted values with elements from an add vector a_t. Thus, the writing operation can be expressed by the following equation, where ⊙ is the element-wise multiplication:

M_t(i) = M_{t-1}(i) ⊙ [1 − w_t(i) e_t] + w_t(i) a_t    (3)
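As a concrete illustration of equations 2 and 3, the following NumPy sketch implements the standard NTM read and write operations; it is our own minimal rendering of those equations, not code from the paper's repository.

```python
import numpy as np

# Minimal sketch of the NTM read (eq. 2) and write (eq. 3) operations,
# assuming a memory matrix M of shape (N, W): N rows (locations) of width W.

def ntm_read(M, w):
    """Read vector r_t = sum_i w_t(i) * M_t(i): a weighted sum of memory rows."""
    return w @ M                                   # shape (W,)

def ntm_write(M, w, erase, add):
    """Erase then add: M_t(i) = M_{t-1}(i) * (1 - w_t(i) e_t) + w_t(i) a_t."""
    M = M * (1.0 - np.outer(w, erase))             # erase step
    M = M + np.outer(w, add)                       # add step
    return M

N, W = 8, 4
M = np.zeros((N, W))
w = np.eye(N)[2]                                   # weighting focused entirely on row 2
M = ntm_write(M, w, erase=np.ones(W), add=np.arange(W, dtype=float))
print(ntm_read(M, w))                              # -> [0. 1. 2. 3.]
```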

The read and write weightings are calculated independently but using the same approach. Thus, in the remainder of this section, w_t will denote either the read or the write weighting interchangeably.

There are two types of addressing methods used to create the vector w_t: content-based and location-based addressing. First, content-based addressing selects the weights based on the similarity between a row M_t(i) of the memory matrix and a key k_t generated by the controller:

w_t^c(i) = exp(β_t K(k_t, M_t(i))) / ∑_j exp(β_t K(k_t, M_t(j)))    (4)

where K is a similarity measure (typically cosine similarity), the normalization above is a differentiable monotonic transformation (typically a softmax), and β_t is a key strength that amplifies or attenuates the precision of the focus.

Second, the location-based addressing goes through three different phases:

  1. An interpolation between the previous weights w_{t-1} and the weights w_t^c produced by the content-based addressing, using a gate g_t (equation 5). This method is used when we want a combination of content-based and location-based addressing. It yields the weights w_t^g.

    w_t^g = g_t w_t^c + (1 − g_t) w_{t-1}    (5)
  2. A shift operation that rotates the elements of the weights w_t^g using a shift vector s_t (equation 6). The shift produces the weights w̃_t.

    w̃_t(i) = ∑_j w_t^g(j) s_t(i − j)    (6)
  3. A sharpening that combats any leakage or dispersion of the weights over time if the elements of the shift vector s_t are not sharp, i.e. neither close to 1 nor 0.

    w_t(i) = w̃_t(i)^γ_t / ∑_j w̃_t(j)^γ_t    (7)

All the parameters used to compute w_t (the key k_t, key strength β_t, gate g_t, shift vector s_t, and sharpening factor γ_t) are calculated using neural layers that take as input the output of the controller at time t. Given the constraints applied to some of them, we use different activation functions to compute these parameters: a rectified linear unit for β_t, a sigmoid for g_t, a softmax for s_t, and oneplus for γ_t.
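The full addressing pipeline of equations 4 to 7 can be summarized in a few lines. The sketch below is an illustrative NumPy rendering of the mechanism described above (content similarity, interpolation, shift, sharpening), not the paper's implementation; the helper names are ours.

```python
import numpy as np

# Sketch of the NTM addressing pipeline (eqs. 4-7). Variable names follow the
# text: k = key, beta = key strength, g = gate, s = shift distribution,
# gamma = sharpening exponent.

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8)

def address(M, w_prev, k, beta, g, s, gamma):
    # (4) content-based weighting from key similarity
    w_c = softmax(beta * np.array([cosine(k, row) for row in M]))
    # (5) interpolate with the previous weighting using gate g in [0, 1]
    w_g = g * w_c + (1.0 - g) * w_prev
    # (6) circular convolution with the shift distribution s
    N = len(w_g)
    w_s = np.array([sum(w_g[j] * s[(i - j) % N] for j in range(N)) for i in range(N)])
    # (7) sharpen to counteract the blurring introduced by the shift
    w = w_s ** gamma
    return w / w.sum()

M = np.random.randn(8, 4)
w = address(M, w_prev=np.full(8, 1 / 8), k=M[3], beta=5.0, g=1.0,
            s=np.eye(8)[0], gamma=2.0)   # s = one-hot at offset 0, i.e. no rotation
print(w.argmax())                         # -> 3 (focus lands on the matching row)
```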

III Theoretical approach

III-A Problem definition

In a conditional transition graph, transitions from one node to another are conditioned by external knowledge. Figure 2 shows a simple example of such a graph, where the transition from node A to D is performed when the proposition C is true, and from A to B otherwise. Such graphs are used to represent an FSM, which is composed of:

Fig. 2: Example of a simple conditional graph
  • A finite set of nodes or states S.

  • A finite set of inputs C. C can be a set of logical propositions that can be true or false, as presented in Figure 2, or a vector of logical propositions. In the context of this paper, C is a set of variables.

  • A transition function T from one node to the next.

  • A final node or state.

In this paper, we model two important parts of a conditional transition graph: the first part is the input C that triggers a transition; the second part is the transition T itself.

To produce C, we introduce what we call an environment. The environment's role is to produce an input given the current node in the graph. As an example, a node in the graph can represent a database query. The environment in this case is the database that, given the query, will return a set of data (the condition C). That data is then used to select the next node, or query in this case. It is worth noting that the environment can be any simple or complex system, such as a deterministic system or a real-world system (e.g. a database). In this paper, we consider the environment to be random: consider that from node n, we can transition to nodes n_1, n_2, …, or n_k. Each transition is conditioned on C_1, C_2, …, and C_k respectively. A random environment E gives a probability distribution over all the possible values of C: P(C = C_i | n), for i = 1, …, k.
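As a small illustration of such a random environment, the sketch below samples a condition for the current node from a fixed distribution over its outgoing edges. The node and condition names, and the probabilities, are hypothetical placeholders, not taken from the paper.

```python
import random

# Minimal sketch of a random environment: given the current node, sample one
# of the conditions attached to that node's outgoing edges according to a
# fixed probability distribution.
environment = {
    # node: list of (condition, probability) pairs over its outgoing edges
    "A": [("C", 0.5), ("not C", 0.5)],
    "D": [("E", 0.7), ("not E", 0.3)],
}

def sample_condition(node: str) -> str:
    conditions, probs = zip(*environment[node])
    return random.choices(conditions, weights=probs, k=1)[0]

print(sample_condition("A"))  # -> "C" or "not C"
```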

The problem of learning the transition T can be expressed as learning a conditional probability distribution over the set of states S, knowing the current state and the input from the environment:

P(s_{t+1} | s_t, C_t)    (8)

In the next section, we will detail how the CNTM learns such a probability.

III-B Neural Turing Machine for conditional graphs

This section extends the existing NTM to conditional graphs. We call this extension the Conditional Neural Turing Machine (CNTM), which is the major contribution of the paper. The overall objective is to design a neural Turing machine that can learn conditional transition graphs.

In Section III-A, we introduced the environment, which randomly produces an input C. The input produced by the environment can be extended to include the current node in the graph. This extension produces what we call a context vector x_t = (s_t, C_t), which is the input of the transition T.

The first step of the CNTM is to produce a coding o_t given the current context x_t and the sequence of previous contexts. The idea is to use the NTM attention mechanisms (content-based and location-based addressing) to retrieve a representation of the context. The output of the NTM block is implemented using a neural layer that takes as input the output of the controller h_t and the read vector r_t and calculates a linear combination of them. Thus, the activation function for that output layer is a linear activation:

o_t = W_h h_t + W_r r_t    (9)

In the second step, the transition from one node to the other is implemented using the output layer. The output layer's role is to produce the next node in the graph given the previous codings of the context produced by the NTM block. At each time step t, the output layer takes as input o_t. Its output at time t is a probability distribution over the nodes of the graph, p_θ(s_{t+1} | o_1, …, o_t), where θ denotes the parameters of the output layer. It is implemented using an LSTM with a softmax output layer.
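The two-step pipeline described above can be sketched structurally as follows. This is an illustrative PyTorch rendering under our own assumptions (the paper does not specify the framework, and we stub the NTM block with a plain recurrent encoder for brevity); only the overall shape of the computation, a coding of each context followed by an LSTM with a softmax over the nodes, follows the text.

```python
import torch
import torch.nn as nn

class CNTMSketch(nn.Module):
    """Structural sketch of the CNTM pipeline: NTM block -> coding -> LSTM + softmax."""
    def __init__(self, context_dim, coding_dim, num_nodes, hidden=256):
        super().__init__()
        self.ntm_block = nn.GRU(context_dim, coding_dim, batch_first=True)  # stand-in for the NTM block
        self.output_lstm = nn.LSTM(coding_dim, hidden, batch_first=True)    # second step: LSTM ...
        self.to_nodes = nn.Linear(hidden, num_nodes)                        # ... with a softmax over nodes

    def forward(self, contexts):
        # contexts: (batch, time, context_dim) = current node + environment input
        codings, _ = self.ntm_block(contexts)        # step 1: coding o_t of each context
        h, _ = self.output_lstm(codings)             # step 2: next-node prediction from the codings
        return torch.log_softmax(self.to_nodes(h), dim=-1)  # log P(next node | contexts so far)

model = CNTMSketch(context_dim=90, coding_dim=64, num_nodes=30)
logp = model(torch.zeros(2, 5, 90))                  # batch of 2 sequences, 5 time steps
print(logp.shape)                                    # -> torch.Size([2, 5, 30])
```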

Training is divided into two phases: a description phase and an answer phase. During the description phase, the inputs (context vectors x_t) are presented to the CNTM in random order. The target states are presented only during the answer phases, with no inputs. For a sequence of contexts x_1, …, x_T and a sequence of targets y_1, …, y_T, both of length T, the parameters θ of the model are trained to minimize the cross-entropy loss function:

L(θ) = − ∑_{t=1}^{T} m(t) log p_θ(y_t | x_1, …, x_t)    (10)

where m(t) is an indicator function whose value is 1 during answer phases and 0 otherwise. The overall model is presented in Figure 3. The CNTM is differentiable end to end, and its parameters can be optimized using stochastic gradient descent or other standard neural network optimizers.
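Equation 10 corresponds to a standard cross-entropy loss masked by the answer-phase indicator. The sketch below shows one way to express it, assuming PyTorch; the function and argument names are ours, not the paper's.

```python
import torch
import torch.nn.functional as F

def cntm_loss(log_probs, targets, answer_mask):
    """Cross entropy accumulated only where the indicator m(t) = 1 (answer phases)."""
    # log_probs: (batch, time, num_nodes); targets: (batch, time) node indices
    # answer_mask: (batch, time) with 1.0 during answer phases, 0.0 otherwise
    nll = F.nll_loss(log_probs.transpose(1, 2), targets, reduction="none")  # (batch, time)
    return (nll * answer_mask).sum() / answer_mask.sum().clamp(min=1.0)
```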

Fig. 3: Neural network for conditional graph modeling (CNTM)

IV Experimental results

In the conditional graph inference task, the input of the CNTM consists of a triple encoding the current state, the input from the environment, and the target state. Each element is coded using a binary vector, with a vector of all zeros reserved for a special undefined element. We set the length of each vector to 30, so the input of the CNTM is a 90-element vector.

The experiment works in two phases: training and validation. During the training phase, the input to the network is an incomplete triple with an unspecified target state: (current state, input, undefined). The network has to infer the target of each triple. For evaluation, the first input to the network is an incomplete triple with an unspecified target state. In the remaining time steps, the input triples contain only the input from the environment, with source and target states undefined. To succeed, the network has to infer the destination of each triple, and remember it as the implicit current state for the next time step. We consider the output of the CNTM correct if the produced graph is contained in the complete graph; this means that the CNTM may produce either the complete graph or a correct sub-graph of the known entire graph.
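The triple encoding can be illustrated as follows. The paper only states that each element is a 30-dimensional binary vector with all zeros reserved for the undefined element; the one-hot choice below is our assumption.

```python
import numpy as np

# Sketch of the triple encoding: (current state, environment input, target state),
# each as a 30-dimensional binary vector, concatenated into a 90-element input.
VEC_LEN = 30
UNDEFINED = None

def encode(element):
    v = np.zeros(VEC_LEN)
    if element is not UNDEFINED:          # all zeros encodes the undefined element
        v[element] = 1.0                  # one-hot coding (our assumption)
    return v

def encode_triple(current_state, env_input, target_state=UNDEFINED):
    return np.concatenate([encode(current_state), encode(env_input), encode(target_state)])

x = encode_triple(current_state=3, env_input=7)   # target left undefined, as during training
print(x.shape)                                     # -> (90,)
```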

For the output layer we used an LSTM with 256 hidden units, a feed-forward network of 128 units for the NTM controller, and an external memory matrix. All the weights and the memory were initialized using Xavier initialization. The CNTM is trained with RMSprop stochastic gradient descent with a learning rate of 0.001 and a batch size of 128. The implementation of the CNTM, as well as the data sets, is available at https://github.com/mehdi-mbl/NTM

IV-A Random graphs

We train and test our model with two datasets. The first dataset is used to prove the concept and is composed of randomly generated sparse conditional graphs. Here we compiled 6 different sets, each containing 1000 different conditional graphs with 10, 20, 40, 60, 80, and 100 nodes respectively.

Data                                     | CNTM accuracy | LSTM accuracy | Graph distance accuracy
-----------------------------------------|---------------|---------------|------------------------
Randomly generated graphs with 10 nodes  | 82.12%        | 79.51%        | 19.45%
Randomly generated graphs with 20 nodes  | 78.54%        | 70.23%        | 18.03%
Randomly generated graphs with 40 nodes  | 72.62%        | 62.46%        | 15.78%
Randomly generated graphs with 60 nodes  | 70.78%        | 57.93%        | 13.59%
Randomly generated graphs with 80 nodes  | 67.61%        | 50.89%        | 10.30%
Randomly generated graphs with 100 nodes | 65.25%        | 42.47%        | 6.34%
Crisis data: 50 nodes                    | 78.59%        | 67.29%        | 16.46%

TABLE I: Results on the randomly generated graphs and the crisis data

It is important to note that during the training phase, we only train the algorithm on graphs containing 70% of the links in the randomly generated graphs. Table I shows the accuracy of the CNTM compared to the vanilla graph distance [23] and the LSTM [12] in inferring the correct links for randomly generated conditional transition graphs. The table shows a clear advantage of using the CNTM over the other approaches. As can be expected, the bigger the graph (in number of nodes), the less accurate the predictions become. For a graph with 100 nodes the accuracy is 65.25%. However, as the number of nodes grows, the gap in performance between the CNTM and the other approaches grows rapidly: the gap between the CNTM and the LSTM starts at 2.6% for 10-node graphs and reaches approximately 23% for 100-node graphs. Note that if we randomly pick a 10-node-long path from the same graph, the chance of getting a correct pick is vanishingly small.

Figures 4 to 6 show box plots comparing the results produced by the CNTM with three other approaches on all the randomly generated context graphs: a random predictor, graph distance, and LSTM respectively. Figure 4 compares the CNTM, LSTM, and graph distance with a random predictor as the baseline. It shows that all these approaches perform at least 10% better on average than the random predictor. The CNTM is on average approximately 70% better than a random predictor. Figure 5 compares the CNTM and the LSTM with the graph distance as the baseline. It illustrates that both approaches perform on average 42% better than the graph distance. Finally, Figure 6 uses the LSTM as the baseline. It shows that the CNTM performs 10% better than the LSTM on average. It is also important to point out that the variance in the performance of the CNTM is much lower than that of the other approaches.

Fig. 4: Comparison of the different link predictors with the random predictor as the baseline.
Fig. 5: Comparison of the different link predictors with the graph distance predictor as the baseline.
Fig. 6: Comparison of the different link predictors with the LSTM predictor as the baseline.

IV-B Case study

The second data set is a much more practical use case: it models the information needed by emergency management services during a crisis (Figure 9 presents a portion of that graph). Emergency management is chosen to prove the practical applicability of the CNTM since this scenario is particularly challenging. Emergency personnel rely on correct information in dynamic and chaotic situations. Further, the graph is highly conditioned, as much of the emergency response relies upon previous information such as the type of crisis and the location. For example, emergency personnel need to respond differently if a crisis is a public disturbance or a fire outbreak. The type of response is conditioned on the type of crisis. In addition, emergency management services are well documented in the literature.

The environment graph is compiled using information available in the literature, particularly in three areas: fire, extreme weather, and public disturbance. It is obvious that emergency personnel act differently in these three scenarios. For a fire emergency, the sub-graph is extracted from the work of Nunavath et al. [24], who conducted extensive interviews of firefighters about the type of information they need during an indoor fire crisis. For extreme weather, the sub-graph was extracted from the work of Ben Lazreg et al. [25], who interviewed personnel from the police and municipality to gather the type and flow of information they need during an extreme weather crisis. Finally, the public disturbance sub-graph, as well as the rest of the graph, was vetted by two policemen from the Oslo police station who are experts in riots, demonstrations, and public disturbance control. The nodes in the graph represent the type of information needed by emergency managers during a crisis. The transition from a node 1 to a node 2 is conditioned on whether the information that node 1 requires is answered or not. The answers are provided by the environment.

Similarly to the randomly generated graphs, we only train the algorithm on graphs containing 70% of the links in the crisis graph. The accuracy of the network in inferring the correct links for the crisis graph is 78.59%. It is in the same range as the accuracy obtained using randomly generated graphs of 20 nodes. This might be due to the fact that, for randomly generated graphs, we average the results over 100 different graphs; some of those graphs might perform worse or better than the average depending on the randomly generated edges. The crisis graph, on the other hand, is a well-defined graph presenting logical edges and connections.

Fig. 7: Example of results provided by the model

Figure 7 shows an example of paths in the graph proposed by the network from the crisis information graph. In this test, the environment is given externally by specifying the correct and wrong transitions in the emergency graph. Note that the third path in the figure contains a link not available in the full graph and is therefore classified as wrong. It proposes a transition from location to affected people. The link from the location node in the context of indoor fire is not available in the training data. However, a transition from location to affected people is present in the training data in the context of extreme weather. Since both extreme weather and indoor fire are sudden-onset disasters, the CNTM was able to predict a link between the location and the affected people nodes in the context of indoor fire.

Fig. 8: Comparison of expert opinion with the CNTM predictor as the baseline.

We further investigated the links and paths predicted by the CNTM that are not available in our original crisis graph. We presented those graphs to two experts from the police and asked them to rank their relevance on a scale from 1 to 4, 1 being not relevant and 4 relevant. Figure 8 shows a box plot of the distribution of the expert evaluations with the CNTM as a baseline. The figure shows that the majority of the expert evaluations are within the interval [3, 4], which means that the experts assign a grade of 3 or 4 to the information paths predicted by the CNTM.

Fig. 9: Conditional graph for information needed by crisis emergency management

V Conclusion

This paper presents a neural network able to model conditional graphs. The network is based on the Neural Turing Machine, which we extend to understand context, and we propose the Conditional Neural Turing Machine (CNTM).

A conditional graph is a graph in which the transition from one node to another is conditioned by a certain context. We showed that such graphs can be divided into two parts: an environment and a transition. The environment is a random generator of inputs. To represent the transition, we used the CNTM. We carried out empirical tests on two data sets: a large set of randomly generated conditional graphs, and a graph modeling the information retrieval process during certain crisis situations. The results showed that the CNTM is able to reproduce the paths inside the graph with accuracy ranging from 82.12% for 10-node graphs to 65.25% for 100-node graphs.

References

  • [1] R. Gupta and V. C. Pedro, "Knowledge representation and bayesian inference for response to situations," in AAAI 2005 Workshop on Link Analysis, 2005.
  • [2] S. Ahn, H. Choi, T. Pärnamaa, and Y. Bengio, “A neural knowledge language model,” arXiv preprint arXiv:1608.00318, 2016.
  • [3] A. Graves, G. Wayne, and I. Danihelka, “Neural turing machines,” arXiv preprint arXiv:1410.5401, 2014.
  • [4] A. Graves, G. Wayne, M. Reynolds, T. Harley, I. Danihelka, A. Grabska-Barwińska, S. G. Colmenarejo, E. Grefenstette, T. Ramalho, J. Agapiou et al., “Hybrid computing using a neural network with dynamic external memory,” Nature, vol. 538, no. 7626, p. 471, 2016.
  • [5] I. Sutskever, O. Vinyals, and Q. V. Le, “Sequence to sequence learning with neural networks,” in Advances in neural information processing systems, 2014, pp. 3104–3112.
  • [6] A. Radford, J. Wu, R. Child, D. Luan, D. Amodei, and I. Sutskever, “Language models are unsupervised multitask learners.”
  • [7] D. Bahdanau, K. Cho, and Y. Bengio, “Neural machine translation by jointly learning to align and translate,” arXiv preprint arXiv:1409.0473, 2014.
  • [8] O. Vinyals, M. Fortunato, and N. Jaitly, “Pointer networks,” in Advances in Neural Information Processing Systems, 2015, pp. 2692–2700.
  • [9] J. A. Fodor and Z. W. Pylyshyn, “Connectionism and cognitive architecture: A critical analysis,” Cognition, vol. 28, no. 1-2, pp. 3–71, 1988.
  • [10] G. E. Hinton et al., “Learning distributed representations of concepts,” in Proceedings of the eighth annual conference of the cognitive science society, vol. 1.   Amherst, MA, 1986, p. 12.
  • [11] K. Cho, B. Van Merriënboer, C. Gulcehre, D. Bahdanau, F. Bougares, H. Schwenk, and Y. Bengio, “Learning phrase representations using rnn encoder-decoder for statistical machine translation,” arXiv preprint arXiv:1406.1078, 2014.
  • [12] S. Hochreiter and J. Schmidhuber, “Long short-term memory,” Neural computation, vol. 9, no. 8, pp. 1735–1780, 1997.
  • [13] D. S. Touretzky, “Boltzcons: Dynamic symbol structures in a connectionist network,” Artificial Intelligence, vol. 46, no. 1-2, pp. 5–46, 1990.
  • [14] P. Smolensky, "Tensor product variable binding and the representation of symbolic structures in connectionist systems," Artificial Intelligence, vol. 46, no. 1-2, pp. 159–216, 1990.
  • [15] J. B. Pollack, “Recursive distributed representations,” Artificial Intelligence, vol. 46, no. 1-2, pp. 77–105, 1990.
  • [16] T. A. Plate, “Holographic reduced representations,” IEEE Transactions on Neural networks, vol. 6, no. 3, pp. 623–641, 1995.
  • [17] J. Rae, J. J. Hunt, I. Danihelka, T. Harley, A. W. Senior, G. Wayne, A. Graves, and T. Lillicrap, “Scaling memory-augmented neural networks with sparse reads and writes,” in Advances in Neural Information Processing Systems, 2016, pp. 3621–3629.
  • [18] D. Liben-Nowell and J. Kleinberg, “The link-prediction problem for social networks,” Journal of the American society for information science and technology, vol. 58, no. 7, pp. 1019–1031, 2007.
  • [19] E. M. Airoldi, D. M. Blei, S. E. Fienberg, E. P. Xing, and T. Jaakkola, “Mixed membership stochastic block models for relational data with application to protein-protein interactions,” in Proceedings of the international biometrics society annual meeting, vol. 15, 2006.
  • [20] S.-L. Huang, “Designing utility-based recommender systems for e-commerce: Evaluation of preference-elicitation methods,” Electronic Commerce Research and Applications, vol. 10, no. 4, pp. 398–407, 2011.
  • [21] M. Al Hasan, V. Chaoji, S. Salem, and M. Zaki, "Link prediction using supervised learning," in SDM06: Workshop on Link Analysis, Counter-terrorism and Security, 2006.
  • [22] M. Al Hasan and M. J. Zaki, “A survey of link prediction in social networks,” in Social network data analytics.   Springer, 2011, pp. 243–275.
  • [23] H. H. Song, T. W. Cho, V. Dave, Y. Zhang, and L. Qiu, "Scalable proximity estimation and link prediction in online social networks," in Proceedings of the 9th ACM SIGCOMM Conference on Internet Measurement. ACM, 2009, pp. 322–335.
  • [24] V. Nunavath, A. Prinz, and T. Comes, “Identifying first responders information needs: supporting search and rescue operations for fire emergency response,” International Journal of Information Systems for Crisis Response and Management (IJISCRAM), vol. 8, no. 1, pp. 25–46, 2016.
  • [25] M. Ben Lazreg, N. R. Chakraborty, S. Stieglitz, T. Potthoff, B. Ross, and T. A. Majchrzak, “Social media analysis in crisis situations: Can social media be a reliable information source for emergency management services?” 2018.