1 Introduction
While successful, deep networks have a few important limitations. Apart from the key issue of interpretability, the other major limitation is the requirement of flat inputs (vectors, matrices, tensors), which limits applications to tabular,
propositional representations. On the other hand, symbolic and structured representations [14, 7, 13, 38, 1] have the advantage of being interpretable, while also allowing for rich representations that support learning and reasoning at multiple levels of abstraction. This expressiveness allows them to model complex data structures such as graphs far more easily and interpretably than basic propositional representations. While expressive, these models do not incorporate or discover latent relationships between features as effectively as deep networks.

Consequently, there has been a focus on achieving the dream team of logical and statistical learning methods, such as relational neural networks [19, 42]. While specific architectures differ, these methods generally employ hand-coded relational rules or Inductive Logic Programming (ILP, [24]) to identify the domain’s structural rules; these rules are then used with the observed data to unroll and learn a neural network. We improve upon these methods in two specific ways: (1) we employ a recently successful rule learner to automatically extract interpretable rules, which are then employed as a hidden layer of the neural network; (2) we exploit the notion of parameter tying from statistical relational learning models, which allows multiple instances of the same rule to share the same parameter. These two extensions significantly improve the adaptation of neural networks (NNs) to relational data.

We employ Relational Random Walks [22] to extract relational rules from a database, which are then used as the first layer of the NN. These random walks have the advantages of being learned from data (instead of being laboriously hand-coded) and of being interpretable (as walks are rules in a database schema). Given evidence (facts), relational random walks are instantiated (grounded); parameter tying ensures that groundings of the same random walk share the same parameters, so far fewer network parameters need to be learned during training.
For combining outputs from different groundings of the same clause, we employ combination functions [30, 16]. For instance, given a rule: , , the   pair could have coauthored papers, while the  pair could have coauthored publications (). Combination functions are a natural way to compare such relational features arising from rules. Our network handles this in two steps: first, by ensuring that all instances (papers) of a particular pair share the same weights. Second, by combining predictions from each of these instances (papers) using a combination function. We explore the use of Or, Max and Average
combination functions. Once the network weights are appropriately constrained by parameter tying and combination functions, they can be learned using standard techniques such as backpropagation.
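To make the three combination functions concrete, the following minimal Python sketch combines the predictions obtained from several groundings of one rule. The function names, the example values, and the reading of Or as a noisy-or over probabilities are assumptions made for illustration; they are not taken from the implementation used in the experiments.

```python
from typing import Sequence


def combine_or(values: Sequence[float]) -> float:
    """Noisy-or (assumed reading of 'Or'): 1 - prod(1 - v)."""
    remaining = 1.0
    for v in values:
        remaining *= 1.0 - v
    return 1.0 - remaining


def combine_max(values: Sequence[float]) -> float:
    """Strongest single grounding; 0.0 when there are no groundings."""
    return max(values, default=0.0)


def combine_average(values: Sequence[float]) -> float:
    """Mean over groundings; 0.0 when there are no groundings."""
    return sum(values) / len(values) if values else 0.0


# Hypothetical predictions from three groundings (e.g. three coauthored papers).
predictions = [0.5, 0.5, 0.0]
print(combine_or(predictions), combine_max(predictions), combine_average(predictions))
```

For Boolean (0/1) inputs the noisy-or above reduces to a logical Or, while Average is sensitive to how many groundings of the rule hold.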
We make the following contributions: (1) we learn an NN that can be fully trained from data with no significant engineering, unlike previous approaches; (2) we combine the successful paradigms of relational random walks and parameter tying from SRL methods; this allows the resulting NN to faithfully model relational data while being fully learnable; (3) we evaluate the proposed approach against recent relational NN approaches and demonstrate its efficacy.
2 Related Work
Lifted Relational Neural Networks. In terms of architecture, our work is closest to the Lifted Relational Neural Networks (LRNN) of Šourek et al. [42]. LRNN uses expert hand-crafted relational rules as input, which are then instantiated (based on data) and rolled out as a ground network. While at a high level our approach appears similar to the LRNN framework, there are significant differences. First, while Šourek et al. exploit tied parameters across examples within the same rule, there is no parameter tying across multiple instances; our model, however, ensures parameter tying of multiple ground instances of the rule (in our case, a relational random walk). Second, since they adopt a fuzzy notion, their system supports weighted facts (called ground atoms in logic literature). We take a more standard approach and our observations are Boolean. Third, while the previous difference appears to be limiting in our case, note that this leads to a reduction in the number of network weight parameters.
Šourek et al. have extended their work to learn network structure using predicate invention [45]; our work learns relational random walks as rules for the network structure. As we show in our experiments, NNs can not only easily handle such a large number of random walks, but can also use them effectively as a bag of weakly predictive intermediate layers capturing local features. This allows for learning a more robust model than the induced rules, which take a more global view of the domain. Another recent approach is due to Kazemi and Poole [19], who proposed a relational neural network by adding hidden layers to their Relational Logistic Regression [18] model. A key limitation of their work is that they are restricted to unary relation predictions, that is, they can only predict attributes of objects instead of relations between objects. In contrast, ours is a more general framework that can be used to predict relations between objects.

Much of this recent work is closely related to a significant body of research called neural-symbolic integration [12], which aims to combine (arguably) two of the oldest formalisms in machine learning: symbolic representations and neural learning architectures. Some of the earliest systems such as KBANN [43] date back to the early 90s; KBANN also rolls out the network architecture from rules, though it only supports propositional rules. Current work, including ours, instead explores relational rules which serve as templates to roll out more complex architectures. Other recent approaches such as CILP++ [11] and Deep Relational Machines [26] incorporate relational information as network layers. However, such models propositionalize relational data into flat feature vectors and hence cannot be seen as truly relational models. A rather distinctive approach in this vein is due to Hu et al. [15], where two independent networks incorporating rules and data are trained together. Finally, NNs have also been trained to approximate ILP clause evaluation [8], perform SLD-resolution in first-order logic [21], and approximate entailment operators in propositional logic [10].

Relational Random Walks. The Path Ranking Algorithm (PRA, [22]) is a key framework, where a combination of random walks replaces exhaustive search in order to answer queries. Recently, Das et al. [6] considered random walks between query entities to perform composition of embeddings of relations on each walk with recurrent neural networks. DeepWalk [34] performs random walks on graphs by treating each node as a word, which results in learning embeddings for each node of the graph. Kaur et al. [17] consider relational random walks to generate count and existential features to train a relational restricted Boltzmann machine [23]. This feature transformation induces propositionalization that could potentially result in loss of information, as we show in our experiments.

Tensor Based Models. Recently, several tensor-based models [31, 4, 41, 3, 47] have been proposed to learn embeddings of objects and relations. Such models have been very effective for large-scale knowledge-base construction. However, they are computationally expensive as they learn parameters for each object and relation in the knowledge base. Furthermore, the embedding into some ambient vector space makes the models more difficult to interpret. Though rule distillation can yield human-readable rules [48], it is another computationally intensive post-processing step, which limits the size of the interpreted rules.
Other Models. Several NNs have been utilized with relational database schemas [2, 37]. These models differ on how they handle one-to-many joins, cyclicity, and indirect relationships between relations. However, they all learn one network per relation, which makes them computationally expensive. In the same vein, graph-based models take graph structure into consideration during training. Pham et al. [35] perform collective classification via a deep neural network where connections between adjacent layers are established according to the given graph structure. Niepert et al. [32] proposed an algorithm that prepares the relational data to be directly input to a standard convolutional network by assigning an ordering to enable feature convolution. Scarselli et al. [39] proposed Graph Neural Networks in which one neural network is installed at each node of the graph, which is trained by obtaining input from all the incoming edges of the graph. One neural network per node makes the model computationally very expensive. Finally, with the rapid growth of deep learning, relational counterparts of most existing connectionist models have also been proposed [40, 33, 46, 49].

3 Neural Networks with Relational Parameter Tying
We first introduce some notation for relational logic, which is used for relational representation, with the domain being represented using constants, variables and predicates. We adopt the following conventions: (1) constants used to represent entities in the domain are written in lowercase (e.g., , ); (2) variables and entity types are capitalized (e.g., , ); and (3) relations and predicate symbols between entities and attributes are represented as . A grounding is a predicate applied to a tuple of terms (i.e., either a full or partial instantiation), e.g. , is a partial instantiation.
Rules are constructed from atoms using logical connectives (, ) and quantifiers (, ). Due to the use of relational random walks, the relational rules that we employ are universally quantified conjunctions of the form , where the head is the target of prediction and the body corresponds to conditions that make up the rule (that is, each literal in the body is a predicate ). We do not consider negations in this work.
An example rule could be . This rule states that if a is a part of the project that the works on, then the is advised by that . The body of the rule is learned as a random walk that starts with and ends with . Such a random walk represents a chain of relations that could possibly connect a to a and is a relational feature that could help in the prediction. The rule head is the target that we are interested in predicting. Since these rules are essentially “soft” rules, we can also associate clauses with weights, i.e., weighted rules: .
A relational neural network is a set of weighted rules describing interactions in the domain . We are given a set of atomic facts , known to be true (the evidence) and labeled relational training examples . In general, labels can take multiple values corresponding to a multiclass problem. We seek to learn a relational neural network model to predict a relation, given relational examples , that is: .
Given: Set of instances , relation, relational data set ; Construct (structure learning): , relational random walk rules (relational feature describing the network structure of ); Train (parameter learning): , rule weights via gradient descent with rulebased parameter tying to identify a sparse set of network weights of
Example
The movie domain contains the entity types (variables) , and . In addition there are relations (features): , and . The domain also has relations for entity resolution: and . The task is to predict if worked under , with the target predicate (label): .
3.1 Generating Lifted Random Walks
The core component of a neural network model is the architecture, which determines how the various neurons are connected to each other, and ultimately how all the input features interact with each other. In a relational neural network, the architecture is determined by the domain structure, or the set of relational rules that determines how various relations, entities and attributes interact in the domain, as shown earlier with the example. While previous approaches employed carefully hand-crafted rules, we instead use relational random walks to define the network architecture and model the local relational structure of the domain. A similar approach was also used by Kaur et al. [17], though there the random walk features were used to instantiate a restricted Boltzmann machine, which has a far more limited architecture; moreover, their work is not lifted since it instantiates the entire network before learning.

Relational data is often represented using a lifted graph, which defines the domain’s schema; in such a representation, a relation is a predicate edge between two type nodes: . A relational random walk through a graph is a chain of such edges corresponding to a conjunction of predicates. For a random walk to be semantically sound, we should ensure that the input type (argument domain) of the th predicate is the same as the output type (argument range) of the th predicate.
Example (continued)
The body of the rule
can be represented graphically as
This is a lifted random walk between two entities in the target predicate, . It is semantically sound as it is possible to chain the second argument of a predicate to the first argument of the succeeding predicate. This walk also contains an inverse predicate , which is distinct from (since the argument types are reversed).
We use the path-constrained random walks approach [22] to generate lifted random walks , . These random walks form the backbone of the lifted neural network, as they are templates for various feature combinations in the domain. They can also be interpreted as domain rules as they impart localized structure to the domain model, that is, they provide a qualitative description of the domain. When these rules, or lifted random walks, have weights associated with them, we are able to endow the rules with a quantitative influence on the target predicate. We now describe a novel approach to network instantiation using these random-walk-based relational features. A key component of the proposed instantiation is rule-based parameter tying, which significantly reduces the number of network parameters to be learned, while still effectively maintaining the quantitative influences as described by the relational random walks.
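The idea of a semantically sound lifted random walk can be sketched in a few lines of Python. This is a simplified illustration, not the path-constrained procedure of [22]: the schema, the predicate names, the `_` prefix for inverse predicates and the uniform choice of the next predicate are all hypothetical.

```python
import random


def add_inverses(schema):
    """Extend a schema {predicate: (domain_type, range_type)} with inverse predicates."""
    extended = dict(schema)
    for name, (dom, rng_type) in schema.items():
        extended["_" + name] = (rng_type, dom)  # "_" prefix is a hypothetical naming choice
    return extended


def is_sound(schema, walk):
    """A walk is sound if each predicate's range type is the next predicate's domain type."""
    return all(schema[a][1] == schema[b][0] for a, b in zip(walk, walk[1:]))


def sample_walk(schema, start_type, end_type, max_len, rng):
    """Sample one type-consistent chain from start_type to end_type, or None on failure."""
    walk, current = [], start_type
    for _step in range(max_len):
        candidates = sorted(p for p, (dom, _) in schema.items() if dom == current)
        if not candidates:
            return None
        pred = rng.choice(candidates)
        walk.append(pred)
        current = schema[pred][1]
        if current == end_type:
            return walk
    return None


# Hypothetical movie-style schema; the names are illustrative only.
schema = add_inverses({"actedIn": ("Person", "Movie"), "directed": ("Person", "Movie")})
rng = random.Random(0)
walks = [sample_walk(schema, "Person", "Person", 4, rng) for _ in range(5)]
print(walks)
```

Each sampled walk chains the range type of one predicate to the domain type of the next, which is the soundness condition stated above; treating a predicate and its inverse as distinct entries lets a walk traverse a relation in either direction.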
3.2 Network Instantiation
The relational random walks () generated in the previous subsection are the relational features of the lifted relational neural network, . Our goal is to unroll and ground the network with several intermediate layers that capture the relationships expressed by the random walks. A key difference in network construction between our proposed work and recent approaches such as that of Šourek et al. [44] is that we do not perform an exhaustive grounding to generate all possible instances before constructing the network. Instead, we only ground as needed, leading to a much more compact network. We unroll the network in the following manner (cf. Figure 1).
Output Layer: For the , which is also the head in all the rules , introduce an output neuron called the target neuron, . With one-hot encoding of the target labels, this architecture can handle multiclass problems. The target neuron uses the softmax activation function . Without loss of generality, we describe the rest of the network unrolling assuming a single output neuron.

Combining Rules Layer: The target neuron is connected to lifted rule neurons, each corresponding to one of the lifted relational random walks, . Each rule is a conjunction of predicates defined by random walks:
(1) 
and corresponds to the lifted rule neuron . This layer of neurons is fully connected to the output layer to ensure that all the lifted random walks (that capture the domain structure) influence the output. The extent of their influence is determined by learnable weights, between and the output neuron .
In Fig. 1, we see that the rule neuron is connected to the neurons ; these neurons correspond to instantiations of the random walk . The lifted rule neuron aims to combine the influence of the groundings/instantiations of the random-walk feature that are true in the evidence. Thus, each lifted rule neuron can also be viewed as a rule combination neuron. The activation function of a rule combination neuron can be any aggregator or combining rule [30]. This can include value aggregators such as weighted mean or max, or distribution aggregators (if inputs to this layer are probabilities) such as Noisy-Or. Many such aggregators can be incorporated into the combining rules layer with appropriate weights () and activation functions of the rule neurons. For instance, combining rule instantiations with a weighted mean will require learning , with the nodes using unit functions for activation. The formulation of this layer is much more general and subsumes the approach of Šourek et al. [44], which uses a max combination layer.

Grounding Layer: For each instantiated (ground) random walk , we introduce a ground rule neuron, . This ground rule neuron represents the th instantiation (grounding) of the body of the th rule, : (cf. eqn 1). The activation function of a ground rule neuron is a logical AND (); it is only activated when all its constituent inputs are true (that is, only when the entire instantiation is true in the evidence).
This requires all the constituent facts to be in the evidence. Thus, the th ground rule neuron is connected to all the fact neurons that appear in its corresponding instantiated rule body. A key novelty of our approach is regarding relational parameter tying: the weights of connections between the fact and grounding layers are tied by the rule these facts appear in together. This is described in detail further below.
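The grounding step can be illustrated with a small Python sketch that enumerates the instantiations of one walk that are fully supported by the evidence. It assumes that facts are stored as `(predicate, arg1, arg2)` tuples and that inverse facts are already materialized in the fact set; the function name, the entity names and the predicates are hypothetical.

```python
def ground_walk(walk, facts, start, end):
    """Enumerate instantiations of a chain of binary predicates from `start` to `end`.

    `facts` is a set of (predicate, arg1, arg2) tuples assumed to already contain
    inverse facts; a grounding is kept only if every one of its facts is present.
    """
    successors = {}
    for pred, a, b in sorted(facts):
        successors.setdefault((pred, a), []).append(b)

    groundings = []

    def extend(position, entity, path):
        if position == len(walk):
            if entity == end:
                groundings.append(tuple(path))
            return
        for nxt in successors.get((walk[position], entity), []):
            extend(position + 1, nxt, path + [(walk[position], entity, nxt)])

    extend(0, start, [])
    return groundings


# Hypothetical evidence: ann acted in two movies, both directed by bob.
facts = {
    ("actedIn", "ann", "m1"), ("actedIn", "ann", "m2"),
    ("_directed", "m1", "bob"), ("_directed", "m2", "bob"), ("_directed", "m2", "carl"),
}
print(ground_walk(["actedIn", "_directed"], facts, "ann", "bob"))
```

Each returned tuple lists the facts that one ground rule neuron would be connected to; a walk with no supported instantiation simply contributes no ground rule neurons for that example.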
Input Layer: Each instantiated (grounded) predicate that appears as a part of an instantiated rule body is a fact, that is . For each such instantiated fact, we create a fact neuron , ensuring that each unique fact in evidence has only one single neuron associated with it. Every example is a collection of facts, that is, example . Thus, an example is input into the system by simply activating its constituent facts in the input layer.
Relational Parameter Tying: The most important thing to note about this construction is that we employ rule-based parameter tying for the weights between the grounding layer and the input/facts layer. Parameter tying ensures that instances corresponding to an example all share the same weight if they occur in the same lifted rule . The shared weights are propagated through the network in a bottom-up fashion, ensuring that weights in the succeeding hidden layers are influenced by them.
Our approach to parameter tying is in sharp contrast to that of Šourek et al., [44], who learn the weights of the network edges between the output layer and the combining rules layer. Furthermore, they also use fuzzy facts (weighted instances), whereas in our case, the facts/instances are Boolean, though their edge weights are tied. Our approach also differs from that of Kaur et al., [17] who also use relational random walks. From a parametric standpoint, Kaur et al., used relational random walks as features for a restricted Boltzmann machine, where the instance neurons and the rule neurons form a bipartite graph. Thus, the relational RBM formulation has significantly more edges, and commensurately many more parameters to optimize during learning.
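A forward pass through the unrolled network with rule-based parameter tying might look like the sketch below. It is a simplified reading under stated assumptions: a single output neuron with a sigmoid (the two-class case of softmax), the average combining rule, and one tied weight per lifted rule that scales the output of each of its ground rule neurons. The function name, argument layout and values are hypothetical.

```python
import math


def forward(example_facts, groundings_by_rule, tied_weights, output_weights, bias=0.0):
    """Forward pass for one example with a single output neuron (binary case)."""
    rule_activations = []
    for j, groundings in enumerate(groundings_by_rule):
        scaled = []
        for grounding in groundings:
            # Logical AND: the ground rule neuron fires only if all its facts are active.
            fired = 1.0 if all(fact in example_facts for fact in grounding) else 0.0
            # Parameter tying: every grounding of rule j reuses tied_weights[j].
            scaled.append(tied_weights[j] * fired)
        # Average combining rule over the groundings of rule j.
        rule_activations.append(sum(scaled) / len(scaled) if scaled else 0.0)
    z = bias + sum(w * a for w, a in zip(output_weights, rule_activations))
    return 1.0 / (1.0 + math.exp(-z))  # sigmoid, i.e. two-class softmax


# Hypothetical example: rule 0 has two true groundings, rule 1 has one false grounding.
example_facts = {"f1", "f2", "f3"}
groundings_by_rule = [[("f1", "f2"), ("f1", "f3")], [("f4",)]]
probability = forward(example_facts, groundings_by_rule, [2.0, 5.0], [1.0, 1.0])
print(probability)
```

In this toy network there are five fact-to-grounding edges but only two tied weights between the facts and grounding layers, one per lifted rule, which is the source of the parameter reduction described above.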
Example (continued, see Fig. 2)
Consider two lifted random walks and for the target predicate
Note that while the inverse predicate is syntactically different from (argument order is reversed), the two are semantically the same. The output layer consists of a single neuron corresponding to the binary target . The lifted rule layer (also known as combining rules layer) has two lifted rule nodes corresponding to rule and corresponding to rule . These rule nodes combine inputs corresponding to instantiations that are true in the evidence. The network is unrolled based on the specific training example, for instance: . For this example, the rule has two instantiations that are true in the evidence. Then, we introduce a ground rule node for each such instantiation:
The rule has only one instantiation, and consequently only one node:
The grounding layer consists of ground rule nodes corresponding to instantiations of rules that are true in the evidence. The edges have weights that depend on the combining rule implemented in . In this example, the combining rule is average, so we have and . The input layer consists of atomic facts in evidence: . The fact nodes and appear in the grounding and are connected to the corresponding ground rule neuron . Finally, parameters are tied on the edges between the facts layer and the grounding layer. This ensures that all facts that ultimately contribute to a rule are pooled together, which increases the influence of the rule during weight learning. This, in turn, ensures that a rule that holds strongly in the evidence gets a higher weight.
Once the network is instantiated, the weights and can be learned using standard techniques such as backpropagation. We denote our approach Neural Networks with Relational Parameter Tying (NNRPT). The tied parameters incorporate the structure captured by the relational features (lifted random walks), leading to a network with significantly fewer weights, while also endowing it with semantic interpretability regarding the discriminative power of the relational features. We now demonstrate the importance of parameter tying and the use of relational random walks as compared to previous frameworks.
4 Experiments
Our empirical evaluation aims to answer the following questions explicitly (see https://github.com/navdeepkjohal/NNRPT):
Q1: How does NNRPT compare to the state-of-the-art SRL models, i.e., what is the value of learning a neural net over standard models?
Q2: How does NNRPT compare to propositionalization models, i.e., what is the need for parameterization of standard neural networks?
Q3: How does NNRPT compare to other relational neural networks in the literature?
Data Sets:
We use five standard data sets to evaluate our algorithm (see Table 1). UwCse [38] is a standard data set that consists of predicates and relations such as , , , and etc. The data set contains information from different areas of computer science about professors, students and courses, and the task is to predict the relationship between a professor and a student. Imdb was first created by Mihalkova and Mooney [27] and contains nine predicates such as , , , and . We predict whether an actor has worked under a director. Cora is a citation matching data set modified by Poon and Domingos [36]. It contains predicates , , , , , , , and . The task is to predict if one venue is the same as another.
Mutagenesis [25] was originally used to predict whether a compound is mutagenic or not. It consists of properties of compounds, their constituent atoms and the type of bond that exists between atoms. We performed relation prediction of whether an atom is a constituent of a given molecule or not (). Sports consists of facts from the sports domain crawled by the Never-Ending Language Learner (NELL, [5]) including details of players, sports, individual plays, league information, etc. The goal is to predict which sport a particular team plays.
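Table 1: Data sets used in the experiments.
Domain      | Target | #Facts | #Pos | #Neg | #RW  | #Samp/RW
UwCse       |        | 2817   | 90   | 180  | 2500 | 1000
Mutagenesis |        | 29986  | 1000 | 2000 | 100  | 100
Cora        |        | 31086  | 2331 | 4662 | 100  | 100
Imdb        |        | 914    | 305  | 710  | 80   | -
Sports      |        | 7824   | 200  | 400  | 200  | 100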
Baselines and Experimental Details:
To answer Q1, we compare NNRPT with the more recent and state-of-the-art relational gradient-boosting methods, [29],  [20], and relational restricted Boltzmann machines ,  [17]. As the random walks chain binary predicates in our model, we convert unary and ternary predicates into binary predicates for all data sets. Further, to maintain consistency in experimentation, we use the same resulting predicates across all our baselines as well. We run  and  with their default settings and learn trees for each model. Also, we train  and  according to the settings recommended in [17].

For NNRPT, we generate random walks by considering each predicate and its inverse to be two distinct predicates. Also, we avoid loops in the random walks by enforcing sanity constraints on the random walk generation. We consider 100 random walks for Mutagenesis and Cora, 80 random walks for Imdb, 200 random walks for Sports and 2500 random walks for UwCse, as suggested by Kaur et al. [17] (see Table 1). Since we use a large number of random walks, exhaustive grounding becomes prohibitively expensive. To overcome this, we sample groundings for each random walk for large data sets. Specifically, we sample 100 groundings per random walk per example for Cora, Sports and Mutagenesis, and 1000 groundings per random walk per example for UwCse (see Table 1).
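Capping the number of groundings per random walk per example can be sketched as follows. Uniform sampling without replacement is an assumption made for illustration, as are the function name and the placeholder grounding identifiers.

```python
import random


def sample_groundings(groundings, k, rng):
    """Keep at most k groundings of one random walk for one example."""
    groundings = list(groundings)
    if len(groundings) <= k:
        return groundings
    return rng.sample(groundings, k)  # uniform, without replacement (assumed)


# Hypothetical example: 250 groundings of one walk, capped at 100.
all_groundings = [("g", i) for i in range(250)]
kept = sample_groundings(all_groundings, 100, random.Random(0))
print(len(kept))
```

Walks with fewer groundings than the cap are left untouched, so the cap only affects examples where exhaustive grounding would be expensive.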
For all experiments, we set the positive-to-negative example ratio to be for training, set the combination function to be average and perform fold cross validation. For NNRPT, we set the learning rate to be , batch size to , and number of epochs to . We train our model with regularized AdaGrad [9]. Since these are relational data sets where the data is skewed, AUCPR and AUCROC are better measures than likelihood and accuracy.
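For reference, a single AdaGrad update can be written as the short sketch below. The L2 penalty, the learning rate and the other values are assumptions chosen for illustration, since the specific regularizer and hyperparameter values are not recoverable from the text here.

```python
import math


def adagrad_step(weights, grads, cache, lr, l2=0.0, eps=1e-8):
    """One AdaGrad update with an (assumed) L2 penalty; returns new weights and cache."""
    new_weights, new_cache = [], []
    for w, g, c in zip(weights, grads, cache):
        g = g + l2 * w                      # gradient of the L2 regularizer
        c = c + g * g                       # accumulate squared gradients
        new_weights.append(w - lr * g / (math.sqrt(c) + eps))
        new_cache.append(c)
    return new_weights, new_cache


# Hypothetical single tied weight updated twice with the same gradient.
w, cache = adagrad_step([1.0], [2.0], [0.0], lr=0.1)
w2, cache2 = adagrad_step(w, [2.0], cache, lr=0.1)
print(w, w2)
```

Because the accumulated squared gradient grows over time, the effective step size of each weight shrinks, so the second update above is smaller than the first.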
To answer Q2, we generated flat feature vectors by Bottom Clause Propositionalization (BCP, [11]), according to which one bottom clause is generated for each example. BCP considers each predicate in the body of the bottom clause as a unique feature when it propositionalizes bottom clauses to a flat feature vector. We use Progol [28] to generate these bottom clauses. After propositionalization, we train two connectionist models: a propositionalized restricted Boltzmann machine () and a propositionalized neural network (). The NN has two hidden layers in our experiments, which makes the  model a modified version of CILP++ [11] that had one hidden layer. The hyperparameters of both models were optimized by line search on a validation set.
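The propositionalization step described above can be illustrated with a minimal sketch that turns bottom clauses into binary feature vectors, one feature per distinct body literal. It assumes the bottom clauses have already been generated and are given as sets of literal strings; the function name and the literals are hypothetical.

```python
def propositionalize(bottom_clauses):
    """Map each bottom clause (a set of body literals) to a binary feature vector."""
    vocabulary = sorted({literal for clause in bottom_clauses for literal in clause})
    index = {literal: i for i, literal in enumerate(vocabulary)}
    vectors = []
    for clause in bottom_clauses:
        row = [0] * len(vocabulary)
        for literal in clause:
            row[index[literal]] = 1  # feature is on if the literal occurs in the body
        vectors.append(row)
    return vocabulary, vectors


# Hypothetical bottom-clause bodies for two examples.
clauses = [{"actor(A)", "movie(C,A)"}, {"movie(C,A)", "director(B)"}]
vocabulary, vectors = propositionalize(clauses)
print(vocabulary, vectors)
```

The resulting vectors discard how the literals are chained together, which is one way to see the loss of relational information that propositionalization can incur.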
To answer Q3, we compare our model with Lifted Relational Neural Networks (LRNN, [44]). To ensure fairness, we perform structure learning by using Progol [28] and input the same clauses to both LRNN and NNRPT. Progol learned clauses for Cora, clauses for Imdb, clauses for Sports, clauses for UwCse and clauses for Mutagenesis in our experiment.




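Table 2: Comparison with state-of-the-art SRL baselines (Q1).
Data Set | Measure |               |               |               |               | NNRPT
UwCse    | AUCROC  | 0.973 ± 0.014 | 0.968 ± 0.014 | 0.975 ± 0.013 | 0.968 ± 0.011 | 0.959 ± 0.024
         | AUCPR   | 0.931 ± 0.036 | 0.916 ± 0.035 | 0.923 ± 0.056 | 0.924 ± 0.040 | 0.896 ± 0.063
Imdb     | AUCROC  | 0.955 ± 0.046 | 0.944 ± 0.070 | 1.000 ± 0.000 | 0.997 ± 0.006 | 0.984 ± 0.025
         | AUCPR   | 0.863 ± 0.112 | 0.839 ± 0.169 | 1.000 ± 0.000 | 0.992 ± 0.017 | 0.951 ± 0.082
Cora     | AUCROC  | 0.895 ± 0.183 | 0.835 ± 0.035 | 0.984 ± 0.009 | 0.867 ± 0.041 | 0.952 ± 0.043
         | AUCPR   | 0.833 ± 0.259 | 0.799 ± 0.034 | 0.948 ± 0.042 | 0.825 ± 0.050 | 0.899 ± 0.070
Mutag.   | AUCROC  | 0.999 ± 0.000 | 0.999 ± 0.000 | 0.999 ± 0.000 | 0.998 ± 0.001 | 0.981 ± 0.024
         | AUCPR   | 0.999 ± 0.000 | 0.999 ± 0.000 | 0.999 ± 0.000 | 0.997 ± 0.002 | 0.970 ± 0.039
Sports   | AUCROC  | 0.801 ± 0.026 | 0.806 ± 0.016 | 0.760 ± 0.016 | 0.656 ± 0.071 | 0.780 ± 0.026
         | AUCPR   | 0.670 ± 0.028 | 0.652 ± 0.032 | 0.634 ± 0.020 | 0.648 ± 0.085 | 0.668 ± 0.070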







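Table 3: Comparison with propositionalization models (Q2).
Data Set | Measure |               |               | NNRPT
UwCse    | AUCROC  | 0.951 ± 0.041 | 0.868 ± 0.053 | 0.959 ± 0.024
         | AUCPR   | 0.860 ± 0.114 | 0.869 ± 0.033 | 0.896 ± 0.063
Imdb     | AUCROC  | 0.780 ± 0.164 | 0.540 ± 0.152 | 0.984 ± 0.025
         | AUCPR   | 0.367 ± 0.139 | 0.536 ± 0.231 | 0.951 ± 0.082
Cora     | AUCROC  | 0.801 ± 0.017 | 0.670 ± 0.064 | 0.952 ± 0.043
         | AUCPR   | 0.647 ± 0.050 | 0.658 ± 0.064 | 0.899 ± 0.070
Mutag.   | AUCROC  | 0.991 ± 0.003 | 0.945 ± 0.019 | 0.981 ± 0.024
         | AUCPR   | 0.995 ± 0.001 | 0.973 ± 0.012 | 0.970 ± 0.039
Sports   | AUCROC  | 0.664 ± 0.021 | 0.543 ± 0.037 | 0.780 ± 0.026
         | AUCPR   | 0.532 ± 0.041 | 0.499 ± 0.065 | 0.668 ± 0.070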



Results:
Table 2 compares our NNRPT to , ,  and  to answer Q1. As we see, NNRPT is significantly better than  for Cora and Sports on both AUCROC and AUCPR, and performs comparably on the other data sets. It also performs better than ,  on the Imdb and Cora data sets, and comparably on other data sets. Similarly, it performs better than  on Sports, both on AUCROC and AUCPR, and comparably on other data sets. Broadly, Q1 can be answered affirmatively in that NNRPT performs comparably to or better than state-of-the-art SRL models.
Table 3 shows the comparison of NNRPT with two propositionalization models:  and  in order to answer Q2. NNRPT performs better than  on all the data sets except Mutagenesis, where the two models have similar performance. NNRPT also performs better than  on all data sets. It should be noted that BCP feature generation sometimes introduces a large positive-to-negative example skew (for example, in the Imdb data set), which can sometimes gravely affect the performance of the propositional model, as we observe in Table 3. This emphasizes the need for designing models that can handle relational data directly and without propositionalization; our proposed model is an effort in this direction. Q2 can now be answered affirmatively: NNRPT performs better than propositionalization models.
Table 4 compares the performance of NNRPT and LRNN when both use clauses learned by Progol [28]. NNRPT performs better on UwCse and Sports when evaluated using AUCPR. This result is especially significant because these data sets are considerably skewed. NNRPT also outperforms LRNN on Cora and Mutagenesis. Lastly, NNRPT has comparable performance on Imdb on both AUCROC and AUCPR. The big performance gap between the two models on Cora is likely because LRNN could not build effective models with the small number of clauses (i.e. four) typically learned by Progol. In contrast, even with very few clauses, NNRPT is able to outperform LRNN. This helps us answer Q3 affirmatively: NNRPT offers many advantages over state-of-the-art relational neural networks.
In summary, our experiments clearly show the benefits of parameter tying as well as the expressivity of relational random walks in tightly integrating with a neural network model across a wide variety of domains and settings. The key strengths of NNRPT are that it can (1) efficiently incorporate a large number of relational features, (2) capture local qualitative structure through relational random walk features, and (3) tie feature weights (parameter tying) in a manner that captures the global quantitative influences.




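Table 4: Comparison with LRNN when both models use clauses learned by Progol (Q3).
Model | Measure | UwCse         | Imdb          | Cora          | Mutagen.      | Sports
LRNN  | AUCROC  | 0.923 ± 0.027 | 0.995 ± 0.004 | 0.503 ± 0.003 | 0.500 ± 0.000 | 0.741 ± 0.016
      | AUCPR   | 0.826 ± 0.056 | 0.985 ± 0.013 | 0.356 ± 0.006 | 0.335 ± 0.000 | 0.527 ± 0.036
NNRPT | AUCROC  | 0.700 ± 0.186 | 0.997 ± 0.007 | 0.968 ± 0.022 | 0.532 ± 0.019 | 0.657 ± 0.014
      | AUCPR   | 0.910 ± 0.072 | 0.992 ± 0.017 | 0.943 ± 0.032 | 0.412 ± 0.032 | 0.658 ± 0.056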





Discussion:
A typical convolutional neural network (CNN) is composed of three layers: convolution, max-pooling and (fully-connected) output layers. NNRPT can be considered a special instance of a convolutional network in relational domains, where the fact-grounding layer edges are the equivalent of convolution, the combining rules layer represents pooling, and the softmax layer is the fully-connected layer. If we perform a full and exhaustive grounding of the neural network in , is the number of lifted random walks (template rules), is the number of grounded random walks (instances of a template rule) and is the number of all facts (atomic instances). The data can be represented as a three-dimensional tensor of size , whose elements are precisely (see the discussion of the Input Layer in Section 3.2). In addition, if we consider the rule layer as tensor , where parameters are tied across , then constitutes the convolving filter that is repeatedly applied to each of ground instances. The resulting tensor obtained by composing representing the output of grounded layer passes through a pooling layer (which is the rule-combination layer, here) to downsample the data and produce a new tensor . The tensor , when composed with the fully-connected nonlinear layer of our model produces tensor of size that represents the probability of each class in the output: .

5 Conclusion and Future Work
We considered the problem of learning neural networks from relational data. Our proposed architecture exploits parameter tying, i.e., different instances of the same rule share the same parameters within a training example. In addition, we explored the use of relational random walks to create relational features for training these neural networks. Further experiments on larger data sets could yield insights into the scalability of this approach. Integration with an approximate-counting method could potentially reduce the training time. Given the relation to CNNs, stacking could allow our method to be made deeper. Finally, understanding the use of such random-walk-based neural networks as function approximators could allow for efficient and interpretable learning in relational domains with minimal feature engineering.
References
 [1] Bach, S., Broecheler, M., Huang, B., Getoor, L.: Hinge-loss Markov random fields and probabilistic soft logic. JMLR (2017)
 [2] Blockeel, H., Uwents, W.: Using neural networks for relational learning. In: ICML Workshop (2004)
 [3] Bordes, A., Glorot, X., Weston, J., Bengio, Y.: Joint learning of words and meaning representations for open-text semantic parsing. In: AISTATS (2012)
 [4] Bordes, A., Usunier, N., Garcia-Duran, A., Weston, J., Yakhnenko, O.: Translating embeddings for modeling multi-relational data. In: NeurIPS (2013)
 [5] Carlson, A., Betteridge, J., Kisiel, B., Settles, B., Hruschka, Jr., E.R., Mitchell, T.M.: Toward an architecture for never-ending language learning. In: AAAI (2010)
 [6] Das, R., Neelakantan, A., Belanger, D., McCallum, A.: Chains of reasoning over entities, relations, and text using recurrent neural networks. In: EACL (2017)
 [7] De Raedt, L., Kersting, K., Natarajan, S., Poole, D.: Statistical Relational Artificial Intelligence: Logic, Probability, and Computation. Morgan & Claypool (2016)
 [8] DiMaio, F., Shavlik, J.: Learning an approximation to inductive logic programming clause evaluation. In: ILP (2004)
 [9] Duchi, J., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. JMLR (2011)
 [10] Evans, R., et al.: Can neural networks understand logical entailment? ICLR (2018)
 [11] França, M.V.M., Zaverucha, G., d’Avila Garcez, A.S.: Fast relational learning using bottom clause propositionalization with artificial neural networks. MLJ (2014)
 [12] Garcez, A.S.d., Gabbay, D.M., Broda, K.B.: Neural-Symbolic Learning System: Foundations and Applications. Springer-Verlag (2002)
 [13] Getoor, L., Friedman, N., Koller, D., Pfeffer, A.: Learning probabilistic relational models. RDM (2001)
 [14] Getoor, L., Taskar, B.: Introduction to Statistical Relational Learning. MIT Press (2007)
 [15] Hu, Z., Ma, X., Liu, Z., Hovy, E.H., Xing, E.P.: Harnessing deep neural networks with logic rules. In: ACL (2016)
 [16] Jaeger, M.: Parameter learning for relational Bayesian networks. In: ICML (2007)
 [17] Kaur, N., Kunapuli, G., Khot, T., Kersting, K., Cohen, W., Natarajan, S.: Relational restricted boltzmann machines: A probabilistic logic learning approach. In: ILP (2017)
 [18] Kazemi, S.M., Buchman, D., Kersting, K., Natarajan, S., Poole, D.: Relational logistic regression. In: KR (2014)
 [19] Kazemi, S.M., Poole, D.: RelNN: A deep neural model for relational learning. In: AAAI (2018)
 [20] Khot, T., Natarajan, S., Kersting, K., Shavlik, J.: Learning Markov logic networks via functional gradient boosting. In: ICDM (2011)
 [21] Komendantskaya, E.: First-order deduction in neural networks. In: LATA (2007)
 [22] Lao, N., Cohen, W.: Relational retrieval using a combination of path-constrained random walks. JMLR (2010)
 [23] Larochelle, H., Bengio, Y.: Classification using discriminative restricted boltzmann machines. In: ICML (2008)
 [24] Lavrac, N., Džeroski, S.: Inductive Logic Programming: Techniques and Applications. Prentice Hall (1993)
 [25] Lodhi, H., Muggleton, S.: Is mutagenesis still challenging? In: ILP (2005)
 [26] Lodhi, H.: Deep relational machines. In: ICONIP (2013)
 [27] Mihalkova, L., Mooney, R.: Bottom-up learning of Markov logic network structure. In: ICML (2007)
 [28] Muggleton, S.: Inverse entailment and Progol. New Generation Computing (1995)
 [29] Natarajan, S., Khot, T., Kersting, K., Guttmann, B., Shavlik, J.: Gradient-based boosting for statistical relational learning: Relational dependency network case. MLJ (2012)
 [30] Natarajan, S., Tadepalli, P., Dietterich, T.G., Fern, A.: Learning first-order probabilistic models with combining rules. Annals of Mathematics and Artificial Intelligence (2008)
 [31] Nickel, M., Tresp, V., Kriegel, H.P.: A three-way model for collective learning on multi-relational data. In: ICML (2011)
 [32] Niepert, M., Ahmed, M., Kutzkov, K.: Learning convolutional neural networks for graphs. In: ICML (2016)
 [33] Palm, R.B., Paquet, U., Winther, O.: Recurrent relational networks for complex relational reasoning. In: ICLR (2018)
 [34] Perozzi, B., Al-Rfou’, R., Skiena, S.: Deepwalk: online learning of social representations. In: KDD (2014)
 [35] Pham, T., Tran, T., Phung, D.Q., Venkatesh, S.: Column networks for collective classification. In: AAAI (2016)
 [36] Poon, H., Domingos, P.: Joint inference in information extraction. In: AAAI (2007)
 [37] Ramon, J., Raedt, L.D.: Multi instance neural network. In: ICML Workshop (2000)
 [38] Richardson, M., Domingos, P.: Markov logic networks. MLJ (2006)
 [39] Scarselli, F., Gori, M., Tsoi, A.C., Hagenbuchner, M., Monfardini, G.: The graph neural network model. IEEE Transactions on Neural Networks (2009)
 [40] Schlichtkrull, M., Kipf, T.N., Bloem, P., van den Berg, R., Titov, I., Welling, M.: Modeling relational data with graph convolutional networks. In: ESWC (2018)
 [41] Socher, R., Chen, D., Manning, C., Ng, A.: Reasoning with neural tensor networks for knowledge base completion. In: NeurIPS (2013)
 [42] Šourek, G., Manandhar, S., Železný, F., Schockaert, S., Kuželka, O.: Learning predictive categories using lifted relational neural networks. In: ILP (2016)
 [43] Towell, G.G., Shavlik, J.W., Noordewier, M.O.: Refinement of approximate domain theories by knowledge-based neural networks. In: AAAI (1990)
 [44] Šourek, G., Aschenbrenner, V., Železný, F., Kuželka, O.: Lifted relational neural networks. In: NeurIPS Workshop (2015)
 [45] Šourek, G., Svatoš, M., Železný, F., Schockaert, S., Kuželka, O.: Stacked structure learning for lifted relational neural networks. In: ILP (2017)
 [46] Wang, H., Shi, X., Yeung, D.: Relational stacked denoising autoencoder for tag recommendation. In: AAAI (2015)
 [47] Wang, Z., Zhang, J., Feng, J., Chen, Z.: Knowledge graph embedding by translating on hyperplanes. In: AAAI (2014)
 [48] Yang, B., Yih, W.T., He, X., Gao, J., Deng, L.: Embedding entities and relations for learning and inference in knowledge bases. In: ICLR (2015)
 [49] Zeng, D., Liu, K., Lai, S., Zhou, G., Zhao, J.: Relation classification via convolutional deep neural network. In: COLING (2014)