Motivation
Predicate invention has been one of the core challenges of Inductive Logic Programming (ILP) and statistical relational learning (SRL) since their beginnings. The task is to extend the initial vocabulary given to a relational learner by discovering novel concepts and relations from data; the discovered concepts should be explained in terms of the observable ones. The invented predicates can also be
statistical if the uncertainty in the discovered predicates is represented explicitly. The benefits of predicate invention are numerous. Firstly, it can produce more compact and comprehensible models by capturing dependencies between observed predicates, which consequently yields fewer parameters and reduces the risk of overfitting. Secondly, as each invented predicate can later be reused, it allows a learner to take larger steps through the search space. Finally, invented predicates can represent latent states of a data-generating process, potentially increasing the performance of a model.
The progress so far, however, has been limited. We argue here that one of the main obstacles is the lack of a framework that formalizes the problem. The existing work is a collection of individual approaches harnessing different ideas, greedily searching for new predicates that improve classification accuracy or inventing predicates to compress a complete logical program (known as theory revision).
In this work, we propose a unifying framework for statistical predicate invention and theory revision. Our proposal departs from the conventional approaches and addresses the problem from the perspective of unsupervised representation learning [Bengio2009]. The main motivation for this proposal lies in the key ingredient of representation learning's success: representation learning methods have proven very effective at constructing many layers of features that can be reused to address the final classification task. This closely resembles the idea behind predicate invention.
We argue here that the construction of layers of features can be seen as propositional predicate invention, where each hidden node, i.e., a new feature, can be seen as a binary variable dependent on the states of a subset of variables in the preceding layer. Discovered hidden variables can later compose more complex dependencies throughout the layers of a deep model. Furthermore, we argue that relational learners suffer from conceptually the same problem as the tasks successfully addressed by deep learning: high dimensionality. The connection comes from the interpretation that formulas in a relational model are seen as boolean features of the model. Given a knowledge base and its vocabulary, the search space of possible formulas (or features) is huge even in domains with a small number of predicates, and is therefore high-dimensional. When learning the structure, learners aim to select a small subset of all possible formulas that are most relevant for the task.
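The "formulas as boolean features" reading can be made concrete with a small sketch. The facts and candidate formulas below are illustrative choices, not taken from the text: each formula with a free variable X becomes one boolean feature of a constant.

```python
# Sketch: formulas as boolean features of constants. Facts and formulas
# are illustrative; X is bound to the constant, other variables are
# existentially quantified over the constants.
facts = {("smokes", ("john",)), ("cancer", ("john",)),
         ("friends", ("john", "jane")), ("smokes", ("jane",))}
consts = ["john", "jane"]

def holds(pred, *args):
    return (pred, args) in facts

features = {
    "smokes(X)": lambda x: holds("smokes", x),
    "smokes(X) & cancer(X)":
        lambda x: holds("smokes", x) and holds("cancer", x),
    "friends(X,Y) & smokes(Y)":
        lambda x: any(holds("friends", x, y) and holds("smokes", y)
                      for y in consts),
}

# each constant is now described by a boolean feature vector
vectors = {c: [f(c) for f in features.values()] for c in consts}
```

With every syntactically admissible formula as a feature, the dimensionality of such vectors explodes even for this three-predicate vocabulary, which is the high-dimensionality problem referred to above.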
We base the framework on a particular line of research within deep learning: autoencoders [Vincent et al.2010, Bengio et al.2007, Vincent et al.2008, Tieleman2008] and sparse coding methods [Lee et al.2007, Gregor and LeCun2010]. These approaches take a generative view on representation learning: they create a hidden representation able to regenerate the original data from a smaller set of features. Autoencoders achieve this by means of a neural network with a single hidden layer, placing the input, instead of labels, at the output of the same network. Sparse coding approaches, in contrast, discover a set of hidden vectors that reconstruct the original examples by linear combination. Both approaches are instantiations of the encoder-decoder approach, where one learns an encoder to map the original data to a hidden representation, and a separate decoder to reconstruct the original data from the hidden representation. Both methods can further be stacked to obtain layers of features. Deep models built in this manner have proven effective at extracting useful features in a completely unsupervised way, and have been successfully applied to text and image recognition.

With this proposal, we intend to contribute towards bridging the relational and deep learning communities on the problem of predicate invention. The main underlying idea is to encode the provided set of features into a new set of latent features that can reconstruct the majority of the original features.
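As a point of reference, the neural encoder-decoder idea can be sketched in a few lines. This is a plain single-hidden-layer autoencoder trained by gradient descent, not the relational construction proposed below; the data, layer sizes, learning rate, and iteration count are all illustrative choices.

```python
import numpy as np

# A minimal single-hidden-layer autoencoder: the input is also the target,
# so the network learns to reconstruct the data through a bottleneck.
rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(32, 8)).astype(float)  # 32 binary "examples"

n_in, n_hid = X.shape[1], 3
W_enc = rng.normal(0.0, 0.1, (n_in, n_hid))   # encoder weights
W_dec = rng.normal(0.0, 0.1, (n_hid, n_in))   # decoder weights

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def reconstruct(X):
    # encode to the hidden layer, then decode back to the input space
    return sigmoid(sigmoid(X @ W_enc) @ W_dec)

mse_before = np.mean((reconstruct(X) - X) ** 2)

lr = 0.5
for _ in range(2000):
    H = sigmoid(X @ W_enc)                            # encoder
    X_hat = sigmoid(H @ W_dec)                        # decoder
    delta_out = (X_hat - X) * X_hat * (1 - X_hat)     # output error signal
    delta_hid = (delta_out @ W_dec.T) * H * (1 - H)   # backpropagated signal
    W_dec -= lr * (H.T @ delta_out) / len(X)
    W_enc -= lr * (X.T @ delta_hid) / len(X)

mse_after = np.mean((reconstruct(X) - X) ** 2)
```

Stacking amounts to training a second autoencoder on the hidden activations H, yielding the layers of features mentioned above.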
State of the Art
Within the ILP research, predicates can be invented by analyzing first-order formulas, and forming a predicate to represent either their commonalities [Wogulis and Langley1989] or their differences [Muggleton and Buntine1988]. A weakness of such approaches is that they are prone to over-generating predicates, many of which are not useful. Predicates can also be invented by instantiating second-order templates [Silverstein and Pazzani1991], or by creating predicates to represent exceptions to learned rules [Srinivasan, Muggleton, and Bain1992]. More recently, Muggleton and Lin [Muggleton and Lin2013] introduced a meta-interpreter perspective on predicate invention.
Within the SRL research, Popescul and Ungar [Popescul and Ungar2004] apply k-means clustering to the objects of each type in a domain, create predicates to represent clusters, and learn relations among them. Perlich and Provost [Perlich and Provost2003] present a number of approaches for aggregating multi-relational data. Craven and Slattery [Craven and Slattery2001] propose a learning mechanism for hypertext domains in which class predictions produced by naive Bayes are added to an ILP system (FOIL) as invented predicates. Davis et al. [Davis et al.2007] learn Horn clauses with an off-the-shelf ILP system, create a predicate for each clause learned, and add it as a feature to the database. Kok and Domingos [Kok and Domingos2007] cluster both predicate and constant symbols to create new predicates. Wang, Mazaitis, and Cohen [Wang, Mazaitis, and Cohen2015] capture differences between similar formulas and represent them with a new predicate.
New view: Theory reconstruction
Our proposal is based on the encoder-decoder architecture utilized by many representation learning approaches. An essential difference between our approach and previously proposed ones is that the autoencoder does not try to encode the knowledge base itself, but only the patterns that occur in it. Furthermore, in contrast to the majority of the previous approaches, ours is entirely unsupervised.
Let K be a knowledge base with a vocabulary Σ consisting of a set of predicates P and a set of constants C. A sentence in first-order logic is a formula with no free variables. Let B be a set of sentences called the language bias (typically specified using syntactic constraints, e.g., all conjunctive formulas containing at most 3 literals and at most 2 existentially quantified variables). Given a knowledge base K, let L be the set of all sentences in B that are true in K.
As enumerating all formulas satisfiable w.r.t. K would be infeasible, B represents a trade-off between expressivity and efficiency. B therefore plays an important role, as it significantly influences the kind of predicates that will be invented.
Example 1. Assume P = {smokes/1, cancer/1, friends/2}, C = {jane, john} and K = {smokes(john), cancer(john), friends(john,jane), smokes(jane)}. Let B contain all existentially quantified conjunctive formulas with connected variables, with each term being a variable, and with length 2 or 3. Then L is {smokes(X),cancer(X); smokes(X),friends(X,Y); smokes(Y),friends(X,Y); cancer(X),friends(X,Y); smokes(X),friends(X,Y),smokes(Y); cancer(X),friends(X,Y),smokes(Y); smokes(X),cancer(X),friends(X,Y)}.
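A minimal sketch of how such a language bias could be enumerated and checked against the knowledge base of Example 1, restricted to conjunctions of length 2 for brevity. The predicate, constant, and fact names come from the example; the tuple-based encoding of literals and the canonicalization of variable renamings are illustrative choices.

```python
from itertools import combinations, product

# Enumerate connected, existentially quantified conjunctions of length 2
# over {smokes/1, cancer/1, friends/2} and keep those true in K.
K = {("smokes", ("john",)), ("cancer", ("john",)),
     ("friends", ("john", "jane")), ("smokes", ("jane",))}
consts = ["john", "jane"]
literals = [("smokes", ("X",)), ("smokes", ("Y",)),
            ("cancer", ("X",)), ("cancer", ("Y",)),
            ("friends", ("X", "Y"))]

def connected(conj):
    # every literal must share a variable with some other literal
    vars_per = [set(args) for _, args in conj]
    return all(v & set.union(*(w for j, w in enumerate(vars_per) if j != i))
               for i, v in enumerate(vars_per))

def true_in_K(conj):
    # existentially quantified: some grounding makes every literal a fact
    variables = sorted({a for _, args in conj for a in args})
    for vals in product(consts, repeat=len(variables)):
        sub = dict(zip(variables, vals))
        if all((p, tuple(sub[a] for a in args)) in K for p, args in conj):
            return True
    return False

def canonical(conj):
    # rename variables by order of first appearance to merge variants
    names = {}
    for _, args in conj:
        for a in args:
            names.setdefault(a, "V%d" % len(names))
    return tuple(sorted((p, tuple(names[a] for a in args)) for p, args in conj))

L = {canonical(c) for c in combinations(literals, 2)
     if connected(c) and true_in_K(c)}
```

This recovers the four length-2 formulas of Example 1 (up to variable renaming); allowing conjunctions of length 3 would similarly recover the remaining three formulas.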
Let H be a set of predicates that do not occur in Σ; these predicates are called hidden predicates. Let D be a set of sentences with the following properties: D contains exactly one definition for each predicate in H, and each definition is of the form h(V1,...,Vn) ⇔ φ, where φ is a sentence built using Σ and h is a predicate symbol from H. Let K_H contain the truth assignments to the hidden predicates, given their corresponding definitions D and the constants C.
Definition 1.
Hidden representation. Let B_H be a language bias over the vocabulary consisting of H and C. A hidden representation L_H of K is a set of sentences in B_H that are true in K_H.
Example 2. Assume a set of hidden predicates H = {h1, h2, h3, h4} with their definitions D built from formulas in L. Limiting B_H to existentially quantified conjunctive formulas of maximal length 2, L_H contains two unary hidden atoms, two binary hidden atoms, and six connected conjunctions of two hidden atoms.
Definition 2.
Relational autoencoder. A relational autoencoder is a program that, given a logical theory L, constructs an encoder enc and a decoder dec, together with the hidden representation L_H.
Definition 3.
Relational encoder. A relational encoder enc is a function that maps a set of sentences in B to a set of sentences in B_H, given D.
Definition 4.
Relational decoder. A relational decoder dec is a function that maps a hidden representation L_H to a set of sentences in B.
Definition 5.
Theory reconstruction. The theory reconstruction task is then defined as learning enc, dec and L_H such that

(enc, dec, L_H) = argmin d(L, dec(enc(L))) + r(L_H),

where d is a difference measure between two logical theories, and r measures the quality of the hidden representation according to a specified criterion, such as sparsity or compression.
The main role of r is to prevent the identity mapping between L and L_H which, though lossless, would be useless. In contrast to the neural autoencoder, where one fixes the structure and learns only the weights, the proposed framework learns the structure itself while putting equal weights on all connections. One can further instantiate the hidden representation and repeat the procedure to obtain more complex predicates.
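A toy instantiation of the reconstruction objective of Definition 5 may help: here sets of formula identifiers stand in for logical theories, each candidate hidden predicate decodes to the subset of original formulas its definition can rebuild, the difference measure is the symmetric-difference size, and the regularizer counts hidden predicates (favoring sparsity). All names and candidate sets are illustrative assumptions.

```python
from itertools import chain, combinations

# Toy theory reconstruction: pick hidden predicates whose decoded union
# best reconstructs the original theory, penalizing large encodings.
L = {"f1", "f2", "f3", "f4", "f5"}   # the original theory (as identifiers)
candidates = {                        # candidate hidden predicates
    "c1": {"f1", "f2"},
    "c2": {"f2", "f3", "f4"},
    "c3": {"f4", "f5"},
    "c4": {"f1", "f2", "f3"},
}

def objective(hidden):
    # decode: union of everything the chosen hidden predicates rebuild
    decoded = set().union(*(candidates[h] for h in hidden))
    d = len(L ^ decoded)   # difference measure: symmetric difference
    r = len(hidden)        # regularizer: prefer few hidden predicates
    return d + r

# exhaustively search all subsets of candidates (feasible for a toy pool)
subsets = chain.from_iterable(combinations(candidates, k)
                              for k in range(len(candidates) + 1))
best = min(subsets, key=objective)
```

Replacing the exhaustive search with a smarter search procedure, and the set operations with a proper difference measure between theories, recovers the general task.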
Example 3. Assuming L from Ex. 1, and H and D from Ex. 2, dec can perfectly reconstruct L from a subset of L_H, one hidden formula per formula of L in the same order as in Ex. 1, by substituting the predicates in L_H with their definitions. In contrast to L, the representation using hidden predicates is more concise, since it requires fewer atoms per formula to represent exactly the same knowledge. In practice one hopes that K_H would also contain fewer facts than the original K.
This formulation encapsulates both predicate invention and theory revision; in the latter case, L is substituted with the formulas of an existing model. Moreover, the formulation can be further extended to account for uncertainty by means of weighted reconstruction, where the weights can reflect the probability of a formula being true in the data, similar to many SRL approaches.
This formulation intends to establish a common ground between relational and deep learning, and to start a discussion towards defining a framework for predicate invention.
References
 [Bengio et al.2007] Bengio, Y.; Lamblin, P.; Popovici, D.; and Larochelle, H. 2007. Greedy layer-wise training of deep networks. In Schölkopf, B.; Platt, J. C.; and Hoffman, T., eds., Advances in Neural Information Processing Systems 19. MIT Press. 153–160.
 [Bengio2009] Bengio, Y. 2009. Learning deep architectures for AI. Found. Trends Mach. Learn. 2(1):1–127.
 [Craven and Slattery2001] Craven, M., and Slattery, S. 2001. Relational learning with statistical predicate invention: Better models for hypertext. Mach. Learn. 43(1-2):97–119.
 [Davis et al.2007] Davis, J.; Ong, I. M.; Struyf, J.; Burnside, E. S.; Page, D.; and Costa, V. S. 2007. Change of representation for statistical relational learning. In Veloso, M. M., ed., IJCAI, 2719–2726.

 [Gregor and LeCun2010] Gregor, K., and LeCun, Y. 2010. Learning fast approximations of sparse coding. In Proceedings of the 27th International Conference on Machine Learning (ICML-10), June 21-24, 2010, Haifa, Israel, 399–406.
 [Kok and Domingos2007] Kok, S., and Domingos, P. 2007. Statistical predicate invention. In Ghahramani, Z., ed., Proceedings of the 24th Annual International Conference on Machine Learning (ICML 2007), 433–440.
 [Lee et al.2007] Lee, H.; Battle, A.; Raina, R.; and Ng, A. Y. 2007. Efficient sparse coding algorithms. In Schölkopf, B.; Platt, J. C.; and Hoffman, T., eds., Advances in Neural Information Processing Systems 19. MIT Press. 801–808.
 [Muggleton and Buntine1988] Muggleton, S., and Buntine, W. L. 1988. Machine invention of first-order predicates by inverting resolution. In Laird, J., ed., Proceedings of the 5th International Conference on Machine Learning (ICML'88), 339–352. Morgan Kaufmann.
 [Muggleton and Lin2013] Muggleton, S., and Lin, D. 2013. Meta-interpretive learning of higher-order dyadic datalog: Predicate invention revisited.
 [Perlich and Provost2003] Perlich, C., and Provost, F. 2003. Aggregation-based feature invention and relational concept classes. In Proceedings of the Ninth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD '03, 167–176. New York, NY, USA: ACM.
 [Popescul and Ungar2004] Popescul, A., and Ungar, L. H. 2004. Cluster-based concept invention for statistical relational learning. In Proceedings of the Tenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD '04, 665–670. New York, NY, USA: ACM.
 [Silverstein and Pazzani1991] Silverstein, G., and Pazzani, M. J. 1991. Relational clichés: Constraining constructive induction during relational learning. In Proceedings of the Eighth International Workshop on Machine Learning, 203–207. Morgan Kaufmann.
 [Srinivasan, Muggleton, and Bain1992] Srinivasan, A.; Muggleton, S.; and Bain, M. 1992. Distinguishing exceptions from noise in non-monotonic learning. In Proceedings of the 2nd International Workshop on Inductive Logic Programming, 97–107.

 [Tieleman2008] Tieleman, T. 2008. Training restricted Boltzmann machines using approximations to the likelihood gradient. In Proceedings of the 25th International Conference on Machine Learning, ICML '08, 1064–1071. New York, NY, USA: ACM.
 [Vincent et al.2008] Vincent, P.; Larochelle, H.; Bengio, Y.; and Manzagol, P.-A. 2008. Extracting and composing robust features with denoising autoencoders. In Proceedings of the 25th International Conference on Machine Learning, ICML '08, 1096–1103. New York, NY, USA: ACM.
 [Vincent et al.2010] Vincent, P.; Larochelle, H.; Lajoie, I.; Bengio, Y.; and Manzagol, P.-A. 2010. Stacked denoising autoencoders: Learning useful representations in a deep network with a local denoising criterion. J. Mach. Learn. Res. 11:3371–3408.

 [Wang, Mazaitis, and Cohen2015] Wang, W. Y.; Mazaitis, K.; and Cohen, W. W. 2015. A soft version of predicate invention based on structured sparsity. In Proceedings of the Twenty-Fourth International Joint Conference on Artificial Intelligence, IJCAI 2015, Buenos Aires, Argentina, July 25-31, 2015, 3918–3924.
 [Wogulis and Langley1989] Wogulis, J., and Langley, P. 1989. Improving efficiency by learning intermediate concepts. In Proceedings of the 11th International Joint Conference on Artificial Intelligence - Volume 1, IJCAI'89, 657–662. San Francisco, CA, USA: Morgan Kaufmann Publishers Inc.