Theory reconstruction: a representation learning view on predicate invention

06/28/2016 · by Sebastijan Dumančić, et al.

With this position paper we present a representation learning view on predicate invention. The intention of this proposal is to bridge the relational and deep learning communities on the problem of predicate invention. We propose theory reconstruction, a formalism that extends the autoencoder approach to representation learning to the relational setting. Our intention is to start a discussion towards a unifying framework for predicate invention and theory revision.

Motivation

Predicate invention has been one of the core challenges of Inductive Logic Programming (ILP) and Statistical Relational Learning (SRL) since their beginnings. The task is to extend the initial vocabulary that is given to a relational learner by discovering novel concepts and relations from data. The discovered concepts should be explained in terms of the observable ones. The invented predicates can also be statistical if the uncertainty in the discovered predicates is represented explicitly.

The benefits of predicate invention are numerous. Firstly, it can produce more compact and comprehensible models by capturing dependencies between observed predicates, which consequently yields fewer parameters and reduces the risk of overfitting. Secondly, as each invented predicate can later be re-used, it allows a learner to take larger steps through the search space. Finally, invented predicates can represent latent states of a data-generating process, potentially increasing the performance of a model.

The progress so far, however, has been limited. We argue here that one of the main obstacles is the lack of a framework that formalizes the problem. The existing work is a collection of individual approaches harnessing different ideas, either greedily searching for new predicates that improve classification accuracy or inventing predicates that compress a complete logical program (known as theory revision).

In this work, we propose a unifying framework for statistical predicate invention and theory revision. Our proposal departs from the conventional approaches and addresses the problem from the perspective of unsupervised representation learning [Bengio2009]. The main motivation for this proposal lies in the key ingredient of representation learning's success: representation learning methods have proved very effective at constructing many layers of features that can be re-used to address the final classification task. This closely resembles the idea behind predicate invention.

We argue here that the construction of layers of features can be seen as propositional predicate invention, where each hidden node, i.e., a new feature, can be seen as a binary variable dependent on the states of a subset of variables in the preceding layer. Furthermore, discovered hidden variables can later compose more complex dependencies throughout the layers of a deep model. Secondly, we argue that relational learners suffer from conceptually the same problem as the tasks successfully addressed by deep learning: that of high dimensionality. The connection comes from the interpretation that formulas in a relational model can be seen as boolean features of the model. Given a knowledge base and its vocabulary, the search space of possible formulas (or features) is huge even in domains with a small number of predicates, and is therefore high-dimensional. When learning the structure, learners aim to select a small subset of all possible formulas that are most relevant for the task.
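
To make the "formulas as boolean features" reading concrete, the following sketch turns a small knowledge base and a handful of candidate formulas into a boolean feature vector. It is only an illustration of the interpretation above, not an implementation from the paper; the tuple encoding of atoms and the helper holds are arbitrary, illustrative choices.

```python
# A toy illustration of the "formulas as boolean features" view: each candidate
# formula becomes one boolean feature whose value says whether the existentially
# quantified conjunction is true in the knowledge base. The tuple encoding of
# atoms and the helper `holds` are illustrative choices, not part of the paper.

from itertools import product

# Knowledge base as a set of ground atoms: (predicate, argument, ...)
kb = {("smokes", "john"), ("cancer", "john"),
      ("friends", "john", "jane"), ("smokes", "jane")}
constants = {"john", "jane"}

def holds(formula, kb, constants):
    """True iff some grounding of all literals in `formula` is contained in kb."""
    variables = sorted({a for _, *args in formula for a in args})
    for binding in product(constants, repeat=len(variables)):
        theta = dict(zip(variables, binding))
        if {(p, *[theta[a] for a in args]) for p, *args in formula} <= kb:
            return True
    return False

# Three candidate formulas: conjunctions of literals over variables X and Y.
formulas = [
    [("smokes", "X"), ("cancer", "X")],
    [("smokes", "X"), ("friends", "X", "Y"), ("smokes", "Y")],
    [("cancer", "X"), ("friends", "Y", "X")],
]

# The boolean feature vector of this knowledge base.
print([int(holds(f, kb, constants)) for f in formulas])   # [1, 1, 0]
```

Structure learning then amounts to selecting, from an enormous space of such formulas, a small subset whose feature values are most informative for the task at hand.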

We base the framework on a particular line of research within deep learning: autoencoders [Vincent et al.2010, Bengio et al.2007, Vincent et al.2008, Tieleman2008] and sparse coding methods [Lee et al.2007, Gregor and LeCun2010]. These approaches take a generative view on representation learning: they create a hidden representation able to re-generate the original data from a smaller set of features. Autoencoders achieve this by means of a neural network with a single hidden layer that is trained to reproduce its input, rather than labels, at its output. Sparse coding approaches, in contrast, discover a set of hidden vectors whose linear combinations reconstruct the original examples. Both approaches are instantiations of the encoder-decoder approach, in which one learns an encoder that maps the original data to a hidden representation, and a separate decoder that reconstructs the original data from the hidden representation. Both methods can further be stacked to obtain layers of features. Deep models built in this manner have proven effective at extracting useful features in a completely unsupervised way, and have been successfully applied to text and image recognition.
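
For readers less familiar with these models, the numpy sketch below shows the encoder-decoder idea in its simplest form: a single hidden layer trained so that the decoder reproduces the input at the output. The data, dimensions and learning rate are arbitrary, purely illustrative choices.

```python
# A minimal numpy sketch of the encoder-decoder idea: a one-hidden-layer
# autoencoder trained to reproduce its input at its output.

import numpy as np

rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(64, 8)).astype(float)     # 64 binary examples, 8 features

n_in, n_hidden = X.shape[1], 3
W_enc = rng.normal(scale=0.1, size=(n_in, n_hidden))    # encoder weights
W_dec = rng.normal(scale=0.1, size=(n_hidden, n_in))    # decoder weights

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

lr = 0.5
for _ in range(2000):
    H = sigmoid(X @ W_enc)        # encoder: hidden representation of the data
    X_hat = sigmoid(H @ W_dec)    # decoder: reconstruction of the input
    # gradient of the mean squared reconstruction error, backpropagated
    d_out = (X_hat - X) * X_hat * (1 - X_hat)
    d_hid = (d_out @ W_dec.T) * H * (1 - H)
    W_dec -= lr * H.T @ d_out / len(X)
    W_enc -= lr * X.T @ d_hid / len(X)

print("reconstruction error:", np.mean((X_hat - X) ** 2))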

With this proposal, we intend to contribute towards bridging the relational and deep learning communities on the problem of predicate invention. The main underlying idea is to encode the provided set of features into a new set of latent features that could reconstruct the majority of the original features.

State of the Art

Within ILP research, predicates can be invented by analyzing first-order formulas and forming a predicate to represent either their commonalities [Wogulis and Langley1989] or their differences [Muggleton and Buntine1988]. A weakness of such approaches is that they are prone to over-generating predicates, many of them not useful. Predicates can also be invented by instantiating second-order templates [Silverstein and Pazzani1991], or introduced to represent exceptions to learned rules [Srinivasan, Muggleton, and Bain1992]. More recently, Muggleton and Lin [Muggleton and Lin2013] introduced a meta-interpretive perspective on predicate invention.

Within SRL research, Popescul and Ungar [Popescul and Ungar2004] apply k-means clustering to the objects of each type in a domain, create predicates to represent the clusters, and learn relations among them. Perlich and Provost [Perlich and Provost2003] present a number of approaches for aggregating multi-relational data. Craven and Slattery [Craven and Slattery2001] propose a learning mechanism for hypertext domains in which class predictions produced by naive Bayes are added to an ILP system (FOIL) as invented predicates. Davis et al. [Davis et al.2007] learn Horn clauses with an off-the-shelf ILP system, create a predicate for each clause learned, and add it as a feature to the database. Kok and Domingos [Kok and Domingos2007] cluster both predicate and constant symbols to create new predicates. Wang, Mazaitis, and Cohen [Wang, Mazaitis, and Cohen2015] capture differences between similar formulas and represent them with a new predicate.

New view: Theory reconstruction

Our proposal is based on the encoder-decoder architecture, utilized by many representation learning approaches. An essential difference between our approach and previously proposed ones is that the auto-encoder does not try to encode the knowledge base, but only the patterns that occur in it. Furthermore, in contrast to the majority of the previous approaches, our approach is entirely unsupervised.

Let K be a knowledge base with a vocabulary V = (P, C), where P is a set of predicates and C a set of constants. A sentence in first-order logic is a formula with no free variables. Let B be a set of sentences called the language bias (typically specified using syntactic constraints, e.g., all conjunctive formulas containing at most 3 literals and at most 2, existentially quantified, variables). Given a knowledge base K, let F be the set of all sentences in B that are true in K.

As enumerating all formulas satisfiable w.r.t. K would be infeasible, B represents a trade-off between expressivity and efficiency. Therefore, B plays an important role and significantly influences the kind of predicates that will be invented.

Example 1. Assume P = {smokes/1, cancer/1, friends/2}, C = {jane, john} and K = {smokes(john), cancer(john), friends(john,jane), smokes(jane)}. Let B contain all existentially quantified conjunctive formulas with connected variables, with each term being a variable, and length in the range (2,3). Then F is {smokes(X),cancer(X); smokes(X),friends(X,Y); smokes(Y),friends(X,Y); cancer(X),friends(X,Y); smokes(X),friends(X,Y),smokes(Y); cancer(X),friends(X,Y),smokes(Y); smokes(X),cancer(X),friends(X,Y)}.
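
A minimal sketch of how F in Example 1 could be computed: candidate formulas allowed by the bias B are hand-enumerated below (automatic generation from B is elided), and each is kept only if some grounding of it is contained in K. The helper true_in is an illustrative choice, not part of the formalism.

```python
# Sketch: filter hand-enumerated candidate formulas, keeping those true in K.

from itertools import product

kb = {("smokes", "john"), ("cancer", "john"),
      ("friends", "john", "jane"), ("smokes", "jane")}
constants = {"john", "jane"}

def true_in(formula, kb, constants):
    """Existentially quantified conjunction: some grounding lies entirely in kb."""
    variables = sorted({a for _, *args in formula for a in args})
    return any(
        {(p, *[dict(zip(variables, b))[a] for a in args]) for p, *args in formula} <= kb
        for b in product(constants, repeat=len(variables))
    )

candidates = [   # conjunctions of length 2-3 with connected variables, up to renaming
    [("smokes", "X"), ("cancer", "X")],
    [("smokes", "X"), ("friends", "X", "Y")],
    [("smokes", "Y"), ("friends", "X", "Y")],
    [("cancer", "X"), ("friends", "X", "Y")],
    [("cancer", "Y"), ("friends", "X", "Y")],
    [("smokes", "X"), ("friends", "X", "Y"), ("smokes", "Y")],
    [("cancer", "X"), ("friends", "X", "Y"), ("smokes", "Y")],
    [("smokes", "X"), ("cancer", "X"), ("friends", "X", "Y")],
]

F = [f for f in candidates if true_in(f, kb, constants)]
for f in F:
    print(f)
```

In this toy case the check removes only the candidate cancer(Y),friends(X,Y), leaving exactly the seven formulas listed in Example 1.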

Let P' be a set of predicates that do not occur in P; these predicates are called hidden predicates. Let D be a set of sentences with the following properties: D contains exactly one definition for each predicate in P', and each definition is of the form f ≡ p, where f is a sentence built using V and p is a predicate symbol from P'. Let K' contain the truth assignments to the predicates in P' given their corresponding definitions D and the constants C.

Definition 1.

Hidden representation. Let B' be a language bias over the vocabulary (P', C). A hidden representation F' of K is the set of sentences in B' that are true in K'.

Example 2. Assume P' = {p1/1, p2/2, p3/2, p4/2} and D = {smokes(X) ∧ cancer(X) ≡ p1(X), smokes(X) ∧ friends(X,Y) ≡ p2(X,Y), smokes(Y) ∧ friends(X,Y) ≡ p3(X,Y), cancer(X) ∧ friends(X,Y) ≡ p4(X,Y)}. Limiting B' to existentially quantified conjunctive formulas of maximal length 2, F' is {p1(X); p2(X,Y); p3(X,Y); p4(X,Y); p2(X,Y),p3(X,Y); p4(X,Y),p3(X,Y); p2(X,Y),p1(X); p2(X,Y),p4(X,Y); p3(X,Y),p1(X); p4(X,Y),p1(X)}.
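
The following sketch builds the hidden knowledge base K' of Example 2: every definition in D is grounded over the constants, and a ground hidden atom is added whenever the corresponding body holds in K. The predicate names p1-p4 follow the example above; the code additionally assumes, as in this example, that every body variable also occurs in the head.

```python
# Sketch: construct K' by grounding the definitions D against the observed K.

from itertools import product

kb = {("smokes", "john"), ("cancer", "john"),
      ("friends", "john", "jane"), ("smokes", "jane")}
constants = ["john", "jane"]

# D: each hidden predicate is defined by a conjunction over the observed vocabulary.
definitions = {
    ("p1", ("X",)):     [("smokes", "X"), ("cancer", "X")],
    ("p2", ("X", "Y")): [("smokes", "X"), ("friends", "X", "Y")],
    ("p3", ("X", "Y")): [("smokes", "Y"), ("friends", "X", "Y")],
    ("p4", ("X", "Y")): [("cancer", "X"), ("friends", "X", "Y")],
}

hidden_kb = set()
for (name, head_vars), body in definitions.items():
    for binding in product(constants, repeat=len(head_vars)):
        theta = dict(zip(head_vars, binding))
        if {(p, *[theta[a] for a in args]) for p, *args in body} <= kb:
            hidden_kb.add((name, *binding))

print(sorted(hidden_kb))
# [('p1', 'john'), ('p2', 'john', 'jane'), ('p3', 'john', 'jane'), ('p4', 'john', 'jane')]
```

F' can then be read off over the hidden vocabulary exactly as F was computed over the observed one, only now against hidden_kb.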

Definition 2.

Relational auto-encoder. A relational auto-encoder is a program that, given a logical theory F, constructs an encoder enc and a decoder dec, together with the definitions D.

Definition 3.

Relational encoder. A relational encoder enc is a function that maps a set of sentences in B to a set of sentences in B', given the definitions D.

Definition 4.

Relational decoder. A relational decoder dec is a function that maps a hidden representation F' to a set of sentences in B.

Definition 5.

Theory reconstruction. The theory reconstruction task is then defined as learning the triple (enc, dec, D) such that

    (enc, dec, D) = argmin d(F, dec(enc(F))) + Ω(enc(F)),

where d is a difference measure between two logical theories, and Ω measures the quality of the hidden representation according to a specified criterion, such as sparsity, compression or others.

The main role of Ω is to prevent the identity mapping between F and F', which, though lossless, would be a useless one. In contrast to the neural auto-encoder, where one fixes the structure and learns just the weights, the proposed framework learns the structure itself while putting equal weights on all connections. One can further instantiate the hidden representation and repeat the procedure to obtain more complex predicates.

Example 3. Assuming F from Ex. 1 and the D and F' from Ex. 2, dec can then perfectly reconstruct F using a subset of F' and substituting the predicates in F' with their definitions: {p1(X); p2(X,Y); p3(X,Y); p4(X,Y); p2(X,Y),p3(X,Y); p4(X,Y),p3(X,Y); p2(X,Y),p1(X)}, in the same order as in Ex. 1. In contrast to F, the representation using hidden predicates is more concise, since it requires fewer atoms per formula to represent exactly the same knowledge. In practice one hopes that K' would also contain fewer facts than the original K.
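
A sketch of the decoding step of Example 3, under the p1-p4 definitions used above: every hidden literal is replaced by the body of its definition, with the definition's head variables renamed to the literal's arguments. This is illustrative code, not the paper's implementation.

```python
# Sketch: decode formulas over hidden predicates back into the observed vocabulary.

definitions = {
    "p1": (("X",),     [("smokes", "X"), ("cancer", "X")]),
    "p2": (("X", "Y"), [("smokes", "X"), ("friends", "X", "Y")]),
    "p3": (("X", "Y"), [("smokes", "Y"), ("friends", "X", "Y")]),
    "p4": (("X", "Y"), [("cancer", "X"), ("friends", "X", "Y")]),
}

def decode(hidden_formula):
    """Substitute every hidden literal with its (renamed) definition body."""
    literals = set()
    for name, *args in hidden_formula:
        head_vars, body = definitions[name]
        theta = dict(zip(head_vars, args))
        literals |= {(p, *[theta.get(a, a) for a in b_args]) for p, *b_args in body}
    return literals

# The subset of F' used for reconstruction, in the same order as the formulas in Ex. 1.
code = [
    [("p1", "X")],
    [("p2", "X", "Y")],
    [("p3", "X", "Y")],
    [("p4", "X", "Y")],
    [("p2", "X", "Y"), ("p3", "X", "Y")],
    [("p4", "X", "Y"), ("p3", "X", "Y")],
    [("p2", "X", "Y"), ("p1", "X")],
]

for hf in code:
    print(sorted(decode(hf)))
```

A simple instantiation of the difference measure d could then, for instance, count the formulas of F that are not recovered by decoding any formula of F'.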

This formulation encapsulates both predicate invention and theory revision; in the latter case, F is substituted with the set of formulas of an existing model M. Moreover, the formulation can be further extended to account for uncertainty by means of weighted reconstruction, where the weights can reflect the probability of a formula being true in the data, similar to many SRL approaches.

This formulation intends to establish a common ground between relational and deep learning, and start a discussion to define a framework for predicate invention.

References

  • [Bengio et al.2007] Bengio, Y.; Lamblin, P.; Popovici, D.; and Larochelle, H. 2007. Greedy layer-wise training of deep networks. In Schölkopf, B.; Platt, J. C.; and Hoffman, T., eds., Advances in Neural Information Processing Systems 19. MIT Press. 153–160.
  • [Bengio2009] Bengio, Y. 2009. Learning deep architectures for AI. Found. Trends Mach. Learn. 2(1):1–127.
  • [Craven and Slattery2001] Craven, M., and Slattery, S. 2001. Relational learning with statistical predicate invention: Better models for hypertext. Mach. Learn. 43(1-2):97–119.
  • [Davis et al.2007] Davis, J.; Ong, I. M.; Struyf, J.; Burnside, E. S.; Page, D.; and Costa, V. S. 2007. Change of representation for statistical relational learning. In Veloso, M. M., ed., IJCAI, 2719–2726.
  • [Gregor and LeCun2010] Gregor, K., and LeCun, Y. 2010. Learning fast approximations of sparse coding. In Proceedings of the 27th International Conference on Machine Learning (ICML-10), June 21-24, 2010, Haifa, Israel, 399–406.
  • [Kok and Domingos2007] Kok, S., and Domingos, P. 2007. Statistical predicate invention. In Ghahramani, Z., ed., Proceedings of the 24th International Conference on Machine Learning (ICML 2007), 433–440.
  • [Lee et al.2007] Lee, H.; Battle, A.; Raina, R.; and Ng, A. Y. 2007. Efficient sparse coding algorithms. In Schölkopf, B.; Platt, J. C.; and Hoffman, T., eds., Advances in Neural Information Processing Systems 19. MIT Press. 801–808.
  • [Muggleton and Buntine1988] Muggleton, S., and Buntine, W. L. 1988. Machine invention of first-order predicates by inverting resolution. In Laird, J., ed., Proceedings of the 5th International Conference on Machine Learning (ICML’88), 339–352. Morgan Kaufmann.
  • [Muggleton and Lin2013] Muggleton, S., and Lin, D. 2013. Meta-interpretive learning of higher-order dyadic datalog: Predicate invention revisited.
  • [Perlich and Provost2003] Perlich, C., and Provost, F. 2003. Aggregation-based feature invention and relational concept classes. In Proceedings of the Ninth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD ’03, 167–176. New York, NY, USA: ACM.
  • [Popescul and Ungar2004] Popescul, A., and Ungar, L. H. 2004. Cluster-based concept invention for statistical relational learning. In Proceedings of the Tenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD ’04, 665–670. New York, NY, USA: ACM.
  • [Silverstein and Pazzani1991] Silverstein, G., and Pazzani, M. J. 1991. Relational clichés: Constraining constructive induction during relational learning. In Proceedings of the Eighth International Workshop on Machine Learning, 203–207. Morgan Kaufmann.
  • [Srinivasan, Muggleton, and Bain1992] Srinivasan, A.; Muggleton, S.; and Bain, M. 1992. Distinguishing exceptions from noise in non-monotonic learning. In Proceedings of the 2nd International Workshop on Inductive Logic Programming, 97–107.
  • [Tieleman2008] Tieleman, T. 2008. Training restricted Boltzmann machines using approximations to the likelihood gradient. In Proceedings of the 25th International Conference on Machine Learning, ICML '08, 1064–1071. New York, NY, USA: ACM.
  • [Vincent et al.2008] Vincent, P.; Larochelle, H.; Bengio, Y.; and Manzagol, P.-A. 2008. Extracting and composing robust features with denoising autoencoders. In Proceedings of the 25th International Conference on Machine Learning, ICML '08, 1096–1103. New York, NY, USA: ACM.
  • [Vincent et al.2010] Vincent, P.; Larochelle, H.; Lajoie, I.; Bengio, Y.; and Manzagol, P.-A. 2010. Stacked denoising autoencoders: Learning useful representations in a deep network with a local denoising criterion. J. Mach. Learn. Res. 11:3371–3408.
  • [Wang, Mazaitis, and Cohen2015] Wang, W. Y.; Mazaitis, K.; and Cohen, W. W. 2015. A soft version of predicate invention based on structured sparsity. In Proceedings of the Twenty-Fourth International Joint Conference on Artificial Intelligence, IJCAI 2015, Buenos Aires, Argentina, July 25-31, 2015, 3918–3924.
  • [Wogulis and Langley1989] Wogulis, J., and Langley, P. 1989. Improving efficiency by learning intermediate concepts. In Proceedings of the 11th International Joint Conference on Artificial Intelligence - Volume 1, IJCAI’89, 657–662. San Francisco, CA, USA: Morgan Kaufmann Publishers Inc.