1 Introduction
Statistical relational learning (SRL) methods combine probabilistic reasoning with knowledge representations that capture the structure in problem domains. Markov logic networks (MLNs) [Richardson and Domingos 2006] and probabilistic soft logic (PSL) [Bach et al. 2017] are notable SRL frameworks that define model structure with weighted first-order logic. However, specifying logical clauses for each problem is laborious and requires domain knowledge. The task of discovering these weighted clauses from data is referred to as structure learning, and has been well-studied for MLNs [Kok and Domingos 2005; 2009; 2010; Mihalkova and Mooney 2007; Biba, Ferilli, and Esposito 2008; Huynh and Mooney 2008; Khosravi et al. 2010; Khot et al. 2015]. The extensive related work for MLNs underscores the importance of structure learning for SRL.
Structure learning approaches alleviate the cost of model discovery. However, they face several critical computational challenges. First, even when the model space is restricted to be finite, selecting a model requires a combinatorial search. Second, heuristic approaches that iteratively refine and grow a set of rules require interleaving several costly rounds of parameter estimation and scoring. Finally, scoring the model often involves computing the model likelihood, which is typically intractable to evaluate exactly.
Structure learning approaches for MLNs vary in the degree to which they address these scalability challenges. An efficient and extensible class of MLN structure learning algorithms adopts a bottom-up strategy, mining patterns and motifs from training data to generate informative clauses [Mihalkova and Mooney 2007; Kok and Domingos 2009; 2010]. The data-driven heuristics reduce the search space to useful clauses but still interleave rounds of parameter estimation and scoring, which is expensive for SRL methods.
Motivated by the success of structure learning for MLNs, in this paper, we formalize the structure learning problem for PSL. We extend the data-driven approach to generating clauses and propose two contrasting PSL structure learning methods that differ in scalability and choice of approximations. We build on path-constrained relational random walk methods [Lao and Cohen 2010; Gardner et al. 2013] to generate clauses that capture patterns in the data. To find the best set of clauses, we introduce a greedy search-based algorithm and an optimization method that uses a piecewise pseudolikelihood (PPLL) objective function. PPLL decomposes the search over clauses into a single optimization over clause weights that is solved with an efficient parallel algorithm. Our proposed PPLL approach addresses the scalability challenges of structure learning and its formulation can be easily extended to other SRL techniques, including MLNs. In this paper, our key technical contributions are to:

formulate path-constrained clause generation that efficiently finds relational patterns in the data.

propose greedy search and PPLL methods that select the best path-constrained clauses by trading off scalability and approximations for structure learning.

validate the predictive performance and runtimes of both methods with real-world tasks in biological paper recommendation, drug interaction prediction and knowledge base completion.
We compare both proposed PSL structure learning methods and show that our novel PPLL method achieves an order of magnitude runtime speedup and AUC improvements of up to 15% over the greedy search method.
2 Background
We briefly review structure learning for statistical relational learning (SRL) and probabilistic soft logic (PSL), the framework for which we propose structure learning approaches.
2.1 Structure Learning for SRL
Our work focuses on SRL methods such as MLNs and PSL that encode dependencies with first-order logic. Below, we formalize the joint distributions defined using logical clauses before outlining structure learning for these methods.
An atom consists of a predicate (e.g. Works, Lives) over constants (e.g. Alice, Bob) or variables (e.g. $A$, $B$). An atom whose predicate arguments are all constants is a ground atom. A literal is an atom or its negation. A clause is a formula $l_1 \vee \dots \vee l_k$ where $l_1, \dots, l_k$ are literals. Given clauses $C_1, \dots, C_m$ and real-valued weights $w_1, \dots, w_m$, a model $\mathcal{M} = \{(C_i, w_i)\}_{i=1}^{m}$ is a set of clause and weight pairs.
Given constants from a domain, we substitute the variables appearing in literals with these constants to obtain a set of ground clauses $G_i$ for each clause $C_i$. The corresponding set of ground atoms is $\mathbf{X} = \{X_1, \dots, X_n\}$, where each $X_j$ is a random variable with assignments $\mathbf{x}$. The model defines a distribution over $\mathbf{X}$ as:

$$P(\mathbf{X} = \mathbf{x}) = \frac{1}{Z} \exp\Big( -\sum_{i=1}^{m} w_i \sum_{g \in G_i} \phi_g(\mathbf{x}) \Big) \qquad (1)$$

Each $\phi_g$ instantiated from a clause $C_i$ is a function over assignments to $\mathbf{X}$ that returns 0 if $g$ is satisfied by the values $\mathbf{x}$ and 1 otherwise. Intuitively, assignments that satisfy more ground clauses are exponentially more probable.
The problem of structure learning finds the model which best fits a set of observed assignments $\hat{\mathbf{x}}$, regularized by model complexity. We denote the set of possible clauses as the language $\mathcal{L}$. Although $\mathcal{L}$ can be infinite, it is standard to impose restrictions that make $\mathcal{L}$ finite for structure learning. Formally, the structure learning problem finds the clauses $\Lambda \subseteq \mathcal{L}$ and weights $\mathbf{w}$ that maximize a regularized log likelihood function given observed assignments:

$$\max_{\Lambda \subseteq \mathcal{L},\; \mathbf{w}} \; \log P(\mathbf{X} = \hat{\mathbf{x}};\, \Lambda, \mathbf{w}) + \log P(\mathbf{w}, \Lambda) \qquad (2)$$

where $P(\mathbf{w}, \Lambda)$ represents priors on the weights and structure. Typical choices for the prior combine a Gaussian prior on weights and an exponential prior on clause length.
The log likelihood requires an exponential sum to compute, and the optimization combines a combinatorial search over subsets $\Lambda \subseteq \mathcal{L}$ with a maximization over continuous weights (called weight learning). Consequently, solving structure learning requires further approximations to search and scoring. Approaches to structure learning broadly interleave two key components: clause generation and model evaluation, or scoring. The clause generation phase produces a candidate language over which to search. In practice, the candidate language is a subset of all possible clauses, chosen to restrict the search to useful regions of the space. Model evaluation typically iteratively refines the existing model by learning and scoring candidate clauses using approximations to the likelihood.
2.2 Probabilistic Soft Logic
Probabilistic soft logic (PSL) is an SRL framework that defines hinge-loss Markov random fields (HL-MRFs), a special class of the undirected graphical model given by Equation 1. HL-MRFs are conditional distributions over real-valued atom assignments in $[0, 1]$ and apply a continuous relaxation of Boolean logic to the ground clauses to derive penalty functions $\phi_g$ of the form:

$$\phi_g(\mathbf{x}) = \Big( \max\Big\{ 1 - \sum_{j \in g^{+}} x_j - \sum_{j \in g^{-}} (1 - x_j),\; 0 \Big\} \Big)^{p}, \quad p \in \{1, 2\} \qquad (3)$$

where $g^{+}$ and $g^{-}$ denote the sets of non-negated and negated ground atoms in the clause. In contrast to ground Boolean clauses that are either satisfied or violated, a ground clause in soft logic returns a continuous distance to satisfaction. Intuitively, $\phi_g$ corresponds to a linear or quadratic penalty for violating clause $g$.
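As a concrete illustration, the soft-logic penalty in Equation 3 can be computed directly from the literal truth values. The following sketch (the function name and interface are our own illustrative choices, not PSL's API) evaluates the distance to satisfaction of one ground disjunctive clause:

```python
def distance_to_satisfaction(pos_vals, neg_vals, p=1):
    """Soft-logic distance to satisfaction of a ground disjunctive clause.

    pos_vals: truth values in [0, 1] of the non-negated literals.
    neg_vals: truth values in [0, 1] of the negated literals.
    p: 1 for a linear penalty, 2 for a quadratic penalty.
    """
    # The clause is fully satisfied once the literal values sum to >= 1.
    d = 1.0 - sum(pos_vals) - sum(1.0 - v for v in neg_vals)
    return max(d, 0.0) ** p
```

For example, a clause whose only non-negated literal has value 1 incurs no penalty, while driving that literal to 0 yields the maximum penalty of 1.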
PSL defines distributions over the target variables for a particular task conditioned on the remaining evidence variables. Formally, given a set of target predicates $\mathcal{T}$, a PSL model consists of non-negative weights $w_i$ and disjunctive clauses $C_i$ in which the predicate of at least one literal belongs to $\mathcal{T}$. Given a set of target atoms $\mathbf{Y}$, where each $Y_j$ is a random variable, and a set of evidence atoms $\mathbf{X}$, where each $X_j$ is an observed variable, a PSL model defines an HL-MRF distribution of the form:

$$P(\mathbf{Y} = \mathbf{y} \mid \mathbf{x}) = \frac{1}{Z(\mathbf{x})} \exp\Big( -\sum_{i=1}^{m} w_i \sum_{g \in G_i} \phi_g(\mathbf{y}, \mathbf{x}) \Big) \qquad (4)$$
PSL has been successfully applied to many problems including natural language processing [Beltagy, Erk, and Mooney 2014], social media analysis [Johnson and Goldwasser 2016; Ebrahimi, Dou, and Lowd 2016] and information extraction [Platanios et al. 2017].

3 Structure Learning for PSL
Given target predicates $\mathcal{T}$, structure learning for PSL finds a model to infer the target atoms $\mathbf{Y}$. We denote the language space for PSL as $\mathcal{L}$, which is restricted to disjunctive clauses containing at least one target predicate literal. We again constrain $\mathcal{L}$ to be finite. To overcome the intractable likelihood score, pseudolikelihood [Besag 1975] is an approximation that is commonly used across SRL structure learning and weight learning methods. For HL-MRFs, the pseudolikelihood approximates the likelihood as:

$$P^{*}(\mathbf{Y} = \mathbf{y} \mid \mathbf{x}) = \prod_{i} \frac{1}{Z_i} \exp\Big( -\sum_{j} w_j \sum_{g \in G_j(y_i)} \phi_g(\mathbf{y}, \mathbf{x}) \Big) \qquad (5)$$

where $Z_i$ normalizes over the single variable $y_i$. The notation $G_j(y_i)$ selects the ground clauses of $C_j$ in which $y_i$ appears.
Given target predicates $\mathcal{T}$ and real-valued variable assignments $\hat{\mathbf{y}}$ and $\hat{\mathbf{x}}$, following the objective in Equation 2, structure learning for PSL maximizes the log pseudolikelihood:

$$\max_{\Lambda \subseteq \mathcal{L},\; \mathbf{w}} \; \log P^{*}(\mathbf{Y} = \hat{\mathbf{y}} \mid \hat{\mathbf{x}};\, \Lambda, \mathbf{w}) + \log P(\mathbf{w}, \Lambda) \qquad (6)$$

where the ground clauses are all those that can be instantiated from the clauses in $\Lambda$. In the next section, we propose two approaches to the structure learning problem for HL-MRFs that rely on an efficient clause generation algorithm.
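To make the per-variable terms in Equation 5 concrete, the following sketch approximates one conditional term of the pseudolikelihood by numerically integrating the one-dimensional partition function over $[0, 1]$. The `energy` callable and the grid-based midpoint rule are illustrative assumptions of ours, not the machinery used inside PSL:

```python
import math

def log_cond_density(energy, y_obs, grid=1000):
    """log P(y_i = y_obs | rest) for one pseudolikelihood factor.

    energy: callable giving the weighted sum of clause penalties as a
        function of y_i = v alone, with all other variables held fixed.
    y_obs: the observed assignment of y_i in [0, 1].
    The one-dimensional partition function Z_i is approximated with a
    midpoint rule on a uniform grid over [0, 1].
    """
    step = 1.0 / grid
    z = sum(math.exp(-energy((k + 0.5) * step)) * step for k in range(grid))
    return -energy(y_obs) - math.log(z)
```

For instance, a single clause of weight 2 with linear penalty $\phi(v) = 1 - v$ gives $Z_i = (1 - e^{-2})/2 \approx 0.432$, so the log conditional density at $y_i = 1$ is about $0.839$.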
4 Approaches to PSL Structure Learning
To formulate PSL structure learning algorithms, we introduce approaches for both key method components: clause generation and model evaluation. We outline an efficient algorithm for data-driven clause generation. For model evaluation over these clauses, we first propose a straightforward greedy local search algorithm (GLS). To improve upon the computationally expensive search-based approach, we introduce a novel optimization approach, piecewise pseudolikelihood (PPLL). PPLL unifies the efficient clause generation with a surrogate convex objective that can be optimized exactly and in parallel.
4.1 PathConstrained Clause Generation
The clause generation phase of structure learning outputs the language of first-order logic clauses over which to search. Driven by relational random walk methods used for information retrieval tasks [Lao, Mitchell, and Cohen 2011; Gardner et al. 2013], we formulate a special class of path-constrained clauses that capture relational patterns in the data. Path-constrained clause generation is also related to the preprocessing steps in bottom-up structure learning methods [Mihalkova and Mooney 2007; Kok and Domingos 2009; 2010]. Bottom-up methods typically use relational paths as heuristics to cluster predicates into templates and enumerate all clauses that contain predicate literals from the same template. The structure learning algorithm greedily selects from these clauses. Path-constrained clause generation also produces its candidate language prior to structure learning. Here, we use a breadth-first traversal algorithm which directly generates informative path-constrained clauses by variablizing relational paths in the data.
The inputs to path-constrained clause generation are the ground atoms of a domain, the set of all predicates and the target predicates. In this work, we consider predicates with arity of two, but our approach can be extended to support predicates of arity three and higher. We begin with a running example that illustrates the definitions below. Consider a ground atom set containing Cites(Paper1, Paper2), Mentions(Paper2, Gene) and Mentions(Paper1, Gene), with target predicate Mentions. In this simple example, all ground atoms have an assignment of 1. In general, real-valued assignments to atoms must be rounded to 0 or 1 during path-constrained clause generation.
A target relational path for a target atom $t$ is defined by an ordered list of ground atoms $[a_1, \dots, a_k]$ such that for each $a_j$ with $j < k$, its last argument is the first argument of $a_{j+1}$, and the final atom $a_k$ is the target atom $t$. Given a target relational path, the corresponding first-order path-constrained clause has the form $A_1 \wedge \dots \wedge A_{k-1} \rightarrow A_k$, where each argument is a logical variable and the $j$-th literal $A_j$ variablizes the $j$-th atom in the path. The negation of this clause is the clause with $A_k$, the target predicate literal, negated.
For the running example, given target relational path [Cites(Paper1, Paper2), Mentions(Paper2, Gene), Mentions(Paper1, Gene)], we obtain the first-order path-constrained clause Cites(A, B) ∧ Mentions(B, C) → Mentions(A, C).
We generate the set of all possible path-constrained clauses up to a maximum length by performing breadth-first search (BFS) up to that depth from the first argument of each target atom $t$. A connected BFS search tree for training example $t$ is rooted at the first argument of $t$, and one of its leaf nodes must be the last argument of $t$. Every non-leaf constant in the tree has child entities connected to it by ground atoms. For the running example, the connected BFS search tree of depth 2 for target atom Mentions(Paper1, Gene) is rooted at Paper1, which connects to Paper2 via Cites, which in turn connects to the leaf Gene via Mentions.
Given such a tree, each path from its root to a leaf node that ends at the target atom's last argument is a target relational path. For a target predicate, we build the set of connected BFS search trees corresponding to all of its target atoms. We enumerate all such paths from each tree and obtain the unique set of these paths. For each path, we form the corresponding path-constrained clause and its negation to obtain the candidate clause set. Moreover, we can further restrict the candidates to those clauses that connect target atoms, preferring clauses that cover, or explain, a minimum number of training examples. The language defined by these clauses guides the search over models that capture informative relational patterns in the data. Although this procedure produces only Horn clauses, and thus a subset of the full language [Kazemi and Poole 2018], path-constrained clauses have been successfully used in several relational learning tasks [Lao and Cohen 2010; Gardner et al. 2013]. While our path-constrained clause generation performs well in the tasks we study, more expressive strategies can be explored where needed.
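The construction above can be sketched for arity-two predicates as follows. The representation of atoms as `(predicate, arg1, arg2)` triples and the function name are our own illustrative choices; the sketch emits clause bodies and heads and omits the negated variants:

```python
from collections import defaultdict

def path_clauses(atoms, target, max_len=3):
    """Enumerate path-constrained clauses for one target atom (a sketch).

    atoms: iterable of (predicate, arg1, arg2) ground atoms.
    target: a (predicate, source, sink) target atom.
    Returns clauses as (body, head) pairs: body is a tuple of variablized
    literals chaining from the target's source constant to its sink, and
    head is the variablized target literal.
    """
    t_pred, src, sink = target
    graph = defaultdict(list)
    for pred, a, b in atoms:
        if (pred, a, b) != target:  # exclude the target edge itself
            graph[a].append((pred, b))
    clauses = []
    frontier = [(src, ())]  # (current constant, predicate path so far)
    for _ in range(max_len):  # BFS, one layer of depth per iteration
        nxt = []
        for node, path in frontier:
            for pred, child in graph[node]:
                if child == sink:
                    body = tuple((p, "V%d" % i, "V%d" % (i + 1))
                                 for i, p in enumerate(path + (pred,)))
                    clauses.append((body, (t_pred, "V0", "V%d" % len(body))))
                nxt.append((child, path + (pred,)))
        frontier = nxt
    return clauses
```

On the running example this recovers exactly the clause Cites(V0, V1) ∧ Mentions(V1, V2) → Mentions(V0, V2).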
4.2 Greedy Local Search
Given path-constrained clauses, exactly maximizing the pseudolikelihood objective given by Equation 5 requires evaluating all subsets of the candidate clauses, which is already infeasible with only 100 clauses. Instead, we propose an approximate greedy search algorithm that selects locally optimal clauses in each iteration to maximize pseudolikelihood.
Greedy local search (GLS) approximately maximizes the pseudolikelihood score. GLS iteratively picks the clause from the candidate language that maximizes the score and adds it to the model, until the score improves by less than a threshold $\epsilon$ or a maximum number of iterations $k$ has been reached. While GLS is straightforward to implement, it requires up to $k \cdot |\mathcal{L}|$ rounds of weight learning and model evaluation, where $|\mathcal{L}|$ denotes the size of the candidate language. As $|\mathcal{L}|$ grows, GLS becomes prohibitively expensive unless we sacrifice performance by increasing $\epsilon$ or decreasing $k$. To overcome the scalability pitfalls of GLS and search-based methods at large, we introduce a new structure learning objective that can be optimized efficiently and exactly.
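The GLS loop can be sketched as follows, with the expensive weight-learning-and-scoring step abstracted behind a `score` callable (an assumption for illustration; the toy test below substitutes a simple coverage score for the real pseudolikelihood):

```python
def greedy_local_search(candidates, score, epsilon=1e-3, max_iters=10):
    """Greedy structure search: repeatedly add the candidate clause that
    most improves the model score (a sketch of the GLS loop).

    score: callable taking a list of clauses and returning the score of
        the weight-learned model -- the expensive step in practice.
    """
    model = []
    best = score(model)
    remaining = list(candidates)
    for _ in range(max_iters):
        if not remaining:
            break
        # One costly weight-learning-and-scoring round per candidate.
        scored = [(score(model + [c]), c) for c in remaining]
        new_best, chosen = max(scored, key=lambda t: t[0])
        if new_best - best <= epsilon:  # no meaningful improvement
            break
        model.append(chosen)
        remaining.remove(chosen)
        best = new_best
    return model
```

Each outer iteration re-scores every remaining candidate, which is what makes GLS cost up to $k \cdot |\mathcal{L}|$ weight-learning rounds.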
4.3 Piecewise Pseudolikelihood
The partition function in pseudolikelihood involves an integration that couples all model clauses. Optimizing pseudolikelihood requires evaluating subsets of the language $\mathcal{L}$, necessitating greedy approximations to the combinatorial problem. To overcome this computational bottleneck, we propose a new, efficient-to-optimize objective function called piecewise pseudolikelihood (PPLL). Below, we derive two key results which have significant consequences for the scalability of structure learning: 1) with PPLL, structure learning is solved by performing weight learning once; and 2) the factorization used by PPLL admits an inherently parallelizable gradient-based algorithm for optimization.
PPLL was first proposed for weight learning in conditional random fields (CRFs) [Sutton and McCallum 2007]. For HL-MRFs, PPLL factorizes the joint conditional distribution along both random variables and clauses and is defined as:

$$P^{\mathrm{PPLL}}(\mathbf{Y} = \mathbf{y} \mid \mathbf{x}) = \prod_{j} \prod_{i} \frac{1}{Z_{ij}} \exp\Big( -w_j \sum_{g \in G_j(y_i)} \phi_g(\mathbf{y}, \mathbf{x}) \Big) \qquad (7)$$

The key advantage of PPLL over pseudolikelihood arises from the factorization of the partition function into per-clause, per-variable terms $Z_{ij}$, each of which requires only clause $C_j$ and variable $y_i$ for its computation.
Following standard convention for structure learning, we optimize the log of PPLL. We highlight a connection between PPLL and pseudolikelihood that is useful in deriving the two key scalability results of PPLL. The sum of log terms in PPLL corresponding to clause $C_j$ is the log pseudolikelihood of the model containing only clause $C_j$. We denote this $\log P^{*}_{j}$:

$$\log P^{*}_{j}(\mathbf{y} \mid \mathbf{x}) = \sum_{i} \Big( -w_j \sum_{g \in G_j(y_i)} \phi_g(\mathbf{y}, \mathbf{x}) - \log Z_{ij} \Big) \qquad (8)$$
We now show that, for the log PPLL objective function, performing weight learning once on the model containing all clauses in $\mathcal{L}$ is equivalent to optimizing the objective function over the space of all models. Formally:

$$\max_{\Lambda \subseteq \mathcal{L},\; \mathbf{w}} \log P^{\mathrm{PPLL}}(\mathbf{y} \mid \mathbf{x};\, \Lambda, \mathbf{w}) = \max_{\mathbf{w}} \log P^{\mathrm{PPLL}}(\mathbf{y} \mid \mathbf{x};\, \mathcal{L}, \mathbf{w}) \qquad (9)$$

Lemma 1. Optimizing $\log P^{\mathrm{PPLL}}$ over the set of weights $\mathbf{w}$ is equivalent to optimizing over each $w_j$ separately.

Proof.

Each $\log P^{*}_{j}$ is a function of only $w_j$. By the definition of $\log P^{\mathrm{PPLL}}$, we have $\max_{\mathbf{w}} \log P^{\mathrm{PPLL}} = \max_{\mathbf{w}} \sum_{j} \log P^{*}_{j} = \sum_{j} \max_{w_j} \log P^{*}_{j}$. ∎

Theorem 1. For PPLL, maximizing the weights of the model containing all clauses in $\mathcal{L}$ is equivalent to optimizing the structure learning objective.

Proof.

By Lemma 1, each $w_j$ is optimized independently. Excluding a clause $C_j$ from the model corresponds to the special case $w_j = 0$, which lies in the feasible set of the per-clause maximization; hence searching over subsets $\Lambda \subseteq \mathcal{L}$ cannot improve on maximizing over $\mathbf{w}$ with all clauses included. ∎
As a result of Theorem 1, instead of combinatorial search, we perform a simpler continuous optimization over weights that can be solved efficiently. Since the objective is convex and the weights are non-negative, we optimize it using projected gradient descent.
The projected gradient descent algorithm optimizes the log PPLL objective. The partial derivative with respect to a given weight $w_j$ is of the form:

$$\frac{\partial \log P^{\mathrm{PPLL}}}{\partial w_j} = \sum_{i} \Big( \mathbb{E}_{y_i}\Big[ \sum_{g \in G_j(y_i)} \phi_g \Big] - \sum_{g \in G_j(y_i)} \phi_g(\hat{\mathbf{y}}, \hat{\mathbf{x}}) \Big) \qquad (10)$$
The gradient for any weight $w_j$ is the difference between observed and expected penalties summed over the corresponding ground clauses. For both pseudolikelihood and PPLL, we can compute observed penalties once and cache their values, but the repeated expected value computations, even for a one-dimensional integral, remain costly. However, unlike the gradients for pseudolikelihood, each expectation term in the PPLL gradient considers a single clause. Thus, when evaluating gradients for weight updates, we use multithreading to compute the expectation terms in parallel. The dual advantages of parallelizing and requiring weight learning only once make PPLL highly scalable. After convergence of the gradient descent procedure, we return the set of clauses with non-zero weights as the final model.
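A minimal sketch of one parallel projected gradient step on log PPLL, assuming each (clause, variable) piece is given as a penalty function of that variable alone and expectations are approximated on a grid (the data representation and function names are our own, not PSL's):

```python
import math
from concurrent.futures import ThreadPoolExecutor

def expected_penalty(phi, w, grid=200):
    """E[phi(y_i)] under the one-clause piece P(y_i) ~ exp(-w * phi(y_i)),
    approximated on a uniform grid over [0, 1]."""
    pts = [(k + 0.5) / grid for k in range(grid)]
    dens = [math.exp(-w * phi(v)) for v in pts]
    return sum(d * phi(v) for d, v in zip(dens, pts)) / sum(dens)

def ppll_gradient_step(weights, pieces, lr=0.1):
    """One projected gradient ascent step on log PPLL.

    pieces[j]: list of (phi, y_obs) pairs, one per target variable touched
        by clause j, where phi maps y_i to the clause's total penalty.
    Each clause's gradient is independent of the others, so the expensive
    expectation terms are computed in parallel across clauses.
    """
    def grad(j):
        # Expected minus observed penalty, summed over the clause's pieces.
        return sum(expected_penalty(phi, weights[j]) - phi(y)
                   for phi, y in pieces[j])
    with ThreadPoolExecutor() as pool:
        grads = list(pool.map(grad, range(len(weights))))
    # Project back onto the non-negative orthant.
    return [max(w + lr * g, 0.0) for w, g in zip(weights, grads)]
```

Iterating this step to convergence and discarding zero-weight clauses is the whole structure learning procedure under PPLL.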
5 Experimental Evaluation
Table 1: AUC (mean ± standard deviation) of both methods across the five prediction tasks.

Method | Fly-Gene | Yeast-Gene | DDI-Interacts | Freebase-FilmRating | Freebase-BookAuthor
GLS | 0.95 ± 0.01 | 0.86 ± 0.02 | 0.66 ± 0.01 | 0.65 ± 0.04 | 0.67 ± 0.03
PPLL | 0.97 ± 0.002 | 0.90 ± 0.003 | 0.76 ± 0.01 | 0.65 ± 0.05 | 0.65 ± 0.04
The PPLL optimization method uses a fully factorized approximation for scalability while GLS greedily maximizes the less decoupled pseudolikelihood at the expense of speed. We explore the tradeoffs made by these two methods by evaluating predictive performance and scalability. We investigate these experimental questions with five prediction tasks and compare PPLL against GLS after generating pathconstrained clauses. The evaluation tasks include paper recommendation in biological citation networks, drug interaction prediction and knowledge base completion.
5.1 Datasets
For our datasets, we obtain citation networks for biological publications, drug-drug interaction pharmacological networks and knowledge graphs.
Biological Citation Networks
Our first dataset consists of biology-related papers and entities such as authors, venues, words, genes, proteins and chemical compounds [Lao, Mitchell, and Cohen 2011]. The dataset includes relations over these entity types for two domains, "Fly" and "Yeast", resulting in two citation networks. The prediction target is the Gene relation between genes and papers that mention them. To enforce training only on papers from the past, we partition papers into periods of time, using those from 2006 as observations, training on papers from 2007 and evaluating on papers from 2008. We randomly subsample targets to obtain 1500 train and test links, and generate five such random splits for cross-validation.
Drug-Drug Interaction
The second dataset we use includes chemical interactions between drug pairs, called drug-drug interactions (DDI), across 196 drug compounds obtained from the DrugBank database. This dataset also contains a directed graph of relations from DrugBank between these drugs and gene targets, enzymes, and transporters. Our target for prediction is the Interacts relation between drugs. We subsample the tens of thousands of labeled interactions and shuffle the remaining labeled DDI links into five folds for cross-validation. Each fold contains almost 2000 labeled DDI targets. We alternate using one fold of DDI edges as observations, one for training and one for held-out evaluation.
Freebase
Our third dataset comes from the Freebase knowledge graph and is widely used in validating knowledge base (KB) completion tasks [Gardner et al. 2014]. We study KB completion for two relations: links between films and their ratings (FilmRating) and links between authors and books written (BookAuthor). The remaining relations in the KB are observed. For both target relations, we subsample edges and split the resultant edges into five folds for cross-validation, yielding 1000 labeled edges per fold.
5.2 Experimental Setup
Our first experimental question evaluates predictive performance using area under the ROC curve (AUC) on held-out data with five-fold cross-validation across the five tasks described above. Our second question validates scalability by comparing running times for both methods as the number of clauses grows. For both methods, we use ADMM inference implemented in the probabilistic soft logic (PSL) framework [Bach et al. 2017]. For GLS, we use the pseudolikelihood learning algorithm in PSL and implement its corresponding scoring function within PSL (psl.linqs.org). For PPLL, we implement the parallelized learning algorithm in PSL. For all tasks, we enumerate target relational paths using the BFS utility in the Path Ranking Algorithm (PRA) implementation (github.com/mattgardner/pra) [Lao and Cohen 2010; Gardner et al. 2013; 2014] and generate path-constrained clauses from these paths. PRA generates and includes the inverses of all atoms when performing BFS. To form clause literals from these inverses, we use the original predicate and reverse the order of its variablized arguments.
As the number of generated clauses grows, GLS becomes prohibitive, as we show in our scalability results, necessitating a clause-pruning strategy. We prune the set of clauses by retaining those that connect at least 10 target atoms and selecting the top 50 clauses by number of targets connected. For each target predicate in the prediction tasks detailed above, we also add a negative prior clause to the candidate clauses. For link prediction tasks, the negative prior captures the intuition that true positive links are rare and most links do not form. We refer the reader to [Bach et al. 2017] for a detailed discussion of the importance of negative priors. For the biological citation networks and Freebase settings, we subsample negative examples of the targets to mitigate the imbalance in labeled training data. We perform 150 iterations of gradient descent for PPLL and 15 iterations for GLS, since GLS requires several rounds of weight learning.
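The pruning heuristic above can be sketched in a few lines; the `coverage` mapping from each candidate clause to the number of target atoms it connects is assumed to be precomputed during clause generation:

```python
def prune_clauses(coverage, min_cover=10, top_k=50):
    """Keep clauses connecting at least `min_cover` target atoms, then
    take the `top_k` by coverage (a sketch of the pruning heuristic).

    coverage: dict mapping each candidate clause to the number of
        target atoms its groundings connect.
    """
    kept = [c for c, n in coverage.items() if n >= min_cover]
    kept.sort(key=lambda c: coverage[c], reverse=True)
    return kept[:top_k]
```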
5.3 Predictive Performance
Our first experimental question investigates the ramifications for predictive performance of the approximations made by each method. PPLL approximates the likelihood by fully factorizing across clauses and target variables while GLS uses the pseudolikelihood approximation which still couples clauses. We examine whether the decoupling in PPLL limits its predictive performance. We generate pathconstrained clauses as input to both methods and evaluate their performance on heldout data. Table 1 compares both methods using AUC for all five prediction tasks averaged across multiple folds and splits.
Table 1 shows that PPLL gains significantly in AUC over GLS in three out of five settings. For the Gene link prediction task in the Yeast and Fly biological citation networks, PPLL also yields lower variance given the same rules. In the DDI setting where we predict Interacts links between drugs, PPLL enjoys a 15% AUC gain over GLS, from 0.66 to 0.76. In the Freebase setting, for the BookAuthor task, PPLL achieves comparable performance with GLS. GLS only improves slightly over the PPLL approximation in one setting, predicting FilmRating with a statistically insignificant gain of 0.02 in AUC.

5.4 Scalability Study
Our second experimental question focuses on the scalability trade-offs made by GLS and PPLL. PPLL requires weight learning over all candidate clauses, made faster with parallelized updates, while GLS requires iterative rounds of weight learning and model evaluation. We select the two Freebase tasks, BookAuthor and FilmRating, where path-constrained clause generation initially yielded several hundred rules. We plot the running time for both methods as the size of the candidate clause set increases from 25 to 200.
Figure 1 shows the running times (in seconds) for both methods plotted in log scale across the two Freebase tasks as the number of clauses to evaluate increases. The results show that while PPLL remains computationally feasible as the number of clauses increases, GLS quickly becomes intractable as the clause set grows. Indeed, for BookAuthor, GLS requires almost two days to learn a model with 200 candidate clauses. In contrast, PPLL completes in four minutes using 200 clauses in the same setting. PPLL overcomes the requirement of interleaving weight learning and scoring while also admitting parallel weight learning updates, boosting scalability. The results suggest that PPLL can explore a larger space of models in significantly less time.
6 Related Work
Finally, we review related work on structure learning approaches for undirected graphical models, which underpin the SRL methods we highlight in this paper. We also provide an overview of work in relational information retrieval, which motivates our path-constrained clause generation.
For general Markov random fields (MRFs) and their conditional variants, structure learning typically induces feature functions represented as propositional logical clauses of Boolean attributes [McCallum 2002; Davis and Domingos 2010]. An approximate model score is optimized with a greedy search that iteratively picks clausal feature functions to include, refining candidate features by adding, removing or negating literals, starting from single-literal clauses. MRF structure learning is also viewed as a feature selection problem solved by performing L1-regularized optimization over candidate features, admitting fast gradient descent and online algorithms [Perkins, Lacker, and Theiler 2003; Zhu, Lao, and Xing 2010].

Although structure learning has not been studied in PSL, many algorithms have been proposed to learn MLNs. The initial approach to MLN structure learning performs greedy beam search to grow the set of model clauses starting from single-literal clauses. The clause generation performs all possible negations and additions to an existing set of clauses, while the search procedure iteratively selects clauses to refine. To efficiently guide the search towards useful models, bottom-up approaches generate informative clauses by using relational paths to capture patterns and motifs in the data [Mihalkova and Mooney 2007; Kok and Domingos 2009; 2010]. This relational path mining in bottom-up approaches is related to the path ranking algorithm (PRA) for relational information retrieval [Lao and Cohen 2010]. PRA performs random walks or breadth-first traversal on relational data to find useful path-based features for retrieval tasks [Lao and Cohen 2010; Gardner et al. 2013; 2014].
Most recently, MLN structure learning has been viewed from the perspectives of moralizing learned Bayesian networks [Khosravi et al. 2010] and functional gradient boosting [Khot et al. 2011; 2015]. These methods improve scalability while maintaining predictive performance. Alternately, approaches have been proposed to learn MLNs for target variables specific to a task of interest, as we do for PSL. Structure learning methods for particular tasks use inductive logic programming [Muggleton 1991] to generate clauses, which are pruned with L1-regularized learning [Huynh and Mooney 2008; 2011], or perform iterative local search [Biba, Ferilli, and Esposito 2008] to refine rules with the operations described above.

7 Conclusion and Future Work
In this work, we formalize the structure learning problem for PSL and introduce an efficient-to-optimize and convex surrogate objective function, PPLL. We unify scalable optimization with data-driven path-constrained clause generation. Compared to the straightforward but inefficient greedy local search method, PPLL remains scalable as the space of candidate rules grows and demonstrates good predictive performance across five real-world tasks. Although we focus on PSL in this work, our PPLL method can be generalized for MLNs and other SRL frameworks. An important line of future work for PSL structure learning is extending L1-regularized feature selection and functional gradient boosting approaches, which have been applied successfully to MRFs and MLNs. These methods have been shown to scale while maintaining good predictive performance.
Acknowledgements
This work is sponsored by the Air Force Research Laboratory (AFRL) and Defense Advanced Research Projects Agency (DARPA), and supported by NSF grants CCF-1740850 and NSF IIS-1703331. We thank Sriraam Natarajan and Devendra Singh Dhami for sharing their DrugBank dataset.
References

[Bach et al. 2017] Bach, S. H.; Broecheler, M.; Huang, B.; and Getoor, L. 2017. Hinge-loss Markov random fields and probabilistic soft logic. Journal of Machine Learning Research 18(109):1–67.
[Beltagy, Erk, and Mooney 2014] Beltagy, I.; Erk, K.; and Mooney, R. J. 2014. Probabilistic soft logic for semantic textual similarity. In ACL.
[Besag 1975] Besag, J. 1975. Statistical analysis of non-lattice data. The Statistician 24(3):179–195.
[Biba, Ferilli, and Esposito 2008] Biba, M.; Ferilli, S.; and Esposito, F. 2008. Discriminative structure learning of Markov logic networks. In ILP.
[Davis and Domingos 2010] Davis, J., and Domingos, P. 2010. Bottom-up learning of Markov network structure. In ICML.
[Ebrahimi, Dou, and Lowd 2016] Ebrahimi, J.; Dou, D.; and Lowd, D. 2016. Weakly supervised tweet stance classification by relational bootstrapping. In EMNLP.
[Gardner et al. 2013] Gardner, M.; Talukdar, P. P.; Kisiel, B.; and Mitchell, T. 2013. Improving learning and inference in a large knowledge-base using latent syntactic cues. In EMNLP.
[Gardner et al. 2014] Gardner, M.; Talukdar, P. P.; Krishnamurthy, J.; and Mitchell, T. 2014. Incorporating vector space similarity in random walk inference over knowledge bases. In EMNLP.
[Huynh and Mooney 2008] Huynh, T. N., and Mooney, R. J. 2008. Discriminative structure and parameter learning for Markov logic networks. In ICML.
[Huynh and Mooney 2011] Huynh, T. N., and Mooney, R. J. 2011. Online structure learning for Markov logic networks. In ECML PKDD.
[Johnson and Goldwasser 2016] Johnson, K., and Goldwasser, D. 2016. "All I know about politics is what I read in Twitter": Weakly supervised models for extracting politicians' stances from Twitter. In COLING.
[Kazemi and Poole 2018] Kazemi, S. M., and Poole, D. 2018. Bridging weighted rules and graph random walks for statistical relational models. Frontiers in Robotics and AI 5:8.
[Khosravi et al. 2010] Khosravi, H.; Schulte, O.; Man, T.; Xu, X.; and Bina, B. 2010. Structure learning for Markov logic networks with many descriptive attributes. In AAAI.
[Khot et al. 2011] Khot, T.; Natarajan, S.; Kersting, K.; and Shavlik, J. 2011. Learning Markov logic networks via functional gradient boosting. In ICDM.
[Khot et al. 2015] Khot, T.; Natarajan, S.; Kersting, K.; and Shavlik, J. 2015. Gradient-based boosting for statistical relational learning: the Markov logic network and missing data cases. Machine Learning 100(1):75–100.
[Kok and Domingos 2005] Kok, S., and Domingos, P. 2005. Learning the structure of Markov logic networks. In ICML.
[Kok and Domingos 2009] Kok, S., and Domingos, P. 2009. Learning Markov logic network structure via hypergraph lifting. In ICML.
[Kok and Domingos 2010] Kok, S., and Domingos, P. 2010. Learning Markov logic networks using structural motifs. In ICML.
[Lao and Cohen 2010] Lao, N., and Cohen, W. W. 2010. Relational retrieval using a combination of path-constrained random walks. Machine Learning 81(1):53–67.
[Lao, Mitchell, and Cohen 2011] Lao, N.; Mitchell, T.; and Cohen, W. W. 2011. Random walk inference and learning in a large scale knowledge base. In EMNLP.
[McCallum 2002] McCallum, A. 2002. Efficiently inducing features of conditional random fields. In UAI.
[Mihalkova and Mooney 2007] Mihalkova, L., and Mooney, R. J. 2007. Bottom-up learning of Markov logic network structure. In ICML.
[Muggleton 1991] Muggleton, S. 1991. Inductive logic programming. New Generation Computing 8(4):295–318.
[Perkins, Lacker, and Theiler 2003] Perkins, S.; Lacker, K.; and Theiler, J. 2003. Grafting: Fast, incremental feature selection by gradient descent in function space. Journal of Machine Learning Research 3(Mar):1333–1356.
[Platanios et al. 2017] Platanios, E.; Poon, H.; Mitchell, T. M.; and Horvitz, E. J. 2017. Estimating accuracy from unlabeled data: A probabilistic logic approach. In NIPS.
[Richardson and Domingos 2006] Richardson, M., and Domingos, P. 2006. Markov logic networks. Machine Learning 62(1-2):107–136.
[Sutton and McCallum 2007] Sutton, C., and McCallum, A. 2007. Piecewise pseudolikelihood for efficient training of conditional random fields. In ICML.
[Zhu, Lao, and Xing 2010] Zhu, J.; Lao, N.; and Xing, E. P. 2010. Grafting-light: fast, incremental feature selection and structure learning of Markov random fields. In KDD.