From Shallow to Deep Interactions Between Knowledge Representation, Reasoning and Machine Learning (Kay R. Amel group)

12/13/2019, by Zied Bouraoui et al. (Université d'Orléans, IRIT, Université de Technologie de Compiègne, Cardiff University, Université de Toulouse)

This paper proposes a tentative and original survey of meeting points between Knowledge Representation and Reasoning (KRR) and Machine Learning (ML), two areas which have developed quite separately in the last three decades. Some common concerns are identified and discussed, such as the types of representation used, the roles of knowledge and data, the lack or the excess of information, and the need for explanations and causal understanding. Then some methodologies combining reasoning and learning are reviewed (such as inductive logic programming, neuro-symbolic reasoning, formal concept analysis, rule-based representations and ML, uncertainty in ML, and case-based reasoning and analogical reasoning), before discussing examples of synergies between KRR and ML (including topics such as belief functions on regression, the EM algorithm versus revision, the semantic description of vector representations, the combination of deep learning with high-level inference, knowledge graph completion, declarative frameworks for data mining, and preferences and recommendation). This paper is the first step of a work in progress aiming at a better mutual understanding of research in KRR and ML, and of how they could cooperate.


1 Introduction

Reasoning and learning are two basic concerns at the core of Artificial Intelligence (AI). In the last three decades, Knowledge Representation and Reasoning (KRR) on the one hand, and Machine Learning (ML) on the other hand, have developed considerably and have specialised into a large number of dedicated sub-fields. These technical developments and specialisations, while strengthening the respective corpora of methods in KRR and in ML, have also contributed to an almost complete separation of the lines of research in these two areas, leaving many researchers on one side largely ignorant of what is going on on the other side.

This state of affairs also rests on general, overly simplistic dichotomies that suggest there is a large gap between KRR and ML: KRR deals with knowledge, ML handles data; KRR privileges symbolic, discrete approaches, while numerical methods dominate ML. Even if such a rough picture points out facts that cannot be fully denied, it is also misleading, as for instance KRR can deal with data as well [389] (e.g., in formal concept analysis) and ML approaches may rely on symbolic knowledge (e.g., in inductive logic programming). Indeed, the frontier between the two fields is actually much blurrier than it appears, as both are involved in approaches such as Bayesian networks, or case-based reasoning and analogical reasoning, and they share important concerns such as uncertainty representation (e.g., probabilistic or possibilistic models, belief functions, imprecise probability-based approaches).

These remarks already suggest that KRR and ML may have more in common than one might think at first glance. In that respect, it is also important to remember that the human mind is able to perform both reasoning and learning tasks, with many interactions between these two types of activity. In fact, from the very beginning of AI history, both reasoning and learning tasks have been considered, but not by the same researchers; see, e.g., [162]. So, especially if the ultimate goal of AI is to have machines that perform tasks handled by the human mind, it might be natural and useful to increase the cooperation between KRR and ML.

The goal pursued in this work is to start constructing an inventory of common concerns in KRR and ML, of methodologies combining reasoning principles and learning, and of examples of KRR/ML synergies. Yet, this paper is not an overview of the main issues in KRR crossed with an overview of the main issues in ML, trying to identify when they meet. Doing so would lead to a huge and endless survey, since providing a survey of methodologies and tools for KRR alone, or for ML alone, would already be a colossal task (nevertheless, the interested reader is referred to appropriate chapters in [334] or to monographs such as [66, 33, 230, 436, 350]). In the following we rather try to identify a collection of meeting points between KRR and ML. Since this is a work in progress, we do not expect to reach any form of exhaustiveness, and even some important topics may remain absent from the document at this stage.

Nor is the paper an introductory text to KRR and/or ML. It is rather intended for readers who are quite familiar with either KRR or ML, and who are curious about the other field. In the long run, it aims at contributing to a better mutual understanding of the two communities, and maybe at identifying some synergies worthy of further research combining KRR and ML.

2 Common Concerns

In order to suggest and illustrate differences and also similarities between KRR and ML, let us start with the simple example of a classification or recommendation-like task, such as associating the profile of a candidate (in terms of skills and tastes) with possible activities suitable for him/her in a vocational guidance system. Such a problem may be envisioned in different manners: on the one hand, one may think of it in terms of a rule-based system relying on some expertise (where rules may be pervaded with uncertainty), or on the other hand in terms of machine learning exploiting a collection of data (here pertaining to past cases in career guidance).

It is worth noticing that, beyond the differences in the types of representation used in the two kinds of approach (e.g., conditional tables for uncertainty assessment vs. weights in a neural net), there are some noticeable similarities between the (graphical) structures that can be associated with a rule-based reasoning device handling uncertainty (or an information fusion process) and with a neural net. This remark suggests that, beyond differences in perspective, there is some structural resemblance between the two types of process. This resemblance has recently been investigated in detail in the setting of belief function theory [143], but an example may also be found in an older work on a possibilistic matrix calculus devoted to explainability (where each matrix represents a rule) [177].

Beyond this kind of parallel, it is clear that KRR and ML have common concerns. This section gives an overview of the main ones, regarding representation issues, computational complexity, the role of knowledge, the handling of missing or excess information, uncertainty, and, last but not least, causality and explanation. Each subsection below tries to follow the same basic structure, providing each time i) the KRR view, ii) the ML view, and iii) some synthesis and discussion.

2.1 Types of Representation

In KRR, as suggested by the name, the main representation issues concern the representation of pieces of knowledge (rather than data). The large variety of real-world information has led to a number of logical formalisms, ranging from classical logic (especially propositional and first-order) to modal logics (for dealing with, e.g., temporal, deontic, or epistemic notions) and to non-classical logics for handling commonsense reasoning.

The representation may use different formats, directed or undirected: sets of if-then rules, or sets of logical formulas. A rule "if a then b" is a 3-valued entity (as first noticed in [128]), since it induces a partition between its set of examples, its set of counterexamples, and the set of items for which the rule is irrelevant (i.e., when a is false). So a rule strongly departs from its apparent logical counterpart, the material implication a → b (which is indeed non-directed, since a → b is equivalent to its contrapositive ¬b → ¬a). This discrepancy can also be observed in the probabilistic setting, since in general P(b|a) ≠ P(a → b) = P(¬a ∨ b). Rules may hold up to (implicit) exceptions (see subsection 2.3).
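As a minimal numerical illustration of this discrepancy (toy numbers, not from the paper):

```python
# Toy joint distribution over two Boolean variables a and b (invented numbers).
# P(a, b) for the four possible worlds; the values sum to 1.
p = {(True, True): 0.2, (True, False): 0.2,
     (False, True): 0.1, (False, False): 0.5}

p_a = sum(v for (a, b), v in p.items() if a)                    # P(a) = 0.4
p_ab = p[(True, True)]                                          # P(a and b) = 0.2
p_cond = p_ab / p_a                                             # P(b | a) = 0.5
p_material = sum(v for (a, b), v in p.items() if (not a) or b)  # P(a -> b) = 0.8

print(p_cond, p_material)  # 0.5 != 0.8: conditioning is not material implication
```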

Knowledge may be pervaded with uncertainty, which can be handled in different settings, in terms of probability, possibility, belief functions, or imprecise probabilities (see subsection 2.3). In all of these cases, a joint distribution can be decomposed into sub-distributions, laying bare some form of conditional independence relations, with a graphical counterpart; the prototypical graphical models in each representation are respectively Bayesian networks (probabilistic), possibilistic networks, credal networks (imprecise probabilities [112]) and valuation-based systems (belief functions). Conceptual graphs [448, 86] offer a graph representation for logic, especially for ontologies/description logics.

The main goal of KRR is to develop sound and (as far as possible) complete inference mechanisms to draw conclusions from generic knowledge and factual data, in a given representation setting [229, 230]. The mathematical tools underlying KRR are those of logic and uncertainty theories, and more generally discrete mathematics. An important issue in KRR is to find good compromises between the expressivity of the representation and the computational tractability of inferring the conclusions of interest from it [310]. This concern is especially at work with description logics, which are bound to use tractable fragments of first-order logic.

The situation in ML is quite different concerning representation issues. ML aims at learning a model of the world from data. There are thus two key representation problems: the representation of data and the representation of models; see, e.g., [100, 101]. In many approaches the data space is assimilated to a subset of ℝ^d, in which the observations are described by d numerical attributes. This is the simplest case, allowing the use of mathematical results in linear algebra and in continuous optimization. Nevertheless, data may also be described by qualitative attributes, for instance binary attributes, thus requiring different mathematical approaches, based on discrete optimisation and on enumeration coupled with efficient pruning strategies. Quite often, data is described by both types of attributes, and only a few ML tools, for instance decision trees, are able to handle them jointly. Therefore, changes of representation are needed, for instance discretization, or the encoding of qualitative attributes into numerical ones, all inducing a bias on the learning process. More complex data, such as relational data, trees, and graphs, need more powerful representation languages, such as first-order logic, or some proper representation trick, for instance propositionalization or the definition of appropriate kernels. It is important to notice that the more sophisticated the representation language, the more complex the inference process, and a trade-off must be found between the granularity of the representation and the efficiency of the ML tool.

Regarding models, they depend on the ML task: supervised or unsupervised classification, learning to rank, mining frequent patterns, etc. They depend also on the type of approach that one favours: more statistically or more artificial-intelligence oriented. There is thus a distinction between generative and discriminative models (or decision functions). In the generative approach, one tries to learn a probability distribution over the input space X. If learning a precise enough probability distribution is successful, it becomes possible in principle to generate further examples x, the distribution of which is indistinguishable from the true underlying distribution. It is sometimes claimed that this capability makes the generative approach "explicative". This is a matter of debate. The discriminative approach does not try to learn a model that allows the generation of more examples. It only provides either a means of deciding (e.g., predicting a label for an input) in the supervised mode, or a means to express some regularities in the data set in the unsupervised mode. These regularities, as well as these decision functions, can be expressed in terms of logical rules, graphs, neural networks, etc. While they do not allow the generation of new examples, they can nonetheless be much more interpretable than probability distributions.
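As a minimal illustration of this distinction (a sketch assuming scikit-learn, where GaussianNB exposes the fitted per-class Gaussian parameters as theta_ and var_ in recent versions):

```python
# A generative classifier learns P(x | y) and P(y), so it can both classify
# and synthesise new examples; a discriminative one learns P(y | x) only.
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.naive_bayes import GaussianNB
from sklearn.linear_model import LogisticRegression

X, y = make_blobs(n_samples=200, centers=2, random_state=0)

gen = GaussianNB().fit(X, y)           # generative: class-conditional Gaussians
disc = LogisticRegression().fit(X, y)  # discriminative: decision boundary only

# Sample new class-0 examples from the fitted diagonal Gaussians.
rng = np.random.default_rng(0)
new_x = rng.normal(gen.theta_[0], np.sqrt(gen.var_[0]), size=(5, 2))

print(gen.predict(X[:3]), disc.predict(X[:3]))  # both classify...
print(new_x)                                    # ...only the generative one generates
```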

Very sketchily, one can distinguish between the following types of representations.

  • Linear models and their generalisations, such as linear regression or the linear perceptron first proposed by Rosenblatt [406]. Because these models are based on linear weightings of the descriptors of the entries, it looks easy to estimate the importance of each descriptor and thus to offer some understanding of the phenomenon at hand. This, however, assumes that the descriptors are uncorrelated and well chosen.

  • Nonlinear models are often necessary in order to account for the intricacies of the world. Neural networks, nowadays involving numerous layers of non-linearity, are presently the favourite tools for representing and learning non-linear models.

  • Linear models as well as nonlinear ones provide a description of the world, or of decision rules, through (finite) combinations of descriptors. They are parametric models. Another approach is to approximate the world by learning a number of prototypes that is not fixed in advance, and to use a nearest-neighbour technique to define decision functions. These systems are capable of handling any number of prototypes as long as they can fit the data appropriately.

    Support Vector Machines (SVM) fall in this category since they adjust the number of support vectors (learning examples) in order to fit the data. Here, explaining a rule may mean providing a list of the most relevant prototypes that the rule uses.

  • The above models are generally numerical in essence, and the associated learning mechanisms most often rely on some optimisation process over the space of parameters. Another class of models relies on logical descriptions, e.g., sets of clauses. Decision trees can also be considered as logic-based, since each tree can be transformed into a set of clauses. The learning algorithms exploit richer structures over the space of models than numerical approaches do. In many cases, the discrete nature of the search space and the definition of a generality relation between formulas allow the organization of models in a lattice and the design of heuristics to efficiently prune the search space. More generally, these approaches are usually modeled as enumeration problems (e.g., pattern mining) or discrete optimization problems (supervised learning, clustering, …). Moreover, such models offer more opportunities to influence the learning process using prior knowledge. Finally, they can be easily interpreted (a decision tree can be read as a set of rules, as the sketch below illustrates). The downside is their increased brittleness when coping with noisy data.
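For instance, the following sketch (scikit-learn assumed; the traversal uses its tree_ API) prints each root-to-leaf path of a learned decision tree as an if-then rule:

```python
# A sketch of the remark that a decision tree can be read as a set of rules:
# each root-to-leaf path becomes one "if ... then class" clause.
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

data = load_iris()
clf = DecisionTreeClassifier(max_depth=2, random_state=0).fit(data.data, data.target)
t = clf.tree_

def paths(node=0, conds=()):
    if t.children_left[node] == -1:                # leaf: emit one rule
        label = data.target_names[t.value[node].argmax()]
        print("IF " + " AND ".join(conds) + f" THEN class = {label}")
        return
    name, thr = data.feature_names[t.feature[node]], t.threshold[node]
    paths(t.children_left[node], conds + (f"{name} <= {thr:.2f}",))
    paths(t.children_right[node], conds + (f"{name} > {thr:.2f}",))

paths()
```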

2.2 Computational Complexity

Complexity issues are a major concern in any branch of computer science. In KRR, very expressive representation languages have been studied, but interesting reasoning problems for these languages are often at least at the second level of the polynomial hierarchy for time complexity. There is a trade-off between the expressive power of a language and the complexity of the inference it allows. Reasoning tasks in languages with suitably restricted expressivity are tractable, for instance languages using Horn clauses, or lightweight description logics such as DL-Lite [76] or EL [25].

The study of complexity has motivated a large number of works in many fields of KRR, including non-monotonic reasoning, argumentation, belief merging and uncertainty management. In particular, when the desirable solution (i.e., the gold standard) of the problem (for instance, a merging operator, an inconsistency-tolerant consequence relation, etc.) has a high computational complexity, it is common to look for an approximation that has reasonable complexity. For instance, the observation that answering meaningful queries from an inconsistent DL-Lite knowledge base using the universal consequence relation is NP-complete has led to the introduction of several tractable approximations [28].

The attempt to cope with hardness of inference has also been a driving force in research around some important and expressive languages, including propositional clauses and CSPs, where inference is NP-complete; for instance, powerful methods nowadays enable the solving of SAT problems with up to hundreds of thousands of variables and millions of clauses in a few minutes (see section 4.8). Some of the most competitive current SAT solvers are described in [8, 337, 326]. Two other ways to cope with time complexity are anytime methods, which can be interrupted at any time during the solving process and then return an incomplete, possibly false or sub-optimal solution, and approximate methods. A recent trend in KRR is to study so-called compilation schemes [125, 335]: the idea here is to pre-process some pieces of the available information in order to improve the computational efficiency (especially, the time complexity) of some tasks; this pre-processing leads to a representation in a language where reasoning tasks can be performed in polynomial time (at the cost of a theoretical blow-up in worst-case space complexity, which fortunately does not often happen in practice).
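As a minimal illustration of the kind of search such solvers refine, here is a textbook DPLL satisfiability checker (a sketch, far from the engineering of modern solvers):

```python
# A minimal DPLL SAT solver. Clauses are frozensets of non-zero integers;
# a negative integer denotes a negated variable.

def simplify(clauses, lit):
    # Remove clauses satisfied by lit; delete the falsified literal elsewhere.
    return [c - {-lit} for c in clauses if lit not in c]

def dpll(clauses, assignment=frozenset()):
    # Unit propagation: repeatedly assign literals forced by unit clauses.
    while True:
        if frozenset() in clauses:
            return None                          # empty clause: conflict
        units = [next(iter(c)) for c in clauses if len(c) == 1]
        if not units:
            break
        clauses = simplify(clauses, units[0])
        assignment = assignment | {units[0]}
    if not clauses:
        return assignment                        # all clauses satisfied
    lit = next(iter(next(iter(clauses))))        # branch on some literal
    for choice in (lit, -lit):
        result = dpll(simplify(clauses, choice), assignment | {choice})
        if result is not None:
            return result
    return None

# (a or b) and (not a or b) and (not b or c)
cnf = [frozenset(s) for s in ([1, 2], [-1, 2], [-2, 3])]
print(dpll(cnf))   # a satisfying assignment, e.g. frozenset({1, 2, 3})
```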

Contrastingly, ML algorithms often have a time complexity which is polynomial in the number of variables, the size of the dataset and the size of the model being learnt, especially when the domains are continuous. However, because of the possibly huge size of the dataset or of the models, capping the degree of the polynomial remains an important issue. In the case of discrete domains, finding the optimal model, i.e., the one that best fits a given set of examples, can be hard (see [259]), but one is often happy with finding a "good enough" model in polynomial time: there is no absolute guarantee that the model that best fits the examples is the target model anyway, since this may depend on the set of examples. In fact, an important aspect of complexity in ML concerns the prediction of the quality of the model that one can learn from a given dataset: in the PAC setting for instance [462], one tries to estimate how many examples are needed to guarantee that the model learnt will be, with high probability, a close approximation to the unknown target model. Intuitively, the more expressive the hypothesis space is, the more difficult it will be to correctly identify the target model, and the more examples will be needed for that [463].
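As a worked instance (the textbook bound for a finite hypothesis space in the realizable case, not a result from this paper):

```python
# PAC sample complexity for a finite hypothesis space H (realizable case):
# m >= (1/eps) * (ln|H| + ln(1/delta)) examples suffice so that, with
# probability >= 1 - delta, any consistent hypothesis has true error <= eps.
import math

def pac_sample_size(h_size, eps, delta):
    return math.ceil((math.log(h_size) + math.log(1 / delta)) / eps)

# Boolean conjunctions over n = 20 variables: |H| = 3^20, since each variable
# appears positively, negatively, or not at all in a conjunction.
print(pac_sample_size(3 ** 20, eps=0.05, delta=0.01))  # ~532 examples
```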

2.3 Lack and Excess of Information: Uncertainty

With respect to a given reasoning or decision task, information may be missing, or, on the contrary, may be in excess, hence in conflict, which possibly generates uncertainty. Uncertainty has always been an important topic in KRR [230]. While in ML uncertainty is almost always considered to be of statistical or probabilistic origin (aleatory uncertainty), other causes of uncertainty exist, such as the sheer lack of knowledge, or the excess of information leading to conflicts (epistemic uncertainty). The roles of uncertainty handling in KRR and in ML thus seem to have been very different so far: it has been an important issue in KRR and has generated a lot of novel contributions beyond classical logic and probability, whereas it has been considered almost exclusively from a purely statistical point of view in ML [464].

The handling of uncertainty in KRR has a long history, as much with the handling of incomplete information in non-monotonic reasoning as with the handling of probabilities in Bayesian nets [377], and in probabilistic logic languages [412, 113]. Other settings that focus on uncertainty due to incomplete information are possibility theory, with weighted logic bases (possibilistic logic [155, 158]) and graphical representations (possibilistic nets [39, 43]). Belief functions also lend themselves to graphical representations (valuation networks [438], evidential networks [490]) and imprecise probability as well (credal nets [111]).

Uncertainty theories distinct from standard probability theory, such as possibility theory or evidence theory, are now well recognised in knowledge representation. They offer views of uncertainty complementary to probability, or generalisations of it, dedicated to epistemic uncertainty when information is imprecise or partly missing.

In KRR, at a more symbolic level, the inevitability of partial information has motivated the need for exception-tolerant reasoning. For instance, one may provisionally conclude that "Tweety flies" while only knowing that "Tweety is a bird", although the default rule "birds fly" has exceptions, and we may later conclude that "Tweety does not fly" when getting more (factual) information about Tweety. Thus non-monotonic reasoning [67] has been developed for handling situations with incomplete data, where only plausible tentative conclusions can be derived. Generic knowledge may be missing as well: for example, one may not have the appropriate pieces of knowledge for concluding about some set of facts. This may then call for interpolation between rules [421].

When information is in excess in KRR, it may mean that it is just redundant, but it becomes more likely that some inconsistency appears. Redundancy is not always a burden, and may sometimes be an advantage by making more things explicit in different formats (e.g., when looking for solutions to a set of constraints).

Inconsistency is a natural phenomenon in particular when trying to use information coming from different sources. Reasoning from inconsistent information is not possible in classical logic (without trivialisation). It has been extensively studied in AI [48, 41, 78], in order to try and salvage non-trivial conclusions not involved in contradictions. Inconsistency usually appears at the factual level, for instance a logical base with no model. However, a set of rules may be said to be incoherent when there exists an input fact that, together with the rules, would create inconsistency [24].

Machine Learning can face several types of situations regarding the amount of information available. It must be said at once that induction, which goes from observations to regularities, is subject to the same kind of conservation law as in Physics: the information extracted is not created, it is just a reformulation, often with loss, of the incoming information.

If the input data is scarce, then prior knowledge, in one form or another, must complete it. The less data is available, the more prior knowledge is needed to focus the search for regularities by the learning system. This is in essence what the statistical theory of learning says [464]. In recent years, many methods have been developed to address the case where data is scarce and the search space for regularities is gigantic, especially when the number of descriptors is large, often in the thousands or more. The idea is to express special constraints in a so-called regularization term added to the inductive criterion that the system uses to search the hypothesis space. For instance, a common constraint is that the hypothesis should use a very limited set of descriptors [456].
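The Lasso is the classic instance of such a sparsity-inducing regularization term; a minimal sketch (scikit-learn assumed, toy data):

```python
# The L1 penalty of the Lasso drives most coefficients to exactly zero, so
# the learned hypothesis uses only a few of the available descriptors.
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 100))          # scarce data: 50 examples, 100 descriptors
true_w = np.zeros(100)
true_w[:3] = [2.0, -1.5, 1.0]           # only 3 descriptors actually matter
y = X @ true_w + 0.1 * rng.normal(size=50)

model = Lasso(alpha=0.1).fit(X, y)      # alpha weighs the regularization term
print((model.coef_ != 0).sum())         # typically close to 3 nonzero coefficients
```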

When there is plenty of data, the problem is more one of dealing with potential inconsistencies. However, except in the symbolic machine learning methods mostly studied in the 1980s, there is no systematic or principled way of dealing with inconsistent data. Either the data is pre-processed in order to remove these inconsistencies, which requires having the appropriate prior knowledge to do so, or one relies on the hope that the learning method is robust enough to these inconsistencies and can somehow smooth them out. Too much data may also call for trying to identify a subset of representative data (a relevant sample), as sometimes done in case-based reasoning when removing redundant cases. Regarding the lack of data, there is a variety of approaches for the imputation of missing values, ranging from the EM algorithm [136] to analogical proportion-based inference [59]. However, these methods get rid of incompleteness rather than reason about uncertainty.

Finally, a situation that is increasingly encountered is that of multi-source data. The characteristics of the multiple data sets can then vary in format, certainty, precision, and so on. Techniques like data fusion, data aggregation or data integration are called for, often resorting again to prior knowledge, for instance using ontologies to enrich the data.

2.4 Causality and Explainability

"What is an explanation", "What has to be explained, and how" are issues that have been discussed for a long time by psychologists and philosophers [453, 69]. The interest in AI for explanations is not new either. It appears with the development of rule-based expert systems in the mid-1980’s. Then there was a natural need for explanations that are synthetic, informative, and understandable for the user of an expert system [84]. This raises issues such as designing strategic explanations for a diagnosis, for example in order to try to lay bare the plans and methods used in reaching a goal [232], or using “deep” knowledge for improving explanations [280]. Another issue was the ability to provide negative explanations (as well as positive ones) for answering questions of the form "Why did you not conclude X?" [408], even in the presence of uncertainty [177].

Introductory surveys about explanations in AI may be found in a series of recent papers [237, 241, 289, 239, 205, 206]. Let us also mention the problem of explaining the results of a multi-attribute preference model that acts like a "black box", which has been studied more recently in [296].

As now discussed, explanations are often related to the idea of causality [231]. Indeed, most of the explanations we produce or expect involve some causal relationships (e.g., John made himself go to the party because he thought that Mary would be there). In many domains where machines can provide aid for decision making, as in medicine, court decisions, credit approval and so on, decision makers and regulators increasingly want to know on what basis the machine suggests a decision, why it should be made, and what alternative decision could have been made, had the situation been slightly different. One of the difficulties of explainability in machine learning is due to the fact that algorithms focus on correlations between features and output variables rather than on causality. The example of wolf vs. dog identification [398] perfectly illustrates this problem: when using a deep classifier, the main feature that determines whether a picture represents a dog or a wolf is the presence of snow. There obviously exists a correlation between snow and wolves, but it is clearly not a causal link. When this unwanted bias is known, it can be corrected by adding constraints or balancing the dataset. However, due to the lack of interpretability of some algorithms, identifying these biases is challenging. Moreover, the problem of constraining an algorithm to learn causality rather than correlation is still open.

While explainability is quickly becoming a hot topic in ML, as it was for expert systems about 30 years ago, the search for solutions is still at an early stage. Some tentative distinctions between interpretability and explainability have been suggested.

Interpretability may have a statistical or a KR interpretation (which are of course not mutually exclusive). From a statistical point of view, an interpretable model is a model that comes with mathematical guarantees. These are usually bounds on the approximation error (linked to the expressive power of the hypothesis space) or on the generalization error (linked to the robustness of the algorithm with respect to variations of the sample set). They can also be guarantees about the uncertainty around the parameters of the model (represented by confidence intervals, for instance). Linear approaches are, in this respect, the most statistically interpretable ML algorithms. Robustness properties of statistical models are also desirable for interpretable models. This is especially the case when considering explanations based on counterfactual examples. Given a binary classifier and an example x, the counterfactual of x is the example closest to x, with respect to a given metric, that is labeled by the classifier with the opposite label of x (the counterfactual is not necessarily in the dataset). Consider for instance a model that determines whether a credit is granted or not with respect to the profile of a customer, and a client to whom the credit is not granted. The counterfactual in this case answers the question of what minimal change to his profile would ensure that the credit is granted. If the model is based on propositional logic rules, the counterfactual will correspond to a minimal change of the considered example representation in Boolean logic. In this case, the counterfactual is an understandable explanation for the prediction. In deep learning, the counterpart of counterfactuals are adversarial examples [211]. In most situations, an adversarial example corresponds to an imperceptible modification of the considered example. From this point of view, the lack of robustness of deep networks makes explanations based on counterfactuals very difficult to obtain.
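As a toy illustration of this notion (scikit-learn assumed; the credit scenario and numbers are invented), one can search a pool of candidate inputs for the nearest one that flips the decision:

```python
# Finding a counterfactual by nearest-opposite-label search over candidates
# (the candidates need not belong to the dataset).
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(int)           # toy "credit granted" rule
clf = LogisticRegression().fit(X, y)

x = np.array([-0.3, -0.2])                        # credit denied for this profile
assert clf.predict([x])[0] == 0

# Search a dense grid of candidate profiles for the nearest opposite label.
grid = np.stack(np.meshgrid(np.linspace(-2, 2, 201),
                            np.linspace(-2, 2, 201)), axis=-1).reshape(-1, 2)
opposite = grid[clf.predict(grid) == 1]
counterfactual = opposite[np.linalg.norm(opposite - x, axis=1).argmin()]
print(counterfactual)   # the minimal change to the profile that flips the decision
```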

A decision function learned by a system is assumed to be interpretable if it is simple, often meaning a linear model with few parameters, or if it is expressed with terms or rules that a domain expert is supposed to understand, as in rule-based systems or decision trees. One influential work promotes the use of locally linear models in order to offer some interpretability even for globally non-linear models [398].

Explainability can be understood at two levels: either at the level of the learning algorithm itself, which should be easily understandable by ML experts as well as by practitioners or users of the system; or at the level of the learned model, which, for instance, could incorporate causal relationships. One question is: is it possible to extract causal relationships from data alone, without some prior knowledge that suggests those relationships? Judea Pearl [378, 375, 376] argues that this is not possible, but gives ML techniques the role of identifying possible correlations between variables in huge data sets that are impossible for human experts to sift through. A recent work [324] suggests that it would be possible to identify the direction of a causal relationship from observational data. However, the necessary interplay between ML, prior knowledge and reasoning is still a matter of debate.

When dealing with high-dimensional structured data (such as images or text), interpretable and explainable approaches (from a statistical or a KR point of view) are known to be less effective than heavy numerical approaches such as bagging (random forests, gradient boosting) or deep learning [304, 209]. Deep learning models are neither explainable nor interpretable, due to the large number of parameters and their entanglement. Moreover, few statistical results exist for deep learning, and the currently known properties are restricted to very specific architectures (see [170] for instance).

Some approaches have been proposed for improving the explainability of deep learning algorithms (or, in some cases, of any black-box algorithm). A first solution is to analyze the sensitivity of the prediction with respect to small variations of the input. For instance, activation maps [504] emphasise the most important pixels of a picture for a given prediction. Although this type of representation is easily readable, there are cases where it is not enough for explanation. Another solution is to approximate the model (globally or locally) with an explainable one. Even when the approximation error is reasonable, there is no guarantee that the interpretation associated with the surrogate model is related to the way the initial model makes its predictions. In [324], the authors propose to locally replace the model with an interpretable surrogate model. This reduces the approximation error, but it is based on a notion of neighbourhood that can be difficult to define in high-dimensional structured spaces. Moreover, using an interpretable/explainable model is not always a guarantee of explainable predictions: a linear function with millions of parameters, or a set of rules with thousands of literals, may not be readable at all.
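A minimal sketch of this local-surrogate idea (LIME-like in spirit, not the actual algorithm of the cited work; scikit-learn assumed, toy data): perturb the instance, query the black box, and fit a proximity-weighted linear model:

```python
# Local surrogate: explain one prediction of a black box with a small
# linear model fitted around the instance of interest.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 4))
y = ((X[:, 0] > 0) & (X[:, 1] < 0.5)).astype(int)
black_box = RandomForestClassifier(random_state=0).fit(X, y)

x = np.array([0.2, 0.1, -1.0, 0.4])               # instance to explain
Z = x + 0.3 * rng.normal(size=(1000, 4))          # local perturbations
probs = black_box.predict_proba(Z)[:, 1]          # black-box outputs
weights = np.exp(-np.linalg.norm(Z - x, axis=1))  # proximity kernel

surrogate = Ridge(alpha=1.0).fit(Z, probs, sample_weight=weights)
print(surrogate.coef_)  # local importances; features 0 and 1 should dominate
```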

3 Some Methodologies Combining Reasoning Principles and Learning

The idea of combining KRR ingredients with ML tools is not new. It has been done in different ways. This section presents a series of examples of methodologies mixing KRR and ML ideas, with no intent to be exhaustive however. Each subsection roughly follows the same structure, stating the goal of the methodology, presenting its main aspects, and identifying the KR and the ML parts.

3.1 Injecting Knowledge in Learning

Induction cannot be made to work without prior knowledge that restrains the space of models to be explored. Two forms of prior knowledge, a.k.a. biases, are distinguished: representation biases, which limit the expressiveness of the language used to express the possible hypotheses about the world, and search biases, which control how the hypothesis space is explored by the learning algorithm.

Representation biases can take various forms. They can directly affect the language in which the possible hypotheses can be expressed. For instance, "hypotheses can involve a maximum of two disjuncts". In the same way, but less declaratively, looking for linear models only is a severe representation bias. Often, one does not want to be so strict, and prefers to favour more flexible preference criteria over the space of possible hypotheses. Generally, this is expressed through a regularized optimisation criterion that balances a measure of fit of the model to the data and a measure of fit of the model to the bias. For instance, a quality measure over linear hypotheses h_θ for regression of the following form favours hypotheses that involve fewer parameters:

    R(θ) = Σ_i ( h_θ(x_i) − y_i )² + λ ||θ||_0

where the L0 "norm" ||θ||_0 counts the nonzero parameters θ_j.

The search bias dictates how the learning algorithm explores the space of hypotheses. For instance, in the case of neural networks, the search starts with a randomly initialized network and then proceeds by a gradient-descent optimization scheme. In some other learning methods, such as learning with version spaces, the search uses generalization relations between hypotheses in order to converge towards good hypotheses. In this latter case, it is easier to incorporate prior knowledge from experts. Indeed, the exploration of the hypothesis space is akin to a reasoning process, very much like theorem proving.

3.2 Inductive Logic Programming

Inductive Logic Programming (ILP) (see [355, 129, 167] for general presentations) is a subfield of ML that aims at learning models expressed in (subsets of) first-order logic. It is an illustration of symbolic learning, where the hypothesis space is discrete and structured by a generality relation. The aim is then to find a hypothesis that covers the positive examples (it is then said to be complete) and rejects the negative ones (it is said to be consistent). The structure of the hypothesis space makes it possible to generalize an incomplete hypothesis, so as to cover more positive examples, or to specialize an inconsistent hypothesis, in order to exclude covered negative examples. The main reasoning mechanism is induction in the sense of generalization (subsumption).

In ILP, examples and models are represented by clauses. Relying on first-order logic allows modelling complex problems, involving structured objects (for instance, to determine whether a molecule is active or not, a system must take into account the fact that it is composed of atoms with their own properties and shared relations), or objects in relation with each other (a social network, or temporal data). Reasoning is a key part of ILP. First, the search for a model is usually performed by exploring a search space structured by a generality relation. A key point is then the definition of a generality relation between clauses. The most natural definition of subsumption would be expressed in terms of logical consequence, which allows comparing the models of both formulas, but since that problem is in general not decidable, the notion of θ-subsumption, as introduced in [384], is usually preferred: a clause C1 is more general than a clause C2 if there exists a substitution θ such that C1θ ⊆ C2. In this definition a clause, i.e., a disjunction of literals, is represented by its set of literals. For instance, the rule p(X) ← q(X) θ-subsumes p(a) ← q(a) ∧ r(a): indeed, the first one corresponds to the clause {p(X), ¬q(X)}, the second one to {p(a), ¬q(a), ¬r(a)}, and the substitution θ = {X/a} maps the first set to a subset of the second. Second, expert knowledge can be expressed using facts (ground atoms) or rules, or through reasoning mechanisms to be applied. This can be illustrated by the well-known systems FOIL [395] and Progol [357].
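A brute-force checker for θ-subsumption can be written in a few lines (a sketch for function-free clauses; the encoding of literals is an ad hoc assumption):

```python
# Clauses as sets of literals; a literal is (sign, predicate, args);
# variables are uppercase strings, constants lowercase strings.
from itertools import product

def is_var(t):
    return isinstance(t, str) and t[0].isupper()

def apply_subst(lit, s):
    sign, pred, args = lit
    return (sign, pred, tuple(s.get(a, a) for a in args))

def theta_subsumes(c1, c2):
    """True iff some substitution s makes every literal of c1*s appear in c2."""
    vars_ = sorted({a for (_, _, args) in c1 for a in args if is_var(a)})
    terms = sorted({a for (_, _, args) in c2 for a in args})
    for values in product(terms, repeat=len(vars_)):   # brute-force search
        s = dict(zip(vars_, values))
        if all(apply_subst(l, s) in c2 for l in c1):
            return True
    return False

# p(X) :- q(X)   versus   p(a) :- q(a), r(a)
c1 = {(True, "p", ("X",)), (False, "q", ("X",))}
c2 = {(True, "p", ("a",)), (False, "q", ("a",)), (False, "r", ("a",))}
print(theta_subsumes(c1, c2))   # True, with the substitution {X: a}
```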

ILP, and more generally symbolic learning, thus has some interesting properties. First, the model is expressed in logic and therefore is claimed to be easily understandable by a user (see for instance [356] for an interesting study of the comprehensibility, or not, of programs learned with ILP). Second, expert knowledge can be easily expressed by means of clauses and integrated into the learning algorithm. Although initially developed for the induction of logic programs, ILP has now shown its interest for learning with structured data.

However, ILP suffers from two drawbacks: the complexity of its algorithms and its inability to deal with uncertain data. Several mechanisms have been introduced to reduce the complexity, for instance the introduction of syntactic biases restricting the class of clauses that can be learned. Another interesting idea is propositionalization, introduced in [301] and then developed for instance in the system RSD [499]. It is a process that transforms a relational problem into a classical attribute-value problem by introducing new features capturing relations between objects. Once the transformation is performed, any supervised learner can be applied to the problem. The main difficulty is then to define these new features.

This last problem has led to the emergence of Statistical Relational Learning [201, 130], which aims at coupling ILP with probabilistic models. Many systems have been developed, extending naive Bayesian classifiers [297], Bayesian networks [180] or Markov Logic Networks [400], or developing new probabilistic frameworks as in ProbLog [134]. In all these works, inference and learning are tightly connected, since learning parameters requires maximizing the likelihood for generative learning (estimating the probabilities of generating the data, given a set of parameters), or the conditional likelihood in the case of discriminative learning (estimating the probabilities of the labels given the data). Optimizing the parameters thus requires estimating the corresponding probabilities at each step. This has led to intensive research on the complexity of inference.

In the last decade, new works have emerged linking deep learning and Inductive Logic Programming. Two directions are investigated. The first one relies on propositionalization, as in [283]: first, a set of interpretable rules is built through a Path Ranking Algorithm, and then the examples are transformed into an attribute-value representation. Two settings are considered: an existential one, making a feature true if an instantiation of the corresponding path exists in the example, and a counting one, which counts the number of times the random walk is satisfied. Once this transformation is performed, a multilayered discriminative RBM can be applied. The second direction, illustrated in [447], consists in encoding facts and ground rules directly as neurons. Aggregation neurons allow combining rules with the same head, and the activation functions are approximations of Łukasiewicz fuzzy logic.

3.3 Neuro-Symbolic Reasoning

Several works have proposed to combine learning and reasoning by studying schemes to translate logical representations of knowledge into neural networks. A long-term goal of a series of works on neural-symbolic integration, surveyed for instance by [49], is "to provide a coherent, unifying view for logic and connectionism [in order to] produce better computational tools for integrated ML and reasoning". Typical works propose translation algorithms from a symbolic to a connectionist representation, enabling the use of computation methods associated with neural networks to perform tasks associated with the symbolic representation.

Early works in this vein are [381, 382, 57, 243]. Bornscheuer et al. [57] show for example how an instance of the Boolean satisfiability problem can be translated into a feed-forward network that parallelizes GSAT, a local-search algorithm for Boolean satisfiability. They also show that a normal logic program can be turned into a connectionist network that can approximate arbitrarily well the semantics of the program.

The paper [121] exploits this idea to represent propositional logic programs with recurrent neural networks (RNNs), which can be used to compute the semantics of the program. The authors show that such a program can also be used as background knowledge to learn from examples, using back-propagation. Essentially, the RNN defined to represent a logic program P has all atoms of P in the input layer; one neuron, a kind of "and" gate, for each rule, in a single hidden layer; and one neuron for every atom in the output layer, these neurons working like "or" gates. Re-entrant connections from an atom in the output layer to its counterpart in the input layer enable the chaining of rules.
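A minimal sketch of this translation (propositional definite rules only; the rule set, weights and thresholds are toy assumptions) shows how iterating the forward pass chains the rules:

```python
# Each rule becomes an "and" gate in the hidden layer, each atom an "or"
# gate over the rules concluding it; iterating the pass chains the rules.
import numpy as np

atoms = ["a", "b", "c", "d"]
rules = [("c", ["a", "b"]),      # c <- a, b
         ("d", ["c"])]           # d <- c

idx = {a: i for i, a in enumerate(atoms)}
# Hidden layer: one row per rule; fires iff all its body atoms are true.
W_in = np.zeros((len(rules), len(atoms)))
bias = np.array([-len(body) + 0.5 for _, body in rules])
for r, (_, body) in enumerate(rules):
    W_in[r, [idx[b] for b in body]] = 1.0
# Output layer: an atom is true if some rule with that head fired.
W_out = np.zeros((len(atoms), len(rules)))
for r, (head, _) in enumerate(rules):
    W_out[idx[head], r] = 1.0

def step(v):                                    # one forward pass
    hidden = (W_in @ v + bias > 0).astype(float)
    return np.maximum(v, W_out @ hidden)        # re-entrant: keep known atoms

v = np.array([1.0, 1.0, 0.0, 0.0])              # facts: a, b
for _ in range(len(rules)):                     # iterate to the fixed point
    v = step(v)
print(dict(zip(atoms, v)))                      # c and d become true
```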

Franca et al. [186] extend these results to first-order programs, using a propositionalization method called Bottom Clause Propositionalization. In [118, 120], methods are proposed to translate formulas that contain modalities into neural networks, enabling the representation of time and knowledge, and the authors of [116, 117] show that there exists a neural network ensemble that computes a fixed-point semantics of an intuitionistic theory. Pinkas and Cohen [380] perform experiments with so-called higher-order sigma-pi units (which compute a sum of products of their inputs) instead of hidden layers, for planning on simple block-world problems: the number of units is fixed at design time, as a function of the maximum number of blocks and the maximum number of time steps; for example, for every pair of possible blocks and every time step, there is a unit representing the proposition that one block is on the other at that time step. Their results indicate that a learning phase enables the network to approximately learn the constraints within a reasonable number of iterations, which speeds up the computation of an approximate solution for subsequent instances.

A new direction of research has recently emerged, bringing to Statistical Relational Learning [201, 133] in particular the power of tensor networks. For instance, recursive tensor networks are used in [491, 242] to predict classes and/or binary relations from a given knowledge base. It has also been proposed, e.g., in [145, 446, 428], to depart from the usual semantics of logic based on Boolean truth values in order to use the numerical operators of neural networks. The universe of an interpretation of first-order logic can be a set of vectors of real numbers, and the truth values of predicates can be real numbers in the interval [0, 1]; truth values of general formulas can then be defined using the usual operators of fuzzy logic. Donadello et al. [148] describe how this approach can be used to learn semantic image interpretation using background knowledge in the form of simple first-order formulas.

3.4 Formal Concept Analysis

Formal Concept Analysis (FCA) [195, 179] is another example of a setting that stands in between KRR and ML concerns. It offers a mathematical framework based on the duality between a set of objects or items and a set of descriptors. In the basic setting, one starts from a formal context, which is a relation linking objects and (Boolean) attributes (or properties). Thus a formal context constitutes a simple repository of data. A concept is formalized as a pair composed of a set of attributes and a set of objects, representing the intension and the extension of the concept respectively; it has the property that these objects, and only them, satisfy the set of attributes, and that this set of attributes refers to these objects and only them. Such a set of attributes is called a closed pattern or a closed itemset. More precisely, two operators, forming a Galois connection, respectively associate their common descriptors to a subset of objects, and the set of objects that satisfy all of them to a subset of descriptors. Equivalently, a pair made of a set of objects and a set of attributes is a formal concept if and only if their Cartesian product forms a maximal rectangle, for set inclusion, in the formal context. The set of concepts forms a complete lattice.
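The two derivation operators and the closure condition can be made concrete in a few lines (a sketch on a toy context; the objects and attributes are invented):

```python
# The Galois connection of FCA: two derivation operators and a closure test.
context = {                      # object -> set of attributes it has
    "duck":  {"flies", "swims", "feathers"},
    "swan":  {"flies", "swims", "feathers"},
    "dog":   {"swims"},
}

def common_attrs(objs):          # objects -> attributes shared by all of them
    sets = [context[o] for o in objs]
    return set.intersection(*sets) if sets else {a for s in context.values() for a in s}

def common_objs(attrs):          # attributes -> objects having all of them
    return {o for o, s in context.items() if attrs <= s}

# A formal concept is a pair (extent, intent) closed under both operators.
extent = common_objs({"feathers"})           # {'duck', 'swan'}
intent = common_attrs(extent)                # {'flies', 'swims', 'feathers'}
print(extent, intent)
print(common_objs(intent) == extent)         # True: the pair is a formal concept
```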

Formal Concept Analysis is deeply rooted in Artificial Intelligence through the formalization of the notion of concept. Recent years have witnessed a renewed interest in FCA with the emergence of pattern mining: it has been shown that the set of closed itemsets forms a condensed representation of the set of itemsets, thus reducing the memory space needed for storing them. Moreover, it is possible to define an equivalence relation between itemsets (two itemsets are equivalent if they share the same closure), and from the equivalence classes it becomes possible to extract all the exact association rules (rules with a confidence equal to 1); see [220, 373, 35].

Two extensions are especially worth mentioning. One uses fuzzy contexts, where the links between objects and attributes are a matter of degree [38]. This may be useful for handling numerical attributes [340]. Another extension allows for structured or logical descriptors using so-called pattern structures [194, 178, 22]. Besides, operators other than the ones defining formal concepts make sense in formal concept analysis, for instance to characterize independent subcontexts [161].

3.5 Rule-Based Models

Knowledge representation by if-then rules is a format whose importance was acknowledged early in the history of AI, with the advent of rule-based expert systems. Their modelling has raised the question of the adequacy of classical logic for representing them, especially in case of uncertainty, where conditioning is often preferred to material implication. Moreover, the need for rules tolerating exceptions, or expressing gradedness, such as default rules and fuzzy rules, has led KRR to develop tools beyond classical logic.

3.5.1 Default rules

Reasoning in a proper way with default rules (i.e., rules having potential exceptions) has been a challenging task for AI for three decades [67]. A natural question is then: can rules having exceptions, extracted from data, be processed by a non-monotonic inference system yielding new default rules? How can we ensure that these new rules still agree with the data? The problem is then to extract genuine default rules that hold in a Boolean database. It does not just amount to mining association rules with a sufficiently high confidence level: we have to guarantee that any new default rule that is deducible from the set of extracted default rules is indeed valid with respect to the database. To this end, we need a probabilistic semantics for non-monotonic inference. It has been shown [42] that default rules of the form "if a then generally b", denoted by a ⤳ b, which obey the postulates of preferential inference [290], have both

  1. a possibilistic semantics expressed by the constraint Π(a ∧ b) > Π(a ∧ ¬b), for any max-decomposable possibility measure Π (i.e., Π(x ∨ y) = max(Π(x), Π(y))),

  2. a probabilistic semantics expressed by the constraint P(a ∧ b) > P(a ∧ ¬b) for any big-stepped probability P.

A big-stepped probability is a very special kind of probability such that, if p_1 > p_2 > … > p_n (where p_i is the probability of one of the n possible worlds), the following inequalities hold: p_i > p_{i+1} + … + p_n for each i. Then, one can safely infer a new default from a set of defaults if and only if the constraints modeling the set entail the constraints modeling the new default. Thus, extracting defaults amounts to looking for big-stepped probabilities, by clustering the lines describing items in Boolean tables, so as to find default rules; see [40] for details. Then the rules discovered are genuine default rules that can be reused in a non-monotonic inference system, and can be encoded in possibilistic logic (assuming rational monotony for the inference relation).
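As a small sanity check of these notions (toy worlds and probabilities, invented, chosen so that the distribution is big-stepped), the following sketch validates the default "birds generally fly" despite exceptional penguin worlds:

```python
# Checking a default "a ~> b" under a big-stepped probability: each world's
# probability must exceed the sum of all strictly smaller ones.
worlds = [  # (bird, flies, penguin) : probability
    ((True,  True,  False), 0.60),   # ordinary flying bird
    ((False, False, False), 0.25),   # non-bird
    ((True,  False, True),  0.10),   # penguin: exceptional bird
    ((True,  True,  True),  0.05),   # flying penguin: very exceptional world
]
probs = sorted((p for _, p in worlds), reverse=True)
assert all(probs[i] > sum(probs[i + 1:]) for i in range(len(probs)))  # big-stepped

def P(pred):
    return sum(p for w, p in worlds if pred(w))

# The default holds iff P(a and b) > P(a and not b).
bird, flies = (lambda w: w[0]), (lambda w: w[1])
print(P(lambda w: bird(w) and flies(w)) > P(lambda w: bird(w) and not flies(w)))
# True: birds generally fly, even though penguin worlds are exceptions.
```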

It may also be beneficial to rank-order a set of rules expressed in the setting of classical logic, in order to handle exceptions in agreement with non-monotonic reasoning. This is what has been proposed in [430], where a new formalization of inductive logic programming (ILP) in first-order possibilistic logic makes it possible to handle exceptions by means of prioritized rules. Indeed, in classical first-order logic, exceptions to the rules can be assigned to more than one class, even if only one is the right one, which is not correct. The possibilistic formalization provides a sound encoding of non-monotonic reasoning that copes with rules with exceptions and prevents an example from being classified in more than one class.

Possibilistic logic [155] is also a basic logic for handling epistemic uncertainty. It has been established that any set of Markov logic formulas [400] can be exactly translated into possibilistic logic formulas [294, 158], thus providing an interesting bridge between KRR and ML concerns.

3.5.2 Fuzzy rules

The idea of fuzzy if-then rules was first proposed by L. A. Zadeh [497]. They are rules whose conditions and/or conclusions express fuzzy restrictions on the possible values of variables. Reasoning with fuzzy rules is based on a combination / projection mechanism [495], where the fuzzy pieces of information (rules, facts) are conjunctively combined and projected on the variables of interest. Special views of fuzzy rules have been used when designing fuzzy rule-based controllers: fuzzy rules may specify the fuzzy graph of a control law which, once applied to an input, yields a fuzzy output that is usually defuzzified [330]. Or rules may have precise conclusions that are combined on the basis of the degrees of matching between the current situation and the fuzzy condition parts of the rules [451]. In both cases, an interpolation mechanism is at work, implicitly or explicitly [496]. Fuzzy rule-based controllers are universal approximators [79]. The functional equivalence between a radial basis function-based neural network and a fuzzy inference system has been established under certain conditions [269].
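As a minimal illustration of this controller style of fuzzy rule use (a Mamdani-like sketch; the rules, membership functions and domains are toy assumptions):

```python
# Two fuzzy rules, min for rule firing, max for aggregation,
# centroid defuzzification over a discretized output domain.
import numpy as np

def tri(x, a, b, c):                       # triangular membership function
    return np.maximum(np.minimum((x - a) / (b - a), (c - x) / (c - b)), 0.0)

ys = np.linspace(0, 10, 501)               # output domain (e.g., fan speed)

def infer(temp):
    # Rule 1: if temperature is low  then speed is slow
    # Rule 2: if temperature is high then speed is fast
    fire1 = tri(temp, 0, 10, 20)           # matching degree of "low"
    fire2 = tri(temp, 15, 25, 35)          # matching degree of "high"
    slow, fast = tri(ys, 0, 2, 5), tri(ys, 5, 8, 10)
    out = np.maximum(np.minimum(fire1, slow),   # clip each conclusion,
                     np.minimum(fire2, fast))   # then aggregate with max
    return (ys * out).sum() / out.sum()         # centroid defuzzification

print(infer(12.0), infer(30.0))   # lowish temperature -> slow speed, hot -> fast
```

Note how intermediate temperatures fire both rules partially, which is the interpolation mechanism mentioned above.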

Moreover, fuzzy rules may provide a rule-based interpretation [114, 413] for (simple) neural nets, and neural networks can be used for extracting fuzzy rules from the training data [379, 191]. Regarding neural nets, let us also mention a non-monotonic inference view [29, 197].

Association rules [12, 228] describe relations between variables together with confidence and support degrees. See [153] for the proper assessment of confidence and support degrees in the fuzzy case. In the same spirit, learning methods for fuzzy decision trees have been devised in [336], in the case of numerical attributes. The use of fuzzy sets to describe associations between data and decision trees may have some interest: extending the types of relations that may be represented, making easier the interpretation of rules in linguistic terms [159], and avoiding unnatural boundaries in the partitioning of the attribute domains.

There are other kinds of fuzzy rules whose primary goal is not to approximate functions nor to quantify associations, but rather to offer representation formats of interest. This is, for instance, the case of gradual rules, which express statements of the form "the more x is A, the more y is B", where A and B are gradual properties modelled by fuzzy sets [429, 367].

3.5.3 Threshold rules

Another format of interest is that of multiple threshold rules, i.e., selection rules of the form "if x_1 ≥ a_1 and … and x_n ≥ a_n then y ≥ b" (or deletion rules of the form "if x_1 ≤ a_1 and … and x_n ≤ a_n then y ≤ b"), which are useful in monotone classification / regression problems [215, 54]. Indeed, when dealing with data made of a collection of pairs (x_i, y_i), where x_i is a tuple of feature evaluations of item i, and where y_i is assumed to increase with the feature values in the broad sense, it is of interest to describe the data with such rules of various lengths. It has been noticed [216, 157] that, once the numerical data are normalized between 0 and 1, rules where all (non-trivial) thresholds are equal can be represented by Sugeno integrals (a generalization of weighted min and weighted max, which is a qualitative counterpart of Choquet integrals [213]). Moreover, it has been shown recently [65] that generalized forms of Sugeno integrals are able to describe a global (increasing) function, taking values on a finite linearly ordered scale, in the form of general thresholded rules. Another approach, in the spirit of the version space approach [348], provides a bracketing of an increasing function by means of a pair of Sugeno integrals [388, 387].
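To make the aggregation concrete, here is a small sketch of a discrete Sugeno integral, computed by the standard max-min formula over the criteria sorted by decreasing satisfaction (the capacity values below are hypothetical toy numbers):

```python
# Sugeno integral: a qualitative, weighted max-min aggregation.
def sugeno(values, capacity):
    """values: dict criterion -> score in [0,1]; capacity: set -> weight in [0,1]."""
    items = sorted(values, key=values.get, reverse=True)
    best = 0.0
    for k in range(1, len(items) + 1):
        top_k = frozenset(items[:k])               # the k best-satisfied criteria
        best = max(best, min(values[items[k - 1]], capacity(top_k)))
    return best

# Toy monotone capacity over the criteria {'skill', 'taste'}.
weights = {frozenset(): 0.0, frozenset({"skill"}): 0.7,
           frozenset({"taste"}): 0.4, frozenset({"skill", "taste"}): 1.0}
print(sugeno({"skill": 0.9, "taste": 0.3}, weights.get))  # 0.7
```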

3.6 Uncertainty in ML: in the data or in the model

We will focus on two aspects in which cross-fertilisation of ML with KRR could be envisaged: uncertainty in the data and uncertainty in the models/predictions.

3.6.1 Learning under uncertain and coarse data

In general, learning methods assume the data to be complete, typically in the form of examples that are precise values (in the unsupervised case) or precise input/output pairs (in the supervised case). There are however various situations where data can be expected to be uncertain, such as when they are provided by human annotators in classification or measured by low-quality sensors, or even missing, such as when sensors have failed or when only a few examples could be labelled. An important remark is that the uncertainty attached to a particular piece of data can hardly be said to be of an objective nature (representing a frequency), as it concerns a unique value, and this even if the uncertainty is due to an aleatory process.

While the case of missing (fully imprecise) data is rather well-explored in the statistical [319] and learning [83] literature, the general case of uncertain data, where this uncertainty can be modelled using different representation tools of the literature, largely remains to be explored. In general, we can distinguish between two strategies:

  • The first one intends to extend the precise methods so that they can handle uncertain data, still retrieving a precise model from them. The most notable approaches consist in either extending the likelihood principle to uncertain data (e.g., [142] for evidential data, or [109] for coarse data), or providing a precise loss function defined over partial data and then using it to estimate the empirical risk; see for instance [255, 105, 94]. Such approaches are sometimes based on specific assumptions, usually hard to check, about the process that makes data uncertain or partial. Some other approaches, such as the evidential likelihood approach outlined in [142], do not start from such assumptions, and simply propose a generic way to deal with uncertain data. We can also mention transductive methods such as the evidential k-nearest neighbour (k-NN) rule [141, 140, 138], which allows one to handle partial (or "soft") class labels without having to learn a model.

  • The second approach, much less explored, intends to make no assumptions at all about the underlying process making the data uncertain, and considers building the set of all possible models consistent with the data. Again, we can find proposals that extend probability-based approaches [127], as well as loss-based ones [110]. The main criticism one could address to such approaches is that they are computationally very challenging. Moreover they do not yield a single predictive model, making the prediction step potentially difficult and ill-defined, but also more robust.

The problem of handling partial and uncertain data is certainly widely recognised in the different fields of artificial intelligence, be it KRR or ML. One remark is that mainstream ML has, so far, almost exclusively focused on providing computationally efficient learning procedures adapted to imprecise data given in the form of sets, together with the associated assumptions under which such a learning procedure may work [320]. While there are proposals that envisage the handling of more complex forms of uncertain data than just sets, such approaches remain marginal, for at least two major reasons:

  • More complex uncertainty models require more effort at the data collection step, and the benefits of such an approach (compared to set-valued data or noisy precise data) do not always justify the additional effort. However, there are applications in which modeling data uncertainty in the belief function framework does improve performance in classification tasks [91, 396]. Another possibility is that the data are themselves predictions of an uncertain model, later used in further learning procedures (as in stacking [168]);

  • Using more complex representations may involve a higher computational cost, and the potential gain of using such representations is not always worth the try. However, in some specific cases, approaches such as the EM algorithm [142] or the $k$-NN rule [138] in the evidential setting make it possible to handle uncertain data without additional cost.

3.6.2 Uncertainty in the prediction model

Another step of the learning process where uncertainty can play an important role is in the characterisation of the model or its output values. In the following, we will limit ourselves to the supervised setting, where we seek to learn a (predictive) function $f$ linking an input observation $x$ to an output (prediction) $y = f(x)$. Assessing the confidence one has in a prediction can be important in sensitive applications. This can be done in different ways:

  • By directly impacting the model itself, for instance by associating with every instance $x$ not a deterministic prediction $f(x)$, but an uncertainty model over the output domain $\mathcal{Y}$. The most common such model is of course a probability distribution, but other solutions such as possibility distributions, belief functions or convex sets of probabilities are possible;

  • By allowing the prediction to become imprecise, the main idea behind such a strategy being to obtain weaker yet more reliable predictions. In the classical setting, this is usually done by an adequate replacement of the loss function [226, 214], yet recent approaches take a different road. For instance, imprecise probabilistic approaches consider sets of models combined with a skeptical inference (also a typical approach in KR), where a prediction is rejected if it is rejected for every possible model [98]. Conformal prediction [434] is another approach that can be plugged into any model output to obtain set-valued predictions; a sketch of this idea is given after this list. Approaches to quantify statistical predictions in the belief function framework are described in [278, 489, 139].
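
As an illustration of how set-valued predictions can be obtained, here is a minimal sketch of split conformal prediction for classification; the use of predicted class probabilities as conformity scores, the miscoverage level alpha and the variable names are illustrative assumptions, not prescriptions from the works cited above:

    import numpy as np

    def split_conformal_sets(probs_cal, y_cal, probs_test, alpha=0.1):
        # probs_cal, probs_test: (n, K) arrays of predicted class probabilities
        # y_cal: integer labels of a held-out calibration set
        n = len(y_cal)
        # nonconformity score: 1 - probability assigned to the true class
        scores = 1.0 - probs_cal[np.arange(n), y_cal]
        # finite-sample-corrected empirical quantile of the calibration scores
        k = int(np.ceil((n + 1) * (1.0 - alpha)))
        q = np.sort(scores)[min(k, n) - 1]
        # prediction set: every class whose score falls below the threshold
        return [np.flatnonzero(1.0 - p <= q) for p in probs_test]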

While such approaches are relatively well characterised for the simpler case of multi-class classification, extending them to more complex settings such as multi-label or ranking learning problems that involve combinatorial spaces remains largely unexplored, with only a few contributions [90, 20]. It is quite possible that classical AI tools such as SAT or CSP solvers could help to deal with such combinatorial spaces.

3.7 Case-Based Reasoning, Analogical Reasoning and Transfer Learning

Case-based reasoning (CBR for short), e.g., [6], is a form of reasoning that exploits data (rather than knowledge) under the form of cases, often viewed as ⟨problem, solution⟩ pairs. When one seeks potential solution(s) to a new problem, one looks for previous solutions to similar problems in the repertory of cases, and then adapts them (if necessary) to the new problem.

Case-based reasoning, especially when similarity is a matter of degree, thus appears to be close to k-NN methods and instance-based learning [141, 254]. The k-NN method is a prototypical example of transduction, i.e., the class of a new piece of data is predicted on the basis of previously observed data, without any attempt at inducing a generic model for the observed data. The term transduction was coined in [193], but the idea dates back to Bertrand Russell [411].

Another example of transduction is analogical proportion-based learning. Analogical proportions are statements of the form "$a$ is to $b$ as $c$ is to $d$", often denoted by $a:b::c:d$, which express that "$a$ differs from $b$ as $c$ differs from $d$, and $b$ differs from $a$ as $d$ differs from $c$". This statement can be encoded into a Boolean logical expression [342, 385] which is true only for the 6 following assignments of $(a,b,c,d)$: $(0,0,0,0)$, $(1,1,1,1)$, $(0,0,1,1)$, $(1,1,0,0)$, $(0,1,0,1)$, and $(1,0,1,0)$. Note that these patterns are also compatible with the arithmetic proportion definition $a - b = c - d$ (where $a, b, c, d$ are numbers), which is not a Boolean expression. Boolean analogical proportions straightforwardly extend to vectors of attribute values such as $\mathbf{a} = (a_1, \dots, a_n)$, by stating that $\mathbf{a}:\mathbf{b}::\mathbf{c}:\mathbf{d}$ holds iff $a_i:b_i::c_i:d_i$ holds for every component $i$. The basic analogical inference pattern [449] is then

\[
\frac{\forall i \in \{1,\dots,p\}, \quad a_i : b_i :: c_i : d_i}{\forall j \in \{p+1,\dots,n\}, \quad a_j : b_j :: c_j : d_j}
\]

Thus analogical reasoning amounts to finding completely informed triples $(\mathbf{a}, \mathbf{b}, \mathbf{c})$ appropriate for inferring the missing value(s) in $\mathbf{d}$. When there exist several suitable triples, possibly leading to distinct conclusions, one may use a majority vote for concluding. This inference method extends to analogical proportions between numerical values, and the analogical proportion becomes graded [156]. It has been successfully applied, for Boolean, nominal or numerical attributes, to classification [341, 59] (the class, viewed as a nominal attribute, is then the unique solution $x$, when it exists, such that $cl(\mathbf{a}):cl(\mathbf{b})::cl(\mathbf{c}):x$ holds), and more recently to case-based reasoning [317] and to preference learning [176, 383]. It has been theoretically established that analogical classifiers always yield exact predictions for Boolean affine functions (which include x-or functions), and only for them [103]. Good results can still be obtained in other cases [104]. Moreover, analogical inequalities [386] of the form "$a$ is to $b$ at least as much as $c$ is to $d$" might be useful for describing relations between features in images, as in [302].
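
A minimal sketch of the Boolean check and of the resulting transductive classifier (the 0/1 encoding of attributes, the cubic enumeration of triples, and the toy voting scheme are illustrative assumptions):

    from collections import Counter
    from itertools import permutations

    def ap(a, b, c, d):
        # Boolean analogical proportion a:b::c:d over {0,1};
        # true exactly on the 6 valuations listed above
        return (a - b) == (c - d)

    def ap_vec(a, b, c, d):
        # componentwise extension to attribute vectors
        return all(ap(ai, bi, ci, di) for ai, bi, ci, di in zip(a, b, c, d))

    def solve_class(ca, cb, cc):
        # unique x such that ca:cb::cc:x holds for nominal values, if any
        if ca == cb:
            return cc
        if ca == cc:
            return cb
        return None

    def analogical_classify(train, d):
        # train: list of (attribute vector, class label); d: unlabelled vector
        votes = Counter()
        for (a, ca), (b, cb), (c, cc) in permutations(train, 3):  # cubic cost
            if ap_vec(a, b, c, d):
                x = solve_class(ca, cb, cc)
                if x is not None:
                    votes[x] += 1
        return votes.most_common(1)[0][0] if votes else None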

The idea of transfer learning, which may be viewed as a kind of analogical reasoning performed at the meta level, is to take advantage of what has been learnt on a source domain in order to improve the learning process in a target domain related to the source domain. When studying a new problem or a new domain, it is natural to try to identify a related, better mastered, problem or domain from which, hopefully, some useful information can be called upon for help. The emerging area of transfer learning is concerned with finding methods to transfer useful knowledge from a known source domain to a less known target domain.

The easiest and most studied problem is encountered in supervised learning. There, it is supposed that a decision function has been learned in the source domain and that a limited amount of training data is available in the target domain. For instance, suppose we have learnt a decision function that is able to recognize poppy fields in satellite images. Then the question is: could we use this decision function in order to learn to recognize cancerous cells in biopsies, rather than starting anew on this problem, when only few labeled data are available in the biological domain?

This type of transfer problem has witnessed a spectacular rise of interest in recent years, thanks both to the big data era that makes lots of data available in some domains, and to the onset of deep neural networks. In deep neural networks, the first layers of neuron-like elements elaborate on the raw input descriptions by selecting relevant descriptors, while the last layers learn a decision function using these descriptors. Nowadays, most transfer learning methods rely on the idea of transferring the first layers when learning a new neural network on the target training data, adjusting only the last layers. The underlying motivation is that the descriptors are useful in both the source and target domains, and what is specific is the decision function built upon these descriptors. But it could be defended just as well that the decision function is what is essential in both domains, while the ML part should concentrate on learning an appropriate representation. This has been achieved with success in various tasks [99].
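
A minimal sketch of this common practice (using PyTorch and a pretrained ResNet purely as illustrative assumptions; num_target_classes is hypothetical):

    import torch.nn as nn
    from torchvision import models

    num_target_classes = 10  # hypothetical size of the target label set

    # reuse the feature-extraction layers learned on the source domain
    model = models.resnet18(pretrained=True)
    for param in model.parameters():
        param.requires_grad = False  # freeze the transferred layers

    # replace only the decision function, to be trained on target data
    model.fc = nn.Linear(model.fc.in_features, num_target_classes)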

One central question is how to control what should be transferred. A common assumption is that transfer learning should involve a minimal amount of change of the source domain knowledge in order for it to be used in the target domain. Several ways of measuring this “amount of change” have been put forward (see for instance [106, 437]), but much work remains to be done before a satisfying theory is obtained.

One interesting line of work is related to the study of causality. Judea Pearl uses the term "transportability" instead of transfer learning, but the fundamental issues are the same. He and his colleagues have proposed ways of determining whether, and what, can be transferred from one domain to another [376]. The principles rely on descriptions of the domains using causal diagrams. Thanks to the "do-calculus", formal rules can be used to identify what can be carried over from a source domain to help solve questions in the target domain. One foremost assumption is that causality relationships capture deep knowledge about domains and are somewhat preserved between different situations. For instance, proverbs in natural language are a way of encapsulating such deep causality relationships, and their attractiveness comes from their usefulness in many domains or situations, when properly translated.

4 Examples of KRR/ML Synergies

In the previous section, we have surveyed various paradigms where KRR and ML aspects are intricately entwined together. In this section, we rather review examples of hybridizations where KRR and ML tools cooperate. In each case, we try to identify the purpose, the way the KRR and ML parts interact, and the expected benefits of this synergy.

4.1 Dempster-Shafer Reasoning and Generalized Logistic Regression Classifiers

The theory of belief functions originates from the seminal work of Dempster [137], who proposed, at the end of the 1960's, a method of statistical inference that extends both Fisher's fiducial inference and Bayesian inference. In a landmark book, Shafer [435] developed Dempster's mathematical framework and extended its domain of application by showing that it could be proposed as a general language to express "probability judgements" (or degrees of belief) induced by items of evidence. This new theory rapidly became popular in Artificial Intelligence, where it was named "Dempster-Shafer (DS) theory", evidence theory, or the theory of belief functions. DS theory can be considered from different perspectives:

  • A belief function can be defined axiomatically as a Choquet monotone capacity of infinite order [435].

  • Belief functions are intimately related to the theory of random sets: any random set induces a belief function and, conversely, any belief function can be seen as being induced by a random set [363].

  • Sets are in one-to-one correspondence with so-called "logical" belief functions, and probability measures are special belief functions. A belief function can thus be seen both as a generalised probability measure and as a generalised set; this makes it possible to combine reasoning mechanisms from probability theory (conditioning, marginalisation) with set-theoretic operations (intersection, union, cylindrical extension, interval computations, etc.).

DS theory thus provides a very general framework allowing us to reason with imprecise and uncertain information. In particular, it makes it possible to represent states of knowledge close to total ignorance and, consequently, to model situations in which the available knowledge is too limited to be properly represented in the probabilistic formalism. Dempster’s rule of combination [435] is an important building block of DS theory, in that it provides a general mechanism for combining independent pieces of evidence.

The first applications of DS theory to machine learning date back to the 1990's and concerned classifier combination [487, 405], each classifier being considered as a piece of evidence and combined by Dempster's rule (see, e.g., [397] for a refinement of this idea taking into account the dependence between classifier outputs). In [141], Denœux combined Shafer's idea of evidence combination with distance-based classification to introduce the evidential $k$-NN classifier. In this method, each neighbour of an instance to be classified is considered as a piece of evidence about the class of that instance and is represented by a belief function. The belief functions induced by the $k$ nearest neighbours are then combined by Dempster's rule. Extensions of this simple scheme were later introduced in [507, 313, 138].

The evidential $k$-NN rule is, thus, the first example of an "evidential classifier". Typically, an evidential classifier breaks down the evidence of each input feature vector into elementary mass functions and combines them by Dempster's rule. The combined mass function can then be used for decision-making. Thanks to the generality and expressiveness of the belief function formalism, evidential classifiers provide more informative outputs than those of conventional classifiers. This expressiveness can be exploited, in particular, for uncertainty quantification, novelty detection and information fusion in decision-aid or fully automatic decision systems [138].

In [138], it is shown that not only distance-based classifiers such as the evidential $k$-NN rule, but also a broad class of supervised machine learning algorithms, can be seen as evidential classifiers. This class contains logistic regression and its non-linear generalizations, including multilayer feedforward neural networks, generalized additive models, support vector machines and, more generally, all classifiers based on linear combinations of input or higher-order features and their transformation through the logistic or softmax transfer function. Such generalized logistic regression classifiers can be seen as combining elementary pieces of evidence supporting each class or its complement using Dempster's rule. The output class probabilities are then normalized plausibilities according to some underlying belief function. This "hidden" belief function provides a more informative description of the classifier output than the class probabilities, and can be used for decision-making. Also, the individual belief functions computed from each of the features provide insight into the internal operation of the classifier and can help interpret its decisions. This finding opens a new perspective for the study and practical application of a wide range of machine learning algorithms.
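
To make the evidential $k$-NN scheme concrete, here is a minimal sketch; the parameter values alpha and gamma, and the restriction of focal sets to singletons and Omega, follow the usual textbook presentation of the method, while the code itself is an illustrative reconstruction rather than the authors' implementation:

    import numpy as np

    def dempster_combine(m1, m2, n_classes):
        # focal sets restricted to singletons {c} and Omega (last index)
        m = np.zeros(n_classes + 1)
        for c in range(n_classes):
            m[c] = m1[c] * m2[c] + m1[c] * m2[-1] + m1[-1] * m2[c]
        m[-1] = m1[-1] * m2[-1]
        return m / m.sum()  # renormalisation: Dempster's rule discards conflict

    def evidential_knn(dists, labels, n_classes, alpha=0.95, gamma=1.0):
        # each neighbour induces a simple mass function: some belief on its
        # class, the remainder on Omega (ignorance); masses are then combined
        m = np.zeros(n_classes + 1)
        m[-1] = 1.0  # start from the vacuous mass function
        for d, c in zip(dists, labels):
            w = alpha * np.exp(-gamma * d ** 2)
            mj = np.zeros(n_classes + 1)
            mj[c], mj[-1] = w, 1.0 - w
            m = dempster_combine(m, mj, n_classes)
        return m  # combined mass over classes and Omega, for decision-making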

4.2 Maximum Likelihood Under Coarse Data

When data are missing or just imprecise (one then speaks of coarse data), statistical methods need to be adapted. In particular, the question is whether one wishes to model the observed phenomenon along with the limited precision of observations, or despite this imprecision. The latter view comes down to completing the data in some way (using imputation methods). A well-known method that does so is the EM algorithm [136]. This technique makes strong assumptions on the measurement process so as to relate the distribution ruling the underlying phenomenon and the one ruling the imprecise outcomes. It possesses variants based on belief functions [142]. EM is extensively used for clustering (using Gaussian mixtures) and for learning Bayesian nets.

However, the obtained result, in which, by virtue of the algorithm, the data have become complete and precise, is not easy to interpret. If we want to be faithful to the data and their imperfections, one way is to build a model that accounts for the imprecision of observations, i.e., a set-valued model. This is the case when a belief function is obtained via maximum likelihood from imprecise observations: one optimises the visible likelihood function [109]. The idea is to cover all the precise models that could have been derived, had the data been precise. Imprecise models are useful to lay bare ignorance when it is present, so as to urge finding more data, but they may be problematic for decision or prediction problems, where we have to act or select a value despite ignorance.

Ideally we should optimize the likelihood function based on the actual values hidden behind the imprecise observations. But such a likelihood function is ill-known in the case of coarse data [109]. In that case, we are bound

  • To make assumptions on the measurement process so as to create a tight link between the hidden likelihood function, pertaining to the outcomes of the real phenomenon, and the visible likelihood of the imprecise observations (for instance the CAR (coarsening at random) assumption [234], or the superset assumption [252]). In that case, the coarseness of the data can in some sense be ignored. See [266] for a general discussion.

  • Or to pick a suitable hidden likelihood function among the ones compatible with the imprecise data, for instance using an optimistic maximax approach that considers the true sample to be the best possible sample, in terms of likelihood, compatible with the imprecise observations [255]. This approach chooses a compatible probability distribution with minimal entropy, and hence tends to disambiguate the data. On the contrary, the maximin approach considers the true sample to be the worst compatible sample in terms of likelihood; it chooses a compatible probability distribution with maximal entropy. These two approaches embody extreme points of view on the entropy of the probability distribution. More recently, an approach based on the likelihood ratio, which maximizes the minimal possible ratio over the compatible probability distributions, was proposed in [221]. It achieves a trade-off between the two extreme approaches and is able to quantify the quality of the chosen probability distribution with respect to all possible probability distributions. In these approaches, the measurement process is ignored. (A formal statement of the two extreme criteria is sketched after this list.)
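
With set-valued observations $Y_1,\dots,Y_N$ and a parametric family $p_\theta$, the two extreme criteria can be written as follows (a sketch in standard notation, consistent with the description above):

\[
\hat{\theta}_{\max\max} \;=\; \arg\max_{\theta} \, \max_{x_i \in Y_i} \, \prod_{i=1}^{N} p_\theta(x_i),
\qquad
\hat{\theta}_{\max\min} \;=\; \arg\max_{\theta} \, \min_{x_i \in Y_i} \, \prod_{i=1}^{N} p_\theta(x_i).
\]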

See [107, 253] for more discussions about such methods for statistical inference with poor quality data.

Besides, another line of work for taking into account the scarcity of data in ML is to use a new cumulative entropy-like function that jointly considers the entropy of the probability distribution and the uncertainty pertaining to the estimation of its parameters [431, 432]. Such a function takes advantage of the ability of a possibility distribution to upper bound a family of probabilities previously estimated from a limited set of examples, and of the link between the possibilistic specificity ordering and entropy [154]. This approach makes it possible to limit the expansion of decision trees when the number of examples at the current leaf nodes is too small.

4.3 EM Algorithm and Revision

Injecting concepts from KRR when explaining the EM algorithm may help better figure out what it does. In the most usual case, the coarse data are elements of a partition of the domain of values of some hidden variable. Given a class of parametric statistical models, the idea is to iteratively construct a precise model that fits the data as much as possible, by first generating at each step a precise observation sample in agreement with the incomplete data, followed by the computation of a new model obtained by applying the maximum likelihood method to the last precise sample. These two steps are repeated until convergence to a model is achieved.

In [108], it has been shown that the observation sample implicitly built at each step can be represented by a probability distribution on the domain of the hidden variable that is in agreement with the observed frequencies of the coarse data. It is obtained by applying, at each step of the procedure, the oldest (probabilistic) revision rule well-known in AI, namely Jeffrey's rule [272], to the current best parametric model. This form of revision considers a prior probability $P$ on the domain of a variable $X$, and new information made of a probability distribution $(\lambda_1,\dots,\lambda_k)$ over a partition $\{E_1,\dots,E_k\}$ of this domain (representing the coarse data). If $\lambda_i$ is the "new" probability of $E_i$, the old distribution is revised so as to be in agreement with the new information. The revised probability function is of the form

\[
P'(A) \;=\; \sum_{i=1}^{k} \lambda_i \, P(A \mid E_i).
\]

The revision step minimally changes the prior probability function in the sense of Kullback-Leibler relative entropy.

In the case of the EM algorithm, $\lambda_i$ is the frequency of the coarse observation $E_i$, and $P$ is the current best parametric model. The revised distribution $P'$ corresponds to a new sample of $X$ in agreement with the coarse observations. In other words, the EM algorithm in turn revises the parametric model to make it consistent with the coarse data, and applies maximum likelihood estimation to the newly obtained sample, thus minimizing the relative (entropic) distance between a parametric model and a probability distribution in agreement with the coarse data.

4.4 Conceptual Spaces and the Semantic Description of Vector Representations

Neural networks, and many other approaches in machine learning, crucially rely on vector representations. Compared to symbolic representations, using vectors has many advantages (e.g., their continuous nature often makes optimizing loss functions easier). At the same time, however, vector representations tend to be difficult to interpret, which makes the models that rely on them hard to explain as well. Since this is clearly problematic in many application contexts, there has been an increasing interest in techniques for linking vector spaces to symbolic representations. The main underlying principles go back to the theory of conceptual spaces [198], which was proposed by Gärdenfors as an intermediate representation level between vector space representations and symbolic representations. Conceptual spaces are essentially vector space models, as each object from the domain of discourse is represented as a vector, but they differ in two crucial ways. First, the dimensions of a conceptual space usually correspond to interpretable salient features. Second, (natural) properties and concepts are explicitly modelled as (convex) regions. Given a conceptual space representation, we can thus, e.g., enumerate which properties are satisfied by a given object, determine whether two concepts are disjoint or not, or rank objects according to a given (salient) ordinal feature.

Conceptual spaces were proposed as a framework for studying cognitive and linguistic phenomena, such as concept combination, metaphor and vagueness. As such, the problem of learning conceptual spaces from data has not received much attention. Within a broader setting, however, several authors have studied approaches for learning vector space representations that share important characteristics with conceptual spaces. The main focus in this context has been on learning vector space models with interpretable dimensions. For example, it has been proposed that non-negative matrix factorization leads to representations with dimensions that are easier to interpret than those obtained with other matrix factorization methods [306], especially when combined with sparseness constraints [245]. More recently, a large number of neural network models have been proposed with the aim of learning vectors with interpretable dimensions, under the umbrella term of disentangled representation learning [88, 235]. Another possibility, advocated in [144], is to learn (non-orthogonal) directions that model interpretable salient features within a vector space whose dimensions themselves may not be interpretable. Beyond interpretable dimensions, some authors have also looked at modelling properties and concepts as regions in a vector space. For example, [171] proposed to learn region representations of word meaning. More recent approaches along these lines include [468], where words are modelled as Gaussian distributions, and [268], where word regions were learned using an ordinal regression model with a quadratic kernel. Some authors have also looked at inducing region-based representations of concepts from the vector representations of known instances of these concepts [61, 223]. Finally, within a broader setting, some approaches have been developed that link vectors to natural language descriptions, for instance linking word vectors to dictionary definitions [236] or images to captions [279].
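
As a simple illustration of how an interpretable direction can be identified in an existing vector space, in the spirit of [144] (the use of a plain linear classifier over entity vectors, and all variable names, are illustrative assumptions):

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    def salient_direction(vectors, has_property):
        # vectors: (n, d) entity embeddings; has_property: binary labels
        # marking which entities are known to satisfy some salient feature
        clf = LogisticRegression().fit(vectors, has_property)
        w = clf.coef_[0]
        return w / np.linalg.norm(w)  # unit direction modelling the feature

    # ranking entities along the learned direction yields an ordinal feature:
    # scores = vectors @ salient_direction(vectors, has_property)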

The aforementioned approaches have found various applications. Within the specific context of explainable machine learning, at least two different strategies may be pursued. One possibility is to train a model in the usual way (e.g., a neural network classifier), and then approximate this model based on the semantic description of the vector representations involved. For instance, [11] suggests a method for learning a rule based classifier that describes a feedforward neural network. The second possibility is to extract an interpretable (qualitative) representation from the vector space, e.g., by treating interpretable dimensions as ordinal features, and then train a model on that interpretable representation [144].

4.5 Combining Deep Learning with High Level Inference

While neural networks are traditionally learned in a purely data-driven way, several authors have explored approaches that are capable of taking advantage of symbolic knowledge. One common approach consists in relaxing symbolic rules, expressing available background knowledge, using fuzzy logic connectives. This results in a continuous representation of the background knowledge, which can then simply be added to the loss function of the neural network model [135]. Rather than directly regularizing the loss function in this way, [247] proposes an iterative method to ensure that the proportion of ground instances of the given rules that are predicted to be true by the neural network is in accordance with the confidence we have in these rules. To this end, after each iteration, they solve an optimisation problem to find the set of predictions that is closest to the predictions of the current neural network while being in accordance with the rules. The neural network is subsequently trained to mimic these regularized predictions. Yet another approach is proposed in [485]: a loss function that encourages the output of a neural network to satisfy a predefined set of symbolic constraints, taking advantage of efficient weighted model counting techniques.
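
A minimal sketch of the first idea, relaxing a symbolic rule $A \rightarrow B$ with the product t-norm and adding its degree of violation to the loss (the particular rule, the t-norm choice and the weighting factor are illustrative assumptions):

    import torch

    def rule_penalty(p_a, p_b):
        # fuzzy truth of A -> B under product logic is 1 - p(A) * (1 - p(B));
        # the penalty below is the degree to which the rule is violated
        return p_a * (1.0 - p_b)

    def total_loss(task_loss, p_a, p_b, weight=0.1):
        # standard data-driven loss, regularized by the relaxed rule
        return task_loss + weight * rule_penalty(p_a, p_b).mean()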

While the aforementioned approaches are aimed at using rules to improve neural network models, some authors have also proposed ways in which neural networks can be incorporated into symbolic formalisms. One notable example along these lines is the DeepProbLog framework from [332], where neural networks are essentially used to define probabilistic facts of a probabilistic logic program. Within a broader setting, vector space embeddings have also been used for predicting plausible missing rules in ontological rule bases [62, 311].

4.6 Knowledge Graph Completion

Knowledge graphs are a popular formalism for expressing factual relational knowledge using triples of the form (entity, relation, entity). In application fields such as natural language processing, they are among the most widely used knowledge representation frameworks. Such knowledge graphs are almost inevitably incomplete, however, given the sheer amount of knowledge about the world that we would like to have access to, and given the fact that much of this knowledge needs to be constantly updated. This has given rise to a wide range of methods for automatic knowledge graph completion. On the one hand, several authors have proposed approaches for automatically extracting missing knowledge graph triples from text [402]. On the other hand, a large number of approaches have been studied that aim to predict plausible triples based on statistical regularities in the given knowledge graph. Most of these approaches rely on vector space embeddings of the knowledge graph [56, 460]. The main underlying idea is to learn a vector $\mathbf{e}$ of typically a few hundred dimensions for each entity $e$, and a scoring function $s_r$ for each relation $r$, such that the triple $(e_1, r, e_2)$ holds if and only if $s_r(\mathbf{e_1}, \mathbf{e_2})$ is sufficiently high. Provided that the number of dimensions is sufficiently high, any knowledge graph can in principle be modelled exactly in this way [284]. To a more limited extent, such vector representations can even capture ontological rules [224]. In practice, however, our aim is usually not to learn an exact representation of the knowledge graph, but to learn a vector representation which is predictive of triples that are plausible, despite not being among those in the given knowledge graph. Some authors have also proposed methods for incorporating textual information into knowledge graph embedding approaches. Such methods aim to learn vector space representations of the knowledge graph that depend on both the given knowledge graph triples and textual descriptions of the entities [503, 267, 481, 480], or their relationships [401, 459].
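
For instance, in the well-known TransE model (given here as a hedged sketch; negative-sampling and loss details vary across implementations), each relation is itself embedded as a vector, and a triple is scored by how well the relation vector translates the head entity onto the tail:

    import numpy as np

    def transe_score(e1, r, e2):
        # e1, r, e2: embedding vectors; higher score = more plausible triple
        return -np.linalg.norm(e1 + r - e2)

    # training pushes observed triples to score higher than corrupted ones,
    # e.g. via the margin loss  max(0, margin - score(pos) + score(neg))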

4.7 Declarative Frameworks for Data Mining and Clustering

Machine Learning and Data Mining can also be studied from the viewpoint of problem solving. From this point of view, two families of problems can be distinguished: enumeration and optimisation, the latter being either discrete or continuous.

Pattern mining is the best known example of enumeration problems, involving the search for patterns satisfying some properties, for instance being frequent, closed, or emergent. Besides, supervised classification is seen as the search for a model minimizing a given loss function, coupled with a regularization term to avoid over-fitting, whereas unsupervised learning is modeled as the search for a set of clusters (a partition in many cases) optimizing a quality criterion (for instance the sum of squared errors for k-means). For complexity reasons, optimisation problems usually rely on heuristics, with the risk of finding only a local optimum. All these approaches suffer from drawbacks. For instance, in pattern mining the expert is often drowned under all the patterns satisfying the given criteria. In optimisation problems, a local optimum can be far from the expert's expectations.

To prevent this, the notion of Declarative Data Mining has emerged, allowing experts to express knowledge in terms of constraints on the desired models. It can be seen as a generalization of semi-supervised classification, where some points are already labelled with classes. Classical algorithms must then be adapted to take constraints into account, which has led to numerous extensions. Nevertheless, most extensions are dedicated to only one type of constraint, since the loss function has to be adapted to integrate their violation and the optimization method (usually a gradient descent) has to be adapted to the new optimization criterion. It has been shown in [131] that declarative frameworks, namely Constraint Programming in that paper, allow one to model and handle different kinds of constraints in a generic framework, with no need to rewrite the solving algorithm. This has been applied to pattern mining and then extended to k-pattern set mining, with different applications such as conceptual clustering or tiling [132, 287].

This pioneering work has opened the way to a new branch of research, mainly in pattern mining and in constrained clustering. In the latter domain, the constraints were mainly pairwise, e.g., a must-link (resp. cannot-link) constraint expresses that two points must (resp. must not) be in the same cluster. Other constraints have been considered, such as cardinality constraints on the size of the clusters, or minimum split constraints between clusters. Different declarative frameworks have been used, as for instance SAT [126, 265], CP [123], or Integer Linear Programming [353, 27, 293, 368]. An important point is that such frameworks make it easy to embed symbolic and numerical information, for instance by considering a continuous optimisation criterion linked with symbolic constraints, or by considering two optimisation criteria and building a Pareto front [293].
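
As an illustration of the simplest constrained-clustering setting, here is a sketch of a COP-KMeans-style assignment step that rejects cluster assignments violating must-link / cannot-link constraints (a minimal reconstruction under stated assumptions; real systems handle infeasibility and optimisation far more carefully):

    def violates(point, cluster, assign, must_link, cannot_link):
        # assign: partial mapping point -> cluster built so far
        for a, b in must_link:
            other = b if a == point else a if b == point else None
            if other in assign and assign[other] != cluster:
                return True
        for a, b in cannot_link:
            other = b if a == point else a if b == point else None
            if other in assign and assign[other] == cluster:
                return True
        return False

    def assign_points(points, centroids, dist, must_link, cannot_link):
        assign = {}
        for p in points:
            # try clusters from nearest to farthest; keep the first feasible
            for c in sorted(range(len(centroids)),
                            key=lambda c: dist(p, centroids[c])):
                if not violates(p, c, assign, must_link, cannot_link):
                    assign[p] = c
                    break
            else:
                raise ValueError("no feasible assignment for point %r" % (p,))
        return assign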

Thus, declarative frameworks not only make it easy to integrate constraints into Machine Learning problems, they also enable the integration of more complex domain knowledge that goes beyond classical Machine Learning constraints, thus integrating truly meaningful constraints [124]. Moreover, new use cases for clustering can be considered: for instance, given a clustering produced by an algorithm, find a new clustering satisfying new expert knowledge while modifying the previous clustering as little as possible. The price to pay is computational complexity, and the inability to address large datasets. A new research direction aims at studying how constraints could be integrated into deep learners [500]. Besides, the Constraint Programming community has also benefited from this new research direction through the development of global constraints tailored to the modeling of Data Mining tasks, as for instance [285].

4.8 Machine Learning vs. Automated Reasoners

This subsection provides a brief overview of the applications of ML in practical automated reasoners, but also overviews the recent uses of automated reasoning in ML. (We adopt a common understanding of Automated Reasoning: "The study of automated reasoning helps produce computer programs that allow computers to reason completely, or nearly completely, automatically", from https://en.wikipedia.org/wiki/Automated_reasoning.) With a few exceptions, the subsection emphasizes recent work, published over the last decade. Tightly related work, e.g., inductive logic programming (see Section 3.2) or statistical relational learning [133], is beyond the scope of this subsection. The subsection is organized in three main parts. First, Subsubsection 4.8.1 overviews the applications of ML in developing and organizing automated reasoners. Second, Subsubsection 4.8.2 covers the recent uses of automated reasoning in learning ML models, improving the robustness of ML models, but also in explaining ML models. Finally, Subsubsection 4.8.3 covers a number of recent topics at the intersection of automated reasoning and ML.

4.8.1 Learning for Reasoning

Until recently, the most common connection between ML and automated reasoning would be to apply the former when devising solutions for the latter. As a result, a wealth of attempts have been made towards applying ML in the design of automated reasoners, either for improving existing algorithms or for devising new algorithms, built on top of ML models. Uses of ML can be organized as follows. First, uses of ML for improving specific components of automated reasoners, or for automatic configuration or tuning of automated reasoners. Second, approaches that exploit ML for solving computationally hard decision, search and counting problems, and so offering alternatives to dedicated automated reasoners.

Improving Reasoners.

Earlier efforts on exploiting ML in automated reasoners aimed at improving specific components of reasoners by seeking guidance from some ML model. A wealth of examples exists, including the improvement of restarts in Boolean Satisfiability (SAT) solvers [227], the improvement of branching heuristics [184, 217, 315, 314, 316], the selection of abstractions for Quantified Boolean Formula (QBF) solving [271, 305], but also the improvement of different components of theorem provers for first-order and higher-order logics [461, 276, 275, 277, 264, 323, 93].

ML has found other uses for improving automated reasoners. A well-known example is the organization of portfolio solvers [488, 257, 258, 288, 338]. Another example is the automatic configuration of solvers, when the number of options available is large [256, 444]. One additional example is the automatic building of automated reasoners using ML [288].

Tackling Computationally Hard Problems.

Another line of work has been to develop solutions for solving computationally hard decision and search problems. Recent work showed promise in the use of NNs for solving satisfiable instances of SAT represented in clausal form [426, 427, 425, 472], for solving instances of SAT represented as circuits [15, 16], but also NP-complete problems in general [390]. The most often used approach has been to exploit variants of Graph Neural Networks (GNNs) [418], including Message Passing Neural Networks (MPNNs) [204]. There has also been recent work on solving CSPs [484] using convolutional NNs. Furthermore, there have been proposals for learning to solve SMT [30], combinatorial optimization problems [286, 37, 312, 46], planning problems [147], but also well-known specific cases of NP-complete decision problems, e.g., Sudoku [370] and TSP [469].

Efforts for tackling computationally harder problems have also been reported, including QBF [493], ontological reasoning [242], probabilistic logic programming [331], inference in probabilistic graphical models [494] and theorem proving for first order [274, 248, 32, 369, 249] and higher order logics [475, 471, 273, 492].

4.8.2 Reasoning for Learning

This subsection overviews uses of automated reasoning approaches for verifying, explaining and learning ML models.

Robust Machine Learning.

Concerns about the behavior of neural networks can be traced at least to the mid 90s and early 00s [474, 498, 423]. Additional early work on ensuring safety of neural networks also involved SAT solvers [393]. More recently, efforts on the verification of neural networks have focused on the avoidance of so-called adversarial examples.

Adversarial examples [450], already briefly mentioned in Subsection 2.4, illustrate the brittleness of ML models. In recent years, a number of unsettling examples served to raise concerns about how fragile neural networks can be in practice [23, 175, 81, 181, 233]. Among other alternative approaches, automated reasoners have been applied to ensuring the robustness of ML models, with an emphasis on neural networks. A well-known line of work focuses on the avoidance of adversarial examples for neural networks using ReLU units [358] and proposes Reluplex, an SMT-specific dedicated reasoning engine for implementing reasoning with ReLU units [281, 212, 282]. Another recent line of work addresses binarized neural networks (BNNs) [251] and develops a propositional encoding for assessing the robustness of BNNs [360, 362, 309]. Additional lines of work have been reported in the last two years [250, 409, 77, 295, 200, 347, 441, 443, 442, 26, 165, 74, 164, 394, 163, 470, 73, 473, 72, 433, 151, 31].

Interpretable ML Models.

Interpretable ML models are those from which rule-like explanations can be easily produced. For example, decision trees, decision sets (or rule sets) and rule lists are in general deemed interpretable, since one can explain predictions using rules. One area of research is the learning (or synthesis) of interpretable ML models using automated reasoning approaches. There have been continued efforts at learning decision trees [364, 366, 50, 365, 466, 359, 467, 465, 246], decision sets [299, 263, 328, 203] and rule lists [18, 17]. Examples of reasoners used include SAT, CP, and ILP solvers, but dedicated complete methods based on branch-and-bound search have also been considered. Despite a recent explosion of works on black-box ML models, there exist arguments for the use of interpretable models [410].

Explanations with Abductive Reasoning.

In many settings, interpretable models are often not the option of choice, being replaced by so-called black-box models, which include any ML model from which rules explaining predictions are not readily available. (The definition of explanation is the subject of ongoing debate [344]; we use the intuitive notion of an explanation as an IF-THEN rule [398, 325, 399], where some given prediction is made if a number of feature values hold true.) The importance of reasoning about explanations is illustrated by a growing number of recent surveys [238, 241, 53, 352, 289, 239, 9, 13, 150, 240, 219, 415, 416, 344, 343, 19, 349, 483, 354]. Concrete examples of black-box models include (deep) neural networks (including binarized versions), boosted trees and random forests, among many other alternatives.

Most existing works on computing explanations resort to so-called local explanations. These methods are model-agnostic and heuristic in nature [398, 325, 399]. Recent works [361, 262] revealed that local explanations do not hold globally, i.e., it is often the case that there are points in feature space for which the local explanation holds, but for which the model's prediction differs. Since 2018, a number of attempts have been reported which propose rigorous approaches for computing explanations. Concretely, these recent attempts compute so-called abductive explanations, where each explanation corresponds to a prime implicant of the discrete function representing the constraint that the ML model makes the target prediction. A first attempt, based on compiling such a function into a tractable representation, is reported in [439]. For such a representation, (shortest) prime implicants can then be extracted in polynomial time. The downside of this approach is that compilation may yield exponential-size function representations. Another attempt [261] is based on computing explanations on demand, by encoding the instance, the ML model and the prediction into some logic representation. In this case, reasoners such as SMT, ILP or SAT solvers are then used for extracting (shortest) prime implicants.
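
A minimal sketch of the on-demand scheme, reduced to its core loop; the entailment oracle model_entails, which would typically be answered by a solver call on an encoding of the model, is an assumed placeholder, not an API from the works cited above:

    def abductive_explanation(model_entails, instance):
        # instance: mapping feature -> value for the point being explained;
        # model_entails(partial): True iff fixing exactly these feature
        # values already forces the model's prediction (oracle/solver call)
        expl = dict(instance)  # the full instance is trivially sufficient
        for feature in list(expl):
            value = expl.pop(feature)  # tentatively drop this feature
            if not model_entails(expl):
                expl[feature] = value  # dropping it breaks sufficiency: keep
        return expl  # no feature can be removed: subset-minimal explanation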

Explanations vs. Adversarial Examples.

In recent years, different works realized the existence of some connection between adversarial examples (AE’s) and explanations (XP’s) [321, 452, 407, 457, 486, 371, 82]. Nevertheless, a theoretical connection between AE’s and XP’s has been elusive. Recent work [261] showed that adversarial examples can be computed from the set of explanations for some prediction. Furthermore, this work introduced the concept of counterexample (CEx) to some prediction, and identified a minimal hitting set relationship between XP’s and CEx’s, i.e., XP’s are minimal hitting sets of CEx’s and vice-versa.

4.8.3 More on Learning vs. Reasoning

The previous two subsections summarize recent efforts on using machine learning for automated reasoning, but also on using automated reasoning for learning, verifying and explaining ML models. This subsection identifies additional lines of research at the intersection of ML and automated reasoning.

Integrating Logic Reasoning in Learning.

A large body of work has been concerned with the integration of logic reasoning with ML. One well-known example is neural-symbolic learning [196, 119, 49, 372, 52, 333, 115]. See also Subsection 3.3. Examples of applications include program synthesis [502, 501, 372, 71] and neural theorem proving [345]. Other approaches do exist, including deep reasoning networks [87], neural logic machines [149], and abductive learning [122, 505]. An alternative is to embed symbolic knowledge in neural networks [482].

Learning for Inference.

One area of work is the use of ML models for learning logic representations, most often rules [403, 404, 491, 172, 173], which can serve for inference or for explaining predictions.

Understanding Logic Reasoning.

A natural question is whether ML systems understand logical formulas in order to decide entailment or unsatisfiability. There has been recent work on understanding entailment [174, 417], suggesting that this is not always the case, e.g., for convolutional NNs. In a similar fashion, recent work [89] suggests that GNNs may fail at deciding unsatisfiability.

Synthesis of ML Models.

Recent work proposed the use of automated reasoners for the synthesis (i.e., learning) of ML models. Concrete examples include [182, 506, 318]. These approaches differ substantially from approaches for the synthesis of interpretable models, including decision trees and sets and decision lists.

As witnessed by the large bibliography surveyed in this subsection, the quantity, breadth and depth of work at the intersection between ML and automated reasoning in recent years provide ample evidence that this body of work can be expected to continue to expand at a fast pace in the near future.

4.9 Preferences and Recommendation

In both KRR and ML, models have evolved from binary ones (classical propositional or first-order logic, binary classification) to richer ones that take into account the need for less drastic outputs. One approach has been to add the possibility of ordering the possible outputs / decisions. In multi-class classification tasks for instance, one approach [152] is to estimate, given an instance, the posterior probability of belonging to each possible class, and to predict the class with the highest probability. The possibility of learning to "order things" has numerous applications, e.g., in information retrieval or recommender systems. In KRR, the need to be able to order interpretations (rather than just classify them as possible / impossible, given the knowledge at hand) has proved to be an essential modelling paradigm; see, e.g., the success of valued CSPs [420], Bayesian networks [377], and possibilistic / fuzzy logics, among others.

At the intersection of ML and KRR, the field of "preference learning" has emerged. Fürnkranz et al. [187, 188] describe various tasks that can be seen as preference learning: some where the output is a function that orders possible labels for each unseen instance, and some where the output is a function that orders any unseen set of new instances.

The importance of the notion of preference seems to have emerged first in economics and decision theory, where research focused essentially on utilitarian models of preferences, in which a utility function associates a real number with each of the objects to be ordered. In this field, research also developed on preference elicitation, where some interaction is devised to help a decision maker form / lay bare her preferences, usually over a relatively small set of alternatives, possibly considering multiple objectives.

In contrast, preferences in AI often bear on combinatorial objects, like models of some logical theory, to indicate for instance preferences over several goals of an agent; or, more recently, like combinations of interdependent decisions or configurable items of some catalog. Thus, in KRR as well as in ML, the objects to be ordered are generally characterised by a finite number of features, with a domain / set of possible values for each feature. When talking about preferences, the domains tend to be finite; continuous domains can be discretised.

Because of the combinatorial nature of the space of objects, research in AI has emphasized the need for compact models of preferences. Some probabilistic models, like Bayesian networks or Markov random fields, fall in this category, as do, e.g., additive utilities [183] and their generalisations. This focus on combinatorial objects also brought to light one difficulty with the utilitarian model: although it is often easy to compute the utilities or probabilities associated with two objects and compare them on such a basis, it is often NP-hard to find optimal objects in a combinatorial set equipped with a numerical representation of preferences. Thus another contribution of research in KRR is to provide preference representation languages for which optimisation is computationally easy, like CP-nets [63].

These complex models of preferences have recently been studied from an ML perspective, both in an elicitation / active learning setting and in a batch / passive learning setting. One particularity of these compact preference models is that they combine two elements: a structural element, indicating probabilistic or preferential interdependencies between the various features characterizing the objects of interest; and "local" preferences over small combinations of features. It is the structure learning phase that is often demanding, since finding the structure that best fits some data is often a hard combinatorial search problem. In contrast, finding the local preferences once the structure has been chosen is often easy.
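
As a toy illustration of learning a structure-free additive utility from pairwise comparisons (a Bradley-Terry-style sketch; the linear form, learning rate and update rule are illustrative assumptions, far simpler than the structured models discussed above):

    import numpy as np

    def learn_additive_utility(pairs, n_features, lr=0.1, epochs=200):
        # pairs: list of (a, b) feature vectors, with a preferred over b
        w = np.zeros(n_features)  # one weight per feature: additive utility
        for _ in range(epochs):
            for a, b in pairs:
                # Bradley-Terry probability that a is preferred over b
                p = 1.0 / (1.0 + np.exp(-w @ (a - b)))
                w += lr * (1.0 - p) * (a - b)  # ascent on the log-likelihood
        return w  # utility of x is w @ x; higher utility = preferred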

The passive learning setting is particularly promising because of the vast datasets available in potential applications of preference learning, such as decision aid systems like recommender systems or search engines. The possibility of learning Bayesian networks from data has been a key element of their early success in many applications. Note that in some applications, in particular in the study of biological systems, learning the structure, that is, the interdependencies between features, is interesting in itself; in such applications, "black-box" models like deep neural networks seem less appropriate. This is also the case in decision-support systems, where there is a need to explain the reasons justifying the computed ordering of possible decisions [36].

At the frontier between learning and reasoning lies what could be named lazy preference learning: given a set of preference statements which do not specify a complete preference relation, one can infer new pairwise comparisons between objects by assuming some properties of the full, unknown preference relation. As a baseline, many settings in the models studied in KRR assume transitivity of preferences, but this alone does not usually induce many new comparisons. A common additional assumption, made by [36], is that the preference relation can be represented with an additive utility function, and that the ordering over the domain of each feature is known. In [476, 477, 478], richer classes of input preference statements are considered, and the assumption is made that the preference relation has some kind of (unknown) lexicographic structure.

Lastly, as mentioned above, analogical proportions can be used for predicting preferences [176, 383, 60]. The general idea is that a preference between two items can be predicted if some analogical proportions hold that link their descriptions with the descriptions of other items for which preference relations are known.

5 Conclusion

The KRR and ML areas of AI have been developed independently to a large extent for several decades. As a result, most researchers in one area are unaware of what has been going on in the other. The intended purpose of this joint work is to provide an inventory of the meeting points between the KRR and ML lines of research. We have first reviewed some concerns that are shared by the two areas, albeit perhaps in different ways. Then we have surveyed various paradigms that are at the border between KRR and ML. Lastly, we have given an overview of different hybridizations of KRR and ML tools.

Let us emphasize that this is a work in progress. At this stage, subsections may be unequally developed and remain sketchy. The current list of references is certainly incomplete and unbalanced. The works covered may be old or recent, well-known as well as overlooked. However, at this step, we have absolutely no claim of completeness of any kind, not even of being fully up to date. Examples of topics not covered at all are numerous. They include reinforcement learning, argumentation and ML [14], ontology representation and learning [21] in relation with the modeling of concepts, rough sets and rule extraction [374, 218], or formal studies of data such as version space learning [348] or logical analysis of data [58]; see also [92, 346].

Moreover, each topic covered in this paper is only outlined and would deserve to be discussed in further details. This paper is only a first step towards a more encompassing and extensive piece of work.

The aim of this paper is to help facilitate the understanding between researchers in the two areas, with a perspective of cross-fertilisation and mutual benefits. Still, we should be aware that the mathematics of ML and the mathematics of KRR are quite different if we consider the main trends in each area. In ML the basic paradigm is a matter of approximating functions (which then calls for optimization). The mathematics of ML are close to those of signal processing and automatic control (as pointed out in [303]), while KRR is dominated by logic and discrete mathematics, leading to an – at least apparent – opposition between geometry and logic [329] (see also the conference https://www.youtube.com/watch?v=BpX890StRvs). But functions also underlie KRR, once one notices that a set of (fuzzy or weighted) rules is like an aggregation function [160], whose computation may take the form of a generalized matrix calculus ('generalized' in the sense that the operations are not necessarily restricted to sum and product). Let us also note that the convolution of functions (a key tool in signal processing) is no longer restricted to a linear, sum/product-based setting [351].

Besides, it seems difficult to envisage ML without KRR. For instance, it has been recently observed that

  • the unsupervised learning of disentangled representations is fundamentally impossible without inductive biases on both the models and the data [322];

  • local explanation methods for deep neural networks lack sensitivity to parameter values [10];

  • when trained on one task, then trained on a second task, many machine learning models “forget” how to perform the first task [210].

Such states of fact might call for some cooperation in the long range between ML and KRR.

6 Acknowledgements

The authors thank the CNRS GDR “Formal and Algorithmic Aspects of Artificial Intelligence” for its support.

References

  • [1] (2018)

    2018 IEEE conference on computer vision and pattern recognition, CVPR 2018, salt lake city, ut, usa, june 18-22, 2018

    .
    IEEE Computer Society. External Links: Link Cited by: 175.
  • [2] (2018) 2018 IEEE symposium on security and privacy, SP 2018, proceedings, 21-23 may 2018, san francisco, california, USA. IEEE Computer Society. External Links: Link, ISBN 978-1-5386-4353-2 Cited by: 200.
  • [3] (2018) 21st international conference on information fusion, FUSION 2018, cambridge, uk, july 10-13, 2018. IEEE. External Links: Link, ISBN 978-0-9964527-6-2 Cited by: 457.
  • [4] (2017) 5th international conference on learning representations (ICLR’17), toulon, april 24-26, 2017, conf. track proc.. OpenReview.net, ICLR. External Links: Link Cited by: 372, 273.
  • [5] (2017) 5th international conference on learning representations, ICLR’17, toulon, - april 24-26, workshop track proc.. OpenReview.net. External Links: Link Cited by: 37.
  • [6] A. Aamodt and E. Plaza (1994) Case-based reasoning: foundational issues, methodological variations, and system approaches. AI Commun. 7 (1), pp. 39–59. Cited by: §3.7.
  • [7] A. Abraham, M. Köppen, and K. Franke (Eds.) (2003) Proceedings of the third international conference on hybrid intelligent systems. Frontiers in Artificial Intelligence and Applications, Vol. 105, IOS Press. External Links: ISBN 1-58603-394-8 Cited by: 116.
  • [8] A. Abramé and D. Habet (2015) AHMAXSAT : description and evaluation of a branch and bound Max-SAT solver. Journal on Satisfiability, Boolean Modeling and Computation. Cited by: §2.2.
  • [9] A. Adadi and M. Berrada (2018) Peeking inside the black-box: A survey on explainable artificial intelligence (XAI). IEEE Access 6, pp. 52138–52160. External Links: Document Cited by: footnote 2.
  • [10] J. Adebayo, J. Gilmer, I. J. Goodfellow, and B. Kim (2018) Local explanation methods for deep neural networks lack sensitivity to parameter values. In Workshop Track Proc. 6th Int. Conf. on Learning Representations, ICLR’18, Vancouver, April 30 - May 3, Cited by: 2nd item.
  • [11] T. Ager, O. Kuzelka, and S. Schockaert (2016) Inducing symbolic rules from entity embeddings using auto-encoders.. In NeSy@ HLAI, Cited by: §4.4.
  • [12] R. Agrawal, T. Imielinski, and A. Swami (1993) Mining association rules between sets of items in large databases. In Proc. SIGMOD Conf, Washington, DC, May 26-28, pp. 207–216. Cited by: §3.5.2.
  • [13] J. M. Alonso, C. Castiello, and C. Mencar (2018) A bibliometric analysis of the explainable artificial intelligence research field. In Proc. 17th Int. Conf. on Information Processing and Management of Uncertainty in Knowledge-Based Systems (IPMU’18, Cádiz, June 11-15, Part I, J. Medina, M. Ojeda-Aciego, J. L. V. Galdeano, D. A. Pelta, I. P. Cabrera, B. Bouchon-Meunier, and R. R. Yager (Eds.), Communications in Computer and Information Science, Vol. 853, pp. 3–15. Cited by: footnote 2.
  • [14] L. Amgoud and M. Serrurier (2008) Agents that argue and explain classifications. Autonomous Agents and Multi-Agent Systems 16 (2), pp. 187–209. Cited by: §5.
  • [15] S. Amizadeh, S. Matusevych, and M. Weimer (2019) Learning to solve Circuit-SAT: an unsupervised differentiable approach. In 7th Int. Conf. on Learning Representations (ICLR’19), New Orleans, LA, May 6-9, Cited by: §4.8.1.
  • [16] S. Amizadeh, S. Matusevych, and M. Weimer (2019) PDP: A general neural framework for learning constraint satisfaction solvers. CoRR abs/1903.01969. External Links: Link, 1903.01969 Cited by: §4.8.1.
  • [17] E. Angelino, N. Larus-Stone, D. Alabi, M. Seltzer, and C. Rudin (2017) Learning certifiably optimal rule lists for categorical data. J. Mach. Learn. Res. 18, pp. 234:1–234:78. External Links: Link Cited by: §4.8.2.
  • [18] E. Angelino, N. Larus-Stone, D. Alabi, M. Seltzer, and C. Rudin (2017) Learning certifiably optimal rule lists. In Proc. 23rd ACM SIGKDD Int. Conf. on Knowledge Discovery and Data Mining, Halifax, August 13 - 17, pp. 35–44. Cited by: §4.8.2.
  • [19] S. Anjomshoae, A. Najjar, D. Calvaresi, and K. Främling (2019) Explainable agents and robots: results from a systematic literature review. In AAMAS, pp. 1078–1088. Cited by: footnote 2.
  • [20] A. Antonucci and G. Corani (2017) The multilabel naive credal classifier. Int. J. of Approximate Reasoning 83, pp. 320–336. Cited by: §3.6.2.
  • [21] R. Arp, B. Smith, and A. D. Spear (2015) Building ontologies with basic formal ontology. MIT Press. Cited by: §5.
  • [22] Z. Assaghir, M. Kaytoue, and H. Prade (2010) A possibility theory-oriented discussion of conceptual pattern structures. In Proc. 4th Int. Conf. on Scalable Uncertainty Management (SUM’10), Toulouse, Sept. 27-29, A. Deshpande and A. Hunter (Eds.), LNCS, Vol. 6379, pp. 70–83. Cited by: §3.4.
  • [23] A. M. Aung, Y. Fadila, R. Gondokaryono, and L. Gonzalez (2017) Building robust deep neural networks for road sign detection. CoRR abs/1712.09327. External Links: Link, 1712.09327 Cited by: §4.8.2.
  • [24] M. Ayel and M. Rousset (1990) La cohérence dans les bases de connaissances. Cepadues. Cited by: §2.3.
  • [25] F. Baader, S. Brandt, and C. Lutz (2005) Pushing the EL envelope. In Proc. 19th Int. Joint Conf. on Artificial Intelligence (IJCAI’05), Edinburgh, July 30 - Aug. 5, L. P. Kaelbling and A. Saffiotti (Eds.), pp. 364–369. Cited by: §2.2.
  • [26] M. Baader, M. Mirman, and M. T. Vechev (2019) Universal approximation with certified networks. CoRR abs/1909.13846. External Links: Link, 1909.13846 Cited by: §4.8.2.
  • [27] B. Babaki, T. Guns, and S. Nijssen (2014) Constrained clustering using column generation. In Integration of AI and OR Techniques in Constraint Programming - Proc. 11th Int. Conf. CPAIOR’14, Cork, Ireland, May 19-23, pp. 438–454. Cited by: §4.7.
  • [28] J. Baget, S. Benferhat, Z. Bouraoui, M. Croitoru, M. Mugnier, O. Papini, S. Rocher, and K. Tabia (2016) Inconsistency-tolerant query answering: rationality properties and computational complexity analysis. See DBLP:conf/jelia/2016, pp. 64–80. External Links: Link, Document Cited by: §2.2.
  • [29] C. Balkenius and P. Gärdenfors (1991) Nonmonotonic inferences in neural networks. In Proc. 2nd Int. Conf. on Princip. of Knowl. Represent. and Reas. (KR’91). Cambridge, MA, pp. 32–39. Cited by: §3.5.2.
  • [30] M. Balunovic, P. Bielik, and M. T. Vechev (2018) Learning to solve SMT formulas. See Advances in neural information processing systems 31: annual conf. on neural information processing systems (neurips’18), dec. 3-8, montréal, Bengio et al., pp. 10338–10349. External Links: Link Cited by: §4.8.1.
  • [31] T. Baluta, S. Shen, S. Shinde, K. S. Meel, and P. Saxena (2019) Quantitative verification of neural networks and its security applications. See Proceedings of the 2019 ACM SIGSAC conference on computer and communications security, CCS 2019, london, uk, november 11-15, 2019, Cavallaro et al., pp. 1249–1264. External Links: Link, Document Cited by: §4.8.2.
  • [32] K. Bansal, S. M. Loos, M. N. Rabe, C. Szegedy, and S. Wilcox (2019) HOList: an environment for machine learning of higher order logic theorem proving. In Proc. 36th Int. Conf. on Machine Learning (ICML’19), 9-15 June, Long Beach, California, pp. 454–463. Cited by: §4.8.1.
  • [33] Ch. Baral (2003, 2010) Knowledge representation, reasoning and declarative problem solving. Cambridge University Press. Cited by: §1.
  • [34] P. L. Bartlett, F. C. N. Pereira, C. J. C. Burges, L. Bottou, and K. Q. Weinberger (Eds.) (2012) Advances in neural information processing systems 25: 26th annual conference on neural information processing systems 2012. proceedings of a meeting held december 3-6, 2012, lake tahoe, nevada, united states. External Links: Link Cited by: 184.
  • [35] Y. Bastide, N. Pasquier, R. Taouil, G. Stumme, and L. Lakhal (2000) Mining minimal non-redundant association rules using frequent closed itemsets. In Computational Logic - CL 2000, Proc. 1st Int. Conf., London, 24-28 July, J. W. Lloyd, V. Dahl, U. Furbach, M. Kerber, K. Lau, C. Palamidessi, L. M. Pereira, Y. Sagiv, and P. J. Stuckey (Eds.), LNCS, Vol. 1861, pp. 972–986. Cited by: §3.4.
  • [36] K. Belahcene, C. Labreuche, N. Maudet, V. Mousseau, and W. Ouerdane (2017) Explaining robust additive utility models by sequences of preference swaps. Theory and Decision 82 (2), pp. 151–183. External Links: Document Cited by: §4.9, §4.9.
  • [37] I. Bello, H. Pham, Q. V. Le, M. Norouzi, and S. Bengio (2017) Neural combinatorial optimization with reinforcement learning. See 5, External Links: Link Cited by: §4.8.1.
  • [38] R. Belohlavek (2002) Fuzzy relational systems: foundations and principles. Kluwer. Cited by: §3.4.
  • [39] S. Benferhat, D. Dubois, L. Garcia, and H. Prade (2002) On the transformation between possibilistic logic bases and possibilistic causal networks. Int. J. Approx. Reasoning 29 (2), pp. 135–173. Cited by: §2.3.
  • [40] S. Benferhat, D. Dubois, S. Lagrue, and H. Prade (2003) A big-stepped probability approach for discovering default rules. Int. J. of Uncertainty, Fuzziness and Knowledge-Based Systems 11 (Supplement-1), pp. 1–14. Cited by: §3.5.1.
  • [41] S. Benferhat, D. Dubois, and H. Prade (1997) Some syntactic approaches to the handling of inconsistent knowledge bases: A comparative study. Part 1: the flat case. Studia Logica 58 (1), pp. 17–45. Cited by: §2.3.
  • [42] S. Benferhat, D. Dubois, and H. Prade (1999) An overview of inconsistency-tolerant inferences in prioritized knowledge bases. In Fuzzy Sets, Logics and Reasoning about Knowledge, D. Dubois, H. Prade, and E. P. Klement (Eds.), pp. 395–417. Cited by: §3.5.1.
  • [43] S. Benferhat and K. Tabia (2012) Inference in possibilistic network classifiers under uncertain observations. Ann. Math. Artif. Intell. 64 (2-3), pp. 269–309. Cited by: §2.3.
  • [44] S. Bengio, H. M. Wallach, H. Larochelle, K. Grauman, N. Cesa-Bianchi, and R. Garnett (Eds.) (2018) Advances in neural information processing systems 31: annual conf. on neural information processing systems (neurips’18), dec. 3-8, montréal. Cited by: 331, 312, 274, 30, 441, 370, 501, 452.
  • [45] Y. Bengio and Y. LeCun (Eds.) (2014) 2nd international conference on learning representations, ICLR 2014, banff, ab, canada, april 14-16, 2014, conference track proceedings. External Links: Link Cited by: 450.
  • [46] Y. Bengio, A. Lodi, and A. Prouvost (2018) Machine learning for combinatorial optimization: a methodological tour d’horizon. CoRR abs/1811.06128. External Links: Link, 1811.06128 Cited by: §4.8.1.
  • [47] P. Berkhin, R. Caruana, and X. Wu (Eds.) (2007) Proceedings of the 13th ACM SIGKDD international conference on knowledge discovery and data mining, san jose, california, usa, august 12-15, 2007. ACM. External Links: ISBN 978-1-59593-609-7 Cited by: 364.
  • [48] P. Besnard and A. Hunter (1998) Reasoning with actual and potential contradictions. Handbook of Defeasible Reasoning and Uncertainty Management Systems, Vol. 2 (D. Gabbay and Ph. Smets, series editors), Kluwer Acad. Publ. Cited by: §2.3.
  • [49] T. R. Besold, A. S. d’Avila Garcez, S. Bader, H. Bowman, P. M. Domingos, P. Hitzler, K. Kühnberger, L. C. Lamb, D. Lowd, P. M. V. Lima, L. de Penning, G. Pinkas, H. Poon, and G. Zaverucha (2017) Neural-symbolic learning and reasoning: A survey and interpretation. CoRR abs/1711.03902. External Links: Link, 1711.03902 Cited by: §3.3, §4.8.3.
  • [50] C. Bessiere, E. Hebrard, and B. O’Sullivan (2009) Minimising decision tree size as combinatorial optimisation. In Principles and Practice of Constraint Programming - CP 2009, Proc. 15th Int. Conf. CP’09, Lisbon, Sept. 20-24, I. P. Gent (Ed.), LNCS, Vol. 5732, pp. 173–187. Cited by: §4.8.2.
  • [51] O. Beyersdorff and C. M. Wintersteiger (Eds.) (2018) Theory and applications of satisfiability testing - SAT 2018 - proc. 21st int. conf. SAT’18, held as part of the federated logic conference, floc 2018, oxford, july 9-12. LNCS, Vol. 10929, Springer. External Links: Link, Document, ISBN 978-3-319-94143-1 Cited by: 316.
  • [52] S. Bhatia, P. Kohli, and R. Singh (2018) Neuro-symbolic program corrector for introductory programming assignments. See Proc. 40th int. conf. on software engineering, ICSE 2018, gothenburg, sweden, may 27 - june 03, 2018, Chaudron et al., pp. 60–70. External Links: Link, Document Cited by: §4.8.3.
  • [53] O. Biran and C. Cotton (2017) Explanation and justification in machine learning: a survey. In IJCAI-17 workshop on explainable AI (XAI), Vol. 8, pp. 1. Cited by: footnote 2.
  • [54] J. Błaszczyński, R. Słowiński, and M. Szelag (2011) Sequential covering rule induction algorithm for variable consistency rough set approaches. Information Sciences 181 (5), pp. 987–1002. Cited by: §3.5.3.
  • [55] B. Bonet and S. Koenig (Eds.) (2015) Proceedings of the twenty-ninth AAAI conference on artificial intelligence (aaai 2015). AAAI Press. External Links: Link Cited by: 335.
  • [56] A. Bordes, N. Usunier, A. Garcia-Duran, J. Weston, and O. Yakhnenko (2013) Translating embeddings for modeling multi-relational data. In Advances in neural information processing systems, pp. 2787–2795. Cited by: §4.6.
  • [57] S.-E. Bornscheuer, S. Hölldobler, Y. Kalinke, and A. Strohmaier (1998) Massively parallel reasoning. In Automated Deduction - A Basis for Applications. Volume II: Systems and Implementation Techniques, W. Bibel and P. H. Schmitt (Eds.), pp. 291–321. Cited by: §3.3.
  • [58] E. Boros, Y. Crama, P. L. Hammer, T. Ibaraki, A. Kogan, and K. Makino (2011) Logical analysis of data: classification with justification. Annals OR 188 (1), pp. 33–61. Cited by: §5.
  • [59] M. Bounhas, H. Prade, and G. Richard (2017) Analogy-based classifiers for nominal or numerical data. Int. J. Approx. Reasoning 91, pp. 36–55. Cited by: §2.3, §3.7.
  • [60] M. Bounhas, M. Pirlot, H. Prade, and O. Sobrie (2019) Comparison of analogy-based methods for predicting preferences. In Proc. 13th Int. Conf on Scalable Uncertainty Management (SUM’19), Compiègne, France, December 16-18, N. B. Amor, B. Quost, and M. Theobald (Eds.), LNCS, Vol. 11940, pp. 339–354. Cited by: §4.9.
  • [61] Z. Bouraoui and S. Schockaert (2018) Learning conceptual space representations of interrelated concepts. In Proc. 27th Int. Joint Conf. on Artificial Intelligence, pp. 1760–1766. Cited by: §4.4.
  • [62] Z. Bouraoui and S. Schockaert (2019) Automated rule base completion as bayesian concept induction. In Proc. 33rd AAAI Conference on Artificial Intelligence, Cited by: §4.5.
  • [63] C. Boutilier, R. I. Brafman, C. Domshlak, H. H. Hoos, and D. Poole (2004) CP-nets: a tool for representing and reasoning with conditional ceteris paribus preference statements. Journal of Artificial Intelligence Research 21, pp. 135–191. Cited by: §4.9.
  • [64] C. Boutilier (Ed.) (2009) Proceedings of the 21st international joint conference on artificial intelligence (ijcai’09). Cited by: 477.
  • [65] Q. Brabant, M. Couceiro, D. Dubois, H. Prade, and A. Rico (2018) Extracting decision rules from qualitative data via Sugeno utility functionals. In Proc. 17th Int. Conf. on Information Processing and Management of Uncertainty in Knowledge-Based Systems (IPMU’18), Cádiz, June 11-15, Part I, J. Medina, M. Ojeda-Aciego, J. L. V. Galdeano, D. A. Pelta, I. P. Cabrera, B. Bouchon-Meunier, and R. R. Yager (Eds.), CCIS, Vol. 853, pp. 253–265. Cited by: §3.5.3.
  • [66] R.J. Brachman and H.J. Levesque (2004) Knowledge representation and reasoning. Elsevier. Note: with a contribution of M. Pagnucco Cited by: §1.
  • [67] G. Brewka, V. W. Marek, and M. Truszczynski (Eds.) (2011) Nonmonotonic reasoning. essays celebrating its 30th anniversary. Studies in Logic, Vol. 31, College Publication. Cited by: §2.3, §3.5.1.
  • [68] G. Brewka, S. Coradeschi, A. Perini, and P. Traverso (Eds.) (2006) Proceedings of the 17th european conference on artificial intelligence (ecai 2006). Frontiers in Artificial Intelligence and Applications, IOS Press. Cited by: 476.
  • [69] S. Bromberger (Ed.) (1992) On what we know we don’t know: explanation, theory, linguistics, and how questions shape them. University of Chicago Press. Cited by: §2.4.
  • [70] K. Brünnler and G. Metcalfe (Eds.) (2011) Automated reasoning with analytic tableaux and related methods - proc. 20th int. conf. TABLEAUX 2011, bern, switzerland, july 4-8. LNCS, Vol. 6793, Springer. External Links: Link, Document, ISBN 978-3-642-22118-7 Cited by: 461.
  • [71] R. Bunel, M. J. Hausknecht, J. Devlin, R. Singh, and P. Kohli (2018) Leveraging grammar and reinforcement learning for neural program synthesis. In 6th Int. Conf. on Learning Representations (ICLR’18), Vancouver, April 30 - May 3, Conf. Track Proc., Cited by: §4.8.3.
  • [72] R. Bunel, J. Lu, I. Turkaslan, P. H. S. Torr, P. Kohli, and M. P. Kumar (2019) Branch and bound for piecewise linear neural network verification. CoRR abs/1909.06588. External Links: Link, 1909.06588 Cited by: §4.8.2.
  • [73] R. Bunel, I. Turkaslan, P. H. S. Torr, P. Kohli, and M. P. Kumar (2017) Piecewise linear neural network verification: A comparative study. CoRR abs/1711.00455. External Links: Link, 1711.00455 Cited by: §4.8.2.
  • [74] R. Bunel, I. Turkaslan, P. H. S. Torr, P. Kohli, and P. K. Mudigonda (2018) A unified view of piecewise linear neural network verification. In Advances in Neural Information Processing Systems 31: Annual Conference on Neural Information Processing Systems (NeurIPS’18), 3-8 Dec., Montréal, pp. 4795–4804. Cited by: §4.8.2.
  • [75] C. J. C. Burges, L. Bottou, Z. Ghahramani, and K. Q. Weinberger (Eds.) (2013) Proceedings of the 27th annual conference on neural information processing systems (NIPS 2013). Cited by: 446.
  • [76] D. Calvanese, G. De Giacomo, D. Lembo, M. Lenzerini, and R. Rosati (2005) DL-lite: tractable description logics for ontologies. In Proc. 20th National Conf. on Artificial Intelligence (AAAI’05), July 9-13, Pittsburgh, M. M. Veloso and S. Kambhampati (Eds.), pp. 602–607. Cited by: §2.2.
  • [77] L. Cardelli, M. Kwiatkowska, L. Laurenti, N. Paoletti, A. Patane, and M. Wicker (2019) Statistical guarantees for the robustness of bayesian neural networks. In Proc. 28th Int. Joint Conf. on Artificial Intelligence (IJCAI’19), Macao, Aug. 10-16, S. Kraus (Ed.), pp. 5693–5700. Cited by: §4.8.2.
  • [78] W. Carnielli and M. Coniglio (2016) Paraconsistent logic: consistency, contradiction and negation. Springer. Cited by: §2.3.
  • [79] J. Castro (1995) Fuzzy logic controllers are universal approximators. IEEE Trans. Systems, Man, and Cybernetics 25, pp. 629–635. Cited by: §3.5.2.
  • [80] L. Cavallaro, J. Kinder, X. Wang, and J. Katz (Eds.) (2019) Proceedings of the 2019 ACM SIGSAC conference on computer and communications security, CCS 2019, london, uk, november 11-15, 2019. ACM. External Links: Link, Document, ISBN 978-1-4503-6747-9 Cited by: 31.
  • [81] A. Chakraborty, M. Alam, V. Dey, A. Chattopadhyay, and D. Mukhopadhyay (2018) Adversarial attacks and defences: A survey. CoRR abs/1810.00069. External Links: Link, 1810.00069 Cited by: §4.8.2.
  • [82] P. Chalasani, S. Jha, A. Sadagopan, and X. Wu (2018) Adversarial learning and explainability in structured datasets. CoRR abs/1810.06583. External Links: Link, 1810.06583 Cited by: §4.8.2.
  • [83] O. Chapelle, B. Schölkopf, and A. Zien (2006) Semi-supervised learning. MIT Press. Cited by: §3.6.1.
  • [84] L. Charnay, J. Dibie, and S. Loiseau (2019) Validation and explanation. In A Guided Tour of Artificial Intelligence Research. Vol. 1: Knowledge Representation, Reasoning and Learning, P. Marquis, O. Papini, and H. Prade (Eds.). Cited by: §2.4.
  • [85] M. Chaudron, I. Crnkovic, M. Chechik, and M. Harman (Eds.) (2018) Proc. 40th int. conf. on software engineering, ICSE 2018, gothenburg, sweden, may 27 - june 03, 2018. ACM. External Links: Link, Document, ISBN 978-1-4503-5638-1 Cited by: 52.
  • [86] M. Chein and M.-L. Mugnier (2009) Graph-based knowledge representation: computational foundations of conceptual graphs. Springer. Cited by: §2.1.
  • [87] D. Chen, Y. Bai, W. Zhao, S. Ament, J. M. Gregoire, and C. P. Gomes (2019) Deep reasoning networks: thinking fast and slow. CoRR abs/1906.00855. External Links: Link, 1906.00855 Cited by: §4.8.3.
  • [88] X. Chen, Y. Duan, R. Houthooft, J. Schulman, I. Sutskever, and P. Abbeel (2016) Infogan: interpretable representation learning by information maximizing generative adversarial nets. In Advances in Neural Information Processing Systems, pp. 2172–2180. Cited by: §4.4.
  • [89] Z. Chen and Z. Yang (2019) Graph neural reasoning may fail in certifying boolean unsatisfiability. CoRR abs/1909.11588. External Links: Link, 1909.11588 Cited by: §4.8.3.
  • [90] W. Cheng, E. Hüllermeier, W. Waegeman, and V. Welker (2012) Label ranking with partial abstention based on thresholded probabilistic models. In Advances in Neural Information Processing Systems 25: 26th Annual Conf. on Neural Information Processing Systems 2012. Proc. of a meeting held December 3-6, Lake Tahoe, Nevada, P. L. Bartlett, F. C. N. Pereira, C. J. C. Burges, L. Bottou, and K. Q. Weinberger (Eds.), pp. 2510–2518. Cited by: §3.6.2.
  • [91] Z. L. Cherfi, L. Oukhellou, E. Côme, T. Denœux, and P. Aknin (2012) Partially supervised independent factor analysis using soft labels elicited from multiple experts: application to railway track circuit diagnosis. Soft Computing 16 (5), pp. 741–754. Cited by: 1st item.
  • [92] I. Chikalov, V. V. Lozin, I. Lozina, M. Moshkov, H. S. Nguyen, A. Skowron, and B. Zielosko (2013) Three approaches to data analysis - test theory, rough sets and logical analysis of data. Intelligent Systems Reference Library, Vol. 41, Springer. Cited by: §5.
  • [93] K. Chvalovský, J. Jakubuv, M. Suda, and J. Urban (2019) ENIGMA-NG: efficient neural and gradient-boosted inference guidance for E. In Automated Deduction - CADE’19 - Proc. 27th Int. Conf. on Automated Deduction, Natal, Brazil, Aug. 27-30, P. Fontaine (Ed.), LNCS, Vol. 11716, pp. 197–215. Cited by: §4.8.1.
  • [94] J. Cid-Sueiro (2012) Proper losses for learning from partial labels. In Advances in neural information processing systems, pp. 1565–1573. Cited by: 1st item.
  • [95] C. A. C. Coello (Ed.) (2011) Learning and intelligent optimization - 5th international conference, LION 5, rome, italy, january 17-21, 2011. selected papers. LNCS, Vol. 6683, Springer. External Links: Link, Document, ISBN 978-3-642-25565-6 Cited by: 257.
  • [96] W. W. Cohen, A. McCallum, and S. T. Roweis (Eds.) (2008) Machine learning, proceedings of the twenty-fifth international conference (ICML 2008), helsinki, finland, june 5-9, 2008. ACM International Conference Proceeding Series, Vol. 307, ACM. External Links: ISBN 978-1-60558-205-4 Cited by: 366.
  • [97] V. Conitzer, G. K. Hadfield, and S. Vallor (Eds.) (2019) Proceedings of the 2019 AAAI/ACM conference on ai, ethics, and society, AIES 2019, honolulu, hi, usa, january 27-28, 2019. ACM. External Links: Link, Document, ISBN 978-1-4503-6324-2 Cited by: 203.
  • [98] G. Corani, A. Antonucci, and M. Zaffalon (2012) Bayesian networks with imprecise probabilities: theory and application to classification. In Data Mining: Foundations and Intelligent Paradigms, pp. 49–93. Cited by: 2nd item.
  • [99] A. Cornuéjols, S. Akkoyunlu, P. Murena, and R. Olivier (2017) Transfer learning by boosting projections from target to source. In Conférence Francophone sur l’Apprentissage Automatique (CAP’17), Cited by: §3.7.
  • [100] A. Cornuejols, F. Koriche, and R. Nock (2020) Statistical computational learning. In A Guided Tour of Artificial Intelligence Research. Vol. 1 Knowledge Representation, Reasoning and Learning, P. Marquis, O. Papini, and H. Prade (Eds.), pp. 341–388. Cited by: §2.1.
  • [101] A. Cornuejols and C. Vrain (2020) Designing algorithms for machine learning and data mining. In A Guided Tour of Artificial Intelligence Research. Vol. 2 Artificial Intelligence Algorithms, P. Marquis, O. Papini, and H. Prade (Eds.), pp. 339–410. Cited by: §2.1.
  • [102] C. Cortes, N. D. Lawrence, D. D. Lee, M. Sugiyama, and R. Garnett (Eds.) (2015) Advances in neural information processing systems 28: annual conference on neural information processing systems, dec. 7-12, montreal. External Links: Link Cited by: 469.
  • [103] M. Couceiro, N. Hug, H. Prade, and G. Richard (2017) Analogy-preserving functions: A way to extend boolean samples. In Proc. 26th Int. Joint Conf. on Artificial Intelligence, (IJCAI’17), Melbourne, Aug. 19-25, pp. 1575–1581. Cited by: §3.7.
  • [104] M. Couceiro, N. Hug, H. Prade, and G. Richard (2018) Behavior of analogical inference w.r.t. Boolean functions. In Proc. 27th Int. Joint Conf. on Artificial Intelligence, (IJCAI’18), Stockholm, July. 13-19, pp. 2057–2063. Cited by: §3.7.
  • [105] T. Cour, B. Sapp, and B. Taskar (2011) Learning from partial labels. Journal of Machine Learning Research 12 (May), pp. 1501–1536. Cited by: 1st item.
  • [106] N. Courty, R. Flamary, D. Tuia, and A. Rakotomamonjy (2016) Optimal transport for domain adaptation. IEEE Trans. on Pattern Analysis and Machine Intelligence 39 (9), pp. 1853–1865. Cited by: §3.7.
  • [107] I. Couso, D. Dubois, and E. Hüllermeier (2017) Maximum likelihood estimation and coarse data. In Proc 11th Int. Conf. on Scalable Uncertainty Management (SUM’17), LNCS, Vol. 10564, pp. 3–16. Cited by: §4.2.
  • [108] I. Couso and D. Dubois (2016) Belief revision and the EM algorithm. In Information Processing and Management of Uncertainty in Knowledge-Based Systems - Proc. 16th International Conference, IPMU 2016, Part II, J. P. Carvalho, M. Lesot, U. Kaymak, S. M. Vieira, B. Bouchon-Meunier, and R. R. Yager (Eds.), Communications in Computer and Information Science, Vol. 611, pp. 279–290. Cited by: §4.3.
  • [109] I. Couso and D. Dubois (2018) A general framework for maximizing likelihood under incomplete data. Int. J. of Approximate Reasoning 93, pp. 238–260. Cited by: 1st item, §4.2, §4.2.
  • [110] I. Couso and L. Sánchez (2016) Machine learning models, epistemic set-valued data and generalized loss functions: an encompassing approach. Information Sciences 358, pp. 129–150. Cited by: 2nd item.
  • [111] F. G. Cozman (2005) Graphical models for imprecise probabilities. Int. J. of Approximate Reasoning 39, pp. 167–184. Cited by: §2.3.
  • [112] F. G. Cozman (2000) Credal networks. Artif. Intell. 120 (2), pp. 199–233. Cited by: §2.1.
  • [113] F. G. Cozman (2020) Languages for probabilistic modeling over structured domains. In A Guided Tour of Artificial Intelligence Research. Vol. 2 Artificial Intelligence Algorithms, P. Marquis, O. Papini, and H. Prade (Eds.), pp. 247–283. Cited by: §2.3.
  • [114] F. d’Alché-Buc, V. Andrés, and J.-P. Nadal (1994) Rule extraction with fuzzy neural network. Int. J. Neural Syst. 5 (1), pp. 1–11. Cited by: §3.5.2.
  • [115] A. S. d’Avila Garcez, M. Gori, L. C. Lamb, L. Serafini, M. Spranger, and S. N. Tran (2019) Neural-symbolic computing: an effective methodology for principled integration of machine learning and reasoning. CoRR abs/1905.06088. External Links: Link, 1905.06088 Cited by: §4.8.3.
  • [116] A. S. d’Avila Garcez, L. C. Lamb, and D. M. Gabbay (2003) Neural-symbolic intuitionistic reasoning. See Proceedings of the third international conference on hybrid intelligent systems, Abraham et al., pp. 399–408. Cited by: §3.3.
  • [117] A. S. d’Avila Garcez, L. C. Lamb, and D. M. Gabbay (2006) Connectionist computations of intuitionistic reasoning. Theoretical Computer Science 358 (1), pp. 34–55. External Links: Document Cited by: §3.3.
  • [118] A. S. d’Avila Garcez, L. C. Lamb, and D. M. Gabbay (2007) Connectionist modal logic: representing modalities in neural networks. Theoretical Computer Science 371 (1-2), pp. 34–53. External Links: Document Cited by: §3.3.
  • [119] A. S. d’Avila Garcez, L. C. Lamb, and D. M. Gabbay (2009) Neural-symbolic cognitive reasoning. Cognitive Technologies, Springer. External Links: Link, Document, ISBN 978-3-540-73245-7 Cited by: §4.8.3.
  • [120] A. S. d’Avila Garcez and L. C. Lamb (2003) Reasoning about time and knowledge in neural symbolic learning systems. See Advances in neural information processing systems 16 (NIPS 2003), Thrun et al., pp. 921–928. Cited by: §3.3.
  • [121] A. S. d’Avila Garcez and G. Zaverucha (1999) The connectionist inductive learning and logic programming system. Applied Intelligence 11 (1), pp. 59–77. External Links: Document Cited by: §3.3.
  • [122] W. Dai, Q. Xu, Y. Yu, and Z. Zhou (2018) Tunneling neural perception and logic reasoning through abductive learning. CoRR abs/1802.01173. External Links: Link, 1802.01173 Cited by: §4.8.3.
  • [123] T. Dao, K. Duong, and C. Vrain (2017) Constrained clustering by constraint programming. Artif. Intell. 244, pp. 70–94. External Links: Link, Document Cited by: §4.7.
  • [124] T. Dao, C. Vrain, K. Duong, and I. Davidson (2016) A framework for actionable clustering using constraint programming. In ECAI 2016 - 22nd European Conference on Artificial Intelligence, 29 Aug.-2 Sept. 2016, The Hague, pp. 453–461. Cited by: §4.7.
  • [125] A. Darwiche and P. Marquis (2002) A knowledge compilation map. Journal of Artificial Intelligence Research 17, pp. 229–264. External Links: Document Cited by: §2.2.
  • [126] I. Davidson, S. S. Ravi, and L. Shamis (2010) A SAT-based framework for efficient constrained clustering. In Proc. SIAM Int. Conf. on Data Mining (SDM’10), April 29 - May 1, Columbus, Ohio, pp. 94–105. External Links: Link, Document Cited by: §4.7.
  • [127] G. De Cooman and M. Zaffalon (2004) Updating beliefs with incomplete observations. Artificial Intelligence 159 (1-2), pp. 75–125. Cited by: 2nd item.
  • [128] B. De Finetti (1936) La logique des probabilités. In Congrès International de Philosophie Scientifique, Paris, France, pp. 1–9. Cited by: §2.1.
  • [129] L. De Raedt (2008) Logical and relational learning. Springer. Cited by: §3.2.
  • [130] L. De Raedt, P. Frasconi, K. Kersting, and S. Muggleton (Eds.) (2008) Probabilistic inductive logic programming - theory and applications. Lecture Notes in Computer Science, Vol. 4911, Springer. Cited by: §3.2.
  • [131] L. De Raedt, T. Guns, and S. Nijssen (2008) Constraint programming for itemset mining. In Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Las Vegas, Nevada, USA, August 24-27, 2008, pp. 204–212. Cited by: §4.7.
  • [132] L. De Raedt, T. Guns, and S. Nijssen (2010) Constraint programming for data mining and machine learning. In Proc. 24th AAAI Conf. on Artificial Intelligence, (AAAI’10), Atlanta, July 11-15, Cited by: §4.7.
  • [133] L. De Raedt, K. Kersting, S. Natarajan, and D. Poole (2016) Statistical relational artificial intelligence: logic, probability, and computation. Synthesis Lectures on Artificial Intelligence and Machine Learning, Morgan & Claypool Publishers. External Links: Link, Document Cited by: §3.3, §4.8.
  • [134] L. De Raedt, A. Kimmig, and H. Toivonen (2007) ProbLog: A probabilistic prolog and its application in link discovery. In IJCAI 2007, Proceedings of the 20th International Joint Conference on Artificial Intelligence, Hyderabad, India, January 6-12, 2007, pp. 2462–2467. Cited by: §3.2.
  • [135] T. Demeester, T. Rocktäschel, and S. Riedel (2016) Lifted rule injection for relation embeddings. In Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, pp. 1389–1399. Cited by: §4.5.
  • [136] A. P. Dempster, N. M. Laird, and D. B. Rubin (1977) Maximum likelihood from incomplete data via the EM algorithm. J. of the Royal Statistical Society, Series B 39 (1), pp. 1–38. Cited by: §2.3, §4.2.
  • [137] A. P. Dempster (1967) Upper and lower probabilities induced by a multivalued mapping. Annals of Mathematical Statistics 38, pp. 325–339. Cited by: §4.1.
  • [138] T. Denœux, O. Kanjanatarakul, and S. Sriboonchitta (2019) A new evidential k-nearest neighbor rule based on contextual discounting with partially supervised learning. International Journal of Approximate Reasoning 113, pp. 287–302. Cited by: 1st item, 2nd item, §4.1, §4.1, §4.1.
  • [139] T. Denœux and S. Li (2018) Frequency-calibrated belief functions: review and new insights. Int. J. of Approximate Reasoning 92, pp. 232–254. Cited by: 2nd item.
  • [140] T. Denœux and L. M. Zouhal (2001) Handling possibilistic labels in pattern classification using evidential reasoning. Fuzzy Sets and Systems 122 (3), pp. 47–62. Cited by: 1st item.
  • [141] T. Denœux (1995) A k-nearest neighbor classification rule based on Dempster-Shafer theory. IEEE Trans. on Systems, Man and Cybernetics 25 (5), pp. 804–813. Cited by: 1st item, §3.7, §4.1.
  • [142] T. Denœux (2013) Maximum likelihood estimation from uncertain data in the belief function framework. IEEE Trans. on Knowledge and Data Engineering 25 (1), pp. 119–130. Cited by: 1st item, 2nd item, §4.2.
  • [143] T. Denœux (2019) Logistic regression, neural networks and Dempster-Shafer theory: A new perspective. Knowl.-Based Syst. 176, pp. 54–67. Cited by: §2.
  • [144] J. Derrac and S. Schockaert (2015) Inducing semantic relations from conceptual spaces: a data-driven approach to plausible reasoning. Artificial Intelligence, pp. 74–105. Cited by: §4.4, §4.4.
  • [145] M. Diligenti, M. Gori, M. Maggini, and L. Rigutini (2012) Bridging logic and kernel machines. Machine Learning 86 (1), pp. 57–88. External Links: Document Cited by: §3.3.
  • [146] I. Dillig and S. Tasiran (Eds.) (2019) Computer aided verification - proc. 31st int. conf., CAV’19, new york city, ny, july 15-18, part I. LNCS, Vol. 11561, Springer. External Links: Link, Document, ISBN 978-3-030-25539-8 Cited by: 282.
  • [147] A. Dittadi, T. Bolander, and O. Winther (2018) Learning to plan from raw data in grid-based games. See GCAI-2018, 4th global conference on artificial intelligence, luxembourg, september 18-21, 2018, Lee et al., pp. 54–67. External Links: Link Cited by: §4.8.1.
  • [148] I. Donadello, L. Serafini, and A. S. d’Avila Garcez (2017) Logic tensor networks for semantic image interpretation. See Proc. of the 26th int. joint conf. on artificial intelligence (IJCAI’17), Sierra, pp. 1596–1602. External Links: Document Cited by: §3.3.
  • [149] H. Dong, J. Mao, T. Lin, C. Wang, L. Li, and D. Zhou (2019) Neural logic machines. In 7th International Conference on Learning Representations, ICLR 2019, New Orleans, LA, USA, May 6-9, 2019, Cited by: §4.8.3.
  • [150] F. K. Dosilovic, M. Brcic, and N. Hlupic (2018) Explainable artificial intelligence: A survey. In MIPRO, pp. 210–215. External Links: Document Cited by: footnote 2.
  • [151] T. Dreossi, D. J. Fremont, S. Ghosh, E. Kim, H. Ravanbakhsh, M. Vazquez-Chanlatte, and S. A. Seshia (2019) VERIFAI: A toolkit for the design and analysis of artificial intelligence-based systems. CoRR abs/1902.04245. External Links: Link, 1902.04245 Cited by: §4.8.2.
  • [152] K. Duan, S. S. Keerthi, W. Chu, S. K. Shevade, and A. N. Poo (2003) Multi-category classification by soft-max combination of binary classifiers. See Proceedings of the 4th international workshop on multiple classifier systems (MCS 2003), Windeatt and Roli, pp. 125–134. External Links: Document Cited by: §4.9.
  • [153] D. Dubois, E. Hüllermeier, and H. Prade (2006) A systematic approach to the assessment of fuzzy association rules. Data Min. Knowl. Discov. 13 (2), pp. 167–192. Cited by: §3.5.2.
  • [154] D. Dubois and E. Hüllermeier (2007) Comparing probability measures using possibility theory: A notion of relative peakedness. Int. J. Approx. Reasoning 45 (2), pp. 364–385. Cited by: §4.2.
  • [155] D. Dubois, J. Lang, and H. Prade (1994) Possibilistic logic. In Handbook of Logic in Artificial Intelligence and Logic Programming, Vol. 3, D.M. Gabbay, C.J. Hogger, J. Robinson, and D. Nute (Eds.), pp. 439–513. Cited by: §2.3, §3.5.1.
  • [156] D. Dubois, H. Prade, and G. Richard (2016) Multiple-valued extensions of analogical proportions. Fuzzy Sets and Systems 292, pp. 193–202. Cited by: §3.7.
  • [157] D. Dubois, H. Prade, and A. Rico (2014) The logical encoding of Sugeno integrals. Fuzzy Sets and Systems 241, pp. 61–75. Cited by: §3.5.3.
  • [158] D. Dubois, H. Prade, and S. Schockaert (2017) Generalized possibilistic logic: foundations and applications to qualitative reasoning about uncertainty. Artif. Intell. 252, pp. 139–174. Cited by: §2.3, §3.5.1.
  • [159] D. Dubois, H. Prade, and Th. Sudkamp (2005) On the representation, measurement, and discovery of fuzzy associations. IEEE Trans. Fuzzy Systems 13 (2), pp. 250–262. Cited by: §3.5.2.
  • [160] D. Dubois and H. Prade (1997) Fuzzy criteria and fuzzy rules in subjective evaluation – A general discussion. In Proc. 5th Eur. Cong. Intel. Techn. Soft Comput. (EUFIT’97), Aachen, Vol. 1, pp. 975–979. Cited by: §5.
  • [161] D. Dubois and H. Prade (2012) Possibility theory and formal concept analysis: Characterizing independent sub-contexts. Fuzzy Sets and Systems 196, pp. 4–16. Cited by: §3.4.
  • [162] D. Dubois and H. Prade (2019) Towards a reconciliation between reasoning and learning - A position paper. In Proc. 13th Int. Conf. on Scalable Uncertainty Management (SUM’19), Compiègne, Dec. 16-18, N. B. Amor, B. Quost, and M. Theobald (Eds.), LNCS, Vol. 11940, pp. 153–168. Cited by: §1.
  • [163] K. Dvijotham, R. Stanforth, S. Gowal, C. Qin, S. De, and P. Kohli (2019) Efficient neural network verification with exactness characterization. See Proceedings of the thirty-fifth conference on uncertainty in artificial intelligence, UAI 2019, tel aviv, israel, july 22-25, 2019, Globerson and Silva, pp. 164. External Links: Link Cited by: §4.8.2.
  • [164] K. Dvijotham, M. Garnelo, A. Fawzi, and P. Kohli (2018) Verification of deep probabilistic models. CoRR abs/1812.02795. External Links: Link, 1812.02795 Cited by: §4.8.2.
  • [165] K. Dvijotham, R. Stanforth, S. Gowal, T. A. Mann, and P. Kohli (2018) A dual approach to scalable verification of deep networks. See Proceedings of the thirty-fourth conference on uncertainty in artificial intelligence, UAI 2018, monterey, california, usa, august 6-10, 2018, Globerson and Silva, pp. 550–559. External Links: Link Cited by: §4.8.2.
  • [166] J. G. Dy and A. Krause (Eds.) (2018) Proceedings of the 35th international conference on machine learning, ICML 2018, stockholmsmässan, stockholm, sweden, july 10-15, 2018. Proceedings of Machine Learning Research, Vol. 80, PMLR. External Links: Link Cited by: 347.
  • [167] S. Dzeroski and N. Lavrac (Eds.) (2001) Relational data mining. Springer. Cited by: §3.2.
  • [168] S. Džeroski and B. Ženko (2004) Is combining classifiers with stacking better than selecting the best one? Machine learning 54 (3), pp. 255–273. Cited by: 1st item.
  • [169] T. Eiter and D. Sands (Eds.) (2017) LPAR-21, 21st international conference on logic for programming, artificial intelligence and reasoning, maun, botswana, may 7-12, 2017. EPiC Series in Computing, Vol. 46, EasyChair. External Links: Link Cited by: 323.
  • [170] R. Eldan and O. Shamir (2015) The power of depth for feedforward neural networks. CoRR abs/1512.03965. External Links: Link, 1512.03965 Cited by: §2.4.
  • [171] K. Erk (2009) Representing words as regions in vector space. In Proc. 13th Conf. on Computational Natural Language Learning, pp. 57–65. Cited by: §4.4.
  • [172] R. Evans and E. Grefenstette (2018) Learning explanatory rules from noisy data (extended abstract). See Proceedings of the twenty-seventh international joint conference on artificial intelligence, IJCAI 2018, july 13-19, 2018, stockholm, sweden, Lang, pp. 5598–5602. External Links: Link, Document Cited by: §4.8.3.
  • [173] R. Evans and E. Grefenstette (2018) Learning explanatory rules from noisy data. J. Artif. Intell. Res. 61, pp. 1–64. Cited by: §4.8.3.
  • [174] R. Evans, D. Saxton, D. Amos, P. Kohli, and E. Grefenstette (2018) Can neural networks understand logical entailment?. In 6th Int. Conf. on Learning Representations (ICLR’18), Vancouver, April 30 - May 3, Conf. Track Proc., Cited by: §4.8.3.
  • [175] K. Eykholt, I. Evtimov, E. Fernandes, B. Li, A. Rahmati, C. Xiao, A. Prakash, T. Kohno, and D. Song (2018) Robust physical-world attacks on deep learning visual classification. See 1, pp. 1625–1634. External Links: Link, Document Cited by: §4.8.2.
  • [176] M. A. Fahandar and E. Hüllermeier (2018) Learning to rank based on analogical reasoning. In Proc. 32nd AAAI Conf. on Artificial Intelligence (AAAI’18), New Orleans, Feb. 2-7. Cited by: §3.7, §4.9.
  • [177] H. Farreny and H. Prade (1989) Positive and negative explanations of uncertain reasoning in the framework of possibility theory. In Proc. 5th Conference on Uncertainty in Artificial Intelligence (UAI’89), Windsor, ON, Aug. 18-20, pp. 95–101. Note: Available in CoRR, abs/1304.1502, 2013; Expanded version: Explications de raisonnements dans l’incertain, Revue d’Intelligence Artificielle, 4(2), 43-75, 1990 Cited by: §2.4, §2.
  • [178] S. Ferré and O. Ridoux (2004) Introduction to logical information systems. Information Process. & Manag. 40 (3), pp. 383–419. Cited by: §3.4.
  • [179] S. Ferré, M. Kaytoue, M. Huchard, S. O. Kuznetsov, and A. Napoli (2020) Formal concept analysis: from knowledge discovery to knowledge processing. In A Guided Tour of Artificial Intelligence Research. Vol. 2 Artificial Intelligence Algorithms, P. Marquis, O. Papini, and H. Prade (Eds.), pp. 411–445. Cited by: §3.4.
  • [180] D. Fierens, H. Blockeel, M. Bruynooghe, and J. Ramon (2005) Logical bayesian networks and their relation to other probabilistic logical models. In Inductive Logic Programming, Proc. 15th Int. Conf. ILP’05, Bonn, Germany, Aug. 10-13, pp. 121–135. Cited by: §3.2.
  • [181] S. G. Finlayson, J. D. Bowers, J. Ito, J. L. Zittrain, A. L. Beam, and I. S. Kohane (2019) Adversarial attacks on medical machine learning. Science 363 (6433), pp. 1287–1289. Cited by: §4.8.2.
  • [182] M. Fischer, M. Balunovic, D. Drachsler-Cohen, T. Gehr, C. Zhang, and M. T. Vechev (2019) DL2: training and querying neural networks with logic. In Proceedings of the 36th International Conference on Machine Learning, ICML 2019, 9-15 June 2019, Long Beach, California, USA, pp. 1931–1941. External Links: Link Cited by: §4.8.3.
  • [183] P. C. Fishburn (1967) Interdependence and additivity in multivariate, unidimensional expected utility theory. International Economic Review 8 (3), pp. 335–342. External Links: ISSN 00206598, Link Cited by: §4.9.
  • [184] A. Flint and M. B. Blaschko (2012) Perceptron learning of SAT. See Advances in neural information processing systems 25: 26th annual conference on neural information processing systems 2012. proceedings of a meeting held december 3-6, 2012, lake tahoe, nevada, united states, Bartlett et al., pp. 2780–2788. External Links: Link Cited by: §4.8.1.
  • [185] W. Fokkink and R. van Glabbeek (Eds.) (2019) 30th international conference on concurrency theory, CONCUR 2019, august 27-30, 2019, amsterdam, the netherlands. LIPIcs, Vol. 140, Schloss Dagstuhl - Leibniz-Zentrum für Informatik. External Links: Link, ISBN 978-3-95977-121-4 Cited by: 295.
  • [186] M. V. M. França, G. Zaverucha, and A. S. d’Avila Garcez (2014) Fast relational learning using bottom clause propositionalization with artificial neural networks. Machine Learning 94 (1), pp. 81–104. External Links: Document Cited by: §3.3.
  • [187] J. Fürnkranz and E. Hüllermeier (Eds.) (2010) Preference learning. Springer. Cited by: §4.9.
  • [188] J. Fürnkranz, E. Hüllermeier, C. Rudin, R. Slowinski, and S. Sanner (2014) Preference learning (dagstuhl seminar 14101). Dagstuhl Reports 4 (3), pp. 1–27. Cited by: §4.9.
  • [189] J. Fürnkranz and T. Joachims (Eds.) (2010) Proceedings of the 27th international conference on machine learning (icml-10), june 21-24, 2010, haifa, israel. Omnipress. Cited by: 358.
  • [190] (2010) FUZZ-IEEE 2010, proc. IEEE int. conf. on fuzzy systems, barcelona, 18-23 july. IEEE. External Links: Link, ISBN 978-1-4244-6919-2 Cited by: 336.
  • [191] G. Leng, T. M. McGinnity, and G. Prasad (2005) An approach for on-line extraction of fuzzy rules using a self-organising fuzzy neural network. Fuzzy Sets and Systems 150, pp. 211–243. Cited by: §3.5.2.
  • [192] D. Galmiche, S. Schulz, and R. Sebastiani (Eds.) (2018) Automated reasoning - 9th int. joint conf. IJCAR’18, held as part of the federated logic conference, floc 2018, oxford, uk, july 14-17. LNCS, Vol. 10900, Springer. External Links: Link, Document, ISBN 978-3-319-94204-9 Cited by: 263.
  • [193] A. Gammerman, V. Vovk, and V. Vapnik (1998) Learning by transduction. In Proc. 14th Conf. on Uncertainty in AI, pp. 148–155. Cited by: §3.7.
  • [194] B. Ganter and S. O. Kuznetsov (2001) Pattern structures and their projections. In Proc. 9th Int. Conf. on Conceptual Structures (ICCS’01), H. S. Delugach and G. Stumme (Eds.), LNCS, Vol. 2120, pp. 129–142. Cited by: §3.4.
  • [195] B. Ganter and R. Wille (1998) Formal concept analysis: mathematical foundations. Springer-Verlag. Cited by: §3.4.
  • [196] A. S. d’Avila Garcez, K. B. Broda, and D. M. Gabbay (2002) Neural-symbolic learning systems: foundations and applications. Springer Science & Business Media. Cited by: §4.8.3.
  • [197] P. Gärdenfors (1991) Nonmonotonic inference, expectations, and neural networks. In Proc. Europ. Conf. on Symbolic and Quantitative Approaches to Reasoning and Uncertainty (ECSQA), Marseille, Oct. 15-17, R. Kruse and P. Siegel (Eds.), LNCS, Vol. 548, pp. 12–27. Cited by: §3.5.2.
  • [198] P. Gärdenfors (2000) Conceptual spaces: the geometry of thought. MIT Press. Cited by: §4.4.
  • [199] D. Garlan, J. Kramer, and A. L. Wolf (Eds.) (2002) Proceedings of the first workshop on self-healing systems, WOSS 2002, charleston, south carolina, usa, november 18-19, 2002. ACM. External Links: ISBN 1-58113-609-9 Cited by: 423.
  • [200] T. Gehr, M. Mirman, D. Drachsler-Cohen, P. Tsankov, S. Chaudhuri, and M. T. Vechev (2018) AI2: safety and robustness certification of neural networks with abstract interpretation. See 2, pp. 3–18. External Links: Link, Document Cited by: §4.8.2.
  • [201] L. Getoor and B. Taskar (Eds.) (2007) Introduction to statistical relational learning. Adaptive Computation and Machine Learning, MIT Press. External Links: ISBN 9780262072885 Cited by: §3.2, §3.3.
  • [202] M. Ghallab, C. D. Spyropoulos, N. Fakotakis, and N. M. Avouris (Eds.) (2008) Proc. ECAI 2008 - proc. 18th european conf. on artificial intelligence, patras, greece, july 21-25, 2008, proceedings. Frontiers in Artificial Intelligence and Applications, Vol. 178, IOS Press. External Links: Link, ISBN 978-1-58603-891-5 Cited by: 338.
  • [203] B. Ghosh and K. S. Meel (2019) IMLI: an incremental framework for maxsat-based learning of interpretable classification rules. See Proceedings of the 2019 AAAI/ACM conference on ai, ethics, and society, AIES 2019, honolulu, hi, usa, january 27-28, 2019, Conitzer et al., pp. 203–210. External Links: Link, Document Cited by: §4.8.2.
  • [204] J. Gilmer, S. S. Schoenholz, P. F. Riley, O. Vinyals, and G. E. Dahl (2017) Neural message passing for quantum chemistry. In Proc. of the 34th Int. Conf. on Machine Learning, (ICML’17), Sydney, 6-11 Aug., D. Precup and Y. W. Teh (Eds.), Proceedings of Machine Learning Research, Vol. 70, pp. 1263–1272. Cited by: §4.8.1.
  • [205] L. H. Gilpin, D. Bau, B. Z. Yuan, A. Bajwa, M. Specter, and L. Kagal (2018) Explaining explanations: an overview of interpretability of machine learning. In Proc. 5th IEEE Int. Conf. on Data Science and Advanced Analytics (DSAA’18), Turin, Oct. 1-3, F. Bonchi, F. J. Provost, T. Eliassi-Rad, W. Wang, C. Cattuto, and R. Ghani (Eds.), pp. 80–89. Cited by: §2.4.
  • [206] L. H. Gilpin, C. Testart, N. Fruchter, and J. Adebayo (2019) Explaining explanations to society. CoRR abs/1901.06560. Cited by: §2.4.
  • [207] A. Globerson and R. Silva (Eds.) (2019) Proceedings of the thirty-fifth conference on uncertainty in artificial intelligence, UAI 2019, tel aviv, israel, july 22-25, 2019. AUAI Press. External Links: Link Cited by: 163.
  • [208] A. Globerson and R. Silva (Eds.) (2018) Proceedings of the thirty-fourth conference on uncertainty in artificial intelligence, UAI 2018, monterey, california, usa, august 6-10, 2018. AUAI Press. External Links: Link Cited by: 165.
  • [209] I. Goodfellow, Y. Bengio, and A. Courville (2016) Deep learning. MIT Press. Cited by: §2.4.
  • [210] I. J. Goodfellow, M. Mirza, X. Da, A. C. Courville, and Y. Bengio (2014) An empirical investigation of catastrophic forgetting in gradient-based neural networks. In Conf. Track Proc. 2nd Int. Conf. on Learning Representations (ICLR’14), Banff, April 14-16, Cited by: 3rd item.
  • [211] I. J. Goodfellow, J. Shlens, and C. Szegedy (2014) Explaining and harnessing adversarial examples. External Links: 1412.6572 Cited by: §2.4.
  • [212] D. Gopinath, G. Katz, C. S. Pasareanu, and C. W. Barrett (2018) DeepSafe: A data-driven approach for assessing robustness of neural networks. See Automated technology for verification and analysis - proc. 16th int. symp. ATVA’18, los angeles, oct. 7-10, Lahiri and Wang, pp. 3–19. External Links: Link, Document Cited by: §4.8.2.
  • [213] M. Grabisch and Ch. Labreuche (2010) A decade of application of the Choquet and Sugeno integrals in multi-criteria decision aid. Annals of Oper. Res. 175, pp. 247–286. Cited by: §3.5.3.
  • [214] Y. Grandvalet, A. Rakotomamonjy, J. Keshet, and S. Canu (2009) Support vector machines with a reject option. In Advances in neural information processing systems, pp. 537–544. Cited by: 2nd item.
  • [215] S. Greco, M. Inuiguchi, and R. Slowinski (2006) Fuzzy rough sets and multiple-premise gradual decision rules. Int. J. Approx. Reasoning 41 (2), pp. 179–211. Cited by: §3.5.3.
  • [216] S. Greco, B. Matarazzo, and R. Slowinski (2004) Axiomatic characterization of a general utility function and its particular cases in terms of conjoint measurement and rough-set decision rules. Europ. J. of Operational Research 158 (2), pp. 271–292. Cited by: §3.5.3.
  • [217] C. Grozea and M. Popescu (2014) Can machine learning learn a decision oracle for NP problems? A test on SAT. Fundam. Inform. 131 (3-4), pp. 441–450. External Links: Link, Document Cited by: §4.8.1.
  • [218] J. W. Grzymala-Busse and W. Ziarko (2000) Data mining and rough set theory. Commun. ACM 43 (4), pp. 108–109. Cited by: §5.
  • [219] R. Guidotti, A. Monreale, S. Ruggieri, F. Turini, F. Giannotti, and D. Pedreschi (2019) A survey of methods for explaining black box models. ACM Comput. Surv. 51 (5), pp. 93:1–93:42. External Links: Document Cited by: footnote 2.
  • [220] J. L. Guigues and V. Duquenne (1986) Familles minimales d’implications informatives résultant d’un tableau de données binaires. Mathématiques et Sciences Humaines 95, pp. 5–18. Cited by: §3.4.
  • [221] R. Guillaume and D. Dubois (2018) A maximum likelihood approach to inference under coarse data based on minimax regret. In Uncertainty Modelling in Data Science, SMPS 2018, S. Destercke, T. Denœux, M. A. Gil, P. Grzegorzewski, and O. Hryniewicz (Eds.), Advances in Intelligent Systems and Computing, Vol. 832, pp. 99–106. Cited by: 2nd item.
  • [222] Y. Guo and F. Farooq (Eds.) (2018) Proc. 24th ACM SIGKDD int. conf. on knowledge discovery & data mining, KDD 2018, london, uk, august 19-23, 2018. ACM. External Links: Link, Document Cited by: 321.
  • [223] A. Gupta, G. Boleda, and S. Padó (2018) Instantiation. CoRR abs/1808.01662. Cited by: §4.4.
  • [224] V. Gutiérrez-Basulto and S. Schockaert (2018) From knowledge graph embedding to ontology embedding? an analysis of the compatibility between vector space representations and rules. In Proc. of the 16th Int. Conf. on Principles of Knowledge Representation and Reasoning, pp. 379–388. Cited by: §4.6.
  • [225] I. Guyon, U. von Luxburg, S. Bengio, H. M. Wallach, R. Fergus, S. V. N. Vishwanathan, and R. Garnett (Eds.) (2017) Advances in neural information processing systems 30: annual conference on neural information processing systems 2017, 4-9 december 2017, long beach, ca. Cited by: 491, 471, 286, 404.
  • [226] T. M. Ha (1997) The optimum class-selective rejection rule. IEEE Trans. on Pattern Analysis and Machine Intelligence 19 (6), pp. 608–615. Cited by: 2nd item.
  • [227] S. Haim and T. Walsh (2009) Restart strategy selection using machine learning techniques. See Theory and applications of satisfiability testing - SAT 2009, proc. 12th int. conf. SAT 2009, swansea, june 30 - july 3, Kullmann, pp. 312–325. External Links: Link, Document Cited by: §4.8.1.
  • [228] P. Hájek and T. Havránek (1978) Mechanizing hypothesis formation - mathematical foundations for a general theory. Springer Verlag. Cited by: §3.5.2.
  • [229] J. Y. Halpern, R. Fagin, Y. Moses, and M. Y. Vardi (1995 & 2003) Reasoning about knowledge. MIT Press. Cited by: §2.1.
  • [230] J. Y. Halpern (2017) Reasoning about uncertainty. MIT Press. Cited by: §2.1, §2.3, §1.
  • [231] J. Y. Halpern and J. Pearl (2005) Causes and explanations: A structural-model approach. part II: explanations. The British Journal for the Philosophy of Science 56 (4), pp. 889–911. Cited by: §2.4.
  • [232] D. W. Hasling, W. J. Clancey, and G. Rennels (1984) Strategic explanations for a diagnostic consultation system. Int. J. of Man-Machine Studies 20 (1), pp. 3–19. Cited by: §2.4.
  • [233] D. Heaven (2019) Why deep-learning AIs are so easy to fool. Nature 574 (7777), pp. 163. Cited by: §4.8.2.
  • [234] D. F. Heitjan and D. B. Rubin (1991) Ignorability and coarse data. Ann. Statist. 19 (4), pp. 2244–2253. Cited by: 1st item.
  • [235] I. Higgins, L. Matthey, A. Pal, C. Burgess, X. Glorot, M. Botvinick, S. Mohamed, and A. Lerchner (2017) β-VAE: learning basic visual concepts with a constrained variational framework. In International Conference on Learning Representations, Vol. 3. Cited by: §4.4.
  • [236] F. Hill, K. Cho, A. Korhonen, and Y. Bengio (2016) Learning to understand phrases by embedding the dictionary. Transactions of the Association for Computational Linguistics 4, pp. 17–30. Cited by: §4.4.
  • [237] R. R. Hoffman and G. Klein (2017) Explaining explanation, part 1: theoretical foundations. IEEE Intelligent Systems 32 (3), pp. 68–73. Cited by: §2.4.
  • [238] R. R. Hoffman and G. Klein (2017) Explaining explanation, part 1: theoretical foundations. IEEE Intelligent Systems 32 (3), pp. 68–73. External Links: Document Cited by: footnote 2.
  • [239] R. R. Hoffman, T. Miller, S. T. Mueller, G. Klein, and W. J. Clancey (2018) Explaining explanation, part 4: A deep dive on deep nets. IEEE Intelligent Systems 33 (3), pp. 87–95. External Links: Document Cited by: §2.4, footnote 2.
  • [240] R. R. Hoffman, S. T. Mueller, G. Klein, and J. Litman (2018) Metrics for explainable AI: challenges and prospects. CoRR abs/1812.04608. External Links: 1812.04608 Cited by: footnote 2.
  • [241] R. R. Hoffman, S. T. Mueller, and G. Klein (2017) Explaining explanation, part 2: empirical foundations. IEEE Intelligent Systems 32 (4), pp. 78–86. External Links: Document Cited by: §2.4, footnote 2.
  • [242] P. Hohenecker and T. Lukasiewicz (2017) Deep learning for ontology reasoning. CoRR abs/1705.10342. External Links: Link, 1705.10342 Cited by: §3.3, §4.8.1.
  • [243] S. Hölldobler, Y. Kalinke, and H. Störr (1999) Approximating the semantics of logic programs by recurrent neural networks. Applied Intelligence 11 (1), pp. 45–58. External Links: Document Cited by: §3.3.
  • [244] J. N. Hooker (Ed.) (2018) Principles and practice of constraint programming - proc. 24th int. conf. CP 2018, lille, aug. 27-31. Lecture Notes in Computer Science, Vol. 11008, Springer. External Links: Link, Document, ISBN 978-3-319-98333-2 Cited by: 484, 328.
  • [245] P. O. Hoyer (2004) Non-negative matrix factorization with sparseness constraints. Journal of Machine Learning Research 5, pp. 1457–1469. Cited by: §4.4.
  • [246] X. Hu, C. Rudin, and M. I. Seltzer (2019) Optimal sparse decision trees. In NeurIPS, Cited by: §4.8.2.
  • [247] Z. Hu, X. Ma, Z. Liu, E. H. Hovy, and E. P. Xing (2016) Harnessing deep neural networks with logic rules. In Proc. 54th Annual Meeting of the Association for Computational Linguistics, Cited by: §4.5.
  • [248] D. Huang, P. Dhariwal, D. Song, and I. Sutskever (2019) GamePad: A learning environment for theorem proving. In 7th Int. Conf. on Learning Representations (ICLR’19), New Orleans, LA, May 6-9, 2019, Cited by: §4.8.1.
  • [249] D. Huang (2019) On learning to prove. CoRR abs/1904.11099. External Links: Link, 1904.11099 Cited by: §4.8.1.
  • [250] X. Huang, M. Kwiatkowska, S. Wang, and M. Wu (2017) Safety verification of deep neural networks. See Computer aided verification - 29th int. conf. CAV’17, heidelberg, july 24-28, 2017, proceedings, part I, Majumdar and Kuncak, pp. 3–29. External Links: Link, Document Cited by: §4.8.2.
  • [251] I. Hubara, M. Courbariaux, D. Soudry, R. El-Yaniv, and Y. Bengio (2016) Binarized neural networks. See Advances in neural information processing systems 29: annual conference on neural information processing systems 2016, december 5-10, 2016, barcelona, spain, Lee et al., pp. 4107–4115. External Links: Link Cited by: §4.8.2.
  • [252] E. Hüllermeier and W. Cheng (2015) Superset learning based on generalized loss minimization. In Machine Learning and Knowledge Discovery in Databases - Proc. Eur. Conf., ECML PKDD 2015, Part II, Lecture Notes in Computer Science, Vol. 9285, pp. 260–275. Cited by: 1st item.
  • [253] E. Hüllermeier, S. Destercke, and I. Couso (2019) Learning from imprecise data: adjustments of optimistic and pessimistic variants. In Scalable Uncertainty Management - 13th International Conference, SUM 2019, Compiègne, France, December 16-18, 2019, Proceedings, pp. 266–279. External Links: Link, Document Cited by: §4.2.
  • [254] E. Hüllermeier, D. Dubois, and H. Prade (2002) Model adaptation in possibilistic instance-based reasoning. IEEE Trans. Fuzzy Systems 10 (3), pp. 333–339. Cited by: §3.7.
  • [255] E. Hüllermeier (2014) Learning from imprecise and fuzzy observations: data disambiguation through generalized loss minimization. International Journal of Approximate Reasoning 55 (7), pp. 1519–1534. Cited by: 1st item, 2nd item.
  • [256] F. Hutter, H. H. Hoos, K. Leyton-Brown, and T. Stützle (2009) ParamILS: an automatic algorithm configuration framework. J. Artif. Intell. Res. 36, pp. 267–306. External Links: Link, Document Cited by: §4.8.1.
  • [257] F. Hutter, H. H. Hoos, and K. Leyton-Brown (2011) Sequential model-based optimization for general algorithm configuration. See Learning and intelligent optimization - 5th international conference, LION 5, rome, italy, january 17-21, 2011. selected papers, Coello, pp. 507–523. External Links: Link, Document Cited by: §4.8.1.
  • [258] F. Hutter, L. Xu, H. H. Hoos, and K. Leyton-Brown (2014) Algorithm runtime prediction: methods & evaluation. Artif. Intell. 206, pp. 79–111. External Links: Link, Document Cited by: §4.8.1.
  • [259] L. Hyafil and R. L. Rivest (1976) Constructing optimal binary decision trees is NP-complete. Information Processing Letters 5 (1), pp. 15–17. External Links: Document Cited by: §2.2.
  • [260] (2019) IEEE conference on computer vision and pattern recognition, CVPR 2019, long beach, ca, usa, june 16-20, 2019. Computer Vision Foundation / IEEE. External Links: Link Cited by: 470.
  • [261] A. Ignatiev, N. Narodytska, and J. Marques-Silva (2019) Abduction-based explanations for machine learning models. See 454, pp. 1511–1519. External Links: Link, Document Cited by: §4.8.2, §4.8.2.
  • [262] A. Ignatiev, N. Narodytska, and J. Marques-Silva (2019) On validating, repairing and refining heuristic ML explanations. CoRR abs/1907.02509. External Links: Link, 1907.02509 Cited by: §4.8.2.
  • [263] A. Ignatiev, F. Pereira, N. Narodytska, and J. Marques-Silva (2018) A SAT-based approach to learn explainable decision sets. See Automated reasoning - 9th int. joint conf. IJCAR’18, held as part of the federated logic conference, floc 2018, oxford, uk, july 14-17, Galmiche et al., pp. 627–645. External Links: Link, Document Cited by: §4.8.2.
  • [264] G. Irving, C. Szegedy, A. A. Alemi, N. Eén, F. Chollet, and J. Urban (2016) DeepMath - deep sequence models for premise selection. See Advances in neural information processing systems 29: annual conference on neural information processing systems 2016, december 5-10, 2016, barcelona, spain, Lee et al., pp. 2235–2243. External Links: Link Cited by: §4.8.1.
  • [265] S. Jabbour, L. Sais, and Y. Salhi (2017) Mining top-k motifs with a SAT-based framework. Artif. Intell. 244, pp. 30–47. Cited by: §4.7.
  • [266] M. Jaeger (2005) Ignorability in statistical and probabilistic inference. J. Artif. Intell. Res. 24, pp. 889–917. Cited by: 1st item.
  • [267] S. Jameel and S. Schockaert (2016) Entity embeddings with conceptual subspaces as a basis for plausible reasoning. In Proc. 22nd Europ. Conf. on Artificial Intelligence (ECAI’16), 29 Aug.-2 Sept. 2016, The Hague, pp. 1353–1361. Cited by: §4.6.
  • [268] S. Jameel and S. Schockaert (2017) Modeling context words as regions: an ordinal regression approach to word embedding. In Proc. 21st Conf. on Computational Natural Language Learning, pp. 123–133. Cited by: §4.4.
  • [269] J.-S. R. Jang and C.-T. Sun (1993) Functional equivalence between radial basis function networks and fuzzy inference systems. IEEE Trans. Neural Networks 4 (1), pp. 156–159. Cited by: §3.5.2.
  • [270] M. Janota and I. Lynce (Eds.) (2019) Theory and applications of satisfiability testing - SAT 2019 - proc. 22nd int. conf. SAT’19, lisbon, july 9-12. LNCS, Vol. 11628, Springer. External Links: Link, Document, ISBN 978-3-030-24257-2 Cited by: 361, 425.
  • [271] M. Janota (2018) Towards generalization in QBF solving via machine learning. In Proc. 32nd AAAI Conf. on Artificial Intelligence, (AAAI-18), New Orleans, Feb. 2-7, pp. 6607–6614. Cited by: §4.8.1.
  • [272] R. Jeffrey (1983) The logic of decision. 2nd ed. University of Chicago Press. Cited by: §4.3.
  • [273] C. Kaliszyk, F. Chollet, and C. Szegedy (2017) HolStep: A machine learning dataset for higher-order logic theorem proving. See 4, External Links: Link Cited by: §4.8.1.
  • [274] C. Kaliszyk, J. Urban, H. Michalewski, and M. Olsák (2018) Reinforcement learning of theorem proving. See Advances in neural information processing systems 31: annual conf. on neural information processing systems (neurips’18), dec. 3-8, montréal, Bengio et al., pp. 8836–8847. External Links: Link Cited by: §4.8.1.
  • [275] C. Kaliszyk, J. Urban, and J. Vyskocil (2014) Machine learner for automated reasoning 0.4 and 0.5. See 4th workshop on practical aspects of automated reasoning, paar@ijcar 2014, vienna, 2014, Schulz et al., pp. 60–66. External Links: Link Cited by: §4.8.1.
  • [276] C. Kaliszyk and J. Urban (2014) Learning-assisted automated reasoning with Flyspeck. J. Autom. Reasoning 53 (2), pp. 173–213. External Links: Link, Document Cited by: §4.8.1.
  • [277] C. Kaliszyk and J. Urban (2015) Learning-assisted theorem proving with millions of lemmas. J. Symb. Comput. 69, pp. 109–128. External Links: Link, Document Cited by: §4.8.1.
  • [278] O. Kanjanatarakul, S. Sriboonchitta, and T. Denœux (2016) Statistical estimation and prediction using belief functions: principles and application to some econometric models. Int. J. of Approximate Reasoning 72, pp. 71–94. Cited by: 2nd item.
  • [279] A. Karpathy and L. Fei-Fei (2015) Deep visual-semantic alignments for generating image descriptions. In Proc. of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 3128–3137. Cited by: §4.4.
  • [280] G. Kassel (1987) The use of deep knowledge to improve explanation capabilities of rule-based expert systems. In Expertensysteme ’87: Konzepte und Werkzeuge, Tagung I/1987 des German Chapter of the ACM am 7. und 8.4.1987 in Nürnberg, H. Balzert, G. Heyer, and R. Lutze (Eds.), Berichte des German Chapter of the ACM, Vol. 28, pp. 315–326. Cited by: §2.4.
  • [281] G. Katz, C. W. Barrett, D. L. Dill, K. Julian, and M. J. Kochenderfer (2017) Reluplex: an efficient SMT solver for verifying deep neural networks. See Computer aided verification - 29th int. conf. CAV’17, heidelberg, july 24-28, 2017, proceedings, part I, Majumdar and Kuncak, pp. 97–117. External Links: Link, Document Cited by: §4.8.2.
  • [282] G. Katz, D. A. Huang, D. Ibeling, K. Julian, C. Lazarus, R. Lim, P. Shah, S. Thakoor, H. Wu, A. Zeljic, D. L. Dill, M. J. Kochenderfer, and C. W. Barrett (2019) The Marabou framework for verification and analysis of deep neural networks. See Computer aided verification - proc. 31st int. conf., CAV’19, new york city, ny, july 15-18, part I, Dillig and Tasiran, pp. 443–452. External Links: Link, Document Cited by: §4.8.2.
  • [283] N. Kaur, G. Kunapuli, T. Khot, K. Kersting, W. Cohen, and S. Natarajan (2017) Relational restricted Boltzmann machines: a probabilistic logic learning approach. In Inductive Logic Programming - 27th International Conference, ILP 2017, Orléans, France, September 4-6, 2017, Revised Selected Papers, pp. 94–111. Cited by: §3.2.
  • [284] S. M. Kazemi and D. Poole (2018) SimplE embedding for link prediction in knowledge graphs. In Advances in Neural Information Processing Systems 31: Annual Conf. on Neural Information Processing Systems 2018, NeurIPS 2018, 3-8 December 2018, Montréal, Canada, pp. 4289–4300. Cited by: §4.6.
  • [285] A. Kemmar, Y. Lebbah, S. Loudni, P. Boizumault, and T. Charnois (2017) Prefix-projection global constraint and top-k approach for sequential pattern mining. Constraints 22 (2), pp. 265–306. Cited by: §4.7.
  • [286] E. B. Khalil, H. Dai, Y. Zhang, B. Dilkina, and L. Song (2017) Learning combinatorial optimization algorithms over graphs. See Advances in neural information processing systems 30: annual conference on neural information processing systems 2017, 4-9 december 2017, long beach, ca, Guyon et al., pp. 6348–6358. External Links: Link Cited by: §4.8.1.
  • [287] M. Khiari, P. Boizumault, and B. Crémilleux (2010) Constraint programming for mining n-ary patterns. In Principles and Practice of Constraint Programming - Proc. 16th Int. Conf. CP 2010, St. Andrews, Scotland, Sept. 6-10, pp. 552–567. Cited by: §4.7.
  • [288] A. R. KhudaBukhsh, L. Xu, H. H. Hoos, and K. Leyton-Brown (2016) SATenstein: automatically building local search SAT solvers from components. Artif. Intell. 232, pp. 20–42. External Links: Link, Document Cited by: §4.8.1.
  • [289] G. Klein (2018) Explaining explanation, part 3: the causal landscape. IEEE Intelligent Systems 33 (2), pp. 83–88. External Links: Document Cited by: §2.4, footnote 2.
  • [290] S. Kraus, D. Lehmann, and M. Magidor (1990) Nonmonotonic reasoning, preferential models and cumulative logics. Artificial Intelligence 44, pp. 167–207. Cited by: §3.5.1.
  • [291] B. Krishnapuram, M. Shah, A. J. Smola, C. C. Aggarwal, D. Shen, and R. Rastogi (Eds.) (2016) Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, san francisco, ca, usa, august 13-17, 2016. ACM. External Links: Link, Document, ISBN 978-1-4503-4232-2 Cited by: 398, 299.
  • [292] O. Kullmann (Ed.) (2009) Theory and applications of satisfiability testing - SAT 2009, proc. 12th int. conf. SAT 2009, swansea, june 30 - july 3. LNCS, Vol. 5584, Springer. External Links: Link, Document, ISBN 978-3-642-02776-5 Cited by: 227.
  • [293] C. Kuo, S. S. Ravi, T. Dao, C. Vrain, and I. Davidson (2017) A framework for minimal clustering modification via constraint programming. In Proc. 31st AAAI Conf. on Artificial Intelligence, February 4-9, 2017, San Francisco, pp. 1389–1395. Cited by: §4.7.
  • [294] O. Kuzelka, J. Davis, and S. Schockaert (2015) Encoding Markov logic networks in possibilistic logic. In Proc. 31st Conf. on Uncertainty in Artificial Intelligence (UAI’15), July 12-16, Amsterdam, M. Meila and T. Heskes (Eds.), pp. 454–463. Cited by: §3.5.1.
  • [295] M. Z. Kwiatkowska (2019) Safety verification for deep neural networks with provable guarantees (invited paper). See 30th international conference on concurrency theory, CONCUR 2019, august 27-30, 2019, amsterdam, the netherlands, Fokkink and van Glabbeek, pp. 1:1–1:5. External Links: Link, Document Cited by: §4.8.2.
  • [296] Ch. Labreuche (2011) A general framework for explaining the results of a multi-attribute preference model. Artif. Intell. 175 (7-8), pp. 1410–1448. Cited by: §2.4.
  • [297] N. Lachiche and P. A. Flach (2002) 1BC2: A true first-order Bayesian classifier. In Inductive Logic Programming, 12th International Conference, ILP 2002, Sydney, Australia, July 9-11, 2002. Revised Papers, pp. 133–148. Cited by: §3.2.
  • [298] S. K. Lahiri and C. Wang (Eds.) (2018) Automated technology for verification and analysis - proc. 16th int. symp. ATVA’18, los angeles, oct. 7-10. LNCS, Vol. 11138, Springer. External Links: Link, Document, ISBN 978-3-030-01089-8 Cited by: 212, 433.
  • [299] H. Lakkaraju, S. H. Bach, and J. Leskovec (2016) Interpretable decision sets: A joint framework for description and prediction. See Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, san francisco, ca, usa, august 13-17, 2016, Krishnapuram et al., pp. 1675–1684. External Links: Link, Document Cited by: §4.8.2.
  • [300] J. Lang (Ed.) (2018) Proceedings of the twenty-seventh international joint conference on artificial intelligence, IJCAI 2018, july 13-19, 2018, stockholm, sweden. ijcai.org. External Links: Link, ISBN 978-0-9992411-2-7 Cited by: 439, 172, 409, 362, 359.
  • [301] N. Lavrac, S. Dzeroski, and M. Grobelnik (1991) Learning nonrecursive definitions of relations with LINUS. In Machine Learning - EWSL-91, European Working Session on Learning, Porto, Portugal, March 6-8, 1991, Proceedings, pp. 265–281. Cited by: §3.2.
  • [302] M.T. Law, N. Thome, and M. Cord (2017) Learning a distance metric from relative comparisons between quadruplets of images. Int. J. of Computer Vision 121 (1), pp. 65–94. Cited by: §3.7.
  • [303] Y. Le Cun (2019) Quand la machine apprend. La révolution des neurones artificiels et de l’apprentissage profond. Odile Jacob. Cited by: §5.
  • [304] Y. LeCun, Y. Bengio, and G. E. Hinton (2015) Deep learning. Nature 521 (7553), pp. 436–444. Cited by: §2.4.
  • [305] G. Lederman, M. N. Rabe, and S. A. Seshia (2018) Learning heuristics for automated reasoning through deep reinforcement learning. CoRR abs/1807.08058. External Links: Link, 1807.08058 Cited by: §4.8.1.
  • [306] D. D. Lee and H. S. Seung (1999) Learning the parts of objects by non-negative matrix factorization. Nature 401 (6755), pp. 788–791. Cited by: §4.4.
  • [307] D. D. Lee, A. Steen, and T. Walsh (Eds.) (2018) GCAI-2018, 4th global conference on artificial intelligence, luxembourg, september 18-21, 2018. EPiC Series in Computing, Vol. 55, EasyChair. External Links: Link Cited by: 147.
  • [308] D. D. Lee, M. Sugiyama, U. von Luxburg, I. Guyon, and R. Garnett (Eds.) (2016) Advances in neural information processing systems 29: annual conference on neural information processing systems 2016, december 5-10, 2016, barcelona, spain. External Links: Link Cited by: 251, 264.
  • [309] F. Leofante, N. Narodytska, L. Pulina, and A. Tacchella (2018) Automated verification of neural networks: advances, challenges and perspectives. CoRR abs/1805.09938. External Links: Link, 1805.09938 Cited by: §4.8.2.
  • [310] H. J. Levesque and R. J. Brachman (1987) Expressiveness and tractability in knowledge representation and reasoning. Computational Intelligence 3, pp. 78–93. Cited by: §2.1.
  • [311] N. Li, Z. Bouraoui, and S. Schockaert (2019) Ontology completion using graph convolutional networks. In The Semantic Web - ISWC 2019 - 18th International Semantic Web Conference, Auckland, New Zealand, October 26-30, 2019, Proceedings, Part I, pp. 435–452. External Links: Link, Document Cited by: §4.5.
  • [312] Z. Li, Q. Chen, and V. Koltun (2018) Combinatorial optimization with graph convolutional networks and guided tree search. See Advances in neural information processing systems 31: annual conf. on neural information processing systems (neurips’18), dec. 3-8, montréal, Bengio et al., pp. 537–546. External Links: Link Cited by: §4.8.1.
  • [313] C. Lian, S. Ruan, and T. Denœux (2016) Dissimilarity metric learning in the belief function framework. IEEE Transactions on Fuzzy Systems 24 (6), pp. 1555–1564. Cited by: §4.1.
  • [314] J. H. Liang, V. Ganesh, P. Poupart, and K. Czarnecki (2016) Exponential recency weighted average branching heuristic for SAT solvers. See Proc. 30th AAAI conference on artificial intelligence, february 12-17, 2016, phoenix, Schuurmans and Wellman, pp. 3434–3440. External Links: Link Cited by: §4.8.1.
  • [315] J. H. Liang, V. Ganesh, P. Poupart, and K. Czarnecki (2016) Learning rate based branching heuristic for SAT solvers. In Theory and Applications of Satisfiability Testing - SAT 2016 - Proc. 19th Int. Conf., Bordeaux, July 5-8, N. Creignou and D. L. Berre (Eds.), LNCS, Vol. 9710, pp. 123–140. Cited by: §4.8.1.
  • [316] J. H. Liang, C. Oh, M. Mathew, C. Thomas, C. Li, and V. Ganesh (2018) Machine learning-based restart policy for CDCL SAT solvers. See Theory and applications of satisfiability testing - SAT 2018 - proc. 21st int. conf. SAT’18, held as part of the federated logic conference, floc 2018, oxford, july 9-12, Beyersdorff and Wintersteiger, pp. 94–110. External Links: Link, Document Cited by: §4.8.1.
  • [317] J. Lieber, E. Nauer, H. Prade, and G. Richard (2018) Making the best of cases by approximation, interpolation and extrapolation. In Proc. 26th Int. Conf. on Case-Based Reasoning (ICCBR’18), Stockholm, July 9-12, M. T. Cox, P. Funk, and S. Begum (Eds.), LNCS, Vol. 11156, pp. 580–596. Cited by: §3.7.
  • [318] X. Lin, H. Zhu, R. Samanta, and S. Jagannathan (2019) ART: abstraction refinement-guided training for provably correct neural networks. CoRR abs/1907.10662. External Links: Link, 1907.10662 Cited by: §4.8.3.
  • [319] R. J. A. Little and D. B. Rubin (2019) Statistical analysis with missing data. Vol. 793, Wiley. Cited by: §3.6.1.
  • [320] L. Liu and T. Dietterich (2014) Learnability of the superset label learning problem. In International Conference on Machine Learning, pp. 1629–1637. Cited by: §3.6.1.
  • [321] N. Liu, H. Yang, and X. Hu (2018) Adversarial detection with model interpretation. See Proc. 24th ACM SIGKDD int. conf. on knowledge discovery & data mining, KDD 2018, london, uk, august 19-23, 2018, Guo and Farooq, pp. 1803–1811. External Links: Link, Document Cited by: §4.8.2.
  • [322] F. Locatello, S. Bauer, M. Lucic, S. Gelly, B. Schölkopf, and O. Bachem (2018) Challenging common assumptions in the unsupervised learning of disentangled representations. CoRR abs/1811.12359. External Links: Link Cited by: 1st item.
  • [323] S. M. Loos, G. Irving, C. Szegedy, and C. Kaliszyk (2017) Deep network guided proof search. See LPAR-21, 21st international conference on logic for programming, artificial intelligence and reasoning, maun, botswana, may 7-12, 2017, Eiter and Sands, pp. 85–105. External Links: Link Cited by: §4.8.1.
  • [324] D. Lopez-Paz, R. Nishihara, S. Chintala, B. Scholkopf, and L. Bottou (2017) Discovering causal signals in images. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6979–6987. Cited by: §2.4, §2.4.
  • [325] S. M. Lundberg and S. Lee (2017) A unified approach to interpreting model predictions. In Advances in Neural Information Processing Systems 30: Annual Conference on Neural Information Processing Systems 2017, 4-9 December 2017, Long Beach, CA, USA, pp. 4765–4774. External Links: Link Cited by: §4.8.2, footnote 2.
  • [326] C. Luo, S. Cai, W. Wu, Z. Jie, and K. Su (2015) CCLS: an efficient local search algorithm for weighted maximum satisfiability. IEEE Transactions on Computers 64 (7), pp. 1830–1843. External Links: Document Cited by: §2.2.
  • [327] R. Majumdar and V. Kuncak (Eds.) (2017) Computer aided verification - 29th int. conf. CAV’17, heidelberg, july 24-28, 2017, proceedings, part I. LNCS, Vol. 10426, Springer. External Links: Link, Document, ISBN 978-3-319-63386-2 Cited by: 281, 250.
  • [328] D. Malioutov and K. S. Meel (2018) MLIC: A MaxSAT-based framework for learning interpretable classification rules. See Principles and practice of constraint programming - proc. 24th int. conf. CP 2018, lille, aug. 27-31, Hooker, pp. 312–327. External Links: Link, Document Cited by: §4.8.2.
  • [329] S. Mallat (2018) Sciences des données et apprentissage en grande dimension. Leçons Inaugurales du Collège de France, Fayard, Paris. Cited by: §5.
  • [330] E. H. Mamdani and S. Assilian (1975) An experiment in linguistic synthesis with a fuzzy logic controller. Int. J. of Man-Machine Studies 7 (1), pp. 1–13. Cited by: §3.5.2.
  • [331] R. Manhaeve, S. Dumancic, A. Kimmig, T. Demeester, and L. De Raedt (2018) DeepProbLog: neural probabilistic logic programming. See Advances in neural information processing systems 31: annual conf. on neural information processing systems (neurips’18), dec. 3-8, montréal, Bengio et al., pp. 3753–3763. External Links: Link Cited by: §4.8.1.
  • [332] R. Manhaeve, S. Dumancic, A. Kimmig, T. Demeester, and L. De Raedt (2018) DeepProbLog: neural probabilistic logic programming. In Advances in Neural Information Processing Systems, pp. 3749–3759. Cited by: §4.5.
  • [333] J. Mao, C. Gan, P. Kohli, J. B. Tenenbaum, and J. Wu (2019) The neuro-symbolic concept learner: interpreting scenes, words, and sentences from natural supervision. In 7th International Conference on Learning Representations, ICLR 2019, New Orleans, LA, USA, May 6-9, 2019, Cited by: §4.8.3.
  • [334] P. Marquis, O. Papini, and H. Prade (Eds.) (2019) A guided tour of artificial intelligence research. Vol. 1: Knowledge representation, reasoning and learning. Vol. 2: Artificial intelligence algorithms. Vol. 3: Interfaces and applications of artificial intelligence. Springer. Cited by: §1.
  • [335] P. Marquis (2015) Compile!. See Proceedings of the twenty-ninth AAAI conference on artificial intelligence (AAAI 2015), Bonet and Koenig, pp. 4112–4118. External Links: Link Cited by: §2.2.
  • [336] C. Marsala and B. Bouchon-Meunier (2010) Quality of measures for attribute selection in fuzzy decision trees. See 190, pp. 1–8. External Links: Link, Document Cited by: §3.5.2.
  • [337] R. Martins, V. M. Manquinho, and I. Lynce (2014) Open-WBO: A modular MaxSAT solver. See Proceedings of the 17th international conference on theory and applications of satisfiability testing (SAT 2014), Sinz and Egly, pp. 438–445. External Links: Document Cited by: §2.2.
  • [338] P. J. Matos, J. Planes, F. Letombe, and J. Marques-Silva (2008) A MAX-SAT algorithm portfolio. See ECAI 2008 - proc. 18th european conf. on artificial intelligence, patras, greece, july 21-25, 2008, Ghallab et al., pp. 911–912. External Links: Link, Document Cited by: §4.8.1.
  • [339] K. S. McKinley and K. Fisher (Eds.) (2019) Proc. of the 40th ACM SIGPLAN conf. on programming language design and implementation, PLDI 2019, phoenix, june 22-26, 2019. ACM. External Links: Link, Document, ISBN 978-1-4503-6712-7 Cited by: 506.
  • [340] N. Messai, M. Devignes, A. Napoli, and M. Smaïl-Tabbone (2008) Many-valued concept lattices for conceptual clustering and information retrieval. In Proc. 18th Europ. Conf. on Artificial Intelligence (ECAI’08), Patras, July 21-25, M. Ghallab, C. D. Spyropoulos, N. Fakotakis, and N. M. Avouris (Eds.), Frontiers in Artificial Intelligence and Applications, Vol. 178, pp. 127–131. Cited by: §3.4.
  • [341] L. Miclet, S. Bayoudh, and A. Delhay (2008) Analogical dissimilarity: definition, algorithms and two experiments in machine learning. J. Artif. Intell. Res. 32, pp. 793–824. Cited by: §3.7.
  • [342] L. Miclet and H. Prade (2009) Handling analogical proportions in classical logic and fuzzy logics settings. In Proc. 10th Eur. Conf. on Symbolic and Quantitative Approaches to Reasoning with Uncertainty (ECSQARU’09), pp. 638–650. Cited by: §3.7.
  • [343] T. Miller (2019) “But why?” understanding explainable artificial intelligence. ACM Crossroads 25 (3), pp. 20–25. External Links: Document Cited by: footnote 2.
  • [344] T. Miller (2019) Explanation in artificial intelligence: insights from the social sciences. Artif. Intell. 267, pp. 1–38. External Links: Document Cited by: footnote 2.
  • [345] P. Minervini, M. Bosnjak, T. Rocktäschel, and S. Riedel (2018) Towards neural theorem proving at scale. CoRR abs/1807.08204. External Links: Link, 1807.08204 Cited by: §4.8.3.
  • [346] B. Mirkin (2011) Core concepts in data analysis: summarization, correlation, visualization. Springer. Cited by: §5.
  • [347] M. Mirman, T. Gehr, and M. T. Vechev (2018) Differentiable abstract interpretation for provably robust neural networks. See Proceedings of the 35th international conference on machine learning, ICML 2018, stockholmsmässan, stockholm, sweden, july 10-15, 2018, Dy and Krause, pp. 3575–3583. External Links: Link Cited by: §4.8.2.
  • [348] T. Mitchell (1979) Version spaces: an approach to concept learning. Ph.D. Thesis, Stanford University. Cited by: §3.5.3, §5.
  • [349] B. D. Mittelstadt, C. Russell, and S. Wachter (2019) Explaining explanations in AI. In FAT, pp. 279–288. External Links: Document Cited by: footnote 2.
  • [350] M. Mohri, A. Rostamizadeh, and A. Talwalkar (2018) Foundations of machine learning. Second edition. MIT Press. Cited by: §1.
  • [351] V. Molek and I. Perfilieva (2019) Scale-space theory, F-transform kernels and CNN realization. In Advances in Computational Intelligence - Proc. 15th Int. Work-Conf. on Artificial Neural Networks, IWANN 2019, Gran Canaria, June 12-14, Part II, pp. 38–48. Cited by: §5.
  • [352] G. Montavon, W. Samek, and K. Müller (2018) Methods for interpreting and understanding deep neural networks. Digital Signal Processing 73, pp. 1–15. External Links: Document Cited by: footnote 2.
  • [353] M. Mueller and S. Kramer (2010) Integer linear programming models for constrained clustering. In Discovery Science - Proc. 13th Int. Conf. DS’10, Canberra, Oct. 6-8, pp. 159–173. Cited by: §4.7.
  • [354] S. T. Mueller, R. R. Hoffman, W. J. Clancey, A. Emrey, and G. Klein (2019) Explanation in human-AI systems: A literature meta-review, synopsis of key ideas and publications, and bibliography for explainable AI. CoRR abs/1902.01876. External Links: 1902.01876 Cited by: footnote 2.
  • [355] S. Muggleton and L. De Raedt (1994) Inductive logic programming: theory and methods. J. Log. Program. 19/20, pp. 629–679. External Links: Link, Document Cited by: §3.2.
  • [356] S. H. Muggleton, U. Schmid, C. Zeller, A. Tamaddoni-Nezhad, and T. R. Besold (2018) Ultra-strong machine learning: comprehensibility of programs learned with ILP. Machine Learning 107 (7), pp. 1119–1140. Cited by: §3.2.
  • [357] S. Muggleton (1995) Inverse entailment and Progol. New Generation Comput. 13 (3-4), pp. 245–286. Cited by: §3.2.
  • [358] V. Nair and G. E. Hinton (2010) Rectified linear units improve restricted Boltzmann machines. See Proceedings of the 27th international conference on machine learning (ICML-10), june 21-24, 2010, haifa, israel, Fürnkranz and Joachims, pp. 807–814. External Links: Link Cited by: §4.8.2.
  • [359] N. Narodytska, A. Ignatiev, F. Pereira, and J. Marques-Silva (2018) Learning optimal decision trees with SAT. See Proceedings of the twenty-seventh international joint conference on artificial intelligence, IJCAI 2018, july 13-19, 2018, stockholm, sweden, Lang, pp. 1362–1368. External Links: Link, Document Cited by: §4.8.2.
  • [360] N. Narodytska, S. P. Kasiviswanathan, L. Ryzhyk, M. Sagiv, and T. Walsh (2018) Verifying properties of binarized deep neural networks. In Proc. 32nd AAAI Conf. on Artificial Intelligence, (AAAI-18), New Orleans, Louisiana, Feb. 2-7, 2018, pp. 6615–6624. Cited by: §4.8.2.
  • [361] N. Narodytska, A. A. Shrotri, K. S. Meel, A. Ignatiev, and J. Marques-Silva (2019) Assessing heuristic machine learning explanations with model counting. See Theory and applications of satisfiability testing - SAT 2019 - proc. 22nd int. conf. SAT’19, lisbon, july 9-12, Janota and Lynce, pp. 267–278. External Links: Link, Document Cited by: §4.8.2.
  • [362] N. Narodytska (2018) Formal analysis of deep binarized neural networks. See Proceedings of the twenty-seventh international joint conference on artificial intelligence, IJCAI 2018, july 13-19, 2018, stockholm, sweden, Lang, pp. 5692–5696. External Links: Link, Document Cited by: §4.8.2.
  • [363] H. T. Nguyen (1978) On random sets and belief functions. J. of Mathematical Analysis and Applications 65, pp. 531–542. Cited by: 2nd item.
  • [364] S. Nijssen and É. Fromont (2007) Mining optimal decision trees from itemset lattices. See Proceedings of the 13th ACM SIGKDD international conference on knowledge discovery and data mining, san jose, california, usa, august 12-15, 2007, Berkhin et al., pp. 530–539. External Links: Link, Document Cited by: §4.8.2.
  • [365] S. Nijssen and É. Fromont (2010) Optimal constraint-based decision tree induction from itemset lattices. Data Min. Knowl. Discov. 21 (1), pp. 9–51. External Links: Link, Document Cited by: §4.8.2.
  • [366] S. Nijssen (2008) Bayes optimal classification for decision trees. See Machine learning, proceedings of the twenty-fifth international conference (ICML 2008), helsinki, finland, june 5-9, 2008, Cohen et al., pp. 696–703. External Links: Link, Document Cited by: §4.8.2.
  • [367] J. Nin, A. Laurent, and P. Poncelet (2010) Speed up gradual rule mining from stream data! A B-tree and OWA-based approach. J. Intell. Inf. Syst. 35 (3), pp. 447–463. Cited by: §3.5.2.
  • [368] A. Ouali, A. Zimmermann, S. Loudni, Y. Lebbah, B. Crémilleux, P. Boizumault, and L. Loukil (2017) Integer linear programming for pattern set mining; with an application to tiling. In Advances in Knowledge Discovery and Data Mining - 21st Pacific-Asia Conference, PAKDD 2017, Jeju, South Korea, May 23-26, 2017, Proceedings, Part II, pp. 286–299. Cited by: §4.7.
  • [369] A. Paliwal, S. M. Loos, M. N. Rabe, K. Bansal, and C. Szegedy (2019) Graph representations for higher-order logic and theorem proving. CoRR abs/1905.10006. External Links: Link, 1905.10006 Cited by: §4.8.1.
  • [370] R. B. Palm, U. Paquet, and O. Winther (2018) Recurrent relational networks. See Advances in neural information processing systems 31: annual conf. on neural information processing systems (neurips’18), dec. 3-8, montréal, Bengio et al., pp. 3372–3382. External Links: Link Cited by: §4.8.1.
  • [371] P. Panda and K. Roy (2018) Explainable learning: implicit generative modelling during training for adversarial robustness. CoRR abs/1807.02188. External Links: Link, 1807.02188 Cited by: §4.8.2.
  • [372] E. Parisotto, A. Mohamed, R. Singh, L. Li, D. Zhou, and P. Kohli (2017) Neuro-symbolic program synthesis. See 4, External Links: Link Cited by: §4.8.3.
  • [373] N. Pasquier, Y. Bastide, R. Taouil, and L. Lakhal (1999) Efficient mining of association rules using closed itemset lattices. Inf. Syst. 24, pp. 25–46. Cited by: §3.4.
  • [374] Z. Pawlak (1991) Rough sets: theoretical aspects of reasoning about data. Kluwer Acad. Publ., Dordrecht. Cited by: §5.
  • [375] J. Pearl, M. Glymour, and N. P. Jewell (2016) Causal inference in statistics: a primer. John Wiley & Sons. Cited by: §2.4.
  • [376] J. Pearl and D. Mackenzie (2018) The book of why: the new science of cause and effect. Basic Books. Cited by: §2.4, §3.7.
  • [377] J. Pearl (1988) Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference. Morgan Kaufmann. Cited by: §2.3, §4.9.
  • [378] J. Pearl (2009) Causality. Cambridge University Press. Cited by: §2.4.
  • [379] W. Pedrycz (1998) Conditional fuzzy clustering in the design of radial basis function neural networks. IEEE Trans. Neural Networks 9 (4), pp. 601–612. Cited by: §3.5.2.
  • [380] G. Pinkas and S. Cohen (2019) High-order networks that learn to satisfy logic constraints. Journal of Applied Logics - IfCoLog Journal of Logics and their Applications 6 (4), pp. 653–694. External Links: Link Cited by: §3.3.
  • [381] G. Pinkas (1991) Symmetric neural networks and propositional logic satisfiability. Neural Computation 3 (2), pp. 282–291. External Links: Document Cited by: §3.3.
  • [382] G. Pinkas (1995) Reasoning, nonmonotonicity and learning in connectionist networks that capture propositional knowledge. Artificial Intelligence 77 (2), pp. 203–247. Cited by: §3.3.
  • [383] M. Pirlot and H. Prade (2018) Predicting preferences by means of analogical proportions. In Proc. 26th Int. Conf. on Case-Based Reasoning (ICCBR’18), Stockholm, July 9-12, M. T. Cox, P. Funk, and S. Begum (Eds.), LNCS, Vol. 11156, pp. 515–531. Cited by: §3.7, §4.9.
  • [384] G. Plotkin (1970) A note on inductive generalization. In Machine Intelligence, Vol. 5, pp. 153–163. Cited by: §3.2.
  • [385] H. Prade and G. Richard (2013) From analogical proportion to logical proportions. Logica Universalis 7 (4), pp. 441–505. Cited by: §3.7.
  • [386] H. Prade and G. Richard (2018) Analogical proportions: From equality to inequality. Int. J. Approx. Reasoning 101, pp. 234–254. Cited by: §3.7.
  • [387] H. Prade, A. Rico, and M. Serrurier (2009) Elicitation of Sugeno integrals: A version space learning perspective. In Proc. 18th Int. Symp. on Foundations of Intelligent Systems (ISMIS’09), Prague, Sept. 14-17, J. Rauch, Z. W. Ras, P. Berka, and T. Elomaa (Eds.), LNCS, Vol. 5722, pp. 392–401. Cited by: §3.5.3.
  • [388] H. Prade, A. Rico, M. Serrurier, and E. Raufaste (2009) Elicitating Sugeno integrals: Methodology and a case study. In Proc. 10th Eur. Conf. on Symbolic and Quantitative Approaches to Reasoning with Uncertainty (ECSQARU’09), Verona, July 1-3, C. Sossai and G. Chemello (Eds.), LNCS, Vol. 5590, pp. 712–723. Cited by: §3.5.3.
  • [389] H. Prade (2016) Reasoning with data - A new challenge for AI?. In Proc. 10th Int. Conf. on Scalable Uncertainty Management (SUM’16), Nice, Sept. 21-23, S. Schockaert and P. Senellart (Eds.), LNCS, Vol. 9858, pp. 274–288. Cited by: §1.
  • [390] M. O. R. Prates, P. H. C. Avelar, H. Lemos, L. C. Lamb, and M. Y. Vardi (2019) Learning to solve NP-complete problems: A graph neural network for decision TSP. See 454, pp. 4731–4738. External Links: Link, Document Cited by: §4.8.1.
  • [391] (1995) Proceedings of the fourteenth international joint conference on artificial intelligence (IJCAI 95). Morgan Kaufmann. Cited by: 420.
  • [392] J. Pujara, T. Rocktäschel, D. Chen, and S. Singh (Eds.) (2016) Proc. 5th workshop on automated knowledge base construction, akbc@naacl-hlt 2016, san diego, june 17. The Association for Computer Linguistics. External Links: Link, ISBN 978-1-941643-53-2 Cited by: 403.
  • [393] L. Pulina and A. Tacchella (2010) An abstraction-refinement approach to verification of artificial neural networks. See Computer aided verification, proc. 22nd int. conf. CAV’10, edinburgh, july 15-19, Touili et al., pp. 243–257. External Links: Link, Document Cited by: §4.8.2.
  • [394] C. Qin, K. Dvijotham, B. O’Donoghue, R. Bunel, R. Stanforth, S. Gowal, J. Uesato, G. Swirszcz, and P. Kohli (2019) Verification of non-linear specifications for neural networks. In 7th International Conference on Learning Representations, ICLR 2019, New Orleans, LA, USA, May 6-9, 2019, Cited by: §4.8.2.
  • [395] J. R. Quinlan (1996) Learning first-order definitions of functions. CoRR cs.AI/9610102. Cited by: §3.2.
  • [396] B. Quost, T. Denœux, and S. Li (2017) Parametric classification with soft labels using the evidential EM algorithm: linear discriminant analysis versus logistic regression. Advances in Data Analysis and Classification 11 (4), pp. 659–690. Cited by: 1st item.
  • [397] B. Quost, M. Masson, and T. Denœux (2011) Classifier fusion in the Dempster-Shafer framework using optimized t-norm based combination rules. Int. J. of Approximate Reasoning 52 (3), pp. 353–374. Cited by: §4.1.
  • [398] M. T. Ribeiro, S. Singh, and C. Guestrin (2016) “Why should I trust you?”: explaining the predictions of any classifier. See Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, san francisco, ca, usa, august 13-17, 2016, Krishnapuram et al., pp. 1135–1144. External Links: Link, Document Cited by: §2.4, §2.4, §4.8.2, footnote 2.
  • [399] M. T. Ribeiro, S. Singh, and C. Guestrin (2018) Anchors: high-precision model-agnostic explanations. In Proc. 32nd AAAI Conference on Artificial Intelligence (AAAI-18), New Orleans, Feb. 2-7, S. A. McIlraith and K. Q. Weinberger (Eds.), pp. 1527–1535. Cited by: §4.8.2, footnote 2.
  • [400] M. Richardson and P. M. Domingos (2006) Markov logic networks. Machine Learning 62 (1-2), pp. 107–136. Cited by: §3.2, §3.5.1.
  • [401] S. Riedel, L. Yao, A. McCallum, and B. M. Marlin (2013) Relation extraction with matrix factorization and universal schemas. In Proceedings of HLT-NAACL, pp. 74–84. Cited by: §4.6.
  • [402] S. Riedel, L. Yao, and A. McCallum (2010) Modeling relations and their mentions without labeled text. In Joint European Conference on Machine Learning and Knowledge Discovery in Databases, pp. 148–163. Cited by: §4.6.
  • [403] T. Rocktäschel and S. Riedel (2016) Learning knowledge base inference with neural theorem provers. See Proc. 5th workshop on automated knowledge base construction, akbc@naacl-hlt 2016, san diego, june 17, Pujara et al., pp. 45–50. External Links: Link Cited by: §4.8.3.
  • [404] T. Rocktäschel and S. Riedel (2017) End-to-end differentiable proving. See Advances in neural information processing systems 30: annual conference on neural information processing systems 2017, 4-9 december 2017, long beach, ca, Guyon et al., pp. 3788–3800. External Links: Link Cited by: §4.8.3.
  • [405] G. Rogova (1994) Combining the results of several neural network classifiers. Neural Networks 7 (5), pp. 777–781. Cited by: §4.1.
  • [406] F. Rosenblatt (1958) The perceptron: A probabilistic model for information storage and organization in the brain. Psychological Review 65 (6), pp. 386–408. Cited by: 1st item.
  • [407] A. S. Ross and F. Doshi-Velez (2018) Improving the adversarial robustness and interpretability of deep neural networks by regularizing their input gradients. In Proc. 32nd AAAI Conf. on Artificial Intelligence, (AAAI-18), New Orleans, Feb. 2-7, pp. 1660–1669. Cited by: §4.8.2.
  • [408] M.-Ch. Rousset and B. Safar (1987) Negative and positive explanations in expert systems. Applied Artificial Intelligence 1 (1), pp. 25–38. Cited by: §2.4.
  • [409] W. Ruan, X. Huang, and M. Kwiatkowska (2018) Reachability analysis of deep neural networks with provable guarantees. See Proceedings of the twenty-seventh international joint conference on artificial intelligence, IJCAI 2018, july 13-19, 2018, stockholm, sweden, Lang, pp. 2651–2659. External Links: Link, Document Cited by: §4.8.2.
  • [410] C. Rudin (2019) Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence 1 (5), pp. 206–215. Cited by: §4.8.2.
  • [411] B. Russell (1912) The problems of philosophy. Chap. VI: On induction. Home University Library; Oxford University Press, 1959. Cited by: §3.7.
  • [412] S. J. Russell (2015) Unifying logic and probability. Commun. ACM 58 (7), pp. 88–97. Cited by: §2.3.
  • [413] S. H. Huang and H. Xing (2002) Extract intelligible and concise fuzzy rules from neural networks. Fuzzy Sets and Systems 132, pp. 233–243. Cited by: §3.5.2.
  • [414] D. Salvagnin and M. Lombardi (Eds.) (2017) Integration of AI and OR techniques in constraint programming - proc. 14th int. conf., CPAIOR 2017, padua, italy, june 5-8. LNCS, Vol. 10335, Springer. External Links: Link, Document, ISBN 978-3-319-59775-1 Cited by: 466.
  • [415] W. Samek, G. Montavon, A. Vedaldi, L. K. Hansen, and K. Müller (Eds.) (2019) Explainable AI: interpreting, explaining and visualizing deep learning. Lecture Notes in Computer Science, Vol. 11700, Springer. External Links: Document, ISBN 978-3-030-28953-9 Cited by: footnote 2, 416.
  • [416] W. Samek and K. Müller (2019) Towards explainable artificial intelligence. See Explainable AI: interpreting, explaining and visualizing deep learning, Samek et al., pp. 5–22. External Links: Document Cited by: footnote 2.
  • [417] D. Saxton, E. Grefenstette, F. Hill, and P. Kohli (2019) Analysing mathematical reasoning abilities of neural models. In 7th International Conference on Learning Representations, ICLR 2019, New Orleans, LA, USA, May 6-9, 2019, Cited by: §4.8.3.
  • [418] F. Scarselli, M. Gori, A. C. Tsoi, M. Hagenbuchner, and G. Monfardini (2009) The graph neural network model. IEEE Trans. Neural Networks 20 (1), pp. 61–80. External Links: Link, Document Cited by: §4.8.1.
  • [419] T. Schaub, G. Friedrich, and B. O’Sullivan (Eds.) (2014) Proceedings of the 21st european conference on artificial intelligence (ECAI 2014). Frontiers in Artificial Intelligence and Applications, Vol. 263, IOS Press. Cited by: 478.
  • [420] T. Schiex, H. Fargier, and G. Verfaillie (1995) Valued constraint satisfaction problems: hard and easy problems. See 391, pp. 631–639. External Links: Link Cited by: §4.9.
  • [421] S. Schockaert and H. Prade (2013) Interpolative and extrapolative reasoning in propositional theories using qualitative knowledge about conceptual spaces. Artif. Intell. 202, pp. 86–131. Cited by: §2.3.
  • [422] S. Schulz, L. de Moura, and B. Konev (Eds.) (2015) 4th workshop on practical aspects of automated reasoning, paar@ijcar 2014, vienna, 2014. EPiC Series in Computing, Vol. 31, EasyChair. Cited by: 275.
  • [423] J. Schumann and S. D. Nelson (2002) Toward V&V of neural network based controllers. See Proceedings of the first workshop on self-healing systems, WOSS 2002, charleston, south carolina, usa, november 18-19, 2002, Garlan et al., pp. 67–72. External Links: Link, Document Cited by: §4.8.2.
  • [424] D. Schuurmans and M. P. Wellman (Eds.) (2016) Proc. 30th AAAI conference on artificial intelligence, february 12-17, 2016, phoenix. AAAI Press. External Links: Link, ISBN 978-1-57735-760-5 Cited by: 314.
  • [425] D. Selsam and N. Bjørner (2019) Guiding high-performance SAT solvers with unsat-core predictions. See Theory and applications of satisfiability testing - SAT 2019 - proc. 22nd int. conf. SAT’19, lisbon, july 9-12, Janota and Lynce, pp. 336–353. External Links: Link, Document Cited by: §4.8.1.
  • [426] D. Selsam, M. Lamm, B. Bünz, P. Liang, L. de Moura, and D. L. Dill (2018) Learning a SAT solver from single-bit supervision. CoRR abs/1802.03685. External Links: Link, 1802.03685 Cited by: §4.8.1.
  • [427] D. Selsam, M. Lamm, B. Bünz, P. Liang, L. de Moura, and D. L. Dill (2019) Learning a SAT solver from single-bit supervision. In 7th International Conference on Learning Representations, ICLR 2019, New Orleans, LA, USA, May 6-9, 2019, Cited by: §4.8.1.
  • [428] L. Serafini and A. S. d’Avila Garcez (2016) Learning and reasoning with logic tensor networks. In AI*IA 2016: Advances in Artificial Intelligence - Proc. XVth Int. Conf. of the Italian Association for Artificial Intelligence, Genova, Nov. 29 - Dec. 1, G. Adorni, S. Cagnoni, M. Gori, and M. Maratea (Eds.), LNCS, Vol. 10037, pp. 334–348. Cited by: §3.3.
  • [429] M. Serrurier, D. Dubois, H. Prade, and T. Sudkamp (2007) Learning fuzzy rules with their implication operators. Data Knowl. Eng. 60 (1), pp. 71–89. Cited by: §3.5.2.
  • [430] M. Serrurier and H. Prade (2007) Introducing possibilistic logic in ILP for dealing with exceptions. Artif. Intell. 171 (16-17), pp. 939–950. Cited by: §3.5.1.
  • [431] M. Serrurier and H. Prade (2013) An informational distance for estimating the faithfulness of a possibility distribution, viewed as a family of probability distributions, with respect to data. Int. J. Approx. Reasoning 54 (7), pp. 919–933. Cited by: §4.2.
  • [432] M. Serrurier and H. Prade (2015) Entropy evaluation based on confidence intervals of frequency estimates: application to the learning of decision trees. In Proc. 32nd Int. Conf. on Machine Learning (ICML’15), Lille, July 6-11, F. R. Bach and D. M. Blei (Eds.), JMLR Workshop and Conference Proceedings, Vol. 37, pp. 1576–1584. Cited by: §4.2.
  • [433] S. A. Seshia, A. Desai, T. Dreossi, D. J. Fremont, S. Ghosh, E. Kim, S. Shivakumar, M. Vazquez-Chanlatte, and X. Yue (2018) Formal specification for deep neural networks. See Automated technology for verification and analysis - proc. 16th int. symp. ATVA’18, los angeles, oct. 7-10, Lahiri and Wang, pp. 20–34. External Links: Link, Document Cited by: §4.8.2.
  • [434] G. Shafer and V. Vovk (2008) A tutorial on conformal prediction. Journal of Machine Learning Research 9 (Mar), pp. 371–421. Cited by: 2nd item.
  • [435] G. Shafer (1976) A mathematical theory of evidence. Princeton University Press, Princeton, N.J. Cited by: 1st item, §4.1, §4.1.
  • [436] S. Shalev-Shwartz and S. Ben-David (2014) Understanding machine learning. From theory to algorithms. Cambridge University Press. Cited by: §1.
  • [437] J. Shen, Y. Qu, W. Zhang, and Y. Yu (2017) Wasserstein distance guided representation learning for domain adaptation. arXiv preprint arXiv:1707.01217. Cited by: §3.7.
  • [438] P. P. Shenoy (1994) Conditional independence in valuation-based systems. Int. J. Approx. Reasoning 10 (3), pp. 203–234. Cited by: §2.3.
  • [439] A. Shih, A. Choi, and A. Darwiche (2018) A symbolic approach to explaining Bayesian network classifiers. See Proceedings of the twenty-seventh international joint conference on artificial intelligence, IJCAI 2018, july 13-19, 2018, stockholm, sweden, Lang, pp. 5103–5111. External Links: Link, Document Cited by: §4.8.2.
  • [440] C. Sierra (Ed.) (2017) Proc. of the 26th int. joint conf. on artificial intelligence (IJCAI’17). ijcai.org. External Links: ISBN 978-0-9992411-0-3, Link Cited by: 148.
  • [441] G. Singh, T. Gehr, M. Mirman, M. Püschel, and M. T. Vechev (2018) Fast and effective robustness certification. See Advances in neural information processing systems 31: annual conf. on neural information processing systems (neurips’18), dec. 3-8, montréal, Bengio et al., pp. 10825–10836. External Links: Link Cited by: §4.8.2.
  • [442] G. Singh, T. Gehr, M. Püschel, and M. T. Vechev (2019) An abstract domain for certifying neural networks. PACMPL 3 (POPL), pp. 41:1–41:30. External Links: Link, Document Cited by: §4.8.2.
  • [443] G. Singh, T. Gehr, M. Püschel, and M. T. Vechev (2019) Boosting robustness certification of neural networks. In 7th International Conference on Learning Representations, ICLR 2019, New Orleans, LA, USA, May 6-9, 2019, Cited by: §4.8.2.
  • [444] R. Singh, J. P. Near, V. Ganesh, and M. Rinard (2009) AvatarSAT: an auto-tuning Boolean SAT solver. Technical Report MIT-CSAIL-TR-2009-039, MIT. Cited by: §4.8.1.
  • [445] C. Sinz and U. Egly (Eds.) (2014) Proceedings of the 17th international conference on theory and applications of satisfiability testing (SAT 2014). Lecture Notes in Computer Science, Vol. 8561, Springer. External Links: Document, ISBN 978-3-319-09283-6 Cited by: 337.
  • [446] R. Socher, D. Chen, C. D. Manning, and A. Y. Ng (2013) Reasoning with neural tensor networks for knowledge base completion. See Proceedings of the 27th annual conference on neural information processing systems (NIPS 2013), Burges et al., pp. 926–934. External Links: Link Cited by: §3.3.
  • [447] G. Sourek, M. Svatos, F. Zelezný, S. Schockaert, and O. Kuzelka (2017) Stacked structure learning for lifted relational neural networks. In Inductive Logic Programming - 27th International Conference, ILP 2017, Orléans, France, September 4-6, 2017, Revised Selected Papers, pp. 140–151. Cited by: §3.2.
  • [448] J. F. Sowa (1984) Conceptual structures: information processing in mind and machine. Addison-Wesley. Cited by: §2.1.
  • [449] N. Stroppa and F. Yvon (2005) Analogical learning and formal proportions: Definitions and methodological issues. Note: Technical Report D004, ENST-Paris Cited by: §3.7.
  • [450] C. Szegedy, W. Zaremba, I. Sutskever, J. Bruna, D. Erhan, I. J. Goodfellow, and R. Fergus (2014) Intriguing properties of neural networks. See 2nd international conference on learning representations, ICLR 2014, banff, ab, canada, april 14-16, 2014, conference track proceedings, Bengio and LeCun, External Links: Link Cited by: §4.8.2.
  • [451] T. Takagi and M. Sugeno (1985) Fuzzy identification of systems and its application to modelling and control. IEEE Trans. Systems, Man, and Cybernetics 15 (1), pp. 116–132. Cited by: §3.5.2.
  • [452] G. Tao, S. Ma, Y. Liu, and X. Zhang (2018) Attacks meet interpretability: attribute-steered detection of adversarial samples. See Advances in neural information processing systems 31: annual conf. on neural information processing systems (neurips’18), dec. 3-8, montréal, Bengio et al., pp. 7728–7739. External Links: Link Cited by: §4.8.2.
  • [453] P. R. Thagard (1978) The best explanation: Criteria for theory choice. The Journal of Philosophy 75 (2), pp. 76–92. Cited by: §2.4.
  • [454] (2019) The thirty-third AAAI conference on artificial intelligence, AAAI 2019, the thirty-first innovative applications of artificial intelligence conference, IAAI 2019, the ninth AAAI symposium on educational advances in artificial intelligence, EAAI 2019, honolulu, hawaii, usa, january 27 - february 1, 2019. AAAI Press, AAAI. External Links: Link, ISBN 978-1-57735-809-1 Cited by: 261, 390, 467.
  • [455] S. Thrun, L. K. Saul, and B. Schölkopf (Eds.) (2003) Advances in neural information processing systems 16 (NIPS 2