From Common Sense Reasoning to Neural Network Models through Multiple Preferences: an overview

07/10/2021, by Laura Giordano et al.

In this paper we discuss the relationships between conditional and preferential logics and neural network models, based on a multi-preferential semantics. We propose a concept-wise multipreference semantics, recently introduced for defeasible description logics to take into account preferences with respect to different concepts, as a tool for providing a semantic interpretation to neural network models. This approach has been explored both for unsupervised neural network models (Self-Organising Maps) and for supervised ones (Multilayer Perceptrons), and we expect that the same approach might be extended to other neural network models. It allows for logical properties of the network to be checked (by model checking) over an interpretation capturing the input-output behavior of the network. For Multilayer Perceptrons, the deep network itself can be regarded as a conditional knowledge base, in which synaptic connections correspond to weighted conditionals. The paper describes the general approach, through the cases of Self-Organising Maps and Multilayer Perceptrons, and discusses some open issues and perspectives.


1 Introduction

Preferential approaches [49, 59, 52] to common sense reasoning, having their roots in conditional logics [54, 58], have recently been extended to description logics, to deal with inheritance with exceptions in ontologies, allowing for non-strict forms of inclusions, called typicality or defeasible inclusions (namely, conditionals), with different preferential semantics [25, 10] and closure constructions [13, 12, 26, 60]. Such defeasible or typicality inclusions have the form ${\bf T}(C) \sqsubseteq D$, meaning "the typical $C$s are $D$s" or "normally $C$s are $D$s", and correspond, in the propositional case, to the conditionals $C \mathrel{|\!\sim} D$ of Kraus, Lehmann and Magidor's (KLM) preferential approach [49, 52]. Description logics allow for a limited first-order language. A first-order extension of System Z has also been explored [4].

In this paper we consider a "concept-wise" multi-preferential semantics, recently introduced by Giordano and Theseider Dupré [29] to capture preferences with respect to different aspects (concepts) in ranked knowledge bases, and describe how it has been used as a semantics for some neural network models. We have considered both an unsupervised model, Self-Organising Maps, and a supervised one, Multilayer Perceptrons.

Self-organising maps (SOMs) are psychologically and biologically plausible neural network models [47] that can learn after limited exposure to positive category examples, without the need for contrastive information. They have been proposed as possible candidates to explain the psychological mechanisms underlying category generalisation. Multilayer Perceptrons (MLPs) [35] are deep networks. The learning algorithms in the two cases are quite different but, in this work, we only aim to capture, through a semantic interpretation, the behavior of the network resulting after training, and not to model learning. We will see that this can be accomplished in both cases in a similar way, based on a multi-preferential semantics.

The result of the training phase is represented very differently in the two models: for SOMs it is given by a set of units spatially organized in a grid, where each unit in the map is associated with a weight vector of the same dimensionality as the input vectors; for MLPs, as a result of training, the weights of the synaptic connections have been learned. In both cases, considering the domain of all input stimuli presented to the network during training (or in the generalization phase), one can build a semantic interpretation describing the input-output behavior of the network as a multi-preference interpretation, where preferences are associated to concepts. For SOMs, the learned categories are regarded as concepts, so that a preference relation (over the domain of input stimuli) is associated to each category. In the case of MLPs, each neuron in the deep network (including hidden neurons) is associated to a concept, and a preference relation is associated to it.

In both cases, the preferential model resulting from the network after training describes the input-output behavior of the network on the input stimuli considered, and the preference relations define a notion of typicality (with respect to different concepts/categories) on the domain of input stimuli. For instance, given two input stimuli $x$ and $y$, the model can assign to $x$ a degree of typicality which is higher than the degree of typicality of $y$ with respect to some category, so that $x$ is regarded as being more typical than $y$ as a horse ($x <_{\mathit{Horse}} y$), while, vice-versa, $y$ can be regarded as being more typical than $x$ as a zebra ($y <_{\mathit{Zebra}} x$). The preferential interpretation can be used for checking properties like: are the instances of a category $C$ also instances of a category $D$? Are the typical instances of a category $C$ also instances of a category $D$? This verification can be done by model checking over the multipreference interpretation describing the input-output behavior of the network [28].

This kind of construction establishes a strong relationship between the logics of commonsense reasoning and neural network models, as the former can be used to reason about the properties of the latter. The relationship can be made even stronger in some cases, e.g., for MLPs, when the neural network itself can be seen as a conditional knowledge base. In [31], the concept-wise multipreference semantics has been adapted to deal with weighted knowledge bases, where typicality inclusions have a weight, a real (positive or negative) number, representing the plausibility of the typicality inclusion. It has been proven that Multilayer Perceptrons can be regarded as weighted conditional knowledge bases under a fuzzy extension of the multipreference semantics. The multipreference interpretation which can be built over the set of input stimuli to describe the input-output behavior of the deep network can be proven to be a coherent fuzzy multipreference model of such a knowledge base (under some conditions on the activation functions).

This approach raises several issues, from the standpoint of knowledge representation, from the standpoint of neuro-symbolic integration, as well as from the standpoint of explainable AI [1, 34, 2]. We will discuss some of these issues in the paper, after describing the approach in some detail.

2 A concept-wise multi-preference semantics

In this section we shortly describe an extension of $\mathcal{ALC}$ with typicality based on the same language as the typicality logics [25, 26], but on a different, concept-wise, multipreference semantics, first introduced for the lightweight description logic $\mathcal{EL}^{+}_{\bot}$ [29].

We consider the description logic $\mathcal{ALC}$. Let $N_C$ be a set of concept names, $N_R$ a set of role names and $N_I$ a set of individual names. The set of $\mathcal{ALC}$ concepts can be defined as follows: $C ::= A \mid \top \mid \bot \mid \neg C \mid C \sqcap C \mid C \sqcup C \mid \forall r.C \mid \exists r.C$, where $A \in N_C$ and $r \in N_R$. A knowledge base (KB) $K$ is a pair $(\mathcal{T}, \mathcal{A})$, where $\mathcal{T}$ is a TBox and $\mathcal{A}$ is an ABox. The TBox $\mathcal{T}$ is a set of concept inclusions (or subsumptions) of the form $C \sqsubseteq D$, where $C, D$ are concepts. The ABox $\mathcal{A}$ is a set of assertions of the form $C(a)$ and $r(a,b)$, where $C$ is a concept, $a, b \in N_I$, and $r \in N_R$.

In addition to standard inclusions $C \sqsubseteq D$ (called strict inclusions in the following), the TBox $\mathcal{T}$ also contains typicality inclusions of the form ${\bf T}(C) \sqsubseteq D$, where $C$ and $D$ are concepts and ${\bf T}$ is a new concept constructor (${\bf T}(C)$ is called a typicality concept). A typicality inclusion ${\bf T}(C) \sqsubseteq D$ means that "typical $C$s are $D$s" or "normally $C$s are $D$s" and corresponds to a conditional implication $C \mathrel{|\!\sim} D$ in Kraus, Lehmann and Magidor's (KLM) preferential approach [49, 52]. Such inclusions are defeasible, i.e., they admit exceptions, while strict inclusions must be satisfied by all domain elements.

Let $\mathcal{C} = \{C_1, \ldots, C_k\}$ be a set of distinguished concepts. For each concept $C_i \in \mathcal{C}$, we introduce a modular preference relation $<_{C_i}$ which describes the preference among domain elements with respect to $C_i$. Each preference relation $<_{C_i}$ has the same properties as preference relations in KLM-style ranked interpretations [52], i.e., it is a modular and well-founded strict partial order (an irreflexive and transitive relation), where: $<_{C_i}$ is well-founded if, for all $S \subseteq \Delta$, if $S \neq \emptyset$, then $\min_{<_{C_i}}(S) \neq \emptyset$; and $<_{C_i}$ is modular if, for all $x, y, z \in \Delta$, if $x <_{C_i} y$ then $x <_{C_i} z$ or $z <_{C_i} y$.

Definition 1 (Multipreference interpretation)

A multipreference interpretation is a tuple $\mathcal{M} = \langle \Delta, <_{C_1}, \ldots, <_{C_k}, \cdot^I \rangle$, where:

  • $\Delta$ is a non-empty domain;

  • each $<_{C_i}$ is an irreflexive, transitive, well-founded and modular relation over $\Delta$;

  • $\cdot^I$ is an interpretation function, as in an $\mathcal{ALC}$ interpretation, that maps each concept name $A \in N_C$ to a set $A^I \subseteq \Delta$, each role name $r \in N_R$ to a binary relation $r^I \subseteq \Delta \times \Delta$, and each individual name $a \in N_I$ to an element $a^I \in \Delta$. It is extended to complex concepts as follows: $\top^I = \Delta$, $\bot^I = \emptyset$, $(\neg C)^I = \Delta \setminus C^I$, $(C \sqcap D)^I = C^I \cap D^I$, $(C \sqcup D)^I = C^I \cup D^I$, $(\forall r.C)^I = \{x \in \Delta \mid \forall y.\, (x,y) \in r^I \rightarrow y \in C^I\}$, and $(\exists r.C)^I = \{x \in \Delta \mid \exists y.\, (x,y) \in r^I \wedge y \in C^I\}$.

The preference relation $<_{C_i}$ allows the set of prototypical $C_i$-elements to be defined as the $C_i$-elements which are minimal with respect to $<_{C_i}$, i.e., the elements of $\min_{<_{C_i}}(C_i^I)$. As a consequence, the multipreference interpretation above is able to single out the typical $C_i$-elements, for all distinguished concepts $C_i \in \mathcal{C}$.

The multipreference structures above are at the basis of the semantics for ranked knowledge bases [29], which have been inspired by Brewka's framework of basic preference descriptions [7]. While we refer to [29] for the construction of the preference relations $<_{C_i}$ from a ranked knowledge base, in the following we shortly recall the notion of concept-wise multi-preference interpretation, which can be obtained by combining the preference relations $<_{C_1}, \ldots, <_{C_k}$ into a global preference relation $<$. This is needed for reasoning about the typicality of arbitrary concepts $C$, which do not belong to the set of distinguished concepts $\mathcal{C}$. For instance, we may want to verify whether typical employed students are young, or whether they have a boss, starting from a ranked KB containing the typicality inclusions ${\bf T}(\mathit{Employee}) \sqsubseteq \neg \mathit{Young}$, ${\bf T}(\mathit{Employee}) \sqsubseteq \exists \mathit{has\_boss}.\mathit{Employee}$, ${\bf T}(\mathit{Student}) \sqsubseteq \mathit{Young}$, and ${\bf T}(\mathit{Student}) \sqsubseteq \neg \exists \mathit{has\_boss}.\top$. To answer such queries, both preference relations $<_{\mathit{Employee}}$ and $<_{\mathit{Student}}$ are relevant, and they might be conflicting as, for instance, Tom is more typical than Bob as a student ($\mathit{tom} <_{\mathit{Student}} \mathit{bob}$), but more exceptional as an employee ($\mathit{bob} <_{\mathit{Employee}} \mathit{tom}$). By combining the preference relations into a single global preference relation $<$, we can exploit $<$ for interpreting the typicality operator, which may be applied to arbitrary concepts, and verify, for instance, whether ${\bf T}(\mathit{Employee} \sqcap \mathit{Student}) \sqsubseteq \mathit{Young}$ holds.

A natural definition of the notion of global preference $<$ exploits the Pareto combination of the relations $<_{C_1}, \ldots, <_{C_k}$, as follows:

$$x < y \;\text{ iff }\; \text{(i) } x <_{C_i} y \text{, for some } C_i \in \mathcal{C}, \text{ and (ii) } x \leq_{C_j} y \text{, for all } C_j \in \mathcal{C},$$

where $\leq_{C_j}$ is the non-strict preference relation associated with $<_{C_j}$ ($\leq_{C_j}$ is a total preorder). A slightly more sophisticated notion of preference combination, which exploits a modified Pareto condition taking into account the specificity relation among concepts (such as, for instance, the fact that the concept $\mathit{Employee} \sqcap \mathit{Student}$ is more specific than the concept $\mathit{Student}$), has been considered for ranked knowledge bases [29].

The addition of the global preference relation allows for defining a notion of concept-wise multipreference interpretation $\mathcal{M} = \langle \Delta, <_{C_1}, \ldots, <_{C_k}, <, \cdot^I \rangle$, where a typicality concept ${\bf T}(C)$ is interpreted as the set of the $<$-minimal $C$-elements, i.e., $({\bf T}(C))^I = \min_{<}(C^I)$, where $\min_{<}(S) = \{x \in S \mid \nexists y \in S \text{ s.t. } y < x\}$.
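As an illustration, the Pareto combination above can be computed directly over finite, ranked preferences. The following Python sketch is ours, not from [29]; it assumes each $<_{C_i}$ is represented by a rank function (lower rank meaning more typical), which is possible since the relations are modular:

```python
def globally_preferred(x, y, ranks):
    """Pareto combination: x < y iff x is strictly preferred to y
    with respect to some concept, and never less preferred with
    respect to any concept. `ranks` maps each distinguished concept
    to a dict assigning elements a rank (lower = more typical)."""
    strictly_better = any(r[x] < r[y] for r in ranks.values())
    never_worse = all(r[x] <= r[y] for r in ranks.values())
    return strictly_better and never_worse

# Hypothetical ranks for the example above: tom and bob are
# incomparable in the global preference, as the preferences conflict.
ranks = {"Student": {"tom": 0, "bob": 1}, "Employee": {"tom": 1, "bob": 0}}
print(globally_preferred("tom", "bob", ranks))  # False
print(globally_preferred("bob", "tom", ranks))  # False
```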

The notions of cw-model of a ranked knowledge base and of cw-entailment can be defined in the natural way. In particular, cw-entailment has been proven to be $\Pi^p_2$-complete for ranked $\mathcal{EL}^{+}_{\bot}$ knowledge bases, and to satisfy the KLM postulates of a preferential consequence relation [29].

3 A multi-preferential interpretation of Self-organising maps

In this section, we report on the multi-preferential semantics for SOMs proposed in [27], and later extended to fuzzy interpretations and to probabilistic interpretations in [28].

Self-organising maps, introduced by Kohonen [47], are particularly plausible neural network models that learn in a human-like manner. In this section we shortly describe the architecture of SOMs and report on Gliozzi and Plunkett's similarity-based account of category generalization based on SOMs [33].

SOMs consist of a set of neurons, or units, spatially organized in a grid [47]. Each map unit $u$ is associated with a world representation, given by a weight vector $w_u$ of the same dimensionality as the input vectors. At the beginning of training, all weight vectors are initialized to random values, outside the range of values of the input stimuli. During training, the input elements are sequentially presented to all neurons of the map. After each presentation of an input $x$, the best-matching unit (BMU) is selected: this is the unit $u$ whose weight vector $w_u$ is closest to the stimulus $x$ (i.e., the unit minimizing the Euclidean distance $\|x - w_u\|$).

The weights of the best-matching unit and of its surrounding units are updated in order to maximize the chances that the same unit (or its surrounding units) will be selected as the best-matching unit for the same stimulus, or for similar stimuli, on subsequent presentations. In particular, the update reduces the distance between the best-matching unit's weights (and its surrounding neurons' weights) and the incoming input. The learning process is incremental: after the presentation of each input, the map's representation of the input (in particular, the representation of its best-matching unit) is updated in order to take into account the new incoming stimulus. At the end of the whole process, the SOM has learned to organize the stimuli in a topologically significant way: similar inputs (with respect to Euclidean distance) are mapped to nearby areas of the map, whereas inputs which are far apart from each other are mapped to distant areas of the map.

Once the SOM has learned to categorize, to assess category generalization, Gliozzi and Plunkett [33] define the map's disposition to consider a new stimulus $x$ as a member of a known category $C$ as a function of the distance of $x$ from the map's representation of $C$. Category generalization depends on the distance of the new stimulus from the category representation, compared to the maximal distance from that representation of all known instances of the category. This is captured by the following notion of relative distance (rd for short) [33]:

$$rd(x, C) = \frac{d(x, C)}{\mathit{maxd}_C} \qquad (1)$$

where $d(x, C)$ is the (minimal) Euclidean distance between $x$ and $C$'s category representation, and $\mathit{maxd}_C$, which expresses the precision of the category representation, is the (maximal) Euclidean distance between any known member of the category and the category representation.

By judging a new stimulus as belonging to a category through a comparison of the distance of the stimulus from the category representation with the precision of the category representation, Gliozzi and Plunkett demonstrate [33] that the Numerosity and Variability effects of category generalization, described by Tenenbaum and Griffiths [64] and usually explained with Bayesian tools, can be accommodated within a simple and psychologically plausible similarity-based account. Their notion of relative distance can as well be used as a basis for a logical semantics for SOMs.
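As a concrete reading of Equation (1), the following sketch (our own illustration, using NumPy; the category representation is taken here to be the set of weight vectors of the category's best-matching units, which is an assumption on how the map is stored) computes the relative distance of a stimulus from a category:

```python
import numpy as np

def dist_to_category(x, category_repr):
    """d(x, C): minimal Euclidean distance between stimulus x and the
    weight vectors making up C's representation on the map."""
    return min(np.linalg.norm(np.asarray(x) - u) for u in category_repr)

def relative_distance(x, category_repr, known_members):
    """Equation (1): rd(x, C) = d(x, C) / maxd_C, where maxd_C is the
    maximal distance of a known member of C from C's representation."""
    maxd = max(dist_to_category(m, category_repr) for m in known_members)
    return dist_to_category(x, category_repr) / maxd
```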

3.1 Relating self-organising maps and multi-preference models

Once the SOM has learned to categorize, we can regard the result of the categorization as a multipreference interpretation. Let $X$ be the set of input stimuli from the different categories $C_1, \ldots, C_k$ which have been considered during the learning process. For each category $C_i$, we let $\mathit{BMU}_{C_i}$ be the ensemble of best-matching units corresponding to the input stimuli of category $C_i$, i.e., $\mathit{BMU}_{C_i} = \{\mathit{BMU}(x) \mid x \in X \text{ and } x \text{ belongs to category } C_i\}$. We regard the learned categories $C_1, \ldots, C_k$ as being the concept names (atomic concepts) in the description logic, and we let them constitute our set of distinguished concepts $\mathcal{C} = \{C_1, \ldots, C_k\}$.

To construct a multi-preference interpretation, first we fix the domain $\Delta$ to be the space of all possible stimuli; then, for each category (concept) $C_i$, we define a preference relation $<_{C_i}$, exploiting the notion of relative distance of a stimulus from the map's representation of $C_i$. Finally, we define the interpretation of concepts.

Let $\Delta$ be the set of all the possible stimuli, including all input stimuli ($X \subseteq \Delta$) as well as the best-matching units of the input stimuli (i.e., $\mathit{BMU}(x) \in \Delta$, for all $x \in X$). For simplicity, we will assume the space of input stimuli to be finite.

Once the SOM has learned to categorize, the notion of relative distance of a stimulus from a category can be used to build a binary preference relation among the stimuli in $\Delta$ w.r.t. a category $C_i$ as follows: for all $x, x' \in \Delta$,

$$x <_{C_i} x' \;\text{ iff }\; rd(x, C_i) < rd(x', C_i) \qquad (2)$$

Each preference relation $<_{C_i}$ is a strict partial order relation on $\Delta$. The relation is also well-founded, as we have assumed $\Delta$ to be finite.

We exploit this notion of preference to define a concept-wise multipreference interpretation associated with the SOM. We restrict the DL language to the fragment of $\mathcal{ALC}$ (plus typicality) not admitting roles.

Definition 2 (multipreference-model of a SOM)

The multipreference-model of the SOM is a multipreference interpretation $\mathcal{M}^s = \langle \Delta, <_{C_1}, \ldots, <_{C_k}, \cdot^I \rangle$ such that:

  • $\Delta$ is the set of all the possible stimuli, as introduced above;

  • for each category $C_i$, $<_{C_i}$ is the preference relation defined by equivalence (2);

  • the interpretation function $\cdot^I$ is defined for concept names (i.e., categories) $C_i$ as:

    $C_i^I = \{x \in \Delta \mid rd(x, C_i) \leq rd_{max}^{C_i}\}$,

    where $rd_{max}^{C_i}$ is the maximal relative distance of an input stimulus of category $C_i$ from category $C_i$, that is, $rd_{max}^{C_i} = \max\{rd(x, C_i) \mid x \in X \text{ and } x \text{ belongs to category } C_i\}$. The interpretation function is extended to complex concepts in the fragment of $\mathcal{ALC}$ according to Definition 1.

Informally, we interpret as $C_i$-elements those stimuli whose relative distance from category $C_i$ is not larger than the relative distance of any input exemplar belonging to category $C_i$. Given $\mathcal{M}^s$, we can identify the most typical $C_i$-elements wrt $<_{C_i}$ as the $C_i$-elements whose relative distance from category $C_i$ is minimal, i.e., the elements in $\min_{<_{C_i}}(C_i^I)$. Observe that the best-matching unit $\mathit{BMU}(x)$ of an input stimulus $x$ of category $C_i$ is an element of $C_i^I$. As $rd(u, C_i) = 0$ for $u \in \mathit{BMU}_{C_i}$, the elements of $\mathit{BMU}_{C_i}$ are among the most typical $C_i$-elements.
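A minimal sketch of Definition 2 (ours, reusing relative_distance from the previous snippet; stimuli are assumed to be hashable, e.g., tuples of floats): the extension of a category collects the stimuli within the maximal relative distance of its input exemplars, and the most typical elements are those with minimal relative distance:

```python
def category_extension(domain, category_repr, known_members):
    """C_i^I: stimuli whose relative distance from C_i does not exceed
    that of any input exemplar of C_i (Definition 2)."""
    rd_max = max(relative_distance(m, category_repr, known_members)
                 for m in known_members)
    return {x for x in domain
            if relative_distance(x, category_repr, known_members) <= rd_max}

def most_typical(domain, category_repr, known_members):
    """min_{<_{C_i}}(C_i^I): the C_i-elements with minimal relative
    distance (in particular, the best-matching units, with rd = 0)."""
    ext = category_extension(domain, category_repr, known_members)
    best = min(relative_distance(x, category_repr, known_members) for x in ext)
    return {x for x in ext
            if relative_distance(x, category_repr, known_members) == best}
```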

3.2 Evaluation of concept inclusions by model checking

We have defined a multipreference interpretation $\mathcal{M}^s$ in which, over the domain $\Delta$ of the possible stimuli, we are able to identify, for each category $C_i$, the $C_i$-elements as well as the most typical $C_i$-elements wrt $<_{C_i}$. We can exploit $\mathcal{M}^s$ to verify which inclusions are satisfied by the SOM by model checking, i.e., by checking the satisfiability of inclusions over the model $\mathcal{M}^s$. This can be done both for strict concept inclusions of the form $C_i \sqsubseteq C_j$ and for defeasible inclusions of the form ${\bf T}(C_i) \sqsubseteq C_j$, where $C_i$ and $C_j$ are concept names (i.e., categories), by exploiting a notion of maximal relative distance of a category $C_i$ from a category $C_j$.

We refer to [27, 28] for the details. Let us observe that checking the satisfiability of strict or defeasible inclusions over the SOM may be non-trivial, depending on the number of input stimuli that have been considered in the learning phase, although, from a logical point of view, this is just model checking. Gliozzi and Plunkett have considered self-organising maps that are able to learn from a limited number of input stimuli, although this is not generally true for all self-organising maps [33].
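By way of illustration (our sketch, under the assumptions of the previous snippets, with extensions and typical elements computed as sets of stimuli), checking a strict or a defeasible inclusion over $\mathcal{M}^s$ then amounts to simple set-containment tests:

```python
def satisfies_strict(ci_elements, cj_elements):
    """Model checking C_i ⊑ C_j in M^s: every C_i-element is a C_j-element."""
    return ci_elements <= cj_elements

def satisfies_defeasible(typical_ci_elements, cj_elements):
    """Model checking T(C_i) ⊑ C_j: every most-typical C_i-element
    is a C_j-element."""
    return typical_ci_elements <= cj_elements
```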

Note also that the multipreference interpretation $\mathcal{M}^s$ introduced in Definition 2 allows the set of $C_i$-elements to be determined for all learned categories, and the most typical $C_i$-elements to be defined by exploiting the preference relation $<_{C_i}$. Although we are not able to define, for instance, the most typical $C_i \sqcap C_j$-elements just using the single preferences, starting from $\mathcal{M}^s$ we can construct a concept-wise multipreference interpretation that combines the preference relations $<_{C_1}, \ldots, <_{C_k}$ into a global preference relation $<$, and provides an interpretation to all typicality concepts ${\bf T}(C)$ as $({\bf T}(C))^I = \min_<(C^I)$. This interpretation can be constructed from $\mathcal{M}^s$ according to the definition of the global preference $<$ in Section 2.

As an alternative to a multipreference semantics for SOMs, a fuzzy semantics has also been considered [28], based on fuzzy Description Logics [56], as well as a related probabilistic account exploiting Zadeh's probability of fuzzy events [69].

Our work has focused on the multipreference interpretation of a self-organising map after the learning phase. However, the state of the SOM during the learning phase can as well be represented as a multipreference model (in the same way). During training, the current state of the SOM corresponds to a model representing the beliefs about the input stimuli considered so far (beliefs concerning the category of the stimuli). One can regard the category generalization process as a model building process and, in a way, as a belief change process. For future work, it would be interesting to study the properties of this notion of change and compare it with the notions of change studied in the literature [20, 21, 41, 40].

4 A multi-preferential interpretation of a deep neural network

Let us first recall from [35] the model of a neuron as an information-processing unit in an (artificial) neural network. The basic elements are the following:

  • a set of synapses, or connecting links, each one characterized by a weight. We let $x_j$ be the signal at the input of synapse $j$ connected to neuron $k$, and $w_{kj}$ the related synaptic weight;

  • an adder for summing the input signals to the neuron, weighted by the respective synaptic weights: $\sum_{j=1}^{n} w_{kj} x_j$;

  • an activation function $\varphi$ for limiting the amplitude of the output of the neuron (typically, to the interval $[0,1]$ or $[-1,+1]$).

The sigmoid, threshold and hyperbolic-tangent functions are examples of activation functions. A neuron $k$ can be described by the following pair of equations: $u_k = \sum_{j=1}^{n} w_{kj} x_j$ and $y_k = \varphi(u_k + b_k)$, where $x_1, \ldots, x_n$ are the input signals, $w_{k1}, \ldots, w_{kn}$ are the synaptic weights of neuron $k$, $b_k$ is the bias, $\varphi$ the activation function, and $y_k$ is the output signal of neuron $k$. By adding a new synapse with input $x_0 = +1$ and synaptic weight $w_{k0} = b_k$, one can write: $v_k = \sum_{j=0}^{n} w_{kj} x_j$ and $y_k = \varphi(v_k)$, where $v_k$ is called the induced local field of the neuron. The neuron can be represented as a directed graph, where the input signals $x_1, \ldots, x_n$ and the output signal $y_k$ of neuron $k$ are nodes of the graph. An edge from $x_j$ to $y_k$, labelled $w_{kj}$, means that $x_j$ is an input signal of neuron $k$ with synaptic weight $w_{kj}$.
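The pair of equations above can be transcribed directly; in the following sketch (ours), the sigmoid is chosen as an example activation function:

```python
import math

def sigmoid(v):
    """An example activation function, with values in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-v))

def neuron_output(weights, inputs, bias, activation=sigmoid):
    """y_k = phi(u_k + b_k), with u_k = sum_j w_kj * x_j; equivalently,
    y_k = phi(v_k) with induced local field v_k = u_k + b_k."""
    u = sum(w * x for w, x in zip(weights, inputs))
    return activation(u + bias)

print(neuron_output([0.5, -1.0], [1.0, 2.0], 0.1))  # sigmoid(-1.4) ~ 0.198
```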

Neural network models are classified by their synaptic connection topology. In a feedforward network the architectural graph is acyclic, while in a recurrent network it contains cycles. In a feedforward network, neurons are organized in layers. In a single-layer network there is an input layer of source nodes and an output layer of computation nodes. In a multilayer feedforward network there are one or more hidden layers, whose computation nodes are called hidden neurons (or hidden units). The source nodes in the input layer supply the activation pattern (input vector) providing the input signals for the first-layer computation units. In turn, the output signals of the first-layer computation units provide the input signals for the second-layer computation units, and so on, up to the final output layer of the network, which provides the overall response of the network to the activation pattern. In a recurrent network, at least one feedback connection exists.

4.1 A (two-valued) multipreference interpretation of multilayer perceptrons

In the following, we do not put restrictions on the topology of the network $\mathcal{N}$, and we consider the network after training, when the synaptic weights have been learned. We associate a concept name $C_i$ to any unit $i$ in $\mathcal{N}$ (including input units and hidden units) and construct a multi-preference interpretation over a (finite) domain $\Delta$ of input stimuli, the input vectors considered so far, for training and generalization. In case the network is not feedforward, we assume that, for each input vector $v$ in $\Delta$, the network reaches a stationary state [35], in which $y_i(v)$ is the activity level of unit $i$. In essence, we are not considering the transient behavior of the network, but rather its behavior at the stationary states.

Let $\mathcal{C}$ be a subset of the concepts associated to the units of $\mathcal{N}$, namely the concepts associated to the units we are focusing on (e.g., $\mathcal{C}$ might contain the concepts associated to the set of output units, or to all units). We associate to $\mathcal{N}$ and $\Delta$ a (two-valued) concept-wise multipreference interpretation over the boolean fragment of $\mathcal{ALC}$ (with no roles or individual names).

Definition 3

The cw-interpretation $\mathcal{M}^{\Delta}_{\mathcal{N}} = \langle \Delta, <_{C_1}, \ldots, <_{C_k}, \cdot^I \rangle$ over $\Delta$ for network $\mathcal{N}$ wrt $\mathcal{C}$ is a cw-interpretation where:

  • the interpretation function $\cdot^I$ is defined for named concepts $C_i \in \mathcal{C}$ as: $x \in C_i^I$ if $y_i(x) \neq 0$, and $x \not\in C_i^I$ if $y_i(x) = 0$;

  • for $C_i \in \mathcal{C}$, the relation $<_{C_i}$ is defined for $x, x' \in \Delta$ as: $x <_{C_i} x'$ iff $y_i(x) > y_i(x')$, where $y_i(x)$ is the output signal of unit $i$ for input vector $x$.

Each relation $<_{C_i}$ is a strict partial order, and the associated non-strict relation $\leq_{C_i}$ and $\min_{<_{C_i}}$ are defined as usual. In particular, $({\bf T}(C_i))^I = \min_{<_{C_i}}(C_i^I)$, for $C_i \in \mathcal{C}$. Clearly, the boundary between the domain elements which are in $C_i^I$ and those which are not could be defined differently, e.g., by letting $x \in C_i^I$ if $y_i(x)$ is above some threshold, and $x \not\in C_i^I$ otherwise. This would only require a minor change in the definition above.

This model provides a multipreference interpretation of the network $\mathcal{N}$, based on the input stimuli considered in $\Delta$. For instance, when the neural network is used for categorization and a single output neuron is associated to each category, each concept $C_i$ associated to an output unit corresponds to a learned category. If $C_i \in \mathcal{C}$, the preference relation $<_{C_i}$ determines the relative typicality of input stimuli wrt category $C_i$. This allows typicality properties concerning categories, such as ${\bf T}(C_i) \sqsubseteq D$ (where $D$ is a boolean concept built from the named concepts in $\mathcal{C}$), to be verified by model checking on the model $\mathcal{M}^{\Delta}_{\mathcal{N}}$.
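A sketch of this construction (ours; `outputs_i` stands for the recorded map $x \mapsto y_i(x)$ over the stimuli in $\Delta$, which is an assumption on how the network's behavior is stored):

```python
def concept_extension(outputs_i):
    """C_i^I: the stimuli on which unit i has non-zero activity."""
    return {x for x, y in outputs_i.items() if y != 0}

def typical_elements(outputs_i):
    """(T(C_i))^I: the C_i-elements minimal wrt <_{C_i}, i.e., those
    maximizing the activity y_i(x) of unit i."""
    ext = concept_extension(outputs_i)
    if not ext:
        return set()
    top = max(outputs_i[x] for x in ext)
    return {x for x in ext if outputs_i[x] == top}

def check_typicality_inclusion(outputs_i, d_extension):
    """Model checking T(C_i) ⊑ D: the typical C_i-elements belong to D^I,
    where d_extension is the (pre-computed) extension of the boolean
    concept D over the same domain."""
    return typical_elements(outputs_i) <= d_extension
```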

Evaluating properties involving hidden units might also be of interest, although their meaning is usually unknown. In the well-known Hinton family example [36], one may want to verify whether, normally, given an old Person 1 and the relationship Husband, Person 2 would also be old, i.e., whether ${\bf T}(\mathit{Old}_{P1} \sqcap \mathit{Husband}) \sqsubseteq \mathit{Old}_{P2}$ is satisfied. Here, the concept $\mathit{Old}_{P1}$ (resp., $\mathit{Old}_{P2}$) is associated to a (known, in this case) hidden unit for Person 1 (resp., Person 2), while $\mathit{Husband}$ is associated to an input unit.

4.2 From a two-valued to a fuzzy preferential interpretation of multilayer perceptrons

The definition of a fuzzy model of a neural network $\mathcal{N}$, under the same assumptions as in the previous section, is straightforward. In a fuzzy DL interpretation [56], concepts can be interpreted as fuzzy sets, and the fuzzy interpretation function assigns to each concept $C$ a function $C^I : \Delta \rightarrow [0,1]$. For a domain element $x \in \Delta$, $C^I(x)$ represents the degree of membership of $x$ in concept $C$.

Let $\mathcal{S}$ be the set containing a concept name $C_i$ for each unit $i$ in $\mathcal{N}$, including hidden units. Let us restrict to the boolean fragment of $\mathcal{ALC}$ with no individual names. A fuzzy interpretation $\mathcal{M}^f_{\mathcal{N}} = \langle \Delta, \cdot^I \rangle$ for $\mathcal{N}$ [31] is defined as follows:

  • $\Delta$ is a (finite) set of input stimuli;

  • the interpretation function $\cdot^I$ is defined for named concepts $C_i \in \mathcal{S}$ as: $C_i^I(x) = y_i(x)$, for all $x \in \Delta$, where $y_i(x)$ is the output signal of neuron $i$ for input vector $x$.

The verification that a fuzzy axiom is satisfied in the model $\mathcal{M}^f_{\mathcal{N}}$ can be done based on satisfiability in fuzzy DLs, according to the choice of the t-norm and implication function. It requires $y_i(x)$ to be recorded for all $x \in \Delta$ and all units $i$. Of course, one could restrict $\mathcal{S}$ to the concepts associated to the input and output units in $\mathcal{N}$, so as to capture the input/output behavior of the network.

The fuzzy interpretation above induces a preference relation $<_{C_i}$ over the domain $\Delta$: for all $x, x' \in \Delta$, $x <_{C_i} x'$ iff $C_i^I(x) > C_i^I(x')$. Based on this idea, a fuzzy multipreference interpretation over $\Delta$ can be associated to the network $\mathcal{N}$, starting from $\mathcal{M}^f_{\mathcal{N}}$. In a fuzzy multipreference interpretation, a typicality concept ${\bf T}(C_i)$ can be interpreted as a crisp concept, having value $1$ for the minimal $C_i$-elements in the domain with respect to the preference relation $<_{C_i}$, and value $0$ otherwise. The relation $<_{C_i}$ is well-founded if we restrict to finite models (as we do), or to witnessed models, as usual in fuzzy DLs [56].
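For instance, under the Gödel implication (one standard choice of implication function in fuzzy DLs; the sketch below is ours), the degree of a fuzzy inclusion $C \sqsubseteq D$ in $\mathcal{M}^f_{\mathcal{N}}$ is the infimum, over the finite domain, of the implication applied pointwise to the membership degrees:

```python
def goedel_implication(a, b):
    """Goedel implication: 1 if a <= b, and b otherwise."""
    return 1.0 if a <= b else b

def inclusion_degree(c_membership, d_membership, domain,
                     impl=goedel_implication):
    """Degree of C ⊑ D in the model: inf over x of impl(C^I(x), D^I(x)),
    where the membership functions record the unit outputs y_i(x)."""
    return min(impl(c_membership[x], d_membership[x]) for x in domain)

def induced_preference(c_membership, x1, x2):
    """x1 <_C x2 iff C^I(x1) > C^I(x2): higher membership, higher typicality."""
    return c_membership[x1] > c_membership[x2]
```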

5 Multilayer perceptrons as weighted conditional knowledge bases

The three interpretations considered above for Multilayer Perceptrons describe the input-output behavior of the network, and allow for the verification of properties by model checking. The last one is, in essence, a combination of the first two, and can be proved to be a model of the neural network when the network is regarded as a weighted conditional knowledge base.

In this section, we recall the notion of a weighted conditional knowledge base from [31], and we describe how a weighted conditional knowledge base $K^{\mathcal{N}}$ can be associated to a deep network $\mathcal{N}$. We give some hints about its two-valued and fuzzy multipreference semantics, and we refer to [31] for a detailed description.

5.1 Weighted conditional knowledge bases

Weighted knowledge bases are knowledge bases in which defeasible or typicality inclusions of the form ${\bf T}(C) \sqsubseteq D$ are given a positive or negative weight (a real number).

A weighted knowledge base $K$, over a set $\mathcal{C} = \{C_1, \ldots, C_k\}$ of distinguished concepts, is a tuple $\langle \mathcal{T}_f, \mathcal{D}_{C_1}, \ldots, \mathcal{D}_{C_k}, \mathcal{A}_f \rangle$, where $\mathcal{T}_f$ is a set of fuzzy inclusion axioms, $\mathcal{A}_f$ is a set of fuzzy assertions and each $\mathcal{D}_{C_i}$ is a set of weighted typicality inclusions $({\bf T}(C_i) \sqsubseteq D, w)$, where each inclusion ${\bf T}(C_i) \sqsubseteq D$ has a weight $w$, a real number. The concepts $C_i$ occurring on the l.h.s. of some typicality inclusion are the distinguished concepts. Arbitrary inclusions and assertions may belong to $\mathcal{T}_f$ and $\mathcal{A}_f$.

Example 1

Consider the weighted knowledge base $K = \langle \mathcal{T}_f, \mathcal{D}_{\mathit{Bird}}, \mathcal{D}_{\mathit{Penguin}}, \mathcal{A}_f \rangle$, over the set of distinguished concepts $\mathcal{C} = \{\mathit{Bird}, \mathit{Penguin}\}$, with empty ABox and with $\mathcal{T}_f$ containing the inclusions $\mathit{Penguin} \sqsubseteq \mathit{Bird}$ and $\mathit{Black} \sqsubseteq \neg \mathit{Grey}$. The weighted TBox $\mathcal{D}_{\mathit{Bird}}$ contains the following weighted defeasible inclusions:

$(d_1)$ ${\bf T}(\mathit{Bird}) \sqsubseteq \mathit{Fly}$,   +20

$(d_2)$ ${\bf T}(\mathit{Bird}) \sqsubseteq \exists \mathit{has\_Wings}.\top$,   +50

$(d_3)$ ${\bf T}(\mathit{Bird}) \sqsubseteq \exists \mathit{has\_Feathers}.\top$,   +50;

$\mathcal{D}_{\mathit{Penguin}}$ contains the defeasible inclusions:

$(d_4)$ ${\bf T}(\mathit{Penguin}) \sqsubseteq \mathit{Fly}$,   -70

$(d_5)$ ${\bf T}(\mathit{Penguin}) \sqsubseteq \mathit{Black}$,   +50

$(d_6)$ ${\bf T}(\mathit{Penguin}) \sqsubseteq \mathit{Grey}$,   +10.

The meaning is that a bird normally has wings, has feathers and flies, but having wings and having feathers (both with weight 50) are more plausible properties of a bird than flying (weight 20), although flying is still regarded as plausible. For a penguin, flying is not plausible (inclusion $(d_4)$ has a negative weight, -70), while being black and being grey are plausible properties of prototypical penguins; $(d_5)$ and $(d_6)$ have both a positive weight (50 and 10, respectively), as for a penguin being black is more plausible than being grey.

A two-valued semantics for weighted DL knowledge bases has been defined by developing a semantic closure construction in the same spirit as Lehmann's lexicographic closure [53], but more closely related to Kern-Isberner's semantics of c-representations [43, 45]. In c-representations, both the sum of the weights of the verified conditionals and the sum of the penalties of the falsified conditionals are considered. Here, conditionals have a single (positive or negative) weight, but negative weights can be interpreted as penalties. We consider a concept-wise construction, as we want to associate different (ranked) preferences to the different concepts. For an element $x$ in the domain $\Delta$ and a concept $C_i \in \mathcal{C}$, the weight $W_i(x)$ of $x$ wrt $C_i$ is defined as the sum of the weights of the typicality inclusions in $\mathcal{D}_{C_i}$ verified by $x$ (and is $-\infty$ when $x$ is not an instance of $C_i$). From this notion of weight of an element wrt a concept, the preference relation $<_{C_i}$ can be defined as follows: for $x, y \in \Delta$, $x <_{C_i} y$ iff $W_i(x) > W_i(y)$. The higher the weight of $x$ wrt $C_i$, the higher its typicality relative to $C_i$. This closure construction allows for the definition of concept-wise multipreference interpretations as in Section 2.
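A small worked example (ours, using the weights of Example 1): an element is represented here by the set of concepts it is an instance of, and its weight wrt Penguin is the sum of the weights of the verified inclusions in $\mathcal{D}_{\mathit{Penguin}}$; the element names are hypothetical:

```python
NEG_INF = float("-inf")

def weight(instance_concepts, concept, weighted_inclusions):
    """W_i(x): sum of the weights of the typicality inclusions
    T(C_i) ⊑ D_h verified by x; -inf if x is not a C_i-instance."""
    if concept not in instance_concepts:
        return NEG_INF
    return sum(w for d, w in weighted_inclusions if d in instance_concepts)

d_penguin = [("Fly", -70), ("Black", +50), ("Grey", +10)]
opus = {"Penguin", "Bird", "Black"}          # a black, non-flying penguin
tweety = {"Penguin", "Bird", "Fly", "Grey"}  # an unusual flying grey penguin
w_opus = weight(opus, "Penguin", d_penguin)      # +50
w_tweety = weight(tweety, "Penguin", d_penguin)  # -70 + 10 = -60
print(w_opus > w_tweety)  # True: opus <_Penguin tweety (opus more typical)
```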

A similar construction has been adopted in the fuzzy case. Rather than summing the weights of the typicality inclusions verified by $x$, $W_i(x)$ is defined by summing the products $w_h \cdot D_h^I(x)$, for all weighted typicality inclusions $({\bf T}(C_i) \sqsubseteq D_h, w_h)$ in $\mathcal{D}_{C_i}$, thus considering the degree of membership of $x$ in each $D_h$ (a value in the interval $[0,1]$). Furthermore, for fuzzy multipreference interpretations, a condition is needed to enforce the coherence of the values $C_i^I(x)$, defining the degree of membership of a domain element $x$ in a concept $C_i$ in a fuzzy interpretation $I$, with the weights $W_i(x)$, which are computed from the knowledge base (given $I$). The requirement that, for all $x, y \in \Delta$, $C_i^I(x) \leq C_i^I(y)$ iff $W_i(x) \leq W_i(y)$ leads to the definition of coherent fuzzy multipreference models (cf-models) of a weighted conditional knowledge base. We refer to [31] for details.

5.2 Mapping multilayer perceptrons to conditional knowledge bases

Let us now consider how a multilayer perceptron can be mapped to a weighted conditional knowledge base. For each unit $k$, we consider all the units $j_1, \ldots, j_m$ whose output signals are the input signals of unit $k$, with synaptic weights $w_{k j_1}, \ldots, w_{k j_m}$. Let $C_k$ be the concept name associated to unit $k$ and $C_{j_1}, \ldots, C_{j_m}$ the concept names associated to units $j_1, \ldots, j_m$, respectively. For each unit $k$, the following set of typicality inclusions is defined, with their associated weights:

${\bf T}(C_k) \sqsubseteq C_{j_1}$ with weight $w_{k j_1}$,
$\ldots$
${\bf T}(C_k) \sqsubseteq C_{j_m}$ with weight $w_{k j_m}$.

Given $\mathcal{C} \subseteq \mathcal{S}$, the knowledge base extracted from the network $\mathcal{N}$ is defined as the tuple $K^{\mathcal{N}} = \langle \mathcal{T}_f, \mathcal{D}_{C_1}, \ldots, \mathcal{D}_{C_k}, \mathcal{A}_f \rangle$, where $\mathcal{T}_f = \mathcal{A}_f = \emptyset$ and, for each $C_i \in \mathcal{C}$, $\mathcal{D}_{C_i}$ contains the set of weighted typicality inclusions associated to neuron $i$ (as defined above). $K^{\mathcal{N}}$ is a weighted knowledge base over the set of distinguished concepts $\mathcal{C}$. Given a network $\mathcal{N}$, it can be proven that the fuzzy multipreference interpretation $\mathcal{M}^f_{\mathcal{N}}$ (see Section 4.2) is a cf-model of the knowledge base $K^{\mathcal{N}}$, provided the activation functions of all units are monotonically increasing and take values in the interval $[0,1]$.

We refer to [30] for the proof. Under the given conditions on the activation functions, which hold, for instance, for the sigmoid activation function, for any choice of $\mathcal{C}$ and for any choice of the domain $\Delta$ of input stimuli (provided they lead to a stationary state of $\mathcal{N}$), the fuzzy multipreference interpretation $\mathcal{M}^f_{\mathcal{N}}$ is a coherent fuzzy multipreference model of the defeasible knowledge base $K^{\mathcal{N}}$.
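The mapping of Section 5.2 is mechanical; a sketch (ours) that reads the incoming synaptic weights of each unit and emits the corresponding weighted typicality inclusions:

```python
def extract_weighted_kb(incoming):
    """Map each unit k, with incoming synaptic weights w_kj from units j,
    to the weighted typicality inclusions (T(C_k) ⊑ C_j, w_kj).
    `incoming` maps a unit k to a dict {j: w_kj}."""
    return {k: [(f"T(C{k}) ⊑ C{j}", w) for j, w in synapses.items()]
            for k, synapses in incoming.items()}

# Toy network: unit 3 receives input from units 1 and 2.
print(extract_weighted_kb({3: {1: 0.8, 2: -1.2}}))
# {3: [('T(C3) ⊑ C1', 0.8), ('T(C3) ⊑ C2', -1.2)]}
```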

This result can be further generalized by weakening the notion of coherence of a fuzzy multipreference interpretation to the notion of faithfulness considered in [23] (called weak consistency in the technical report [30]). It has been proven that, also in the fuzzy case, the concept-wise multipreference semantics has interesting properties and satisfies most of the KLM postulates, depending on their reformulation and on the fuzzy combination functions.

6 Conclusions

We have explored the relationships between a concept-wise multipreference semantics and two very different neural network models, Self-Organising Maps and Multilayer Perceptrons, showing that a multi-preferential semantics can be used to provide a logical model of the network behavior after training. Such a model can be used to learn or to validate conditional knowledge from the empirical data used for training and generalization, by model checking of logical properties. A two-valued KLM-style preferential interpretation with multiple preferences and a fuzzy semantics have been considered, based on the idea of associating preference relations to categories (in the case of SOMs) or to neurons (for Multilayer Perceptrons). Given the diversity of the two models, we would expect that a similar approach might be extended to other neural network models and learning approaches.

Much work has been devoted, in recent years, to the combination of neural networks and symbolic reasoning [15, 17, 16], leading to the definition of new computational models, such as Graph Neural Networks [50], Logic Tensor Networks [62], Recursive Reasoning Networks [38], and neural-symbolic stream fusion [51], and to extensions of logic programming languages with neural predicates [57, 68]. Among the earliest systems combining logical reasoning and neural learning are the KBANN [65] and CLIP [18] systems, and Penalty Logic [61], a non-monotonic reasoning formalism used to establish a correspondence with symmetric connectionist networks. The relationships between normal logic programs and connectionist networks have been investigated by Garcez et al. [18, 15] and by Hitzler et al. [37].

The correspondence between neural network models and fuzzy systems was first investigated by Bart Kosko in his seminal work [48]. In his view, "at each instant the n-vector of neuronal outputs defines a fuzzy unit or a fit vector. Each fit value indicates the degree to which the neuron or element belongs to the n-dimensional fuzzy set." Our fuzzy interpretation of a multilayer perceptron regards, instead, each concept (representing a single neuron) as a fuzzy set. This is the usual way of viewing concepts in fuzzy DLs [63, 55, 5], and we have used fuzzy concepts within a multipreference semantics based on a semantic closure construction, in the line of Lehmann's semantics for the lexicographic closure [53] and Kern-Isberner's c-representations [43, 45]. The multipreference semantics we have introduced for weighted conditionals appears to be a relative of c-representations, which generate the world ranks as a sum of impacts of falsified conditionals [43, 44]. We have further considered a semantics with multiple preferences, in order to make it concept-wise: each distinguished concept $C_i$ has its own set of (weighted) typicality inclusions, and an associated preference relation $<_{C_i}$. This allows a preference relation to be associated to each category (e.g., in the preferential interpretation of SOMs) or neuron (in a deep network). Related semantics with multiple preferences have been proposed, starting from Brewka's framework of basic preference descriptions [7], based on different approaches: in system ARS, as a refinement of System Z, by Kern-Isberner and Ritterskamp [46], using techniques for handling preference fusion; in an extension of $\mathcal{ALC}$ with typicality by Fernandez Gil [22]; in a refinement of rational closure by Gliozzi [32]; by associating multiple preferences to roles, by Britz and Varzinczak [11, 9]; in ranked $\mathcal{EL}^{+}_{\bot}$ knowledge bases by Giordano and Theseider Dupré [29]; in the first-order logic setting by Delgrande and Rantsoudis [19]; and in the MP-closure [24].

For Multilayer Perceptrons, the logical semantics is based on the representation of a deep neural network as a conditional knowledge base, where conditional implications are associated to synaptic connections. That a conditional logic, belonging to a family of logics which are normally used for hypothetical and counterfactual reasoning, for common sense reasoning, and for reasoning with exceptions, can be used for capturing reasoning in a deep neural network model is rather surprising. It suggests that slow thinking and fast thinking [39] might be more related than expected.

Opening the black box and recognizing that multilayer perceptrons can be seen as sets of conditionals can be exploited as a possible basis for an integrated use of symbolic reasoning and neural networks (at least for this neural network model). While a neural network, once trained, is fast at classifying new stimuli (that is, it is able to do instance checking), all the other reasoning services, such as satisfiability, entailment and model checking, are missing. These capabilities would be needed for dealing with tasks combining empirical and symbolic knowledge, such as, for instance: proving whether the network satisfies some (strict or conditional) properties; learning the weights of a conditional knowledge base from empirical data and using the knowledge base for inference; combining defeasible inclusions extracted from a neural network with other defeasible or strict inclusions for inference.

To make these tasks possible, the development of proof methods for such logics is a preliminary step. In the two-valued case, multipreference entailment is decidable for weighted knowledge bases [31], and proof methods for reasoning with weighted conditional knowledge bases could, for instance, exploit Answer Set Programming (ASP) encodings of the concept-wise multipreference semantics, an approach already considered [29] to achieve defeasible reasoning from ranked knowledge bases with asprin [8]. In the fuzzy case, an open problem is whether the notion of fuzzy multipreference entailment is decidable (even for the small fragment of the language without roles), and under which choice of fuzzy logic combination functions. Undecidability results for fuzzy description logics with general inclusion axioms [3, 14, 6] motivate the investigation of decidable approximations of fuzzy multipreference entailment.

An interesting issue is whether the mapping of deep neural networks to weighted conditional knowledge bases can be extended to more complex neural network models, such as Graph neural networks [50], or whether different logical formalisms and semantics would be needed.

Another issue is whether the fuzzy-preferential interpretation of neural networks can be related to the probabilistic interpretation of neural networks based on statistical AI. This is an interesting question, as the fuzzy DL interpretations we have considered, where concepts are regarded as fuzzy sets, also suggest a probabilistic account based on Zadeh's probability of fuzzy events [69]. We refer to [28] for some results concerning a probabilistic interpretation of SOMs, and to [30] for a preliminary account for MLPs. A methodology for commonsense reasoning based on probabilistic conditional knowledge under the principle of maximum entropy (MaxEnt) has been developed by Kern-Isberner [42], starting from the propositional case. Wilhelm et al. [67] have recently shown how to calculate MaxEnt distributions in a first-order setting by using typed model counting and condensed iterative scaling, and have explored the connection to Markov Logic Networks for drawing inferences. A description logic with probabilistic conditionals has also been proposed [66], based on this methodology.

References

  • [1] A. Adadi and M. Berrada. Peeking inside the black-box: A survey on explainable artificial intelligence (XAI). IEEE Access, 6:52138–52160, 2018.
  • [2] A. Barredo Arrieta, N. Díaz Rodríguez, J. Del Ser, A. Bennetot, S. Tabik, A. Barbado, S. García, S. Gil-Lopez, D. Molina, R. Benjamins, R. Chatila, and F. Herrera. Explainable artificial intelligence (XAI): concepts, taxonomies, opportunities and challenges toward responsible AI. Inf. Fusion, 58:82–115, 2020.
  • [3] F. Baader and R. Peñaloza. Are fuzzy description logics with general concept inclusion axioms decidable? In FUZZ-IEEE 2011, IEEE International Conference on Fuzzy Systems, Taipei, Taiwan, 27-30 June, 2011, Proceedings, pages 1735–1742. IEEE, 2011.
  • [4] C. Beierle, T. Falke, S. Kutsch, and G. Kern-Isberner. System Z$^{FO}$: Default reasoning with system Z-like ranking functions for unary first-order conditional knowledge bases. Int. J. Approx. Reason., 90:120–143, 2017.
  • [5] F. Bobillo and U. Straccia. The fuzzy ontology reasoner fuzzydl. Knowl. Based Syst., 95:12–34, 2016.
  • [6] S. Borgwardt and R. Peñaloza. Undecidability of fuzzy description logics. In Gerhard Brewka, Thomas Eiter, and Sheila A. McIlraith, editors, Principles of Knowledge Representation and Reasoning: Proceedings of the Thirteenth International Conference, KR 2012, Rome, Italy, June 10-14, 2012. AAAI Press, 2012.
  • [7] G. Brewka. A rank based description language for qualitative preferences. In Proceedings of the 16th Eureopean Conference on Artificial Intelligence, ECAI’2004, Valencia, Spain, August 22-27, 2004, pages 303–307, 2004.
  • [8] G. Brewka, J. P. Delgrande, J. Romero, and T. Schaub. asprin: Customizing answer set preferences without a headache. In Proc. AAAI 2015, pages 1467–1474, 2015.
  • [9] A. Britz and I. Varzinczak. Contextual rational closure for defeasible ALC (extended abstract). In Proc. 32nd International Workshop on Description Logics, Oslo, Norway, June 18-21, 2019, 2019.
  • [10] K. Britz, J. Heidema, and T. Meyer. Semantic preferential subsumption. In G. Brewka and J. Lang, editors, KR 2008, pages 476–484, Sidney, Australia, September 2008. AAAI Press.
  • [11] K. Britz and I J. Varzinczak. Rationality and context in defeasible subsumption. In Proc. 10th Int. Symp. on Found. of Information and Knowledge Systems, FoIKS 2018, Budapest, May 14-18, 2018, pages 114–132, 2018.
  • [12] G. Casini, T. Meyer, I. J. Varzinczak, and K. Moodley. Nonmonotonic Reasoning in Description Logics: Rational Closure for the ABox. In DL 2013, volume 1014 of CEUR Workshop Proceedings, pages 600–615, 2013.
  • [13] G. Casini and U. Straccia. Rational Closure for Defeasible Description Logics. In T. Janhunen and I. Niemelä, editors, JELIA 2010, volume 6341 of LNCS, pages 77–90, Helsinki, Sept. 2010. Springer.
  • [14] M. Cerami and U. Straccia. On the undecidability of fuzzy description logics with GCIs with Łukasiewicz t-norm. CoRR, abs/1107.4212, 2011.
  • [15] A. S. d’Avila Garcez, K. Broda, and D. M. Gabbay. Symbolic knowledge extraction from trained neural networks: A sound approach. Artif. Intell., 125(1-2):155–207, 2001.
  • [16] A. S. d’Avila Garcez, M. Gori, L. C. Lamb, L. Serafini, M. Spranger, and S. N. Tran. Neural-symbolic computing: An effective methodology for principled integration of machine learning and reasoning. FLAP, 6(4):611–632, 2019.
  • [17] A. S. d’Avila Garcez, L. C. Lamb, and D. M. Gabbay. Neural-Symbolic Cognitive Reasoning. Cognitive Technologies. Springer, 2009.
  • [18] A. S. d’Avila Garcez and G. Zaverucha. The connectionist inductive learning and logic programming system. Appl. Intell., 11(1):59–77, 1999.
  • [19] J. Delgrande and C. Rantsoudis. A preference-based approach for representing defaults in first-order logic. In Proc. 18th Int. Workshop on Non-Monotonic Reasoning, NMR2020, September 12th - 14th, 2020.
  • [20] P. Gärdenfors. Knowledge in Flux. MIT Press, 1988.
  • [21] P. Gärdenfors and H. Rott. Belief revision. In D. M. Gabbay, C. J. Hogger, and J. A. Robinson, editors, Handbook of Logic in Artificial Intelligence and Logic Programming, volume 4, 1995.
  • [22] O. Fernandez Gil. On the Non-Monotonic Description Logic ALC+T. CoRR, abs/1404.6566, 2014.
  • [23] L. Giordano. On the KLM properties of a fuzzy DL with Typicality. CoRR, abs/2106.00390, 2021. To appear in ECSQARU 2021.
  • [24] L. Giordano and V. Gliozzi. A reconstruction of multipreference closure. Artif. Intell., 290, 2021.
  • [25] L. Giordano, V. Gliozzi, N. Olivetti, and G. L. Pozzato. Preferential Description Logics. In LPAR 2007, volume 4790 of LNAI, pages 257–272, Yerevan, Armenia, October 2007. Springer.
  • [26] L. Giordano, V. Gliozzi, N. Olivetti, and G. L. Pozzato. Semantic characterization of rational closure: From propositional logic to description logics. Artif. Intell., 226:1–33, 2015.
  • [27] L. Giordano, V. Gliozzi, and D. Theseider Dupré. On a plausible concept-wise multipreference semantics and its relations with self-organising maps. In F. Calimeri, S. Perri, and E. Zumpano, editors, CILC 2020, Rende, Italy, October 13-15, 2020, volume 2710 of CEUR, pages 127–140, 2020.
  • [28] L. Giordano, V. Gliozzi, and D. Theseider Dupré. A conditional, a fuzzy and a probabilistic interpretation of self-organising maps. CoRR, abs/2103.06854, 2021.
  • [29] L. Giordano and D. Theseider Dupré. An ASP approach for reasoning in a concept-aware multipreferential lightweight DL. Theory and Practice of Logic Programming, 20(5):751–766, 2020.
  • [30] L. Giordano and D. Theseider Dupré. Weighted defeasible knowledge bases and a multipreference semantics for a deep neural network model. CoRR, abs/2012.13421, 2020.
  • [31] L. Giordano and D. Theseider Dupré. Weighted defeasible knowledge bases and a multipreference semantics for a deep neural network model. In Proc17th European Conf. on Logics in AI, JELIA 2021, May 17-20, volume 12678 of LNCS, pages 225–242. Springer, 2021.
  • [32] V. Gliozzi. Reasoning about multiple aspects in rational closure for DLs. In Proc. AI*IA 2016 - XVth International Conference of the Italian Association for Artificial Intelligence, Genova, Italy, November 29 - December 1, 2016, pages 392–405, 2016.
  • [33] V. Gliozzi and K. Plunkett. Grounding bayesian accounts of numerosity and variability effects in a similarity-based framework: the case of self-organising maps. Journal of Cognitive Psychology, 31(5–6), 2019.
  • [34] R. Guidotti, A. Monreale, S. Ruggieri, F. Turini, F. Giannotti, and D. Pedreschi. A survey of methods for explaining black box models. ACM Comput. Surv., 51(5):93:1–93:42, 2019.
  • [35] S. Haykin. Neural Networks - A Comprehensive Foundation. Pearson, 1999.
  • [36] G. Hinton. Learning distributed representations of concepts. In Proceedings of the 8th Annual Conference of the Cognitive Science Society. Erlbaum, Hillsdale, NJ, 1986.
  • [37] P. Hitzler, S. Hölldobler, and A. Karel Seda. Logic programs and connectionist networks. J. Appl. Log., 2(3):245–272, 2004.
  • [38] P. Hohenecker and T. Lukasiewicz. Ontology reasoning with deep neural networks. J. Artif. Intell. Res., 68:503–540, 2020.
  • [39] D. Kahneman. Thinking, fast and slow. Farrar, Straus and Giroux, New York, 2011.
  • [40] H. Katsuno and K. Satoh. A unified view of consequence relation, belief revision and conditional logic. In IJCAI’91, pages 406–412, 1991.
  • [41] H. Katsuno and A. O. Mendelzon. A unified view of propositional knowledge base updates. In N. S. Sridharan, editor, Proceedings of the 11th International Joint Conference on Artificial Intelligence, Detroit, MI, USA, August 1989, pages 1413–1419. Morgan Kaufmann, 1989.
  • [42] G. Kern-Isberner. Characterizing the principle of minimum cross-entropy within a conditional-logical framework. Artif. Intell., 98(1-2):169–208, 1998.
  • [43] G. Kern-Isberner. Conditionals in Nonmonotonic Reasoning and Belief Revision - Considering Conditionals as Agents, volume 2087 of LNCS. Springer, 2001.
  • [44] G. Kern-Isberner. A thorough axiomatization of a principle of conditional preservation in belief revision. Ann. Math. Artif. Intell., 40(1-2):127–164, 2004.
  • [45] G. Kern-Isberner and C. Eichhorn. Structural inference from conditional knowledge bases. Stud Logica, 102(4):751–769, 2014.
  • [46] G. Kern-Isberner and M. Ritterskamp. Preference fusion for default reasoning beyond system Z. J. Autom. Reasoning, 45(1):3–19, 2010.
  • [47] T. Kohonen, M.R. Schroeder, and T.S. Huang, editors. Self-Organizing Maps, Third Edition. Springer Series in Information Sciences. Springer, 2001.
  • [48] Bart Kosko. Neural networks and fuzzy systems: a dynamical systems approach to machine intelligence. Prentice Hall, 1992.
  • [49] S. Kraus, D. Lehmann, and M. Magidor. Nonmonotonic reasoning, preferential models and cumulative logics. Artificial Intelligence, 44(1-2):167–207, 1990.
  • [50] L. C. Lamb, A. S. d’Avila Garcez, M. Gori, M. O. R. Prates, P. H. C. Avelar, and M. Y. Vardi. Graph neural networks meet neural-symbolic computing: A survey and perspective. In Christian Bessiere, editor, Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence, IJCAI 2020, pages 4877–4884. ijcai.org, 2020.
  • [51] D. Le-Phuoc, T. Eiter, and A. Le-Tuan. A scalable reasoning and learning approach for neural-symbolic stream fusion. In AAAI 2021, February 2-9, pages 4996–5005. AAAI Press, 2021.
  • [52] D. Lehmann and M. Magidor. What does a conditional knowledge base entail? Artificial Intelligence, 55(1):1–60, 1992.
  • [53] D. J. Lehmann. Another perspective on default reasoning. Ann. Math. Artif. Intell., 15(1):61–82, 1995.
  • [54] D. Lewis. Counterfactuals. Basil Blackwell Ltd, 1973.
  • [55] T. Lukasiewicz and U. Straccia. Managing uncertainty and vagueness in description logics for the semantic web. J. Web Semant., 6(4):291–308, 2008.
  • [56] T. Lukasiewicz and U. Straccia. Description logic programs under probabilistic uncertainty and fuzzy vagueness. Int. J. Approx. Reason., 50(6):837–853, 2009.
  • [57] R. Manhaeve, S. Dumancic, A. Kimmig, T. Demeester, and L. De Raedt. Deepproblog: Neural probabilistic logic programming. In NeurIPS 2018, 3-8 December 2018, Montréal, Canada, pages 3753–3763, 2018.
  • [58] D. Nute. Topics in conditional logic. Reidel, Dordrecht, 1980.
  • [59] J. Pearl. System Z: A natural ordering of defaults with tractable applications to nonmonotonic reasoning. In TARK’90, Pacific Grove, CA, USA, 1990, pages 121–135. Morgan Kaufmann.
  • [60] M. Pensel and A. Turhan. Reasoning in the defeasible description logic $\mathcal{EL}_{\bot}$ - computing standard inferences under rational and relevant semantics. Int. J. Approx. Reasoning, 103:28–70, 2018.
  • [61] G. Pinkas. Reasoning, nonmonotonicity and learning in connectionist networks that capture propositional knowledge. Artif. Intell., 77(2):203–247, 1995.
  • [62] L. Serafini and A. S. d’Avila Garcez. Learning and reasoning with logic tensor networks. In Proc. AI*IA 2016, Genova, Italy, November 29 - December 1, 2016, volume 10037 of LNCS, pages 334–348. Springer.
  • [63] U. Straccia. Towards a fuzzy description logic for the semantic web (preliminary report). In The Semantic Web: Research and Applications, Second European Semantic Web Conference, ESWC 2005, Heraklion, Crete, Greece, May 29 - June 1, 2005, Proceedings, volume 3532 of Lecture Notes in Computer Science, pages 167–181. Springer, 2005.
  • [64] J. B. Tenenbaum and T. L. Griffiths. Generalization, similarity, and Bayesian inference. Behavioral and Brain Sciences, 24:629–641, 2001.
  • [65] G. G. Towell and J. W. Shavlik. Knowledge-based artificial neural networks. Artif. Intell., 70(1-2):119–165, 1994.
  • [66] M. Wilhelm and G. Kern-Isberner. Maximum entropy calculations for the probabilistic description logic $\mathcal{ALC}^{ME}$. In Description Logic, Theory Combination, and All That, LNAI 11560, pages 588–609, 2019.
  • [67] M. Wilhelm, G. Kern-Isberner, M. Finthammer, and C. Beierle. Integrating typed model counting into first-order maximum entropy computations and the connection to markov logic networks. In Proc. 32-nd Int. Florida Artificial Intelligence Research Society Conference, Sarasota, Florida, USA, May 19-22 2019, pages 494–499. AAAI Press, 2019.
  • [68] Z. Yang, A. Ishay, and J. Lee. Neurasp: Embracing neural networks into answer set programming. In C. Bessiere, editor, Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence, IJCAI 2020, pages 1755–1762. ijcai.org, 2020.
  • [69] L. Zadeh. Probability measures of fuzzy events. J.Math.Anal.Appl, 23:421–427, 1968.