Learning Syllogism with Euler Neural-Networks

07/14/2020, by Tiansi Dong, et al.

Traditional neural networks represent everything as a vector and can approximate a subset of logical reasoning to a certain degree. Since basic logic relations are better represented by topological relations between regions, we propose a novel neural network that represents everything as a ball and is able to learn topological configurations as an Euler diagram. Hence the name Euler Neural-Network (ENN). The central vector of a ball is a vector that inherits the representation power of a traditional neural network. ENN distinguishes four spatial statuses between balls, namely, being disconnected, being partially overlapped, being part of, and being inverse part of. Within each status, ideal values are defined for efficient reasoning. A novel back-propagation algorithm with six Rectified Spatial Units (ReSU) can optimize an Euler diagram representing logical premises, from which logical conclusions can be deduced. In contrast to traditional neural networks, ENN can precisely represent all 24 different structures of syllogism. Two large datasets are created: one, extracted from WordNet-3.0, covers all types of syllogistic reasoning; the other covers all family relations extracted from DBpedia. Experimental results confirm the superior power of ENN in logical representation and reasoning. Datasets and source code are available upon request.

1 Introduction

Deep Learning [13] has solved a variety of difficult AI tasks, e.g., gaming [31], machine translation, object recognition, and robotics [19]. Deep neural networks use vectors to represent words, sentences, texts, images, and videos, and are able to simulate a number of functions of the associative memory (System 1 of mind) [17] and to approximate logical reasoning (System 2 of mind) [3]. On the other hand, regions are taken as primitives for commonsense spatial reasoning [6, 8, 9, 27, 40], and are also used for logical reasoning [33, 38] and cognitive modeling [32]. Using regions as inputs of neural networks dates back to [22] in terms of diameter-limited perceptrons, and has received continued interest (to increase the power of reasoning) in terms of Poincaré balls [25], spheres [20], N-balls [10], hyperbolic disks [34], boxes [28], or vectors plus a bounded distance [23]. However, these approaches still cannot allow logical forms to contain negation, and fail to reason with different structures of syllogism. Here, we propose a novel neural-network architecture, namely, the Euler Neural-Network (ENN), which takes high-dimensional balls as inputs and is able to learn topological configurations of balls as Euler diagrams for reasoning.

Advantages of ENN are as follows: (1) it uses central vectors of balls to inherit latent features from traditional neural networks; (2) it uses topological relations among balls to encode structures among entities; (3) it uses a map of spatial transitions as an innate structure within the network; (4) objective functions are dynamically optimized by the neighborhood transition from the input relation to the target relation; (5) ideal values within topological relations are parameterized not only to realize efficient reasoning but also to optimize visualization. Two large datasets are created for the reasoning of syllogism, and for the reasoning of family relations as an example of reasoning with part-whole relations [16]. In contrast to existing works, ENN can precisely represent all 24 structures of syllogism, and all family relations. Our experiments show that ENN reaches 100% accuracy in reasoning with syllogisms that have only three statements. In reasoning with family relations without gender information, the accuracy decreases slightly as the number of statements grows. By utilizing pre-trained latent feature vectors, ENN is able to reason with family relations with gender information.

The rest of the paper is structured as follows: Section 2 proposes the Euler Neural-Network, including its architecture, dynamic loss functions, and its relation to traditional neural networks. Section 3 presents our experiments on syllogism and on reasoning with family relations. Section 4 surveys related work. Section 5 concludes the paper and lists some on-going research.

2 Euler neural network

We propose a simple extension of classic neural networks that promotes vectors into balls and uses a topological transition map as its inner structure for spatial optimization. This enables the novel neural network to learn ball configurations as Euler diagrams for logical reasoning. Hence the name Euler Neural-Network (ENN), as illustrated in Figure 1. In ENN, an entity is represented as a vector that is interpreted as an n-dimensional ball a = (c_a, r_a) with central vector c_a and radius r_a. We define a ball as an open region: a point p is inside ball a if and only if ‖p − c_a‖ < r_a. ENN optimizes the relation between ball a and ball b towards a target relation. The default target relation can be a random choice between being part of and being inverse part of, so that ENN will optimize the relation between two input balls towards either of the two. This results in the equal relation: the two balls coincide, which can be measured by the similarity between their central vectors. That is, ENN degrades into a traditional neural network.

2.1 Spatial predicates

Figure 1: Euler Neural-Network having four spatial statuses, three neighborhood relations, and six Rectified Spatial Units (ReSU)

Given two balls a = (c_a, r_a) and b = (c_b, r_b), we define D(a, b) as a spatial predicate that returns true if and only if a disconnects from b. This can be measured by subtracting the sum of their radii from the distance between their central vectors: D(a, b) holds if and only if ‖c_a − c_b‖ − (r_a + r_b) ≥ 0.

We define PO(a, b) as a spatial predicate that returns true if and only if a is partially overlapped with b. This can be determined by checking whether the distance between their central vectors is greater than the difference between their radii and, meanwhile, less than the sum of their radii: PO(a, b) holds if and only if |r_a − r_b| < ‖c_a − c_b‖ < r_a + r_b.

Ball a is part of ball b, written P(a, b), if the distance between their central vectors plus the radius of a is less than or equal to the radius of b: ‖c_a − c_b‖ + r_a ≤ r_b. The inverse relation, ball b being part of ball a, is written P̄(a, b). The co-incide relation (or the equal relation) [40, 7, 27, 9] is included by both the P relation and the P̄ relation.

The four spatial predicates are jointly exhaustive (for any two balls a and b, D(a, b) ∨ PO(a, b) ∨ P(a, b) ∨ P̄(a, b) holds) and pairwise disjoint, with the one exception that P(a, b) and P̄(a, b) hold simultaneously when a and b coincide. Each spatial predicate asserts a spatial status between two input balls. Transitions among neighboring spatial statuses have been discussed in qualitative spatial reasoning, e.g., [27, 11, 9]. We adopt a lightweight topological transition map of open regions that consists of only three neighborhood relations: D ↔ PO, PO ↔ P, and PO ↔ P̄, as illustrated in Figure 1.
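To make the four statuses concrete, here is a minimal Python sketch of the predicates as defined above; the Ball class and the function names are our own illustration, not the paper's implementation.

```python
import numpy as np

class Ball:
    """An n-dimensional open ball with central vector c and radius r."""
    def __init__(self, c, r):
        self.c = np.asarray(c, dtype=float)
        self.r = float(r)

def dist(a: Ball, b: Ball) -> float:
    return float(np.linalg.norm(a.c - b.c))

def disconnected(a: Ball, b: Ball) -> bool:
    # open balls share no point when the centre distance is at least the sum of radii
    return dist(a, b) >= a.r + b.r

def partially_overlapped(a: Ball, b: Ball) -> bool:
    # centre distance strictly between |r_a - r_b| and r_a + r_b
    d = dist(a, b)
    return abs(a.r - b.r) < d < a.r + b.r

def part_of(a: Ball, b: Ball) -> bool:
    # every point of a lies inside b
    return dist(a, b) + a.r <= b.r

def inverse_part_of(a: Ball, b: Ball) -> bool:
    return part_of(b, a)
```

With these predicates, exactly one status holds for any pair of balls, except that part_of and inverse_part_of hold together when the two balls coincide.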

2.2 Rectified spatial unit (ReSU)

Rectified activation units have shown better performance than sigmoid or hyperbolic tangent units [12, 24, 21]. Six Rectified Spatial Units (ReSU) are designed to regulate transformations between neighboring spatial statuses. The ReSU for the transition from D to PO is defined as

ReSU_{D→PO}(a, b) = max(0, ‖c_a − c_b‖ − (r_a + r_b)).

ReSU_{D→PO}(a, b) is greater than zero if D(a, b). Decreasing its value pushes the relation between a and b towards the relation of being partially overlapped (PO). That is the '→' in ReSU_{D→PO}. From the relation of being partially overlapped, the relation between two balls can be transformed into either being disconnected or being part of (including the inverse relation). We define three Rectified Spatial Units ReSU_{PO→D}, ReSU_{PO→P}, and ReSU_{PO→P̄} as follows.

ReSU_{PO→D}(a, b) = max(0, (r_a + r_b) − ‖c_a − c_b‖) is greater than zero if PO(a, b). Decreasing its value pushes the relation between a and b towards being disconnected (D).

ReSU_{PO→P}(a, b) = max(0, ‖c_a − c_b‖ + r_a − r_b) is greater than zero if PO(a, b). Decreasing its value pushes the relation between a and b towards a being part of b (P).

ReSU_{PO→P̄}(a, b) = max(0, ‖c_a − c_b‖ + r_b − r_a) is greater than zero if PO(a, b). Decreasing its value pushes the relation between a and b towards b being part of a (P̄).

The relation of being part of can be transformed back into the relation of being partially overlapped. We define

ReSU_{P→PO}(a, b) = max(0, r_b − r_a − ‖c_a − c_b‖) and ReSU_{P̄→PO}(a, b) = max(0, r_a − r_b − ‖c_a − c_b‖).
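A minimal PyTorch sketch of the six ReSUs, assuming each unit is simply the rectified boundary measure described above (the exact functional forms in the paper may differ); decreasing the returned value by gradient descent moves the pair of balls along the named transition.

```python
import torch
import torch.nn.functional as F

def resu_losses(c_a, r_a, c_b, r_b):
    """Return the six Rectified Spatial Units for balls a = (c_a, r_a), b = (c_b, r_b).

    Each value is non-negative; decreasing it by gradient descent pushes the
    pair of balls towards the target status named in the key (invP = inverse part of).
    """
    d = torch.norm(c_a - c_b)
    return {
        "D->PO":    F.relu(d - (r_a + r_b)),   # > 0 while a and b are disconnected
        "PO->D":    F.relu((r_a + r_b) - d),   # > 0 while a and b still overlap
        "PO->P":    F.relu(d + r_a - r_b),     # > 0 while a is not yet part of b
        "PO->invP": F.relu(d + r_b - r_a),     # > 0 while b is not yet part of a
        "P->PO":    F.relu(r_b - r_a - d),     # > 0 while a is strictly inside b
        "invP->PO": F.relu(r_a - r_b - d),     # > 0 while b is strictly inside a
    }

# usage: take one gradient step that pushes ball a towards being part of ball b
c_a = torch.randn(3, requires_grad=True); r_a = torch.tensor(0.5, requires_grad=True)
c_b = torch.randn(3);                     r_b = torch.tensor(2.0)
loss = resu_losses(c_a, r_a, c_b, r_b)["PO->P"]
loss.backward()
```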

2.3 Ideal spatial values

In the normal back-propagation process, the optimization that transforms relation D into relation PO stops as soon as ReSU_{D→PO}(a, b) = 0. This makes the disconnected relation between a and b indistinguishable from the partially overlapping relation between them. Such an almost-overlapping configuration is ideal neither for reasoning nor for visualization. In natural categories, such as colors, line orientations, and numbers, people select a subset of members as "ideal types" [39] or "cognitive reference points" [29], for example, multiples of 10 as ideal numbers, and vertical, horizontal, and diagonal lines as ideal orientations. We therefore define ω_D ideal distance values for the disconnected relation, all strictly greater than r_a + r_b, and use the deviation of the current distance ‖c_a − c_b‖ from its nearest ideal value as the loss function for training the D status.

Fix the radii of a and b, and let d = ‖c_a − c_b‖. We define ω_PO ideal distances between the central points of two partially overlapped balls, taken from the open interval (r_b − r_a, r_a + r_b). Figure 2(a) illustrates three ideal partial overlapping relations; PO is the transition status between D and P (or P̄). In one extreme case, d = r_b − r_a, which means that ball a is a tangential part of ball b; in the other extreme case, d = r_a + r_b, which means that ball a is exactly disconnected from ball b. The deviation of d from its nearest ideal value is used as the loss function for training the PO status.

Figure 2: Ideal values in the spatial categories of being partially overlapped (a) and being part of (b). Panel (a) shows three reference relations of being partially overlapped; panel (b) shows three reference relations of being part of.

Fix the radii of a and b, and let d = ‖c_a − c_b‖. We define ω_P ideal distances between the central points of two balls under the condition that one ball is part of the other, taken from the interval [0, r_b − r_a]. Figure 2(b) illustrates three reference part-of relations. If d = r_b − r_a, ball a is a tangential part of ball b; if d = 0, the two balls are concentric. The deviation of d from its nearest ideal value is used as the loss function for training the P status.

Ideal values are invariant if ball a rotates around the central point of ball b. We define an ideal rotation as rotating ball a by the Euler angle 2π/ω_R in the plane spanned by two coordinate axes around the central point of ball b, where ω_R is the total number of ideal rotations.
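The sketch below illustrates snapping a centre distance towards the nearest of a small set of ideal values for the part-of status; the evenly spaced placement between the concentric case and the tangential-part case is our assumption, not necessarily the paper's exact formula.

```python
import torch

def ideal_distances_part_of(r_a, r_b, omega_p=3):
    """omega_p reference distances for 'a is part of b', placed between the
    concentric case (d = 0) and the tangential-part case (d = r_b - r_a)."""
    ks = torch.arange(1, omega_p + 1, dtype=torch.float32) / (omega_p + 1)
    return ks * (r_b - r_a)

def ideal_value_loss(c_a, r_a, c_b, r_b, omega_p=3):
    """Deviation of the current centre distance from its nearest ideal value."""
    d = torch.norm(c_a - c_b)
    ideals = ideal_distances_part_of(r_a, r_b, omega_p)
    return torch.min(torch.abs(d - ideals))
```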

2.4 Learning Euler diagram

The input of an ENN consists of a sequence of n-dimensional balls B, a table of target topological relations R, the parameters for ideal values ω_D, ω_PO, ω_P, the total number of ideal rotations ω_R, and the maximum number of iterations T. The output of ENN is the sequence of balls with updated locations and sizes, such that the topological relations among them satisfy the relations defined in R as far as possible. The global optimization procedure is illustrated in Algorithm 1.

Input: A sequence of n-dimensional balls B
Input: A table of target relations between the balls, R
Input: Ideal-value parameters ω_D, ω_PO, ω_P; number of ideal rotations ω_R; maximum number of iterations T
Output: Euler diagram for B satisfying the relations in R as much as possible

randomly initialize all balls in B; sort B by degree in R in decreasing order
t ← 0; L_global ← +∞
while L_global > 0 and t < T do
    for each ball b_i in B do
        for each ball b_j in B do
            if a target relation rel between b_i and b_j is defined in R then
                find the route from the current relation between b_i and b_j to rel in the transition map in Figure 1
                for each segment (r_s, r_t) of the route do
                    loss ← ReSU_{r_s→r_t}(b_i, b_j)
                    repeat
                        update b_i using back-propagation to reduce the value of loss
                        if loss = 0 then
                            update the distance between b_i and b_j towards the nearest ideal value of r_t,
                                till the relation between b_i and b_j is r_t
                        else
                            bestLoss ← +∞
                            for each pair of coordinate axes (p, q) do
                                for k ← 1 to ω_R do
                                    rotate b_i by the Euler angle 2πk/ω_R in the plane spanned by axes p and q
                                        around the central point of b_j
                                    compute the current global loss L with respect to R
                                    if L < bestLoss then bestLoss ← L; record this rotated location
                            move b_i to the recorded best rotated location; loss ← ReSU_{r_s→r_t}(b_i, b_j)
                    until loss = 0
    compute the current global loss L_global with respect to R
    if L_global > 0 then
        for each ball b_i in B do
            apply normal back-propagation on b_i to reduce L_global, till L_global does not decrease
    t ← t + 1
return B
Algorithm 1: Global optimization of topological transitions

Algorithm 1 randomly initializes the balls and sorts them by their degrees in R in decreasing order. The two outer loops traverse all target relations in R. To optimize the relation between two balls towards the target relation, ENN first finds the route to the target relation according to the transition map in Figure 1 (the length of a route is either 1 or 2). The optimization of a route segment is a loop that starts with a normal back-propagation process [30]. If this process ends with zero loss, the target relation is further optimized to an ideal value; otherwise, the currently focused ball is rotated by an ideal rotation, choosing the rotated location at which the global loss is the smallest. From that rotated location, the loop continues the back-propagation. After having traversed R, ENN computes the current global loss. If it is greater than 0, a normal back-propagation is applied to every ball. After that, ENN continues the two outer loops to optimize the relations in R until either the global loss reaches 0 or the maximum number of iterations is reached.
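The route lookup used at the start of each relation update can be sketched as follows; the dictionary encoding of the transition map in Figure 1 is our own, and routes have length at most two because every status is at most two steps away from any other via PO.

```python
# Neighbourhood transitions of Figure 1: D <-> PO, PO <-> P, PO <-> invP.
NEIGHBOURS = {
    "D":    ["PO"],
    "PO":   ["D", "P", "invP"],
    "P":    ["PO"],
    "invP": ["PO"],
}

def route(current: str, target: str):
    """Return the route segments (pairs of statuses) from the current
    relation to the target relation; the length is 0, 1, or 2."""
    if current == target:
        return []
    if target in NEIGHBOURS[current]:
        return [(current, target)]
    # any two non-neighbouring statuses are connected via PO
    return [(current, "PO"), ("PO", target)]

# e.g. route("D", "P") == [("D", "PO"), ("PO", "P")]
```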

2.5 Representing all 24 structures of syllogism

Statements of syllogism consist of four types: (1) all s are p; (2) some s are p; (3) no s is p; (4) some s are not p. Type (1) can be interpreted as ball s being part of ball p: P(s, p). Type (2) can be interpreted as there being a ball c inside ball s such that ball c is part of ball p: ∃c P(c, s) ∧ P(c, p); this is equivalent to PO(s, p) ∨ P(s, p) ∨ P̄(s, p) and also to ¬D(s, p). Type (3) can be interpreted as ball s disconnecting from ball p: D(s, p). Type (4) can be interpreted as there being a ball c inside ball s such that ball c disconnects from ball p: ∃c P(c, s) ∧ D(c, p); this is equivalent to D(s, p) ∨ PO(s, p) ∨ P̄(s, p) and also to ¬P(s, p). Table 1 lists all 24 different structures of syllogism, each of which can be precisely represented by ENN.

Num | Name | Premise | Conclusion | Spatial proposition for Euler diagrams
1 | Barbara | all s are m, all m are p | all s are p | P(s,m), P(m,p) ⊢ P(s,p)
2 | Barbari | all s are m, all m are p | some s are p | P(s,m), P(m,p) ⊢ ∃c P(c,s) ∧ P(c,p)
3 | Celarent | all s are m, no m is p | no s is p | P(s,m), D(m,p) ⊢ D(s,p)
4 | Cesare | all s are m, no p is m | no s is p | P(s,m), D(p,m) ⊢ D(s,p)
5 | Calemes | no m is s, all p are m | no s is p | D(m,s), P(p,m) ⊢ D(s,p)
6 | Camestres | no s is m, all p are m | no s is p | D(s,m), P(p,m) ⊢ D(s,p)
7 | Darii | some s are m, all m are p | some s are p | ∃c P(c,s) ∧ P(c,m), P(m,p) ⊢ ∃c P(c,s) ∧ P(c,p)
8 | Datisi | some m are s, all m are p | some s are p | ∃c P(c,m) ∧ P(c,s), P(m,p) ⊢ ∃c P(c,s) ∧ P(c,p)
9 | Darapti | all m are s, all m are p | some s are p | P(m,s), P(m,p) ⊢ ∃c P(c,s) ∧ P(c,p)
10 | Disamis | all m are s, some m are p | some s are p | P(m,s), ∃c P(c,m) ∧ P(c,p) ⊢ ∃c P(c,s) ∧ P(c,p)
11 | Dimatis | all m are s, some p are m | some s are p | P(m,s), ∃c P(c,p) ∧ P(c,m) ⊢ ∃c P(c,s) ∧ P(c,p)
12 | Baroco | some s are not m, all p are m | some s are not p | ∃c P(c,s) ∧ D(c,m), P(p,m) ⊢ ∃c P(c,s) ∧ D(c,p)
13 | Cesaro | all s are m, no p is m | some s are not p | P(s,m), D(p,m) ⊢ ∃c P(c,s) ∧ D(c,p)
14 | Celaront | all s are m, no m is p | some s are not p | P(s,m), D(m,p) ⊢ ∃c P(c,s) ∧ D(c,p)
15 | Camestros | no s is m, all p are m | some s are not p | D(s,m), P(p,m) ⊢ ∃c P(c,s) ∧ D(c,p)
16 | Calemos | no m is s, all p are m | some s are not p | D(m,s), P(p,m) ⊢ ∃c P(c,s) ∧ D(c,p)
17 | Bocardo | all m are s, some m are not p | some s are not p | P(m,s), ∃c P(c,m) ∧ D(c,p) ⊢ ∃c P(c,s) ∧ D(c,p)
18 | Bamalip | all m are s, all p are m | some s are p | P(m,s), P(p,m) ⊢ ∃c P(c,s) ∧ P(c,p)
19 | Ferio | some s are m, no m is p | some s are not p | ∃c P(c,s) ∧ P(c,m), D(m,p) ⊢ ∃c P(c,s) ∧ D(c,p)
20 | Festino | some s are m, no p is m | some s are not p | ∃c P(c,s) ∧ P(c,m), D(p,m) ⊢ ∃c P(c,s) ∧ D(c,p)
21 | Ferison | some m are s, no m is p | some s are not p | ∃c P(c,m) ∧ P(c,s), D(m,p) ⊢ ∃c P(c,s) ∧ D(c,p)
22 | Fresison | some m are s, no p is m | some s are not p | ∃c P(c,m) ∧ P(c,s), D(p,m) ⊢ ∃c P(c,s) ∧ D(c,p)
23 | Felapton | all m are s, no m is p | some s are not p | P(m,s), D(m,p) ⊢ ∃c P(c,s) ∧ D(c,p)
24 | Fesapo | all m are s, no p is m | some s are not p | P(m,s), D(p,m) ⊢ ∃c P(c,s) ∧ D(c,p)
Table 1: List of all 24 valid syllogisms (s: minor term, m: middle term, p: major term)
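The interpretation above (and Table 1) can be sketched as a small compiler from statement types to target relations for ENN; the function name, relation labels, and auxiliary-ball naming are our own illustration.

```python
def compile_statement(stype: str, s: str, p: str, fresh: list):
    """Translate one syllogism statement into target relations (ball_1, ball_2, status).

    stype: 'all' = all s are p, 'no' = no s is p,
           'some' = some s are p, 'some_not' = some s are not p.
    fresh: list used to generate names for auxiliary balls.
    """
    if stype == "all":
        return [(s, p, "P")]                    # ball s is part of ball p
    if stype == "no":
        return [(s, p, "D")]                    # ball s disconnects from ball p
    c = f"_aux{len(fresh)}"; fresh.append(c)    # auxiliary ball inside s
    if stype == "some":
        return [(c, s, "P"), (c, p, "P")]
    if stype == "some_not":
        return [(c, s, "P"), (c, p, "D")]
    raise ValueError(stype)

# Barbara: all s are m, all m are p  =>  expect P(s, p) after optimization
aux = []
targets = compile_statement("all", "s", "m", aux) + compile_statement("all", "m", "p", aux)
```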

2.6 Representing family relations

Table 2: Representing basic family relations in ENN
Table 3: Representing compound family relations in ENN

Representing and reasoning with family relations is one of the best examples to illustrate the power of neural networks [15]. It is also an example of how spatial thinking (in terms of diagrammatic representation and reasoning) can be applied to non-spatial domains [37]. We use a ball to represent a family member. The central vector of a ball encodes its latent feature (including gender information); topological relations between balls structure family relations. The lower limit ball satisfying a relation with a given ball is understood as the smallest ball that satisfies that relation; the upper limit ball satisfying a relation is the largest ball that satisfies it. Given person x, person y being his/her child can be represented as: ball y is an upper limit ball inside ball x. We use designated balls to represent female and male, respectively. Person y being the mother of person x is then the conjunction of y being a parent of x and y being female. We introduce an ethical axiom that siblings shall not be married and become spouses. ENN is able to precisely represent all family relations in English. Tables 2-3 list a number of representative family relations; other relations are defined in a similar manner.

Ethical Axiom: siblings shall not become spouses.
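The following is a much-simplified sketch of family relations as ball containment; the paper encodes the child relation via an upper-limit ball inside the parent ball and gender via the central vector, whereas here plain containment and an explicit gender flag are used purely for illustration.

```python
import numpy as np

def part_of(c_a, r_a, c_b, r_b):
    """Ball (c_a, r_a) is part of ball (c_b, r_b)."""
    return np.linalg.norm(np.asarray(c_a) - np.asarray(c_b)) + r_a <= r_b

def is_child_of(y, x, balls):
    """'y is a child of x' ~ ball y lies inside ball x (simplified containment check)."""
    (c_y, r_y), (c_x, r_x) = balls[y], balls[x]
    return part_of(c_y, r_y, c_x, r_x)

def is_mother_of(y, x, balls, gender):
    """'y is the mother of x' ~ x is a child of y and y is female."""
    return is_child_of(x, y, balls) and gender[y] == "female"
```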

3 Experiments

For all datasets, we set the dimension of balls to 2 or 3. The ideal-value parameters ω_D, ω_PO, ω_P and the number of ideal rotations ω_R are set to 3, 3, 3, and 72, respectively. The maximum number of iterations T is 1000. We leverage stochastic gradient descent [4] to optimize the spatial relations between balls according to Algorithm 1, and the learning rate is chosen as 0.005. We implemented ENN in PyTorch. All experiments were conducted on a personal workstation with an Intel(R) Xeon(R) E5-2640 2.40GHz CPU and 256 GB memory.
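A minimal PyTorch sketch of the optimization step described above: plain SGD with learning rate 0.005 reduces one ReSU-style loss between a trainable ball and a fixed ball; the loss form and the stopping test are illustrative assumptions, not the paper's exact training loop.

```python
import torch

# two 3-dimensional balls; ball a is trainable, ball b is fixed
c_a = torch.randn(3, requires_grad=True)
r_a = torch.tensor(0.5, requires_grad=True)
c_b, r_b = torch.randn(3), torch.tensor(2.0)

opt = torch.optim.SGD([c_a, r_a], lr=0.005)
for step in range(1000):                      # maximum number of iterations
    d = torch.norm(c_a - c_b)
    loss = torch.relu(d + r_a - r_b)          # push towards 'a is part of b'
    if loss.item() == 0.0:
        break
    opt.zero_grad()
    loss.backward()
    opt.step()
```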

3.1 Learning syllogism

We group the 24 syllogism structures into 14 groups; structures in the same group can be tested with the same dataset. For each group, we created 500 test cases by extracting hypernym relations from WordNet-3.0. A test case consists of 2 assertions as premises, 1 true conclusion, and 1 false conclusion, totaling 14,000 assertions for training, 7,000 true testing assertions, and 7,000 false testing assertions. As shown in Figure 3, ENN achieves superior accuracy in reasoning with a variety of syllogism structures and, in contrast to traditional neural networks, demonstrates great potential in reasoning with complex knowledge.

Figure 3: After 60 epochs of training, the reasoning accuracies for the syllogism structures Barbara, Barbari, Celarent_Cesare, and Calemes_Camestres reach 99.6%, 99.7%, 99.8%, and 99.8%, respectively; the accuracies for the other syllogism structures reach 100.0%.

3.2 Learning Family Relations

We extracted all triples of basic family relations (spouse, child, and parent) from DBpedia for training, and created complex family relations without gender information according to Tables 2-3 for testing. We group the training triples into family groups: two persons are in the same family group if there is a chain of basic family relations between them. Family groups are sorted by the number of family members, and we ignore family sizes for which there are fewer than 5 family groups. The statistics of the dataset, after cleaning, are listed in Table 4.

#Member 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17
#Family 1,899 1,060 591 344 194 121 69 62 42 28 14 18 8 8 6
#Triple 3,803 3,193 2,402 1,746 1,183 876 585 573 438 321 178 251 119 125 98
#True_A 2,595 2,937 2,577 2,165 1,626 1,295 942 912 745 505 308 569 255 208 259
#False_A 2,595 2,944 2,630 2,202 1,649 1,351 1,023 940 772 527 326 603 265 219 265
Table 4: Datasets extracted from DBpedia for reasoning with family relations (#Member: number of family members; #Family: number of family groups having #Member members; #Triple: number of basic relation triples; #True_A: number of true assertions; #False_A: number of false assertions)
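Grouping the DBpedia triples into families as described above is a connected-components computation; the following union-find sketch (with hypothetical person names) illustrates it.

```python
from collections import defaultdict

def family_groups(triples):
    """triples: iterable of (person1, relation, person2); returns families as sets of persons."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]     # path halving
            x = parent[x]
        return x

    def union(x, y):
        parent[find(x)] = find(y)

    for p1, _rel, p2 in triples:
        union(p1, p2)

    groups = defaultdict(set)
    for person in parent:
        groups[find(person)].add(person)
    return sorted(groups.values(), key=len, reverse=True)

# e.g. family_groups([("Ann", "child", "Bob"), ("Bob", "spouse", "Cia"), ("Dan", "child", "Eve")])
# -> [{"Ann", "Bob", "Cia"}, {"Dan", "Eve"}]
```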

Experimental results show that for training sets consisting of only three persons, the reasoning reduces to syllogism, and ENN reaches almost 100% precision and recall. The performance decreases as the number of family members increases, as illustrated in Figure 4. Most errors result from two causes: (1) the training loss fails to reach the global minimum 0 within the maximum number of iterations; (2) there are family members in the dataset who violate the ethical axiom.

Figure 4: From simple relations within a single family, ENN can infer cross-family relations

4 Related work

Regions, e.g., Venn diagrams or Euler diagrams, have been used to represent logical reasoning [38, 33], and can be embedded by representation learning. For example, words or entities are embedded as multi-modal Gaussian distributions [1, 14] or as manifolds [41]; nested regions are embedded by Poincaré balls to encode tree structures [25]; spheres are used to embed concepts and capture subordinate relations among instances and concepts [20]. Intersections or unions among high-dimensional boxes are implemented to approximate a subset of logical queries [28]. Hyperbolic disks are trained to embed directed acyclic graphs (DAGs) [34]. Relations between regions, including distance and orientation, can be logically formalized and calculated by taking the connection relation as primitive [8, 40, 6, 27, 32, 9]. The connection relation is valued in cognitive science in the sense that the contact relation [5], or the topological relation [26], is the first relation distinguished by human babies. Under uncertain or incomplete situations, reasoning turns into similarity judgments [35, 36], which can be approximated by the similarity between vectors (vectors can be understood as regions of the smallest size) [15].

5 Conclusions and Outlooks

One major challenge for neural reasoning is to reach the symbolic level of analysis [2]. Recent studies suggest that pre-trained neural language models still have a long way to go to adequately learn human-like factual knowledge [18]. In this paper, we loosen the tie between neural models and vector representations, and propose a novel neural architecture, ENN, that takes high-dimensional balls as input. We show that topological relations among balls are able to spatialize the semantics of symbolic logic. ENNs can precisely represent human-like factual knowledge, such as all 24 different structures of syllogism and complex family relations. Our experiments show that the novel global optimization algorithm pushes the reasoning ability of ENN to the level of symbolic syllogism. In ENN, the central vector of a ball is able to inherit the representation power of traditional neural networks. Jointly training ENN with unstructured and structured data is our on-going research.

References

  • [1] B. Athiwaratkun and A. Wilson (2017) Multimodal word distributions. In ACL’17, pp. 1645–1656. Cited by: §4.
  • [2] W. Bechtel and A. Abrahamsen (2002) Connectionism and the mind: parallel processing, dynamics, and evolution in networks. Graphicraft Ltd, Hong Kong. Cited by: §5.
  • [3] Y. Bengio (2019) From system 1 deep learning to system 2 deep learning. In NeurIPS, Cited by: §1.
  • [4] L. Bottou (2010) Large-scale machine learning with stochastic gradient descent. In Proceedings of COMPSTAT'2010, pp. 177–186. Cited by: §3.
  • [5] S. Carey (2009) The Origin of Concepts. Oxford University Press. Cited by: §4.
  • [6] B. L. Clarke (1981) A calculus of individuals based on ‘connection’. Notre Dame Journal of Formal Logic 23 (3), pp. 204–218. Cited by: §1, §4.
  • [7] B. L. Clarke (1985) Individuals and points. Notre Dame Journal of Formal Logic 26 (1), pp. 61–75. Cited by: §2.1.
  • [8] T. de Laguna (1922) Point, line and surface as sets of solids. The Journal of Philosophy 19, pp. 449–461. Cited by: §1, §4.
  • [9] T. Dong (2008) A Comment on RCC: from RCC to RCC. Journal of Philosophical Logic 37 (4), pp. 319–352. Cited by: §1, §2.1,