Gödel's Sentence Is An Adversarial Example But Unsolvable

02/25/2020 ∙ by Xiaodong Qi, et al. ∙ Huazhong University of Science & Technology

In recent years, different types of adversarial examples from different fields have emerged endlessly, including purely natural ones without perturbations. A variety of defenses have been proposed and then quickly broken. Two fundamental questions need to be asked: what is the reason for the existence of adversarial examples, and are adversarial examples unsolvable? In this paper, we will show that the reason for the existence of adversarial examples is that there are non-isomorphic natural explanations that can all explain the data set. Specifically, for the two natural explanations of being true and being provable, Gödel's sentence is an adversarial example but ineliminable. It can't be solved by accumulating more data or further improving the learning algorithm. Finally, from the perspective of computability, we will prove the incomputability of adversarial examples: whether a learning algorithm learns the desired explanation is unrecognizable.


1 Introduction

Adversarial examples have attracted significant attention in recent years. They have emerged in image classification (Goodfellow et al., 2014; Engstrom et al., 2017), audio (Carlini and Wagner, 2018; Taori et al., 2019; Qin et al., 2019), text classification (Samanta and Mehta, 2017; Lei et al., 2019) and NLP (Alzantot et al., 2018; Niven and Kao, 2019; Zhang et al., 2019; Eger et al., 2019). The reason for their existence remains unclear. Previous work has proposed different answers from the perspectives of the data set and the learning algorithm, including that the data set is not big enough (Schmidt et al., 2018), not good enough (Ford et al., 2019) or not random enough (Weiguang Ding et al., 2019), and that the learning algorithm is not complex enough (Bubeck et al., 2018; Nakkiran, 2019) or not robust enough (Xiao et al., 2018; Stutz et al., 2019). Moreover, Shafahi et al. (2018) show that for certain classes of problems (on the sphere or cube), adversarial examples are inescapable. In this paper, for a simple binary classification problem, we will propose an adversarial example, Gödel's sentence, which is not caused by the data set or the learning algorithm. In our example, for any finite data set or any decidable infinite data set, and for any learning algorithm, Gödel's sentence will always remain an adversarial example and is ineliminable. It can't be solved by accumulating more data or further improving the learning algorithm.

First of all, what are adversarial examples? Current definitions of adversarial examples are almost all based on perturbations. For instance, Shafahi et al. (2018) give the following formal definition of an adversarial example.

Consider a point $x$ drawn from class $c$, a scalar $\epsilon > 0$, and a metric $d$. We say that $x$ admits an $\epsilon$-adversarial example in the metric $d$ if there exists a point $\hat{x}$ with $C(\hat{x}) \neq c$ and $d(x, \hat{x}) \leq \epsilon$. ($C$ is a "classifier" function.)
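As a minimal illustration of this perturbation-based definition, the following sketch checks the two conditions directly; the toy linear classifier, the Euclidean metric and the sample points are placeholders chosen here for illustration, not anything from the paper.

import numpy as np

def is_eps_adversarial(x, x_hat, c, classifier, metric, eps):
    """Check whether x_hat is an eps-adversarial example for x of class c
    under the given metric: the class flips and the distance stays within eps."""
    return classifier(x_hat) != c and metric(x, x_hat) <= eps

# Toy usage: a linear "classifier" on the plane and the Euclidean metric.
classifier = lambda p: int(p[0] + p[1] > 1.0)
metric = lambda a, b: float(np.linalg.norm(np.asarray(a) - np.asarray(b)))

x, c = (0.6, 0.3), 0      # correctly classified as class 0
x_hat = (0.65, 0.4)       # small perturbation crossing the decision boundary
print(is_eps_adversarial(x, x_hat, c, classifier, metric, eps=0.2))  # True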

However, adversarial examples are not limited to the perturbation type, as in Figure 1.a and 1.b (Goodfellow et al., 2014; Thys et al., 2019). Recent work has shown there are many purely natural adversarial examples without any perturbations, as in Figure 1.c (Hendrycks et al., 2019). It's inappropriate to understand the latter as normal wrong outputs and only the former as adversarial examples. Otherwise we fall into an infinite regress of arguing which special type of wrong output is an adversarial example and which is not, because any insular definition is difficult to reconcile with the facts observed in practice. Therefore, it's necessary to redefine adversarial examples beyond perturbations, which is also important for understanding the reason for their existence, and their unsolvability in our example.

Figure 1: Three different types of adversarial examples (Goodfellow et al., 2014; Thys et al., 2019; Hendrycks et al., 2019). (a) Perturbations in the digital world. (b) Perturbations in the real world. (c) A natural example, without perturbations.

Second, are adversarial examples unsolvable? In this paper, the specific adversarial example, Gödel's sentence, is ineliminable. It can't be solved by accumulating more data: adversarial training (Goodfellow et al., 2014; Wu et al., 2017) can't help, because of the ineradicable incompleteness. It can't be solved by improving the learning algorithm, either. Each learning algorithm must be equivalent to a Turing machine, no matter whether it's based on logistic regression, SVM or a neural network. However, changing one Turing machine to another Turing machine won't help, because no Turing machine can learn something from nothing.

Finally, before we redefine adversarial examples and study their unsolvability, some essential issues about them in machine learning need to be clarified. In particular: what is the ground truth, what is to be learned, what can be expressed by the data set, and what do we expect to express through the data set? Perhaps the answers to these issues seem so self-evident that they have been ignored for a long time. Actually, none of these answers is clear, although they are thought to be self-evident, and Russell's paradox reminds us that danger always lies in the unclarity. Once these issues are clarified, the reason for the existence of adversarial examples will be clear. There are three main contributions in this paper.

  1. We will point out that the reason for the existence of adversarial examples is that there are non-isomorphic natural explanations that can all explain the data set.

  2. We will show that Gödel’s sentence is an adversarial example but ineliminable.

  3. We will prove the incomputability of adversarial examples: whether a learning algorithm learns the desired explanation is unrecognizable.

2 Related Work

Adversarial examples were first demonstrated in Szegedy et al. (2013) and Biggio et al. (2013) in the digital world. They can be generated by adding pixel-level changes to a normal image (Goodfellow et al., 2014; Papernot et al., 2017) or by applying simple rotations and translations to a normal image (Engstrom et al., 2017; Alcorn et al., 2019). They also exist in the real world. By placing a few stickers in the real world, a physical stop sign is recognized as a speed limit 45 sign (Eykholt et al., 2018), images are misclassified (Liu et al., 2019), and Autopilot drives into the wrong lane because it identifies stickers as road traffic markings (Ackerman). By wearing an adversarial patch in the real world, a person becomes difficult to identify (Thys et al., 2019; Komkov and Petiushko, 2019). There are also adversarial examples of audio in the real world (Li et al., 2019). Beyond the perturbations in the digital world (pixel-level changes, rotations and translations) and in the real world (stickers and patches), recent work has shown that there are plenty of purely natural adversarial examples in the real world without any perturbations (Hendrycks et al., 2019), as in Figure 1.c.

Many solutions for adversarial examples have been proposed, like adversarial training (Goodfellow et al., 2014; Wu et al., 2017), network distillation (Papernot et al., 2016), classifier robustifying (Abbasi and Gagné, 2017), adversarial detecting (Lu et al., 2017), input reconstruction (Gu and Rigazio, 2014) and network verification (Katz et al., 2017). However, almost all solutions have been shown to be effective only against part of the adversarial attacks (Yuan et al., 2019).

Recent work has also made encouraging progress in clarifying some essential issues about adversarial examples. What can be expressed by the data set? More technically, what is the meaning of a label in the data set? The extraordinary work in Ghorbani et al. (2019) shows the difference between the label “basketball” and basketball: the former is a meaningless label in the data set, and the latter is a word meaning a sphere or a sport. It's shown that the most important concept for predicting “basketball” images is the players' jerseys rather than the ball itself. Moreover, even if there are only basketball jerseys without the ball itself in an image, it can still be classified as the class “basketball”. What is to be learned? The remarkable work in Ilyas et al. (2019) shows that what is learned by the machine can be non-robust features, while what humans can understand and want the machine to learn are robust features; however, both are in the data set, and adversarial examples are a natural consequence of the presence of highly predictive but non-robust features in standard data sets.

3 Meaning of Label

To interpret what can be expressed by the data set and reveal that it is quite different from what we expect to express through the data set, we need to figure out what the label in the data set means, which is crucial for understanding why there can be different explanations that can all explain the data set. Here is a binary classification problem for cats and dogs.

3.1 Where Are Cats?

The data set is in Figure 2. If we were native English speakers, we would adopt the labels of type1. And, where are the cats?

Figure 2: Data set of cats and dogs. Each of Image-I and Image-II carries a label in one of four styles (type1, type2, type3, type4).
For Human | For Machine
Fact. In Image-I, there are images of animals who can meow and catch mice. | Fact. In Image-I, there are some legal png files.
Fact. The label of left type1 is “cat”, a word which means the animal who can meow and catch mice. | Fact. The label of left type1 is “cat”, a symbolic string which is meaningless.
Fact. Combining Image-I and the label of left type1, it means that the animals in Image-I are all cats, the animals who can meow and catch mice. | Fact. Combining Image-I and the label of left type1, it means that the images in Image-I are all of the same type, denoted as “cat”.
Table 1: Understandings of the data set for human and machine are different.

For the machine, there is no cat, the animal who can meow and catch mice; there is only “cat”, the meaningless symbolic string (Table 1). There are more reasons (see Figure 2). In French, there is no cat but “chat”. When programming, there is no cat but an arbitrary identifier. On a planet in the Andromeda Galaxy with a civilization isomorphic to ours, there is no cat but some symbol we have never seen. The only meaning of a label is to show that those with the same label belong to the same class, and those with different labels belong to different classes. As for the symbolic representation of a label, it doesn't matter.

3.2 Can a Smart Enough Algorithm Find Cats?

To answer this question, we need to detail the current machine learning paradigm. All we have are the data set and the learning algorithm. The data set is divided into two (or three) parts: one of them is used for training by the learning algorithm, another is used for testing. However, there is no cat in the data set, and nobody can learn something from nothing. Good performance on the test set means the model performs well at distinguishing “cat” from “dog”. But this “cat” and “dog” are not the same thing as the cat and dog we know, of which the former can meow and the latter can bark. For instance, as a well-known example of algorithmic bias, Google identified black people as gorillas (Crawford, 2016). However, the truth is that Google identified black people as “gorilla”, the meaningless symbolic string, and we interpreted that “gorilla” as the gorillas we know. One might wonder, how could “gorilla” not be interpreted as the gorillas we know? However, we are not playing with words; it's just that some basic concepts in machine learning have not been established.
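The claim that the symbolic representation of a label carries no meaning for the machine can be checked directly. The following is a minimal sketch, assuming scikit-learn and synthetic two-dimensional data (neither is from the paper): the same learner trained once with the labels “cat”/“dog” and once with arbitrary strings produces exactly the same partition of the inputs, up to renaming.

import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
# Synthetic "images": two Gaussian blobs standing in for the two classes.
X = np.vstack([rng.normal(0, 1, (100, 2)), rng.normal(3, 1, (100, 2))])

labels_a = np.array(["cat"] * 100 + ["dog"] * 100)  # English words
labels_b = np.array(["q7"] * 100 + ["z3"] * 100)    # meaningless strings

clf_a = LogisticRegression().fit(X, labels_a)
clf_b = LogisticRegression().fit(X, labels_b)

# On this toy data, with a deterministic solver, the two fits coincide:
# renaming the labels maps one set of predictions onto the other.
pred_a = clf_a.predict(X)
pred_b = clf_b.predict(X)
print(np.all((pred_a == "cat") == (pred_b == "q7")))  # True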

4 Reason for Existence of Adversarial Examples

The symbolic representation of a label doesn't matter, but the relationship between labels matters. For example, the meaning of “cat” and “dog” can't be expressed by the data set in Figure 2, but the following three points can be expressed by the data set: (1) the images in Image-I are of the same type; (2) the images in Image-II are of the same type; (3) these two types are not the same. We need to depict the partial isomorphism in a more precise way, and then define adversarial example.

For any field $F$, let its universal set be $O$; for any set $Y$, any function $e: O \to Y$ is called an explanation of field $F$.

Let $e$ be an explanation of field $F$, $e: O \to Y$, and $S \subseteq O$; the surjection $e|_S: S \to e(S)$ satisfying that $e|_S(x) = e(x)$ for every $x \in S$ is called the explanation $e$ limited to $S$.

Let $e_1$ and $e_2$ be explanations of field $F$, and let $S \subseteq O$. If there is a bijection $\sigma: e_1(S) \to e_2(S)$ such that $\sigma(e_1(x)) = e_2(x)$ for every $x \in S$, we say that the explanations $e_1$ and $e_2$ are isomorphic on $S$. If the explanations $e_1$ and $e_2$ are isomorphic on $O$, we simply say that $e_1$ and $e_2$ are isomorphic.

Let $e$ be an explanation of field $F$ and $S \subseteq O$; the pair $(S, e|_S)$ is denoted as $D$. If $e'$ is an explanation of field $F$, and the explanations $e'$ and $e$ are isomorphic on $S$, we say that $e'$ explains $D$, and also that $e'$ is an explanation on $D$.

Let $e$ be an explanation of field $F$ and $K \subseteq O$, and let $e_1, e_2, \ldots$ be the enumeration of all the non-isomorphic natural explanations on $(K, e|_K)$. If there is a set $G$ satisfying $K \subseteq G \subseteq O$, such that for any $x \in G$ all $e_i$ and $e_j$ are isomorphic on $K \cup \{x\}$, and for any $x \in O \setminus G$ there are $e_i, e_j$ such that $e_i$ and $e_j$ are not isomorphic on $K \cup \{x\}$, then $G$ is called a generalization set of $(K, e|_K)$.

Let $e$ be an explanation of field $F$ and $K \subseteq O$; if $G$ is the generalization set of $(K, e|_K)$, then $A = O \setminus G$ is called the adversarial set of $(K, e|_K)$.
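To make these set-level definitions concrete, here is a toy sketch; the tiny universe, the two labeling functions and the simplification of "isomorphic" to label-wise agreement are illustrative assumptions, not anything from the paper.

# Toy universe and a known set, purely illustrative.
O = set(range(10))
K = {0, 2, 4}

# Two candidate "natural explanations", as labeling functions on O.
e1 = lambda x: x % 2 == 0             # "x is even"
e2 = lambda x: x % 2 == 0 and x < 6   # "x is an even number below 6"

def isomorphic_on(S):
    # Isomorphism of explanations is simplified here to label-wise agreement.
    return all(e1(x) == e2(x) for x in S)

assert isomorphic_on(K)  # both explanations explain the known set

# Generalization set: elements on which all natural explanations stay isomorphic
# when added to the known set; adversarial set: the rest of the object set.
G = {x for x in O if isomorphic_on(K | {x})}
A = O - G
print(sorted(G))  # [0, 1, 2, 3, 4, 5, 7, 9]
print(sorted(A))  # [6, 8]: the explanations disagree there, so any output can be wrong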

Name | Symbol | Description
Object Set | $O$ | the universal set of field $F$.
Known Set | $K$ | $K \subseteq O$, the set known to us.
Unknown Set | $O \setminus K$ | the set unknown to us.
Training Set | $K_{train}$ | $K_{train} \subseteq K$.
Verification Set | $K_{verify}$ | $K_{verify} \subseteq K$.
Test Set | $K_{test}$ | $K_{test} \subseteq K$.
Data Set | $K$ or $(K, e|_K)$ | the known set without/with explanatory information.
Generalization Set | $G$ | $K \subseteq G \subseteq O$.
Adversarial Set | $A$ | $A = O \setminus G$.
Table 2: Definition of sets. The training set, verification set and test set are subsets of the known set. The known set is a subset of the object set. The generalization set and adversarial set depend entirely on the known set (with explanatory information) and have nothing to do with learning algorithms. What is known to us is the known set, but what people pay attention to is always the object set.

Figure 3: Relationship between sets. Known set $K$ $\subseteq$ generalization set $G$ $\subseteq$ object set $O$; the training set, verification set and test set lie inside the known set, the unknown set is $O \setminus K$, and the adversarial set $A$ is the part of the object set outside the generalization set. As the known set keeps expanding, the generalization set will inevitably expand together, but this doesn't mean the adversarial set will gradually become empty, as the example in Section 5.2 shows.

Explanations are further divided into natural explanations and unnatural explanations. The former are explanations that conform to human cognition of reality, and the latter are explanations that don't. There are always many unnatural explanations that can explain the data set, but they are not convincing. What we are concerned with are always natural explanations, but there may be non-isomorphic natural explanations that can all explain the data set. For example, classical mechanics (from Newton) and special relativity (from Einstein), Euclidean and Lobachevskian geometry, and ZF and ZFC in axiomatic set theory are all pairs of non-isomorphic natural explanations. They are isomorphic, respectively, on the ground truth observed on earth before 1887, on absolute geometry, and on the statements on which the two set theories agree. Based on the natural explanations, the generalization set and adversarial set can be defined (Table 2, Figure 3). The adversarial example can be defined in this way.

For any $x \in O$, if $x \in A$ (that is, $x \notin G$), then $x$ is an adversarial example of $(K, e|_K)$.

The reason for the existence of adversarial examples is that there are non-isomorphic natural explanations that can all explain the data set. Any $x$ in the adversarial set $A$ is an adversarial example, because it is unknown which explanation is the desired one, but the outputs of the non-isomorphic natural explanations (on the data set) differ at $x$, so any output can be wrong. All that is in our mind is the desired explanation we want to express, and according to this explanation we generate lots of data and make up the data set. However, we may not notice that there is an undesired natural explanation on the data set. If the outcome learned by the learning algorithm happens to be the undesired one, it can perform perfectly on the known set $K$, including the training set and test set, but the output for the adversarial example will be considered wrong by us.

5 Incompleteness and Gödel’s Sentence

In this section, we are concerned with two very specific natural explanations: being true and being provable. Meanwhile, we will show there are indeed two non-isomorphic natural explanations on a million-scale data set. Moreover, the two natural explanations are not insubstantial like castles in the air. They are quite comprehensible and can both be implemented by concrete and simple algorithms. We will show that whether being true and being provable are isomorphic depends on the completeness theorem. For simplicity, let's start with propositional logic.

5.1 Peirce’s Law

Here is a data set for a binary classification problem in Table 3, and the full data set can be found at https://github.com/IcePulverizer/Hx, where there are millions of formulas for each class. It's easy to see that the formulas in class $C_1$ are all tautologies, and the formulas in class $C_2$ are all contradictions. We can tell the readers that this assertion remains true for the full data set. However, after machine learning based on this data set, which class should Peirce's law be classified into?

Peirce's law: $((p \to q) \to p) \to p$.

Table 3: Data set of formulas (class $C_1$ and class $C_2$).
Table 4: true (the two-valued interpretation structure).
Table 5: provable (the three-valued interpretation structure).

It depends on how to explain class $C_1$ and class $C_2$. There are at least two natural explanations. The apparent one is being true, which can be explained by the two-valued interpretation structure in Table 4, under which the formulas in class $C_1$ are exactly the tautologies. Another one is being provable in $H$, which can be explained by the three-valued interpretation structure in Table 5, under which the formulas in class $C_1$ are the formulas valid in that structure. $H$ is an axiomatic system in the formal language $L$; its axioms and rules of inference are given below (Axiomatic System $H$). Peirce's law is true, but unprovable in $H$. According to the first explanation, Peirce's law should be classified into class $C_1$ because it is true. However, according to the second explanation, Peirce's law should be classified into class $C_2$ because it is unprovable in $H$.

Axiomatic System $H$

The axioms of $H$:

The rules of inference of $H$:

  • (modus ponens): $\psi$ can be obtained from $\varphi$ and $\varphi \to \psi$.

  • (substitution): $\varphi'$ can be obtained from $\varphi$, where $\varphi'$ is obtained from $\varphi$ by a finite substitution.

[Soundness] In $H$, if $\varphi$ is provable, $\varphi$ is a tautology.

[Incompleteness] In $H$, if $\varphi$ is a tautology but not valid in the three-valued interpretation structure, $\varphi$ is unprovable.

This instance gives vivid answers to two questions: what is the ground truth and what is to be learned. They are both subjective and rely on which explanation is desired by us, and the data set avails nothing. Furthermore, as the builders of the data set in Table 3, even if we admit that what we want to express by the data set is not whether a formula is a tautology or a contradiction but whether it is provable or unprovable in $H$, does this admission matter? We can't deny that even though the data set is generated by us according to whether a formula is provable or unprovable in $H$ (Table 5), it can also be explained by whether the formula is a tautology or a contradiction (Table 4). The idea of the builders of the data set doesn't matter at all, which is important for us to understand adversarial examples.

In our data set, Peirce's law is an adversarial example, because there are two non-isomorphic natural explanations that can both explain the data set. One of them is being true (Table 4). The other is being provable in $H$ (Table 5). The two are non-isomorphic because the completeness theorem doesn't hold for $H$. The soundness theorem guarantees that anything provable must be true (the Soundness theorem above). The completeness theorem would guarantee that anything true must be provable. Hence, whether being true and being provable are isomorphic depends on the soundness and completeness theorems. Since no one can stand an unsound axiomatic system, we only need to be concerned about completeness. However, $H$ is incomplete (the Incompleteness theorem above). Peirce's law is true, but unprovable in $H$.
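To make the two explanations tangible, here is a small sketch. The two-valued check is the ordinary truth table; the three-valued implication used below is the Łukasiewicz one, chosen only for illustration (the paper's own three-valued structure is in its Table 5 and may differ), but it exhibits the same phenomenon: Peirce's law is a two-valued tautology yet fails in a three-valued matrix.

from itertools import product

# Peirce's law ((p -> q) -> p) -> p under a given implication function.
def peirce(p, q, imp):
    return imp(imp(imp(p, q), p), p)

# Two-valued (classical) implication; designated value: 1.
imp2 = lambda a, b: max(1 - a, b)
tautology2 = all(peirce(p, q, imp2) == 1 for p, q in product([0, 1], repeat=2))

# Three-valued Lukasiewicz implication on {0, 0.5, 1}; designated value: 1.
imp3 = lambda a, b: min(1, 1 - a + b)
valid3 = all(peirce(p, q, imp3) == 1 for p, q in product([0, 0.5, 1], repeat=2))

print(tautology2)  # True  -> the "true" explanation puts Peirce's law in class C_1
print(valid3)      # False -> a sound-but-incomplete system need not prove it
# Witness: p = 0.5, q = 0 gives the value 0.5, which is not designated.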

5.2 Gödel’s Sentence

[Gödel's Incompleteness] Let $T$ be an axiomatic system containing $PA$; if $T$ is consistent, there is a sentence $G_T$ such that $G_T$ and $\neg G_T$ are both unprovable in $T$.

Beyond propositional logic, the Gödel's sentence $G_T$, in any consistent first-order axiomatic system $T$ that contains all axioms of $PA$ (the first-order Peano arithmetic axiomatic system), is also an adversarial example. Gödel's sentence was constructed by Gödel (1931) and then improved by Rosser (1936); it is true but unprovable inside the system. The theorem above is famous as Gödel's first incompleteness theorem. Gödel's sentence fits our definition of adversarial example perfectly. First, no one can deny that being true and being provable are two natural explanations. Second, for any consistent existing data set, there must be an axiomatic system to guarantee that being true and being provable are isomorphic on this data set. Third, there will always be a Gödel's sentence that is true but unprovable.

Gödel's sentence has a further property that makes its unsolvability as an adversarial example clear. According to Figure 3, the direct way to eliminate or alleviate adversarial examples is to shrink the adversarial set $A$. Therefore, we only need to expand the data set/known set $K$. More technically, this solution is nothing more than adding adversarial examples, with labels according to the desired explanation, to the data set, and then training again with this new data set, known as adversarial training. We may take for granted that adversarial examples can be solved in this way, because at least these old adversarial examples should have been solved. However, this is just a misty imagination, and the demon is always in the mistiness.

Gödel's sentence is an adversarial example, but Gödel's sentence is ineliminable. In any consistent system $T$, there is a Gödel's sentence $G_T$, which is true but unprovable in $T$, so $T$ is incomplete. If $G_T$ is added to $T$ as an axiom to form a new system $T'$, of course $G_T$ will be both true and provable in $T'$. However, in $T'$, there will be a new Gödel's sentence $G_{T'}$, which is true but unprovable in $T'$, so $T'$ is still incomplete, which is guaranteed by Gödel's first incompleteness theorem. This process can be repeated any finite number of times, but the last system you get is still incomplete; for example, the Gödel's sentence of the last system is still true but unprovable. The incompleteness theorem will always hold, so the adversarial example is unsolvable, because the adversarial set will never be empty.
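The adversarial-training analogy can be written as an iteration; the following is a sketch of the argument in symbols, with the notation $T_n$, $G_{T_n}$ chosen here only for illustration:

\[
T_0 = T, \qquad T_{n+1} = T_n \cup \{G_{T_n}\},
\]
\[
T_n \ \text{consistent} \;\Longrightarrow\; G_{T_n} \ \text{is true but unprovable in } T_n .
\]

Adding the current Gödel's sentence as a new "correctly labeled example" only produces the next system $T_{n+1}$, which has its own Gödel's sentence $G_{T_{n+1}}$; after any finite number of rounds the adversarial set is still non-empty.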

6 Incomputability for Adversarial Examples

The adversarial example above is unsolvable, because there are always at least two non-isomorphic natural explanations that can explain the data set. However, that the adversarial set can't be empty doesn't mean what the learning algorithm has learned must be the undesired explanation. What if the learning algorithm happens to learn the desired one? First, Turing did not believe this kind of telepathy deserves attention (Turing, 2009), and neither do we. Second, in Section 5.1, we have shown that the idea of the builders of the data set doesn't matter at all. Third, taking all information into account, like the details of the learning algorithm, the details of the data set and the performance on the training set, verification set or test set, nothing can be concluded about it. Finally, even if this lottery win were to happen, we couldn't know effectively that we are going to win. Similarly, for any Turing machine and any input string, the Turing machine either halts or does not halt on it, but we can't know which effectively; this is the famous halting problem of Turing machines, and it is undecidable (Turing, 1937). We will prove that whether a learning algorithm can learn the desired explanation is unrecognizable (the last theorem of this section). The description of symbols is in Table 6, and the dependence structure in that theorem is shown in Figure 4. Therefore, even if the outputs for all adversarial examples indeed conform to the explanation we desire, we can't know it effectively by algorithm or computing.

Symbol | Description
$M$ | a Turing machine.
$w$ | an input string.
$\mathcal{M}$ | the Turing machine space, the set consisting of all Turing machines.
$\mathrm{dom}(M)$ | the set consisting of all strings on which $M$ halts.
$M(w)$ | the output of $M$ on $w$, for $w \in \mathrm{dom}(M)$.
$\langle M, w \rangle$ | the encoding string of the objects $M, w$.
$A \leq_m B$ | language $A$ is many-one reducible to language $B$.
Table 6: Description of symbols

Figure 4: Dependence structure in the final theorem. The target Turing machine $M_T$ generates the data set (step 1); the data set is given as input to the learning Turing machine $M_L$ (step 2); $M_L$ outputs the outcome Turing machine (step 3); the question (step 4) is whether the outcome Turing machine is input-output equivalent to the target Turing machine $M_T$.

For any Turing machines $M_1$ and $M_2$, $M_1$ and $M_2$ are input-output equivalent if and only if $\mathrm{dom}(M_1) = \mathrm{dom}(M_2)$ and $M_1(w) = M_2(w)$ for every $w \in \mathrm{dom}(M_1)$, denoted as $M_1 \equiv M_2$.

If $A \leq_m B$ and $A$ is unrecognizable, then $B$ is unrecognizable.

[Halting problem] $HALT_{TM} = \{\langle M, w \rangle : M \text{ halts on } w\}$ is undecidable, and its complement $\overline{HALT_{TM}}$ is unrecognizable.

$EQ_{TM} = \{\langle M_1, M_2 \rangle : M_1 \equiv M_2\}$ is unrecognizable.

Proof  We prove $\overline{HALT_{TM}} \leq_m EQ_{TM}$, so we need to construct the following reduction $F$:

$F$ = “For input $\langle M, w \rangle$, in which $M$ is a Turing machine and $w$ is a string:

  1. Construct the following two machines $M_1$ and $M_2$.

    $M_1$ = ‘For any input $x$:

    1. Be circular.’

    $M_2$ = ‘For any input $x$:

    1. Run $M$ on $w$. If $M$ halts on $w$, output $x$ and halt.’

  2. Output $\langle M_1, M_2 \rangle$.”

If $M$ does not halt on $w$, then $M_1$ and $M_2$ halt on no input, so $M_1 \equiv M_2$; if $M$ halts on $w$, then $M_2$ halts on every input while $M_1$ halts on none, so $M_1 \not\equiv M_2$. Hence $\langle M, w \rangle \in \overline{HALT_{TM}}$ if and only if $F(\langle M, w \rangle) \in EQ_{TM}$.
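The construction can be sketched with Python closures standing in for Turing machines; this is purely illustrative (the infinite loop is written literally, and the names are assumptions), but it mirrors the shape of the reduction.

def F(M, w):
    """Map <M, w> to <M1, M2>: M1 and M2 are input-output equivalent
    exactly when M does not halt on w."""
    def M1(x):
        while True:   # be circular: M1 halts on no input
            pass
    def M2(x):
        M(w)          # run M on w; reaching the next line means M halted on w
        return x      # output and halt
    return M1, M2

Of course, actually deciding whether the returned pair is input-output equivalent would amount to deciding whether $M$ halts on $w$, which is exactly the point of the reduction.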

$LEARN = \{\langle M_L, M_T \rangle :$ the outcome Turing machine output by $M_L$ on the data set generated by $M_T$ is input-output equivalent to $M_T\}$ is unrecognizable.

Proof  We prove $EQ_{TM} \leq_m LEARN$ via the following reduction $F'$:

$F'$ = “For input $\langle M_1, M_2 \rangle$, in which $M_1$ and $M_2$ are Turing machines:

  1. Construct the machine $M_L$.

    $M_L$ = ‘For any input $x$:

    1. Output $\langle M_1 \rangle$ and halt.’

  2. Output $\langle M_L, M_2 \rangle$.”

$M_L$ ignores its input (in particular the data set generated by $M_2$) and always outputs $M_1$, so $\langle M_L, M_2 \rangle \in LEARN$ if and only if $M_1 \equiv M_2$. Since $EQ_{TM}$ is unrecognizable, $LEARN$ is unrecognizable.
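In the same illustrative style as above, the second reduction builds a learning machine that ignores the data set entirely; the names here are assumptions, not the paper's notation.

def F_prime(M1, M2):
    """Map <M1, M2> to <ML, M2>: ML "learns" M2 exactly when M1 and M2
    are input-output equivalent."""
    def ML(dataset):
        return M1     # output <M1> and halt, no matter what the data set is
    return ML, M2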

7 Conclusion

In this paper, we show that the reason for the existence of adversarial examples is that there are non-isomorphic natural explanations that can all explain the data set. Specifically, Gödel's sentence is an adversarial example but ineliminable, because the two natural explanations of being true and being provable are always non-isomorphic, which is guaranteed by Gödel's first incompleteness theorem. Therefore, it can't be solved by accumulating more data or further improving the learning algorithm: no data set can eliminate the inherent incompleteness, and no learning algorithm can distinguish which explanation is the desired one. Finally, we prove the incomputability of adversarial examples: whether a learning algorithm can learn the desired explanation is unrecognizable.


References