1. Motivation
The motivation for the current paper is twofold.
One of the authors of the current paper was using iterative machine learning to crack cipher codes in the late 1990s. While doing so he came to the astounding realisation that the resulting learned function was not convergent (definition A.3) at all. What is really meant is this: data points which were accepted up to some iteration $n$ (points in the accept set) would eventually fly away after some more iterations, at iteration $m$ (with $m > n$).
In other words, neither the lim sup nor the lim inf of the sequence exists, and therefore the sequence of learned functions has no limit.
In essence the iteration was showing converging and then suddenly diverging behaviour. When he communicated this finding to one of the pioneers in the image processing domain, the latter was simply baffled; in fact, the trivia is that he simply remarked: “I have no idea what to respond!”. While this behaviour was never fully understood by the author then, recent theory by the same author(s) [11] seems to offer an explanation for it.
A very recent (yet to be published) paper on arXiv [12] discusses interesting properties of deep learning. It became immediately apparent to the authors that these two phenomena are essentially connected.
2. The Nature of Learning Theory
Learning theory is really about classification. As Vapnik [1] pointed out, it is all about finding a classifier function $f$ so as to isolate positive samples (accept) from negative ones (reject), aided by a training set, which is a set of ordered pairs $(x_i, y_i)$ with $x_i \in X$ and $y_i \in \{0, 1\}$. Given $f(x) = 1$ we accept $x$, else we reject it, with $A$ being the accept set.
2.1. Learned Function as Indicator of Accept Set
We can say that to learn one needs to isolate the set $A$ using the training data sets $T$. We need to learn the function $f$ with:
$f(x) = 1$ if $x \in A$, and $f(x) = 0$ otherwise.
Clearly then, the function we want to learn is the indicator function [2] of the accept set $A$, that is: $f = \mathbb{1}_A$.
Then we can simply suggest that machine learning is all about finding the indicator function $\mathbb{1}_A$ of the positive-samples (accept) set. This formulation has a problem, because we really do not know whether or not $\mathbb{1}_A$ is computable. The set of computable functions in this paper will be designated as $\mathfrak{C}$. We therefore assume the existence of a sequence of computable functions $\{f_n\} \subset \mathfrak{C}$ which converges [2] to $\mathbb{1}_A$.
This is indeed a very strong axiom, and in the very next section (3) we shall see that it almost surely does not hold true.
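The indicator-function view can be sketched concretely. Below is a minimal illustration, assuming a concrete one-dimensional accept set $A = [0.5, 1]$ and a hypothetical family of sigmoid approximants (neither is from the paper); it shows a sequence of computable functions converging pointwise to the indicator function away from the boundary:

```python
import math

# A sketch (not the paper's construction): accept set A = [0.5, 1], its
# indicator function 1_A, and a hypothetical sequence of computable
# approximants f_n (sigmoids of increasing steepness) converging to 1_A
# pointwise away from the boundary point 0.5.

def indicator_A(x):
    return 1 if x >= 0.5 else 0

def f_n(n, x):
    return 1.0 / (1.0 + math.exp(-n * (x - 0.5)))

# the approximation error vanishes as n grows (for x != 0.5)
for x in (0.2, 0.8):
    print(abs(f_n(200, x) - indicator_A(x)) < 1e-6)  # True, True
```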
2.2. Learning theory as Iterative Maps
We need to generate this sequence of functions in a computable way. That immediately gives:
$f_{n+1} = Y(f_n, T)$,
the guiding equation of learning. There has to be a computable iteration function $Y$ such that it takes a computable function and returns another one, and such that the orbit (definition A.5) eventually converges to “$\mathbb{1}_A$”.
In effect what we seek is:
$\lim_{n \to \infty} f_n = f_*$,
with $f_*$ the fixed point of the iteration (definition A.1). All ML and deep learning methods can be abstracted in this form; it is in effect a simple tautology.
NOTE however that the function iteration might not be able to reach $\mathbb{1}_A$, because in general $\mathbb{1}_A$ is not in the set of computable functions $\mathfrak{C}$, as will be discussed in the next section.
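To make the abstraction tangible, here is a toy sketch of a learning iteration $Y$ and its fixed point. The one-dimensional threshold classifier and the damped midpoint update rule are invented for illustration and are not the paper's construction:

```python
# Toy learning iteration (hypothetical): the candidate classifier f is a
# 1-D threshold, f(x) = 1 iff x >= theta, so the accept set is [theta, inf).
# Y maps one classifier to the next; its fixed point fits the training set T.

def make_classifier(theta):
    return lambda x: 1 if x >= theta else 0

def Y(theta, training_set, rate=0.5):
    # hypothetical update rule: nudge the threshold toward the midpoint
    # between the largest negative and smallest positive training point
    neg = max(x for x, y in training_set if y == 0)
    pos = min(x for x, y in training_set if y == 1)
    return theta + rate * ((neg + pos) / 2 - theta)

T = [(0.2, 0), (0.4, 0), (0.6, 1), (0.9, 1)]
theta = 0.0
for _ in range(50):
    theta = Y(theta, T)            # the orbit theta_0, theta_1, ... of Y

f_star = make_classifier(theta)
print([f_star(x) for x, y in T])   # [0, 0, 1, 1]: the fixed point fits T
```

Here the orbit converges geometrically to the fixed point $\theta = 0.5$; a chaotic learning iteration, by contrast, would never settle in this way.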
2.3. The Cut Function Approximation of $\mathbb{1}_A$
Whether or not $x \in A$ is a decision, with $f(x) = 1$ when $x \in A$ and $f(x) = 0$ when $x \notin A$. Therefore, $f$ clearly partitions the input space $X$. The question of the mechanism of computable partitioning requires a definition.
Definition 2.1.
Binary Decision Tree.
A binary tree in which each node has a decider Turing Machine (definition A.11) drawn from a finite set of decider machines, such that:
Given the input $x$ to the current node, the node's decider produces one of four verdicts. If the verdict is ‘reject’, the input is rejected and the system halts. If the verdict is ‘accept’, the input is accepted and the system halts. If the verdict is ‘left’, $x$ is passed as input to the left child; if no left child exists, the input $x$ is rejected and the system halts. If the verdict is ‘right’, $x$ is passed as input to the right child; if no right child exists, the input $x$ is rejected and the system halts.
This structure (definition 2.1) partitions the input space $X$ into a finite number of equivalent partitions $X_i$ such that:
each $X_i$ is either a subset of $A$ or a subset of its complement, but cannot be both. We also note that if $i \neq j$ then $X_i \cap X_j = \emptyset$. Clearly then $X$ is a disjoint union of sets:
$X = \bigsqcup_i X_i$.
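A toy instance of the binary decision tree of definition 2.1 can be sketched as follows; the deciders here are ordinary predicates standing in for decider Turing machines, and the thresholds are invented for illustration:

```python
# Toy binary decision tree (definition 2.1, sketch): each node holds a
# decider d(x) returning one of "accept", "reject", "left", "right".
# A missing child means: reject and halt.

class Node:
    def __init__(self, decider, left=None, right=None):
        self.decider, self.left, self.right = decider, left, right

def run(node, x):
    while True:
        verdict = node.decider(x)
        if verdict in ("accept", "reject"):
            return verdict == "accept"
        child = node.left if verdict == "left" else node.right
        if child is None:                  # no such child: reject and halt
            return False
        node = child

# the deciders partition the input space [0, 1) into disjoint pieces
root = Node(lambda x: "left" if x < 0.5 else "right",
            left=Node(lambda x: "accept" if x < 0.25 else "reject"),
            right=Node(lambda x: "reject" if x < 0.75 else "accept"))

print([run(root, x) for x in (0.1, 0.3, 0.6, 0.9)])
# [True, False, False, True]
```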
3. Probability, Chaos and Learning Iterations
We demonstrate that a general iterative learning system will be ‘almost surely’ [6][7] chaotic [8][11].
3.1. Properties of the Learning Iteration
We need some definitions to formally present the ideas discussed before.
Definition 3.1.
Rational Mapping of a Computable Function.
Given a universal Turing machine (definition A.13), any computable function $f$ can be encoded using the symbols on the tape of the machine. The rationalisation (definition A.16) of the tape then serves as the rationalisation of the computable function:
$r : \mathfrak{C} \to \mathbb{Q}$,
where $\mathfrak{C}$ is the set of computable functions. This makes $r(f) \in \mathbb{Q}$.
Definition 3.2.
The Computable Functional.
A computable function is called a functional iff the range of the function can be interpreted as an encoding of a computable function; that is, its outputs are rationalisations of functions in $\mathfrak{C}$.
Definition 3.3.
The Learning Iteration.
Given a computable functional $Y$ and input (training set) $T$, the learning iteration is defined as:
$r(f_{n+1}) = Y(r(f_n), T)$,
with $f_n \in \mathfrak{C}$. Here $r(f_n)$ signifies the rationalisation (definition 3.1) of the function $f_n$.
Theorem 3.1.
Learning Iteration is Almost Surely Chaotic.
The learning iteration, as defined in definition (3.3), is almost surely chaotic.
Proof.
We note that $r(f_n) \in \mathbb{Q}$ for every $n$; from (theorem A.1), the theorem on chaos in computation, almost every rational sequence is chaotic, so the result is immediate. ∎
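The claim can be illustrated (though of course not proved) numerically. The logistic map $x \mapsto 4x(1-x)$ is a standard chaotic map [8], used here purely as a stand-in for a learning iteration on rationals; two seeds agreeing to twelve decimal places end up macroscopically far apart:

```python
# Numerical illustration (not a proof) of sensitive dependence: iterate the
# logistic map x -> 4x(1-x) from two seeds that agree to twelve decimal
# places; the orbits separate completely.

def step(x):
    return 4.0 * x * (1.0 - x)

xa, xb = 0.123456789, 0.123456789 + 1e-12
diffs = []
for i in range(80):
    xa, xb = step(xa), step(xb)
    if i >= 50:                      # after the tiny gap has been amplified
        diffs.append(abs(xa - xb))

print(max(diffs) > 0.1)  # True: the nearby orbits have flown apart
```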
Theorem 3.2.
Almost Sure Non-Computability of the Converging Function.
The iteration of definition (3.3), when it converges, almost surely converges to an uncomputable function.
Proof.
This is trivial from real analysis. We know that the sequence $\{r(f_n)\} \subset \mathbb{Q}$, so to complete the space (definition A.4) we need the embedding space $\mathbb{R}$. Clearly then, almost all limit points would be irrational (actually transcendental), with a cardinality equal to the continuum, which is a well-known theorem from real analysis [2]. Irrational numbers take infinitely many symbols to encode; therefore, the limit function cannot be encoded on a Turing machine tape, and hence is obviously non-computable.
This is precisely what we wanted to show. ∎
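The mechanism behind the proof can be illustrated with exact rational arithmetic. Newton's iteration for $\sqrt{2}$ (chosen purely for illustration, not the paper's iteration) is a convergent rational sequence whose iterates need ever more symbols to encode, so its limit cannot be written on any finite tape:

```python
from fractions import Fraction

# Illustration: a convergent iteration on exact rationals (Newton's
# iteration for sqrt(2)) produces iterates whose encodings grow without
# bound, while the values approach an irrational number -- which no
# finite tape can hold.

x = Fraction(1)
digits = []
for _ in range(6):
    x = x / 2 + 1 / x                   # every iterate is still rational
    digits.append(len(str(x.denominator)))

print(digits)                           # encoding size grows every step
print(abs(float(x) - 2 ** 0.5) < 1e-9)  # True: the values converge to sqrt(2)
```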
3.2. Properties of Self Similarity in Decision Tree
We start with a definition of self similarity.
Definition 3.4.
Self Similarity.
A space $X$ is called self similar iff there exists a finite set $H$ of homeomorphic (definition A.8), non-surjective maps $h : X \to X$ such that $X = \bigcup_{h \in H} h(X)$.
Theorem 3.3.
Infinite Binary Decision Trees have Self Similar Structure.
Proof.
We note that every node in the tree first transforms the input space using its decider; both the domain and the image are homeomorphic to $X$. After that, this space is partitioned using the deciders, which are finite in number as discussed in section (2). These individual partitions are then homeomorphic to $X$. Suppose $n$ defines the number of partitions. We can then say there are $n$ homeomorphic (on $X$) non-surjective functions available for each node. This is due to lemma (A.1).
The set of such functions is finite, because $n$ and the set of deciders are finite. Suppose then that the set of such compositions is termed $H$. It is then trivial that, starting at the root, $X$ is the union of the images of itself under the finitely many maps in $H$, which is precisely self similarity.
Therefore, infinite binary decision trees have a self similar structure. ∎
In fact, it is well known that the Cantor Set [9][10][8] is a generalisation of this sort of structure (a dyadic tree).
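The connection can be made concrete with the middle-thirds Cantor set: accept $x \in [0, 1)$ iff its ternary expansion avoids the digit 1. The finite-depth membership test below is an illustrative stand-in for the infinite tree; note how an accepted point has rejected points right next to it:

```python
# Finite-depth stand-in for the infinite dyadic tree: membership in the
# middle-thirds Cantor set. Walking the ternary expansion, the digit 1
# means x fell into a removed middle third, so it is rejected.

def in_cantor(x, depth=20):
    for _ in range(depth):
        x *= 3
        d = int(x)
        if d == 1:
            return False
        x -= d
    return True

print(in_cantor(0.25), in_cantor(0.7))   # True True: both are accepted
print(in_cantor(0.5), in_cantor(0.26))   # False False: both are rejected
# 0.26 sits right next to the accepted 0.25: rejected points are dense
```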
Theorem 3.4.
Convergent ML Function would have Self Similar Structure.
If an ML iteration (definition 3.3) converges to a function $f_*$, then the accept set of $f_*$ would almost surely have a self similar structure.
Proof.
Almost surely the function is uncomputable (theorem 3.2). We note that, due to the fixed size of the learner algorithm, the number of deciders the learning system can use stays finite. Therefore, to be convergent, the structure has to be extended to infinity: that is, an infinite binary decision tree (definition 2.1). Now, using theorem (3.3), the result is immediate. ∎
4. Chaos And Machine Learning: A Summary
The theorems proven in the preceding sections can be used to deduce very interesting properties of machine learning, which clearly showcase the problems arising from the chaotic nature of Universal Computation.

The first phenomenon, the non-converging ML process experienced by the co-author (section 1), is clearly explained by the inherently chaotic (theorem 3.1) nature of the ML iteration. The functions, encoded in the rational space, almost surely have a chaotic orbit, and that is why the output of function $f_{n+1}$ can drastically differ from that of $f_n$, given the same input. However, we know very well that although almost all numbers are normal and transcendental, proving a given number to be so is a hard problem; such is the problem here. Proving that a specific behaviour is chaotic can be done only on a case-by-case basis.

For the second phenomenon reported [12]: as stated clearly in [12], the same ML algorithm trained on a different subset of the original training set faces the same problem. This is happening because in this case the ML iteration is actually converging, and generating a fractal partition of the input space (theorem 3.4). This is precisely what they found: the input-to-output mapping is mostly discontinuous. They also noted that for each input which gets accepted, there is a dense (definition A.7) set of inputs which get rejected. This is a standard property of fractal sets [8].
So, to summarise: when the iteration converges, the result would ‘almost surely’ be a fractal set. Iterations from different initial conditions would in fact converge to different indicator functions, and hence would accept different fractal sets. Albeit different, each such accept set should in fact be dense (definition A.7) in the input space.
This clearly demonstrates the chaotic nature of the ML iteration convergence.
Finally, we end on the same note they do: indeed, any form of computable machinery would exhibit behaviour of this kind. These chaotic behaviours are in effect universal, as clearly demonstrated in this paper.
Appendix A Definitions Used
Definition A.1.
Fixed Point of a function.
For a function $f : X \to X$, a point $p \in X$ is said to be a fixed point iff $f(p) = p$.
Definition A.2.
Metric Space.
A metric space is an ordered pair $(M, d)$ where $M$ is a set and $d$ is a metric on $M$, i.e., a function:
$d : M \times M \to \mathbb{R}$
such that for $x, y, z \in M$, the following holds:
1. $d(x, y) \ge 0$,
2. $d(x, y) = 0$ iff $x = y$,
3. $d(x, y) = d(y, x)$,
4. $d(x, z) \le d(x, y) + d(y, z)$.
The function ‘$d$’ is also called the “distance function” or simply the “distance”.
Definition A.3.
Cauchy Sequence in a Metric Space $(M, d)$.
Given a metric space $(M, d)$, the sequence $x_1, x_2, x_3, \ldots$ of points in $M$ is called a ‘Cauchy Sequence’ if for every positive real number $\epsilon$ there is a positive integer $N$ such that for all natural numbers $m, n > N$ the following holds:
$d(x_m, x_n) < \epsilon$.
Roughly speaking, the terms of the sequence are getting closer and closer together in a way that suggests that the sequence ought to have a limit in $M$. Nonetheless, such a limit does not always exist within $M$.
Note that by the term ‘sequence’ we are implicitly assuming an infinite sequence, unless otherwise specified.
Definition A.4.
Complete Metric Space.
A metric space $(M, d)$ is called complete (or Cauchy) iff every Cauchy sequence (definition A.3) of points in $M$ has a limit that is also in $M$.
As an example of a not-complete metric space take $M = \mathbb{Q}$, the set of rational numbers. Consider for instance the sequence defined by $x_1 = 1$ and $x_{n+1} = \frac{x_n}{2} + \frac{1}{x_n}$, with the distance $d(x, y) = |x - y|$ being the standard difference. This is a Cauchy sequence of rational numbers, but it does not converge to any rational limit: it converges to $\sqrt{2}$, but $\sqrt{2} \notin \mathbb{Q}$.
The closed interval $[0, 1]$ is an example of a complete metric space.
Definition A.5.
Orbit.
Let $f : X \to X$ be a function. The sequence $\{x, f(x), f^2(x), f^3(x), \ldots\}$, where $f^{n+1}(x) = f(f^n(x))$, is called an orbit (more precisely the ‘forward orbit’) of $x$.
$x$ is said to have a ‘closed’ or ‘periodic’ orbit if $f^n(x) = x$ for some $n \ge 1$.
Definition A.6.
Topological Space.
Let the empty set be written as $\emptyset$. Let $\mathcal{P}(X)$ denote the power set of $X$, i.e. the set of all subsets of $X$. A topological space is a set $X$ together with $\tau \subseteq \mathcal{P}(X)$ satisfying the following axioms:
1. $\emptyset \in \tau$ and $X \in \tau$,
2. $\tau$ is closed under arbitrary union,
3. $\tau$ is closed under finite intersection.
The set $\tau$ is called a topology of $X$.
Definition A.7.
Dense Set.
Let $A$ be a subset of a topological space $X$. $A$ is dense in $X$ if, for any point $x \in X$, any neighborhood of $x$ contains at least one point from $A$.
The real numbers with the usual topology have the rational numbers as a countable dense subset.
Definition A.8.
Homeomorphism.
A function $f : X \to Y$ between two topological spaces $(X, \tau_X)$ and $(Y, \tau_Y)$ is called a homeomorphism if it has the following properties:
1. $f$ is a bijection (one-to-one and onto),
2. $f$ is continuous,
3. the inverse function $f^{-1}$ is continuous ($f$ is an open mapping).
Lemma A.1.
Existence of Homeomorphic functions on Partitions.
Let $X_1, X_2, \ldots, X_n$ be a partition of a space $X$, with each $X_i$ homeomorphic to $X$, such that:
$X = \bigsqcup_{i=1}^{n} X_i$;
then there exist $n$ homeomorphic functions $h_i$ from $X$ onto the $X_i$.
Definition A.9.
Bounded Sequence.
A sequence $\{x_n\}$ is called a bounded sequence iff:
$\exists\, l, u \in \mathbb{R}$ such that $l \le x_n \le u$ for all $n$.
The number ‘$l$’ is called the lower bound of the sequence and ‘$u$’ is called the upper bound of the sequence.
Lemma A.2.
Bolzano-Weierstrass.
Every bounded sequence has a convergent (Cauchy) subsequence.
It is to be noted that a bounded sequence may have many convergent subsequences (for example, a sequence consisting of a counting of the rationals has subsequences converging to every real number) or rather few (for example, a convergent sequence has all of its subsequences converging to the same limit).
Definition A.10.
Turing Machine.
A “Turing Machine” is a 7-tuple $(Q, \Sigma, \Gamma, \delta, q_0, q_{accept}, q_{reject})$, where:
1. $Q$ is the set of states,
2. $\Sigma$ is the set of input alphabets, not containing the blank symbol ‘$\sqcup$’,
3. $\Gamma$ is the tape alphabet, where $\sqcup \in \Gamma$ and $\Sigma \subseteq \Gamma$,
4. $\delta : Q \times \Gamma \to Q \times \Gamma \times \{L, R\}$ is the transition function,
5. $q_0 \in Q$ is the start state,
6. $q_{accept} \in Q$ is the accept state,
7. $q_{reject} \in Q$ is the reject state.
According to the standard notion $q_{reject} \neq q_{accept}$, but we omit this requirement here, as we are not going to distinguish between the two different types of halting (‘accept and halt’ vs ‘reject and halt’) of Turing Machines.
A Turing Machine ‘$M$’ (definition A.10) computes as follows.
Initially ‘$M$’ receives the input $w = w_1 w_2 \ldots w_n \in \Sigma^*$ on the leftmost ‘$n$’ squares of the tape, and the rest of the tape is filled with the blank symbol ‘$\sqcup$’. The head starts on the leftmost square of the tape. As the input alphabet ‘$\Sigma$’ does not contain the blank symbol ‘$\sqcup$’, the first ‘$\sqcup$’ marks the end of the input.
Once ‘$M$’ starts, the computation proceeds according to the rules of ‘$\delta$’. However, if the head is already at the leftmost position, then even if the ‘$\delta$’ rule says move ‘$L$’, the head stays there.
The computation continues until the current state of the Turing Machine is either $q_{accept}$ or $q_{reject}$. Failing that, the machine will continue running forever.
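A minimal simulator of the 7-tuple above can be sketched as follows; the state names and the transition table are invented for illustration (the example machine decides whether a string of ‘1’s has even length):

```python
# Minimal Turing Machine simulator (sketch of definition A.10). The states
# and the transition table delta below are hypothetical; the machine
# decides whether its input, a string of '1's, has even length.

BLANK = '_'

def run_tm(delta, q0, q_accept, q_reject, tape_input):
    tape = dict(enumerate(tape_input))
    state, head = q0, 0
    while state not in (q_accept, q_reject):
        state, write, move = delta[(state, tape.get(head, BLANK))]
        tape[head] = write
        # an 'L' at the leftmost square leaves the head where it is
        head = max(head + (1 if move == 'R' else -1), 0)
    return state == q_accept

delta = {
    ('even', '1'):   ('odd',    '1',   'R'),
    ('odd',  '1'):   ('even',   '1',   'R'),
    ('even', BLANK): ('accept', BLANK, 'R'),
    ('odd',  BLANK): ('reject', BLANK, 'R'),
}

print([run_tm(delta, 'even', 'accept', 'reject', '1' * n) for n in range(4)])
# [True, False, True, False]
```

Because every run of this particular table reaches $q_{accept}$ or $q_{reject}$, this machine is also a decider in the sense of the next definition.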
Definition A.11.
Decider Turing Machine.
A Turing Machine which is guaranteed to halt on any input (i.e. to reach one of the states $\{q_{accept}, q_{reject}\}$) is called a decider.
Definition A.12.
Undecidable Problem.
If for a given problem it is impossible to construct a decider (definition A.11) Turing Machine, then the problem is called an undecidable problem.
Definition A.13.
Universal Turing Machine.
A ‘UTM’, or ‘Universal Turing Machine’, is a Turing Machine (definition A.10) that can simulate an arbitrary Turing machine on arbitrary input.
Definition A.14.
Church Turing Thesis.
Every function that is effectively calculable, i.e. computable by any mechanical procedure, is computable by a Turing machine (definition A.10).
Definition A.15.
Gödelization (Gödel).
Any string from an alphabet set $\Sigma$ can be represented as an integer in base ‘$b$’ with $b = |\Sigma|$. To achieve this, create a one-one and onto Gödel map $g : \Sigma \to \{0, 1, \ldots, b-1\}$, where each symbol is assigned a unique digit.
Gödelization, or $G$, is then defined as follows:
A string of the form $s = a_1 a_2 \ldots a_n$, with $a_i \in \Sigma$, can be mapped to an integer as follows [11]:
$G(s) = \sum_{k=1}^{n} g(a_k)\, b^{\,n-k}$.
The common decimal system is a typical example of Gödelization of symbols from $\{0, 1, \ldots, 9\}$. The binary system represents Gödelization of symbols from $\{0, 1\}$. As a far-fetched example, any string from the whole English alphabet can be written as a base-26 integer!
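As a sketch, assuming the hypothetical digit assignment ‘a’→0, …, ‘z’→25 (any one-one, onto assignment would do), the base-26 Gödelization reads a string as a positional numeral:

```python
# Sketch of Gödelization: map each symbol of the alphabet to a digit
# 0..b-1 (a hypothetical one-one, onto Gödel map g) and read the string
# as a base-b integer, exactly as in the decimal and binary examples.

def godel(s, alphabet='abcdefghijklmnopqrstuvwxyz'):
    b = len(alphabet)
    g = {ch: i for i, ch in enumerate(alphabet)}  # the Gödel map g
    n = 0
    for ch in s:
        n = n * b + g[ch]
    return n

print(godel('ba'))                  # 26: 'b' is digit 1, 'a' is digit 0
print(godel('101', alphabet='01'))  # 5: ordinary binary
```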
Definition A.16.
Rationalization.
The rationalization of a string $s = a_1 a_2 \ldots a_n$ over an alphabet of size $b$ is the rational number in $[0, 1]$ obtained by placing the Gödel digits (definition A.15) after the radix point in base ‘$b$’:
$r(s) = \sum_{k=1}^{n} g(a_k)\, b^{-k}$.
Theorem A.1.
Bounded non-repeating sequences are chaotic.
Suppose $\{x_n\}$ is an infinite bounded sequence such that $x_m \neq x_n$ whenever $m \neq n$; then $\{x_n\}$ is chaotic. Thus, given a bounded sequence, it is almost surely chaotic [11].
References

[1] Vapnik, V., The Nature of Statistical Learning Theory (Information Science and Statistics). Springer, 2nd ed., 2000.
[2] Royden, H. L., Real Analysis. Prentice Hall of India Private Limited, 2003.
[3] Arrowsmith, D. K. and Place, C. M., An Introduction to Dynamical Systems. Cambridge University Press.
[4] Brin, Michael and Stuck, Garrett, Introduction to Dynamical Systems. Cambridge University Press.
[5] Turing, A. M., On Computable Numbers, with an Application to the Entscheidungsproblem. Proceedings of the London Mathematical Society, 2(42): 230–265.
[6] Feller, W., An Introduction to Probability Theory and Its Applications, Vol. 2, 3rd ed. New York: Wiley, 1968.
[7] Williams, David, Probability with Martingales. Cambridge University Press, 1991.
[8] Peitgen, Heinz-Otto, Jürgens, Hartmut and Saupe, Dietmar, Chaos and Fractals: New Frontiers of Science. Springer.
[9] Smith, Henry J. S., On the integration of discontinuous functions. Proceedings of the London Mathematical Society, Series 1, vol. 6, pages 140–153, 1874.
[10] Cantor, Georg, Über unendliche, lineare Punktmannigfaltigkeiten V [On infinite, linear point-manifolds (sets)]. Mathematische Annalen, vol. 21, pages 545–591, 1883.
[11] Mondal, Nabarun and Ghosh, Partha P., Universal Computation is ‘Almost Surely’ Chaotic. Theoretical Computer Science, 2014, doi:10.1016/j.tcs.2014.07.005.
[12] Szegedy, Christian, Zaremba, Wojciech, Sutskever, Ilya, Bruna, Joan, Erhan, Dumitru, Goodfellow, Ian and Fergus, Rob, Intriguing properties of neural networks. arXiv, http://arxiv.org/abs/1312.6199.