A Formulation of Recursive Self-Improvement and Its Possible Efficiency

by Wenyi Wang, et al.
The University of British Columbia

Recursive self-improving (RSI) systems have been dreamed of since the early days of computer science and artificial intelligence. However, many existing studies on RSI systems remain philosophical and lack clear formulations and results. In this paper, we provide a formal definition for one class of RSI systems, and then demonstrate the existence of computable and efficient RSI systems on a restricted version. We use simulation to show empirically that we achieve logarithmic runtime complexity with respect to the size of the search space. These results suggest that efficient recursive self-improvement is possible.






1 Introduction

Recursive self-improving systems create new software iteratively. The newly created software should be better at creating future software. With this property, the system has the potential to completely rewrite its original implementation and take completely different approaches [8]. Chalmers’ proportionality thesis [1] hypothesizes that an increase in the capability of creating future systems proportionally increases the intelligence of the resulting system. Under this hypothesis, he shows that if a process iteratively generates a more intelligent system using the current system, then this process leads to a phenomenon many refer to as superintelligence. However, many existing studies of RSI systems remain philosophical or lack clear mathematical formulations or results, e.g. [4, 5]. Some mathematically clear work on this topic exists, but it mostly focuses on the architectures and methodologies to implement such systems [6, 2]. Our work aims to overcome this weakness by providing a mathematical formulation for a class of RSI procedures. With this formulation, we show that computable RSI systems of this kind exist. We further show in simulation that this procedure takes logarithmic runtime with respect to the size of the search space to find the best program.

2 The Mathematical Formulation for A Family of RSI Systems

In this section, we develop a mathematical formulation for a family of RSI systems. To this end, we first examine the necessary elements of an RSI system. An RSI system iteratively improves its current program’s ability to generate “good” future programs. Two concepts are crucial. First, an RSI system can be viewed as a sequence of programs in which each program generates the next. Second, the programs in the sequence have increasing ability to create future programs. Therefore, defining an RSI procedure requires a set of programs that can generate programs and an ordering of those programs by their ability to improve future programs. In the following, we consider a finite search space of programs that generate programs and a total order over it. Notice that a total order over a finite set can be represented by a score function. Denote the score function over the set of programs by S. For convenience, let a lower score represent a higher order. In other words, our objective is to minimize the score function S. An RSI system can then be described as follows:

Definition 1 (RSI system)

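Given a finite set of programs and a score function S over that set, initialize the system’s current program from the set. Repeat until a certain criterion is satisfied: generate a candidate program using the current program; if the candidate is better than the current program according to S, replace the current program with the candidate.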

This definition leaves open how the current program generates a new program. In general, we should allow the RSI system to generate programs based on the history of the entire process. We make the simplifying assumption that any program in the sequence is independent of all earlier programs given the immediately preceding program. In other words, the way a program generates a new program is independent of the history, and each program defines a fixed probability distribution over the set of programs. This procedure defines a homogeneous Markov chain. We will see that even with this restriction, for some score functions, the model is able to achieve desirable runtime performance.
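The loop in Definition 1, combined with the Markov assumption, can be sketched in a few lines of Python. This is only an illustrative sketch: the function name `run_rsi`, the matrix layout `weights[p][q]` (probability that program `p` generates program `q`), the fixed step budget used as the stopping criterion, and the 4-program numbers are all assumptions made here, not values taken from the paper.

```python
import random

def run_rsi(weights, scores, start, max_steps, rng):
    """Simulate one run of the RSI loop under the Markov assumption.

    weights[p][q] is the probability that program p generates program q,
    scores[p] is the score of program p (lower is better).
    Returns the index of the current program after each step.
    """
    programs = list(range(len(weights)))
    current = start
    history = [current]
    for _ in range(max_steps):
        # The current program proposes a candidate from its fixed distribution.
        candidate = rng.choices(programs, weights=weights[current])[0]
        # Accept the candidate only if it is strictly better (lower score).
        if scores[candidate] < scores[current]:
            current = candidate
        history.append(current)
    return history

# Hypothetical 4-program instance; these numbers are illustrative only
# and are not the values used in the paper's example.
example_weights = [
    [0.25, 0.25, 0.25, 0.25],
    [0.10, 0.20, 0.30, 0.40],
    [0.40, 0.30, 0.20, 0.10],
    [0.00, 0.00, 0.00, 1.00],
]
example_scores = [3.0, 2.0, 1.0, 0.0]
example_history = run_rsi(example_weights, example_scores, 0, 20, random.Random(0))
```

Because a candidate is accepted only when its score is strictly lower, the score of the current program never increases along a run, and the lowest-scoring program behaves as an absorbing state.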

We illustrate the proposed formulation with an example. Consider a set of four programs and a score function over it such that . According to our formulation, each program can be abstracted as a probability distribution over the set of programs. To specify the distributions, associate with each program a vector of probability weights of length 4 that represents its distribution over the four programs. In this example we set

Then a possible RSI procedure may do the following. It starts from . First generates . Since , the current program is not updated. Then generates . The current program is updated to because . Next generates , and the current program updates to . Since has the lowest score (highest order), the current program will not be updated again. Figure 1 shows the corresponding Markov chain.








Figure 1: The Markov chain corresponding to the RSI procedure defined by the given scores and program generation probabilities in the example.

3 The Score Function as Expected Number of Steps

The last section defines an RSI procedure given a finite set of programs and a score function over it. We have specified the programs, but not the score function. Recall that the score function measures the programs’ ability to generate “good” future programs. We assume that some utility measure is available that captures the “goodness” of a program. This measure need not be the same as the score function. Since the goal of these RSI systems is to find “good” programs, there needs to be a subset of target programs. Without loss of generality, we can assume there is a unique target program. This is possible because the subsequent analysis treats the target program as an absorbing state (a state that, once entered, cannot be left) of the Markov process.

A reasonable utility measure is the expected number of steps needed to reach the optimal program from a given program under our RSI definition. Furthermore, the score function needs to be consistent with the expected number of steps from each program to the optimal program under the process that the score function itself defines. We say that a score function is consistent if, for any two programs, a higher score for the first implies that the expected number of steps to reach the optimal program from the first is greater than from the second. More generally, if one takes some other measure of a program’s ability to generate future programs, the score function needs to be consistent with that measure.

In the following, we describe how to construct a consistent score function. We construct the score function as the expected number of steps to reach the optimal program. To do this, we iteratively assign scores in nondecreasing order. An intermediate Markov chain always follows the transition rules defined by the program distributions and the current scores. The optimal program should clearly have the minimum score (a smaller score represents a more preferred program). Initially, add the optimal program to the Markov chain and set its score to zero. Set the scores of all other programs to infinity. Then repeat the following until all programs have a finite score. At each step, find the program that still has an infinite score and has the minimum expected number of steps to reach the optimal program. Update the score of that program to its expected number of steps to reach the optimal program. The Markov chain changes whenever the scores change. This process of computing the score function can be done in time by dynamic programming, which is similar to Dijkstra’s algorithm, where is the number of programs, and is the sum of the number of possible programs that each program can generate. We do not describe the efficient way to compute it, since the emphasis is on the existence and computability of this function.
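The construction above can be sketched directly, without the more efficient dynamic-programming variant. The sketch rests on one reading of the text that should be treated as an assumption: a program whose score is still infinite moves only when it generates an already-scored program and otherwise stays put, so its expected number of steps is (1 + sum of w(p, q) S(q) over scored q) divided by the total probability of generating a scored program. The function name `consistent_scores` and the matrix layout `weights[p][q]` are hypothetical.

```python
import math

def consistent_scores(weights, target):
    """Assign each program its expected number of steps to reach `target`.

    weights[p][q] is the probability that program p generates program q.
    Scores are assigned one program at a time, smallest expected steps first.
    """
    n = len(weights)
    scores = [math.inf] * n
    scores[target] = 0.0          # the optimal program is absorbing
    scored = [target]
    while len(scored) < n:
        best_p, best_e = None, math.inf
        for p in range(n):
            if scores[p] != math.inf:
                continue
            # Probability that p generates some already-scored program.
            mass = sum(weights[p][q] for q in scored)
            if mass == 0.0:
                continue          # p cannot leave its state yet
            # Solve E = 1 + sum_q w[p][q] * S(q) + (1 - mass) * E for E.
            e = (1.0 + sum(weights[p][q] * scores[q] for q in scored)) / mass
            if e < best_e:
                best_p, best_e = p, e
        if best_p is None:
            break                 # remaining programs never reach the target
        scores[best_p] = best_e
        scored.append(best_p)
    return scores

# Hypothetical 3-program instance with program 0 as the optimal program.
demo_scores = consistent_scores(
    [[1.0, 0.0, 0.0], [0.5, 0.5, 0.0], [0.25, 0.5, 0.25]], target=0
)
```

This direct version rescans every unscored program in each round, so it is slower than the Dijkstra-like procedure mentioned in the text, but it makes the order in which scores are fixed easy to inspect.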

Two nice properties hold for this construction. First, the programs are added in nondecreasing order of score. Second, the score function equals the expected number of steps to reach the optimal program under the process defined by this score function. We will prove the first property. The second property and the consistency of the score function follow directly from the first. Before the proof, we give an example of how such a score function is computed, given the optimal program and the distribution with which each program generates programs.

Consider the same abstraction of programs as the example in Section 2, where with corresponding probabilistic weights

Fix to be the optimal program. Initially set and . The transition function of the initial Markov chain is

At the first step, the expected number of steps from following the current Markov chain are . Hence we update . Because of the change of score, the transitions of the Markov chain change to

Then we compute the expected number of steps from and following the updated Markov chain. By some arithmetic we get that the expectations are for and (approximately) for . Since , update . By a similar procedure, one can compute the score for .


Let be the program being added to the Markov process. Denote the resulting score function by . We need to show that for all feasible ’s.

Prove by induction on . The base case is true since and because it is some expected number of steps. For , assume holds for all . By definition we know that equals the expected number of steps from to reach following the Markov chain at step . Denote the expected number of steps from to reach following the Markov chain at step by E. E satisfies the equation that

where is the probability that generates . Therefore,

By the constructive process we know that .

Since for all , the Markov chain at step is the Markov chain at step with some transitions from to , where . Since for all programs , at step , there is no transition between ’s for . Therefore, similar as ,


Denote by and by . Then and . Since , . Hence . Thus .

4 Simulation Results

We test the performance of the proposed RSI procedure in simulation with randomly generated abstractions of programs. For each of the experiments, a fixed number of programs is chosen from . The first program is designed to generate programs uniformly over all programs. Every other program generates programs according to a weighted distribution over a subset of programs. The sizes of the subsets are drawn i.i.d. from the uniform distribution over the integers between 10 and 100. Given the size of a subset, the subset and the corresponding weights are drawn uniformly over the feasible supports. With 10 repeats for each number of programs, the expected number of steps for the first program to reach the optimal program and its rank among all programs are shown in Figure 2. Figure 2(a) suggests a linear relation between and the expected number of steps, and Figure 2(b) suggests a linear relation between and the rank of the first program. A linear regression model fit to and the expected number of steps has an R-squared value of 0.983, which indicates that the linear model explains most of the variance in the data. Similarly, the linear regression fit to and the rank of the first program has an R-squared value of 1.0.
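A random abstraction of programs of the kind described in this section could be generated roughly as follows. This is a sketch under stated assumptions: the function name `random_program_weights` is hypothetical, "weights drawn uniformly" is interpreted as a uniform draw from the probability simplex (obtained by normalizing i.i.d. exponential samples), and the subset size is capped at the number of programs so that small instances remain valid.

```python
import random

def random_program_weights(n, rng, min_support=10, max_support=100):
    """Return an n x n matrix whose row p is the distribution of program p."""
    # The first program generates every program with equal probability.
    weights = [[1.0 / n] * n]
    for _ in range(1, n):
        # Subset size is uniform over the integers in [min_support, max_support].
        size = min(n, rng.randint(min_support, max_support))
        # The subset itself is a uniform sample without replacement.
        support = rng.sample(range(n), size)
        # Normalized i.i.d. exponentials are uniform on the simplex.
        raw = [rng.expovariate(1.0) for _ in support]
        total = sum(raw)
        row = [0.0] * n
        for q, r in zip(support, raw):
            row[q] = r / total
        weights.append(row)
    return weights

# Illustrative instance with 200 programs and a fixed seed.
demo_weights = random_program_weights(200, random.Random(0))
```

Each row is a probability distribution over all programs, so the resulting matrix can be used wherever program generation probabilities are needed.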

(a) Expected numbers of steps
(b) Ranks
Figure 2: These two figures show simulation results of the expected number of steps to the optimal program (a) and ranks (b) of the program that generates programs uniformly.
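For reference, the R-squared value of a simple linear regression can be computed as in the sketch below. The helper name `linear_fit_r2` is hypothetical, and the data in the usage lines are synthetic and exactly linear by construction; they are not the simulation results reported in this paper.

```python
import math

def linear_fit_r2(xs, ys):
    """Ordinary least-squares fit y = slope * x + intercept, with R-squared."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # R-squared is one minus the ratio of residual to total sum of squares.
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return slope, intercept, 1.0 - ss_res / ss_tot

# Synthetic data: a quantity that grows exactly linearly in log(size).
sizes = [100, 1000, 10000, 100000]
log_sizes = [math.log(s) for s in sizes]
synthetic_steps = [2.0 * x + 1.0 for x in log_sizes]
slope, intercept, r2 = linear_fit_r2(log_sizes, synthetic_steps)
```

An R-squared value close to 1 on a logarithmic x-axis is what one would expect if the measured quantity grows logarithmically in the size of the search space.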

For a fixed RSI system with , we run 100 simulations of the proposed procedure starting from the first program. Figure 3 shows an error-bar plot of the rank of the current program at different numbers of steps of the simulation. We see that, before some of the processes reach the optimal program, the ranks improve exponentially in a statistical sense. All of the processes converge to the globally optimal program.

Figure 3: A simulation result of running our proposed RSI procedure given the precomputed score function.

5 Discussion and Future Work

In summary, we formulate a family of RSI procedures. For a more restricted family of RSI procedures satisfying the Markov assumption, we prove that a consistent score function exists, and we describe an algorithm to compute it. We study the runtime of the restricted systems empirically. Experimental results suggest a logarithmic relation between the runtime and the number of programs. These results suggest that efficient recursive self-improvement is possible. For future work, one may extend the model by incorporating histories when generating a new program. Another possible extension is to model programs that take a program as an argument and return a suggested improvement of the given program. It is worth noting that in the simulations the score function is precomputed, which takes more time than enumerating every program to find the optimal one. From a practical point of view, to make the proposed procedure applicable, one needs to design an oracle score function that does not need to process all other programs at each evaluation. One possible approach is to let each program take as an argument a program design task that can be evaluated, and to evaluate a program based on its performance on such evaluable tasks. A more rigorous approach is to study a program’s reasoning about its own future behaviour, including rewrites. This phenomenon is referred to as Vingean reflection in Fallenstein and Soares’s work [3]. Alternatively, Steunebrink and Schmidhuber formulate it as a proof-finding problem [7]. At a high level, this problem remains open and challenging. Since a practical score function may not have the desired properties that we analyzed in the ideal case, it would be interesting to study the behaviour of the proposed procedures when the score function is biased, noisy or inconsistent.