# Reactive Control Improvisation

Reactive synthesis has become a widely-used paradigm for automatically building correct-by-construction systems that interact with an unknown or adversarial environment. We study how to do reactive synthesis when part of the specification of the system is that its behavior should be random. Such systems can be useful, for example a network protocol fuzz tester whose output should be varied, or a planner for a surveillance robot whose route should be unpredictable. However, existing reactive synthesis techniques do not provide a way to ensure random behavior while maintaining functional correctness. Towards this end, we introduce a notion of randomized reactive synthesis based on the recently-proposed framework of control improvisation (CI). This framework provides a natural way to integrate a randomness requirement with the usual functional specifications of reactive synthesis. We define a reactive version of CI which can be used to solve randomized reactive synthesis problems over finite windows. We theoretically characterize when such problems are realizable, and give a general method for solving them. For specifications given by reachability/safety games or deterministic finite automata, our method yields a polynomial-time synthesis algorithm. For various other types of specifications including temporal logic formulas, we obtain a polynomial-space algorithm and prove matching PSPACE-hardness results. In all of these cases, we show that randomized reactive synthesis is no harder in a complexity-theoretic sense than ordinary reactive synthesis.

## 1 Introduction

Many interesting programs, including protocol handlers, task planners, and concurrent software generally, are open systems that interact over time with an external environment. Synthesis of such reactive systems requires finding an implementation that satisfies the desired specification no matter what the environment does. This problem, reactive synthesis, has a long history (see [7] for a survey). Reactive synthesis from temporal logic specifications [18] has been particularly well-studied and is being increasingly used in applications such as hardware synthesis [3] and robotic task planning [14].

In this paper, we investigate how to synthesize reactive systems with random behavior: in fact, systems where being random in a prescribed way is part of their specification. This is in contrast to prior work on stochastic games where randomness is used to model uncertain environments or randomized strategies are merely allowed, not required. Solvers for stochastic games may incidentally produce randomized strategies to satisfy a functional specification (and some types of specification, e.g. multi-objective queries [4], may only be realizable by randomized strategies), but do not provide a general way to enforce randomness. Unlike most specifications used in reactive synthesis, our randomness requirement is a property of a system’s distribution of behaviors, not of an individual behavior. While probabilistic specification languages like PCTL [11] can capture some such properties, the simple and natural randomness requirement we study here cannot be concisely expressed by existing languages (even those as powerful as SGL [2]). Thus, randomized reactive synthesis in our sense requires significantly different methods than those previously studied.

However, we argue that this type of synthesis is quite useful, because introducing randomness into the behavior of a system can often be beneficial, enhancing variety, robustness, and unpredictability. Example applications include:

• Synthesizing a black-box fuzz tester for a network service, we want a program that not only conforms to the protocol (perhaps only most of the time) but can generate many different sequences of packets: randomness ensures this.

• Synthesizing a controller for a robot exploring an unknown environment, randomness provides a low-memory way to increase coverage of the space. It can also help to reduce systematic bias in the exploration procedure.

• Synthesizing a controller for a patrolling surveillance robot, introducing randomness in planning makes the robot’s future location harder to predict.

Adding randomness to a system in an ad hoc way could easily compromise its correctness. This paper shows how a randomness requirement can be integrated into the synthesis process, ensuring correctness as well as allowing trade-offs to be explored: how much randomness can be added while staying correct, or how strong can a specification be while admitting a desired amount of randomness?

To formalize randomized reactive synthesis we build on the idea of control improvisation, introduced in [6], formalized in [9], and further generalized in [8]. Control improvisation (CI) is the problem of constructing an improviser, a probabilistic algorithm which generates finite words subject to three constraints: a hard constraint that must always be satisfied, a soft constraint that need only be satisfied with some probability, and a randomness constraint that no word be generated with probability higher than a given bound. We define reactive control improvisation (RCI), where the improviser generates a word incrementally, alternating adding symbols with an adversarial environment. To perform synthesis in a finite window, we encode functional specifications and environment assumptions into the hard constraint, while the soft and randomness constraints allow us to tune how randomness is added to the system. The improviser obtained by solving the RCI problem is then a solution to the original synthesis problem.

The difficulty of solving reactive CI problems depends on the type of specification. We study several types commonly used in reactive synthesis, including reachability games (and variants, e.g. safety games) and formulas in the temporal logics LTL and LDL [17, 5]. We also investigate the specification types studied in [8], showing how the complexity of the CI problem changes when adding reactivity. For every type of specification we obtain a randomized synthesis algorithm whose complexity matches that of ordinary reactive synthesis (in a finite window). This suggests that reactive control improvisation should be feasible in applications like robotic task planning where reactive synthesis tools have proved effective.

In summary, the main contributions of this paper are:

• The reactive control improvisation (RCI) problem definition (Sec. 3);

• The notion of width, a quantitative generalization of “winning” game positions that measures how many ways a player can win from that position (Sec. 4);

• A characterization of when RCI problems are realizable in terms of width, and an explicit construction of an improviser (Sec. 4);

• A general method for constructing efficient improvisation schemes (Sec. 5);

• A polynomial-time improvisation scheme for reachability/safety games and deterministic finite automaton specifications (Sec. 6);

• PSPACE-hardness results for many other specification types including temporal logics, and matching polynomial-space improvisation schemes (Sec. 7).

Finally, Sec. 8 summarizes our results and gives directions for future work.

## 2 Background

### 2.1 Notation

Given an alphabet $\Sigma$, we write $|w|$ for the length of a finite word $w \in \Sigma^*$, $\lambda$ for the empty word, $\Sigma^n$ for the words of length $n$, and $\Sigma^{\le n}$ for $\bigcup_{0 \le i \le n} \Sigma^i$, the set of all words of length at most $n$. We abbreviate deterministic/nondeterministic finite automaton by DFA/NFA, and context-free grammar by CFG. For an instance $\mathcal{X}$ of any such formalism, which we call a specification, we write $L(\mathcal{X})$ for the language (subset of $\Sigma^*$) it defines (note the distinction between a language and a representation thereof). We view formulas of Linear Temporal Logic (LTL) [17] and Linear Dynamic Logic (LDL) [5] as specifications using their natural semantics on finite words (see [5]).

We use the standard complexity classes #P and PSPACE, and the PSPACE-complete problem QBF of determining the truth of a quantified Boolean formula. For background on these classes and problems see for example [1].

Some specifications we use as examples are reachability games [15], where players’ actions cause transitions in a state space and the goal is to reach a target state. We group these games, safety games where the goal is to avoid a set of states, and reach-avoid games combining reachability and safety goals [19], together as reachability/safety games (RSGs). We draw reachability games as graphs in the usual way: squares are adversary-controlled states, and states with a double border are target states.

### 2.2 Synthesis Games

Reactive control improvisation will be formalized in terms of a 2-player game which is essentially the standard synthesis game used in reactive synthesis [7]. However, our formulation is slightly different for compatibility with the definition of control improvisation, so we give a self-contained presentation here.

Fix a finite alphabet $\Sigma$. The players of the game will alternate picking symbols from $\Sigma$, building up a word. We can then specify the set of winning plays with a language over $\Sigma$. To simplify our presentation we assume that players strictly alternate turns and that any symbol from $\Sigma$ is a legal move. These assumptions can be relaxed in the usual way by modifying the winning set appropriately.

Finite words: While reactive synthesis is usually considered over infinite words, in this paper we focus on synthesis in a finite window, as it is unclear how best to generalize our randomness requirement to the infinite case. This assumption is not too restrictive, as solutions of bounded length are adequate for many applications. In fuzz testing, for example, we do not want to generate arbitrarily long files or sequences of packets. In robotic planning, we often want a plan that accomplishes a task within a certain amount of time. Furthermore, planning problems with liveness specifications can often be segmented into finite pieces: we do not need an infinite route for a patrolling robot, but can plan within a finite horizon and replan periodically. Replanning may even be necessary when environment assumptions become invalid. At any rate, we will see that the bounded case of reactive control improvisation is already highly nontrivial.

As a final simplification, we require that all plays have length exactly $n$. To allow a range of lengths, we can add a new padding symbol to $\Sigma$ and extend all shorter words to length $n$, modifying the winning set appropriately.

###### Definition 1

A history $h$ is an element of $\Sigma^{\le n}$, representing the moves of the game played so far. We say the game has ended after $h$ if $|h| = n$; otherwise it is our turn after $h$ if $|h|$ is even, and the adversary’s turn if $|h|$ is odd.

###### Definition 2

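A strategy is a function $\sigma : \Sigma^{\le n} \times \Sigma \to [0, 1]$ such that for any history $h \in \Sigma^{\le n}$ with $|h| < n$, $\sigma(h, \cdot)$ is a probability distribution over $\Sigma$. We write $x \leftarrow \sigma(h)$ to indicate that $x$ is a symbol randomly drawn from $\sigma(h, \cdot)$.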

Since strategies are randomized, fixing strategies for both players does not uniquely determine a play of the game, but defines a distribution over plays:

###### Definition 3

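Given a pair of strategies $(\sigma, \tau)$, we can generate a random play $\pi \in \Sigma^n$ as follows. Pick $\pi_0 \leftarrow \sigma(\lambda)$, then for $i$ from $1$ to $n-1$ pick $\pi_i \leftarrow \tau(\pi_0 \dots \pi_{i-1})$ if $i$ is odd and $\pi_i \leftarrow \sigma(\pi_0 \dots \pi_{i-1})$ otherwise. Finally, put $\pi = \pi_0 \dots \pi_{n-1}$. We write $P_{\sigma,\tau}(\pi)$ for the probability of obtaining the play $\pi$. This extends to a set of plays $X \subseteq \Sigma^n$ in the natural way: $P_{\sigma,\tau}(X) = \sum_{\pi \in X} P_{\sigma,\tau}(\pi)$. Finally, the set of possible plays is $\Pi_{\sigma,\tau} = \{\pi \in \Sigma^n \mid P_{\sigma,\tau}(\pi) > 0\}$.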

The next definition is just the conditional probability of a play given a history, but works for histories with probability zero, simplifying our presentation.

###### Definition 4

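For any history $h = h_0 \dots h_{k-1} \in \Sigma^{\le n}$ and word $w \in \Sigma^{n-k}$, we write $P_{\sigma,\tau}(w \mid h)$ for the probability that if we assign $\pi_i = h_i$ for $i < k$ and sample $\pi_k, \dots, \pi_{n-1}$ by the process above, then $\pi_k \dots \pi_{n-1} = w$.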
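The sampling process and play probabilities defined in this section can be sketched in a few lines of Python. This is only an illustration under stated assumptions: strategies are modelled as functions from a history string to a dictionary of move probabilities, and the names `uniform`, `sample_play`, and `play_probability`, as well as the three-symbol alphabet, are hypothetical choices made here.

```python
import random
from fractions import Fraction

ALPHABET = "+-="  # assumed three-symbol alphabet, for illustration only

def uniform(history):
    """Hypothetical strategy: ignore the history and move uniformly at random."""
    return {u: Fraction(1, len(ALPHABET)) for u in ALPHABET}

def sample_play(sigma, tau, n, rng=random):
    """Generate a play of length n: sigma moves on even turns, tau on odd turns."""
    play = ""
    for i in range(n):
        dist = (sigma if i % 2 == 0 else tau)(play)
        symbols = list(dist)
        weights = [float(dist[u]) for u in symbols]
        play += rng.choices(symbols, weights=weights)[0]
    return play

def play_probability(sigma, tau, play):
    """Probability of a complete play: the product of the per-move probabilities."""
    p = Fraction(1)
    for i, u in enumerate(play):
        dist = (sigma if i % 2 == 0 else tau)(play[:i])
        p *= dist.get(u, 0)
    return p
```

With both players uniform, every play of length 4 has probability $(1/3)^4$, so the set of possible plays is all of $\Sigma^4$.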

## 3 Problem Definition

### 3.1 Motivating Example

Consider synthesizing a planner for a surveillance drone operating near another, potentially adversarial drone. Discretizing the map into the 7x7 grid in Fig. 1 (ignoring the depicted trajectories for the moment), a route is a word over the four movement directions. Our specification is to visit the 4 circled locations in 30 moves without colliding with the adversary, assuming it cannot move into the 5 highlighted central locations.

Existing reactive synthesis tools can produce a strategy for the patroller ensuring that the specification is always satisfied. However, the strategy may be deterministic, so that in response to a fixed adversary the patroller will always follow the same route. Then it is easy for a third party to predict the route, which could be undesirable, and is in fact unnecessary if there are many other ways the drone can satisfy its specification.

Reactive control improvisation addresses this problem by adding a new type of specification to the hard constraint above: a randomness requirement stating that no behavior should be generated with probability greater than a threshold $\rho$. If we set (say) $\rho = 1/5$, then any controller solving the synthesis problem must be able to satisfy the hard constraint in at least 5 different ways, never producing any given behavior more than 20% of the time. Our synthesis algorithm can in fact compute the smallest $\rho$ for which synthesis is possible, yielding a controller that is maximally randomized in that the system’s behavior is as close to a uniform distribution as possible.

To allow finer tuning of how randomness is introduced into the controller, our definition also includes a soft constraint which need only be satisfied with some probability $1 - \epsilon$. This allows us to prefer certain safe behaviors over others. In our drone example, we require that with probability at least $1 - \epsilon$, we do not visit a circled location twice.

These hard, soft, and randomness constraints form an instance of our reactive control improvisation problem. Encoding the hard and soft constraints as DFAs, our algorithm (Sec. 6) produced a controller achieving the smallest realizable $\rho$. We tested the controller using the PX4 autopilot [16] to refine the generated routes into control actions for a drone simulated in Gazebo [13] (videos and code are available online [10]). A selection of the resulting trajectories is shown in Fig. 1 (the remainder in Appendix A): starting from the triangles, the patroller’s path is solid, the adversary’s dashed. The left run uses an adversary that moves towards the patroller when possible. The right runs, with a simple adversary moving in a fixed loop, illustrate the randomness of the synthesized controller.

### 3.2 Reactive Control Improvisation

Our formal notion of randomized reactive synthesis in a finite window is a reactive extension of control improvisation [8, 9], which captures the three types of constraint (hard, soft, randomness) seen above. We use the notation of [8] for the specifications and languages defining the hard and soft constraints:

###### Definition 5 ([8])

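Given hard and soft specifications $\mathcal{H}$ and $\mathcal{S}$ of languages over $\Sigma$, an improvisation is a word $w \in L(\mathcal{H}) \cap \Sigma^n$. It is admissible if $w \in L(\mathcal{S})$. The set of all improvisations is denoted $I$, and admissible improvisations $A \subseteq I$.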

###### Running Example

We will use the following simple example throughout the paper: each player may increment ($+$), decrement ($-$), or leave unchanged ($=$) a counter which is initially zero. The alphabet is $\Sigma = \{+, -, =\}$, and we set $n = 4$. The hard specification $\mathcal{H}$ is the DFA in Fig. 2 requiring that the counter stay within $[-2, 2]$. The soft specification $\mathcal{S}$ is a similar DFA requiring that the counter end at a nonnegative value.

Then for example the word $++==$ is an admissible improvisation, satisfying both hard and soft constraints, and so is in $A$. The word $+-=-$ on the other hand satisfies $\mathcal{H}$ but not $\mathcal{S}$, so it is in $I$ but not $A$. Finally, $+++-$ does not satisfy $\mathcal{H}$, so it is not an improvisation at all and is not in $I$.

A reactive control improvisation problem is defined by $\mathcal{H}$, $\mathcal{S}$, and parameters $\epsilon$ and $\rho$. A solution is then a strategy which ensures that the hard, soft, and randomness constraints hold against every adversary. Formally, following [8, 9]:

###### Definition 6

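Given an RCI instance $\mathcal{C} = (\mathcal{H}, \mathcal{S}, n, \epsilon, \rho)$ with $\mathcal{H}$, $\mathcal{S}$, and $n$ as above and $\epsilon, \rho \in [0, 1] \cap \mathbb{Q}$, a strategy $\sigma$ is an improvising strategy if it satisfies the following requirements for every adversary $\tau$:

Hard constraint: $P_{\sigma,\tau}(I) = 1$

Soft constraint: $P_{\sigma,\tau}(A) \ge 1 - \epsilon$

Randomness: $\forall \pi \in \Sigma^n$, $P_{\sigma,\tau}(\pi) \le \rho$.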

If there is an improvising strategy $\sigma$, we say that $\mathcal{C}$ is realizable. An improviser for $\mathcal{C}$ is then an expected-finite time probabilistic algorithm implementing such a strategy $\sigma$, i.e. whose output distribution on input $h \in \Sigma^{\le n}$ is $\sigma(h, \cdot)$.

###### Definition 7

Given an RCI instance $\mathcal{C}$, the reactive control improvisation (RCI) problem is to decide whether $\mathcal{C}$ is realizable, and if so to generate an improviser for $\mathcal{C}$.

###### Running Example

Suppose we set $\epsilon = 1/2$ and $\rho = 1/2$. Let $\sigma$ be the strategy which picks $+$ or $-$ with equal probability in the first move, and thenceforth picks the action which moves the counter closest to $\pm 1$ respectively. This satisfies the hard constraint, since if the adversary ever moves the counter to $\pm 2$ we immediately move it back. The strategy also satisfies the soft constraint, since with probability $1/2$ we set the counter to $1$ on the first move, and if the adversary moves to $0$ we move back to $1$ and remain nonnegative. Finally, $\sigma$ also satisfies the randomness constraint, since each choice of first move happens with probability $1/2$ and so no play can be generated with higher probability. So $\sigma$ is an improvising strategy and this RCI instance is realizable.

We will study classes of RCI problems with different types of specifications:

###### Definition 8

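If HSpec and SSpec are classes of specifications, then the class of RCI instances $\mathcal{C} = (\mathcal{H}, \mathcal{S}, n, \epsilon, \rho)$ where $\mathcal{H} \in \mathrm{HSpec}$ and $\mathcal{S} \in \mathrm{SSpec}$ is denoted $\mathrm{RCI}(\mathrm{HSpec}, \mathrm{SSpec})$. We use the same notation for the decision problem associated with the class, i.e., given $\mathcal{C} \in \mathrm{RCI}(\mathrm{HSpec}, \mathrm{SSpec})$, decide whether $\mathcal{C}$ is realizable. The size $|\mathcal{C}|$ of an RCI instance is the total size of the bit representations of its parameters, with $n$ represented in unary and $\epsilon$, $\rho$ in binary.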

Finally, a synthesis algorithm in our context takes a specification in the form of an RCI instance and produces an implementation in the form of an improviser. This corresponds exactly to the notion of an improvisation scheme from [8]:

###### Definition 9 ([8])

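A polynomial-time improvisation scheme for a class $\mathcal{P}$ of RCI instances is an algorithm $S$ with the following properties:

Correctness: For any $\mathcal{C} \in \mathcal{P}$, if $\mathcal{C}$ is realizable then $S(\mathcal{C})$ is an improviser for $\mathcal{C}$, and otherwise $S(\mathcal{C}) = \bot$.

Scheme efficiency: There is a polynomial $p$ such that the runtime of $S$ on any $\mathcal{C} \in \mathcal{P}$ is at most $p(|\mathcal{C}|)$.

Improviser efficiency: There is a polynomial $q$ such that for every $\mathcal{C} \in \mathcal{P}$, if $G = S(\mathcal{C}) \ne \bot$ then $G$ has expected runtime at most $q(|\mathcal{C}|)$.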

The first two requirements simply say that the scheme produces valid improvisers in polynomial time. The third is necessary to ensure that the improvisers themselves are efficient: otherwise, the scheme might for example produce improvisers running in time exponential in the size of the specification.

A main goal of our paper is to determine for which types of specifications there exist polynomial-time improvisation schemes. While we do find such algorithms for important classes of specifications, we will also see that determining the realizability of an RCI instance is often PSPACE-hard. Therefore we also consider polynomial-space improvisation schemes, defined as above but replacing time with space.

## 4 Existence of Improvisers

### 4.1 Width and Realizability

The most basic question in reactive synthesis is whether a specification is realizable. In randomized reactive synthesis, the question is more delicate because the randomness requirement means that it is no longer enough to ensure some property regardless of what the adversary does: there must be many ways to do so. Specifically, there must be at least $1/\rho$ improvisations if we are to generate each of them with probability at most $\rho$. Furthermore, at least this many improvisations must be possible given an unknown adversary: even if many exist, the adversary may be able to force us to use only a single one. We introduce a new notion of the size of a set of plays that takes this into account.

###### Definition 10

The width of $X \subseteq \Sigma^n$ is $W(X) = \max_{\sigma} \min_{\tau} |X \cap \Pi_{\sigma,\tau}|$.

The width counts how many distinct plays can be generated regardless of what the adversary does. Intuitively, a “narrow” game (one whose set of winning plays has small width) is one in which the adversary can force us to choose among only a few winning plays, while in a “wide” one we always have many safe choices available. Note that which particular plays can be generated depends on the adversary: the width only measures how many can be generated. For example, $W(X) = 1$ means that a play in $X$ can always be generated, but possibly a different element of $X$ for different adversaries.

###### Running Example

Figure 3 shows the synthesis game for our running example: paths ending in circled or shaded states are plays in $I$ or $A$ respectively (ignore the state labels for now). At left, the bold arrows show the 4 plays in $I$ possible against the adversary that moves away from 0, and down at 0. This shows $W(I) \le 4$, and in fact 4 plays are possible against any adversary, so $W(I) = 4$. Similarly, at right we see that $W(A) = 1$.

It will be useful later to have a relative version of width that counts how many plays are possible from a given position:

###### Definition 11

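Given a set of plays $X \subseteq \Sigma^n$ and a history $h \in \Sigma^{\le n}$, the width of $X$ given $h$ is $W(X \mid h) = \max_{\sigma} \min_{\tau} |\{\pi \mid h\pi \in X \wedge P_{\sigma,\tau}(\pi \mid h) > 0\}|$.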

This is a direct generalization of “winning” positions: if $X$ is the set of winning plays, then $W(X \mid h)$ counts the number of ways to win from $h$.

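We will often use the following basic properties of $W(X \mid h)$ without comment (for this proof, and the details of later proof sketches, see Appendix B). Note that (3)–(5) provide a recursive way to compute widths that we will use later, and which is illustrated by the state labels in Fig. 3.

###### Lemma (properties of width)

For any set of plays $X \subseteq \Sigma^n$ and history $h \in \Sigma^{\le n}$:

1. $0 \le W(X \mid h) \le |\Sigma|^{n - |h|}$;

2. $W(X \mid \lambda) = W(X)$;

3. if $|h| = n$, then $W(X \mid h) = \mathbb{1}_{h \in X}$;

4. if it is our turn after $h$, then $W(X \mid h) = \sum_{u \in \Sigma} W(X \mid hu)$;

5. if it is the adversary’s turn after $h$, then $W(X \mid h) = \min_{u \in \Sigma} W(X \mid hu)$.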
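The recursive characterization of width (an indicator at the end of the game, a sum over our moves, a minimum over the adversary's moves) translates directly into a brute-force procedure. The sketch below applies it to the counter example, assuming $n = 4$, a counter that must stay within $[-2, 2]$, and a soft constraint that the counter ends nonnegative; the function names are hypothetical, and the recursion visits every play, so it is exponential in $n$.

```python
ALPHABET = "+-="
STEP = {"+": 1, "-": -1, "=": 0}

def in_I(play):
    """Hard constraint (assumed bounds): the counter stays within [-2, 2]."""
    counter = 0
    for u in play:
        counter += STEP[u]
        if abs(counter) > 2:
            return False
    return True

def in_A(play):
    """Admissible plays: hard constraint plus a nonnegative final counter."""
    return in_I(play) and sum(STEP[u] for u in play) >= 0

def width(in_X, alphabet, n, h=""):
    """Width of X given history h, computed by recursion on the game tree."""
    if len(h) == n:                      # game over: indicator of h in X
        return 1 if in_X(h) else 0
    children = [width(in_X, alphabet, n, h + u) for u in alphabet]
    if len(h) % 2 == 0:                  # our turn: our choices add up
        return sum(children)
    return min(children)                 # adversary's turn: worst case
```

Under these assumptions `width(in_I, ALPHABET, 4)` evaluates to 4 and `width(in_A, ALPHABET, 4)` to 1, matching the widths discussed for the running example.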

Now we can state the realizability conditions, which are simply that $I$ and $A$ have sufficiently large width. In fact, the conditions turn out to be exactly the same as those for non-reactive CI except that width takes the place of size [9].

###### Theorem 4.1

The following are equivalent:

1. $\mathcal{C}$ is realizable.

2. $W(I) \ge 1/\rho$ and $W(A) \ge (1 - \epsilon)/\rho$.

3. There is an improviser for $\mathcal{C}$.
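As a small illustration, the width inequalities $W(I) \ge 1/\rho$ and $W(A) \ge (1 - \epsilon)/\rho$ can be checked with exact rational arithmetic. The function name `is_realizable` is hypothetical, and the widths are assumed to have been computed already; the inequalities are multiplied through by $\rho$ so that $\rho = 0$ needs no special case.

```python
from fractions import Fraction

def is_realizable(width_I, width_A, eps, rho):
    """Check W(I) >= 1/rho and W(A) >= (1 - eps)/rho without dividing by rho."""
    eps, rho = Fraction(eps), Fraction(rho)
    return width_I * rho >= 1 and width_A * rho >= 1 - eps

# Running example, assuming W(I) = 4 and W(A) = 1:
print(is_realizable(4, 1, Fraction(1, 2), Fraction(1, 2)))
print(is_realizable(4, 1, Fraction(1, 2), Fraction(1, 3)))
```

The first call reports a realizable instance and the second an unrealizable one, since $1 < (1/2)/(1/3)$.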

###### Running Example

We saw above that our example was realizable with $\epsilon = \rho = 1/2$, and indeed $4 = W(I) \ge 1/\rho = 2$ and $1 = W(A) \ge (1 - \epsilon)/\rho = 1$. However, if we put $\rho = 1/3$ we violate the second inequality and the instance is not realizable: essentially, we need to distribute probability $1 - \epsilon = 1/2$ among plays in $A$ (to satisfy the soft constraint), but since $W(A) = 1$, against some adversaries we can only generate one play in $A$ and would have to give it the whole $1/2$ (violating the randomness requirement).

The difficult part of the Theorem is constructing an improviser when the inequalities (2) hold. Despite the similarity in these conditions to the non-reactive case, the construction is much more involved. We begin with a general overview.

### 4.2 Improviser Construction: Discussion

Our improviser can be viewed as an extension of the classical random-walk reduction of uniform sampling to counting [20]. In that algorithm (which was used in a similar way for DFA specifications in [8, 9]), a uniform distribution over paths in a DAG is obtained by moving to the next vertex with probability proportional to the number of paths originating at it. In our case, which plays are possible depends on the adversary, but the width still tells us how many plays are possible. So we could try a random walk using widths as weights: e.g. on the first turn in Fig. 3, picking $+$, $=$, and $-$ with probabilities $1/4$, $1/2$, and $1/4$ respectively. Against the adversary shown in Fig. 3, this would indeed yield a uniform distribution over the four possible plays in $I$.

However, the soft constraint may require a non-uniform distribution. In the running example with $\epsilon = 1/2$, we need to generate the single possible play in $A$ with probability $1/2$, not just the uniform probability $1/4$. This is easily fixed by doing the random walk with a weighted average of the widths of $I$ and $A$: specifically, move to position $h$ with probability proportional to $\alpha W(A \mid h) + \beta (W(I \mid h) - W(A \mid h))$. In the example, this would result in plays in $A$ getting probability $\alpha$ and those in $I \setminus A$ getting probability $\beta$. Taking $\alpha$ sufficiently large, we can ensure the soft constraint is satisfied.

Unfortunately, this strategy can fail if the adversary makes more plays available than the width guarantees. Consider the game on the left of Fig. 4, where and . This is realizable with , but no values of and yield improvising strategies, essentially because an adversary moving from to breaks the worst-case assumption that the adversary will minimize the number of possible plays by moving to . In fact, this instance is realizable but not by any memoryless strategy. To see this, note that all such strategies can be parametrized by the probabilities and in Fig. 4. To satisfy the randomness constraint against the adversary that moves from to , both and must be at most . To satisfy the soft constraint against the adversary that moves from to we must have , so . But then , a contradiction.

To fix this problem, our improvising strategy (which we will fully specify in Algorithm 1 below) takes a simplistic approach: it tracks how many plays in and are expected to be possible based on their widths, and if more are available it ignores them. For example, entering state from there are 2 ways to produce a play in , but since we ignore the play in . Extra plays in are similarly ignored by being treated as members of . Ignoring unneeded plays may seem wasteful, but the proof of Theorem 4.1 will show that nevertheless achieves the best possible :

###### Corollary 1

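$\mathcal{C}$ is realizable iff $W(I) \ge 1/\rho$ and $\epsilon \ge \epsilon_{\mathrm{opt}} := \max(1 - \rho W(A), 0)$. Against any adversary, the error probability of Algorithm 1 is at most $\epsilon_{\mathrm{opt}}$.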

Thus, if any improviser can achieve an error probability , ours does. We could ask for a stronger property, namely that against each adversary the improviser achieves the smallest possible error probability for that adversary. Unfortunately, this is impossible in general. Consider the game on the right in Fig. 4, with . Against the adversary which always moves up, we can achieve with the strategy that at moves to . We can also achieve against the adversary that always moves down, but only with a different strategy, namely the one that at moves to . So there is no single strategy that achieves the optimal for every adversary. A similar argument shows that there is also no strategy achieving the smallest possible for every adversary. In essence, optimizing or in every case would require the strategy to depend on the adversary.

### 4.3 Improviser Construction: Details

Our improvising strategy $\hat{\sigma}$, as outlined in the previous section, is shown in Algorithm 1. We first compute $\alpha$ and $\beta$, the (maximum) probabilities for generating elements of $A$ and $I \setminus A$ respectively. As in [8], we take $\alpha$ as large as possible given $\rho$, and determine $\beta$ from the probability left over (modulo a couple of corner cases).

Next we initialize $m^A$ and $m^I$, our expectations for how many plays in $A$ and $I$ respectively are still possible to generate. Initially these are given by $W(A)$ and $W(I)$, but as we saw above it is possible for more plays to become available. The function Partition handles this, deciding which $m^A$ (resp., $m^I$) out of the available $W(A \mid h)$ ($W(I \mid h)$) plays we will use. The behavior of Partition is defined by the following lemma; its proof (in Appendix B) greedily takes the first $m^A$ possible plays in $A$ under some canonical order and the first $m^I - m^A$ of the remaining plays in $I$.

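###### Lemma (Partition)

If it is our turn after $h$, and $m^A, m^I \in \mathbb{Z}$ satisfy $0 \le m^A \le m^I \le W(I \mid h)$ and $m^A \le W(A \mid h)$, there are integer partitions $\sum_{u \in \Sigma} m^A_u$ and $\sum_{u \in \Sigma} m^I_u$ of $m^A$ and $m^I$ respectively such that $0 \le m^A_u \le m^I_u \le W(I \mid hu)$ and $m^A_u \le W(A \mid hu)$ for all $u \in \Sigma$. These are computable in poly-time given oracles for $W(I \mid \cdot)$ and $W(A \mid \cdot)$.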

Finally, we perform the random walk, moving from position $h$ to $hu$ with (unnormalized) probability $t_u = \alpha m^A_u + \beta (m^I_u - m^A_u)$, the weighted average described above.
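The construction described in this section can be sketched in Python for the counter example. This is a hedged illustration, not the paper's Algorithm 1: the formulas $\alpha = \min(\rho, 1/W(A))$ and $\beta = (1 - \alpha W(A))/(W(I) - W(A))$ are one reading of "as large as possible" and "the probability left over", the greedy `partition` follows the proof idea mentioned above using counts only, and the example parameters ($n = 4$, bounds $[-2, 2]$) and all function names are assumptions made here.

```python
import random
from fractions import Fraction

ALPHABET, N = "+=-", 4
STEP = {"+": 1, "=": 0, "-": -1}

def in_I(w):
    c = 0
    for u in w:
        c += STEP[u]
        if abs(c) > 2:
            return False
    return True

def in_A(w):
    return in_I(w) and sum(STEP[u] for u in w) >= 0

def width(in_X, h):
    """Brute-force width: indicator at the end, sum on our turn, min otherwise."""
    if len(h) == N:
        return int(in_X(h))
    kids = [width(in_X, h + u) for u in ALPHABET]
    return sum(kids) if len(h) % 2 == 0 else min(kids)

def partition(m_A, m_I, h):
    """Greedily split the budgets among our moves u, returning {u: (m_A_u, m_I_u)}."""
    parts, left_A, left_rest = {}, m_A, m_I - m_A
    for u in ALPHABET:
        a = min(left_A, width(in_A, h + u))             # plays counted towards A
        rest = min(left_rest, width(in_I, h + u) - a)   # remaining plays in I
        parts[u] = (a, a + rest)
        left_A, left_rest = left_A - a, left_rest - rest
    return parts

def improvise(adversary, eps, rho, rng=random):
    """Play one game against a deterministic adversary (a function of the history)."""
    w_A, w_I = width(in_A, ""), width(in_I, "")
    if not (w_I * rho >= 1 and w_A * rho >= 1 - eps):
        raise ValueError("instance is not realizable")
    alpha = min(Fraction(rho), Fraction(1, w_A)) if w_A else Fraction(0)
    beta = (1 - alpha * w_A) / (w_I - w_A) if w_I > w_A else Fraction(0)
    m_A, m_I, h = w_A, w_I, ""
    while len(h) < N:
        if len(h) % 2 == 0:                              # our turn
            parts = partition(m_A, m_I, h)
            t = {u: alpha * a + beta * (i - a) for u, (a, i) in parts.items()}
            u = rng.choices(list(t), weights=[float(x) for x in t.values()])[0]
            m_A, m_I = parts[u]
        else:                                            # adversary's turn
            u = adversary(h)
        h += u
    return h
```

For instance, `improvise(lambda h: "+", Fraction(1, 2), Fraction(1, 2))` returns a play of length 4 in which the counter never leaves $[-2, 2]$. Moves whose weight is zero are never chosen, which is how surplus plays are ignored.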

###### Running Example

With , as before and so and . On the first move, and match and , so all plays are used and Partition returns for each . Looking up these values in Fig. 5, we see and so . Similarly and . We choose an action according to these weights; suppose , so that we update and , and suppose the adversary responds with . From Fig. 5, and , whereas and . So Partition discards a play, say returning for and for . Then and . So we pick or with equal probability, say . If the adversary responds with , we get the play , shown in bold on Fig. 5. As desired, it satisfies the hard constraint.

The next few lemmas establish that $\hat{\sigma}$ is well-defined and in fact an improvising strategy, allowing us to prove Theorem 4.1. Throughout, we write $m^A(h)$ (resp., $m^I(h)$) for the value of $m^A$ ($m^I$) at the start of the iteration for history $h$. We also write $t(h) = \alpha m^A(h) + \beta (m^I(h) - m^A(h))$ (so $t(hu) = t_u$ when we pick $u$).

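###### Lemma (hard constraint)

If $W(I) \ge 1/\rho$, then $\hat{\sigma}$ is a well-defined strategy and $P_{\hat{\sigma},\tau}(I) = 1$ for every adversary $\tau$.

###### Proof (sketch)

An easy induction on $h$ shows the conditions of the Partition lemma are always satisfied, and that $t(h)$ is always positive since we never pick a $u$ with $t_u = 0$. So $\sum_{u} t_u = t(h) > 0$ and $\hat{\sigma}$ is well-defined. Furthermore, $t(h) > 0$ implies $m^I(h) > 0$, so for any $\pi \in \Pi_{\hat{\sigma},\tau}$ we have $\mathbb{1}_{\pi \in I} = W(I \mid \pi) \ge m^I(\pi) > 0$ and thus $\pi \in I$. ∎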

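###### Lemma (soft constraint)

If $W(I) \ge 1/\rho$, then $P_{\hat{\sigma},\tau}(A) \ge \min(\rho W(A), 1)$ for every $\tau$.

###### Proof (sketch)

Because of the $\alpha m^A(h)$ term in the weights $t(h)$, the probability of obtaining a play in $A$ starting from $h$ is at least $\alpha m^A(h) / t(h)$ (as can be seen by induction on $h$ in order of decreasing length). Then since $m^A(\lambda) = W(A)$ and $t(\lambda) = 1$ we have $P_{\hat{\sigma},\tau}(A) \ge \alpha W(A) = \min(\rho W(A), 1)$. ∎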

###### Lemma (randomness constraint)

If $W(I) \ge 1/\rho$, then $P_{\hat{\sigma},\tau}(\pi) \le \rho$ for every $\pi \in \Sigma^n$ and $\tau$.

###### Proof (sketch)

If the adversary is deterministic, the weights we use for our random walk yield a distribution where each play $\pi$ has probability either $\alpha$ or $\beta$ (depending on whether $m^A(\pi) = 1$ or $m^A(\pi) = 0$). If the adversary assigns nonzero probability to multiple choices this only decreases the probability of individual plays. Finally, since $W(I) \ge 1/\rho$ we have $\alpha, \beta \le \rho$. ∎

###### Proof (of Theorem 4.1)

We use a similar argument to that of [8].

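(1)⇒(2)

Suppose $\sigma$ is an improvising strategy, and fix any adversary $\tau$. Then $\rho\,|\Pi_{\sigma,\tau} \cap I| = \sum_{\pi \in \Pi_{\sigma,\tau} \cap I} \rho \ge \sum_{\pi \in \Pi_{\sigma,\tau} \cap I} P_{\sigma,\tau}(\pi) = P_{\sigma,\tau}(I) = 1$, so $|\Pi_{\sigma,\tau} \cap I| \ge 1/\rho$. Since $\tau$ is arbitrary, this implies $W(I) \ge 1/\rho$. Since $A \subseteq I$, we also have $\rho\,|\Pi_{\sigma,\tau} \cap A| \ge P_{\sigma,\tau}(A) \ge 1 - \epsilon$, so $|\Pi_{\sigma,\tau} \cap A| \ge (1 - \epsilon)/\rho$ and thus $W(A) \ge (1 - \epsilon)/\rho$.

(2)⇒(3)

By the lemmas above on the hard and randomness constraints, $\hat{\sigma}$ is well-defined and satisfies the hard and randomness constraints. By the lemma on the soft constraint, $P_{\hat{\sigma},\tau}(A) \ge \min(\rho W(A), 1) \ge 1 - \epsilon$, so $\hat{\sigma}$ also satisfies the soft constraint and thus is an improvising strategy. Its transition probabilities are rational, so it can be implemented by an expected finite-time probabilistic algorithm, which is then an improviser for $\mathcal{C}$.

(3)⇒(1)

Immediate. ∎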

###### Proof (of Corollary 1)

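The inequalities in the statement are equivalent to those of Theorem 4.1(2). By the lemma on the soft constraint, we have $P_{\hat{\sigma},\tau}(A) \ge \min(\rho W(A), 1)$. So the error probability is at most $1 - \min(\rho W(A), 1) = \max(1 - \rho W(A), 0)$. ∎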

## 5 A Generic Improviser

We now use the construction of Sec. 4 to develop a generic improvisation scheme usable with any class of specifications Spec supporting the following operations:

Intersection:

Given specs $\mathcal{X}$ and $\mathcal{Y}$, find $\mathcal{Z}$ such that $L(\mathcal{Z}) = L(\mathcal{X}) \cap L(\mathcal{Y})$.

Width Measurement:

Given a specification $\mathcal{X}$, a length $n$ in unary, and a history $h \in \Sigma^{\le n}$, compute $W(X \mid h)$ where $X = L(\mathcal{X}) \cap \Sigma^n$.
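For DFA specifications, the intersection operation can be realized by the standard product construction. The sketch below is an illustration only: it assumes complete DFAs over a shared alphabet, represented as tuples `(states, alphabet, delta, start, accepting)` with `delta` a dictionary, and the names `intersect_dfas`, `accepts`, `even_a`, and `ends_b` are hypothetical.

```python
from itertools import product

def intersect_dfas(d1, d2):
    """Product DFA accepting exactly the words accepted by both d1 and d2."""
    states1, alphabet, delta1, start1, acc1 = d1
    states2, _, delta2, start2, acc2 = d2
    states = set(product(states1, states2))
    delta = {((p, q), u): (delta1[p, u], delta2[q, u])
             for (p, q) in states for u in alphabet}
    accepting = {(p, q) for (p, q) in states if p in acc1 and q in acc2}
    return states, alphabet, delta, (start1, start2), accepting

def accepts(dfa, word):
    """Run the DFA on the word and report whether it ends in an accepting state."""
    _, _, delta, state, accepting = dfa
    for u in word:
        state = delta[state, u]
    return state in accepting

# Two toy DFAs over {a, b}: an even number of a's, and words ending in b.
even_a = ({0, 1}, "ab", {(0, "a"): 1, (0, "b"): 0, (1, "a"): 0, (1, "b"): 1}, 0, {0})
ends_b = ({0, 1}, "ab", {(0, "a"): 0, (0, "b"): 1, (1, "a"): 0, (1, "b"): 1}, 0, {1})
both = intersect_dfas(even_a, ends_b)
```

The product has $|V_1| \cdot |V_2|$ states, so its size is polynomial in the sizes of the inputs.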

Efficient algorithms for these operations lead to efficient improvisation schemes:

###### Theorem 5.1

If the operations on Spec above take polynomial time (resp. space), then $\mathrm{RCI}(\mathrm{Spec}, \mathrm{Spec})$ has a polynomial-time (space) improvisation scheme.

###### Proof

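Given an instance $\mathcal{C} = (\mathcal{H}, \mathcal{S}, n, \epsilon, \rho)$ in $\mathrm{RCI}(\mathrm{Spec}, \mathrm{Spec})$, we first apply intersection to $\mathcal{H}$ and $\mathcal{S}$ to obtain $\mathcal{A}$ such that $L(\mathcal{A}) \cap \Sigma^n = A$. Since intersection takes polynomial time (space), $\mathcal{A}$ has size polynomial in $|\mathcal{C}|$. Next we use width measurement to compute $W(I) = W(L(\mathcal{H}) \cap \Sigma^n \mid \lambda)$ and $W(A) = W(L(\mathcal{A}) \cap \Sigma^n \mid \lambda)$. If these violate the inequalities in Theorem 4.1, then $\mathcal{C}$ is not realizable and we return $\bot$. Otherwise $\mathcal{C}$ is realizable, and $\hat{\sigma}$ above is an improvising strategy. Furthermore, we can construct an expected finite-time probabilistic algorithm implementing $\hat{\sigma}$, using width measurement to instantiate the oracles needed by the Partition lemma. Determining $m^A(h)$ and $m^I(h)$ takes $O(n)$ invocations of Partition, each of which is poly-time relative to the width measurements. These take time (space) polynomial in $|\mathcal{C}|$, since $\mathcal{H}$ and $\mathcal{A}$ have size polynomial in $|\mathcal{C}|$. As $m^A, m^I \le |\Sigma|^n$, they have polynomial bitwidth and so the arithmetic required to compute $t_u$ for each $u \in \Sigma$ takes polynomial time. Therefore the total expected runtime (space) of the improviser is polynomial. ∎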

Note that as a byproduct of testing the inequalities in Theorem 4.1, our algorithm can compute the best possible error probability given , , and (see Corollary 1). Alternatively, given , we can compute the best possible .

We will see below how to efficiently compute widths for DFAs, so Theorem 5.1 yields a polynomial-time improvisation scheme. If we allow polynomial-space schemes, we can use a general technique for width measurement that only requires a very weak assumption on the specifications, namely testability in polynomial space:

###### Theorem 5.2

RCI with specifications in PSA has a polynomial-space improvisation scheme, where PSA is the class of polynomial-space decision algorithms.

###### Proof (sketch)

We apply Theorem 5.1, computing widths recursively using Lemma 11, (3)–(5). As in the algorithm, the current path in the recursive tree and required auxiliary storage need only polynomial space. ∎
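The recursive width computation in this proof sketch can be illustrated as follows. This is a minimal sketch, not the authors' implementation. It assumes the width recurrence used throughout the paper (a membership test at full-length words, a sum over our moves, a minimum over the adversary's moves) and, as a labeled assumption, that we move after histories of even length. The names `width`, `accepts`, and `our_turn` are hypothetical.

```python
from typing import Callable, Sequence, Tuple


def width(
    history: Tuple,
    n: int,
    alphabet: Sequence,
    accepts: Callable[[Tuple], bool],
    # Assumption: we move after even-length histories (i.e. we move first).
    our_turn: Callable[[int], bool] = lambda i: i % 2 == 0,
) -> int:
    """Width of `history` in a window of length n, by depth-first recursion.

    `accepts` stands in for any membership tester (e.g. a polynomial-space
    decision algorithm for the specification).
    """
    if len(history) == n:
        # Complete word: it counts iff the specification accepts it.
        return 1 if accepts(history) else 0
    # Children are generated lazily, so only the current root-to-leaf path
    # (depth at most n) is kept in memory at any time.
    children = (
        width(history + (u,), n, alphabet, accepts, our_turn) for u in alphabet
    )
    # Our moves add up the options we can choose among;
    # the adversary picks the worst case for us.
    return sum(children) if our_turn(len(history)) else min(children)
```

The running time is exponential in `n`, but the recursion depth is at most `n` and each frame stores a constant number of integers of polynomial bitwidth, which is the source of the polynomial space bound when `accepts` itself runs in polynomial space.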

## 6 Reachability Games and DFAs

Now we develop a polynomial-time improvisation scheme for RCI instances with DFA specifications. This also provides a scheme for reachability/safety games, whose winning conditions can be straightforwardly encoded as DFAs.

Suppose $D$ is a DFA with states $V$, accepting states $T$, and transition function $\delta$. Our scheme is based on the fact that the width $W(h)$ of a history $h$ depends only on the state of $D$ reached on input $h$, allowing these widths to be computed by dynamic programming. Specifically, for all $v \in V$ and $i \in \{0, \dots, n\}$ we define:

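$$
C(v,i) = \begin{cases}
\mathbb{1}_{v \in T} & i = n \\
\min_{u \in \Sigma} C(\delta(v,u), i+1) & i < n \text{ and it is the adversary's turn after } i \text{ moves} \\
\sum_{u \in \Sigma} C(\delta(v,u), i+1) & \text{otherwise}
\end{cases}
$$
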
###### Running Example

Figure 6 shows the values in rows from downward. For example, is our turn, so , while is the adversary’s turn, so . Note that the values in Fig. 6 agree with the widths shown in Fig. 5.

###### Lemma 1

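For any history $h$, writing $i = |h|$ we have $W(h) = C(v, i)$, where $W(h)$ is the width of $h$ with respect to the language of the DFA and $v$ is the state reached by running the DFA on $h$.

###### Proof

We prove this by induction on $i$ in decreasing order. In the base case $i = n$, the width $W(h)$ is 1 if the DFA accepts $h$ and 0 otherwise, so $W(h) = \mathbb{1}_{v \in T} = C(v, n)$. Now take any history $h$ with $|h| = i < n$. By hypothesis, for any $u \in \Sigma$ we have $W(hu) = C(\delta(v,u), i+1)$. If it is our turn after $h$, then $W(h) = \sum_{u \in \Sigma} W(hu) = \sum_{u \in \Sigma} C(\delta(v,u), i+1) = C(v, i)$ as desired. If instead it is the adversary’s turn after $h$, then $W(h) = \min_{u \in \Sigma} W(hu) = \min_{u \in \Sigma} C(\delta(v,u), i+1) = C(v, i)$, again as desired. So by induction the hypothesis holds for any $i$. ∎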
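The dynamic program behind the quantities C(v, i) can be sketched as below. This is an illustrative sketch under stated assumptions rather than the implementation available from the authors: the DFA is given as a hypothetical dictionary `delta` mapping `(state, symbol)` to the next state, and the turn order (we move after even-length histories) is an assumption exposed through the `our_turn` parameter.

```python
def width_table(states, alphabet, delta, accepting, n,
                our_turn=lambda i: i % 2 == 0):
    """Return a list C where C[i][v] is the table entry C(v, i), 0 <= i <= n.

    Assumption: `our_turn(i)` says whether we move after a history of
    length i; by default we move first.
    """
    C = [dict() for _ in range(n + 1)]
    # Base case i = n: a complete word counts iff it ends in an accepting state.
    for v in states:
        C[n][v] = 1 if v in accepting else 0
    # Fill the table backwards, from i = n - 1 down to i = 0.
    for i in range(n - 1, -1, -1):
        for v in states:
            successors = [C[i + 1][delta[(v, u)]] for u in alphabet]
            # Sum over our own choices, minimum over the adversary's choices.
            C[i][v] = sum(successors) if our_turn(i) else min(successors)
    return C
```

The table has (n + 1)·|states| entries and each entry takes |alphabet| lookups, so the whole computation is polynomial in the size of the DFA and in n. By Lemma 1, the width of a history is then read off as the entry for the state reached on that history.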

###### Theorem 6.1

has a polynomial-time improvisation scheme.

###### Proof

We implement Theorem 5.1. Intersection can be done with the standard product construction. For width measurement we compute the quantities $C(v,i)$ by dynamic programming (from $i = n$ down to $i = 0$) and apply Lemma 1. ∎
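For concreteness, the standard product construction used for intersection can be sketched as follows. The dictionary-based DFA representation (keys `start`, `delta`, `accepting`) and the function names are hypothetical choices made for this illustration; only product states reachable from the start state are built.

```python
from collections import deque


def intersect_dfas(a, b, alphabet):
    """Product DFA accepting exactly the words accepted by both a and b.

    Each DFA is a dict with keys "start", "delta" (mapping (state, symbol)
    to a state) and "accepting" (a set of states). This layout is an
    assumption of the sketch, not a format fixed by the paper.
    """
    start = (a["start"], b["start"])
    delta, accepting = {}, set()
    seen, queue = {start}, deque([start])
    while queue:
        state = queue.popleft()
        p, q = state
        # A pair is accepting iff both components are accepting.
        if p in a["accepting"] and q in b["accepting"]:
            accepting.add(state)
        for u in alphabet:
            # Run both automata in lockstep on the same symbol.
            nxt = (a["delta"][(p, u)], b["delta"][(q, u)])
            delta[(state, u)] = nxt
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return {"start": start, "states": seen, "delta": delta,
            "accepting": accepting}


def run(dfa, word):
    """Return True iff `dfa` accepts `word`."""
    state = dfa["start"]
    for u in word:
        state = dfa["delta"][(state, u)]
    return state in dfa["accepting"]
```

The product has at most |states of a| · |states of b| states, so a single intersection stays polynomial in the sizes of the two input automata.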

## 7 Temporal Logics and Other Specifications

In this section we analyze the complexity of reactive control improvisation for specifications in the popular temporal logics LTL and LDL. We also look at NFA and CFG specifications, previously studied for non-reactive CI [8], to see how their complexities change in the reactive case.

For LTL specifications, reactive control improvisation is PSPACE-hard because this is already true of ordinary reactive synthesis in a finite window (we suspect this has been observed but could not find a proof in the literature).

###### Theorem (LTL hardness)

Finite-window reactive synthesis for LTL is PSPACE-hard.

###### Proof (sketch)

Given a QBF $\varphi$ with matrix $\chi$, we can view assignments to its variables as traces over a single proposition. In polynomial time we can construct an LTL formula $\psi$ whose models are the satisfying assignments of $\chi$. Then there is a winning strategy to generate a play satisfying $\psi$ iff $\varphi$ is true. ∎
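The correspondence between a true QBF and a winning strategy can be made concrete with a small sketch. It assumes, for illustration only, a prenex QBF whose quantifiers strictly alternate starting with an existential one, so that existential variables are chosen on our turns and universal variables on the adversary's turns. The matrix is passed as a hypothetical Python predicate `matrix` over the tuple of chosen truth values instead of as an LTL formula.

```python
def we_can_win(matrix, n, history=()):
    """Decide whether we have a winning strategy in the n-round bit game.

    Assumption: players alternate, we move after even-length histories,
    and the play (a tuple of n Booleans) is winning iff `matrix(play)`
    holds. This mirrors evaluating the QBF
    "exists x1, forall x2, exists x3, ... : matrix(x1, ..., xn)".
    """
    if len(history) == n:
        return bool(matrix(history))
    outcomes = (we_can_win(matrix, n, history + (b,)) for b in (False, True))
    # We need one good move; the adversary must be beaten on every move.
    return any(outcomes) if len(history) % 2 == 0 else all(outcomes)
```

For example, `we_can_win(lambda a: a[0] or a[1], 2)` evaluates the formula "exists x1, forall x2: x1 or x2", which is true because we can set the first bit. This is the Boolean analogue of asking whether the width of the empty history is positive.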

###### Corollary 2

RCI with LTL hard specifications and RCI with LTL soft specifications are both PSPACE-hard.

This is perhaps disappointing, but is an inevitable consequence of LTL subsuming Boolean formulas. On the other hand, our general polynomial-space scheme applies to LTL and its much more expressive generalization LDL:

###### Theorem 7.1

RCI with LDL specifications has a polynomial-space improvisation scheme.

###### Proof

This follows from the general polynomial-space improvisation scheme of Sec. 5, since satisfaction of an LDL formula by a finite word can be checked in polynomial time (e.g. by combining dynamic programming on subformulas with a regular expression parser). ∎

Thus for temporal logics polynomial-time algorithms are unlikely, but adding randomization to reactive synthesis does not increase its complexity.

The same is true for NFA and CFG specifications, where it is again PSPACE-hard to find even a single winning strategy:

###### Theorem (NFA hardness)

Finite-window reactive synthesis for NFAs is PSPACE-hard.

###### Proof (sketch)

Reduce from QBF as in the PSPACE-hardness proof for LTL above, constructing an NFA accepting the satisfying assignments of the matrix of the QBF (as done in [12]). ∎

###### Corollary 3

RCI with NFA hard specifications and RCI with NFA soft specifications are both PSPACE-hard.

###### Theorem 7.2

RCI with CFG specifications has a polynomial-space improvisation scheme.

###### Proof

By the general polynomial-space improvisation scheme of Sec. 5, since CFG parsing can be done in polynomial time. ∎

Since NFAs can be converted to CFGs in polynomial time, this completes the picture for the kinds of CI specifications previously studied. In non-reactive CI, DFA specifications admit a polynomial-time improvisation scheme while for NFAs/CFGs the CI problem is #P-equivalent [8]. Adding reactivity, DFA specifications remain polynomial-time while NFAs and CFGs move up to PSPACE.

## 8 Conclusion

In this paper we introduced reactive control improvisation as a framework for modeling reactive synthesis problems where random but controlled behavior is desired. RCI provides a natural way to tune the amount of randomness while ensuring that safety or other constraints remain satisfied. We showed that RCI problems can be efficiently solved in many cases occurring in practice, giving a polynomial-time improvisation scheme for reachability/safety or DFA specifications. We also showed that RCI problems with specifications in LTL or LDL, popularly used in planning, have the PSPACE-hardness typical of bounded games, and gave a matching polynomial-space improvisation scheme. This scheme generalizes to any specification checkable in polynomial space, including NFAs, CFGs, and many more expressive formalisms. Table 1 summarizes these results.

These results show that, at a high level, finding a maximally-randomized strategy using RCI is no harder than finding any winning strategy at all: for specifications yielding games solvable in polynomial time (respectively, space), we gave polynomial-time (space) improvisation schemes. We therefore hope that in applications where ordinary reactive synthesis has proved tractable, our notion of randomized reactive synthesis will prove tractable as well. In particular, we expect our DFA scheme to be quite practical, and are experimenting with applications in robotic planning. On the other hand, our scheme for temporal logic specifications seems unlikely to be useful in practice without further refinement. An interesting direction for future work would be to see if modern solvers for quantified Boolean formulas (QBF) could be leveraged or extended to solve these RCI problems. This could be useful even for DFA specifications, as conjoining many simple properties can lead to exponentially-large automata. Symbolic methods based on constraint solvers would avoid such blow-up.

We are also interested in extending the RCI problem definition to unbounded or infinite words, as typically used in reactive synthesis. These extensions, as well as that to continuous signals, would be useful in robotic planning, cyber-physical system testing, and other applications. However, it is unclear how best to adapt our randomness constraint to settings where the improviser can generate infinitely many words. In such settings the improviser could assign arbitrarily small or even zero probability to every word, rendering the randomness constraint trivial. Even in the bounded case, RCI extensions with more complex randomness constraints than a simple upper bound on individual word probabilities would be worthy of study. One possibility would be to more directly control diversity and/or unpredictability by requiring the distribution of the improviser’s output to be close to uniform after transformation by a given function.

Acknowledgements. The authors would like to thank Markus Rabe, Moshe Vardi, and several anonymous reviewers for helpful discussions and comments, and Ankush Desai and Tommaso Dreossi for assistance with the drone simulations. This work is supported in part by the National Science Foundation Graduate Research Fellowship Program under Grant No. DGE-1106400, by NSF grants CCF-1139138 and CNS-1646208, by DARPA under agreement number FA8750-16-C0043, and by TerraSwarm, one of six centers of STARnet, a Semiconductor Research Corporation program sponsored by MARCO and DARPA.

## References

• [1] Arora, S., Barak, B.: Computational Complexity: A Modern Approach. Cambridge University Press, New York (2009)
• [2] Baier, C., Brázdil, T., Größer, M., Kučera, A.: Stochastic game logic. Acta informatica pp. 1–22 (2012)
• [3] Bloem, R., Galler, S., Jobstmann, B., Piterman, N., Pnueli, A., Weiglhofer, M.: Specify, compile, run: Hardware from psl. In: Proceedings of the 6th International Workshop on Compiler Optimization meets Compiler Verification (COCV 2007). Electronic Notes in Theoretical Computer Science, vol. 190, pp. 3–16. Elsevier (2007), http://www.sciencedirect.com/science/article/pii/S157106610700583X
• [4] Chen, T., Forejt, V., Kwiatkowska, M., Simaitis, A., Wiltsche, C.: On stochastic games with multiple objectives. In: Chatterjee, K., Sgall, J. (eds.) Mathematical Foundations of Computer Science 2013. pp. 266–277. Springer Berlin Heidelberg, Berlin, Heidelberg (2013)
• [5] De Giacomo, G., Vardi, M.Y.: Linear temporal logic and linear dynamic logic on finite traces. In: Proceedings of the 23rd International Joint Conference on Artificial Intelligence. pp. 854–860. IJCAI ’13, AAAI Press (2013), http://dl.acm.org/citation.cfm?id=2540128.2540252
• [6] Donzé, A., Libkind, S., Seshia, S.A., Wessel, D.: Control improvisation with application to music. Tech. Rep. UCB/EECS-2013-183, EECS Department, University of California, Berkeley (Nov 2013), http://www2.eecs.berkeley.edu/Pubs/TechRpts/2013/EECS-2013-183.html
• [7] Finkbeiner, B.: Synthesis of reactive systems. In: Esparza, J., Grumberg, O., Sickert, S. (eds.) Dependable Software Systems Engineering. NATO Science for Peace and Security Series, D: Information and Communication Security, vol. 45, pp. 72–98. IOS Press, Amsterdam, Netherlands (2016)
• [8] Fremont, D.J., Donzé, A., Seshia, S.A.: Control improvisation. arXiv preprint (2017)
• [9] Fremont, D.J., Donzé, A., Seshia, S.A., Wessel, D.: Control Improvisation. In: 35th IARCS Annual Conference on Foundations of Software Technology and Theoretical Computer Science (FSTTCS). pp. 463–474 (2015)
• [10] Fremont, D.J., Seshia, S.A.: Reactive control improvisation website (2018), https://math.berkeley.edu/~dfremont/reactive.html
• [11] Hansson, H., Jonsson, B.: A logic for reasoning about time and reliability. Formal aspects of computing 6(5), 512–535 (1994)
• [12] Kannan, S., Sweedyk, Z., Mahaney, S.: Counting and random generation of strings in regular languages. In: 6th Annual ACM-SIAM Symposium on Discrete Algorithms. pp. 551–557. SIAM (1995)
• [13] Koenig, N., Howard, A.: Design and use paradigms for Gazebo, an open-source multi-robot simulator. In: Intelligent Robots and Systems (IROS), 2004 IEEE/RSJ International Conference on. vol. 3, pp. 2149–2154. IEEE (2004)
• [14] Kress-Gazit, H., Fainekos, G.E., Pappas, G.J.: Temporal-logic-based reactive mission and motion planning. IEEE Transactions on Robotics 25(6), 1370–1381 (2009)
• [15] Mazala, R.: Infinite games. In: Grädel, E., Thomas, W., Wilke, T. (eds.) Automata, Logics, and Infinite Games, chap. 2, pp. 23–38. Springer, Berlin, Heidelberg (2002), http://dx.doi.org/10.1007/3-540-36387-4_2
• [16] Meier, L., Honegger, D., Pollefeys, M.: PX4: A node-based multithreaded open source robotics framework for deeply embedded platforms. In: Robotics and Automation (ICRA), 2015 IEEE International Conference on. pp. 6235–6240. IEEE (2015)
• [17] Pnueli, A.: The temporal logic of programs. In: 18th Annual Symposium on Foundations of Computer Science (FOCS 1977). pp. 46–57. IEEE (1977)
• [18] Pnueli, A., Rosner, R.: On the synthesis of a reactive module. In: Proceedings of the 16th ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages. pp. 179–190. POPL ’89, ACM, New York, NY, USA (1989), http://doi.acm.org/10.1145/75277.75293
• [19] Tomlin, C., Lygeros, J., Sastry, S.: Computing controllers for nonlinear hybrid systems. In: 2nd International Workshop on Hybrid Systems: Computation and Control (HSCC). pp. 238–255. Springer, Berlin, Heidelberg (1999)
• [20] Wilf, H.S.: A unified setting for sequencing, ranking, and selection algorithms for combinatorial objects. Advances in Mathematics 24(2), 281–291 (1977)

## Appendix 0.A Patrolling Drone Experiments

As described above, we ran experiments with two adversary strategies: one that moves towards the patrolling drone whenever possible, and one that moves in a fixed loop. We ran the improviser four times against each adversary, obtaining the trajectories in Figures 7 and 8. Animations showing the trajectories over time (and so illustrating that collisions do not in fact occur) are available online [10]. This site also provides our implementation of the DFA improvisation scheme, and implementations of the specifications and adversaries used in our drone experiments (as well as an adversary controlled by the user, so that one can type in actions and see how the improviser responds).

## Appendix 0.B Detailed Proofs

We use without comment several basic facts about , all immediate from its definition:

###### Lemma 2

For any history , word , and strategies , :

1. if , then ;

2. if , then ;

3. if , then for some , and:

1. if it is our turn after , then ;

2. if it is the adversary’s turn after , then .


###### Proof

1. By definition, , so trivially. Since , if then