## 1 Introduction

## 2 Optimality as a way to discover macro actions

### 2.1 Theory

If optimal actions were always uniformly distributed, searching for macro actions would be meaningless, since the number of candidate action sequences grows exponentially with their length.

What makes the search meaningful is that macro actions do exist in general MDPs, and we should be able to recover them from optimal solutions.

Here is a theorem that needs to be proven:

###### Theorem 1.

Given an MDP with known set of , and the dynamics , the optimal action sequences of the MDP cluster across a family of reward distributions.

Example: consider a 2D maze with multiple rooms and a sparse reward placed at some location in one of the rooms. No matter where the reward is placed (outside the agent's current room), the optimal solution always begins by leaving the current room.

###### Proof.

Note that the values inside the room are completely determined by the values on its interface with the rest of the state space. If the ordering of the values remains the same, so does the optimal policy within the room (except at the interface). ∎
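The room example can be checked numerically on a toy problem. The following is a minimal sketch, not the setup used in this paper: it assumes a hypothetical two-room maze (`MAZE`) joined by a single door, deterministic moves, a reward of 1 for entering the goal, and a discount `GAMMA`. It runs value iteration for every goal placement in the right room and compares the greedy policies restricted to the left room.

```python
GAMMA = 0.9  # assumed discount factor

# Hypothetical maze: '#' is a wall, '.' is free. The door is the cell (2, 4).
MAZE = [
    "#########",
    "#...#...#",
    "#.......#",
    "#...#...#",
    "#########",
]
ACTIONS = {"U": (-1, 0), "D": (1, 0), "L": (0, -1), "R": (0, 1)}
CELLS = [(r, c) for r, row in enumerate(MAZE) for c, ch in enumerate(row) if ch == "."]
LEFT_ROOM = [(r, c) for r in range(1, 4) for c in range(1, 4)]
RIGHT_ROOM = [(r, c) for r in range(1, 4) for c in range(5, 8)]


def step(state, action):
    """Deterministic move; bumping into a wall leaves the state unchanged."""
    dr, dc = ACTIONS[action]
    r, c = state[0] + dr, state[1] + dc
    return (r, c) if MAZE[r][c] == "." else state


def greedy_policy(goal, sweeps=50):
    """Value iteration with a sparse reward of 1 for entering the terminal goal."""
    V = {s: 0.0 for s in CELLS}

    def q(s, a):
        nxt = step(s, a)
        return (1.0 if nxt == goal else 0.0) + GAMMA * V[nxt]

    for _ in range(sweeps):
        for s in CELLS:
            if s != goal:  # the goal is terminal, so its value stays 0
                V[s] = max(q(s, a) for a in ACTIONS)
    # Ties are broken by the fixed action order U, D, L, R.
    return {s: max(ACTIONS, key=lambda a: q(s, a)) for s in CELLS if s != goal}


def left_room_policy(goal):
    """Greedy actions in the left room (interface/door cell excluded)."""
    policy = greedy_policy(goal)
    return tuple(policy[s] for s in LEFT_ROOM)


# One policy per goal placement in the right room; collect the distinct ones.
distinct = {left_room_policy(goal) for goal in RIGHT_ROOM}
print(len(distinct), "distinct left-room policies over", len(RIGHT_ROOM), "goals")
```

If the claim holds for this maze, `distinct` contains a single element: moving the reward rescales the values inside the left room but does not change their ordering, so the greedy policy there is unchanged. The door cell is deliberately excluded, matching the "except at the interface" caveat.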

Once we have the theory, we know that clustering makes sense: we first cluster the optimal trajectories obtained under a few reward distributions into macro actions, and then simply apply these macro actions under new reward distributions.
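As an illustration of the clustering step, the sketch below substitutes a much simpler technique: it counts contiguous action n-grams and keeps those that appear in a large enough fraction of the optimal trajectories. This is an assumption made for illustration, not necessarily the clustering procedure intended here; the function name `extract_macros`, its parameters, and the example trajectories are all hypothetical.

```python
from collections import Counter


def extract_macros(trajectories, min_len=2, max_len=4, min_support=1.0):
    """Return action n-grams found in at least `min_support` of the trajectories."""
    support = Counter()
    for traj in trajectories:
        seen = set()  # count each n-gram at most once per trajectory
        for n in range(min_len, max_len + 1):
            for i in range(len(traj) - n + 1):
                seen.add(tuple(traj[i:i + n]))
        support.update(seen)
    threshold = min_support * len(trajectories)
    frequent = [gram for gram, count in support.items() if count >= threshold]
    # Longest candidates first, then alphabetical for a stable order.
    return sorted(frequent, key=lambda gram: (-len(gram), gram))


# Hypothetical optimal action sequences under three different reward placements.
# They share the prefix that leaves the start room and differ afterwards.
trajectories = [
    list("DRRRRU"),
    list("DRRRDR"),
    list("DRRRR"),
]
macros = extract_macros(trajectories)
print(macros[0])  # the longest n-gram shared by all three trajectories
```

With these inputs the longest shared n-gram is the "leave the room" prefix `D, R, R, R`, which is the kind of pattern one would hope to promote to a macro action.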

## 3 Experiments

### 3.1 Generating complicated expressions

[Put a few sentences saying how the expression is generated?][Make sure at least some of them are from real cases.]

### 3.2 Comparison against rule-based systems

Halide has a well-engineered rule-based system. [Explain the system a bit here]. Our RL agent now beats the rule-based system using the policy network alone (i.e., by picking the most probable action from the network, without search).

With a search-based method, the results are even better [we need some numbers here].

### 3.3 Extraction of Macro Actions

Using the principle of optimality, we were able to find recurring patterns in the optimal action sequences. Fig. shows some patterns.

### 3.4 Generalization capability of patterns

We define the extracted patterns as macro actions and apply them to unseen simplification cases. Do we reduce the number of steps?
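One way to make the step-count question concrete is to count how many decisions are needed to reproduce a given primitive action sequence when macro actions are available. The sketch below is an assumed metric, not a result from our experiments: the function `count_decisions` is hypothetical, and it uses a greedy longest-match cover, which is not guaranteed to give the minimum number of decisions.

```python
def count_decisions(actions, macros):
    """Greedily cover `actions` with the longest matching macro at each position."""
    # Ignore macros shorter than two actions; they cannot save any steps.
    ordered = sorted((tuple(m) for m in macros if len(m) > 1), key=len, reverse=True)
    i = 0
    decisions = 0
    while i < len(actions):
        for macro in ordered:
            if tuple(actions[i:i + len(macro)]) == macro:
                i += len(macro)  # one decision executes the whole macro
                break
        else:
            i += 1  # no macro matches here, fall back to one primitive action
        decisions += 1
    return decisions


# Hypothetical unseen case and a single hypothetical macro action.
sequence = list("DRRRRU")
with_macros = count_decisions(sequence, [("D", "R", "R", "R")])
without_macros = count_decisions(sequence, [])
print(without_macros, "->", with_macros)
```

Comparing `without_macros` and `with_macros` over a held-out set of cases would give a direct, if rough, answer to whether the macro actions shorten the decision sequence.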

Faster exploration. Using macro actions, we can explore much faster and learn to solve more complicated problems (problems that involve a much deeper search tree). [Show a few examples.]