An Optimal Control Approach to Sequential Machine Teaching

10/15/2018 · Laurent Lessard et al.

Given a sequential learning algorithm and a target model, sequential machine teaching aims to find the shortest training sequence to drive the learning algorithm to the target model. We present the first principled way to find such shortest training sequences. Our key insight is to formulate sequential machine teaching as a time-optimal control problem. This allows us to solve sequential teaching by leveraging key theoretical and computational tools developed over the past 60 years in the optimal control community. Specifically, we study the Pontryagin Maximum Principle, which yields a necessary condition for optimality of a training sequence. We present analytic, structural, and numerical implications of this approach on a case study with a least-squares loss function and gradient descent learner. We compute optimal training sequences for this problem, and although the sequences seem circuitous, we find that they can vastly outperform the best available heuristics for generating training sequences.


1 Introduction

Machine teaching studies optimal control on machine learners (Zhu et al., 2018; Zhu, 2015). In controls language, the plant is the learner, the state is the model estimate, and the input is the (not necessarily i.i.d.) training data. The controller wants to use the smallest number of training items—a concept known as the teaching dimension (Goldman and Kearns, 1995)—to force the learner to learn a target model. For example, in adversarial learning, an attacker may minimally poison the training data to force a learner to learn a nefarious model (Biggio et al., 2012; Mei and Zhu, 2015). Conversely, a defender may immunize the learner by injecting adversarial training examples into the training data (Goodfellow et al., 2014). In education systems, a teacher may optimize the training curriculum to enhance learning by a student, modeled as a learning algorithm (Sen et al., 2018; Patil et al., 2014).

Machine teaching problems are either batch or sequential depending on the learner. The majority of prior work studied batch machine teaching, where the controller performs one-step control by giving the batch learner an input training set. Modern machine learning, however, extensively employs sequential learning algorithms. We thus study sequential machine teaching: what is the shortest training sequence that forces a learner to go from an initial model w_0 to a target model w*? Formally, at time t the controller chooses an input u_t from an input set U. The learner then updates the model according to its learning algorithm. This forms a dynamical system f:

(1a)  w_{t+1} = f(w_t, u_t)

The controller has full knowledge of w_0, w*, f, and U, and wants to minimize the terminal time T subject to w_T = w*.

As a concrete example, we focus on teaching a gradient descent learner of the least-squares loss:

(1b)  w_{t+1} = w_t - η (x_t^T w_t - y_t) x_t

with step size η > 0, input u_t = (x_t, y_t), and a bounded input set U. We caution the reader not to trivialize the problem: (1) is a nonlinear dynamical system due to the interaction between w_t and x_t. A previous best attempt to solve this control problem by Liu et al. (2017) employs a greedy control policy, which at step t chooses u_t to minimize the distance between w_{t+1} and w*. One of our observations is that this greedy policy can be substantially suboptimal. Figure 1 shows three teaching problems and the number of steps needed to arrive at w* using different methods. Our optimal control method NLP found shorter teaching sequences than the greedy policy (lengths 151, 153, 259 for NLP vs. 219, 241, 310 for GREEDY, respectively). This and other experiments are discussed in Section 4.
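For readers who want to experiment, here is a minimal Python sketch of the learner dynamics (1); the step size, the input bounds R and y_max, and the two toy teachers are illustrative assumptions rather than the settings or methods used in the experiments reported below.

```python
import numpy as np

# Minimal sketch of the learner dynamics (1): gradient descent on the squared
# loss 0.5*(x'w - y)^2. The step size, bounds, and toy teachers below are
# illustrative assumptions, not the settings used in the paper's experiments.

def learner_step(w, x, y, eta):
    """One gradient descent step of the learner (1b)."""
    return w - eta * (x @ w - y) * x

def unbounded_teacher(w, w_star, eta):
    """With unbounded inputs, one step suffices: aim x along w* - w and pick y
    so that the gradient step lands exactly on w*. Bounding ||x|| and |y| is
    what turns teaching into a nontrivial time-optimal control problem."""
    d = w_star - w
    x = d / np.linalg.norm(d)
    y = x @ w + np.linalg.norm(d) / eta
    return x, y

def clipped_teacher(w, w_star, eta, R=1.0, y_max=1.0):
    """Same construction, but projected into an assumed bounded input set
    {||x|| <= R, |y| <= y_max}; many steps may now be needed (and this simple
    heuristic is not guaranteed to reach the target at all)."""
    x, y = unbounded_teacher(w, w_star, eta)
    x = x * min(1.0, R / np.linalg.norm(x))
    y = float(np.clip(y, -y_max, y_max))
    return x, y

def rollout(w0, w_star, teacher, eta=0.1, tol=1e-3, max_steps=10_000):
    w = w0.copy()
    for t in range(max_steps):
        if np.linalg.norm(w - w_star) <= tol:
            return t, w
        x, y = teacher(w, w_star, eta)
        w = learner_step(w, x, y, eta)
    return max_steps, w

if __name__ == "__main__":
    w0, w_star = np.array([1.0, 0.5]), np.array([-0.5, 1.0])
    print(rollout(w0, w_star, unbounded_teacher))   # reaches w* in one step
    print(rollout(w0, w_star, clipped_teacher))     # typically needs many steps
```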

Figure 1: The shortest teaching trajectories found by different methods. All teaching tasks use the same terminal point w*; each panel uses a different initial point. The learner is the least-squares gradient descent algorithm (1). The total number of steps needed to arrive at w* is indicated in the legends.

1.1 Main Contributions

Our main contribution is to show how tools from optimal control theory may be brought to bear on the machine teaching problem. Specifically, we show that:

  1. The Pontryagin optimality conditions reveal deep structural properties of optimal teaching sequences. For example, we show that the least-squares case (1) is fundamentally a 2D problem, and we provide a structural characterization of solutions. These results are detailed in Section 3.

  2. Optimal teaching sequences can be vastly more efficient than what may be obtained via common heuristics. We present two optimal approaches: an exact method (NLP) and a continuous approximation (CNLP). Both agree when the stepsize is small, but CNLP is more scalable because its runtime does not depend on the length of the training sequence. These results are shown in Section 4.

We begin with a survey of the relevant optimal control theory and algorithms literature in Section 2.

2 Time-Optimal Control

To study the structure of optimal control, we consider the continuous gradient-flow approximation of gradient descent, which holds in the limit η → 0. In this section, we present the corresponding canonical time-optimal control problem and summarize some of the key theoretical and computational tools that have been developed over the past 60 years to address it. For a more detailed exposition of the theory, we refer the reader to modern references on the topic (Kirk, 2012; Liberzon, 2011; Athans and Falb, 2013).
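Concretely, for the least-squares update (1b), rearranging the gradient descent step gives

(w_{t+1} - w_t)/η = -(x_t^T w_t - y_t) x_t,

and letting η → 0 while reinterpreting t as continuous time yields the gradient flow

ẇ(t) = -(x(t)^T w(t) - y(t)) x(t),

which is the continuous-time dynamics studied in this section and specialized in Section 3.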

This section is self-contained, and we use notation consistent with the control literature (for example, x for the state and u for the input rather than the machine learning symbols). We revert to machine learning notation in Section 3. Consider the following boundary value problem:

(2)  ẋ(t) = f(x(t), u(t)),   u(t) ∈ U for all t ∈ [0, t_f],   x(0) = x_0,   x(t_f) = x_f

The function x is called the state and u is called the input. Here, U is a given constraint set that characterizes admissible inputs. The initial and terminal states x_0 and x_f are fixed, but the terminal time t_f is free. If an admissible u together with a state x satisfy the boundary value problem (2) for some choice of t_f, we call (x, u, t_f) a trajectory of the system. The objective in a time-optimal control problem is to find an optimal trajectory, i.e., a trajectory with minimal t_f.

Established approaches for solving time-optimal control problems can be grouped in three broad categories: dynamic programming, indirect methods, and direct methods. We now summarize each approach.

2.1 Dynamic Programming

Consider the value function V(x_0), defined as the minimum time required to reach x_f starting from the initial state x_0. The Hamilton–Jacobi–Bellman (HJB) equation gives necessary and sufficient conditions for optimality and takes the form:

(3)  min_{u ∈ U} ∇V(x)^T f(x, u) + 1 = 0   for all x

together with the boundary condition V(x_f) = 0. If the solution to this partial differential equation is V, then the optimal input at state x is given by the minimizer:

(4)  u*(x) ∈ argmin_{u ∈ U} ∇V(x)^T f(x, u)

A nice feature of this solution is that the optimal input depends only on the current state x. In other words, HJB produces an optimal feedback policy.

Unfortunately, the HJB equation (3) is generally difficult to solve. Even if the minimization has a closed-form solution, the resulting differential equation is often intractable. We remark that the optimal V may not be differentiable. For this reason, one looks for so-called viscosity solutions, as described by Liberzon (2011); Tonon et al. (2017) and references therein.

Numerical approaches for solving HJB include the fast-marching method (Tsitsiklis, 1995) and Lax–Friedrichs sweeping (Kao et al., 2004). The latter reference also contains a detailed survey of other numerical schemes.
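As a toy illustration of such grid-based schemes (on deliberately simple dynamics, not the learner of this paper), the following sketch computes a minimum-time value function by repeated sweeping over a 2D grid; the dynamics, grid, and update rule are assumptions chosen only for simplicity.

```python
import numpy as np

# Toy illustration of solving a time-optimal HJB equation numerically. Dynamics
# xdot = u with ||u||_1 <= 1 on a 2D grid (NOT the learner dynamics); V is the
# minimum time to reach the target cell. This simple sweeping/value-iteration
# scheme stands in for fast marching or Lax-Friedrichs sweeping.

h = 0.05                                   # grid spacing
n = int(2.0 / h) + 1                       # grid covers [-1, 1]^2
V = np.full((n, n), np.inf)
ti = tj = n // 2                           # target state at the origin
V[ti, tj] = 0.0

for _ in range(4 * n):                     # sweep until the values stop changing
    V_old = V.copy()
    for i in range(n):
        for j in range(n):
            for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                a, b = i + di, j + dj
                if 0 <= a < n and 0 <= b < n:
                    # moving one cell along an axis takes time h at unit speed
                    V[i, j] = min(V[i, j], V[a, b] + h)
    if np.allclose(V, V_old):
        break

# The feedback policy (4) is recovered by stepping toward the neighbor of least value.
print("minimum time from the corner (-1, -1):", V[0, 0])
```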

2.2 Indirect Methods

Also known as “optimize then discretize”, indirect approaches start with necessary conditions for optimality obtained via the Pontryagin Maximum Principle (PMP). The PMP may be stated and proved in several different ways, most notably using the Hamiltonian formalism from physics or using the calculus of variations. Here is a formal statement.

Theorem 2.2 (PMP). Consider the boundary value problem (2), where f and its Jacobian with respect to x are continuous. Define the Hamiltonian as H(x, u, p) := 1 + p^T f(x, u). If (x, u, t_f) is an optimal trajectory, then there exists some function p (called the “co-state”) such that the following conditions hold.

  a) The state x and co-state p satisfy the following system of differential equations for t ∈ [0, t_f], with boundary conditions x(0) = x_0 and x(t_f) = x_f:

    (5a)  ẋ(t) = ∇_p H(x(t), u(t), p(t)) = f(x(t), u(t))
    (5b)  ṗ(t) = -∇_x H(x(t), u(t), p(t))

  b) For all t ∈ [0, t_f], an optimal input satisfies:

    (6)  u(t) ∈ argmin_{v ∈ U} H(x(t), v, p(t))

  c) Zero Hamiltonian along optimal trajectories:

    (7)  H(x(t), u(t), p(t)) = 0 for all t ∈ [0, t_f]

In comparison to HJB, which needs to be solved for all states x, the PMP only applies along optimal trajectories. Although the differential equations (5) may still be difficult to solve, they are simpler than the HJB equation and therefore tend to be more amenable to both analytical and numerical approaches. Solutions to HJB and PMP are related via p(t) = ∇V(x(t)).

PMP is only necessary for optimality, so solutions of (5)–(7) are not necessarily optimal. Moreover, PMP does not produce a feedback policy; it only produces optimal trajectory candidates. Nevertheless, PMP can provide useful insight, as we will explore in Section 3.

If the PMP conditions cannot be solved analytically, a common numerical approach is the shooting method, in which we guess the initial co-state p(0) and propagate the equations (5)–(6) forward via numerical integration. The guess is then refined and the process repeated until the trajectory reaches x_f.
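As a concrete (toy) instance of the shooting method, the sketch below applies it to a minimum-time double integrator, for which the PMP conditions take a simple form; the system, the initial guess, and the use of scipy are illustrative assumptions unrelated to the teaching problem.

```python
import numpy as np
from scipy.integrate import solve_ivp
from scipy.optimize import root

# Shooting for the PMP conditions (5)-(7) on a toy minimum-time problem (a double
# integrator, NOT the learner of Section 3): state (q, v), dynamics qdot = v,
# vdot = u with |u| <= 1. With H = 1 + p1*v + p2*u, the minimizing input is
# u = -sign(p2), the co-state obeys p1dot = 0, p2dot = -p1, and H = 0 must hold.

x_init = np.array([1.0, 0.0])
x_goal = np.array([0.0, 0.0])

def odes(t, z):
    q, v, p1, p2 = z
    u = -np.sign(p2)
    return [v, u, 0.0, -p1]

def shoot(guess):
    p0, tf = guess[:2], max(guess[2], 1e-3)          # keep the horizon positive
    sol = solve_ivp(odes, (0.0, tf), np.concatenate([x_init, p0]),
                    rtol=1e-8, atol=1e-10)
    q, v, p1, p2 = sol.y[:, -1]
    H_end = 1.0 + p1 * v + p2 * (-np.sign(p2))
    # residuals: hit the target (boundary conditions) and zero final Hamiltonian (7)
    return [q - x_goal[0], v - x_goal[1], H_end]

sol = root(shoot, [1.0, 1.0, 2.5], method="hybr")
print("p(0) and t_f:", sol.x)    # for this instance the optimum is p(0) = (1, 1), t_f = 2
```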

2.3 Direct Methods

Also known as “discretize then optimize”, direct approaches solve a sparse nonlinear program whose variables are the state and input evaluated at a discrete set of timepoints. An example is collocation methods, which use basis functions such as piecewise polynomials to interpolate the state between timepoints. For contemporary surveys of direct and indirect numerical approaches, see Rao (2009) and Betts (2010).

If the dynamics are already discrete as in (1), we may directly formulate a nonlinear program. We refer to this approach as NLP. Alternatively, we can take the continuous limit and then discretize, which we call CNLP. We discuss the advantages and disadvantages of both approaches in Section 4.

3 Teaching Least Squares: Insight From Pontryagin

In this section, we specialize time-optimal control to the least-squares learner. To recap, our goal is to find the minimum number of steps T such that there exists a control sequence u_0, …, u_{T-1} that drives the learner (1) from the initial state w_0 to the target state w*. The constraint set is the bounded input set U from (1). This is a nonlinear discrete-time time-optimal control problem for which no closed-form solution is available.

For the corresponding continuous-time control problem, applying Theorem 2.2 yields the following necessary conditions for optimality, which must hold for all t. (The state, co-state, and input in Theorem 2.2 are written in conventional controls notation; here we use machine learning notation.)

(8a)
(8b)
(8c)
(8d)
(8e)

We can simplify (8) by fixing the value of y, as described in Proposition 3 below.

Proposition 3. For any trajectory satisfying (8), there exists another trajectory, with the same w and p, in which y is held at a fixed value. So we may fix y without any loss of generality.

Proof.

Since (8d) is linear in y, the optimal y occurs at a boundary. Changing the sign of y is equivalent to changing the sign of x, so we may assume without loss of generality that y takes its positive boundary value. These changes leave (8b)–(8c) and (8e) unchanged, so w and p are unchanged as well. ∎

In fact, Proposition 3 holds if we consider trajectories of (1) as well. For a proof, see the appendix.

Applying Proposition 3, the conditions (8d) and (8e) may be combined to yield the following quadratically constrained quadratic program (QCQP).

(9)

where we have omitted the explicit time dependence for clarity. Note that (9) constrains the possible tuples (w, p, x) that can occur as part of an optimal trajectory. So in addition to solving the left-hand side to find x, we must also ensure that the resulting optimal value equals the required constant. We will now characterize the solutions of (9) by examining five distinct regimes of the solution space, which depend on the relationship between w and p, and determine which regime transitions are admissible.

Regime I (Origin): and .

This regime happens when the teaching trajectory passes through the origin. In this regime, one can obtain closed-form solutions. In particular, and . In this regime, both and are positively aligned with . Therefore, Regime I necessarily transitions from Regime II and into Regime III, provided it is not at the beginning or the end of the teaching trajectory.

Regime II (positive alignment): with and .

This regime happens when and are positively aligned. Again we have closed-form solutions. In particular, and . In this regime, both and are negatively aligned with , thus Regime II necessarily transitions into Regime I and cannot be entered from any other regime.

Regime III (negative alignment inside the origin-centered ball): with and and .

This regime happens when and are negatively aligned and is inside the ball centered at the origin with radius . Again, closed-form solutions exist: and . Regime III necessarily transitions from Regime I and into Regime IV.

Regime IV (negative alignment out of the origin-centered ball): with and and .

In this case, the solution satisfies so that is uniquely determined by . However, the optimal is not unique. Any solution to with can be chosen. Regime IV can only transition from Regime III and cannot transition into any other regime. In other words, once the teaching trajectory enters Regime IV, it cannot escape. Another interesting property of Regime IV is that we know exactly how fast the norm of is changing. In particular, knowing , one can derive that . As a result, once the trajectory enters Regime IV, we know exactly how long it will take for the trajectory to reach , if it is able to reach it.

Regime V (general position): w and p are linearly independent.

This case covers the remaining possibilities for the state and co-state variables. To characterize the solutions in this regime, we first introduce new coordinates. Define an orthonormal basis for the plane spanned by w and p; this is possible because w and p are assumed to be linearly independent in this regime. We can therefore express any input x uniquely in terms of this in-plane basis together with an out-of-plane unit vector orthogonal to both w and p, with suitably chosen coefficients. Substituting these definitions, (9) becomes

(10)

Now observe that the objective depends on the in-plane coefficients but not on the out-of-plane one, and it is linear in one of them (otherwise the entire objective would be zero). Since the feasible set is convex, the optimum must occur at the boundary of the feasible set, which forces the out-of-plane coefficient to be zero. This is profound, because it implies that in Regime V the optimal solution necessarily lies in the 2D plane spanned by w and p. In light of this fact, we can pick a more convenient parametrization, and equation (10) becomes:

(11)

This objective function has at most four critical points, only one of which is the global minimum, and we can find it numerically. Last but not least, Regime V does not transition from or into any other regime.

Figure 2: Optimal trajectories for a fixed target and different choices of the initial point. Trajectories are colored according to the regime to which they belong, and the directed graph above shows all possible transitions. The optimal trajectories are symmetric about the -axis. For implementation details, see Section 4.

Intrinsic low-dimensional structure of the optimal control solution.

As hinted at in the analysis of Regime V, the optimal control sometimes lies in the 2D subspace spanned by w and p. In fact, this holds not only for Regime V but for the whole problem. In particular, we make the following observation (Theorem 3): there always exists a globally optimal trajectory of (8) that lies in a 2D subspace. The detailed proof can be found in the appendix. An immediate consequence of Theorem 3 is that if w_0 and w* are linearly independent, we only need to consider trajectories confined to the subspace spanned by them. When w_0 and w* are aligned, trajectories are still 2D; any 2D subspace containing both is equivalent, and an arbitrary choice can be made.

This insight is important because it enables us to restrict our attention to 2D trajectories even though the dimensionality of the original problem may be huge. It allows us not only to obtain a more elegant and accurate solution of the necessary conditions induced by the PMP, but also to parametrize the indirect and direct approaches (Sections 2.2 and 2.3) so that they solve this intrinsically 2D problem more efficiently.
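A minimal sketch of this 2D reduction (the choice of basis and the aligned-case completion are implementation details not specified in the text) might look as follows.

```python
import numpy as np

# Sketch of the 2D reduction: build an orthonormal basis for a plane containing
# w0 and w*, solve the teaching problem in these two coordinates, and embed the
# result back into R^d. The aligned case uses an arbitrary completion, as noted above.

def plane_basis(w0, w_star):
    e1 = w0 / np.linalg.norm(w0)                       # assumes w0 != 0
    r = w_star - (w_star @ e1) * e1
    if np.linalg.norm(r) < 1e-12:                      # w0 and w* aligned
        r = np.zeros_like(w0)
        r[np.argmin(np.abs(e1))] = 1.0                 # arbitrary completion
        r = r - (r @ e1) * e1
    return e1, r / np.linalg.norm(r)

def to_plane(v, e1, e2):
    return np.array([v @ e1, v @ e2])                  # 2D coordinates

def from_plane(c, e1, e2):
    return c[0] * e1 + c[1] * e2                       # embed back into R^d

# Example: a 50-dimensional teaching problem reduces to a 2D one.
rng = np.random.default_rng(0)
w0, w_star = rng.standard_normal(50), rng.standard_normal(50)
e1, e2 = plane_basis(w0, w_star)
print(to_plane(w0, e1, e2), to_plane(w_star, e1, e2))
```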

Figure 3: Trajectories found using a shooting approach (Section 2.2) with and . Gray curves show different shooting trajectories while the blue and orange curves show two trajectories that satisfy the necessary conditions for optimality (8). Markers show intervals of seconds, which is roughly 50 steps when using a stepsize of .

Multiplicity of Solution Candidates.

The PMP conditions are only necessary for optimality. Therefore, the optimality conditions (8) need not have a unique solution. We illustrate this phenomenon in Figure 3. We used a shooting approach (Section 2.2) to propagate different choices of the initial co-state forward in time. It turns out that two such choices lead to trajectories that end at the target, and they do not have equal total times. So in general, PMP identifies optimal trajectory candidates, which can be thought of as local minima of this highly nonlinear optimization problem.

4 Numerical Methods

While the PMP yields necessary conditions for time-optimal control as detailed in Section 3, there is no closed-form solution in general. We now present and discuss four numerical methods: CNLP and NLP are different implementations of time-optimal control, while GREEDY and STRAIGHT are heuristics.

CNLP:

This approach solves the continuous gradient-flow limit of the machine teaching problem using a direct approach (Section 2.3). Specifically, we used the NLOptControl package (Febbo, 2017), which is an implementation of the hp-pseudospectral method GPOPS-II (Patterson and Rao, 2014), written in the Julia programming language using the JuMP modeling language (Dunning et al., 2017) and the IPOPT interior-point solver (Wächter and Biegler, 2006). The main tuning parameters for this software are the integration scheme and the number of mesh points. We selected the trapezoidal integration rule and a fixed number of mesh points for most simulations. We used CNLP to produce the trajectories in Figures 1 and 2.
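The CNLP implementation above relies on NLOptControl with Ipopt. Purely as a library-free illustration of the same discretize-then-optimize idea, the sketch below transcribes the gradient-flow teaching problem with a trapezoidal rule and a free final time using scipy's SLSQP, under assumed input bounds ||x|| <= 1 and |y| <= 1; the paper's exact constraint set, mesh, and solver differ.

```python
import numpy as np
from scipy.optimize import minimize

# Library-free illustration of CNLP ("discretize then optimize"): trapezoidal
# transcription of the gradient-flow dynamics with a free final time t_f.
# Problem data and bounds are assumptions; the paper uses NLOptControl/Ipopt.

w0, w_star = np.array([1.0, 0.0]), np.array([0.0, 1.0])
N, d = 20, 2                               # mesh intervals, state dimension

def unpack(z):
    tf = z[0]
    rest = z[1:].reshape(N + 1, 2 * d + 1)
    return tf, rest[:, :d], rest[:, d:2 * d], rest[:, -1]   # t_f, w, x, y

def flow(w, x, y):
    return -(x @ w - y) * x                # continuous-time learner dynamics

def defects(z):                            # trapezoidal collocation + boundary conditions
    tf, w, x, y = unpack(z)
    h = tf / N
    out = [w[k + 1] - w[k]
           - 0.5 * h * (flow(w[k], x[k], y[k]) + flow(w[k + 1], x[k + 1], y[k + 1]))
           for k in range(N)]
    out += [w[0] - w0, w[-1] - w_star]
    return np.concatenate(out)

def input_bounds(z):                       # ||x_k|| <= 1 and |y_k| <= 1 (assumed)
    _, _, x, y = unpack(z)
    return np.concatenate([1.0 - np.sum(x ** 2, axis=1), 1.0 - y ** 2])

w_guess = np.linspace(w0, w_star, N + 1)   # straight-line initial guess for the state
z0 = np.concatenate([[2.0]] + [np.concatenate([w_guess[k], [0.5, 0.5], [0.0]])
                               for k in range(N + 1)])
res = minimize(lambda z: z[0], z0, method="SLSQP",
               constraints=[{"type": "eq", "fun": defects},
                            {"type": "ineq", "fun": input_bounds}],
               bounds=[(0.1, None)] + [(None, None)] * ((N + 1) * (2 * d + 1)),
               options={"maxiter": 500})
print("approximate minimum time t_f (if converged):", res.x[0])
```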

NLP:

A naïve approach to optimal control is to find the minimum horizon T for which there is a feasible input sequence that drives the learner to w*. Fixing T, the feasibility subproblem is a nonlinear program over the input variables, constrained by the learner dynamics. Recall that w_0 is given, and one can fix y_t for all t by Proposition 3. For our learner (1), the feasibility problem is

(12)  find x_0, …, x_{T-1} (with each y_t fixed)
s.t.  w_{t+1} = w_t - η (x_t^T w_t - y_t) x_t for t = 0, …, T-1,   w_T = w*,   (x_t, y_t) ∈ U.

As in the CNLP case, we modeled and solved the subproblems (12) using JuMP and IPOPT. We also tried Knitro, a state-of-the-art commercial solver (Byrd et al., 2006), and it produced similar results. We stress that such feasibility problems are difficult; IPOPT and Knitro can handle only moderately sized T. For our specific learner (1), there are 2D optimal control and state trajectories, as discussed in Section 3. Therefore, we reparameterized (12) to work in 2D.

On top of this, we run a binary search over positive integers to find the minimum T for which the subproblem (12) is feasible. Subject to solver numerical stability, this minimum T and its feasible solution constitute the time-optimal control. While NLP is conceptually simple and correct, it requires solving many subproblems whose numbers of variables and constraints grow with T, making it less stable and scalable than CNLP.
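A minimal sketch of the NLP strategy, with assumed bounds and a generic solver in place of JuMP/Ipopt, is shown below; it eliminates the state variables by rolling the dynamics forward and treats each per-horizon solve as a (heuristic) feasibility check.

```python
import numpy as np
from scipy.optimize import minimize

# Sketch of the NLP idea: binary search over the horizon T with a feasibility
# check per T. The bounds ||x_t|| <= 1 and |y_t| <= 1, the step size, and the
# use of scipy's SLSQP are illustrative assumptions; the paper solves an
# equivalent constrained program with JuMP/Ipopt in a 2D reparametrization.

eta = 0.5
w0, w_star = np.array([1.0, 0.0]), np.array([0.0, 1.0])

def terminal_gap(u_flat, T):
    u = u_flat.reshape(T, 3)                  # per step: (x_1, x_2, y)
    w = w0.copy()
    for t in range(T):
        x, y = u[t, :2], u[t, 2]
        w = w - eta * (x @ w - y) * x         # discrete learner update (1)
    return float(np.sum((w - w_star) ** 2))

def feasible(T, tol=1e-4):
    """Heuristic feasibility check for horizon T (the problem is nonconvex)."""
    cons = [{"type": "ineq",
             "fun": lambda u, T=T: 1.0 - np.sum(u.reshape(T, 3)[:, :2] ** 2, axis=1)},
            {"type": "ineq",
             "fun": lambda u, T=T: 1.0 - u.reshape(T, 3)[:, 2] ** 2}]
    res = minimize(terminal_gap, 0.1 * np.ones(3 * T), args=(T,),
                   method="SLSQP", constraints=cons, options={"maxiter": 300})
    return res.fun < tol

# Choosing x_t = 0 "wastes" a step, so feasibility is monotone in T and binary search applies.
lo, hi = 1, 16
while not feasible(hi):
    lo, hi = hi + 1, 2 * hi
while lo < hi:
    mid = (lo + hi) // 2
    if feasible(mid):
        hi = mid
    else:
        lo = mid + 1
print("estimated minimum number of teaching steps:", hi)
```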

GREEDY:

We restate the greedy control policy initially proposed by Liu et al. (2017). It has the advantage of being computationally efficient and readily applicable to different learning algorithms (i.e., dynamics). Specifically, for the least-squares learner (1) and given the current state w_t, GREEDY solves the following optimization problem to determine the next teaching example (x_t, y_t):

(13)  min_{(x, y)}  || w_t - η (x^T w_t - y) x - w* ||^2
s.t.  (x, y) ∈ U

The procedure repeats until w_t reaches w* (to within a tolerance). We used the Matlab function fmincon to solve the above problem at each step. We point out that the optimization problem is not convex. Moreover, the resulting step does not necessarily point in the direction of w*, as is evident in Figures 1 and 6.
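For concreteness, a sketch of one GREEDY step under assumed input bounds (using scipy with random restarts in place of fmincon) is:

```python
import numpy as np
from scipy.optimize import minimize

# One step of the GREEDY policy: pick the teaching example (x, y) that minimizes
# the distance between the next iterate and w*. The bounds ||x|| <= 1, |y| <= 1
# and the SLSQP-with-restarts solver are illustrative assumptions (the paper
# uses Matlab's fmincon); the problem is nonconvex, hence the multistart.

def greedy_step(w, w_star, eta, restarts=5, seed=0):
    rng = np.random.default_rng(seed)

    def next_dist(u):
        x, y = u[:2], u[2]
        w_next = w - eta * (x @ w - y) * x
        return float(np.sum((w_next - w_star) ** 2))

    cons = [{"type": "ineq", "fun": lambda u: 1.0 - u[0] ** 2 - u[1] ** 2},
            {"type": "ineq", "fun": lambda u: 1.0 - u[2] ** 2}]
    best = None
    for _ in range(restarts):
        res = minimize(next_dist, rng.uniform(-1, 1, size=3),
                       method="SLSQP", constraints=cons)
        if best is None or res.fun < best.fun:
            best = res
    return best.x[:2], best.x[2]

print(greedy_step(np.array([1.0, 0.0]), np.array([0.0, 1.0]), eta=0.05))
```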

STRAIGHT:

We describe an intuitive control policy: at each step, move in a straight line toward w* as far as possible subject to the input constraint. This policy is less greedy than GREEDY because it may not reduce the distance to w* as much at each step. The per-step optimization is a 1D line search:

(14)  max_{α ≥ 0, (x, y)}  α
s.t.  w_t - η (x^T w_t - y) x = w_t + α (w* - w_t)/||w* - w_t||,   (x, y) ∈ U

The line search (14) can be solved in closed form.
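Since the closed-form expression is not reproduced here, the following sketch implements the STRAIGHT step numerically by bisecting on the step length, assuming the reachable step lengths along the ray form an interval and reusing the same assumed input bounds.

```python
import numpy as np
from scipy.optimize import minimize

# One step of the STRAIGHT policy, implemented numerically instead of in closed
# form: bisect on the step length alpha along the ray toward w*, and test whether
# w + alpha*d is reachable with one admissible gradient step. This assumes the
# reachable step lengths form an interval [0, alpha_max] and uses the same
# illustrative bounds ||x|| <= 1, |y| <= 1 as the other sketches.

def reachable(w, target, eta, tol=1e-6):
    def gap(u):
        x, y = u[:2], u[2]
        return float(np.sum((w - eta * (x @ w - y) * x - target) ** 2))
    cons = [{"type": "ineq", "fun": lambda u: 1.0 - u[0] ** 2 - u[1] ** 2},
            {"type": "ineq", "fun": lambda u: 1.0 - u[2] ** 2}]
    res = minimize(gap, np.array([0.5, 0.5, 0.5]), method="SLSQP", constraints=cons)
    return res.fun < tol

def straight_step(w, w_star, eta, iters=30):
    d = (w_star - w) / np.linalg.norm(w_star - w)
    lo, hi = 0.0, np.linalg.norm(w_star - w)     # never aim past the target
    for _ in range(iters):                       # bisection on the step length
        mid = 0.5 * (lo + hi)
        if reachable(w, w + mid * d, eta):
            lo = mid
        else:
            hi = mid
    return w + lo * d

print(straight_step(np.array([1.0, 0.0]), np.array([0.0, 1.0]), eta=0.05))
```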

4.1 Comparison of Methods

Figure 4: Comparison of CNLP vs. NLP. All teaching tasks use the same terminal point; the initial points differ across the three panels. We observe that the NLP trajectories for learners with smaller η's quickly converge to the CNLP trajectory.

We ran a number of experiments to study the behavior of these numerical methods. In all experiments, the learner is gradient descent on least squares (1), and the control constraint set is . Our first observation is that CNLP has a number of advantages:

  1. CNLP’s continuous optimal state trajectory matches NLP’s discrete state trajectories, especially for learners with small η. This is expected, since the continuous optimal control problem is obtained asymptotically from the discrete one as η → 0. Figure 4 shows one teaching task, comparing CNLP with NLP’s optimal state trajectories on four gradient descent learners with different η values. The NLP optimal teaching sequences vary drastically in length T, but their state trajectories quickly overlap with CNLP’s optimal trajectory.

  2. CNLP is quick to compute, while NLP's runtime grows as the learner’s η decreases. Table 1 presents wall-clock times. With a small η, the optimal control takes more steps (larger T). Consequently, NLP must solve a nonlinear program with more variables and constraints. In contrast, CNLP’s runtime does not depend on η.

  3. CNLP can be used to approximately compute the “teaching dimension”, i.e., the minimum number of sequential teaching steps for the discrete problem. Recall that CNLP produces an optimal terminal time t_f. When the learner’s η is small, the discrete “teaching dimension” is related to it by T ≈ t_f / η. This is also supported by Table 1.

That said, it is not trivial to extract a discrete control sequence from CNLP’s continuous control function. This hinders CNLP’s utility as an optimal teacher.

  Task 1: NLP lengths T = 75 and 1499 (η = 0.02 and 0.001); wall-clock times 0.013 s, 0.14 s, 59.37 s (NLP) and 4.1 s (CNLP).
  Task 2: NLP lengths T = 76 and 1519; wall-clock times 0.008 s, 0.11 s, 53.28 s (NLP) and 2.37 s (CNLP).
  Task 3: NLP lengths T = 128 and 2570; wall-clock times 0.012 s, 0.63 s, 310.08 s (NLP) and 2.11 s (CNLP).
Table 1: Teaching sequence length T and wall-clock time comparison. NLP teaches three learners with different η's; the target w* is the same in all cases. All experiments were performed on a conventional laptop.
  NLP    STRAIGHT   GREEDY
  148       161        233
  221       330        721
  292       867       2667
  346      2849      10581
Table 2: Comparison of teaching sequence length T; the remaining problem parameters are fixed in all cases.
Figure 5: Points reachable in one step of gradient descent (with ) on a least-squares objective starting from each of the black dots. There is circular symmetry about the origin (red dot).
Figure 6: Reachable sets along the trajectories of NLP (left panel) and GREEDY (right panel). To minimize clutter, only some of the reachable sets are shown. The greedy approach makes fast progress initially, but slows down later on.
Figure 7: Trajectories of the input sequence for the GREEDY, STRAIGHT, and NLP methods and the corresponding continuous input for CNLP. The teaching task and input constraint are the same for all methods. Markers show every 10 steps.

Our second observation is that NLP, being the discrete-time optimal control, produces shorter teaching sequences than GREEDY or STRAIGHT. This is not surprising, and we have already presented three teaching tasks in Figure 1 where NLP has the smallest T. In fact, there exist teaching tasks on which GREEDY and STRAIGHT perform arbitrarily worse than the optimal teaching sequence found by NLP. A case study is presented in Table 2. In this set of experiments, the target and step size are fixed while one problem parameter is varied across rows. As this parameter increases, the ratios of teaching sequence length between STRAIGHT and NLP and between GREEDY and NLP grow at an exponential rate.

We now dig deeper and present an intuitive explanation of why GREEDY requires more teaching steps than NLP. The fundamental issue is the nonlinearity of the learner dynamics (1) in x_t. For any state w, let us define the one-step reachable set R(w) = { w - η (x^T w - y) x : (x, y) ∈ U }. Figure 5 shows a sample of such reachable sets. The key observation is that the starting point w is quite close to the boundary of its own reachable set in certain directions. In other words, there is often a compressed direction—from w to the closest boundary of R(w)—along which the learner makes minimal progress. The GREEDY scheme falls victim to this phenomenon.
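The reachable sets in Figure 5 can be approximated by simple sampling; the following sketch does so under assumed input bounds.

```python
import numpy as np

# Sample the one-step reachable set R(w) = { w - eta*(x'w - y)*x : (x, y) admissible }
# to reproduce pictures like Figure 5. The bounds ||x|| <= 1 and |y| <= 1 and the
# value of eta are assumptions chosen for illustration.

def sample_reachable(w, eta, n=20000, seed=0):
    rng = np.random.default_rng(seed)
    x = rng.normal(size=(n, 2))
    x *= np.sqrt(rng.uniform(size=(n, 1))) / np.linalg.norm(x, axis=1, keepdims=True)
    y = rng.uniform(-1.0, 1.0, size=n)
    return w - eta * ((x @ w) - y)[:, None] * x

pts = sample_reachable(np.array([1.0, 0.5]), eta=0.5)
# e.g. plt.scatter(pts[:, 0], pts[:, 1], s=1) shows how close the starting w sits
# to one side of the set -- the "compressed direction" discussed above.
```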

Figure 6 compares NLP and GREEDY on a teaching task chosen to have short teaching sequences in order to minimize clutter. GREEDY starts by eagerly descending a slope, and indeed this quickly brings it closer to w*. Unfortunately, it also arrives at the vertical axis. For states on this axis, the compressed direction is horizontally outward. Therefore, subsequent GREEDY moves are relatively short, leading to a large number of steps to reach w*. Interestingly, STRAIGHT is often better than GREEDY because it avoids this compressed direction for general states.

We illustrate the optimal inputs in Figure 7, which compares the input sequences produced by STRAIGHT, GREEDY, and NLP with the continuous input produced by CNLP. The heuristic approaches eventually take smaller-magnitude steps as they approach w*, while NLP and CNLP maintain a maximal input norm the whole way.

5 Concluding Remarks

Techniques from optimal control are under-utilized in machine teaching, yet they have the power to provide better quality solutions as well as useful insight into their structure.

As seen in Section 3, optimal trajectories for the least-squares learner are fundamentally 2D. Moreover, there is a taxonomy of regimes that dictates their behavior. We also saw in Section 4 that the continuous CNLP solver can provide a good approximation to the true discrete trajectory when η is small. CNLP is also more scalable than simply solving the discrete NLP directly, because NLP becomes computationally intractable as T gets large (or η gets small), whereas the runtime of CNLP is independent of T.

A drawback of both NLP and CNLP is that they produce trajectories rather than policies. In practice, using an open-loop teaching sequence will not yield exactly the w* we expect, due to the accumulation of small numerical errors as we iterate. In order to find a control policy, which is a map from state to input, we discussed the possibility of solving HJB (Section 2.1), which is computationally expensive.

An alternative to solving HJB is to pre-compute the desired trajectory via CNLP and then use model-predictive control (MPC) to find a policy that tracks the reference trajectory as closely as possible. Such an approach is used in Liniger et al. (2015), for example, to design controllers for autonomous race cars, and would be an interesting avenue of future work for the machine teaching problem.

Finally, this paper presents only a glimpse at what is possible using optimal control. For example, the PMP is not restricted to merely solving time-optimal control problems. It is possible to analyze problems with state- and input-dependent running costs, state and input pointwise or integral constraints, conditional constraints, and even problems where the goal is to reach a target set rather than a target point.

References

  • Athans and Falb (2013) M. Athans and P. L. Falb. Optimal control: An introduction to the theory and its applications. Courier Corporation, 2013.
  • Betts (2010) J. T. Betts. Practical methods for optimal control and estimation using nonlinear programming, volume 19. SIAM, 2010.
  • Biggio et al. (2012) B. Biggio, B. Nelson, and P. Laskov. Poisoning attacks against support vector machines. In J. Langford and J. Pineau, editors, 29th Int’l Conf. on Machine Learning (ICML). Omnipress, 2012.
  • Byrd et al. (2006) R. H. Byrd, J. Nocedal, and R. A. Waltz. Knitro: An integrated package for nonlinear optimization. In Large Scale Nonlinear Optimization, pages 35–59. Springer Verlag, 2006.
  • Dunning et al. (2017) I. Dunning, J. Huchette, and M. Lubin. JuMP: A modeling language for mathematical optimization. SIAM Review, 59(2):295–320, 2017. doi: 10.1137/15M1020575.
  • Febbo (2017) H. Febbo. NLOptControl.jl, 2017. URL: https://github.com/JuliaMPC/NLOptControl.jl.
  • Goldman and Kearns (1995) S. Goldman and M. Kearns. On the complexity of teaching. Journal of Computer and Systems Sciences, 50(1):20–31, 1995.
  • Goodfellow et al. (2014) I. J. Goodfellow, J. Shlens, and C. Szegedy. Explaining and Harnessing Adversarial Examples. ArXiv e-prints, Dec. 2014.
  • Kao et al. (2004) C. Y. Kao, S. Osher, and J. Qian. Lax–Friedrichs sweeping scheme for static Hamilton–Jacobi equations. Journal of Computational Physics, 196(1):367–391, 2004.
  • Kirk (2012) D. E. Kirk. Optimal control theory: An introduction. Courier Corporation, 2012.
  • Liberzon (2011) D. Liberzon. Calculus of variations and optimal control theory: A concise introduction. Princeton University Press, 2011.
  • Liniger et al. (2015) A. Liniger, A. Domahidi, and M. Morari. Optimization-based autonomous racing of 1:43 scale RC cars. Optimal Control Applications and Methods, 36(5):628–647, 2015.
  • Liu et al. (2017) W. Liu, B. Dai, A. Humayun, C. Tay, C. Yu, L. B. Smith, J. M. Rehg, and L. Song. Iterative machine teaching. In International Conference on Machine Learning, pages 2149–2158, 2017.
  • Mei and Zhu (2015) S. Mei and X. Zhu. Using machine teaching to identify optimal training-set attacks on machine learners. In The Twenty-Ninth AAAI Conference on Artificial Intelligence, 2015.
  • Patil et al. (2014) K. Patil, X. Zhu, L. Kopec, and B. Love. Optimal teaching for limited-capacity human learners. In Advances in Neural Information Processing Systems (NIPS), 2014.
  • Patterson and Rao (2014) M. A. Patterson and A. V. Rao. GPOPS-II: A MATLAB software for solving multiple-phase optimal control problems using hp-adaptive Gaussian quadrature collocation methods and sparse nonlinear programming. ACM Transactions on Mathematical Software (TOMS), 41(1):1, 2014.
  • Rao (2009) A. V. Rao. A survey of numerical methods for optimal control. Advances in the Astronautical Sciences, 135(1):497–528, 2009.
  • Sen et al. (2018) A. Sen, P. Patel, M. A. Rau, B. Mason, R. Nowak, T. T. Rogers, and X. Zhu. Machine beats human at sequencing visuals for perceptual-fluency practice. In Educational Data Mining, 2018.
  • Tonon et al. (2017) D. Tonon, M. S. Aronna, and D. Kalise. Optimal control: Novel directions and applications, volume 2180. Springer, 2017.
  • Tsitsiklis (1995) J. N. Tsitsiklis. Efficient algorithms for globally optimal trajectories. IEEE Transactions on Automatic Control, 40(9):1528–1538, 1995.
  • Wächter and Biegler (2006) A. Wächter and L. T. Biegler. On the implementation of a primal-dual interior point filter line search algorithm for large-scale nonlinear programming. Mathematical Programming, 106(1):25–57, 2006.
  • Zhu (2015) X. Zhu. Machine teaching: an inverse problem to machine learning and an approach toward optimal education. In The Twenty-Ninth AAAI Conference on Artificial Intelligence (AAAI “Blue Sky” Senior Member Presentation Track), 2015.
  • Zhu et al. (2018) X. Zhu, A. Singla, S. Zilles, and A. N. Rafferty. An Overview of Machine Teaching. ArXiv e-prints, Jan. 2018. https://arxiv.org/abs/1801.05927.


6 Appendix

Proof of modified Proposition 3.

In this version, we assume the trajectory satisfies the discrete dynamics (1) rather than the continuous conditions (8).

All we need to show is that for any pair of , there exists another pair , such that they give the same update. In particular, we set and show that there always exists an such that

This simplifies to

(15)

The discriminant of the quadratic (15) is

So there always exists a solution . Moreover, and , so there must be a real root in .

Proof of Theorem 3.

We showed in Section 3 that Regime V trajectories are 2D. We also argued that solutions that reach the target via Regimes III–IV are not unique and need not be 2D. We will now show that it is always possible to construct a 2D solution.

We begin by characterizing the set of states reachable via Regimes III–IV. Recall from Section 3 that the transition between III and IV occurs when . If is the time at which this transition occurs, then for , the solution is , which leads to a straight-line trajectory from to .

Now consider the part of the trajectory in Regime IV, where . As derived in Section 3, Regime IV trajectories satisfy . These lead to , which means that grows at the same rate regardless of . If our trajectory reaches , then we can deduce via integration that

(16)

Suppose we have a trajectory that reaches the target; refer to Figure 8. The reachable set at a given time is a spherical sector whose boundary is attained by a trajectory that maximizes curvature. We will now derive this fact.

Figure 8: If a reachable target is contained in the concave funnel shape, which is the reachable set in Regime IV, it can be reached by some trajectory lying entirely in a 2D subspace: follow the max-curvature solution and then transition to a radial solution.

Let be the largest possible angle between and any reachable , where we have fixed . Define to be the angle between and .

An alternative expression for this rate of change is the projection of onto the orthogonal complement of :

where we used the fact that in Regime IV. Now,

(17)

In the final step, we maximized over . Notice that the integrand (17) is an upper bound that only depends on and but not on . One can also verify that this upper bound is achieved by the choice

where and is any vector that satisfies (16) with angle with . Any with this norm but angle can also be reached by using the max-curvature control until time , where is chosen such that , and then using for . This piecewise path is illustrated in Figure 8.

Our constructed optimal trajectory lies in a 2D subspace. This shows that every reachable target can be reached via a 2D trajectory.