Structured Neural Network Dynamics for Model-based Control

08/03/2018 ∙ by Alexander Broad, et al. ∙ Northwestern University

We present a structured neural network architecture that is inspired by linear time-varying dynamical systems. The network is designed to mimic the properties of linear dynamical systems, which makes analysis and control simple. The architecture facilitates the integration of learned system models with gradient-based model predictive control algorithms and removes the requirement of computing potentially costly derivatives online. We demonstrate the efficacy of this modeling technique in computing autonomous control policies through evaluation in a variety of standard continuous control domains.




I Introduction and Background

The question of how to best generate autonomous control policies for mechanical systems is an important problem in robotics. Research in this field can be traced back to early work on optimal control by Pontryagin [20] and Bellman [5]. Since this time, significant progress has been made in both the theory and application of autonomous control techniques [22, 24]. However, challenges remain in developing strategies that are valid without a priori knowledge of the system dynamics.

One possible solution is to use model-free policy generation techniques [9]. These methods require no explicit model of the system dynamics and have been shown to be effective in numerous domains [14, 15]. However, model-free policy generation techniques often require massive amounts of data and are therefore difficult to evaluate on real-world robotic systems [9]. An alternative option is to learn an explicit model of the system dynamics which can be incorporated into an optimal control algorithm. Model-based control methods are more data-efficient and often easier to apply in real-world scenarios [3, 6]. However, many optimal control algorithms require some notion of derivatives to compute a control policy [2, 16, 18, 25].

Fig. 1: Structured neural network architecture for model-based predictive control. The final layer of the A-subnet must be the same dimension as the state space, and the final layer of the B-subnet must be the same dimension as the control space. The network then computes a function of the form $f(x, u) = A\,x + B\,u$. This structure makes it easy to recover time-varying derivatives (i.e., $A$ and $B$) for use in model predictive control algorithms.

Computing the required derivatives can often prove challenging with complex modeling techniques like deep neural networks [4]. Additionally, these black-box methods make it difficult to analyze the underlying dynamics of the system. There are, of course, alternative modeling techniques [1, 3, 8, 11, 13, 15]; however, there remains a desire to incorporate modern, deep neural networks into the optimization loop due to their ability to model challenging dynamic features (e.g., contacts) and scale to high-dimensional tasks [12, 19, 27]. In this work, we provide a method that combines the expressive power of neural network models with gradient-based optimal control algorithms. Our solution is based on a neural network architecture that enforces a linear structure in the state and control space, making it easier to analyze and incorporate into model-based control.

II Structured Neural Networks for Model Predictive Control

In this section, we define our structured neural network architecture and then detail how the learned models can be integrated into model-based control algorithms.

II-A Structured Neural Network Architecture

Our neural network architecture is composed of two parallel subnetworks (see Figure 1). The first subnetwork (the A-subnet) can have any number of layers and parameters; the only constraint is that the size of its final layer is fixed by $n$, the dimension of the system's state space. Similarly, the second subnetwork (the B-subnet) is only constrained such that the size of its final layer is fixed by $m$, the dimension of the system's control space. The network then combines (1) the dot product of the output from the A-subnet and the state $x$ with (2) the dot product of the output from the B-subnet and the control $u$ through an element-wise add operation. This architecture describes a single, global model of the form $f(x, u) = A\,x + B\,u$, which is trained with standard gradient-based techniques and can be evaluated and linearized anywhere in the state space. Here, the A-subnet represents the linearization of the dynamics model with respect to the state variables (i.e., $\partial f / \partial x$), and the B-subnet represents the linearization of the dynamics model with respect to the control variables (i.e., $\partial f / \partial u$).
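The two-branch structure described above can be sketched in a few lines of NumPy. This is a minimal illustration rather than the authors' implementation: it assumes that both subnets take the state as input, that the A-subnet emits n*n values and the B-subnet n*m values (reshaped into matrices so that the sum is an n-dimensional vector), and the helper names, hidden width, and tanh activation are hypothetical choices.

```python
import numpy as np

def init_mlp(rng, sizes):
    """Small random weights and zero biases for a fully connected network."""
    return [(rng.normal(0.0, 0.1, (n_in, n_out)), np.zeros(n_out))
            for n_in, n_out in zip(sizes[:-1], sizes[1:])]

def mlp(params, x):
    """Forward pass: tanh on hidden layers, linear final layer."""
    for W, b in params[:-1]:
        x = np.tanh(x @ W + b)
    W, b = params[-1]
    return x @ W + b

def structured_dynamics(a_params, b_params, x, u):
    """Return (prediction, A, B) with prediction = A @ x + B @ u."""
    n, m = x.shape[0], u.shape[0]
    A = mlp(a_params, x).reshape(n, n)  # assumed: A-subnet has n*n outputs
    B = mlp(b_params, x).reshape(n, m)  # assumed: B-subnet has n*m outputs
    return A @ x + B @ u, A, B

rng = np.random.default_rng(0)
n, m, hidden = 4, 1, 32                 # e.g. cart-pole sized state and control
a_params = init_mlp(rng, [n, hidden, n * n])
b_params = init_mlp(rng, [n, hidden, n * m])
x, u = rng.normal(size=n), rng.normal(size=m)
pred, A, B = structured_dynamics(a_params, b_params, x, u)
```

A single forward pass returns both the prediction and the matrices that a controller would treat as the linearization, so no separate differentiation step is needed at control time.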

II-B Integration with Model-based Control

Given a learned dynamics model, one can compute autonomous control policies through data-driven methods [12] or through integration with optimal control algorithms [2, 16, 25]. On the optimal control side, researchers have mostly explored sampling-based optimization methods. For example, researchers have proposed computing control trajectories with a random shooting method [19] and with model predictive path integral control [27]. Sampling-based methods are appealing in this domain because the solution does not depend on computing potentially costly gradients with respect to the state and control variables. However, the solution does require generating a large number of samples to cover a sufficient portion of the action space. The challenge, then, is to balance the number of samples generated at each time-step against the rate of the control loop. As the dimensionality of the action space grows, this becomes more and more challenging.
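To make the sampling trade-off concrete, the following is a minimal random shooting sketch. It is not taken from [19]; the function name, the forward Euler rollout, the double-integrator toy system, and the quadratic cost are all illustrative assumptions.

```python
import numpy as np

def random_shooting(dynamics, cost, x0, horizon, n_samples, u_low, u_high, dt, rng):
    """Return the first control (and cost) of the cheapest sampled sequence."""
    m = u_low.shape[0]
    # Sample candidate control sequences uniformly inside the control bounds.
    U = rng.uniform(u_low, u_high, size=(n_samples, horizon, m))
    best_cost, best_u = np.inf, None
    for seq in U:
        x, total = x0.copy(), 0.0
        for u in seq:
            x = x + dt * dynamics(x, u)  # forward Euler rollout of the model
            total += cost(x, u)
        if total < best_cost:
            best_cost, best_u = total, seq[0]
    return best_u, best_cost

def double_integrator(x, u):
    """Toy stand-in for a learned model: state is (position, velocity)."""
    return np.array([x[1], u[0]])

def quadratic_cost(x, u):
    return x[0] ** 2 + 0.1 * x[1] ** 2 + 0.01 * u[0] ** 2

rng = np.random.default_rng(0)
u_low, u_high = np.array([-1.0]), np.array([1.0])
best_u, best_cost = random_shooting(
    double_integrator, quadratic_cost, np.array([1.0, 0.0]),
    horizon=20, n_samples=200, u_low=u_low, u_high=u_high, dt=0.1, rng=rng)
```

Every control step costs `n_samples * horizon` model evaluations, and the number of samples needed to cover the control bounds grows quickly with the control dimension `m`, which is the scaling concern raised above.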

In contrast with sampling-based methods, gradient-based optimization techniques provide an efficient method of computing control trajectories. Additionally, these methods provide sensitivity information in the form of time-varying Jacobians. However, integrating neural network models with these optimization techniques can prove difficult because it is unclear a priori how to compute the necessary Jacobians ($\partial f / \partial x$ and $\partial f / \partial u$). By enforcing a linear structure on the neural network architecture (as described in Section II-A), we can efficiently predict the evolution of the dynamic system as well as the required Jacobians. Then, to generate an autonomous policy, we solve the following optimal control problem

$$\min_{u} \; \int_{0}^{T} l(x(t), u(t))\,dt + l_T(x(T))$$

subject to

$$\dot{x}(t) = f(x(t), u(t)), \quad u(t) \in U, \quad x(t) \in X,$$

where $f$ is the learned, structured system dynamics, $l$ and $l_T$ are the running and terminal cost, and $U$ and $X$ are the sets of valid control and state values. The solution of this problem is the control sequence that minimizes the cost.
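As an illustration of how time-varying matrices can feed a gradient-based solver, the sketch below runs a finite-horizon, discrete-time LQR backward pass. This is a stand-in chosen for brevity, not necessarily the solver used by the authors. It assumes a quadratic cost, ignores the state and control constraints, discretizes with forward Euler, and uses constant double-integrator matrices in place of subnet outputs; all names are hypothetical.

```python
import numpy as np

def tv_lqr_gains(A_seq, B_seq, Q, R, Qf, dt):
    """Backward Riccati pass for x[t+1] = (I + dt*A_t) x[t] + dt*B_t u[t]."""
    n = Q.shape[0]
    P = Qf.copy()
    gains = []
    for A, B in zip(reversed(A_seq), reversed(B_seq)):
        Ad = np.eye(n) + dt * A  # forward Euler discretization
        Bd = dt * B
        K = np.linalg.solve(R + Bd.T @ P @ Bd, Bd.T @ P @ Ad)
        P = Q + Ad.T @ P @ (Ad - Bd @ K)
        gains.append(K)
    return gains[::-1]           # gains[t] is the feedback gain at step t

# In the structured model, A_t and B_t would be read off the subnets along a
# nominal trajectory; here they are fixed matrices for a double integrator.
A = np.array([[0.0, 1.0], [0.0, 0.0]])
B = np.array([[0.0], [1.0]])
T, dt = 200, 0.05
gains = tv_lqr_gains([A] * T, [B] * T, np.eye(2), 0.1 * np.eye(1),
                     10.0 * np.eye(2), dt)

x = np.array([1.0, 0.0])
for K in gains:
    u = -K @ x                   # time-varying linear feedback
    x = x + dt * (A @ x + B @ u)
final_state_norm = float(np.linalg.norm(x))
```

The backward pass only consumes the sequences of matrices, which is why a model that outputs them directly fits this family of solvers without an extra differentiation step.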

Fig. 2: Subfigures (a) Mountain Car, (b) Cart-Pole, and (c) Two-Link Arm are pictorial representations of our experimental environments. Subfigures (d) Mountain Car, (e) Cart-Pole, and (f) Two-Link Arm are state diagrams demonstrating the efficacy of our model-based control algorithm.

III Experimental Validation

We validate the efficacy of our described approach through experimentation on three standard control domains. Our first experimental environment is OpenAI's implementation of the continuous mountain car problem [7] (Figure 2(a)). The mountain car is defined by a two-dimensional state space ($x \in \mathbb{R}^2$) and a one-dimensional control space ($u \in \mathbb{R}$). The second experimental environment is an implementation of the classic cart-pole swing-up problem written from scratch (Figure 2(b)). The cart-pole is defined by a four-dimensional state space ($x \in \mathbb{R}^4$) and a one-dimensional control space ($u \in \mathbb{R}$). The final experimental environment is a two-link arm written in the Bullet physics engine and described in a related CMU course (Figure 2(c)). The two-link arm exists in a four-dimensional state space ($x \in \mathbb{R}^4$) and is controlled with a two-dimensional signal ($u \in \mathbb{R}^2$). All three environments are defined with continuous-valued state and control spaces.

III-A Model Learning Details

In this section, we describe our data collection method and the training procedure.

III-A1 Data collection

We collect data through observation of trajectories produced by the system using control inputs sampled uniformly at random. The data are collected in tuples of $(x_t, u_t, \dot{x}_t)$, where $\dot{x}_t$ is computed as $(x_{t+1} - x_t) / \Delta t$ and $\Delta t$ is the timestep. For each environment, we collect 500 trajectories, each of which is terminated either after 500 steps or when the system violates environment boundary or safety conditions.
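The collection loop can be sketched as follows. This is a minimal illustration under stated assumptions: the regression target is taken to be a finite-difference estimate of the state derivative, and the `step`, `reset`, and `in_bounds` callables, the damped pendulum, and its bounds are hypothetical stand-ins for the actual environments.

```python
import numpy as np

def collect_tuples(step, reset, in_bounds, u_low, u_high, dt,
                   n_traj=500, max_steps=500, seed=0):
    """Roll out uniformly random controls and record (x, u, xdot) tuples."""
    rng = np.random.default_rng(seed)
    X, U, Xdot = [], [], []
    for _ in range(n_traj):
        x = reset(rng)
        for _ in range(max_steps):
            u = rng.uniform(u_low, u_high)   # uniform random control input
            x_next = step(x, u)
            X.append(x)
            U.append(u)
            Xdot.append((x_next - x) / dt)   # assumed finite-difference target
            x = x_next
            if not in_bounds(x):             # stop on a boundary violation
                break
    return np.array(X), np.array(U), np.array(Xdot)

# Hypothetical toy environment: a damped pendulum stepped with forward Euler.
dt = 0.05

def pendulum_deriv(x, u):
    return np.array([x[1], -np.sin(x[0]) - 0.1 * x[1] + u[0]])

step = lambda x, u: x + dt * pendulum_deriv(x, u)
reset = lambda rng: rng.uniform(-0.1, 0.1, size=2)
in_bounds = lambda x: abs(x[1]) < 8.0

X, U, Xdot = collect_tuples(step, reset, in_bounds,
                            np.array([-2.0]), np.array([2.0]), dt,
                            n_traj=5, max_steps=50)
```

The defaults mirror the 500 trajectories of at most 500 steps described above; the demo call uses smaller values only to keep the example fast.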

III-A2 Training the model

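Given a dataset of tuples $(x_t, u_t, \dot{x}_t)$, we train the dynamics model by minimizing the following error function

$$E = \frac{1}{N} \sum_{t=1}^{N} \lVert \dot{x}_t - f(x_t, u_t) \rVert^2,$$

where $N$ is the number of tuples in the training set.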

We use the Adam optimizer with a learning rate of 0.001. Half the data is used for training and half for validation. We find that no data preprocessing is necessary.
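The training setup can be sketched end to end in NumPy under strong simplifications. Here the two subnets are collapsed to constant matrices so that the gradient can be written by hand, the error function is assumed to be a mean squared error, and the data come from a hypothetical noise-free linear system. Only the Adam learning rate of 0.001 and the half/half split are taken from the text; the remaining Adam constants are the commonly used defaults.

```python
import numpy as np

def mse_and_grads(A, B, X, U, Y):
    """Mean squared error of A x + B u against targets Y, with gradients."""
    err = X @ A.T + U @ B.T - Y              # (N, n) residuals
    loss = np.mean(np.sum(err ** 2, axis=1))
    gA = 2.0 * err.T @ X / len(X)
    gB = 2.0 * err.T @ U / len(X)
    return loss, gA, gB

def adam_step(p, g, m, v, t, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8):
    """One Adam update with bias-corrected first and second moments."""
    m = b1 * m + (1 - b1) * g
    v = b2 * v + (1 - b2) * g ** 2
    m_hat, v_hat = m / (1 - b1 ** t), v / (1 - b2 ** t)
    return p - lr * m_hat / (np.sqrt(v_hat) + eps), m, v

# Hypothetical synthetic data from a known linear system.
rng = np.random.default_rng(0)
A_true = np.array([[0.0, 1.0], [-1.0, -0.1]])
B_true = np.array([[0.0], [1.0]])
X = rng.normal(size=(400, 2))
U = rng.uniform(-1.0, 1.0, size=(400, 1))
Y = X @ A_true.T + U @ B_true.T

# Half of the data for training, half for validation.
Xtr, Utr, Ytr = X[:200], U[:200], Y[:200]
Xva, Uva, Yva = X[200:], U[200:], Y[200:]

A, B = np.zeros((2, 2)), np.zeros((2, 1))
mA, vA = np.zeros_like(A), np.zeros_like(A)
mB, vB = np.zeros_like(B), np.zeros_like(B)
initial_val_loss = mse_and_grads(A, B, Xva, Uva, Yva)[0]
for t in range(1, 4001):
    _, gA, gB = mse_and_grads(A, B, Xtr, Utr, Ytr)
    A, mA, vA = adam_step(A, gA, mA, vA, t)
    B, mB, vB = adam_step(B, gB, mB, vB, t)
val_loss = mse_and_grads(A, B, Xva, Uva, Yva)[0]
```

With real subnets the hand-written gradients would be replaced by backpropagation through both branches, but the loss, the optimizer update, and the data split keep the same shape.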

IV Results

Our evaluation consists of state plots which demonstrate that our defined neural network architecture can be used to solve model-based control problems. Each example solution depicts the initial state of the system (the start of the state trajectory, which is chosen at random), the time-varying state produced by our model-based control algorithm (red and blue), and the goal state (black). In Figures 2(d), 2(e), and 2(f), we relay a single solution for each experimental environment; however, we note that our algorithm produced successful control trajectories (with respect to the desired goal state) from a variety of initial conditions. Additionally, our approach was able to successfully generate control trajectories that reached arbitrary goal states in the two-link arm environment.

These results suggest that our structured neural network can be used to learn a global model of the system dynamics, while simultaneously enforcing linearization constraints that make it possible to recover time-varying derivatives without additional computation. In contrast to approximation methods (e.g., numerical differentiation) and symbolic methods (e.g., automatic differentiation), our approach can be thought of as a prediction method for computing the required time-varying derivatives. Related work in this area includes the transformation network proposed in [26], which directly predicts the parameters of A and B matrices in a latent space. In contrast, our approach does not explicitly learn the parameters of a matrix; instead, we learn nonlinear mappings (A-subnet, B-subnet) that we treat as linearizations of the global model in the structure of our network. This allows us to learn a global model of the system dynamics, while simultaneously enforcing linearization constraints. A related call for the use of structure in neural networks has been explored in model-free policy generation. In [23], researchers describe a network architecture that combines linear and nonlinear policies into a single control model. In our work, we instead enforce structure that mimics linear time-varying systems, and incorporate these models into optimal control algorithms.

IV-A Why We Think This Works

In this work, we address the bottleneck associated with computing gradients of the system model through the application of a structured neural network that explicitly encodes linearization constraints and therefore reduces the computation necessary to recover the required Jacobians. However, without further study, it is not clear whether the learned A- and B-subnetworks actually approximate the required time-varying derivatives. Experimental evidence suggests that the vectors represented by these networks are, at a minimum, pointing in the direction of the gradient. This claim is based on the fact that (1) our model-based control algorithm produces successful policies in a variety of control domains, and (2) when we incorporate the learned system model into an MPC algorithm, we treat the output of the subnetworks as first-order derivatives of the system dynamics.

IV-B Open Questions

We now pose a number of open questions that we plan to address in future work. In particular, we are interested in exploring how our structured neural network model compares with alternative methods of computing time-varying derivatives. One such solution is to use a finite differences method for numerical differentiation. From a practical standpoint, we note that this method is prone to round-off errors and is computationally expensive in an iterative, receding-horizon framework. Another solution is to use automatic differentiation [4]. This approach has been shown to work well; however, it requires well-formed expression graphs and derivatives computed at compile time to work efficiently enough for online optimization [10]. In future work, we plan to compare and contrast these methods in high-dimensional control spaces.
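As a point of reference for the finite-difference baseline, the sketch below computes central-difference Jacobians of a model and compares the state Jacobian with the matrix that a structured model would report. The hand-written model `A_of(x) @ x + B @ u` is a hypothetical stand-in for a trained network, chosen so that the difference can be checked analytically.

```python
import numpy as np

def fd_jacobians(f, x, u, h=1e-5):
    """Central-difference Jacobians of f(x, u) with respect to x and u."""
    n, m = x.shape[0], u.shape[0]
    n_out = f(x, u).shape[0]
    fx, fu = np.zeros((n_out, n)), np.zeros((n_out, m))
    for i in range(n):                    # two model evaluations per state
        d = np.zeros(n)
        d[i] = h
        fx[:, i] = (f(x + d, u) - f(x - d, u)) / (2 * h)
    for j in range(m):                    # two model evaluations per control
        d = np.zeros(m)
        d[j] = h
        fu[:, j] = (f(x, u + d) - f(x, u - d)) / (2 * h)
    return fx, fu

def A_of(x):
    """Hypothetical state-dependent A matrix (pendulum-like)."""
    return np.array([[0.0, 1.0], [-np.cos(x[0]), -0.1]])

B = np.array([[0.0], [1.0]])
f = lambda x, u: A_of(x) @ x + B @ u

x, u = np.array([0.5, -0.2]), np.array([0.3])
fx, fu = fd_jacobians(f, x, u)
gap = fx - A_of(x)                        # Jacobian minus the reported A
```

Each linearization costs 2(n + m) model evaluations, which accumulates quickly inside a receding-horizon loop. In this toy model the control Jacobian matches `B`, but `gap` is nonzero wherever the A matrix itself depends on the state, which is one concrete way to probe the question of how closely subnet outputs track the true derivatives.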

V Conclusion

In this work, we propose a structured neural network that can be used to solve model-based control problems. The architecture makes it easy to integrate the learned models with gradient-based optimal control algorithms and simplifies the interpretation of a system model parameterized by a deep neural network. This idea is in line with other recent calls for simplification of data-driven control strategies such as [17, 21].