LVIS: Learning from Value Function Intervals for Contact-Aware Robot Controllers

09/16/2018
by Robin Deits, et al.

Guided policy search is a popular approach for training controllers for high-dimensional systems, but it has a number of pitfalls. Non-convex trajectory optimization has local minima, and non-uniqueness in the optimal policy itself can mean that independently-optimized samples do not describe a coherent policy from which to train. We introduce LVIS, which circumvents the issue of local minima through global mixed-integer optimization and the issue of non-uniqueness through learning the optimal value function (or cost-to-go) rather than the optimal policy. To avoid the expense of solving the mixed-integer programs to full global optimality, we instead solve them only partially, extracting intervals containing the true cost-to-go from early termination of the branch-and-bound algorithm. These interval samples are used to weakly supervise the training of a neural net which approximates the true cost-to-go. Online, we use that learned cost-to-go as the terminal cost of a one-step model-predictive controller, which we solve via a small mixed-integer optimization. We demonstrate the LVIS approach on a cart-pole system with walls and a planar humanoid robot model and show that it can be applied to a fundamentally hard problem in feedback control: control through contact.
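
The interval-based weak supervision described above can be illustrated with a small sketch. The abstract does not state the exact loss used, so the form below is an assumption: a squared penalty that is zero whenever the predicted cost-to-go lies inside the interval returned by the partially solved mixed-integer program, and grows with the distance to the nearest bound otherwise. The function name `interval_loss` and its arguments are hypothetical and chosen only for illustration.

```python
import numpy as np

def interval_loss(pred, lower, upper):
    """Mean squared violation of the interval [lower, upper].

    ASSUMPTION: this penalty is an illustrative choice, not the loss
    confirmed by the abstract. Predictions inside the interval cost
    nothing; predictions outside are penalized by their squared
    distance to the nearest bound.
    """
    pred, lower, upper = (np.asarray(a, dtype=float) for a in (pred, lower, upper))
    below = np.maximum(lower - pred, 0.0)  # amount by which pred undershoots
    above = np.maximum(pred - upper, 0.0)  # amount by which pred overshoots
    return float(np.mean(below ** 2 + above ** 2))

# A prediction inside the interval incurs no loss.
inside = interval_loss([1.0], [0.0], [2.0])   # 0.0
# A prediction one unit above the upper bound incurs a loss of 1.
outside = interval_loss([3.0], [0.0], [2.0])  # 1.0
```

Under this assumed loss, a tight interval behaves much like an ordinary regression target, while a loose interval from an early-terminated solve only constrains the network to stay within the bounds.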


