Sebastian Trimpe

is this you? claim profile


  • Trajectory-Based Off-Policy Deep Reinforcement Learning

    Policy gradient methods are powerful reinforcement learning algorithms and have been demonstrated to solve many complex tasks. However, these methods are also data-inefficient, afflicted with high variance gradient estimates, and frequently get stuck in local optima. This work addresses these weaknesses by combining recent improvements in the reuse of off-policy data and exploration in parameter space with deterministic behavioral policies. The resulting objective is amenable to standard neural network optimization strategies like stochastic gradient descent or stochastic gradient Hamiltonian Monte Carlo. Incorporation of previous rollouts via importance sampling greatly improves data-efficiency, whilst stochastic optimization schemes facilitate the escape from local optima. We evaluate the proposed approach on a series of continuous control benchmark tasks. The results show that the proposed algorithm is able to successfully and reliably learn solutions using fewer system interactions than standard policy gradient methods.

    05/14/2019 ∙ by Andreas Doerr, et al. ∙ 5 share

    read it

  • On the Design of LQR Kernels for Efficient Controller Learning

    Finding optimal feedback controllers for nonlinear dynamic systems from data is hard. Recently, Bayesian optimization (BO) has been proposed as a powerful framework for direct controller tuning from experimental trials. For selecting the next query point and finding the global optimum, BO relies on a probabilistic description of the latent objective function, typically a Gaussian process (GP). As is shown herein, GPs with a common kernel choice can, however, lead to poor learning outcomes on standard quadratic control problems. For a first-order system, we construct two kernels that specifically leverage the structure of the well-known Linear Quadratic Regulator (LQR), yet retain the flexibility of Bayesian nonparametric learning. Simulations of uncertain linear and nonlinear systems demonstrate that the LQR kernels yield superior learning performance.

    09/20/2017 ∙ by Alonso Marco, et al. ∙ 0 share

    read it

  • Model-Based Policy Search for Automatic Tuning of Multivariate PID Controllers

    PID control architectures are widely used in industrial applications. Despite their low number of open parameters, tuning multiple, coupled PID controllers can become tedious in practice. In this paper, we extend PILCO, a model-based policy search framework, to automatically tune multivariate PID controllers purely based on data observed on an otherwise unknown system. The system's state is extended appropriately to frame the PID policy as a static state feedback policy. This renders PID tuning possible as the solution of a finite horizon optimal control problem without further a priori knowledge. The framework is applied to the task of balancing an inverted pendulum on a seven degree-of-freedom robotic arm, thereby demonstrating its capabilities of fast and data-efficient policy learning, even on complex real world problems.

    03/08/2017 ∙ by Andreas Doerr, et al. ∙ 0 share

    read it

  • Robust Gaussian Filtering using a Pseudo Measurement

    Many sensors, such as range, sonar, radar, GPS and visual devices, produce measurements which are contaminated by outliers. This problem can be addressed by using fat-tailed sensor models, which account for the possibility of outliers. Unfortunately, all estimation algorithms belonging to the family of Gaussian filters (such as the widely-used extended Kalman filter and unscented Kalman filter) are inherently incompatible with such fat-tailed sensor models. The contribution of this paper is to show that any Gaussian filter can be made compatible with fat-tailed sensor models by applying one simple change: Instead of filtering with the physical measurement, we propose to filter with a pseudo measurement obtained by applying a feature function to the physical measurement. We derive such a feature function which is optimal under some conditions. Simulation results show that the proposed method can effectively handle measurement outliers and allows for robust filtering in both linear and nonlinear systems.

    09/14/2015 ∙ by Manuel Wüthrich, et al. ∙ 0 share

    read it

  • Depth-Based Object Tracking Using a Robust Gaussian Filter

    We consider the problem of model-based 3D-tracking of objects given dense depth images as input. Two difficulties preclude the application of a standard Gaussian filter to this problem. First of all, depth sensors are characterized by fat-tailed measurement noise. To address this issue, we show how a recently published robustification method for Gaussian filters can be applied to the problem at hand. Thereby, we avoid using heuristic outlier detection methods that simply reject measurements if they do not match the model. Secondly, the computational cost of the standard Gaussian filter is prohibitive due to the high-dimensional measurement, i.e. the depth image. To address this problem, we propose an approximation to reduce the computational complexity of the filter. In quantitative experiments on real data we show how our method clearly outperforms the standard Gaussian filter. Furthermore, we compare its performance to a particle-filter-based tracking method, and observe comparable computational efficiency and improved accuracy and smoothness of the estimates.

    02/19/2016 ∙ by Jan Issac, et al. ∙ 0 share

    read it

  • Probabilistic Recurrent State-Space Models

    State-space models (SSMs) are a highly expressive model class for learning patterns in time series data and for system identification. Deterministic versions of SSMs (e.g., LSTMs) proved extremely successful in modeling complex time-series data. Fully probabilistic SSMs, however, unfortunately often prove hard to train, even for smaller problems. To overcome this limitation, we propose a scalable initialization and training algorithm based on doubly stochastic variational inference and Gaussian processes. In the variational approximation we propose in contrast to related approaches to fully capture the latent state temporal correlations to allow for robust training.

    01/31/2018 ∙ by Andreas Doerr, et al. ∙ 0 share

    read it

  • Event-based State Estimation: An Emulation-based Approach

    An event-based state estimation approach for reducing communication in a networked control system is proposed. Multiple distributed sensor agents observe a dynamic process and sporadically transmit their measurements to estimator agents over a shared bus network. Local event-triggering protocols ensure that data is transmitted only when necessary to meet a desired estimation accuracy. The event-based design is shown to emulate the performance of a centralised state observer design up to guaranteed bounds, but with reduced communication. The stability results for state estimation are extended to the distributed control system that results when the local estimates are used for feedback control. Results from numerical simulations and hardware experiments illustrate the effectiveness of the proposed approach in reducing network communication.

    03/24/2017 ∙ by Sebastian Trimpe, et al. ∙ 0 share

    read it

  • Event-triggered Learning for Resource-efficient Networked Control

    Common event-triggered state estimation (ETSE) algorithms save communication in networked control systems by predicting agents' behavior, and transmitting updates only when the predictions deviate significantly. The effectiveness in reducing communication thus heavily depends on the quality of the dynamics models used to predict the agents' states or measurements. Event-triggered learning is proposed herein as a novel concept to further reduce communication: whenever poor communication performance is detected, an identification experiment is triggered and an improved prediction model learned from data. Effective learning triggers are obtained by comparing the actual communication rate with the one that is expected based on the current model. By analyzing statistical properties of the inter-communication times and leveraging powerful convergence results, the proposed trigger is proven to limit learning experiments to the necessary instants. Numerical and physical experiments demonstrate that event-triggered learning improves robustness toward changing environments and yields lower communication rates than common ETSE.

    03/05/2018 ∙ by Friedrich Solowjow, et al. ∙ 0 share

    read it

  • Distributed Event-Based State Estimation for Networked Systems: An LMI-Approach

    In this work, a dynamic system is controlled by multiple sensor-actuator agents, each of them commanding and observing parts of the system's input and output. The different agents sporadically exchange data with each other via a common bus network according to local event-triggering protocols. From these data, each agent estimates the complete dynamic state of the system and uses its estimate for feedback control. We propose a synthesis procedure for designing the agents' state estimators and the event triggering thresholds. The resulting distributed and event-based control system is guaranteed to be stable and to satisfy a predefined estimation performance criterion. The approach is applied to the control of a vehicle platoon, where the method's trade-off between performance and communication, and the scalability in the number of agents is demonstrated.

    07/06/2017 ∙ by Michael Muehlebach, et al. ∙ 0 share

    read it

  • A Local Information Criterion for Dynamical Systems

    Encoding a sequence of observations is an essential task with many applications. The encoding can become highly efficient when the observations are generated by a dynamical system. A dynamical system imposes regularities on the observations that can be leveraged to achieve a more efficient code. We propose a method to encode a given or learned dynamical system. Apart from its application for encoding a sequence of observations, we propose to use the compression achieved by this encoding as a criterion for model selection. Given a dataset, different learning algorithms result in different models. But not all learned models are equally good. We show that the proposed encoding approach can be used to choose the learned model which is closer to the true underlying dynamics. We provide experiments for both encoding and model selection, and theoretical results that shed light on why the approach works.

    05/27/2018 ∙ by Arash Mehrjou, et al. ∙ 0 share

    read it

  • Learning an Approximate Model Predictive Controller with Guarantees

    A supervised learning framework is proposed to approximate a model predictive controller (MPC) with reduced computational complexity and guarantees on stability and constraint satisfaction. The framework can be used for a wide class of nonlinear systems. Any standard supervised learning technique (e.g. neural networks) can be employed to approximate the MPC from samples. In order to obtain closed-loop guarantees for the learned MPC, a robust MPC design is combined with statistical learning bounds. The MPC design ensures robustness to inaccurate inputs within given bounds, and Hoeffding's Inequality is used to validate that the learned MPC satisfies these bounds with high confidence. The result is a closed-loop statistical guarantee on stability and constraint satisfaction for the learned MPC. The proposed learning-based MPC framework is illustrated on a nonlinear benchmark problem, for which we learn a neural network controller with guarantees.

    06/11/2018 ∙ by Michael Hertneck, et al. ∙ 0 share

    read it