
Automata Guided Hierarchical Reinforcement Learning for Zero-shot Skill Composition
An obstacle that prevents the wide adoption of (deep) reinforcement learning (RL) in control systems is its need for a large number of interactions with the environment in order to master a skill. The learned skill usually generalizes poorly across domains, and retraining is often necessary when the agent is presented with a new task. We present a framework that combines formal methods with hierarchical reinforcement learning (HRL). The set of techniques we provide allows for convenient specification of tasks with complex logic, learning of hierarchical policies (meta-controller and low-level controllers) with well-defined intrinsic rewards using any RL method, and construction of new skills from existing ones without additional learning. We evaluate the proposed methods in a simple grid world simulation as well as in simulation on a Baxter robot.
10/31/2017 ∙ by Xiao Li, et al.

A Policy Search Method For Temporal Logic Specified Reinforcement Learning Tasks
Reward engineering is an important aspect of reinforcement learning. Whether or not the user's intentions can be correctly encapsulated in the reward function can significantly impact the learning outcome. Current methods rely on manually crafted reward functions that often require parameter tuning to obtain the desired behavior. This operation can be expensive when exploration requires systems to interact with the physical world. In this paper, we explore the use of temporal logic (TL) to specify tasks in reinforcement learning. A TL formula can be translated to a real-valued function that measures its level of satisfaction against a trajectory. We take advantage of this function and propose temporal logic policy search (TLPS), a model-free learning technique that finds a policy satisfying the TL specification. A set of simulated experiments is conducted to evaluate the proposed approach.
09/27/2017 ∙ by Xiao Li, et al.

Reinforcement Learning With Temporal Logic Rewards
Reinforcement learning (RL) depends critically on the choice of reward functions used to capture the desired behavior and constraints of a robot. Usually, these are hand-crafted by an expert designer and represent heuristics for relatively simple tasks. Real-world applications typically involve more complex tasks with rich temporal and logical structure. In this paper we take advantage of the expressive power of temporal logic (TL) to specify complex rules the robot should follow and to incorporate domain knowledge into learning. We propose Truncated Linear Temporal Logic (TLTL) as a specification language that is arguably well suited for robotics applications, together with quantitative semantics, i.e., a robustness degree. We propose an RL approach to learn tasks expressed as TLTL formulae that uses their associated robustness degree as the reward function, instead of manually crafted heuristics that attempt to capture the same specifications. We show in simulated trials that learning is faster, and that policies obtained using the proposed approach outperform those learned using heuristic rewards in terms of the robustness degree, i.e., how well the tasks are satisfied. Furthermore, we demonstrate the proposed RL approach in a toast-placing task learned by a Baxter robot.
12/11/2016 ∙ by Xiao Li, et al.
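As a rough illustration of the robustness-degree reward described in the abstract above, the quantitative semantics of a small TL fragment can be sketched over a discrete trajectory. The function names and the formula encoding below are illustrative assumptions, not the paper's implementation: conjunction maps to a minimum, "eventually" to a maximum over time, and "always" to a minimum over time, so the resulting scalar is positive exactly when the trajectory satisfies the specification.

```python
# Minimal sketch of TL quantitative semantics ("robustness degree")
# over a discrete trajectory. Illustrative only.

def rho_and(r1, r2):
    """Robustness of a conjunction: the weaker conjunct dominates."""
    return min(r1, r2)

def rho_eventually(traj, f):
    """Robustness of F(f > 0): best predicate margin over time."""
    return max(f(x) for x in traj)

def rho_always(traj, f):
    """Robustness of G(f > 0): worst predicate margin over time."""
    return min(f(x) for x in traj)

# Example: 1-D trajectory; task "eventually reach x > 4, and always x > -1".
traj = [0.0, 1.5, 3.0, 5.0, 2.0]
reach = lambda x: x - 4.0   # margin of the reaching predicate
safe = lambda x: x + 1.0    # margin of the safety predicate
reward = rho_and(rho_eventually(traj, reach), rho_always(traj, safe))
print(reward)  # 1.0: both subtasks satisfied with margin 1.0
```

Such a scalar can be used directly as a terminal reward, which is the intuition behind replacing hand-tuned heuristics with the robustness degree.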

A Hierarchical Reinforcement Learning Method for Persistent Time-Sensitive Tasks
Reinforcement learning has been applied to many interesting problems such as the famous TD-Gammon and inverted helicopter flight. However, little effort has been put into developing methods to learn policies for complex persistent tasks and tasks that are time-sensitive. In this paper, we take a step towards solving this problem by using signal temporal logic (STL) as the task specification and taking advantage of the temporal abstraction that the options framework provides. We show via simulation that a relatively easy-to-implement algorithm combining STL and options can learn a satisfactory policy with a small number of training cases.
06/20/2016 ∙ by Xiao Li, et al.

A Formal Methods Approach to Pattern Synthesis in Reaction Diffusion Systems
We propose a technique to detect and generate patterns in a network of locally interacting dynamical systems. Central to our approach is a novel spatial superposition logic, whose semantics is defined over the quadtree of a partitioned image. We show that formulas in this logic can be efficiently learned from positive and negative examples of several types of patterns. We also demonstrate that pattern detection, which is implemented as a model checking algorithm, performs very well for test data sets different from the learning sets. We define a quantitative semantics for the logic and integrate the model checking algorithm with particle swarm optimization in a computational framework for synthesis of parameters leading to desired patterns in reaction-diffusion systems.
09/12/2014 ∙ by Ebru Aydin Gol, et al.

Technical Report: Distribution Temporal Logic: Combining Correctness with Quality of Estimation
We present a new temporal logic called Distribution Temporal Logic (DTL) defined over predicates of belief states and hidden states of partially observable systems. DTL can express properties involving uncertainty and likelihood that cannot be described by existing logics. A co-safe formulation of DTL is defined and algorithmic procedures are given for monitoring executions of a partially observable Markov decision process with respect to such formulae. A simulation case study of a rescue robotics application outlines our approach.
09/09/2013 ∙ by Austin Jones, et al.

Automata Guided Reinforcement Learning With Demonstrations
Tasks with complex temporal structures and long horizons pose a challenge for reinforcement learning agents due to the difficulty in specifying the tasks in terms of reward functions as well as large variances in the learning signals. We propose to address these problems by combining temporal logic (TL) with reinforcement learning from demonstrations. Our method automatically generates intrinsic rewards that align with the overall task goal given a TL task specification. The policy resulting from our framework has an interpretable and hierarchical structure. We validate the proposed method experimentally on a set of robotic manipulation tasks.
09/17/2018 ∙ by Xiao Li, et al.

Metrics for Signal Temporal Logic Formulae
Signal Temporal Logic (STL) is a formal language for describing a broad range of real-valued, temporal properties in cyber-physical systems. While there has been extensive research on verification and control synthesis from STL requirements, there is no formal framework for comparing two STL formulae. In this paper, we show that under mild assumptions, STL formulae admit a metric space. We propose two metrics over this space based on i) the Pompeiu-Hausdorff distance and ii) the symmetric difference measure, and present algorithms to compute them. Alongside illustrative examples, we present applications of these metrics for two fundamental problems: a) design quality measures: to compare all the temporal behaviors of a designed system, such as a synthetic genetic circuit, with the "desired" specification, and b) loss functions: to quantify errors in Temporal Logic Inference (TLI) as a first step to establish formal performance guarantees of TLI algorithms.
08/01/2018 ∙ by Curtis Madsen, et al.
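The symmetric difference measure in the abstract above can be illustrated in a deliberately simplified setting: identify each formula with its set of satisfying behaviors and measure where the two sets disagree. The finite space of short binary signals used here is an assumption for illustration; the paper works with measures over continuous signal spaces.

```python
# Toy sketch of a symmetric-difference distance between two temporal
# properties, each identified with its set of satisfying signals.
# Illustrative assumption: a finite space of length-3 binary signals.
from itertools import product

signals = list(product([0, 1], repeat=3))  # all length-3 binary signals

def sat_eventually(s):
    """F(s == 1): the signal is 1 at some step."""
    return any(v == 1 for v in s)

def sat_always(s):
    """G(s == 1): the signal is 1 at every step."""
    return all(v == 1 for v in s)

def symm_diff_distance(sat_a, sat_b):
    """Normalized measure of the signals on which the two formulas disagree."""
    A = {s for s in signals if sat_a(s)}
    B = {s for s in signals if sat_b(s)}
    return len(A ^ B) / len(signals)

print(symm_diff_distance(sat_eventually, sat_always))  # 0.75
```

Here "eventually" is satisfied by 7 of the 8 signals and "always" by 1 of them, so the formulas disagree on 6/8 = 0.75 of the space; a distance of 0 would mean the two properties describe the same behaviors.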

Control Barrier Functions for Systems with High Relative Degree
This paper extends control barrier functions (CBFs) to high order control barrier functions (HOCBFs) that can be used for high relative degree constraints. The proposed HOCBFs are more general than recently proposed (exponential) HOCBFs. We introduce high order barrier functions (HOBFs), and show that their satisfaction of Lyapunov-like conditions implies the forward invariance of the intersection of a series of sets. We then introduce HOCBFs, and show that any control input that satisfies the HOCBF constraints renders the intersection of a series of sets forward invariant. We formulate optimal control problems with constraints given by HOCBFs and control Lyapunov functions (CLFs), and analyze the influence of the choice of the class K functions used in the definition of the HOCBF on the size of the feasible control region. We also provide a promising method to address the conflict between HOCBF constraints and control limitations by penalizing the class K functions. We illustrate the proposed method on an adaptive cruise control problem.
03/12/2019 ∙ by Wei Xiao, et al.
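The high order construction described in the abstract above can be summarized, in paraphrased notation (the exact conditions are in the paper), as a recursion that differentiates the constraint until the control input appears. For a constraint $b(x) \ge 0$ of relative degree $m$ and class $\mathcal{K}$ functions $\alpha_1,\dots,\alpha_m$:

```latex
\psi_0(x) = b(x), \qquad
\psi_i(x) = \dot{\psi}_{i-1}(x) + \alpha_i\bigl(\psi_{i-1}(x)\bigr),
\quad i = 1,\dots,m,
\qquad
C_i = \{\, x : \psi_{i-1}(x) \ge 0 \,\}.
```

Any control input keeping $\psi_m(x, u) \ge 0$ then renders the intersection $C_1 \cap \cdots \cap C_m$ forward invariant, and the choice of the $\alpha_i$ shapes the size of the feasible control region.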

Reactive Control Meets Runtime Verification: A Case Study of Navigation
This paper presents an application of specification-based runtime verification techniques to control mobile robots in a reactive manner. In our case study, we develop a layered control architecture in which runtime monitors constructed from formal specifications are embedded into the navigation stack. We use temporal logic and regular expressions to describe safety requirements and mission specifications, respectively. An immediate benefit of our approach is that it lifts the simple requirements and objectives of traditional control applications to more complex specifications in a non-intrusive and compositional way. Finally, we demonstrate a simulation of robots controlled by the proposed architecture and discuss further extensions of our approach.
02/11/2019 ∙ by Dogan Ulus, et al.

Temporal Logic Guided Safe Reinforcement Learning Using Control Barrier Functions
Using reinforcement learning to learn control policies is a challenge when the task is complex with potentially long horizons. Ensuring adequate but safe exploration is also crucial for controlling physical systems. In this paper, we use temporal logic to facilitate specification and learning of complex tasks. We combine temporal logic with control Lyapunov functions to improve exploration. We incorporate control barrier functions to safeguard the exploration and deployment process. We develop a flexible and learnable system that allows users to specify task objectives and constraints in different forms and at various levels. The framework is also able to take advantage of known system dynamics and handle unknown environmental dynamics by integrating model-free learning with model-based planning.
03/23/2019 ∙ by Xiao Li, et al.
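The "safeguarding" role of control barrier functions described in the abstract above can be sketched on a toy system. For a 1-D single integrator ($\dot{x} = u$) with a box constraint, the CBF quadratic program that minimally modifies the policy's nominal action has a closed-form solution: a clamp. Everything here (function names, the single-integrator model, the linear class K function) is an illustrative assumption, not the paper's framework.

```python
# Illustrative CBF safety filter: the action proposed by an RL policy
# is minimally modified so the barrier conditions h' + alpha*h >= 0 hold.
# Toy system: x' = u, with barriers h1 = x_max - x and h2 = x - x_min.

def cbf_filter(u_nom, x, x_min=-1.0, x_max=1.0, alpha=1.0):
    """Return the action closest to u_nom that keeps h1 and h2 nonnegative."""
    u_hi = alpha * (x_max - x)    # h1' + alpha*h1 >= 0  =>  u <= u_hi
    u_lo = -alpha * (x - x_min)   # h2' + alpha*h2 >= 0  =>  u >= u_lo
    return max(u_lo, min(u_nom, u_hi))

# An aggressive nominal action near the boundary is attenuated:
print(cbf_filter(5.0, x=0.5))    # 0.5
# An already-safe action passes through unchanged:
print(cbf_filter(-0.2, x=0.0))   # -0.2
```

Because the constraint bounds tighten as the state approaches the boundary, exploration remains free in the interior of the safe set while unsafe actions are overridden, which is the intuition behind combining model-free learning with such a filter.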
Calin Belta