Yuxiang Yang


  • NoRML: No-Reward Meta Learning

    Efficiently adapting to new environments and changes in dynamics is critical for agents to operate successfully in the real world. Reinforcement learning (RL) based approaches typically rely on external reward feedback for adaptation. However, in many scenarios this reward signal might not be readily available for the target task, or the difference between the environments can be implicit and only observable from the dynamics. To this end, we introduce a method that allows for self-adaptation of learned policies: No-Reward Meta Learning (NoRML). NoRML extends Model-Agnostic Meta-Learning (MAML) for RL and uses observable dynamics of the environment instead of an explicit reward function in MAML's fine-tuning step. Our method has a more expressive update step than MAML, while maintaining MAML's gradient-based foundation. Additionally, to allow more targeted exploration, we implement an extension to MAML that effectively disconnects the meta-policy parameters from the fine-tuned policies' parameters. We first study our method on a number of synthetic control problems and then validate it on common benchmark environments, showing that NoRML outperforms MAML when the dynamics change between tasks.

    03/04/2019 ∙ by Yuxiang Yang, et al.


  • Fine-grained ECG Classification Based on Deep CNN and Online Decision Fusion

    Early recognition of abnormal rhythms in ECG signals is crucial for monitoring or diagnosing patients' cardiac conditions and increasing the success rate of treatment. Classifying abnormal rhythms into fine-grained categories is very challenging due to the broad taxonomy of rhythms, noise, and the lack of real-world data and annotations from a large number of patients. This paper presents a new ECG classification method based on Deep Convolutional Neural Networks (DCNN) and online decision fusion. Unlike previous methods, which use hand-crafted features or learn features from the original signal domain, the proposed DCNN-based method learns features and classifiers from the time-frequency domain in an end-to-end manner. First, the ECG wave signal is transformed to the time-frequency domain using the Short-Time Fourier Transform. Next, specific DCNN models are trained on ECG samples of specific lengths. Finally, an online decision fusion method is proposed to fuse past and current decisions from different models into a more accurate one. Experimental results on both synthetic and real-world ECG datasets demonstrate the effectiveness and efficiency of the proposed method.

    01/19/2019 ∙ by Jing Zhang, et al.

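The first step of the ECG pipeline described above, moving a 1-D signal into the time-frequency domain with a Short-Time Fourier Transform, can be sketched as follows. This is a minimal NumPy illustration, not the authors' implementation: the function name `stft_magnitude`, the frame length, the hop size, the sampling rate, and the synthetic 8 Hz test tone are all assumptions made for the example.

```python
import numpy as np

def stft_magnitude(signal, frame_len=64, hop=32):
    """Return a (frames, frame_len // 2 + 1) magnitude spectrogram.

    Hypothetical helper: slices the signal into overlapping frames,
    applies a Hann window, and takes the real FFT of each frame.
    """
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([
        signal[i * hop : i * hop + frame_len] * window
        for i in range(n_frames)
    ])
    return np.abs(np.fft.rfft(frames, axis=1))

# Assumed setup: 2 seconds of a synthetic 8 Hz tone sampled at 256 Hz,
# standing in for a real ECG recording.
fs = 256
t = np.arange(2 * fs) / fs
ecg_like = np.sin(2 * np.pi * 8.0 * t)

spec = stft_magnitude(ecg_like)
# Each frequency bin is fs / frame_len = 4 Hz wide.
peak_bin = int(np.argmax(spec.mean(axis=0)))
print(spec.shape, peak_bin * fs / 64)
```

The resulting 2-D array is the kind of image-like input a convolutional network could consume; the choice of window and frame length trades time resolution against frequency resolution.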

  • Data Efficient Reinforcement Learning for Legged Robots

    We present a model-based framework for robot locomotion that achieves walking based on only 4.5 minutes (45,000 control steps) of data collected on a quadruped robot. To accurately model the robot's dynamics over a long horizon, we introduce a loss function that tracks the model's prediction over multiple timesteps. We adapt model predictive control to account for planning latency, which allows the learned model to be used for real-time control. Additionally, to ensure safe exploration during model learning, we embed prior knowledge of leg trajectories into the action space. The resulting system achieves fast and robust locomotion. Unlike model-free methods, which optimize for a particular task, our planner can use the same learned dynamics for various tasks simply by changing the reward function. To the best of our knowledge, our approach is more than an order of magnitude more sample-efficient than current model-free methods.

    07/08/2019 ∙ by Yuxiang Yang, et al.

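The idea of a loss that tracks the model's prediction over multiple timesteps can be illustrated with a small sketch. It is one plausible reading of the abstract, not the paper's exact formulation: the function `multi_step_loss`, the linear toy dynamics, and the horizon value are assumptions chosen for the example. The key point is that each prediction is fed back into the model, so errors that compound over the rollout are penalized.

```python
import numpy as np

def multi_step_loss(model, states, actions, horizon):
    """Mean squared error of open-loop model rollouts over `horizon` steps.

    Hypothetical helper: `states` has one more entry than `actions`.
    """
    total, count = 0.0, 0
    for start in range(len(states) - horizon):
        pred = states[start]
        for k in range(horizon):
            # Feed the model's own prediction back in (no ground-truth reset).
            pred = model(pred, actions[start + k])
            total += np.sum((pred - states[start + k + 1]) ** 2)
            count += 1
    return total / count

# Assumed toy system: linear dynamics s' = A s + B a.
rng = np.random.default_rng(0)
A = np.array([[0.9, 0.1], [0.0, 0.95]])
B = np.array([[0.0], [0.1]])

def true_model(s, a):
    return A @ s + B @ a

def biased_model(s, a):
    # A slightly wrong model whose error grows over the rollout.
    return 1.02 * (A @ s) + B @ a

actions = rng.normal(size=(50, 1))
states = [np.array([1.0, 0.0])]
for a in actions:
    states.append(true_model(states[-1], a))
states = np.stack(states)

loss_true = multi_step_loss(true_model, states, actions, horizon=10)
loss_biased = multi_step_loss(biased_model, states, actions, horizon=10)
print(loss_true, loss_biased)
```

With `horizon=1` this reduces to the usual one-step prediction loss; longer horizons expose model errors that only become visible after several steps.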

  • Reinforcement Learning with Chromatic Networks

    We present a new algorithm for finding compact neural networks encoding reinforcement learning (RL) policies. To do so, we bring to this new RL setting the theory of pointer networks and ENAS-type algorithms for combinatorial optimization of RL policies, together with recent evolution strategies (ES) optimization methods, and propose to define the combinatorial search space as the set of different edge-partitionings (colorings) into same-weight classes. For several RL tasks, we manage to learn colorings that translate to effective policies parameterized by as few as 17 weight parameters, providing 6x compression over state-of-the-art compact policies based on Toeplitz matrices. We believe that our work is one of the first attempts to propose a rigorous approach to training structured neural network architectures for RL problems, which are of particular interest in mobile robotics with limited storage and computational resources.

    07/10/2019 ∙ by Xingyou Song, et al.

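The notion of partitioning edges into same-weight classes (a coloring) can be made concrete with a short sketch. It shows only the weight-sharing structure, under assumptions made for the example: the layer sizes, the number of colors, the `tanh` activation, and the random coloring are all hypothetical. In the work described above the coloring is searched for and the shared weights are optimized; here both are simply sampled at random.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed sizes for illustration: 8 inputs, 4 outputs, 5 colors.
n_in, n_out, n_colors = 8, 4, 5

# The coloring assigns every edge (output i, input j) to one weight class.
colors = rng.integers(0, n_colors, size=(n_out, n_in))

# Only n_colors scalars are trainable; all edges of a color share one value.
shared = rng.normal(size=n_colors)

def policy(obs):
    """Single-layer policy whose weight matrix is expanded from the coloring."""
    weights = shared[colors]  # shape (n_out, n_in), built by indexing
    return np.tanh(weights @ obs)

action = policy(np.ones(n_in))
n_edges = colors.size
print(f"{n_edges} edges parameterized by {n_colors} shared weights")
```

Storing the policy then requires the color index per edge plus `n_colors` floating-point values, rather than one float per edge.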

  • OpenRoACH: A Durable Open-Source Hexapedal Platform with Onboard Robot Operating System (ROS)

    OpenRoACH is a 15-cm, 200-gram self-contained hexapedal robot with an onboard single-board computer. To our knowledge, it is the smallest legged robot capable of running the Robot Operating System (ROS) onboard. The robot is fully open source, uses accessible materials and off-the-shelf electronic components, can be fabricated with benchtop fast-prototyping machines such as a laser cutter and a 3D printer, and can be assembled by one person within two hours. Its sensory capacity has been tested with gyroscopes, accelerometers, Beacon sensors, color vision sensors, linescan sensors, and cameras. It is low-cost, within $150 including structure materials, motors, electronics, and a battery. The capabilities of OpenRoACH are demonstrated with multi-surface walking and running, 24-hour continuous walking burn-ins, carrying 200-gram dynamic payloads and 800-gram static payloads, and ROS control of steering based on camera feedback. Information and files related to mechanical design, fabrication, assembly, electronics, and control algorithms are all publicly available at https://wiki.eecs.berkeley.edu/biomimetics/Main/OpenRoACH.

    03/01/2019 ∙ by Liyu Wang, et al.
