David Budden

is this you? claim profile


  • A Generalized Framework for Population Based Training

    Population Based Training (PBT) is a recent approach that jointly optimizes neural network weights and hyperparameters which periodically copies weights of the best performers and mutates hyperparameters during training. Previous PBT implementations have been synchronized glass-box systems. We propose a general, black-box PBT framework that distributes many asynchronous "trials" (a small number of training steps with warm-starting) across a cluster, coordinated by the PBT controller. The black-box design does not make assumptions on model architectures, loss functions or training procedures. Our system supports dynamic hyperparameter schedules to optimize both differentiable and non-differentiable metrics. We apply our system to train a state-of-the-art WaveNet generative model for human voice synthesis. We show that our PBT system achieves better accuracy, less sensitivity and faster convergence compared to existing methods, given the same computational resource.

    02/05/2019 ∙ by Ang Li, et al. ∙ 32 share

    read it

  • One-Shot High-Fidelity Imitation: Training Large-Scale Deep Nets with RL

    Humans are experts at high-fidelity imitation -- closely mimicking a demonstration, often in one attempt. Humans use this ability to quickly solve a task instance, and to bootstrap learning of new tasks. Achieving these abilities in autonomous agents is an open problem. In this paper, we introduce an off-policy RL algorithm (MetaMimic) to narrow this gap. MetaMimic can learn both (i) policies for high-fidelity one-shot imitation of diverse novel skills, and (ii) policies that enable the agent to solve tasks more efficiently than the demonstrators. MetaMimic relies on the principle of storing all experiences in a memory and replaying these to learn massive deep neural network policies by off-policy RL. This paper introduces, to the best of our knowledge, the largest existing neural networks for deep RL and shows that larger networks with normalization are needed to achieve one-shot high-fidelity imitation on a challenging manipulation task. The results also show that both types of policy can be learned from vision, in spite of the task rewards being sparse, and without access to demonstrator actions.

    10/11/2018 ∙ by Tom Le Paine, et al. ∙ 4 share

    read it

  • Playing hard exploration games by watching YouTube

    Deep reinforcement learning methods traditionally struggle with tasks where environment rewards are particularly sparse. One successful method of guiding exploration in these domains is to imitate trajectories provided by a human demonstrator. However, these demonstrations are typically collected under artificial conditions, i.e. with access to the agent's exact environment setup and the demonstrator's action and reward trajectories. Here we propose a two-stage method that overcomes these limitations by relying on noisy, unaligned footage without access to such data. First, we learn to map unaligned videos from multiple sources to a common representation using self-supervised objectives constructed over both time and modality (i.e. vision and sound). Second, we embed a single YouTube video in this representation to construct a reward function that encourages an agent to imitate human gameplay. This method of one-shot imitation allows our agent to convincingly exceed human-level performance on the infamously hard exploration games Montezuma's Revenge, Pitfall! and Private Eye for the first time, even if the agent is not presented with any environment rewards.

    05/29/2018 ∙ by Yusuf Aytar, et al. ∙ 2 share

    read it

  • Sample Efficient Adaptive Text-to-Speech

    We present a meta-learning approach for adaptive text-to-speech (TTS) with few data. During training, we learn a multi-speaker model using a shared conditional WaveNet core and independent learned embeddings for each speaker. The aim of training is not to produce a neural network with fixed weights, which is then deployed as a TTS system. Instead, the aim is to produce a network that requires few data at deployment time to rapidly adapt to new speakers. We introduce and benchmark three strategies: (i) learning the speaker embedding while keeping the WaveNet core fixed, (ii) fine-tuning the entire architecture with stochastic gradient descent, and (iii) predicting the speaker embedding with a trained neural network encoder. The experiments show that these approaches are successful at adapting the multi-speaker neural network to new speakers, obtaining state-of-the-art results in both sample naturalness and voice similarity with merely a few minutes of audio data from new speakers.

    09/27/2018 ∙ by Yutian Chen, et al. ∙ 2 share

    read it

  • A Multi-Pass Approach to Large-Scale Connectomics

    The field of connectomics faces unprecedented "big data" challenges. To reconstruct neuronal connectivity, automated pixel-level segmentation is required for petabytes of streaming electron microscopy data. Existing algorithms provide relatively good accuracy but are unacceptably slow, and would require years to extract connectivity graphs from even a single cubic millimeter of neural tissue. Here we present a viable real-time solution, a multi-pass pipeline optimized for shared-memory multicore systems, capable of processing data at near the terabyte-per-hour pace of multi-beam electron microscopes. The pipeline makes an initial fast-pass over the data, and then makes a second slow-pass to iteratively correct errors in the output of the fast-pass. We demonstrate the accuracy of a sparse slow-pass reconstruction algorithm and suggest new methods for detecting morphological errors. Our fast-pass approach provided many algorithmic challenges, including the design and implementation of novel shallow convolutional neural nets and the parallelization of watershed and object-merging techniques. We use it to reconstruct, from image stack to skeletons, the full dataset of Kasthuri et al. (463 GB capturing 120,000 cubic microns) in a matter of hours on a single multicore machine rather than the weeks it has taken in the past on much larger distributed systems.

    12/07/2016 ∙ by Yaron Meirovitch, et al. ∙ 0 share

    read it

  • Generative Compression

    Traditional image and video compression algorithms rely on hand-crafted encoder/decoder pairs (codecs) that lack adaptability and are agnostic to the data being compressed. Here we describe the concept of generative compression, the compression of data using generative models, and suggest that it is a direction worth pursuing to produce more accurate and visually pleasing reconstructions at much deeper compression levels for both image and video data. We also demonstrate that generative compression is orders-of-magnitude more resilient to bit error rates (e.g. from noisy wireless channels) than traditional variable-length coding schemes.

    03/04/2017 ∙ by Shibani Santurkar, et al. ∙ 0 share

    read it

  • Toward Streaming Synapse Detection with Compositional ConvNets

    Connectomics is an emerging field in neuroscience that aims to reconstruct the 3-dimensional morphology of neurons from electron microscopy (EM) images. Recent studies have successfully demonstrated the use of convolutional neural networks (ConvNets) for segmenting cell membranes to individuate neurons. However, there has been comparatively little success in high-throughput identification of the intercellular synaptic connections required for deriving connectivity graphs. In this study, we take a compositional approach to segmenting synapses, modeling them explicitly as an intercellular cleft co-located with an asymmetric vesicle density along a cell membrane. Instead of requiring a deep network to learn all natural combinations of this compositionality, we train lighter networks to model the simpler marginal distributions of membranes, clefts and vesicles from just 100 electron microscopy samples. These feature maps are then combined with simple rules-based heuristics derived from prior biological knowledge. Our approach to synapse detection is both more accurate than previous state-of-the-art (7 speed-up compared to the previous fastest implementations. We demonstrate by reconstructing the first complete, directed connectome from the largest available anisotropic microscopy dataset (245 GB) of mouse somatosensory cortex (S1) in just 9.7 hours on a single shared-memory CPU system. We believe that this work marks an important step toward the goal of a microscope-pace streaming connectomics pipeline.

    02/23/2017 ∙ by Shibani Santurkar, et al. ∙ 0 share

    read it

  • Deep Tensor Convolution on Multicores

    Deep convolutional neural networks (ConvNets) of 3-dimensional kernels allow joint modeling of spatiotemporal features. These networks have improved performance of video and volumetric image analysis, but have been limited in size due to the low memory ceiling of GPU hardware. Existing CPU implementations overcome this constraint but are impractically slow. Here we extend and optimize the faster Winograd-class of convolutional algorithms to the N-dimensional case and specifically for CPU hardware. First, we remove the need to manually hand-craft algorithms by exploiting the relaxed constraints and cheap sparse access of CPU memory. Second, we maximize CPU utilization and multicore scalability by transforming data matrices to be cache-aware, integer multiples of AVX vector widths. Treating 2-dimensional ConvNets as a special (and the least beneficial) case of our approach, we demonstrate a 5 to 25-fold improvement in throughput compared to previous state-of-the-art.

    11/20/2016 ∙ by David Budden, et al. ∙ 0 share

    read it

  • Simulation leagues: Analysis of competition formats

    The selection of an appropriate competition format is critical for both the success and credibility of any competition, both real and simulated. In this paper, the automated parallelism offered by the RoboCupSoccer 2D simulation league is leveraged to conduct a 28,000 game round-robin between the top 8 teams from RoboCup 2012 and 2013. A proposed new competition format is found to reduce variation from the resultant statistically significant team performance rankings by 75 RoboCup 2012 and 2013 respectively. These results are statistically validated by generating 10,000 random tournaments for each of the three considered formats and comparing the respective distributions of ranking discrepancy.

    03/17/2014 ∙ by David Budden, et al. ∙ 0 share

    read it

  • Addressing the non-functional requirements of computer vision systems: A case study

    Computer vision plays a major role in the robotics industry, where vision data is frequently used for navigation and high-level decision making. Although there is significant research in algorithms and functional requirements, there is a comparative lack of emphasis on how best to map these abstract concepts onto an appropriate software architecture. In this study, we distinguish between the functional and non-functional requirements of a computer vision system. Using a RoboCup humanoid robot system as a case study, we propose and develop a software architecture that fulfills the latter criteria. The modifiability of the proposed architecture is demonstrated by detailing a number of feature detection algorithms and emphasizing which aspects of the underlying framework were modified to support their integration. To demonstrate portability, we port our vision system (designed for an application-specific DARwIn-OP humanoid robot) to a general-purpose, Raspberry Pi computer. We evaluate performance on both platforms and compare them to a vision system optimised for functional requirements only. The architecture and implementation presented in this study provide a highly generalisable framework for computer vision system design that is of particular benefit in research and development, competition and other environments in which rapid system evolution is necessary.

    10/31/2014 ∙ by Shannon Fenn, et al. ∙ 0 share

    read it

  • RANSAC: Identification of Higher-Order Geometric Features and Applications in Humanoid Robot Soccer

    The ability for an autonomous agent to self-localise is directly proportional to the accuracy and precision with which it can perceive salient features within its local environment. The identification of such features by recognising geometric profile allows robustness against lighting variations, which is necessary in most industrial robotics applications. This paper details a framework by which the random sample consensus (RANSAC) algorithm, often applied to parameter fitting in linear models, can be extended to identify higher-order geometric features. Goalpost identification within humanoid robot soccer is investigated as an application, with the developed system yielding an order-of-magnitude improvement in classification performance relative to a traditional histogramming methodology.

    10/22/2013 ∙ by Madison Flannery, et al. ∙ 0 share

    read it