
Multi-Goal Reinforcement Learning: Challenging Robotics Environments and Request for Research
The purpose of this technical report is twofold. First of all, it introduces a suite of challenging continuous control tasks (integrated with OpenAI Gym) based on currently existing robotics hardware. The tasks include pushing, sliding and pick & place with a Fetch robotic arm as well as in-hand object manipulation with a Shadow Dexterous Hand. All tasks have sparse binary rewards and follow a Multi-Goal Reinforcement Learning (RL) framework in which an agent is told what to do using an additional input. The second part of the paper presents a set of concrete research ideas for improving RL algorithms, most of which are related to Multi-Goal RL and Hindsight Experience Replay.
02/26/2018 ∙ by Matthias Plappert, et al.

Exploiting Linear Structure Within Convolutional Networks for Efficient Evaluation
We present techniques for speeding up the test-time evaluation of large convolutional networks, designed for object recognition tasks. These models deliver impressive accuracy but each image evaluation requires millions of floating point operations, making their deployment on smartphones and Internet-scale clusters problematic. The computation is dominated by the convolution operations in the lower layers of the model. We exploit the linear structure present within the convolutional filters to derive approximations that significantly reduce the required computation. Using large state-of-the-art models, we demonstrate speedups of convolutional layers on both CPU and GPU by a factor of 2x, while keeping the accuracy within 1
04/02/2014 ∙ by Emily Denton, et al.
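
To make the idea of exploiting linear structure concrete, here is a minimal sketch of the general principle rather than the approximation scheme from the paper: when a weight matrix has rank 1, it can be applied as two cheap steps instead of one dense product. The function names and the toy vectors are assumptions made for this illustration.

```python
def dense_apply(weights, x):
    """Apply a dense matrix (list of rows) to a vector: rows * cols multiplications."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


def rank1_apply(u, v, x):
    """Apply the rank-1 matrix u v^T to x: len(v) + len(u) multiplications."""
    projection = sum(vi * xi for vi, xi in zip(v, x))  # scalar v . x
    return [ui * projection for ui in u]


# Toy factors (hypothetical values) and the dense matrix they generate.
u = [1.0, 2.0, 3.0]
v = [0.5, -1.0, 2.0, 4.0]
weights = [[ui * vj for vj in v] for ui in u]

x = [2.0, 1.0, 0.0, -1.0]
dense_out = dense_apply(weights, x)
factored_out = rank1_apply(u, v, x)

dense_multiplies = len(u) * len(v)     # cost of the dense product
factored_multiplies = len(u) + len(v)  # cost of the factored product
print(dense_out, factored_out, dense_multiplies, factored_multiplies)
```

Real filters are rarely exactly low rank, so in practice the factorization is an approximation and the saving has to be weighed against the resulting loss in accuracy.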

Addressing the Rare Word Problem in Neural Machine Translation
Neural Machine Translation (NMT) is a new approach to machine translation that has shown promising results that are comparable to traditional approaches. A significant weakness in conventional NMT systems is their inability to correctly translate very rare words: end-to-end NMTs tend to have relatively small vocabularies with a single unk symbol that represents every possible out-of-vocabulary (OOV) word. In this paper, we propose and implement an effective technique to address this problem. We train an NMT system on data that is augmented by the output of a word alignment algorithm, allowing the NMT system to emit, for each OOV word in the target sentence, the position of its corresponding word in the source sentence. This information is later utilized in a post-processing step that translates every OOV word using a dictionary. Our experiments on the WMT14 English to French translation task show that this method provides a substantial improvement of up to 2.8 BLEU points over an equivalent NMT system that does not use this technique. With 37.5 BLEU points, our NMT system is the first to surpass the best result achieved on a WMT14 contest task.
10/30/2014 ∙ by Minh-Thang Luong, et al.
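
The post-processing step described in the abstract above can be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' implementation: the `<unk:i>` token format (an unknown-word marker carrying the index `i` of the aligned source word), the function name `replace_unknowns`, and the toy dictionary are all hypothetical. Copying the source word when the dictionary has no entry is also a choice made for this sketch.

```python
import re

# Hypothetical token format: "<unk:3>" marks an unknown word aligned to source index 3.
UNK_PATTERN = re.compile(r"^<unk:(\d+)>$")


def replace_unknowns(source_tokens, target_tokens, dictionary):
    """Replace each positional unk token with a dictionary translation of its source word."""
    output = []
    for token in target_tokens:
        match = UNK_PATTERN.match(token)
        if match is None:
            output.append(token)  # ordinary in-vocabulary word
            continue
        position = int(match.group(1))
        if position >= len(source_tokens):
            output.append(token)  # pointer out of range: leave the marker untouched
            continue
        source_word = source_tokens[position]
        # Fall back to copying the source word (useful for names and numbers).
        output.append(dictionary.get(source_word, source_word))
    return output


source = ["The", "portico", "in", "Pont-de-Buis"]
target = ["Le", "<unk:1>", "de", "<unk:3>"]
toy_dictionary = {"portico": "portique"}  # assumed entries, for illustration only

translated = replace_unknowns(source, target, toy_dictionary)
print(" ".join(translated))
```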

Reinforcement Learning Neural Turing Machines - Revised
The Neural Turing Machine (NTM) is more expressive than all previously considered models because of its external memory. It can be viewed as a broader effort to use abstract external Interfaces and to learn a parametric model that interacts with them. The capabilities of a model can be extended by providing it with proper Interfaces that interact with the world. These external Interfaces include memory, a database, a search engine, or a piece of software such as a theorem verifier. Some of these Interfaces are provided by the developers of the model. However, many important existing Interfaces, such as databases and search engines, are discrete. We examine the feasibility of learning models to interact with discrete Interfaces. We investigate the following discrete Interfaces: a memory Tape, an input Tape, and an output Tape. We use a Reinforcement Learning algorithm to train a neural network that interacts with such Interfaces to solve simple algorithmic tasks. Our Interfaces are expressive enough to make our model Turing complete.
05/04/2015 ∙ by Wojciech Zaremba, et al.

One-Shot Imitation Learning
Imitation learning has been commonly applied to solve different tasks in isolation. This usually requires either careful feature engineering, or a significant number of samples. This is far from what we desire: ideally, robots should be able to learn from very few demonstrations of any given task, and instantly generalize to new situations of the same task, without requiring task-specific engineering. In this paper, we propose a meta-learning framework for achieving such capability, which we call one-shot imitation learning. Specifically, we consider the setting where there is a very large set of tasks, and each task has many instantiations. For example, a task could be to stack all blocks on a table into a single tower, another task could be to place all blocks on a table into two-block towers, etc. In each case, different instances of the task would consist of different sets of blocks with different initial states. At training time, our algorithm is presented with pairs of demonstrations for a subset of all tasks. A neural net is trained that takes as input one demonstration and the current state (which initially is the initial state of the other demonstration of the pair), and outputs an action with the goal that the resulting sequence of states and actions matches as closely as possible with the second demonstration. At test time, a demonstration of a single instance of a new task is presented, and the neural net is expected to perform well on new instances of this new task. The use of soft attention allows the model to generalize to conditions and tasks unseen in the training data. We anticipate that by training this model on a much greater variety of tasks and settings, we will obtain a general system that can turn any demonstrations into robust policies that can accomplish an overwhelming variety of tasks. Videos available at https://bit.ly/nips2017oneshot .
03/21/2017 ∙ by Yan Duan, et al.

Hindsight Experience Replay
Dealing with sparse rewards is one of the biggest challenges in Reinforcement Learning (RL). We present a novel technique called Hindsight Experience Replay which allows sample-efficient learning from rewards which are sparse and binary and therefore avoid the need for complicated reward engineering. It can be combined with an arbitrary off-policy RL algorithm and may be seen as a form of implicit curriculum. We demonstrate our approach on the task of manipulating objects with a robotic arm. In particular, we run experiments on three different tasks: pushing, sliding, and pick-and-place, in each case using only binary rewards indicating whether or not the task is completed. Our ablation studies show that Hindsight Experience Replay is a crucial ingredient which makes training possible in these challenging environments. We show that our policies trained on a physics simulation can be deployed on a physical robot and successfully complete the task.
07/05/2017 ∙ by Marcin Andrychowicz, et al.
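
The abstract does not spell out the mechanism, so the following is only a sketch of one common reading of hindsight replay: each episode is stored a second time with its goal replaced by the state that was actually reached, so that a failed attempt still produces a successful example. The `Transition` layout, the function names, the 0 / -1 reward convention, and the toy one-dimensional episode are assumptions made for this illustration.

```python
from collections import namedtuple

# Hypothetical transition layout for a goal-conditioned replay buffer.
Transition = namedtuple("Transition", "state action next_state goal reward")


def sparse_reward(achieved, goal):
    """Binary reward: 0 when the goal is reached, -1 otherwise (assumed convention)."""
    return 0.0 if achieved == goal else -1.0


def relabel_with_final_state(episode):
    """Copy an episode, pretending the finally reached state was the goal all along."""
    hindsight_goal = episode[-1].next_state
    return [
        t._replace(goal=hindsight_goal,
                   reward=sparse_reward(t.next_state, hindsight_goal))
        for t in episode
    ]


# Toy 1-D episode: the agent was asked to reach position 5 but only got to 2.
episode = [
    Transition(0, +1, 1, 5, sparse_reward(1, 5)),
    Transition(1, +1, 2, 5, sparse_reward(2, 5)),
]

# Store both the original and the relabeled transitions for off-policy training.
replay_buffer = episode + relabel_with_final_state(episode)
for transition in replay_buffer:
    print(transition)
```

In this toy run every original transition carries the failure reward, while the last relabeled transition carries the success reward, which gives an off-policy learner a non-trivial signal.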

Convolutional networks and learning invariant to homogeneous multiplicative scalings
The conventional classification schemes, notably multinomial logistic regression, used in conjunction with convolutional networks (convnets) are classical in statistics, designed without consideration for the usual coupling with convnets, stochastic gradient descent, and backpropagation. In the specific application to supervised learning for convnets, a simple scale-invariant classification stage turns out to be more robust than multinomial logistic regression, appears to result in slightly lower errors on several standard test sets, has similar computational costs, and features precise control over the actual rate of learning. "Scale-invariant" means that multiplying the input values by any nonzero scalar leaves the output unchanged.
06/26/2015 ∙ by Mark Tygert, et al.
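
The definition of scale invariance given at the end of the abstract can be checked on a small example. The function below is not the classification stage proposed in the paper; it is simply one hypothetical function with the stated property, chosen for illustration: each input is squared and divided by the sum of squares, so any nonzero common factor cancels.

```python
import math


def scale_invariant_scores(values):
    """Map raw values to scores that ignore any nonzero rescaling of the input."""
    total = sum(v * v for v in values)
    if total == 0.0:
        raise ValueError("at least one input value must be nonzero")
    return [v * v / total for v in values]


raw = [1.0, -2.0, 4.0]
baseline = scale_invariant_scores(raw)

# Multiplying every input by the same nonzero factor leaves the scores unchanged.
for factor in (0.5, -3.0, 1000.0):
    rescaled = scale_invariant_scores([factor * v for v in raw])
    assert all(math.isclose(a, b) for a, b in zip(baseline, rescaled))

print(baseline)
```

Note that squaring discards the sign of each input, which is the price this particular sketch pays for being invariant to negative factors as well as positive ones.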

Recurrent Neural Network Regularization
We present a simple regularization technique for Recurrent Neural Networks (RNNs) with Long Short-Term Memory (LSTM) units. Dropout, the most successful technique for regularizing neural networks, does not work well with RNNs and LSTMs. In this paper, we show how to correctly apply dropout to LSTMs, and show that it substantially reduces overfitting on a variety of tasks. These tasks include language modeling, speech recognition, image caption generation, and machine translation.
09/08/2014 ∙ by Wojciech Zaremba, et al.

Extensions and Limitations of the Neural GPU
The Neural GPU is a recent model that can learn algorithms such as multi-digit binary addition and binary multiplication in a way that generalizes to inputs of arbitrary length. We show that there are two simple ways of improving the performance of the Neural GPU: by carefully designing a curriculum, and by increasing model size. The latter requires a memory efficient implementation, as a naive implementation of the Neural GPU is memory intensive. We find that these techniques increase the set of algorithmic problems that can be solved by the Neural GPU: we have been able to learn to perform all the arithmetic operations (and generalize to arbitrarily long numbers) when the arguments are given in the decimal representation (which, surprisingly, has not been possible before). We have also been able to train the Neural GPU to evaluate long arithmetic expressions with multiple operands that require respecting the precedence order of the operands, although these have succeeded only in their binary representation, and not with perfect accuracy. In addition, we gain insight into the Neural GPU by investigating its failure modes. We find that Neural GPUs that correctly generalize to arbitrarily long numbers still fail to compute the correct answer on highly-symmetric, atypical inputs: for example, a Neural GPU that achieves near-perfect generalization on decimal multiplication of up to 100-digit long numbers can fail on 000000...002 × 000000...002 while succeeding at 2 × 2. These failure modes are reminiscent of adversarial examples.
11/02/2016 ∙ by Eric Price, et al.

Transfer from Simulation to Real World through Learning Deep Inverse Dynamics Model
Developing control policies in simulation is often more practical and safer than directly running experiments in the real world. This applies to policies obtained from planning and optimization, and even more so to policies obtained from reinforcement learning, which is often very data demanding. However, a policy that succeeds in simulation often doesn't work when deployed on a real robot. Nevertheless, often the overall gist of what the policy does in simulation remains valid in the real world. In this paper we investigate such settings, where the sequence of states traversed in simulation remains reasonable for the real world, even if the details of the controls are not, as could be the case when the key differences lie in detailed friction, contact, mass and geometry properties. During execution, at each time step our approach computes what the simulation-based control policy would do, but then, rather than executing these controls on the real robot, our approach computes what the simulation expects the resulting next state(s) will be, and then relies on a learned deep inverse dynamics model to decide which real-world action is most suitable to achieve those next states. Deep models are only as good as their training data, and we also propose an approach for data collection to (incrementally) learn the deep inverse dynamics model. Our experiments show that our approach compares favorably with various baselines that have been developed for dealing with simulation to real world model discrepancy, including output error control and Gaussian dynamics adaptation.
10/11/2016 ∙ by Paul Christiano, et al.

OpenAI Gym
OpenAI Gym is a toolkit for reinforcement learning research. It includes a growing collection of benchmark problems that expose a common interface, and a website where people can share their results and compare the performance of algorithms. This whitepaper discusses the components of OpenAI Gym and the design decisions that went into the software.
06/05/2016 ∙ by Greg Brockman, et al.
Wojciech Zaremba
Wojciech Zaremba is a cofounder of OpenAI, where he leads the robotics team. His team is developing general-purpose robots through new approaches to transfer learning and to teaching robots complicated behaviors. OpenAI’s mission is to create safe artificial intelligence and to ensure that its benefits are distributed as evenly as possible.
Zaremba was born in Kluczbork, Poland. At a young age he won local competitions and awards in mathematics, computer science, chemistry and physics. In 2007 he represented Poland at the International Mathematical Olympiad in Vietnam, where he won a silver medal. Zaremba studied mathematics and computer science at the University of Warsaw and at the École Polytechnique, graduating in mathematics in 2013. He then began a doctorate in deep learning at New York University under Yann LeCun and Rob Fergus, and was awarded his PhD in 2016.
During his bachelor’s studies, in the period before deep learning took off, he spent time at NVIDIA. In the following years, NVIDIA chips became a pillar of artificial intelligence. During his PhD, he spent a year at Google Brain and another year at Facebook Artificial Intelligence Research.
During his stay at Google, he coauthored work on adversarial examples for neural networks. This result created the field of adversarial attacks on neural networks. His PhD focused on matching the capabilities of neural networks with the algorithmic power of programmable computers. The problem of training neural networks to represent programmable computer algorithms has since become a separate field.
In 2015, Zaremba joined OpenAI, a nonprofit artificial intelligence research company whose objective is to create safe artificial intelligence. It is headquartered in San Francisco and has a budget of one billion dollars. Zaremba works as Robotics Research Manager at OpenAI. He also sits on the advisory board of Growbots, a Silicon Valley startup that aims to automate sales processes using machine learning and artificial intelligence.
Zaremba has authored over 40 publications on machine learning and artificial intelligence, which have received several thousand citations.
Named among the 30 most influential people under 30 by the Polish edition of “Forbes”, 2017
Silver Medal, 48th International Mathematical Olympiad, Vietnam
Polish Children’s Fund Scholar from 2000 to 2007
Scholarship for gifted children from President Aleksander Kwaśniewski