
From Language to Goals: Inverse Reinforcement Learning for Vision-Based Instruction Following
Reinforcement learning is a promising framework for solving control problems, but its use in practical situations is hampered by the fact that reward functions are often difficult to engineer. Specifying goals and tasks for autonomous machines, such as robots, is a significant challenge: conventionally, reward functions and goal states have been used to communicate objectives. But people can communicate objectives to each other simply by describing or demonstrating them. How can we build learning algorithms that will allow us to tell machines what we want them to do? In this work, we investigate the problem of grounding language commands as reward functions using inverse reinforcement learning, and argue that language-conditioned rewards are more transferable than language-conditioned policies to new environments. We propose language-conditioned reward learning (LC-RL), which grounds language commands as a reward function represented by a deep neural network. We demonstrate that our model learns rewards that transfer to novel tasks and environments on realistic, high-dimensional visual environments with natural language commands, whereas directly learning a language-conditioned policy leads to poor performance.
02/20/2019 ∙ by Justin Fu, et al.
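A minimal sketch of the core idea of a language-conditioned reward r(s, l): a single network scores a state under a command, so the reward (unlike a policy) can be reused to train agents in new environments. The dimensions, architecture, and random weights below are hypothetical stand-ins, not the paper's model.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions: a 16-d state featurization and an 8-d
# embedding of the language command, standing in for the paper's
# image observations and natural-language inputs.
STATE_DIM, LANG_DIM, HIDDEN = 16, 8, 32

# A small MLP r(s, l) -> scalar; random weights stand in for
# parameters that would be learned by inverse RL.
W1 = rng.normal(0.0, 0.1, (STATE_DIM + LANG_DIM, HIDDEN))
b1 = np.zeros(HIDDEN)
W2 = rng.normal(0.0, 0.1, (HIDDEN, 1))
b2 = np.zeros(1)

def reward(state, lang_embedding):
    """Language-conditioned reward: score a state under a command
    by feeding their concatenation through one network."""
    x = np.concatenate([state, lang_embedding])
    h = np.tanh(x @ W1 + b1)
    return (h @ W2 + b2).item()

r = reward(rng.normal(size=STATE_DIM), rng.normal(size=LANG_DIM))
```

In the paper's setting, such a reward would be fit by inverse RL from demonstrations paired with commands, then handed to any standard RL algorithm in a new environment.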

Speed/accuracy trade-offs for modern convolutional object detectors
The goal of this paper is to serve as a guide for selecting a detection architecture that achieves the right speed/memory/accuracy balance for a given application and platform. To this end, we investigate various ways to trade accuracy for speed and memory usage in modern convolutional object detection systems. A number of successful systems have been proposed in recent years, but apples-to-apples comparisons are difficult due to different base feature extractors (e.g., VGG, Residual Networks), different default image resolutions, as well as different hardware and software platforms. We present a unified implementation of the Faster R-CNN [Ren et al., 2015], R-FCN [Dai et al., 2016] and SSD [Liu et al., 2015] systems, which we view as "meta-architectures", and trace out the speed/accuracy trade-off curve created by using alternative feature extractors and varying other critical parameters such as image size within each of these meta-architectures. On one extreme end of this spectrum where speed and memory are critical, we present a detector that achieves real-time speeds and can be deployed on a mobile device. On the opposite end in which accuracy is critical, we present a detector that achieves state-of-the-art performance measured on the COCO detection task.
11/30/2016 ∙ by Jonathan Huang, et al.

Bayesian Dark Knowledge
We consider the problem of Bayesian parameter estimation for deep neural networks, which is important in problem settings where we may have little data, and/or where we need accurate posterior predictive densities, e.g., for applications involving bandits or active learning. One simple approach to this is to use online Monte Carlo methods, such as SGLD (stochastic gradient Langevin dynamics). Unfortunately, such a method needs to store many copies of the parameters (which wastes memory), and needs to make predictions using many versions of the model (which wastes time). We describe a method for "distilling" a Monte Carlo approximation to the posterior predictive density into a more compact form, namely a single deep neural network. We compare to two very recent approaches to Bayesian neural networks, namely an approach based on expectation propagation [Hernandez-Lobato and Adams, 2015] and an approach based on variational Bayes [Blundell et al., 2015]. Our method performs better than both of these, is much simpler to implement, and uses less computation at test time.
06/14/2015 ∙ by Anoop Korattikara, et al.
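The distillation step the abstract describes can be sketched on a toy problem: average the predictions of many posterior samples (the "teacher"), then train one compact model (the "student") to match those soft labels. The 1-d logistic model, data, and constants below are illustrative assumptions, not the paper's setup.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy distillation: posterior samples of a single logistic-regression
# weight stand in for an SGLD chain over network parameters.
X = rng.normal(size=(200, 1))
posterior_samples = rng.normal(loc=2.0, scale=0.5, size=50)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Teacher: Monte Carlo posterior predictive, i.e. predictions
# averaged over the posterior samples.
teacher_probs = np.mean([sigmoid(X[:, 0] * w) for w in posterior_samples], axis=0)

# Student: a single weight trained to match the teacher's soft
# labels by gradient descent on cross-entropy (the distillation).
w_student = 0.0
lr = 0.5
for _ in range(500):
    p = sigmoid(X[:, 0] * w_student)
    w_student -= lr * np.mean((p - teacher_probs) * X[:, 0])

student_probs = sigmoid(X[:, 0] * w_student)
max_gap = float(np.max(np.abs(student_probs - teacher_probs)))
```

After distillation, only the single student model must be stored and evaluated at test time, which is the memory and compute saving the abstract claims.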

Large-Scale Distributed Bayesian Matrix Factorization using Stochastic Gradient MCMC
Despite having various attractive qualities such as high prediction accuracy and the ability to quantify uncertainty and avoid overfitting, Bayesian Matrix Factorization has not been widely adopted because of the prohibitive cost of inference. In this paper, we propose a scalable distributed Bayesian matrix factorization algorithm using stochastic gradient MCMC. Our algorithm, based on Distributed Stochastic Gradient Langevin Dynamics, can not only match the prediction accuracy of standard MCMC methods like Gibbs sampling, but at the same time is as fast and simple as stochastic gradient descent. In our experiments, we show that our algorithm can achieve the same level of prediction accuracy as Gibbs sampling an order of magnitude faster. We also show that our method reduces the prediction error as fast as distributed stochastic gradient descent, achieving a 4.1% improvement on the Netflix dataset and a 1.8% improvement on a second benchmark.
03/05/2015 ∙ by Sungjin Ahn, et al.
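The building block of this algorithm, Stochastic Gradient Langevin Dynamics, can be sketched on a conjugate toy problem where the true posterior is known. This is a single-machine SGLD sketch under assumed toy data, not the paper's distributed matrix-factorization implementation; the distributed variant runs the same update on shards of the data.

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy model: y_i ~ N(theta, 1) with a flat prior, so the posterior
# over theta is N(mean(y), 1/N). SGLD draws approximate posterior
# samples from minibatch gradients alone.
N, BATCH, STEP = 1000, 50, 1e-4
y = rng.normal(loc=3.0, scale=1.0, size=N)

theta = 0.0
samples = []
for t in range(5000):
    batch = rng.choice(y, size=BATCH, replace=False)
    # Minibatch gradient of the log-posterior, rescaled by N/BATCH.
    grad = (N / BATCH) * np.sum(batch - theta)
    # Langevin update: half a gradient step plus Gaussian noise
    # whose variance equals the step size.
    theta += 0.5 * STEP * grad + rng.normal(0.0, np.sqrt(STEP))
    if t >= 1000:
        samples.append(theta)

posterior_mean = np.mean(samples)
```

Because each update touches only a minibatch, the per-iteration cost matches SGD; the injected noise is what turns the optimizer into an approximate posterior sampler.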

Austerity in MCMC Land: Cutting the Metropolis-Hastings Budget
Can we make Bayesian posterior MCMC sampling more efficient when faced with very large datasets? We argue that computing the likelihood for N datapoints in the Metropolis-Hastings (MH) test to reach a single binary decision is computationally inefficient. We introduce an approximate MH rule based on a sequential hypothesis test that allows us to accept or reject samples with high confidence using only a fraction of the data required for the exact MH rule. While this method introduces an asymptotic bias, we show that this bias can be controlled and is more than offset by a decrease in variance due to our ability to draw more samples per unit of time.
04/19/2013 ∙ by Anoop Korattikara, et al.
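The sequential test can be sketched as follows: the MH accept/reject decision reduces to asking whether the mean per-datapoint log-likelihood difference exceeds a threshold derived from the uniform draw, and a growing subsample can answer that with high confidence long before all N points are seen. The toy Gaussian target, constants, and the simple z-threshold stopping rule below are illustrative assumptions (the paper uses a t-test), not its exact procedure.

```python
import numpy as np

rng = np.random.default_rng(3)

# Toy target: likelihood of a Gaussian mean, y_i ~ N(theta, 1),
# with a flat prior and a symmetric proposal assumed.
N = 10000
y = rng.normal(loc=1.0, scale=1.0, size=N)

def loglik_diff(data, theta_new, theta_old):
    """Per-datapoint log-likelihood difference between the
    proposed and current parameter values."""
    return 0.5 * (data - theta_old) ** 2 - 0.5 * (data - theta_new) ** 2

def approx_mh_accept(theta_new, theta_old, u, batch=100, z=3.0):
    """Sequential approximate MH test: enlarge the subsample until
    the mean log-likelihood difference is confidently above or
    below the threshold implied by the uniform draw u."""
    mu0 = np.log(u) / N          # acceptance threshold per datapoint
    perm = rng.permutation(N)    # stream the data in random order
    n = 0
    while True:
        n = min(n + batch, N)
        diffs = loglik_diff(y[perm[:n]], theta_new, theta_old)
        mean = diffs.mean()
        if n == N:               # exhausted the data: exact decision
            return mean > mu0, n
        # Standard error with a finite-population correction.
        se = diffs.std(ddof=1) / np.sqrt(n) * np.sqrt(1.0 - (n - 1) / (N - 1))
        if se == 0.0 or abs(mean - mu0) / se > z:
            return mean > mu0, n

# A proposal clearly better than the current state should be
# accepted after seeing only a small fraction of the data.
accept, used = approx_mh_accept(theta_new=1.0, theta_old=0.5, u=0.5)
```

Decisions far from the threshold terminate after a few batches; only borderline proposals force the test toward the full dataset, which is where the claimed savings come from.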

Bayesian Posterior Sampling via Stochastic Gradient Fisher Scoring
In this paper we address the following question: Can we approximately sample from a Bayesian posterior distribution if we are only allowed to touch a small minibatch of data items for every sample we generate? An algorithm based on the Langevin equation with stochastic gradients (SGLD) was previously proposed to solve this, but its mixing rate was slow. By leveraging the Bayesian Central Limit Theorem, we extend the SGLD algorithm so that at high mixing rates it will sample from a normal approximation of the posterior, while for slow mixing rates it will mimic the behavior of SGLD with a preconditioner matrix. As a bonus, the proposed algorithm is reminiscent of Fisher scoring (with stochastic gradients) and is, as such, an efficient optimizer during burn-in.
06/27/2012 ∙ by Sungjin Ahn, et al.
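A simplified 1-d sketch in the spirit of this idea: precondition a stochastic-gradient Langevin update with a running empirical Fisher estimate built from per-datapoint gradients. This is an illustrative toy under assumed data, not the paper's exact SGFS update rule or its handling of the mixing-rate trade-off.

```python
import numpy as np

rng = np.random.default_rng(4)

# Toy model: y_i ~ N(theta, 1) with a flat prior, so the posterior
# is N(mean(y), 1/N).
N, BATCH, STEP = 1000, 50, 1e-2
y = rng.normal(loc=-2.0, scale=1.0, size=N)

theta = 0.0
# Warm-start the Fisher estimate from one batch of scores.
fisher = N * np.var(rng.choice(y, BATCH, replace=False))
samples = []
for t in range(5000):
    batch = rng.choice(y, size=BATCH, replace=False)
    scores = batch - theta                   # per-datapoint gradients
    grad = (N / BATCH) * np.sum(scores)      # scaled stochastic gradient
    # Running estimate of the Fisher information (score variance,
    # scaled to the full dataset).
    fisher = 0.99 * fisher + 0.01 * N * np.var(scores)
    precond = 1.0 / fisher
    # Preconditioned Langevin step: the same quantity scales both
    # the gradient and the injected noise.
    theta += 0.5 * STEP * precond * grad + rng.normal(0.0, np.sqrt(STEP * precond))
    if t >= 1000:
        samples.append(theta)

posterior_mean = np.mean(samples)
```

During burn-in the preconditioned gradient step dominates, which is why the update behaves like (stochastic) Fisher scoring before it settles into sampling around the posterior mode.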