
Robust Value Iteration for Continuous Control Tasks
When transferring a control policy from simulation to a physical system,...

Value Iteration in Continuous Actions, States and Time
Classical value iteration approaches are not applicable to environments ...

Better than the Best: Gradient-based Improper Reinforcement Learning for Network Scheduling
We consider the problem of scheduling in constrained queueing networks w...

Using Kalman Filter The Right Way: Noise Estimation Is Not Optimal
Determining the noise parameters of a Kalman Filter (KF) has been resear...

Maximum Entropy Reinforcement Learning with Mixture Policies
Mixture models are an expressive hypothesis class that can approximate a...

Action Redundancy in Reinforcement Learning
Maximum Entropy (MaxEnt) reinforcement learning is a powerful learning p...

GELATO: Geometrically Enriched Latent Model for Offline Reinforcement Learning
Offline reinforcement learning approaches can generally be divided to pr...

Reinforcement Learning for Datacenter Congestion Control
We approach the task of network congestion control in datacenters using ...

Improper Learning with Gradient-based Policy Optimization
We consider an improper reinforcement learning setting where the learner...

Online Apprenticeship Learning
In Apprenticeship Learning (AL), we are given a Markov Decision Process ...

RL for Latent MDPs: Regret Guarantees and a Lower Bound
In this work, we consider the regret minimization problem for reinforcem...

Dimension Free Generalization Bounds for Non Linear Metric Learning
In this work we study generalization guarantees for the metric learning ...

Online Limited Memory Neural-Linear Bandits with Likelihood Matching
We study neural-linear bandits for solving problems where both explorati...

Acting in Delayed Environments with Non-Stationary Markov Policies
The standard Markov Decision Process (MDP) formulation hinges on the ass...

The Architectural Implications of Distributed Reinforcement Learning on CPU-GPU Systems
With deep reinforcement learning (RL) methods achieving results that exc...

Drift Detection in Episodic Data: Detect When Your Agent Starts Faltering
Detection of deterioration of agent performance in dynamic environments ...

How to Stop Epidemics: Controlling Graph Dynamics with Reinforcement Learning and Graph Neural Networks
We consider the problem of monitoring and controlling a partially-observ...

Reinforcement Learning with Trajectory Feedback
The computational model of reinforcement learning is based upon the abil...

Lenient Regret for Multi-Armed Bandits
We consider the Multi-Armed Bandit (MAB) problem, where the agent sequen...

The Pendulum Arrangement: Maximizing the Escape Time of Heterogeneous Random Walks
We identify a fundamental phenomenon of heterogeneous one dimensional ra...

Optimizing Memory Placement using Evolutionary Graph Reinforcement Learning
As modern neural networks have grown to billions of parameters, meeting ...

Bandits with Partially Observable Offline Data
We study linear contextual bandits with access to a large, partially obs...

Distributional Robustness and Regularization in Reinforcement Learning
Distributionally Robust Optimization (DRO) has enabled to prove the equi...

Exploration-Exploitation in Constrained MDPs
In many sequential decision-making problems, the goal is to optimize a u...

Stealing Black-Box Functionality Using The Deep Neural Tree Architecture
This paper makes a substantial step towards cloning the functionality of...

Optimistic Policy Optimization with Bandit Feedback
Policy optimization methods are one of the most widely used classes of R...

Kalman meets Bellman: Improving Policy Evaluation through Value Tracking
Policy evaluation is a key process in Reinforcement Learning (RL). It as...

Tight Lower Bounds for Combinatorial Multi-Armed Bandits
The Combinatorial Multi-Armed Bandit problem is a sequential decision-ma...

Patternless Adversarial Attacks on Video Recognition Networks
Deep neural networks for classification of videos, just like image class...

Maximizing the Total Reward via Reward Tweaking
In reinforcement learning, the discount factor γ controls the agent's ef...

Stabilizing Off-Policy Reinforcement Learning with Conservative Policy Gradients
In recent years, advances in deep learning have enabled the application ...

Natural Language State Representation for Reinforcement Learning
Recent advances in Reinforcement Learning have highlighted the difficult...

Multi-Step Greedy and Approximate Real Time Dynamic Programming
Real Time Dynamic Programming (RTDP) is a well-known Dynamic Programming...

Off-Policy Evaluation in Partially Observable Environments
This work studies the problem of batch off-policy evaluation for Reinfor...

Adaptive Trust Region Policy Optimization: Global Convergence and Faster Rates for Regularized MDPs
Trust region policy optimization (TRPO) is a popular and empirically suc...

Practical Risk Measures in Reinforcement Learning
Practical application of Reinforcement Learning (RL) often involves risk...

Topic Modeling via Full Dependence Mixtures
We consider the topic modeling problem for large datasets. For this prob...

Variance Estimation For Online Regression via Spectrum Thresholding
We consider the online linear regression problem, where the predictor ve...

Tight Regret Bounds for Model-Based Reinforcement Learning with Greedy Policies
State-of-the-art efficient model-based Reinforcement Learning (RL) algor...

Distributional Policy Optimization: An Alternative Approach for Continuous Control
We identify a fundamental problem in policy gradient-based methods in co...

Inverse Reinforcement Learning in Contextual MDPs
We consider the Inverse Reinforcement Learning (IRL) problem in Contextu...

Action Assembly: Sparse Imitation Learning for Text Based Games with Combinatorial Action Spaces
We propose a computationally efficient algorithm that combines compresse...

A Bayesian Approach to Robust Reinforcement Learning
Robust Markov Decision Processes (RMDPs) intend to ensure robustness wit...

Batch-Size Independent Regret Bounds for the Combinatorial Multi-Armed Bandit Problem
We consider the combinatorial multi-armed bandit (CMAB) problem, where t...

Image Matters: Detecting Offensive and Non-Compliant Content / Logo in Product Images
In e-commerce, product content, especially product images have a signifi...

A Problem-Adaptive Algorithm for Resource Allocation
We consider a sequential stochastic resource allocation problem under th...

The Natural Language of Actions
We introduce Act2Vec, a general framework for learning context-based act...

Value Propagation for Decentralized Networked Deep Multi-agent Reinforcement Learning
We consider the networked multi-agent reinforcement learning (MARL) prob...

Action Robust Reinforcement Learning and Applications in Continuous Control
A policy is said to be robust if it maximizes the reward while consideri...

Deep Neural Linear Bandits: Overcoming Catastrophic Forgetting through Likelihood Matching
We study the neural-linear bandit model for solving sequential decision...
Shie Mannor
Faculty member at the Department of Electrical Engineering at the Technion where I am a member of the Technion Machine Learning Center and the Grand Technion Energy Program.