
Robust Value Iteration for Continuous Control Tasks
When transferring a control policy from simulation to a physical system,...
Value Iteration in Continuous Actions, States and Time
Classical value iteration approaches are not applicable to environments ...
Better than the Best: Gradient-based Improper Reinforcement Learning for Network Scheduling
We consider the problem of scheduling in constrained queueing networks w...
Using Kalman Filter The Right Way: Noise Estimation Is Not Optimal
Determining the noise parameters of a Kalman Filter (KF) has been resear...
Maximum Entropy Reinforcement Learning with Mixture Policies
Mixture models are an expressive hypothesis class that can approximate a...
Action Redundancy in Reinforcement Learning
Maximum Entropy (MaxEnt) reinforcement learning is a powerful learning p...
GELATO: Geometrically Enriched Latent Model for Offline Reinforcement Learning
Offline reinforcement learning approaches can generally be divided to pr...
Reinforcement Learning for Datacenter Congestion Control
We approach the task of network congestion control in datacenters using ...
Improper Learning with Gradient-based Policy Optimization
We consider an improper reinforcement learning setting where the learner...
Online Apprenticeship Learning
In Apprenticeship Learning (AL), we are given a Markov Decision Process ...
RL for Latent MDPs: Regret Guarantees and a Lower Bound
In this work, we consider the regret minimization problem for reinforcem...
Dimension Free Generalization Bounds for Non Linear Metric Learning
In this work we study generalization guarantees for the metric learning ...
Online Limited Memory Neural-Linear Bandits with Likelihood Matching
We study neural-linear bandits for solving problems where both explorati...
Acting in Delayed Environments with Non-Stationary Markov Policies
The standard Markov Decision Process (MDP) formulation hinges on the ass...
The Architectural Implications of Distributed Reinforcement Learning on CPU-GPU Systems
With deep reinforcement learning (RL) methods achieving results that exc...
Drift Detection in Episodic Data: Detect When Your Agent Starts Faltering
Detection of deterioration of agent performance in dynamic environments ...
How to Stop Epidemics: Controlling Graph Dynamics with Reinforcement Learning and Graph Neural Networks
We consider the problem of monitoring and controlling a partially-observ...
Reinforcement Learning with Trajectory Feedback
The computational model of reinforcement learning is based upon the abil...
Lenient Regret for Multi-Armed Bandits
We consider the Multi-Armed Bandit (MAB) problem, where the agent sequen...
The Pendulum Arrangement: Maximizing the Escape Time of Heterogeneous Random Walks
We identify a fundamental phenomenon of heterogeneous one dimensional ra...
Optimizing Memory Placement using Evolutionary Graph Reinforcement Learning
As modern neural networks have grown to billions of parameters, meeting ...
Bandits with Partially Observable Offline Data
We study linear contextual bandits with access to a large, partially obs...
Distributional Robustness and Regularization in Reinforcement Learning
Distributionally Robust Optimization (DRO) has enabled to prove the equi...
Exploration-Exploitation in Constrained MDPs
In many sequential decision-making problems, the goal is to optimize a u...
Stealing Black-Box Functionality Using The Deep Neural Tree Architecture
This paper makes a substantial step towards cloning the functionality of...
Optimistic Policy Optimization with Bandit Feedback
Policy optimization methods are one of the most widely used classes of R...
Kalman meets Bellman: Improving Policy Evaluation through Value Tracking
Policy evaluation is a key process in Reinforcement Learning (RL). It as...
Tight Lower Bounds for Combinatorial Multi-Armed Bandits
The Combinatorial Multi-Armed Bandit problem is a sequential decision-ma...
Patternless Adversarial Attacks on Video Recognition Networks
Deep neural networks for classification of videos, just like image class...
Maximizing the Total Reward via Reward Tweaking
In reinforcement learning, the discount factor γ controls the agent's ef...
Stabilizing Off-Policy Reinforcement Learning with Conservative Policy Gradients
In recent years, advances in deep learning have enabled the application ...
Natural Language State Representation for Reinforcement Learning
Recent advances in Reinforcement Learning have highlighted the difficult...
Multi-Step Greedy and Approximate Real Time Dynamic Programming
Real Time Dynamic Programming (RTDP) is a well-known Dynamic Programming...
Off-Policy Evaluation in Partially Observable Environments
This work studies the problem of batch off-policy evaluation for Reinfor...
Adaptive Trust Region Policy Optimization: Global Convergence and Faster Rates for Regularized MDPs
Trust region policy optimization (TRPO) is a popular and empirically suc...
Practical Risk Measures in Reinforcement Learning
Practical application of Reinforcement Learning (RL) often involves risk...
Topic Modeling via Full Dependence Mixtures
We consider the topic modeling problem for large datasets. For this prob...
Variance Estimation For Online Regression via Spectrum Thresholding
We consider the online linear regression problem, where the predictor ve...
Tight Regret Bounds for Model-Based Reinforcement Learning with Greedy Policies
State-of-the-art efficient model-based Reinforcement Learning (RL) algor...
Distributional Policy Optimization: An Alternative Approach for Continuous Control
We identify a fundamental problem in policy gradient-based methods in co...
Inverse Reinforcement Learning in Contextual MDPs
We consider the Inverse Reinforcement Learning (IRL) problem in Contextu...
Action Assembly: Sparse Imitation Learning for Text Based Games with Combinatorial Action Spaces
We propose a computationally efficient algorithm that combines compresse...
A Bayesian Approach to Robust Reinforcement Learning
Robust Markov Decision Processes (RMDPs) intend to ensure robustness wit...
Batch-Size Independent Regret Bounds for the Combinatorial Multi-Armed Bandit Problem
We consider the combinatorial multi-armed bandit (CMAB) problem, where t...
Image Matters: Detecting Offensive and Non-Compliant Content / Logo in Product Images
In e-commerce, product content, especially product images have a signifi...
A Problem-Adaptive Algorithm for Resource Allocation
We consider a sequential stochastic resource allocation problem under th...
The Natural Language of Actions
We introduce Act2Vec, a general framework for learning context-based act...
Value Propagation for Decentralized Networked Deep Multi-agent Reinforcement Learning
We consider the networked multi-agent reinforcement learning (MARL) prob...
Action Robust Reinforcement Learning and Applications in Continuous Control
A policy is said to be robust if it maximizes the reward while consideri...
Deep Neural Linear Bandits: Overcoming Catastrophic Forgetting through Likelihood Matching
We study the neural-linear bandit model for solving sequential decision...
Shie Mannor
Faculty member at the Department of Electrical Engineering at the Technion, where I am a member of the Technion Machine Learning Center and the Grand Technion Energy Program.