
Bandits with Partially Observable Offline Data
We study linear contextual bandits with access to a large, partially obs...

Distributional Robustness and Regularization in Reinforcement Learning
Distributionally Robust Optimization (DRO) has enabled to prove the equi...

Exploration-Exploitation in Constrained MDPs
In many sequential decision-making problems, the goal is to optimize a u...

Stealing Black-Box Functionality Using The Deep Neural Tree Architecture
This paper makes a substantial step towards cloning the functionality of...

Optimistic Policy Optimization with Bandit Feedback
Policy optimization methods are one of the most widely used classes of R...

Kalman meets Bellman: Improving Policy Evaluation through Value Tracking
Policy evaluation is a key process in Reinforcement Learning (RL). It as...

Tight Lower Bounds for Combinatorial Multi-Armed Bandits
The Combinatorial Multi-Armed Bandit problem is a sequential decision-ma...

Patternless Adversarial Attacks on Video Recognition Networks
Deep neural networks for classification of videos, just like image class...

Maximizing the Total Reward via Reward Tweaking
In reinforcement learning, the discount factor γ controls the agent's ef...

Stabilizing Off-Policy Reinforcement Learning with Conservative Policy Gradients
In recent years, advances in deep learning have enabled the application ...

Natural Language State Representation for Reinforcement Learning
Recent advances in Reinforcement Learning have highlighted the difficult...

Multi-Step Greedy and Approximate Real Time Dynamic Programming
Real Time Dynamic Programming (RTDP) is a well-known Dynamic Programming...

Off-Policy Evaluation in Partially Observable Environments
This work studies the problem of batch off-policy evaluation for Reinfor...

Adaptive Trust Region Policy Optimization: Global Convergence and Faster Rates for Regularized MDPs
Trust region policy optimization (TRPO) is a popular and empirically suc...

Practical Risk Measures in Reinforcement Learning
Practical application of Reinforcement Learning (RL) often involves risk...

Topic Modeling via Full Dependence Mixtures
We consider the topic modeling problem for large datasets. For this prob...

Variance Estimation For Online Regression via Spectrum Thresholding
We consider the online linear regression problem, where the predictor ve...

Tight Regret Bounds for Model-Based Reinforcement Learning with Greedy Policies
State-of-the-art efficient model-based Reinforcement Learning (RL) algor...

Distributional Policy Optimization: An Alternative Approach for Continuous Control
We identify a fundamental problem in policy gradient-based methods in co...

Inverse Reinforcement Learning in Contextual MDPs
We consider the Inverse Reinforcement Learning (IRL) problem in Contextu...

Action Assembly: Sparse Imitation Learning for Text Based Games with Combinatorial Action Spaces
We propose a computationally efficient algorithm that combines compresse...

A Bayesian Approach to Robust Reinforcement Learning
Robust Markov Decision Processes (RMDPs) intend to ensure robustness wit...

Batch-Size Independent Regret Bounds for the Combinatorial Multi-Armed Bandit Problem
We consider the combinatorial multi-armed bandit (CMAB) problem, where t...

Image Matters: Detecting Offensive and Non-Compliant Content / Logo in Product Images
In e-commerce, product content, especially product images have a signifi...

A Problem-Adaptive Algorithm for Resource Allocation
We consider a sequential stochastic resource allocation problem under th...

The Natural Language of Actions
We introduce Act2Vec, a general framework for learning context-based act...

Value Propagation for Decentralized Networked Deep Multiagent Reinforcement Learning
We consider the networked multiagent reinforcement learning (MARL) prob...

Action Robust Reinforcement Learning and Applications in Continuous Control
A policy is said to be robust if it maximizes the reward while consideri...

Deep Neural Linear Bandits: Overcoming Catastrophic Forgetting through Likelihood Matching
We study the neural-linear bandit model for solving sequential decision...

Trust Region Value Optimization using Kalman Filtering
Policy evaluation is a key process in reinforcement learning. It assesse...

Multi Instance Learning For Unbalanced Data
In the context of Multi Instance Learning, we analyze the Single Instanc...

Revisiting Exploration-Conscious Reinforcement Learning
The objective of Reinforcement Learning is to learn an optimal policy by...

Inspiration Learning through Preferences
Current imitation learning techniques are too restrictive because they r...

On-Line Learning of Linear Dynamical Systems: Exponential Forgetting in Kalman Filters
Kalman filter is a key tool for time-series forecasting and analysis. We...

Learn What Not to Learn: Action Elimination with Deep Reinforcement Learning
Learning how to act when there are many available actions in each state ...

How to Combine Tree-Search Methods in Reinforcement Learning
Finite-horizon lookahead policies are abundantly used in Reinforcement L...

Multi-user Communication Networks: A Coordinated Multi-armed Bandit Approach
Communication networks shared by many users are a widespread challenge n...

A General Approach to Multi-Armed Bandits Under Risk Criteria
Different risk-related criteria have received recent interest in learnin...

Reward Constrained Policy Optimization
Teaching agents to perform tasks using Reinforcement Learning is no easy...

Multiple-Step Greedy Policies in Online and Approximate Reinforcement Learning
Multiple-step lookahead policies have demonstrated high empirical compet...

Nonlinear Distributional Gradient Temporal-Difference Learning
We devise a distributional variant of gradient temporal-difference (TD) ...

Interdependent Gibbs Samplers
Gibbs sampling, as a model learning method, is known to produce the most...

Deep Learning Reconstruction of Ultra-Short Pulses
Ultrashort laser pulses with femtosecond to attosecond pulse duration a...

Soft-Robust Actor-Critic Policy-Gradient
Robust Reinforcement Learning aims to derive an optimal behavior that ac...

Train on Validation: Squeezing the Data Lemon
Model selection on validation data is an essential step in machine learn...

Beyond the One Step Greedy Approach in Reinforcement Learning
The famous Policy Iteration algorithm alternates between policy improvem...

Learning Robust Options
Robust reinforcement learning aims to produce policies that have strong ...

Chance-Constrained Outage Scheduling using a Machine Learning Proxy
Outage scheduling aims at defining, over a horizon of several months to ...

The Stochastic Firefighter Problem
The dynamics of infectious diseases spread is crucial in determining the...

Situationally Aware Options
Hierarchical abstractions, also known as options, a type of temporally...
Shie Mannor
Faculty member at the Department of Electrical Engineering at the Technion, where I am a member of the Technion Machine Learning Center and the Grand Technion Energy Program.