
Maximizing the Total Reward via Reward Tweaking
In reinforcement learning, the discount factor γ controls the agent's ef...

Tight Lower Bounds for Combinatorial Multi-Armed Bandits
The Combinatorial Multi-Armed Bandit problem is a sequential decision-ma...

Learn What Not to Learn: Action Elimination with Deep Reinforcement Learning
Learning how to act when there are many available actions in each state ...

Finite Sample Analyses for TD(0) with Function Approximation
TD(0) is one of the most commonly used algorithms in reinforcement learn...

Shallow Updates for Deep Reinforcement Learning
Deep reinforcement learning (DRL) methods such as the Deep Q-Network (DQ...

Unit Commitment using Nearest Neighbor as a Short-Term Proxy
We devise the Unit Commitment Nearest Neighbor (UCNN) algorithm to be us...

Situational Awareness by Risk-Conscious Skills
Hierarchical Reinforcement Learning has been previously shown to speed u...

Deep Robust Kalman Filter
A Robust Markov Decision Process (RMDP) is a sequential decision making ...

Rotting Bandits
The Multi-Armed Bandits (MAB) framework highlights the tension between a...

Consistent On-Line Off-Policy Evaluation
The problem of online off-policy evaluation (OPE) has been actively stu...

Deep Reinforcement Learning Discovers Internal Models
Deep Reinforcement Learning (DRL) is a trending field of research, showi...

Outlier Robust Online Learning
We consider the problem of learning from noisy data in practical setting...

Adaptive Lambda Least-Squares Temporal Difference Learning
Temporal Difference learning or TD(λ) is a fundamental algorithm in the ...

Hierarchical Decision Making In Electricity Grid Management
The power grid is a complex and vital system that necessitates careful r...

Iterative Hierarchical Optimization for Misspecified Problems (IHOMP)
For complex, high-dimensional Markov Decision Processes (MDPs), it may b...

Graying the black box: Understanding DQNs
In recent years there is a growing interest in using deep representation...

A nonparametric sequential test for online randomized experiments
We propose a nonparametric sequential test that aims to address two prac...

Bayesian Reinforcement Learning: A Survey
Bayesian methods for machine learning have been widely investigated, yie...

Situationally Aware Options
Hierarchical abstractions, also known as options - a type of temporally...

Bootstrapping Skills
The monolithic approach to policy representation in Markov Decision Proc...

Risk-Sensitive and Robust Decision-Making: a CVaR Optimization Approach
In this paper we address the problem of decision making within a Markov ...

How to Allocate Resources For Features Acquisition?
We study classification problems where features are corrupted by noise a...

Visualizing Dynamics: from t-SNE to SEMI-MDPs
Deep Reinforcement Learning (DRL) is a trending field of research, showi...

Clustering Time Series and the Surprising Robustness of HMMs
Suppose that we are given a time series where consecutive samples are be...

Adaptive Skills, Adaptive Partitions (ASAP)
We introduce the Adaptive Skills, Adaptive Partitions (ASAP) framework t...

Generalized Emphatic Temporal Difference Learning: Bias-Variance Analysis
We consider the off-policy evaluation problem in Markov decision process...

Emphatic TD Bellman Operator is a Contraction
Recently, Sutton, Mahmood, and White (2015) introduced the emphatic temporal differences (ETD) ...

Overlapping Communities Detection via Measure Space Embedding
We present a new algorithm for community detection. The algorithm uses r...

Actively Learning to Attract Followers on Twitter
Twitter, a popular social network, presents great opportunities for onl...

Policy Gradient for Coherent Risk Measures
Several authors have recently developed risk-sensitive policy gradient m...

Off-policy evaluation for MDPs with unknown structure
Off-policy learning in dynamic decision problems is essential for provid...

Contextual Markov Decision Processes
We consider a planning problem where the dynamics and rewards of the env...

Implicit Temporal Differences
In reinforcement learning, the TD(λ) algorithm is a fundamental policy e...

Nonstochastic Multi-Armed Bandits with Graph-Structured Feedback
We present and study a partial-information model of online learning, whe...

Distributed Robust Learning
We propose a framework for distributed robust statistical learning on b...

Ensemble Robustness and Generalization of Stochastic Deep Learning Algorithms
The question why deep learning algorithms generalize so well has attract...

Thompson Sampling for Learning Parameterized Markov Decision Processes
We consider reinforcement learning in parameterized Markov Decision Proc...

Optimizing the CVaR via Sampling
Conditional Value at Risk (CVaR) is a prominent risk measure that is bei...

Approachability in unknown games: Online learning meets multi-objective optimization
In the standard setting of approachability there are two players and a t...

Mean-Variance Optimization in Markov Decision Processes
We consider finite horizon Markov decision processes under performance m...

Scaling Up Robust MDPs by Reinforcement Learning
We consider large-scale Markov decision processes (MDPs) with parameter ...

A Primal Condition for Approachability with Partial Monitoring
In approachability with full monitoring there are two types of condition...

Adaptive Bases for Reinforcement Learning
We consider the problem of reinforcement learning using function approxi...

Robust High Dimensional Sparse Regression and Matching Pursuit
We consider high dimensional sparse regression, and develop strategies a...

Policy Evaluation with Variance Related Risk Criteria in Markov Decision Processes
In this paper we extend temporal difference policy evaluation algorithms...

The Perturbed Variation
We introduce a new discrepancy score between two distributions that give...

How to sample if you must: on optimal functional sampling
We examine a fundamental problem that models various active sampling set...

Policy Gradients with Variance Related Risk Criteria
Managing risk in dynamic decision problems is of cardinal importance in ...

From Bandits to Experts: On the Value of Side-Observations
We consider an adversarial online learning setting where a decision make...

The Sample Complexity of Dictionary Learning
A large set of signals can sometimes be described sparsely using a dicti...
Shie Mannor
Faculty member at the Department of Electrical Engineering at the Technion, where I am a member of the Technion Machine Learning Center and the Grand Technion Energy Program.