
Learning to Stop with Surprisingly Few Samples
We consider a discounted infinite horizon optimal stopping problem. If t...

The Daily Life of Software Engineers during the COVID-19 Pandemic
Following the onset of the COVID-19 pandemic and subsequent lockdowns, s...

Empirical Standards for Software Engineering Research
Empirical Standards are natural-language models of a scientific communit...

Predictors of Wellbeing and Productivity among Software Professionals during the COVID-19 Pandemic – A Longitudinal Study
The COVID-19 pandemic has forced governments worldwide to impose movemen...

Approximation Benefits of Policy Gradient Methods with Aggregated States
Folklore suggests that policy gradient can be more robust to misspecific...

A Note on the Linear Convergence of Policy Gradient Methods
We revisit the finite time analysis of policy gradient methods in the si...

SQuAP-Ont: an Ontology of Software Quality Relational Factors from Financial Systems
Quality, architecture, and process are considered the keystones of softw...

Worst-Case Regret Bounds for Exploration via Randomized Value Functions
This paper studies a recent proposal to use randomized value functions t...

Global Optimality Guarantees For Policy Gradient Methods
Policy gradient methods are perhaps the most widely used class of reinf...

A Note on the Equivalence of Upper Confidence Bounds and Gittins Indices for Patient Agents
This note gives a short, self-contained proof of a sharp connection bet...

A Finite Time Analysis of Temporal Difference Learning With Linear Function Approximation
Temporal difference learning (TD) is a simple iterative algorithm used t...

Satisficing in Time-Sensitive Bandit Learning
Much of the recent literature on bandit learning focuses on algorithms t...

Improving the Expected Improvement Algorithm
The expected improvement (EI) algorithm is a popular strategy for inform...

Deep Exploration via Randomized Value Functions
We study the use of randomized value functions to guide deep exploration...

How much does your data exploration overfit? Controlling bias via information usage
Modern data is messy and high-dimensional, and it is often not clear a p...

(More) Efficient Reinforcement Learning via Posterior Sampling
Most provably-efficient learning algorithms introduce optimism about poo...
Daniel Russo