
Data-Driven Spatio-Temporal Analysis of Curbside Parking Demand: A Case Study in Seattle
Due to rapid expansion of urban areas in recent years, management of curbside parking has become increasingly important. To mitigate congestion, while meeting a city's diverse needs, performance-based pricing schemes have received a significant amount of attention. However, several recent studies suggest location, time of day, and awareness of policies are the primary factors that drive parking decisions. In light of this, we provide an extensive data-driven study of the spatio-temporal characteristics of curbside parking. This work advances the understanding of where and when to set pricing policies, as well as where to target information and incentives to drivers looking to park. Harnessing data provided by the Seattle Department of Transportation, we develop a Gaussian mixture model-based technique to identify zones with similar spatial parking demand as quantified by spatial autocorrelation. In support of this technique, we introduce a metric based on the repeatability of our Gaussian mixture model to investigate temporal consistency.
12/02/2017 ∙ by Tanner Fiez, et al.
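The clustering step described above can be sketched with a minimal EM fit of a spherical Gaussian mixture over block-face features (for example, location coordinates or an occupancy statistic). This is an illustrative NumPy sketch, not the paper's pipeline; the feature choice, the spatial-autocorrelation scoring, and the repeatability metric are all omitted.

```python
import numpy as np

def fit_gmm(points, k=2, iters=50):
    """Fit a spherical Gaussian mixture via EM and return the component
    means and a hard cluster label for each point (minimal sketch)."""
    n, d = points.shape
    # Farthest-point initialization of the k component means.
    means = [points[0]]
    for _ in range(1, k):
        dist = np.min([((points - m) ** 2).sum(1) for m in means], axis=0)
        means.append(points[np.argmax(dist)])
    means = np.array(means)
    variances = np.full(k, points.var())
    weights = np.full(k, 1.0 / k)
    for _ in range(iters):
        # E-step: posterior responsibility of each component for each point
        # (constant terms shared by all components are dropped).
        sq = ((points[:, None, :] - means[None]) ** 2).sum(-1)
        log_p = (-0.5 * sq / variances
                 - 0.5 * d * np.log(variances) + np.log(weights))
        log_p -= log_p.max(axis=1, keepdims=True)
        resp = np.exp(log_p)
        resp /= resp.sum(axis=1, keepdims=True)
        # M-step: re-estimate means, variances, and mixing weights.
        nk = resp.sum(axis=0)
        means = (resp.T @ points) / nk[:, None]
        sq = ((points[:, None, :] - means[None]) ** 2).sum(-1)
        variances = np.maximum((resp * sq).sum(axis=0) / (d * nk), 1e-9)
        weights = nk / n
    return means, resp.argmax(axis=1)
```

Block faces assigned the same label would then be candidates for a shared pricing zone; the paper additionally quantifies zones via spatial autocorrelation and checks the temporal consistency of the fit.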

Incentives in the Dark: Multi-Armed Bandits for Evolving Users with Unknown Type
Design of incentives or recommendations to users is becoming more common as platform providers continually emerge. We propose a multi-armed bandit approach to the problem in which users' types are unknown a priori and evolve dynamically in time. Unlike the traditional bandit setting, observed rewards are generated by a single Markov process. We demonstrate via an illustrative example that blindly applying the traditional bandit algorithms results in very poor performance as measured by regret. We introduce two variants of classical bandit algorithms, upper confidence bound (UCB) and epsilon-greedy, for which we provide theoretical bounds on the regret. We conduct a number of simulation-based experiments to show how the algorithms perform in comparison to traditional UCB and epsilon-greedy algorithms as well as reinforcement learning (Q-learning).
03/11/2018 ∙ by Lillian J. Ratliff, et al.
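As a point of reference for the setting above, here is a minimal sketch of the stationary epsilon-greedy baseline that the paper's variants modify, together with a toy two-state Markov user whose per-arm reward depends on a latent state. The state dynamics and reward values are invented for illustration; the paper's actual variants, which account for the Markov evolution, are not reproduced.

```python
import random

class EpsilonGreedy:
    """Stationary epsilon-greedy baseline; it assumes i.i.d. rewards,
    which is exactly what a Markov-evolving user violates."""
    def __init__(self, n_arms, epsilon=0.1, seed=0):
        self.counts = [0] * n_arms
        self.means = [0.0] * n_arms
        self.epsilon = epsilon
        self.rng = random.Random(seed)

    def select(self):
        # Explore uniformly with probability epsilon, else exploit.
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(len(self.counts))
        return max(range(len(self.counts)), key=lambda a: self.means[a])

    def update(self, arm, reward):
        # Incremental running mean of observed rewards per arm.
        self.counts[arm] += 1
        self.means[arm] += (reward - self.means[arm]) / self.counts[arm]

class MarkovUser:
    """Toy user whose type (state) evolves as a two-state Markov chain;
    the reward of each arm depends on the current state."""
    def __init__(self, rewards, switch_prob=0.05, seed=0):
        self.rewards = rewards          # rewards[state][arm]
        self.state = 0
        self.switch_prob = switch_prob
        self.rng = random.Random(seed)

    def pull(self, arm):
        r = self.rewards[self.state][arm]
        if self.rng.random() < self.switch_prob:
            self.state = 1 - self.state  # user type drifts over time
        return r
```

Because the user's state drifts, the stationary learner's running means mix rewards generated under different types, which is one intuition for why naive regret degrades in this setting.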

Combinatorial Bandits for Incentivizing Agents with Dynamic Preferences
The design of personalized incentives or recommendations to improve user engagement is gaining prominence as digital platform providers continually emerge. We propose a multi-armed bandit framework for matching incentives to users, whose preferences are unknown a priori and evolving dynamically in time, in a resource-constrained environment. We design an algorithm that combines ideas from three distinct domains: (i) a greedy matching paradigm, (ii) the upper confidence bound algorithm (UCB) for bandits, and (iii) mixing times from the theory of Markov chains. For this algorithm, we provide theoretical bounds on the regret and demonstrate its performance via both synthetic and realistic (matching supply and demand in a bike-sharing platform) examples.
07/06/2018 ∙ by Tanner Fiez, et al.
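The greedy-matching-plus-UCB combination above can be sketched as follows: compute a UCB index for every (user, incentive) pair, then greedily assign pairs in descending order of index while respecting per-incentive capacities. This is a simplified illustration under invented constraints; the mixing-time corrections for dynamic preferences are omitted.

```python
import math

def ucb_index(mean, count, t, c=2.0):
    """UCB exploration bonus; untried pairs get infinite priority."""
    if count == 0:
        return float("inf")
    return mean + math.sqrt(c * math.log(t) / count)

def greedy_match(scores, capacity):
    """Greedily assign each user at most one incentive, respecting
    per-incentive capacities; scores[u][i] is the UCB index of
    offering incentive i to user u."""
    pairs = sorted(((s, u, i) for u, row in enumerate(scores)
                    for i, s in enumerate(row)), reverse=True)
    matched_users, remaining = set(), dict(capacity)
    match = {}
    for s, u, i in pairs:
        if u in matched_users or remaining.get(i, 0) <= 0:
            continue
        match[u] = i
        matched_users.add(u)
        remaining[i] -= 1
    return match
```

Each round, the platform would play the returned matching, observe rewards, update the per-pair means and counts, and recompute the indices.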

Adaptive Incentive Design
We apply control-theoretic and optimization techniques to adaptively design incentives. In particular, we consider the problem of a planner with an objective that depends on data from strategic decision makers. The planner does not know the process by which the strategic agents make decisions. Under the assumption that the agents are utility maximizers, we model their interactions as a non-cooperative game and utilize the Nash equilibrium concept as well as myopic update rules to model the selection of their decisions. By parameterizing the agents' utility functions and the incentives offered, we develop an algorithm that the planner can employ to learn the agents' decision-making processes while simultaneously designing incentives to steer their responses toward ones more desirable from the planner's perspective. We provide convergence results for this algorithm in both the noise-free and noisy cases and present illustrative examples.
06/14/2018 ∙ by Lillian J. Ratliff, et al.
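A toy instance of this learn-while-incentivizing loop, assuming a single myopic agent with a quadratic utility u(x) = -θx² + γx, whose best response is x = γ/(2θ): the planner alternately inverts the observed response to estimate θ and re-sets the incentive γ to steer the response toward a target. The quadratic form and the update rule are illustrative assumptions, not the paper's general algorithm.

```python
import numpy as np

def adaptive_incentives(theta_true=1.5, target=2.0, rounds=50,
                        noise=0.0, seed=0):
    """Planner sketch: the agent best-responds x = gamma / (2 * theta)
    to utility -theta*x**2 + gamma*x; the planner estimates theta from
    the observed response and re-sets gamma each round."""
    rng = np.random.default_rng(seed)
    theta_hat, gamma = 1.0, 1.0
    for _ in range(rounds):
        # Agent's (possibly noisy) myopic best response to the incentive.
        x = gamma / (2 * theta_true) + noise * rng.normal()
        # Invert the parameterized response model to estimate theta.
        theta_hat = gamma / (2 * max(x, 1e-8))
        # Re-design the incentive to aim the response at the target.
        gamma = 2 * theta_hat * target
    return gamma, gamma / (2 * theta_true)
```

In the noise-free case this converges in a couple of rounds; with noise, one would average the estimates over rounds, mirroring the noisy-case analysis mentioned in the abstract.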

Sequential Experimental Design for Transductive Linear Bandits
In this paper we introduce the transductive linear bandit problem: given a set of measurement vectors X ⊂ R^d, a set of items Z ⊂ R^d, a fixed confidence δ, and an unknown vector θ^∗ ∈ R^d, the goal is to infer argmax_{z ∈ Z} z^⊤θ^∗ with probability 1−δ by making as few sequentially chosen noisy measurements of the form x^⊤θ^∗ as possible. When X = Z, this setting generalizes linear bandits, and when X is the set of standard basis vectors and Z ⊂ {0,1}^d, combinatorial bandits. Such a transductive setting naturally arises when the set of measurement vectors is limited due to factors such as availability or cost. As an example, in drug discovery the compounds and dosages X a practitioner may be willing to evaluate in the lab in vitro due to cost or safety reasons may differ vastly from those compounds and dosages Z that can be safely administered to patients in vivo. Alternatively, in recommender systems for books, the set of books X a user is queried about may be restricted to well-known bestsellers even though the goal might be to recommend more esoteric titles Z. In this paper, we provide instance-dependent lower bounds for the transductive setting, an algorithm that matches these up to logarithmic factors, and an evaluation. In particular, we provide the first non-asymptotic algorithm for linear bandits that nearly achieves the information-theoretic lower bound.
06/20/2019 ∙ by Tanner Fiez, et al.
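The estimation core of this setting can be sketched in a few lines: regress the noisy measurements onto the chosen measurement vectors to obtain θ̂, then maximize z^⊤θ̂ over the item set Z. The adaptive experimental-design part of the paper, choosing which x ∈ X to measure next so as to shrink uncertainty in the directions that distinguish items in Z, is the hard part and is not shown here.

```python
import numpy as np

def estimate_best_item(X_meas, rewards, Z, reg=1e-6):
    """Least-squares core of the transductive setting (sketch): rows of
    X_meas are the measurement vectors actually played, rewards are the
    noisy observations x^T theta* + eta, and the return value is the
    index of argmax over Z of z^T theta_hat."""
    A = X_meas.T @ X_meas + reg * np.eye(X_meas.shape[1])  # regularized Gram
    theta_hat = np.linalg.solve(A, X_meas.T @ rewards)     # ridge estimate
    return int(np.argmax(Z @ theta_hat))
```

With X restricted to the standard basis vectors and Z binary, this reduces to the combinatorial-bandit special case the abstract mentions.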

Convergence of Learning Dynamics in Stackelberg Games
This paper investigates the convergence of learning dynamics in Stackelberg games. In the class of games we consider, there is a hierarchical game being played between a leader and a follower with continuous action spaces. We show that in zero-sum games, the only stable attractors of the Stackelberg gradient dynamics are Stackelberg equilibria. This insight allows us to develop a gradient-based update for the leader that converges to Stackelberg equilibria in zero-sum games and the set of stable attractors in general-sum games. We then consider a follower employing a gradient-play update rule instead of a best-response strategy and propose a two-timescale algorithm with similar asymptotic convergence results. For this algorithm, we also provide finite-time high-probability bounds for local convergence to a neighborhood of a stable Stackelberg equilibrium in general-sum games.
06/04/2019 ∙ by Tanner Fiez, et al.
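The two-timescale idea can be illustrated on a toy zero-sum quadratic game f(x, y) = (x − 1)² + xy − y², where the leader minimizes f and the follower maximizes it. The follower's best response is y*(x) = x/2, so the leader effectively minimizes (x − 1)² + x²/4, giving the Stackelberg equilibrium (0.8, 0.4). The follower takes fast gradient steps while the leader follows the total (Stackelberg) gradient, with the best-response sensitivity dy*/dx = 1/2 obtained from the implicit function theorem. The game and step sizes are invented for illustration; the paper's algorithm and guarantees are more general.

```python
def stackelberg_gradient_play(steps=5000, lr_leader=0.01, lr_follower=0.1):
    """Two-timescale gradient dynamics on the toy zero-sum game
    f(x, y) = (x - 1)**2 + x*y - y**2; the leader (x) minimizes f,
    the follower (y) maximizes it. Stackelberg equilibrium: (0.8, 0.4)."""
    x, y = 0.0, 0.0
    for _ in range(steps):
        # Follower: fast gradient ascent on f in y (df/dy = x - 2y).
        y += lr_follower * (x - 2 * y)
        # Leader: total gradient Dx f + (Dy f) * dy*/dx with dy*/dx = 1/2,
        # from the implicit function theorem applied to df/dy = 0.
        total_grad = 2 * (x - 1) + y + 0.5 * (x - 2 * y)
        x -= lr_leader * total_grad
    return x, y
```

The follower's faster stepsize lets y track the best response y*(x) between leader updates, which is the essence of the two-timescale argument.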