
Solving Discounted Stochastic Two-Player Games with Near-Optimal Time and Sample Complexity
In this paper, we settle the sampling complexity of solving discounted two-player turn-based zero-sum stochastic games up to polylogarithmic factors. Given a stochastic game with discount factor γ ∈ (0,1), we provide an algorithm that computes an ϵ-optimal strategy with high probability given Õ((1-γ)^-3 ϵ^-2) samples from the transition function for each state-action pair. Our algorithm runs in time nearly linear in the number of samples and uses space nearly linear in the number of state-action pairs. Since stochastic games generalize Markov decision processes (MDPs), our runtime and sample complexities are optimal due to the lower bound of Azar et al. (2013). We achieve our results by showing how to generalize a near-optimal Q-learning-based algorithm for MDPs, in particular that of Sidford et al. (2018), to a two-player strategy computation algorithm. This overcomes limitations of standard Q-learning and of strategy-iteration or alternating-minimization approaches, and we hope it will pave the way for future reinforcement learning results by facilitating the extension of MDP results to multiagent settings with little loss.
08/29/2019 ∙ by Aaron Sidford, et al.
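As a concrete illustration of the dynamic-programming structure behind such games, the sketch below runs exact minimax value iteration on a small turn-based zero-sum game with a known model; the sample-based algorithm in the paper replaces this exact Bellman backup with estimates built from transition samples. The game instance, iteration count, and function names are illustrative assumptions.

```python
import numpy as np

def minimax_value_iteration(P, r, owner, gamma, iters=500):
    """Exact minimax value iteration for a turn-based zero-sum
    stochastic game with a known model.
    P[s, a, s2]: transition probabilities; r[s, a]: reward to the
    max player; owner[s] in {+1, -1}: which player moves at state s."""
    n_states, n_actions, _ = P.shape
    v = np.zeros(n_states)
    for _ in range(iters):
        q = r + gamma * (P @ v)   # Q-values, shape (n_states, n_actions)
        # the max player maximizes over actions, the min player minimizes
        v = np.where(owner > 0, q.max(axis=1), q.min(axis=1))
    return v
```

On a two-state game with self-loop transitions, reward 1 at the max player's state and -1 at the min player's state, and γ = 0.5, this converges to values 2 and -2 respectively, matching the geometric series of each player's best self-loop action.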

Diffusion Approximations for Online Principal Component Estimation and Global Convergence
In this paper, we adopt diffusion approximation tools to study the dynamics of Oja's iteration, an online stochastic gradient descent method for principal component analysis. Oja's iteration maintains a running estimate of the true principal component from streaming data and enjoys low time and space complexity. We show that Oja's iteration for the top eigenvector generates a continuous-state discrete-time Markov chain over the unit sphere. We characterize Oja's iteration in three phases using diffusion approximation and weak convergence tools. Our three-phase analysis further provides a finite-sample error bound for the running estimate, which matches the minimax information lower bound for principal component analysis under the additional assumption of bounded samples.
08/29/2018 ∙ by Chris Junchi Li, et al.
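A minimal sketch of the Oja's iteration described above, assuming an i.i.d. stream and a fixed step size (the paper analyzes the dynamics in far more generality):

```python
import numpy as np

def oja_top_eigvec(stream, dim, eta=0.01):
    """Oja's iteration: maintain a running unit-norm estimate of the top
    eigenvector of E[x x^T] from a stream of samples x. The fixed step
    size and random initialization are illustrative assumptions."""
    rng = np.random.default_rng(0)
    w = rng.standard_normal(dim)
    w /= np.linalg.norm(w)
    for x in stream:
        w = w + eta * x * (x @ w)   # stochastic gradient step
        w /= np.linalg.norm(w)      # retract to the unit sphere
    return w
```

Each update touches only the current sample, which is why the running estimate needs O(dim) memory, matching the low time and space complexity noted in the abstract.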

Online Factorization and Partition of Complex Networks From Random Walks
Finding the reduced-dimensional structure is critical to understanding complex networks. Existing approaches such as spectral clustering are applicable only when the full network is explicitly observed. In this paper, we focus on the online factorization and partition of implicit large-scale networks based on observations from an associated random walk. We formulate this as a nonconvex stochastic factorization problem and propose an efficient and scalable stochastic generalized Hebbian algorithm. The algorithm is able to process dependent state-transition data dynamically generated by the underlying network and learn a low-dimensional representation for each vertex. By applying a diffusion approximation analysis, we show that the continuous-time limiting process of the stochastic algorithm converges globally to the "principal components" of the Markov chain and achieves a nearly optimal sample complexity. Given the learned low-dimensional representations, we further apply clustering techniques to recover the network partition. We show that when the associated Markov process is lumpable, one can recover the partition exactly with high probability. We apply the proposed approach to model the traffic flow of Manhattan as city-wide random walks. By using our algorithm to analyze the taxi trip data, we discover a latent partition of Manhattan that closely matches the traffic dynamics.
05/22/2017 ∙ by Lin F. Yang, et al.
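A heavily simplified sketch of a Sanger-type generalized Hebbian update on observed random-walk transitions: the one-hot encoding, pair symmetrization, and fixed step size are illustrative assumptions and not the paper's exact algorithm, but they show how dependent transition data can drive a streaming low-dimensional embedding.

```python
import numpy as np

def gha_embed(transitions, n, k, eta=0.05):
    """Sanger-type generalized Hebbian sketch: learn a k-dimensional
    embedding of an n-vertex network from observed random-walk
    transitions (s, t). Illustrative stand-in, not the paper's method."""
    rng = np.random.default_rng(0)
    W = 0.1 * rng.standard_normal((k, n))   # rows: learned components
    for s, t in transitions:
        z = np.zeros(n)
        z[s] += 0.5                          # symmetrize the observed pair
        z[t] += 0.5
        h = W @ z                            # current k-dim code of the pair
        # Sanger's rule: Hebbian term minus lower-triangular deflation
        W += eta * (np.outer(h, z) - np.tril(np.outer(h, h)) @ W)
    return W.T                               # row i: embedding of vertex i
```

On a lumpable chain (e.g. two blocks whose walks stay within a block), the learned columns approximately span the block-indicator subspace, after which clustering the rows recovers the partition.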

Stochastic Primal-Dual Methods and Sample Complexity of Reinforcement Learning
We study the online estimation of the optimal policy of a Markov decision process (MDP). We propose a class of Stochastic Primal-Dual (SPD) methods that exploit the inherent minimax duality of the Bellman equations. The SPD methods update a few coordinates of the value and policy estimates as each new state transition is observed. These methods use small storage and have low computational complexity per iteration. The SPD methods find an absolute-ϵ-optimal policy, with high probability, using O(S^4 A^2 σ^2 / ((1-γ)^6 ϵ^2)) iterations/samples for the infinite-horizon discounted-reward MDP and O(S^4 A^2 H^6 σ^2 / ϵ^2) for the finite-horizon MDP.
12/08/2016 ∙ by Yichen Chen, et al.
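One standard way to write the minimax duality of the Bellman equations exploited above (the textbook Lagrangian of the Bellman linear program, not necessarily the exact saddle-point formulation used in the paper) is:

```latex
\min_{v}\ \max_{\mu \ge 0}\ \sum_{s} v(s)
  \;+\; \sum_{s,a} \mu(s,a)\Big[\, r(s,a) + \gamma \sum_{s'} P(s' \mid s,a)\, v(s') - v(s) \Big]
```

Here the primal variables v are value estimates and the dual variables μ(s,a) play the role of unnormalized state-action occupancy measures; SPD-style methods update only a few coordinates of v and μ per observed transition, which is what keeps the per-iteration cost and storage small.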

Accelerating Stochastic Composition Optimization
Consider the stochastic composition optimization problem, where the objective is a composition of two expected-value functions. We propose a new stochastic first-order method, namely the accelerated stochastic compositional proximal gradient (ASC-PG) method, which updates based on queries to the sampling oracle using two different timescales. ASC-PG is the first proximal-gradient method for the stochastic composition problem that can handle nonsmooth regularization penalties. We show that ASC-PG exhibits faster convergence than the best known algorithms, and that it achieves the optimal sample-error complexity in several important special cases. We further demonstrate the application of ASC-PG to reinforcement learning and conduct numerical experiments.
07/25/2016 ∙ by Mengdi Wang, et al.

Near-Optimal Stochastic Approximation for Online Principal Component Estimation
Principal component analysis (PCA) has been a prominent tool for high-dimensional data analysis. Online algorithms that estimate the principal component by processing streaming data are of tremendous practical and theoretical interest. Despite the rich applications, theoretical convergence analysis remains largely open. In this paper, we cast online PCA as a stochastic nonconvex optimization problem, and we analyze the online PCA algorithm as a stochastic approximation iteration. The stochastic approximation iteration processes data points incrementally and maintains a running estimate of the principal component. We prove, for the first time, a nearly optimal finite-sample error bound for the online PCA algorithm. Under a sub-Gaussian assumption, we show that the finite-sample error bound closely matches the minimax information lower bound.
03/16/2016 ∙ by Chris Junchi Li, et al.

Random Multi-Constraint Projection: Stochastic Gradient Methods for Convex Optimization with Many Constraints
Consider convex optimization problems subject to a large number of constraints. We focus on stochastic problems in which the objective takes the form of an expected value and the feasible set is the intersection of a large number of convex sets. We propose a class of algorithms that perform both stochastic gradient descent and random feasibility updates simultaneously. At every iteration, the algorithms sample a number of projection points onto a randomly selected small subset of all constraints. Three feasibility update schemes are considered: averaging over random projected points, projecting onto the most distant sample point, and projecting onto a special polyhedral set constructed from the sample points. We prove the almost sure convergence of these algorithms and analyze the iterates' feasibility error and optimality error, respectively. We provide new convergence rate benchmarks for stochastic first-order optimization with many constraints. The rate analysis and numerical experiments reveal that the algorithm using the polyhedral-set projection scheme is the most efficient among the known algorithms.
11/12/2015 ∙ by Mengdi Wang, et al.
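The simplest of the feasibility schemes described above, projecting onto a single randomly sampled violated constraint after each gradient step, can be sketched as follows; the 1/t step-size schedule and the halfspace constraints are illustrative assumptions.

```python
import numpy as np

def sgd_random_projection(A, b, grad, x0, steps=3000, alpha0=1.0):
    """Stochastic gradient descent with random feasibility updates:
    after each gradient step, project onto ONE randomly sampled
    halfspace {x : A[i] @ x <= b[i]} if it is violated. A sketch of the
    simplest feasibility scheme, not the full algorithm class."""
    rng = np.random.default_rng(0)
    x = np.asarray(x0, dtype=float)
    for t in range(1, steps + 1):
        x = x - (alpha0 / t) * grad(x, rng)   # stochastic gradient step
        i = rng.integers(len(b))              # sample one constraint
        viol = A[i] @ x - b[i]
        if viol > 0:                          # closed-form halfspace projection
            x = x - (viol / (A[i] @ A[i])) * A[i]
    return x
```

Because only one constraint is touched per iteration, the per-step cost is independent of the total number of constraints, which is the point of the random feasibility updates.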

Stochastic Compositional Gradient Descent: Algorithms for Minimizing Compositions of Expected-Value Functions
Classical stochastic gradient methods are well suited for minimizing expected-value objective functions. However, they do not apply to the minimization of a nonlinear function involving expected values, i.e., a composition of two expected-value functions: problems of the form min_x E_v[f_v(E_w[g_w(x)])]. To solve this stochastic composition problem, we propose a class of stochastic compositional gradient descent (SCGD) algorithms that can be viewed as stochastic versions of the quasi-gradient method. The SCGD algorithms update the solution based on noisy sample gradients of f_v and g_w, and use an auxiliary variable to track the unknown quantity E_w[g_w(x)]. We prove that the SCGD iterates converge almost surely to an optimal solution for convex optimization problems, as long as such a solution exists. The convergence involves the interplay of two iterations with different time scales. For nonsmooth convex problems, the SCGD achieve a convergence rate of O(k^-1/4) in the general case and O(k^-2/3) in the strongly convex case, after taking k samples. For smooth convex problems, the SCGD can be accelerated to converge at a rate of O(k^-2/7) in the general case and O(k^-4/5) in the strongly convex case. For nonconvex problems, we prove that any limit point generated by SCGD is a stationary point, for which we also provide a convergence rate analysis. Indeed, the stochastic setting in which one wants to optimize compositions of expected-value functions is very common in practice. The proposed SCGD methods find wide applications in learning, estimation, dynamic programming, etc.
11/14/2014 ∙ by Mengdi Wang, et al.
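The role of the auxiliary variable is the key idea above: a biased plug-in of a single sample of g_w(x) into f would not estimate the true gradient, so a slower-moving average tracks E_w[g_w(x)] instead. A scalar sketch with toy oracles (the oracles and step sizes are illustrative assumptions, not the paper's setting):

```python
import numpy as np

def scgd_sketch(steps=5000, alpha=0.02, beta=0.1):
    """Basic SCGD sketch for min_x f(E_w[g_w(x)]) with toy oracles
    g_w(x) = w*x (E[w] = 1) and f(y) = (y - 1)^2, so the true objective
    is (x - 1)^2 with minimizer x* = 1. The auxiliary scalar y tracks
    the unknown inner expectation E_w[g_w(x)] = x."""
    rng = np.random.default_rng(0)
    x, y = 0.0, 0.0
    for _ in range(steps):
        w1 = 1.0 + 0.5 * rng.standard_normal()   # sample for the inner oracle
        w2 = 1.0 + 0.5 * rng.standard_normal()   # independent sample for the Jacobian
        y = (1 - beta) * y + beta * (w1 * x)     # fast timescale: track E_w[g_w(x)]
        x = x - alpha * w2 * 2.0 * (y - 1.0)     # slow timescale: chain-rule gradient
    return x
```

The two step sizes alpha and beta moving at different rates is exactly the "two iterations with different time scales" interplay mentioned in the abstract.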

Improved Incremental First-Order Oracle Complexity of Variance-Reduced Methods for Nonsmooth Convex Stochastic Composition Optimization
We consider the nonsmooth convex composition optimization problem, where the objective is a composition of two finite-sum functions, and analyze stochastic compositional variance reduced gradient (SCVRG) methods for it. SCVRG and its variants have recently drawn much attention given their edge over stochastic compositional gradient descent (SCGD), but the theoretical analysis exclusively assumes strong convexity of the objective, which excludes several important examples such as Lasso, logistic regression, principal component analysis, and deep neural nets. In contrast, we prove non-asymptotic incremental first-order oracle (IFO) complexity bounds for SCVRG and its novel variants for nonsmooth convex composition optimization, and show that they are provably faster than SCGD and gradient descent. More specifically, our method achieves a total IFO complexity of O((m+n)log(1/ϵ) + 1/ϵ^3), which improves upon the O(1/ϵ^3.5) and O((m+n)/√ϵ) obtained by SCGD and accelerated gradient descent, respectively. Experiments on a sparse mean-variance optimization problem demonstrate that our method outperforms other competing methods.
02/07/2018 ∙ by Tianyi Lin, et al.

Dimensionality Reduction for Stationary Time Series via Stochastic Nonconvex Optimization
Stochastic optimization naturally arises in machine learning. Efficient algorithms with provable guarantees, however, are still largely missing when the objective function is nonconvex and the data points are dependent. This paper studies this fundamental challenge through a streaming PCA problem for stationary time series data. Specifically, our goal is to estimate the principal component of the time series data with respect to the covariance matrix of the stationary distribution. Computationally, we propose a variant of Oja's algorithm combined with downsampling to control the bias of the stochastic gradient caused by data dependency. Theoretically, we quantify the uncertainty of the proposed stochastic algorithm using diffusion approximations. This allows us to prove global convergence in terms of the continuous-time limiting solution trajectory, and further implies near-optimal sample complexity. Numerical experiments are provided to support our analysis.
03/06/2018 ∙ by Minshuo Chen, et al.
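The downsampling idea described above can be sketched directly on Oja's iteration: updating only on every gap-th point of a dependent stream makes consecutive updates closer to independent, reducing the bias of the stochastic gradient. The gap and step size below are illustrative assumptions.

```python
import numpy as np

def downsampled_oja(stream, dim, gap=10, eta=0.01):
    """Oja's iteration with downsampling for dependent (e.g. Markovian)
    data: skip intermediate samples so that the points actually used
    are only weakly correlated. A sketch, not the paper's exact variant."""
    rng = np.random.default_rng(0)
    w = rng.standard_normal(dim)
    w /= np.linalg.norm(w)
    for i, x in enumerate(stream):
        if i % gap:                 # skip dependent intermediate samples
            continue
        w = w + eta * x * (x @ w)   # Oja update on the retained sample
        w /= np.linalg.norm(w)
    return w
```

On an AR(1) stream, the retained samples are gap steps apart, so their correlation decays geometrically with the gap; choosing the gap on the order of the mixing time trades sample efficiency against bias.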
Mengdi Wang
Assistant Professor, Department of Operations Research and Financial Engineering, Princeton University