Shampoo: Preconditioned Stochastic Tensor Optimization

02/26/2018
by Vineet Gupta, et al.

Preconditioned gradient methods are among the most general and powerful tools in optimization. However, preconditioning requires storing and manipulating prohibitively large matrices. We describe and analyze a new structure-aware preconditioning algorithm, called Shampoo, for stochastic optimization over tensor spaces. Shampoo maintains a set of preconditioning matrices, each of which operates on a single dimension, contracting over the remaining dimensions. We establish convergence guarantees in the stochastic convex setting, the proof of which builds upon matrix trace inequalities. Our experiments with state-of-the-art deep learning models show that Shampoo is capable of converging considerably faster than commonly used optimizers. Although it involves a more complex update rule, Shampoo's runtime per step is comparable to that of simple gradient methods such as SGD, AdaGrad, and Adam.
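
For a two-dimensional (matrix) parameter, the scheme described above keeps one statistic per dimension: a left statistic accumulated from G Gᵀ and a right statistic accumulated from Gᵀ G, with the gradient preconditioned through their inverse fourth roots. The following is a minimal NumPy sketch of that matrix case; the learning rate, the 1e-4 initialization of the statistics, the toy 5x3 parameter, and the helper names (inv_pth_root, shampoo_matrix_step) are illustrative assumptions, not the authors' reference implementation.

```python
import numpy as np


def inv_pth_root(mat, p, eps=1e-6):
    """mat^(-1/p) for a symmetric PSD matrix, via its eigendecomposition."""
    eigvals, eigvecs = np.linalg.eigh(mat)
    eigvals = np.maximum(eigvals, eps)  # guard against zero/negative eigenvalues
    return (eigvecs * eigvals ** (-1.0 / p)) @ eigvecs.T


def shampoo_matrix_step(W, G, L, R, lr=0.1):
    """One Shampoo-style update for a matrix parameter W with gradient G.

    L accumulates G @ G.T (row-dimension statistics) and R accumulates
    G.T @ G (column-dimension statistics); each preconditioner acts on a
    single dimension of W, contracting over the other one.
    """
    L = L + G @ G.T
    R = R + G.T @ G
    W = W - lr * inv_pth_root(L, 4) @ G @ inv_pth_root(R, 4)
    return W, L, R


# Toy usage on a 5x3 parameter with random stand-in gradients.
rng = np.random.default_rng(0)
W = rng.normal(size=(5, 3))
L = 1e-4 * np.eye(5)  # small multiple of the identity as initialization (illustrative)
R = 1e-4 * np.eye(3)
for _ in range(10):
    G = rng.normal(size=W.shape)  # stand-in for a stochastic gradient
    W, L, R = shampoo_matrix_step(W, G, L, R)
```

The per-step cost is dominated by the two matrix products and the two root computations on the small per-dimension statistics, rather than by any operation on a full parameter-sized preconditioner, which is what keeps the runtime per step close to that of simple gradient methods.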


Related research

- A Unified Convergence Theorem for Stochastic Optimization Methods (06/08/2022): In this work, we provide a fundamental unified convergence theorem used ...
- NeCPD: An Online Tensor Decomposition with Optimal Stochastic Gradient Descent (03/18/2020): Multi-way data analysis has become an essential tool for capturing under...
- Lightweight Stochastic Optimization for Minimizing Finite Sums with Infinite Data (06/08/2018): Variance reduction has been commonly used in stochastic optimization. It...
- Tensor Normal Training for Deep Learning Models (06/05/2021): Despite the predominant use of first-order methods for training deep lea...
- SGD_Tucker: A Novel Stochastic Optimization Strategy for Parallel Sparse Tucker Decomposition (12/07/2020): Sparse Tucker Decomposition (STD) algorithms learn a core tensor and a g...
- Tensor Canonical Correlation Analysis (06/12/2019): In many applications, such as classification of images or videos, it is ...