Evaluating Deep Learning in SystemML using Layer-wise Adaptive Rate Scaling (LARS) Optimizer

02/05/2021
by Kanchan Chowdhury, et al.

Increasing the batch size of a deep learning model is a challenging task. Although a larger batch can help use all of the available system memory during the training phase, it most often causes a significant loss of test accuracy. LARS addresses this issue by introducing an adaptive learning rate for each layer of a deep learning model. However, it is unclear how popular distributed machine learning systems such as SystemML or MLlib perform with this optimizer. In this work, we apply the LARS optimizer to a deep learning model implemented in SystemML. We run experiments with various batch sizes and compare the performance of the LARS optimizer with that of Stochastic Gradient Descent. Our experimental results show that the LARS optimizer performs significantly better than Stochastic Gradient Descent for large batch sizes, even with the distributed machine learning framework SystemML.
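
The abstract does not spell out the update rule, so the following is only a minimal sketch of the layer-wise idea, assuming the commonly cited LARS formulation in which each layer's learning rate is scaled by the ratio of its weight norm to its gradient norm. It is plain Python rather than SystemML's DML, and the function name, parameter names, and default values are all hypothetical choices for illustration, not the authors' implementation.

```python
import math


def l2_norm(values):
    """Euclidean norm of a flat list of floats."""
    return math.sqrt(sum(v * v for v in values))


def lars_step(weights, grads, velocity, global_lr=0.1, trust_coef=0.001,
              weight_decay=0.0, momentum=0.9, eps=1e-9):
    """One LARS update for a single layer (illustrative sketch only).

    weights, grads, velocity: flat lists of floats for this layer.
    All hyperparameter names and defaults are assumptions.
    """
    w_norm = l2_norm(weights)
    g_norm = l2_norm(grads)

    # Layer-wise rate: large when the weights are large relative to the gradient.
    if w_norm > 0.0 and g_norm > 0.0:
        local_lr = trust_coef * w_norm / (g_norm + weight_decay * w_norm + eps)
    else:
        local_lr = 1.0  # fall back to the global rate alone

    # Momentum update using the scaled rate and L2 weight decay.
    new_velocity = [
        momentum * v + global_lr * local_lr * (g + weight_decay * w)
        for w, g, v in zip(weights, grads, velocity)
    ]
    new_weights = [w - v for w, v in zip(weights, new_velocity)]
    return new_weights, new_velocity, local_lr


# Two layers with identical weights but gradients that differ by a factor of 10.
w = [3.0, 4.0]
_, v_small, lr_small = lars_step(w, [0.6, 0.8], [0.0, 0.0])
_, v_large, lr_large = lars_step(w, [6.0, 8.0], [0.0, 0.0])
```

In this sketch the layer with the larger gradient receives a proportionally smaller local rate, so both layers take a step of about the same size. That is the property that is usually credited with keeping large-batch training stable when the global learning rate is raised.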

research
08/21/2022

Critical Bach Size Minimizes Stochastic First-Order Oracle Complexity of Deep Learning Optimizer using Hyperparameters Close to One

Practical results have shown that deep learning optimizers using small c...
research
02/04/2020

Large Batch Training Does Not Need Warmup

Training deep neural networks using a large batch size has shown promisi...
research
01/22/2021

Gravity Optimizer: a Kinematic Approach on Optimization in Deep Learning

We introduce Gravity, another algorithm for gradient-based optimization....
research
10/11/2019

On Empirical Comparisons of Optimizers for Deep Learning

Selecting an optimizer is a central step in the contemporary deep learni...
research
07/25/2023

How to Scale Your EMA

Preserving training dynamics across batch sizes is an important tool for...
research
08/01/2022

Dynamic Batch Adaptation

Current deep learning adaptive optimizer methods adjust the step magnitu...
research
03/31/2022

Exploiting Explainable Metrics for Augmented SGD

Explaining the generalization characteristics of deep learning is an eme...