Soft Merging: A Flexible and Robust Soft Model Merging Approach for Enhanced Neural Network Performance

09/21/2023
by Hao Chen, et al.

Stochastic Gradient Descent (SGD), a widely used optimization algorithm in deep learning, typically converges only to local optima because of the non-convex nature of the problem. Leveraging these local optima to improve model performance remains a challenging task. Given the inherent complexity of neural networks, simple arithmetic averaging of the obtained local-optimum models yields undesirable results. This paper proposes a soft merging method that facilitates rapid merging of multiple models, simplifies the merging of specific parts of neural networks, and enhances robustness against malicious models with extreme parameter values. This is achieved by learning gate parameters through a surrogate of the l_0 norm based on the hard concrete distribution, without modifying the weights of the given local-optimum models. The merging process not only improves model performance by converging to a better local optimum, but also minimizes computational cost, offering an efficient and explicit learning process integrated with stochastic gradient descent. Thorough experiments underscore the effectiveness and superior performance of the merged neural networks.
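
The abstract describes learning gates through an l_0-norm surrogate built on the hard concrete distribution, while the candidate models' weights stay frozen. The following is a minimal, hypothetical PyTorch sketch of that idea for a single linear layer; the class names (HardConcreteGate, SoftMergedLinear), the per-model gate granularity, and the normalization step are assumptions for illustration, not the authors' released implementation. The gate parameterization follows the standard hard concrete construction from the L_0 regularization literature.

```python
# Hypothetical sketch: merging K frozen linear layers with hard-concrete gates
# learned via an L_0 surrogate. Only the gate parameters are trained.
import torch
import torch.nn as nn

# Hard concrete stretch/temperature constants (common defaults)
GAMMA, ZETA, BETA = -0.1, 1.1, 2.0 / 3.0


class HardConcreteGate(nn.Module):
    """One learnable gate in [0, 1], reparameterized with the hard concrete distribution."""

    def __init__(self):
        super().__init__()
        self.log_alpha = nn.Parameter(torch.zeros(1))

    def forward(self):
        if self.training:
            # Sample a stretched, clamped concrete variable (differentiable in log_alpha).
            u = torch.rand_like(self.log_alpha).clamp(1e-6, 1 - 1e-6)
            s = torch.sigmoid((u.log() - (1 - u).log() + self.log_alpha) / BETA)
        else:
            s = torch.sigmoid(self.log_alpha)
        return (s * (ZETA - GAMMA) + GAMMA).clamp(0.0, 1.0)

    def l0_penalty(self):
        # Expected probability that the gate is non-zero: the L_0 surrogate term.
        return torch.sigmoid(
            self.log_alpha - BETA * torch.log(torch.tensor(-GAMMA / ZETA))
        ).sum()


class SoftMergedLinear(nn.Module):
    """Gated combination of K frozen linear layers taken from K local-optimum models."""

    def __init__(self, linears):
        super().__init__()
        self.linears = nn.ModuleList(linears)
        for lin in self.linears:
            for p in lin.parameters():
                p.requires_grad_(False)  # the original weights are never modified
        self.gates = nn.ModuleList(HardConcreteGate() for _ in linears)

    def forward(self, x):
        g = torch.cat([gate() for gate in self.gates])  # shape (K,)
        g = g / g.sum().clamp_min(1e-8)                  # normalize the contributions
        return sum(gi * lin(x) for gi, lin in zip(g, self.linears))

    def l0_penalty(self):
        return sum(gate.l0_penalty() for gate in self.gates)
```

Under these assumptions, training reduces to optimizing only the gate parameters with SGD on the task loss plus a weighted sum of the l0_penalty() terms, so the merge is learned explicitly and cheaply while the given local-optimum weights remain untouched.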
