On the Adversarial Robustness of Mixture of Experts

10/19/2022
by Joan Puigcerver, et al.

Adversarial robustness is a key desirable property of neural networks. It has been empirically shown to depend on model size, with larger networks typically being more robust. Recently, Bubeck and Sellke proved a lower bound on the Lipschitz constant of functions that fit the training data in terms of their number of parameters. This raises an interesting open question: do (and can) functions with more parameters, but not necessarily more computational cost, have better robustness? We study this question for sparse Mixture-of-Experts models (MoEs), which make it possible to scale up model size at roughly constant computational cost. We theoretically show that, under certain conditions on the routing and the structure of the data, MoEs can have significantly smaller Lipschitz constants than their dense counterparts. The robustness of MoEs can suffer when the highest-weighted experts for an input implement sufficiently different functions. We then empirically evaluate the robustness of MoEs on ImageNet using adversarial attacks and show that they are indeed more robust than dense models with the same computational cost. We also make key observations on the robustness of MoEs to the choice of experts, highlighting the redundancy of experts in models trained in practice.
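
As a rough illustration of the mechanism the abstract relies on (a minimal sketch, not the paper's model or code), the toy top-1 sparse MoE layer below shows why the total parameter count grows linearly with the number of experts while the per-input compute stays roughly constant: each input is routed to, and processed by, a single expert. All sizes and names here are hypothetical.

# Minimal sketch of a top-1 sparse Mixture-of-Experts layer (hypothetical
# sizes; not the paper's implementation).
import numpy as np

rng = np.random.default_rng(0)
d_in, d_out, num_experts = 16, 16, 8   # assumed toy dimensions

# Router weights and one weight matrix per expert.
# Total parameters grow with num_experts: num_experts * d_in * d_out.
router_w = rng.normal(size=(d_in, num_experts))
expert_w = rng.normal(size=(num_experts, d_in, d_out))

def moe_forward(x):
    """Top-1 sparse MoE forward pass for a single input vector x."""
    logits = x @ router_w                  # routing scores, shape (num_experts,)
    gates = np.exp(logits - logits.max())
    gates /= gates.sum()                   # softmax over experts
    e = int(np.argmax(gates))              # index of the top-1 expert
    # Only expert `e` is evaluated, so per-input FLOPs do not grow with
    # num_experts even though the parameter count does.
    return gates[e] * (x @ expert_w[e])

y = moe_forward(rng.normal(size=d_in))
print(y.shape)  # (16,)

Adding more experts in this sketch increases capacity (more parameters that could, in principle, fit the data with smaller Lipschitz constant) without increasing the cost of a forward pass, which is the trade-off the paper studies.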
