Fine-tuning Neural-Operator architectures for training and generalization

01/27/2023
by JA Lara Benitez, et al.

In this work, we present an analysis of the generalization of Neural Operators (NOs) and derived architectures. We propose a family of networks, which we name sNO+ε, in which we modify the layout of NOs toward an architecture resembling a Transformer; principally, we substitute the attention module with the integral-operator part of NOs. The resulting network preserves universality, generalizes better to unseen data, and has a similar number of parameters to NOs. On the one hand, we study generalization numerically by gradually transforming NOs into sNO+ε and verifying a reduction of the test loss on a time-harmonic wave dataset with different frequencies. We make the following changes to NOs: (a) we split the (non-local) integral operator and the (local) feed-forward network (MLP) into different layers, generating a sequential structure which we call the sequential Neural Operator (sNO); (b) we add skip connections and layer normalization in sNO; and (c) we incorporate dropout and stochastic depth, which allow us to build deep networks. In each case, we observe a decrease in the test loss across a wide variety of initializations, indicating that our changes outperform the NO. On the other hand, building on infinite-dimensional statistics, and in particular Dudley's theorem, we provide bounds on the Rademacher complexity of NOs and sNO, and we find the following relationship: the upper bound on the Rademacher complexity of sNO is a lower bound for that of NOs; thereby, the generalization error bound of sNO is smaller than that of NO, which further strengthens our numerical results.
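To make changes (a)-(c) concrete, here is a minimal sketch of one sNO+ε block in a PyTorch-style formulation, assuming the integral operator is realized as a truncated Fourier multiplier and placed where attention would sit in a pre-norm Transformer layout, with skip connections, dropout, and stochastic depth. The class and parameter names (SpectralConv1d, SNOBlock, drop_path_rate) are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch of one sNO+eps block; names and hyperparameters are
# assumptions for illustration, not the paper's code.
import torch
import torch.nn as nn


class SpectralConv1d(nn.Module):
    """Non-local integral operator realized as a truncated Fourier multiplier."""
    def __init__(self, channels, modes):
        super().__init__()
        self.modes = modes
        scale = 1.0 / (channels * channels)
        self.weights = nn.Parameter(
            scale * torch.randn(channels, channels, modes, dtype=torch.cfloat)
        )

    def forward(self, x):                       # x: (batch, channels, grid)
        x_ft = torch.fft.rfft(x)                # transform to Fourier space
        out_ft = torch.zeros_like(x_ft)
        out_ft[:, :, :self.modes] = torch.einsum(
            "bcm,com->bom", x_ft[:, :, :self.modes], self.weights
        )                                       # multiply the retained modes
        return torch.fft.irfft(out_ft, n=x.size(-1))


class SNOBlock(nn.Module):
    """Transformer-like layout: pre-norm, residual, integral operator, then MLP."""
    def __init__(self, channels, modes, dropout=0.1, drop_path_rate=0.1):
        super().__init__()
        self.norm1 = nn.LayerNorm(channels)
        self.kernel = SpectralConv1d(channels, modes)   # replaces attention
        self.norm2 = nn.LayerNorm(channels)
        self.mlp = nn.Sequential(
            nn.Linear(channels, 4 * channels), nn.GELU(),
            nn.Dropout(dropout), nn.Linear(4 * channels, channels)
        )
        self.drop_path_rate = drop_path_rate

    def _drop_path(self, y):
        # Stochastic depth: randomly skip a residual branch during training
        # (rescaling omitted here for brevity).
        if self.training and torch.rand(()) < self.drop_path_rate:
            return torch.zeros_like(y)
        return y

    def forward(self, x):                       # x: (batch, grid, channels)
        h = self.norm1(x).transpose(1, 2)       # (batch, channels, grid) for FFT
        x = x + self._drop_path(self.kernel(h).transpose(1, 2))
        x = x + self._drop_path(self.mlp(self.norm2(x)))
        return x
```

Stacking several such blocks gives the sequential structure of sNO, while change (a) alone corresponds to using the spectral kernel and the MLP as separate layers without the residual, normalization, and regularization components added in (b) and (c).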
