research
          
      
      ∙
      05/16/2023
    Mimetic Initialization of Self-Attention Layers
It is notoriously difficult to train Transformers on small datasets; typ...
          
            research
          
      
      ∙
      10/07/2022
    Understanding the Covariance Structure of Convolutional Filters
Neural network weights are typically initialized at random from univaria...
          
            research
          
      
      ∙
      01/24/2022
    Patches Are All You Need?
Although convolutional networks have been the dominant architecture for ...
          
            research
          
      
      ∙
      04/14/2021
     
             
                     
  
  
     
                             share
 share