
Which transformer architecture fits my data? A vocabulary bottleneck in self-attention
After their successful debut in natural language processing, Transformer...
PMI-Masking: Principled masking of correlated spans
Masking tokens uniformly at random constitutes a common flaw in the pret...
Limits to Depth Efficiencies of Self-Attention
Self-attention architectures, which are rapidly pushing the frontier in ...
SenseBERT: Driving Some Sense into BERT
Self-supervision techniques have allowed neural language models to advan...
Deep autoregressive models for the efficient variational simulation of many-body quantum systems
Artificial Neural Networks were recently shown to be an efficient repres...
Bridging Many-Body Quantum Physics and Deep Learning via Tensor Networks
The harnessing of modern computational abilities for many-body wavefunc...
Benefits of Depth for Long-Term Memory of Recurrent Networks
The key attribute that drives the unprecedented success of modern Recurr...
Analysis and Design of Convolutional Networks via Hierarchical Tensor Decompositions
The driving force behind convolutional networks - the most successful de...
Deep Learning and Quantum Entanglement: Fundamental Connections with Implications to Network Design
Deep convolutional networks have witnessed unprecedented success in vari...
Yoav Levine
