
Which transformer architecture fits my data? A vocabulary bottleneck in self-attention
After their successful debut in natural language processing, Transformer...

PMI-Masking: Principled masking of correlated spans
Masking tokens uniformly at random constitutes a common flaw in the pret...

Limits to Depth Efficiencies of Self-Attention
Self-attention architectures, which are rapidly pushing the frontier in ...

SenseBERT: Driving Some Sense into BERT
Self-supervision techniques have allowed neural language models to advan...

Deep autoregressive models for the efficient variational simulation of many-body quantum systems
Artificial Neural Networks were recently shown to be an efficient repres...

Bridging Many-Body Quantum Physics and Deep Learning via Tensor Networks
The harnessing of modern computational abilities for many-body wavefunc...

Benefits of Depth for Long-Term Memory of Recurrent Networks
The key attribute that drives the unprecedented success of modern Recurr...

Analysis and Design of Convolutional Networks via Hierarchical Tensor Decompositions
The driving force behind convolutional networks - the most successful de...

Deep Learning and Quantum Entanglement: Fundamental Connections with Implications to Network Design
Deep convolutional networks have witnessed unprecedented success in vari...
Yoav Levine