Music representation learning is notoriously difficult for its complex
h...
Generally, residual connections are indispensable network components in
...
The sparse Mixture-of-Experts (MoE) model is powerful for large-scale
pr...
As more and more pre-trained language models adopt on-cloud deployment, ...
Pre-trained language models achieve outstanding performance in NLP tasks...
Many real-world applications require the prediction of long sequence
tim...
Neural architecture search, which aims to automatically search for
archi...
Kernel methods are powerful tools to capture nonlinear patterns behind d...