research ∙ 02/08/2023
Unsupervised Learning of Initialization in Deep Neural Networks via Maximum Mean Discrepancy
Despite the recent success of stochastic gradient descent in deep learni...
research ∙ 10/03/2022
A Non-monotonic Self-terminating Language Model
Recent large-scale neural autoregressive sequence models have shown impr...
research ∙ 09/25/2019
Mixout: Effective Regularization to Finetune Large-scale Pretrained Language Models
In natural language processing, it has been observed recently that gener...
research ∙ 09/29/2018