Named entity recognition (NER) is a crucial task for online advertisemen...
Transformer models have achieved superior performance in various natural...
Layer-wise distillation is a powerful tool to compress large models (i.e...
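The core idea of layer-wise distillation can be sketched as matching the hidden states of selected teacher layers against the student's layers. Below is a minimal, framework-free sketch using a mean-squared-error objective; the function name and the assumption that the student has one layer per selected teacher layer (with matched dimensions) are illustrative, not the method from this abstract.

```python
def layerwise_distill_loss(teacher_layers, student_layers):
    """Mean squared error between matched teacher/student hidden states.

    Each argument is a list of layer outputs (lists of floats). This sketch
    assumes the student has one layer per selected teacher layer and that
    hidden dimensions already match (real systems often insert a learned
    projection when they do not).
    """
    assert len(teacher_layers) == len(student_layers)
    total, count = 0.0, 0
    for t, s in zip(teacher_layers, student_layers):
        for tv, sv in zip(t, s):
            total += (tv - sv) ** 2
            count += 1
    return total / count
```

In practice this layer-matching term is added to the usual task loss and soft-label distillation loss with a tunable weight.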
E-commerce queries are often short and ambiguous. Consequently, query un...
Graph neural network (GNN) pre-training methods have been proposed to en...
Point process models are of great importance in real world applications....
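A standard worked example of a point process is the Hawkes process, whose conditional intensity rises after every past event and decays exponentially. The sketch below is generic background, not the model from this abstract; the parameter values are illustrative defaults.

```python
import math

def hawkes_intensity(t, history, mu=0.2, alpha=0.8, beta=1.0):
    """Conditional intensity of a univariate Hawkes process at time t:

        lambda(t) = mu + sum over past events t_i < t of
                    alpha * exp(-beta * (t - t_i))

    mu is the baseline rate, alpha the excitation jump per event, and
    beta the decay rate; values here are illustrative.
    """
    return mu + sum(
        alpha * math.exp(-beta * (t - ti)) for ti in history if ti < t
    )
```

Self-excitation is visible directly: the intensity just after an event exceeds the baseline, then relaxes back toward `mu`.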
Large Transformer-based models have exhibited superior performance in va...
Pre-trained language models have demonstrated superior performance in va...
Recent research has shown the existence of significant redundancy in lar...
Sparsely activated models (SAMs), such as Mixture-of-Experts (MoE), can ...
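The routing step that makes MoE models sparsely activated can be sketched in a few lines: a gating network scores all experts, but only the top-scoring expert runs for a given input. This is a generic top-1 gating sketch with illustrative names, not the routing scheme of any particular paper.

```python
import math

def top1_route(x, gate_weights):
    """Pick the single expert with the highest gating score for input x.

    gate_weights holds one weight vector per expert; the score is a dot
    product, and the returned probability is the softmax mass of the
    winning expert. Only that expert's parameters would be activated.
    """
    scores = [sum(w_i * x_i for w_i, x_i in zip(w, x)) for w in gate_weights]
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    best = max(range(len(scores)), key=lambda i: scores[i])
    return best, exps[best] / z
```

Because only one expert's weights are touched per token, compute grows much more slowly than parameter count.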
Self-training achieves enormous success in various semi-supervised and w...
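The basic self-training loop can be sketched as confidence-thresholded pseudo-labeling: the current model labels unlabeled data, and only high-confidence predictions are added back to the training set. Function names and the threshold value below are illustrative assumptions.

```python
def pseudo_label(unlabeled, classify, threshold=0.9):
    """Select confident model predictions on unlabeled data as new labels.

    classify(x) -> (label, confidence). Examples below the confidence
    threshold are skipped, so only high-confidence pseudo-labels augment
    the training set before the model is retrained.
    """
    selected = []
    for x in unlabeled:
        label, conf = classify(x)
        if conf >= threshold:
            selected.append((x, label))
    return selected
```

Retraining on the union of labeled and pseudo-labeled data, then repeating, is the standard iteration; noisy pseudo-labels are the usual failure mode the threshold guards against.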
Adversarial regularization can improve model generalization in many natu...
The Lottery Ticket Hypothesis suggests that an over-parametrized network...
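The pruning step behind lottery-ticket experiments is usually plain magnitude pruning: zero out the smallest-magnitude weights and keep the rest. The sketch below builds such a binary mask; it is a generic illustration with assumed names, not the procedure of any specific paper.

```python
def magnitude_mask(weights, sparsity):
    """Binary mask keeping the largest-magnitude weights.

    weights: flat list of floats; sparsity: fraction of weights to prune
    (0..1). In lottery-ticket style training, the surviving weights would
    then be rewound to their initialization values and retrained.
    """
    k = int(len(weights) * sparsity)  # number of weights to prune
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned = set(order[:k])  # indices of the smallest-magnitude weights
    return [0 if i in pruned else 1 for i in range(len(weights))]
```

Iterating prune-rewind-retrain several times, rather than pruning once, is what typically uncovers the sparse "winning ticket" subnetwork.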
Adversarial training has been shown to improve the generalization perfor...
We consider a regression problem, where the correspondence between input...
Fine-tuned pre-trained language models (LMs) achieve enormous success in...
Modern data acquisition routinely produces massive amounts of event seque...
There have long been debates on how we could interpret neural networks an...