Despite their impressive capabilities, large language models (LLMs) are ...
Reward design is a fundamental, yet challenging aspect of practical rein...
Large Language Models (LLMs) have shown remarkable proficiency in follow...
Meetings play a critical infrastructural role in the coordination of wor...
Transformer models have achieved remarkable results in various natural l...
Summarizing lengthy documents is a common and essential task in our dail...
Large Language Models (LLMs) serve as a powerful Reader in the Retrieve-then...
Based on the remarkable achievements of pre-trained language models in a...
Many applications of text generation such as summarization benefit from ...
Tailoring outputs of large language models, such as ChatGPT, to specific...
We present Prompt Diffusion, a framework for enabling in-context learnin...
Through prompting, large-scale pre-trained models have become more expre...
Diffusion models are powerful, but they require a lot of time and data t...
Prior work has shown that finetuning large language models (LLMs) using ...
Fine-tuning large pre-trained language models on downstream tasks has be...
Large language models (LLMs), such as ChatGPT, are able to generate huma...
We introduce a new framework, Directional Stimulus Prompting, that uses ...
Unsupervised clustering under domain shift (UCDS) studies how to transfe...
Dialogue summarization has recently garnered significant attention due t...
The input and output of most text generation tasks can be transformed to...
Fine-tuning large language models for different tasks can be costly and ...
Layer-wise distillation is a powerful tool to compress large models (i.e...
The information in tables can be an important complement to text, making...
Large Transformer-based models have exhibited superior performance in va...
We introduce GODEL (Grounded Open Dialogue Language Model), a large pre-...
For stable training of generative adversarial networks (GANs), injecting...
Active learning, which effectively collects informative unlabeled data f...
Pre-trained language models have demonstrated superior performance in va...
Model ensemble is a popular approach to produce a low-variance and well-...
Employing a forward Markov diffusion chain to gradually map the data to ...
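For context only: the forward Markov diffusion chain mentioned above is conventionally defined as in standard DDPM-style models. The equations below are a generic illustration with assumed notation (noise schedule $\beta_t$, cumulative product $\bar\alpha_t$), not text from this abstract or the paper's own formulation.

$$q(x_t \mid x_{t-1}) = \mathcal{N}\!\big(x_t;\ \sqrt{1-\beta_t}\,x_{t-1},\ \beta_t \mathbf{I}\big), \qquad q(x_t \mid x_0) = \mathcal{N}\!\big(x_t;\ \sqrt{\bar\alpha_t}\,x_0,\ (1-\bar\alpha_t)\mathbf{I}\big), \quad \bar\alpha_t = \prod_{s=1}^{t}(1-\beta_s).$$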
Token-mixing multi-layer perceptron (MLP) models have shown competitive ...
Recent research has shown the existence of significant redundancy in lar...
Most of today's AI systems focus on using self-attention mechanisms and ...
This paper presents a new pre-trained language model, DeBERTaV3, which i...
Adversarial regularization can improve model generalization in many natu...
The Lottery Ticket Hypothesis suggests that an over-parametrized network...
Adversarial training has been shown to improve the generalization perfor...
Existing curriculum learning approaches to Neural Machine Translation (N...
Multi-step off-policy reinforcement learning has achieved great success....
Current open-domain question answering (QA) systems often follow a Retri...
To date, most recent work under the retrieval-reader framework for op...
We review the EfficientQA competition from NeurIPS 2020. The competition...
Conventional sparse retrieval methods such as TF-IDF and BM25 are simple...
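For reference, a standard textbook form of the BM25 scoring function named above (generic formulation with assumed free parameters $k_1$ and $b$; not taken from this paper):

$$\mathrm{BM25}(D, Q) = \sum_{q_i \in Q} \mathrm{IDF}(q_i)\cdot \frac{f(q_i, D)\,(k_1 + 1)}{f(q_i, D) + k_1\!\left(1 - b + b\,\frac{|D|}{\mathrm{avgdl}}\right)}$$

where $f(q_i, D)$ is the frequency of term $q_i$ in document $D$, $|D|$ is the document length, and $\mathrm{avgdl}$ is the average document length in the collection.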
Recent progress in pre-trained neural language models has significantly ...
In this work, we aim at equipping pre-trained language models with struc...
Generalization and robustness are both key desiderata for designing mach...
We present MT-DNN, an open-source natural language understanding (NLU) t...
Transfer learning has fundamentally changed the landscape of natural lan...
In this work, we present X-SQL, a new network architecture for the probl...
The learning rate warmup heuristic achieves remarkable success in stabil...