On Compositionality and Improved Training of NADO

06/20/2023
by Sidi Lu, et al.

NeurAlly-Decomposed Oracle (NADO) is a powerful approach for controllable generation with large language models. Unlike finetuning or prompt tuning, it can avoid catastrophic forgetting in the large base model and is guaranteed to converge to an entropy-maximized closed-form solution without significantly limiting model capacity. Despite its success, several challenges arise when applying NADO to more complex scenarios. First, the best practice for composing multiple control signals with NADO is under-explored. Second, vanilla NADO suffers from vanishing gradients for low-probability control signals and relies heavily on the forward-consistency regularization. In this paper, we study these challenges both theoretically and empirically. We show that, with a certain practice, NADO achieves guaranteed compositional generalization, and we propose a novel alternative parameterization of NADO that guarantees forward-consistency by construction. We evaluate the improved training scheme, NADO++, on CommonGen. Results show that NADO++ improves the effectiveness of the algorithm in multiple aspects.
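For context, the closed-form solution and the forward-consistency condition referenced above can be sketched as follows (notation assumed from the original NADO paper rather than defined in this abstract): p is the base language model, C a boolean control signal, and R^C the neural approximation of the oracle probability that a prefix can be completed so that C is satisfied.

\[
  q^{*}(x_i \mid x_{<i}) \;\propto\; p(x_i \mid x_{<i})\,\frac{R^{C}(x_{\le i})}{R^{C}(x_{<i})}
\]
\[
  \text{forward consistency:}\qquad R^{C}(x_{<i}) \;=\; \sum_{x_i} p(x_i \mid x_{<i})\, R^{C}(x_{\le i})
\]

Vanilla NADO enforces the second identity only through a regularization term; the alternative parameterization proposed here is designed so that it holds exactly by construction.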


