Learn to Talk via Proactive Knowledge Transfer

08/23/2020
by Qing Sun, et al.

Knowledge Transfer has been applied to a wide variety of problems. For example, knowledge can be transferred between tasks (e.g., learning to handle novel situations by leveraging prior knowledge) or between agents (e.g., learning from others without direct experience). Without loss of generality, we relate knowledge transfer to KL-divergence minimization, i.e., matching the (belief) distributions of learners and teachers. This equivalence gives us a new perspective on variants of the KL-divergence: we can look at how learners structure their interaction with teachers in order to acquire knowledge. In this paper, we provide an in-depth analysis of KL-divergence minimization in the Forward and Backward orders, which shows that in the Backward order learners are trained via on-policy reinforcement, whereas in the Forward order they are supervised. Moreover, our analysis is gradient-based, so it generalizes to arbitrary tasks and helps decide which order to minimize given the properties of the task. By replacing Forward with Backward in Knowledge Distillation, we observed +0.7-1.1 BLEU gains on the WMT'17 De-En and IWSLT'15 Th-En machine translation tasks.
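The supervised-vs-on-policy distinction can be seen directly in the gradients of the two KL orders for a categorical learner. The sketch below is illustrative, not the authors' implementation: for a fixed teacher distribution p and learner q parameterized by softmax logits, the Forward gradient is the familiar cross-entropy form q - p (an expectation under the teacher, i.e., supervision), while the Backward gradient weights each token by the learner's own probability q times a reward-like term log q - log p minus its mean (an expectation under the learner, i.e., on-policy).

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

# Fixed teacher distribution p and learner logits theta over a toy vocabulary.
p = np.array([0.6, 0.3, 0.1])
theta = np.zeros(3)
q = softmax(theta)

# Forward KL(p || q): gradient w.r.t. logits is q - p.
# The expectation is under the teacher p, so this is supervised learning
# (cross-entropy against teacher targets).
grad_forward = q - p

# Backward KL(q || p): gradient w.r.t. logits is
#   q * (log q - log p - KL(q || p)).
# The expectation is under the learner q (on-policy); the subtracted KL term
# plays the role of a baseline on the "reward" log q - log p.
kl_qp = np.sum(q * (np.log(q) - np.log(p)))
grad_backward = q * (np.log(q) - np.log(p) - kl_qp)

print(grad_forward, grad_backward)
```

Both gradients sum to zero over the vocabulary (a property of softmax parameterizations), but they differ in which distribution supplies the samples: Forward pushes probability toward everything the teacher supports, while Backward only adjusts tokens the learner itself would produce.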


Related research

- 08/14/2022, A Theory for Knowledge Transfer in Continual Learning: Continual learning of a stream of tasks is an active area in deep neural...
- 11/04/2020, Independent Gaussian Distributions Minimize the Kullback-Leibler (KL) Divergence from Independent Gaussian Distributions: This short note is on a property of the Kullback-Leibler (KL) divergence...
- 05/19/2021, Comparing Kullback-Leibler Divergence and Mean Squared Error Loss in Knowledge Distillation: Knowledge distillation (KD), transferring knowledge from a cumbersome te...
- 12/09/2020, On Knowledge Distillation for Direct Speech Translation: Direct speech translation (ST) has shown to be a complex task requiring ...
- 05/13/2021, Empirical Evaluation of Biased Methods for Alpha Divergence Minimization: In this paper we empirically evaluate biased methods for alpha-divergenc...
- 06/08/2019, Forward and Backward Knowledge Transfer for Sentiment Classification: This paper studies the problem of learning a sequence of sentiment class...
