Analyzing the Use of Influence Functions for Instance-Specific Data Filtering in Neural Machine Translation

10/24/2022
by Tsz Kin Lam, et al.

Customer feedback can be an important signal for improving commercial machine translation systems. One way to fix a specific translation error is to remove the erroneous training instances related to it and then re-train the machine translation system, which we refer to as instance-specific data filtering. Influence functions (IF) have been shown to be effective at finding such relevant training examples for classification tasks such as image classification, toxic speech detection, and entailment. Given a probing instance, IF find influential training examples by measuring the similarity between the probing instance and a set of training examples in gradient space. In this work, we examine the use of influence functions for Neural Machine Translation (NMT). We propose two effective extensions to a state-of-the-art influence function and demonstrate, on the sub-problem of copied training examples, that IF can be applied more generally than handcrafted regular expressions.
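The idea of comparing a probing instance with training examples in gradient space can be illustrated with a minimal sketch. It is not the influence function examined in the paper, nor either of the two proposed extensions. It assumes a toy linear model with squared-error loss and scores each training example by the plain dot product of its loss gradient with the probe's gradient. Influence functions in the literature typically also involve an inverse-Hessian term, which is omitted here. All names (`loss_gradient`, `rank_by_gradient_similarity`) are hypothetical.

```python
from typing import List, Sequence, Tuple

Example = Tuple[Sequence[float], float]  # (features, target)


def loss_gradient(weights: Sequence[float], features: Sequence[float],
                  target: float) -> List[float]:
    """Gradient of 0.5 * (w . x - y)^2 with respect to w (toy model, an assumption)."""
    residual = sum(w * x for w, x in zip(weights, features)) - target
    return [residual * x for x in features]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Plain dot product, used here as the gradient-space similarity."""
    return sum(x * y for x, y in zip(a, b))


def rank_by_gradient_similarity(weights: Sequence[float], probe: Example,
                                train: Sequence[Example]) -> List[Tuple[int, float]]:
    """Return (training index, score) pairs, highest gradient similarity first."""
    probe_grad = loss_gradient(weights, probe[0], probe[1])
    scored = [
        (index, dot(probe_grad, loss_gradient(weights, features, target)))
        for index, (features, target) in enumerate(train)
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    weights = [1.0, 0.0]
    probe = ([1.0, 0.0], 3.0)
    train = [([1.0, 0.0], 3.0), ([0.0, 1.0], 1.0), ([1.0, 0.0], -1.0)]
    for index, score in rank_by_gradient_similarity(weights, probe, train):
        print(index, score)
```

In an instance-specific filtering setting, the highest-scoring training examples would be the candidates for removal before re-training. How many to remove, and how to score them for a sequence-to-sequence model, is left open in this sketch.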

research
06/19/2018

Learning from Chunk-based Feedback in Neural Machine Translation

We empirically investigate learning from partial feedback in neural mach...
research
03/25/2020

RelatIF: Identifying Explanatory Training Examples via Relative Influence

In this work, we focus on the use of influence functions to identify rel...
research
02/27/2023

Make Every Example Count: On Stability and Utility of Self-Influence for Learning from Noisy NLP Datasets

Increasingly larger datasets have become a standard ingredient to advanc...
research
11/08/2021

Revisiting Methods for Finding Influential Examples

Several instance-based explainability methods for finding influential tr...
research
11/29/2021

A General Framework for Defending Against Backdoor Attacks via Influence Graph

In this work, we propose a new and general framework to defend against b...
research
11/22/2019

Optimizing Data Usage via Differentiable Rewards

To acquire a new skill, humans learn better and faster if a tutor, based...
research
12/08/2020

Efficient Estimation of Influence of a Training Instance

Understanding the influence of a training instance on a neural network m...
