Like a Good Nearest Neighbor: Practical Content Moderation with Sentence Transformers

02/17/2023
by Luke Bates, et al.

Modern text classification systems have impressive capabilities but are infeasible to deploy and use reliably due to their dependence on prompting and billion-parameter language models. SetFit (Tunstall et al., 2022) is a recent, practical approach that fine-tunes a Sentence Transformer under a contrastive learning paradigm and achieves similar results to more unwieldy systems. Text classification is important for addressing domain drift in the detection of harmful content, a problem that plagues all social media platforms. Here, we propose Like a Good Nearest Neighbor (LaGoNN), an inexpensive modification to SetFit that requires no additional parameters or hyperparameters. Instead, it modifies each input with information about its nearest neighbor in the training data, for example the neighbor's label and text, so that novel data appear similar to an instance on which the model was optimized. LaGoNN is effective at the task of detecting harmful content and generally improves performance compared to SetFit. To demonstrate the value of our system, we conduct a thorough study of text classification systems in the context of content moderation under four label distributions.
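
To make the input modification concrete, here is a minimal sketch in Python of appending a nearest neighbor's label and text to a new input. It is an illustration under stated assumptions, not the authors' implementation: the bag-of-words `embed` function is a toy stand-in for a Sentence Transformer encoder, and the function name `augment_with_neighbor`, the `[SEP]` separator, and the ordering of the appended fields are hypothetical choices that the abstract does not specify.

```python
import math
from collections import Counter


def embed(text):
    """Toy bag-of-words vector (assumption: stands in for a Sentence Transformer)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a if token in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def augment_with_neighbor(text, train_texts, train_labels, sep=" [SEP] "):
    """Append the label and text of the most similar training instance.

    The separator and field order are hypothetical, chosen for illustration.
    """
    query = embed(text)
    sims = [cosine(query, embed(t)) for t in train_texts]
    best = max(range(len(train_texts)), key=sims.__getitem__)
    return f"{text}{sep}{train_labels[best]}{sep}{train_texts[best]}"


# Tiny hypothetical training set with two labeled instances.
train_texts = ["you are a wonderful person", "you are a stupid idiot"]
train_labels = ["neutral", "harmful"]

augmented = augment_with_neighbor(
    "what a stupid idiot you are", train_texts, train_labels
)
print(augmented)
# what a stupid idiot you are [SEP] harmful [SEP] you are a stupid idiot
```

The augmented string, rather than the raw input, would then be passed to the classifier, which is how a novel input can be made to resemble an instance seen during training. No parameters are added to the model; only the input text changes.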


research
08/25/2017

k-Nearest Neighbor Augmented Neural Networks for Text Classification

In recent years, many deep-learning based models are proposed for text c...
research
12/05/2022

Improving Few-Shot Performance of Language Models via Nearest Neighbor Calibration

Pre-trained language models (PLMs) have exhibited remarkable few-shot le...
research
11/04/2019

Metric Learning for Dynamic Text Classification

Traditional text classifiers are limited to predicting over a fixed set ...
research
12/27/2017

Improving Text Normalization by Optimizing Nearest Neighbor Matching

Text normalization is an essential task in the processing and analysis o...
research
09/13/2022

Non-Parametric Temporal Adaptation for Social Media Topic Classification

User-generated social media data is constantly changing as new trends in...
research
11/30/2022

Task-Specific Embeddings for Ante-Hoc Explainable Text Classification

Current state-of-the-art approaches to text classification typically lev...
