Detecting Text Formality: A Study of Text Classification Approaches

04/19/2022
by Daryna Dementieva, et al.

Formality is an important characteristic of text documents. The automatic detection of the formality level of a text is potentially beneficial for various natural language processing tasks, such as retrieval of texts with a desired formality level, integration in language learning and document editing platforms, or evaluation of the desired conversation tone in chatbots. Recently, two large-scale datasets featuring formality annotation were introduced for multiple languages. However, they were primarily used for training style transfer models, although detecting text formality on its own may also be a useful application. This work proposes the first systematic study of formality detection methods based on current (and more classic) machine learning methods and delivers the best-performing models for public usage. We conducted three types of experiments: monolingual, multilingual, and cross-lingual. The study shows that BiLSTM-based models outperform transformer-based ones on the formality classification task. We release formality detection models for several languages that yield state-of-the-art results and possess tested cross-lingual capabilities.

