SimpLex: a lexical text simplification architecture

04/14/2023
by Ciprian-Octavian Truică, et al.

Text simplification (TS) is the process of generating easy-to-understand sentences from a given sentence or piece of text. The aim of TS is to reduce both the lexical complexity (vocabulary complexity and meaning) and the syntactic complexity (sentence structure) of a given text or sentence without losing meaning or nuance. In this paper, we present SimpLex, a novel simplification architecture for generating simplified English sentences. To generate a simplified sentence, the proposed architecture uses either word embeddings (i.e., Word2Vec) and perplexity, or sentence transformers (i.e., BERT, RoBERTa, and GPT2) and cosine similarity. The solution is incorporated into a user-friendly, simple-to-use software tool. We evaluate our system using two metrics: SARI and Perplexity Decrease. Experimentally, we observe that the transformer models outperform the other models in terms of SARI score, whereas the word-embedding-based models achieve the largest decrease in perplexity. The main contributions of this paper are: (1) we propose a new word-embedding-based and transformer-based algorithm for text simplification; (2) we design SimpLex, a novel modular text simplification system that can provide a baseline for further research; and (3) we perform an in-depth analysis of our solution and compare our results with two state-of-the-art models, LightLS [19] and NTS-w2v [44]. We also make the code publicly available online.
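To make the two selection signals named in the abstract more concrete, the following is a minimal sketch, not the authors' implementation. It assumes a hypothetical table of toy word vectors, a mean-of-vectors sentence embedding standing in for a sentence transformer, a hand-supplied candidate list, and toy unigram probabilities standing in for a language model. It picks the substitute that keeps the sentence closest to the original by cosine similarity, then compares perplexity before and after.

```python
import math

# Hypothetical 3-dimensional word vectors (assumption); a real system would
# use Word2Vec vectors or a sentence transformer instead.
TOY_VECTORS = {
    "the": [0.1, 0.0, 0.1],
    "doctor": [0.7, 0.2, 0.1],
    "will": [0.0, 0.1, 0.1],
    "administer": [0.2, 0.9, 0.3],
    "give": [0.2, 0.8, 0.3],
    "eat": [0.9, 0.1, 0.8],
    "medicine": [0.6, 0.5, 0.1],
}

# Hypothetical unigram probabilities (assumption) standing in for a language model.
TOY_UNIGRAM = {
    "the": 0.2, "doctor": 0.05, "will": 0.1, "administer": 0.001,
    "give": 0.05, "eat": 0.04, "medicine": 0.02,
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def embed(tokens):
    """Sentence embedding as the mean of the toy word vectors."""
    dims = len(next(iter(TOY_VECTORS.values())))
    total = [0.0] * dims
    for token in tokens:
        for i, x in enumerate(TOY_VECTORS[token]):
            total[i] += x
    return [x / len(tokens) for x in total]


def perplexity(tokens):
    """Perplexity under the toy unigram model: exp of the mean negative log-probability."""
    log_prob = sum(math.log(TOY_UNIGRAM[t]) for t in tokens)
    return math.exp(-log_prob / len(tokens))


def simplify(tokens, target, candidates):
    """Replace `target` with the candidate that keeps the sentence closest to the original."""
    original = embed(tokens)

    def swap(candidate):
        return [candidate if t == target else t for t in tokens]

    best = max(candidates, key=lambda c: cosine(original, embed(swap(c))))
    return swap(best)


sentence = ["the", "doctor", "will", "administer", "the", "medicine"]
simplified = simplify(sentence, "administer", ["give", "eat"])
print(" ".join(simplified))
print("perplexity decreased:", perplexity(simplified) < perplexity(sentence))
```

In this toy setting, cosine similarity guards against meaning drift, while perplexity rewards more frequent and therefore presumably simpler wording. How SimpLex actually generates candidates and combines these signals is not specified in the abstract.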


