A Comparative Study of Feature Types for Age-Based Text Classification

09/24/2020
by Anna Glazkova, et al.

The ability to automatically determine the intended age group of a novel's readers opens up many opportunities for developing information retrieval tools. First, developers of book recommendation systems and electronic libraries may want to filter texts by the age of their most likely readers. Second, parents may want to select literature for their children. Finally, writers and publishers would benefit from knowing which features influence whether a text is suitable for children. In this article, we compare the empirical effectiveness of various types of linguistic features for age-based classification of fiction texts. For this purpose, we collected a corpus of book previews, each labeled with one of two categories: children's or adult. We evaluated the following types of features: readability indices, sentiment, lexical, grammatical, and general features, and publishing attributes. The results show that features describing the text at the document level can significantly improve the quality of machine learning models.
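To make the idea of document-level features more concrete, the following is a minimal sketch of how a few general text statistics might be computed for one book preview. The function name `document_features` and the three chosen statistics (average sentence length, average word length, type-token ratio) are illustrative assumptions; the abstract does not specify the exact feature set or implementation used in the study.

```python
import re


def document_features(text: str) -> dict:
    """Compute a few simple document-level statistics (illustrative only).

    Assumption: these are NOT the features from the paper, just common
    examples of "general" text features of the kind the abstract mentions.
    """
    # Naive sentence split on runs of ., ! or ? (no abbreviation handling).
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    # Words are runs of Unicode letters; apostrophes and digits split tokens.
    words = re.findall(r"[^\W\d_]+", text.lower())

    # Guard against division by zero on empty input.
    n_sentences = max(len(sentences), 1)
    n_words = max(len(words), 1)

    return {
        "avg_sentence_length": len(words) / n_sentences,  # words per sentence
        "avg_word_length": sum(len(w) for w in words) / n_words,  # letters per word
        "type_token_ratio": len(set(words)) / n_words,  # lexical diversity
    }


if __name__ == "__main__":
    sample = "The cat sat. The cat ran!"
    for name, value in document_features(sample).items():
        print(f"{name}: {value:.3f}")
```

A vector of such values, computed once per preview, could then be passed to any standard classifier alongside other feature types. The tokenization here is deliberately crude and would likely need a proper tokenizer for real corpora.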

