An Inclusive Notion of Text

11/10/2022
by Ilia Kuznetsov, et al.

Natural language processing researchers develop models of grammar, meaning and human communication based on written text. Due to task and data differences, what is considered text can vary substantially across studies. A conceptual framework for systematically capturing these differences is lacking. We argue that clarity on the notion of text is crucial for reproducible and generalizable NLP. Towards that goal, we propose common terminology to discuss the production and transformation of textual data, and introduce a two-tier taxonomy of linguistic and non-linguistic elements that are available in textual sources and can be used in NLP modeling. We apply this taxonomy to survey existing work that extends the notion of text beyond the conservative language-centered view. We outline key desiderata and challenges of the emerging inclusive approach to text in NLP, and suggest systematic community-level reporting as a crucial next step to consolidate the discussion.


