Not Enough Labeled Data? Just Add Semantics: A Data-Efficient Method for Inferring Online Health Texts

09/18/2023
by Joseph Gatto, et al.

User-generated texts on the web and social platforms are often long and semantically challenging, making them difficult to annotate. Obtaining human annotation becomes increasingly difficult as problem domains grow more specialized; many health NLP problems, for example, require domain experts in the annotation pipeline. It is therefore crucial to develop low-resource NLP solutions that can address such limited-data problems. In this study, we employ Abstract Meaning Representation (AMR) graphs to model low-resource health NLP tasks sourced from various online health resources and communities. AMRs are well suited to modeling online health texts, as they can represent multi-sentence inputs, abstract away from complex terminology, and model long-distance relationships between co-referring tokens. AMRs thus improve the ability of pre-trained language models to reason about high-complexity texts. Our experiments show that augmenting text embeddings with semantic graph embeddings improves performance on six low-resource health NLP tasks. Our approach is task-agnostic and easy to merge into any standard text classification pipeline. We experimentally validate that AMRs are useful for modeling complex texts by analyzing performance through the lens of two textual complexity measures: the Flesch-Kincaid Reading Level and syntactic complexity. Our error analysis shows that AMR-infused language models perform better on complex texts and generally show less predictive variance as text complexity changes.
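The fusion step described above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: it assumes a text embedding from a pre-trained language model and a graph embedding from an AMR encoder, and fuses them by simple concatenation before a classifier head. All function names here are illustrative assumptions.

```python
# Minimal sketch: augment a text embedding with a semantic graph embedding
# by concatenation, then score the joint vector with a toy linear head.
# Names and dimensions are hypothetical, for illustration only.

def fuse_embeddings(text_emb, graph_emb):
    """Concatenate a text embedding with an AMR graph embedding."""
    return list(text_emb) + list(graph_emb)

def linear_score(fused, weights, bias=0.0):
    """Toy linear classifier head over the fused representation."""
    return sum(f * w for f, w in zip(fused, weights)) + bias

text_emb = [0.2, -0.1, 0.4]   # e.g. pooled output of a pre-trained LM
graph_emb = [0.5, 0.3]        # e.g. output of an AMR graph encoder
fused = fuse_embeddings(text_emb, graph_emb)
assert len(fused) == len(text_emb) + len(graph_emb)
```

Because the fusion happens at the embedding level, the same pattern drops into any standard text classification pipeline regardless of the downstream task, which is what makes the approach task-agnostic.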
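One of the two complexity measures used in the error analysis, the Flesch-Kincaid Grade Level, is a standard published formula over word, sentence, and syllable counts. The sketch below computes it from raw counts; how those counts are obtained (tokenization, syllable estimation) is left to whatever tooling the reader prefers.

```python
def flesch_kincaid_grade(words, sentences, syllables):
    """Flesch-Kincaid Grade Level from raw text statistics.

    Higher values indicate text that requires a higher U.S. school
    grade level to read.
    """
    return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59

# Example: 10 words, 1 sentence, 15 syllables
# -> 0.39 * 10 + 11.8 * 1.5 - 15.59 = 6.01
grade = flesch_kincaid_grade(words=10, sentences=1, syllables=15)
assert abs(grade - 6.01) < 1e-6
```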


