belabBERT: a Dutch RoBERTa-based language model applied to psychiatric classification

by Joppe Wouts, et al.

Natural language processing (NLP) is becoming an important means for automatic recognition of human traits and states, such as intoxication, presence of psychiatric disorders, presence of airway disorders and states of stress. Such applications have the potential to be an important pillar for online help lines, and may gradually be introduced into eHealth modules. However, NLP is language specific, and for languages such as Dutch, NLP models are scarce. As a result, recent Dutch NLP models capture long-range semantic dependencies across sentences poorly. To overcome this, we present belabBERT, a new Dutch language model extending the RoBERTa architecture. belabBERT is trained on a large Dutch corpus (+32 GB) of web-crawled texts. We applied belabBERT to the classification of psychiatric illnesses. First, we evaluated the strength of text-based classification using belabBERT and compared the results to the existing RobBERT model. Then, we compared the performance of belabBERT to audio classification for psychiatric disorders. Finally, we briefly explored extending the framework to a hybrid text- and audio-based classification. Our results show that belabBERT outperformed the current best text classification network for Dutch, RobBERT. belabBERT also outperformed classification based on audio alone.




