Binary and Multitask Classification Model for Dutch Anaphora Resolution: Die/Dat Prediction

01/09/2020
by Liesbeth Allein, et al.

The correct use of the Dutch pronouns 'die' and 'dat' is a stumbling block for both native and non-native speakers of Dutch because the pronouns serve many syntactic functions and depend on the gender and number of the antecedent. Drawing on previous research on neural context-dependent dt-mistake correction models (Heyman et al. 2018), this study constructs the first neural network model for Dutch demonstrative and relative pronoun resolution that specifically focuses on the correction and part-of-speech prediction of these two pronouns. Two separate datasets are built with sentences obtained from, respectively, the Dutch Europarl corpus (Koehn 2015) - which contains the proceedings of the European Parliament from 1996 to the present - and the SoNaR corpus (Oostdijk et al. 2013) - which contains Dutch texts from a variety of domains such as newspapers, blogs and legal texts. Firstly, a binary classification model solely predicts the correct 'die' or 'dat'. The classifier with a bidirectional long short-term memory architecture achieves 84.56% accuracy. Secondly, a multitask classification model simultaneously predicts the correct 'die' or 'dat' and its part-of-speech tag. The model that combines a sentence encoder and a context encoder, both with a bidirectional long short-term memory architecture, achieves 88.63% accuracy for die/dat prediction and 87.73% accuracy for part-of-speech prediction. More evenly-balanced data, larger word embeddings, an extra bidirectional long short-term memory layer and integrated part-of-speech knowledge positively affect die/dat prediction performance, while a context encoder architecture raises part-of-speech prediction performance. This study shows promising results and can serve as a starting point for future research on machine learning models for Dutch anaphora resolution.
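As a rough illustration of the binary task described above, the sketch below turns a sentence into training examples by hiding each occurrence of 'die' or 'dat' and keeping the hidden word as the label. It is not the authors' preprocessing pipeline: the placeholder token `<PREDICT>`, the naive regular-expression tokenizer, and the function names `tokenize` and `make_examples` are all assumptions made for this example.

```python
import re
from typing import List, Tuple

# Hypothetical placeholder token; the abstract does not say how the
# target pronoun is hidden from the classifier.
MASK = "<PREDICT>"
TARGETS = {"die", "dat"}


def tokenize(sentence: str) -> List[str]:
    """Naive tokenizer (an assumption): words and single punctuation marks."""
    return re.findall(r"\w+|[^\w\s]", sentence)


def make_examples(sentence: str) -> List[Tuple[List[str], str]]:
    """Return one (masked_tokens, label) pair per 'die'/'dat' occurrence.

    Only the occurrence being predicted is masked; other occurrences in
    the same sentence stay visible as context.
    """
    tokens = tokenize(sentence)
    examples = []
    for i, tok in enumerate(tokens):
        if tok.lower() in TARGETS:
            masked = tokens[:i] + [MASK] + tokens[i + 1:]
            examples.append((masked, tok.lower()))
    return examples


examples = make_examples("Het boek dat ik lees en de film die ik zag.")
for masked, label in examples:
    print(" ".join(masked), "->", label)
```

A multitask variant along the lines of the abstract would attach a second label, the part-of-speech tag of the hidden pronoun, to each pair. That tag would have to come from an annotated corpus, so it is left out of this sketch.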

