Towards Using Machine Translation Techniques to Induce Multilingual Lexica of Discourse Markers

03/31/2015
by António Lopes, et al.

Discourse markers are universal linguistic events subject to language variation. Although an extensive literature has reported language-specific traits of these events, little has been said about their cross-language behavior or about building multilingual lexica of discourse markers. This work describes new methods and approaches for the description, classification, and annotation of discourse markers in the specific domain of the Europarl corpus. Studying discourse markers in the context of translation is crucial because these structures are idiomatic. Multilingual lexica, together with a functional analysis of such structures, are useful tools for the hard task of translating discourse markers into possible equivalents in another language. Starting from Daniel Marcu's validated discourse markers for English, extracted from the Brown Corpus, we aim to build multilingual lexica of discourse markers for other languages using machine translation techniques. The major assumption of this study is that the usage of a discourse marker is independent of the language, i.e., the rhetorical function of a discourse marker in a sentence in one language is equivalent to the rhetorical function of the corresponding discourse marker in another language.
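
The abstract does not say which machine translation techniques are used, so the following is only a minimal sketch of one plausible reading: look up a seed list of English discourse markers in a phrase table learned from a parallel corpus and keep the most probable target-language candidates. The entries in `PHRASE_TABLE`, the choice of Portuguese as target language, the function name `induce_lexicon`, and the `min_prob` threshold are all assumptions made for illustration; none of them come from the paper or from Europarl.

```python
from collections import defaultdict

# Hypothetical toy phrase-table entries: (English phrase, Portuguese phrase, probability).
# The values are invented for illustration; they are not real Europarl statistics.
PHRASE_TABLE = [
    ("however", "no entanto", 0.52),
    ("however", "contudo", 0.31),
    ("however", "mas", 0.04),
    ("for example", "por exemplo", 0.88),
    ("for example", "exemplo", 0.02),
    ("parliament", "parlamento", 0.95),
]

# Seed list of English discourse markers (a stand-in for a validated inventory).
ENGLISH_MARKERS = {"however", "for example"}


def induce_lexicon(phrase_table, markers, min_prob=0.1):
    """Map each seed marker to its target-language candidates, best first.

    Entries whose source phrase is not a seed marker, or whose probability
    falls below min_prob, are discarded.
    """
    lexicon = defaultdict(list)
    for source, target, prob in phrase_table:
        if source in markers and prob >= min_prob:
            lexicon[source].append((target, prob))
    # Sort each candidate list by descending probability.
    for candidates in lexicon.values():
        candidates.sort(key=lambda pair: pair[1], reverse=True)
    return dict(lexicon)


lexicon = induce_lexicon(PHRASE_TABLE, ENGLISH_MARKERS)
for marker, candidates in sorted(lexicon.items()):
    print(marker, "->", candidates)
```

The threshold filters out low-probability pairs such as a partial match on a multiword marker, and candidates that survive would still need the functional analysis described above before being accepted as equivalents.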


