Splitting Compounds by Semantic Analogy

09/15/2015
by Joachim Daiber, et al.

Compounding is a highly productive word-formation process in some languages, and it is often problematic for natural language processing applications. In this paper, we investigate whether distributional semantics in the form of word embeddings can enable a deeper, i.e., more knowledge-rich, processing of compounds than the standard string-based methods. We present an unsupervised approach that exploits regularities in the semantic vector space (based on analogies such as "bookshop is to shop as bookshelf is to shelf") to produce compound analyses of high quality. A subsequent compound splitting algorithm based on these analyses is highly effective, particularly for ambiguous compounds. German-to-English machine translation experiments show that this semantic analogy-based compound splitter leads to better translations than a commonly used frequency-based method.
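
The analogy "bookshop is to shop as bookshelf is to shelf" can be read as a statement about vector offsets: the difference between a compound and its head should point in roughly the same direction for both pairs. The following is a minimal sketch of that idea only, not the paper's actual algorithm. The three-dimensional vectors in `EMB` are hand-built toy values, and the names `offset`, `cosine`, and `analogy_score` are hypothetical helpers introduced for illustration.

```python
import math

# Hypothetical toy embeddings, hand-built for illustration only.
# Real word embeddings would be learned from a corpus and have
# hundreds of dimensions.
EMB = {
    "shop": (1.0, 0.0, 0.0),
    "shelf": (0.0, 1.0, 0.0),
    "bookshop": (1.0, 0.0, 1.0),
    "bookshelf": (0.0, 1.0, 1.0),
    "bishop": (0.0, 0.5, -1.0),  # ends in "shop" but is not a compound
}


def offset(compound, head):
    """Vector difference between a compound and its candidate head."""
    return tuple(c - h for c, h in zip(EMB[compound], EMB[head]))


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def analogy_score(prototype, candidate):
    """Compare the compound-minus-head offsets of two (compound, head) pairs.

    A score near 1 means the candidate pair behaves like the prototype
    pair in the vector space; a low or negative score means it does not.
    """
    return cosine(offset(*prototype), offset(*candidate))


prototype = ("bookshop", "shop")
good = analogy_score(prototype, ("bookshelf", "shelf"))
bad = analogy_score(prototype, ("bishop", "shop"))
print(f"bookshelf/shelf: {good:.3f}")
print(f"bishop/shop:     {bad:.3f}")
```

With these toy vectors the genuine compound scores high and the string-level lookalike scores low, which is the kind of signal a purely string-based or frequency-based splitter cannot see.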

research
07/09/2018

Predicting Concreteness and Imageability of Words Within and Across Languages via Word Embeddings

The notions of concreteness and imageability, traditionally important in...
research
05/09/2017

Word and Phrase Translation with word2vec

Word and phrase tables are key inputs to machine translations, but costl...
research
09/03/2018

Affordance Extraction and Inference based on Semantic Role Labeling

Common-sense reasoning is becoming increasingly important for the advanc...
research
12/08/2016

Embedding Words and Senses Together via Joint Knowledge-Enhanced Training

Word embeddings are widely used in Natural Language Processing, mainly d...
research
07/11/2016

Mapping distributional to model-theoretic semantic spaces: a baseline

Word embeddings have been shown to be useful across state-of-the-art sys...
research
03/21/2020

A Joint Approach to Compound Splitting and Idiomatic Compound Detection

Applications such as machine translation, speech recognition, and inform...
research
09/14/2022

vec2text with Round-Trip Translations

We investigate models that can generate arbitrary natural language text ...