
Learning Word Embeddings with Domain Awareness

by Guoyin Wang et al.
Duke University

Word embeddings are traditionally trained on a large corpus in an unsupervised setting, with no specific mechanism for incorporating domain knowledge. This can lead to unsatisfactory performance when the training data originate from heterogeneous domains. In this paper, we propose two novel mechanisms for domain-aware word embedding training, namely the domain indicator and domain attention, which integrate domain-specific knowledge into the widely used skip-gram (SG) and continuous bag-of-words (CBOW) models, respectively. Both methods follow a joint learning paradigm and ensure that words from the target domain receive focused treatment even when training on a source-domain corpus. Qualitative and quantitative evaluations confirm the validity and effectiveness of our models. Compared to baseline methods, our approach is particularly effective in near-cold-start scenarios.
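The abstract does not give implementation details, but the domain-indicator idea can be sketched as a CBOW-style variant in which a learned domain embedding is averaged in as one extra context vector when predicting each target word. The toy corpus, domain labels, hyperparameters, and variable names below are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy corpus of (sentence, domain) pairs -- illustrative only.
corpus = [
    ("the patient received a dose of medicine".split(), "medical"),
    ("the court issued a ruling on the case".split(), "legal"),
]

vocab = sorted({w for sent, _ in corpus for w in sent})
word2id = {w: i for i, w in enumerate(vocab)}
dom2id = {d: i for i, d in enumerate(sorted({d for _, d in corpus}))}

dim, window, lr = 16, 2, 0.1
W_in = rng.normal(0, 0.1, (len(vocab), dim))    # context (input) embeddings
W_out = rng.normal(0, 0.1, (len(vocab), dim))   # target (output) embeddings
D = rng.normal(0, 0.1, (len(dom2id), dim))      # domain-indicator embeddings

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

losses = []
for epoch in range(50):
    total = 0.0
    for sent, dom in corpus:
        ids, d = [word2id[w] for w in sent], dom2id[dom]
        for pos, target in enumerate(ids):
            ctx = [ids[j] for j in range(max(0, pos - window),
                                         min(len(ids), pos + window + 1))
                   if j != pos]
            # Domain indicator: average the domain embedding in with the context.
            h = (W_in[ctx].sum(0) + D[d]) / (len(ctx) + 1)
            p = softmax(W_out @ h)
            total -= np.log(p[target])
            p[target] -= 1.0                     # d(loss)/d(logits)
            grad_h = W_out.T @ p
            W_out -= lr * np.outer(p, h)
            W_in[ctx] -= lr * grad_h / (len(ctx) + 1)
            D[d] -= lr * grad_h / (len(ctx) + 1)
    losses.append(total)
```

Because the domain embedding is updated jointly with the word embeddings, it absorbs domain-wide signal, letting the same surface word drift toward domain-appropriate neighbors; the paper's attention variant would instead weight context words by their relevance to the domain.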


