Essentia: Mining Domain-specific Paraphrases with Word-Alignment Graphs

10/01/2019
by Danni Ma, et al.

Paraphrases are important linguistic resources for a wide variety of NLP applications, and many techniques have been proposed for mining them automatically from general corpora. While these techniques succeed at discovering generic paraphrases, they often fail to identify domain-specific ones (e.g., "staff" and "concierge" in the hospitality domain). The reason is that current techniques are mostly statistical, and domain-specific corpora are too small for statistical methods to work reliably. In this paper, we present an unsupervised graph-based technique for mining paraphrases from a small set of sentences that roughly share the same topic or intent. Our system, Essentia, relies on word-alignment techniques to build a word-alignment graph that merges and organizes the tokens of the input sentences. The resulting graph is then used to generate candidate paraphrases. We demonstrate that our system obtains high-quality paraphrases, as evaluated by crowd workers. We further show that the majority of the identified paraphrases are domain-specific and thus complement existing paraphrase databases.
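The idea of aligning sentences that share an intent and reading paraphrase candidates off the positions where they diverge can be illustrated with a minimal sketch. This is not the Essentia implementation: it stands in Python's `difflib.SequenceMatcher` for a real word aligner, compares sentences pairwise instead of building a single merged graph, and uses made-up example sentences. The function name `candidate_paraphrases` is hypothetical.

```python
from difflib import SequenceMatcher
from itertools import combinations


def candidate_paraphrases(sentences):
    """Collect phrase pairs that occupy the same slot in two aligned sentences.

    Assumption: token-level diffing (difflib) is used here as a crude
    substitute for the word-alignment step described in the abstract.
    """
    tokenized = [s.lower().split() for s in sentences]
    candidates = set()
    for a, b in combinations(tokenized, 2):
        matcher = SequenceMatcher(a=a, b=b, autojunk=False)
        for tag, i1, i2, j1, j2 in matcher.get_opcodes():
            # A "replace" span is a position where the two sentences differ
            # between aligned tokens (or at a sentence edge), so the two
            # phrases are interchangeable in that context.
            if tag == "replace":
                left = " ".join(a[i1:i2])
                right = " ".join(b[j1:j2])
                candidates.add(tuple(sorted((left, right))))
    return candidates


# Hypothetical hospitality-domain sentences sharing one intent.
sentences = [
    "can the staff book a taxi for me",
    "can the concierge book a taxi for me",
    "can the concierge arrange a taxi for me",
]
pairs = candidate_paraphrases(sentences)
for left, right in sorted(pairs):
    print(f"{left}  <->  {right}")
```

The output is a set of candidates, not verified paraphrases: noisy pairs such as multi-word spans that happen to differ together still need filtering, which the abstract says is assessed by crowd workers.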

research
05/25/2018

Lifelong Domain Word Embedding via Meta-Learning

Learning high-quality domain word embeddings is important for achieving ...
research
10/06/2022

Domain-Specific Word Embeddings with Structure Prediction

Complementary to finding good general word embeddings, an important ques...
research
04/27/2019

Enabling Open-World Specification Mining via Unsupervised Learning

Many programming tasks require using both domain-specific code and well-...
research
12/11/2021

Selecting Parallel In-domain Sentences for Neural Machine Translation Using Monolingual Texts

Continuously-growing data volumes lead to larger generic models. Specifi...
research
10/13/2021

FlexiTerm: A more efficient implementation of flexible multi-word term recognition

Terms are linguistic signifiers of domain-specific concepts. Automated r...
research
06/21/2022

WikiDoMiner: Wikipedia Domain-specific Miner

We introduce WikiDoMiner, a tool for automatically generating domain-spe...
research
08/20/2019

Flud: a hybrid crowd-algorithm approach for visualizing biological networks

Modern experiments in many disciplines generate large quantities of netw...
