Graphine: A Dataset for Graph-aware Terminology Definition Generation

09/09/2021
by   Zequn Liu, et al.
11

Precisely defining the terminology is the first step in scientific communication. Developing neural text generation models for definition generation can circumvent the labor-intensity curation, further accelerating scientific discovery. Unfortunately, the lack of large-scale terminology definition dataset hinders the process toward definition generation. In this paper, we present a large-scale terminology definition dataset Graphine covering 2,010,648 terminology definition pairs, spanning 227 biomedical subdisciplines. Terminologies in each subdiscipline further form a directed acyclic graph, opening up new avenues for developing graph-aware text generation models. We then proposed a novel graph-aware definition generation model Graphex that integrates transformer with graph neural network. Our model outperforms existing text generation models by exploiting the graph structure of terminologies. We further demonstrated how Graphine can be used to evaluate pretrained language models, compare graph representation learning methods and predict sentence granularity. We envision Graphine to be a unique resource for definition generation and many other NLP tasks in biomedicine.

READ FULL TEXT

page 1

page 2

page 4

page 6

page 8

page 9

page 10

page 11

research
07/16/2020

Investigating Pretrained Language Models for Graph-to-Text Generation

Graph-to-text generation, a subtask of data-to-text generation, aims to ...
research
10/20/2021

SciXGen: A Scientific Paper Dataset for Context-Aware Text Generation

Generating texts in scientific papers requires not only capturing the co...
research
06/19/2021

JointGT: Graph-Text Joint Representation Learning for Text Generation from Knowledge Graphs

Existing pre-trained models for knowledge-graph-to-text (KG-to-text) gen...
research
02/12/2023

Investigating the Effect of Relative Positional Embeddings on AMR-to-Text Generation with Structural Adapters

Text generation from Abstract Meaning Representation (AMR) has substanti...
research
11/18/2019

Graph Transformer for Graph-to-Sequence Learning

The dominant graph-to-sequence transduction models employ graph neural n...
research
06/14/2021

Straight to the Gradient: Learning to Use Novel Tokens for Neural Text Generation

Advanced large-scale neural language models have led to significant succ...
research
06/29/2020

Learning Sparse Prototypes for Text Generation

Prototype-driven text generation uses non-parametric models that first c...

Please sign up or login with your details

Forgot password? Click here to reset