Combining keyphrase extraction and lexical diversity to characterize ideas in publication titles

08/30/2022
by   James Powell, et al.

Beyond bibliometrics, there is interest in characterizing how the number of ideas in scientific papers evolves. A common approach is to analyze publication titles to detect vocabulary changes over time. On the premise that phrases, and more specifically keyphrases, represent concepts, lexical diversity metrics are applied to phrased versions of the titles. Changes in lexical diversity are then treated as indicators of shifts in research, and possibly of its expansion. Optimizing keyphrase detection is therefore an important part of this process. Rather than relying on a single model, we propose using multiple phrase detection models, with the goal of producing a more comprehensive set of keyphrases from the source corpora. Another potential advantage of this approach is that the union and difference of these sets may provide automated techniques for identifying and omitting non-specific phrases. We compare the performance of several phrase detection models, analyze the keyphrase sets that each produces, and calculate the lexical diversity of corpus variants incorporating keyphrases from each model, using four common lexical diversity metrics.
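
The pipeline described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation. The two keyphrase sets (`model_a`, `model_b`), the example titles, and the helper names are hypothetical, and the type-token ratio stands in as one widely used lexical diversity metric, since the abstract does not name the four metrics it uses. The sketch combines keyphrase sets from two models, rewrites titles so that each keyphrase becomes a single token, and computes lexical diversity over the phrased corpus.

```python
import re
from typing import Iterable


def phrase_title(title: str, keyphrases: Iterable[str]) -> list[str]:
    """Lowercase a title and join each detected keyphrase into one token."""
    text = " ".join(title.lower().split())
    # Replace longer phrases first so shorter ones do not split them.
    for phrase in sorted(keyphrases, key=len, reverse=True):
        pattern = r"\b" + re.escape(phrase) + r"\b"
        text = re.sub(pattern, phrase.replace(" ", "_"), text)
    return text.split()


def corpus_tokens(titles: Iterable[str], keyphrases: Iterable[str]) -> list[str]:
    """Flatten all phrased titles into one token list."""
    phrases = list(keyphrases)
    return [tok for title in titles for tok in phrase_title(title, phrases)]


def type_token_ratio(tokens: list[str]) -> float:
    """Unique tokens divided by total tokens (0.0 for an empty corpus)."""
    return len(set(tokens)) / len(tokens) if tokens else 0.0


# Hypothetical output of two phrase detection models on the same corpus.
model_a = {"keyphrase extraction", "lexical diversity", "publication titles"}
model_b = {"lexical diversity", "phrase embeddings", "publication titles"}

union = model_a | model_b      # more comprehensive keyphrase set
shared = model_a & model_b     # phrases both models agree on
only_a = model_a - model_b     # phrases detected by model A alone

titles = [
    "Keyphrase extraction and lexical diversity in publication titles",
    "Phrase embeddings for lexical diversity",
]

# Compare lexical diversity of corpus variants built from each phrase set.
for name, phrases in [("none", set()), ("A", model_a), ("B", model_b), ("union", union)]:
    ttr = type_token_ratio(corpus_tokens(titles, phrases))
    print(f"{name:>5}: TTR = {ttr:.3f}")
```

How the set differences should be interpreted, for example whether a phrase found by only one model is more likely to be non-specific, is left open here. The sketch only shows that these sets are cheap to compute once each model's output is available.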


