Text sampling strategies for predicting missing bibliographic links

01/04/2023
by F. V. Krasnova, et al.

This paper proposes several strategies for sampling text data for automatic sentence classification, with the aim of detecting missing bibliographic links. We construct samples from sentences, treated as semantic units of the text, and add their immediate context, which consists of several neighboring sentences. We examine a number of sampling strategies that differ in context size and position. The experiment is carried out on a collection of STEM scientific papers. Including sentence context in the samples improves classification results. We automatically determine the optimal sampling strategy for a given text collection by applying ensemble voting to classifications of the same data sampled in different ways. A sampling strategy that takes sentence context into account, combined with a hard voting procedure, leads to a classification accuracy of 98%. This method of detecting missing bibliographic links can be used in recommendation engines of applied intelligent information systems.
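
As a rough illustration of the two ideas in the abstract (context-window sampling and hard voting), the following minimal Python sketch may help. It is not the authors' implementation: the function names `build_samples` and `hard_vote`, the `before`/`after` parameters, and the tie-breaking rule are all assumptions made for this example, and the abstract does not say which classifiers or context sizes were actually used.

```python
from collections import Counter
from typing import List, Sequence


def build_samples(sentences: Sequence[str], before: int, after: int) -> List[str]:
    """Return one sample per sentence: the sentence itself plus up to
    `before` preceding and `after` following sentences (hypothetical
    parameters), clipped at the document boundaries."""
    samples = []
    for i in range(len(sentences)):
        start = max(0, i - before)
        end = min(len(sentences), i + after + 1)
        samples.append(" ".join(sentences[start:end]))
    return samples


def hard_vote(predictions: Sequence[Sequence[int]]) -> List[int]:
    """Majority vote per sentence over several label sequences, one
    sequence per sampling strategy. Ties go to the label that appears
    first among the votes (an assumption of this sketch)."""
    return [Counter(votes).most_common(1)[0][0] for votes in zip(*predictions)]


sentences = ["A.", "B.", "C."]
left_context = build_samples(sentences, before=1, after=0)
both_sides = build_samples(sentences, before=1, after=1)

# Placeholder labels standing in for three classifiers, each trained on
# a differently sampled view of the same sentences (1 = link missing).
voted = hard_vote([[1, 0, 1], [1, 1, 0], [0, 1, 1]])
```

Here each sampling strategy yields its own view of the same sentences; a classifier would be trained per view, and `hard_vote` would combine their per-sentence labels.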

research
01/20/2020

Short Text Classification via Term Graph

Short text classification is a method for classifying short sentence with...
research
02/09/2021

Decontextualization: Making Sentences Stand-Alone

Models for question answering, dialogue agents, and summarization often ...
research
11/09/2019

How Decoding Strategies Affect the Verifiability of Generated Text

Language models are of considerable importance. They are used for pretra...
research
08/06/2015

Automatic classification of bengali sentences based on sense definitions present in bengali wordnet

Based on the sense definition of words available in the Bengali WordNet,...
research
10/24/2021

Sentence Punctuation for Collaborative Commentary Generation in Esports Live-Streaming

To solve the existing sentence punctuation problem for collaborative com...
research
06/24/2018

Disentangled VAE Representations for Multi-Aspect and Missing Data

Many problems in machine learning and related application areas are fund...
research
01/01/2019

Text Infilling

Recent years have seen remarkable progress of text generation in differe...
