The similarity index of mathematical and other scientific publications with equations and formulas and the problem of self-plagiarism identification

10/08/2021
by A. D. Polyanin, et al.

This paper discusses, for the first time, the problem of estimating the similarity index of inhomogeneous scientific publications that contain equations and formulas. It is shown that equations and formulas (as well as figures, drawings, and tables) significantly complicate the analysis of such texts. It is proved that determining the similarity index from individual mathematical symbols and fragments of equations and formulas is ineffective and can lead to erroneous, even completely absurd, conclusions. The capabilities of Antiplagiat and iThenticate, the most popular software systems currently used by scientific journals to detect plagiarism and self-plagiarism, are investigated. Results obtained by running iThenticate on specific examples and test problems containing equations and formulas are presented. It is established that, when analyzing heterogeneous texts, this system is often unable to distinguish genuine self-plagiarism from pseudo-self-plagiarism, that is, apparent self-plagiarism that is in fact false. A complex model situation is considered in which identifying self-plagiarism requires highly qualified, narrowly specialized experts. Various ways of improving software systems for comparing inhomogeneous texts are proposed. The article will be useful to researchers and university teachers in physics, mathematics, and engineering, to programmers working on image recognition and digital image processing, and to a wide range of readers interested in plagiarism and self-plagiarism.
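
The claim that symbol-level matching can mislead can be illustrated with a toy sketch. It is not the algorithm used by Antiplagiat or iThenticate, and not the similarity index defined in the paper; the Jaccard measure, the whitespace tokenizer, the function names, and the two sample formulas are all assumptions chosen for illustration.

```python
def jaccard(a: set, b: set) -> float:
    """Jaccard similarity |a ∩ b| / |a ∪ b| (0.0 when both sets are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def symbol_tokens(formula: str) -> set:
    """Hypothetical tokenizer: every whitespace-separated symbol is a token."""
    return set(formula.split())


def whole_formula_token(formula: str) -> set:
    """Alternative: treat the whole formula as one indivisible token."""
    return {" ".join(formula.split())}


# Two mathematically unrelated expressions (illustrative, not from the paper):
# a linear parabolic PDE and the equation of a straight line.
pde = "u_t = a u_xx + b u"
line = "y = a x + b"

symbol_score = jaccard(symbol_tokens(pde), symbol_tokens(line))
whole_score = jaccard(whole_formula_token(pde), whole_formula_token(line))

print(f"symbol-level similarity:  {symbol_score:.2f}")
print(f"whole-formula similarity: {whole_score:.2f}")
```

In this sketch the two formulas share four of their nine distinct tokens (`=`, `a`, `+`, `b`), so the symbol-level score is clearly nonzero even though the formulas have nothing in common mathematically, while comparing them as whole units gives zero.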

research
01/08/2023

Traditional Readability Formulas Compared for English

Traditional English readability formulas, or equations, were largely dev...
research
02/16/2019

TopicEq: A Joint Topic and Mathematical Equation Model for Scientific Texts

Scientific documents rely on both mathematics and text to communicate id...
research
09/30/2022

New Metric Formulas that Include Measurement Errors in Machine Learning for Natural Sciences

The application of machine learning to physics problems is widely found ...
research
11/08/2017

A compressed dynamic self-index for highly repetitive text collections

We present a novel compressed dynamic self-index for highly repetitive t...
research
02/03/2017

Archiving Software Surrogates on the Web for Future Reference

Software has long been established as an essential aspect of the scienti...
research
11/11/2017

A distributed system for SearchOnMath based on the Microsoft BizSpark program

Mathematical information retrieval is a relatively new area, so the firs...
