TEIMMA: The First Content Reuse Annotator for Text, Images, and Math

05/22/2023
by Ankit Satpute, et al.

This demo paper presents TEIMMA, the first tool for annotating the reuse of text, images, and mathematical formulae in a document pair. Annotating content reuse is particularly useful for developing plagiarism detection algorithms. Real-world content reuse is often obfuscated, which makes such cases challenging to identify. TEIMMA lets annotators record the obfuscation type, enabling novel classifications of confirmed plagiarism cases. It records different reuse types for text, images, and mathematical formulae in HTML, and it supports users by visualizing the content reuse in a document pair using similarity detection methods for text and math.

