Dynamic Thresholding Mechanisms for IR-Based Filtering in Efficient Source Code Plagiarism Detection

10/28/2018
by Oscar Karnalim, et al.

To address time inefficiency in string-matching-based source code plagiarism detection, only potential pairs are compared. Potentiality is determined by a fast but order-insensitive similarity measurement (adapted from Information Retrieval), and only pairs whose similarity degrees are higher than or equal to a particular threshold are selected. Defining such a threshold is not trivial, since it should yield a high efficiency improvement with a low effectiveness reduction (if any reduction is unavoidable). This paper proposes two thresholding mechanisms, namely range-based and pair-count-based, that dynamically tune the threshold based on the distribution of the resulting similarity degrees. According to our evaluation, both mechanisms are more practical than manual threshold assignment since they are more proportional to efficiency improvement and effectiveness reduction.
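
The abstract does not spell out how the two mechanisms compute their thresholds, so the following is only a minimal sketch under stated assumptions. It assumes cosine similarity over token frequencies as the order-insensitive measurement, a range-based threshold placed at a fixed fraction of the span between the lowest and highest observed similarity, and a pair-count-based threshold set to the similarity of the pair at a fixed fraction of the ranked pair list. All function names and the `ratio` parameter are hypothetical and not taken from the paper.

```python
from collections import Counter
from itertools import combinations
from math import sqrt


def cosine_similarity(tokens_a, tokens_b):
    """Order-insensitive similarity over token frequency vectors (assumed measure)."""
    freq_a, freq_b = Counter(tokens_a), Counter(tokens_b)
    dot = sum(freq_a[t] * freq_b[t] for t in freq_a.keys() & freq_b.keys())
    norm_a = sqrt(sum(c * c for c in freq_a.values()))
    norm_b = sqrt(sum(c * c for c in freq_b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def range_based_threshold(similarities, ratio):
    """Assumed definition: a point `ratio` of the way from the min to the max similarity."""
    lowest, highest = min(similarities), max(similarities)
    return lowest + ratio * (highest - lowest)


def pair_count_based_threshold(similarities, ratio):
    """Assumed definition: the similarity of the last pair within the top `ratio` of pairs."""
    ranked = sorted(similarities, reverse=True)
    keep = max(1, round(ratio * len(ranked)))
    return ranked[keep - 1]


def select_pairs(submissions, threshold_fn, ratio):
    """Return the pairs whose similarity is higher than or equal to the tuned threshold.

    `submissions` maps a submission name to its token list; at least two
    submissions are required so that one or more pairs exist.
    """
    scored = {
        (a, b): cosine_similarity(submissions[a], submissions[b])
        for a, b in combinations(sorted(submissions), 2)
    }
    threshold = threshold_fn(list(scored.values()), ratio)
    return [pair for pair, sim in scored.items() if sim >= threshold]


submissions = {
    "s1": ["int", "x", "=", "1"],
    "s2": ["x", "=", "1", "int"],  # same tokens as s1, different order
    "s3": ["print", "hello"],
}
potential_pairs = select_pairs(submissions, range_based_threshold, 0.5)
```

In this sketch only the pairs returned by `select_pairs` would be passed on to the slower string-matching comparison. Because both thresholds are derived from the observed similarity distribution, the same `ratio` adapts to data sets whose similarities are generally high or generally low.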

research
02/06/2019

A Comparison of Information Retrieval Techniques for Detecting Source Code Plagiarism

Plagiarism is a commonly encountered problem in the academia. While ther...
research
09/23/2018

Which Source Code Plagiarism Detection Approach is More Humane?

This paper contributes in developing source code plagiarism detection th...
research
11/29/2017

An Abstract Method Linearization for Detecting Source Code Plagiarism in Object-Oriented Environment

Despite the fact that plagiarizing source code is a trivial task for mos...
research
07/26/2019

Scalable Source Code Similarity Detection in Large Code Repositories

Source code similarity are increasingly used in application development ...
research
07/21/2023

Identifying document similarity using a fast estimation of the Levenshtein Distance based on compression and signatures

Identifying document similarity has many applications, e.g., source code...
research
08/31/2021

More WiFi for Everyone: Increasing Spectral Efficiency in WiFi6 Networks using OBSS/PD Mechanism

This study aims to enhance spatial reuse by using the new features of IE...
research
06/15/2018

Oreo: Detection of Clones in the Twilight Zone

Source code clones are categorized into four types of increasing difficu...
