Investigating the Impact of Vocabulary Difficulty and Code Naturalness on Program Comprehension

08/25/2023
by Bin Lin, et al.

Context: Developers spend most of their time comprehending source code during software development. Automatically assessing how readable and understandable source code is can benefit tasks such as task triaging and code review. While several studies have proposed approaches to predict software readability and understandability, most focus only on local characteristics of source code. Moreover, the performance of understandability prediction is far from satisfactory. Objective: In this study, we aim to assess readability and understandability from the perspective of language acquisition. More specifically, we investigate whether code readability and understandability are correlated with the naturalness and vocabulary difficulty of source code. Method: To assess code naturalness, we adopt the cross-entropy metric; to assess vocabulary difficulty, we use a manually crafted list of code elements with their assigned advancement levels. We will conduct a statistical analysis to understand these correlations and analyze whether code naturalness and vocabulary difficulty can be used to improve the performance of code readability and understandability prediction methods. The study will be conducted on existing datasets.
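To make the cross-entropy notion of naturalness concrete, here is a minimal sketch in Python. The abstract does not say which language model or tokenizer the study uses, so the bigram model with add-one smoothing, the `<s>` start marker, and the function names `train_bigram` and `cross_entropy` are all assumptions chosen for illustration. The idea is only that a snippet whose tokens are well predicted by a model trained on other code gets a lower cross-entropy, i.e. is more "natural".

```python
import math
from collections import Counter


def train_bigram(corpus):
    """Count bigram contexts and bigrams over tokenized training snippets.

    `corpus` is a list of token lists. This is an illustrative model,
    not the one used in the study.
    """
    contexts, bigrams, vocab = Counter(), Counter(), set()
    for tokens in corpus:
        padded = ["<s>"] + list(tokens)  # assumed start-of-snippet marker
        vocab.update(padded)
        for prev, cur in zip(padded, padded[1:]):
            contexts[prev] += 1           # times `prev` precedes some token
            bigrams[(prev, cur)] += 1
    return contexts, bigrams, vocab


def cross_entropy(tokens, contexts, bigrams, vocab):
    """Average negative log2 probability per token (bits per token)."""
    if not tokens:
        raise ValueError("cannot score an empty snippet")
    v = len(vocab) + 1  # one extra slot shared by all unseen tokens
    padded = ["<s>"] + list(tokens)
    total = 0.0
    for prev, cur in zip(padded, padded[1:]):
        # Add-one (Laplace) smoothing keeps unseen bigrams at nonzero probability.
        p = (bigrams[(prev, cur)] + 1) / (contexts[prev] + v)
        total -= math.log2(p)
    return total / len(tokens)


corpus = [
    ["for", "i", "in", "range", "(", "n", ")", ":"],
    ["for", "j", "in", "range", "(", "m", ")", ":"],
]
model = train_bigram(corpus)
familiar = ["for", "i", "in", "range", "(", "n", ")", ":"]
scrambled = list(reversed(familiar))
print(cross_entropy(familiar, *model) < cross_entropy(scrambled, *model))
```

In this toy setup the conventionally ordered loop header scores lower than the same tokens reversed. Correlating such scores with human readability ratings would then be a separate statistical step on the study's datasets.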

research
03/06/2020

Code Obfuscation for the C/C++ Language

Obfuscation is the action of making something unintelligible. In softwar...
research
08/16/2022

Identifying Source Code File Experts

In software development, the identification of source code file experts ...
research
03/17/2020

Big Code != Big Vocabulary: Open-Vocabulary Models for Source Code

Statistical language modeling techniques have successfully been applied ...
research
12/16/2020

The Mind Is a Powerful Place: How Showing Code Comprehensibility Metrics Influences Code Understanding

Static code analysis tools and integrated development environments prese...
research
07/20/2021

On the Interplay of Smells Large Class, Complex Class and Duplicate Code

Bad smells have been defined to describe potential problems in code, pos...
research
02/01/2019

Applications of Multi-view Learning Approaches for Software Comprehension

Program comprehension concerns the ability of an individual to make an u...
research
10/23/2020

A Simple Approach for Handling Out-of-Vocabulary Identifiers in Deep Learning for Source Code

There is an emerging interest in the application of deep learning models...
