TASTY: A Transformer based Approach to Space and Time complexity

05/06/2023
by   Kaushik Moudgalya, et al.
0

Code based Language Models (LMs) have shown very promising results in the field of software engineering with applications such as code refinement, code completion and generation. However, the task of time and space complexity classification from code has not been extensively explored due to a lack of datasets, with prior endeavors being limited to Java. In this project, we aim to address these gaps by creating a labelled dataset of code snippets spanning multiple languages (Python and C++ datasets currently, with C, C#, and JavaScript datasets being released shortly). We find that existing time complexity calculation libraries and tools only apply to a limited number of use-cases. The lack of a well-defined rule based system motivates the application of several recently proposed code-based LMs. We demonstrate the effectiveness of dead code elimination and increasing the maximum sequence length of LMs. In addition to time complexity, we propose to use LMs to find space complexities from code, and to the best of our knowledge, this is the first attempt to do so. Furthermore, we introduce a novel code comprehension task, called cross-language transfer, where we fine-tune the LM on one language and run inference on another. Finally, we visualize the activation of the attention fed classification head of our LMs using Non-negative Matrix Factorization (NMF) to interpret our results.

READ FULL TEXT

page 13

page 14

research
12/26/2017

Space-Efficient Algorithms for Longest Increasing Subsequence

Given a sequence of integers, we want to find a longest increasing subse...
research
04/19/2022

On The Cross-Modal Transfer from Natural Language to Code through Adapter Modules

Pre-trained neural Language Models (PTLM), such as CodeBERT, are recentl...
research
01/30/2020

Authorship Attribution of Source Code: A Language-Agnostic Approach and Applicability in Software Engineering

Authorship attribution of source code has been an established research t...
research
06/05/2023

SelfEvolve: A Code Evolution Framework via Large Language Models

Large language models (LLMs) have already revolutionized code generation...
research
08/11/2022

CodeBERT-nt: code naturalness via CodeBERT

Much of software-engineering research relies on the naturalness of code,...

Please sign up or login with your details

Forgot password? Click here to reset