Demystifying Code Summarization Models

02/09/2021
by Yu Wang, et al.

The last decade has witnessed rapid advances in machine learning models. Although these black-box systems make powerful predictions, their predictions cannot be directly explained, which threatens the continuing democratization of machine learning technology. To tackle the challenge of model explainability, prior research has made significant progress in demystifying image classification models. In the same spirit, this paper studies code summarization models. In particular, given an input program for which a model makes a prediction, our goal is to reveal the key features that the model uses to predict the program's label. We realize our approach in HouYi, which we use to evaluate four prominent code summarization models: extreme summarizer, code2vec, code2seq, and sequence GNN. Results show that all models base their predictions on syntactic and lexical properties with little to no semantic implication. Based on this finding, we present a novel approach to explaining the predictions of code summarization models through the lens of training data. Our work opens up an exciting new direction: studying what models have learned from source code.
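
To make the stated goal concrete, here is a minimal sketch of one way to look for the features a prediction depends on: drop tokens from the input program for as long as the model's label stays the same. The abstract does not describe HouYi's actual algorithm, so this should not be read as the authors' method. Both `toy_model` and `key_features` are hypothetical names introduced for illustration, and the toy model is a hand-written stand-in, not one of the four evaluated models.

```python
from typing import Callable, List


def toy_model(tokens: List[str]) -> str:
    """Hypothetical stand-in for a code summarization model.

    It predicts a method name purely from lexical cues, which makes the
    effect of the reduction easy to see.
    """
    if "sort" in tokens or "swap" in tokens:
        return "sort"
    if "return" in tokens:
        return "get"
    return "unknown"


def key_features(tokens: List[str],
                 predict: Callable[[List[str]], str]) -> List[str]:
    """Single greedy pass: drop each token whose removal keeps the label."""
    target = predict(tokens)
    kept = list(tokens)
    i = 0
    while i < len(kept):
        candidate = kept[:i] + kept[i + 1:]
        if predict(candidate) == target:
            kept = candidate  # token i was not needed for this prediction
        else:
            i += 1            # token i is needed; keep it and move on
    return kept


program = "void f ( int [ ] a ) { swap ( a , 0 , 1 ) ; }".split()
features = key_features(program, toy_model)
print(toy_model(program), features)
```

For this toy model, the surviving tokens are the lexical cues that alone trigger the label. A real study would replace `toy_model` with a trained model and would likely need a reduction that keeps the program syntactically valid, which this sketch does not attempt.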

research
03/04/2023

Demystifying What Code Summarization Models Learned

Study patterns that models have learned has long been a focus of pattern...
research
11/14/2021

Code Representation Learning with Prüfer Sequences

An effective and efficient encoding of the source code of a computer pro...
research
03/18/2022

M2TS: Multi-Scale Multi-Modal Approach Based on Transformer for Source Code Summarization

Source code summarization aims to generate natural language descriptions...
research
03/21/2021

Language-Agnostic Representation Learning of Source Code from Structure and Context

Source code (Context) and its parsed abstract syntax tree (AST; Structur...
research
10/23/2018

Interpreting Black Box Predictions using Fisher Kernels

Research in both machine learning and psychology suggests that salient e...
research
08/16/2023

Epicure: Distilling Sequence Model Predictions into Patterns

Most machine learning models predict a probability distribution over con...
research
12/29/2020

SIT3: Code Summarization with Structure-Induced Transformer

Code summarization (CS) is becoming a promising area in recent natural l...