Demystifying Code Summarization Models

by Yu Wang, et al.

The last decade has witnessed rapid advances in machine learning models. Although the black-box nature of these systems enables powerful predictions, those predictions cannot be directly explained, which threatens the continuing democratization of machine learning technology. To tackle the challenge of model explainability, research has made significant progress in demystifying image classification models. In the same spirit, this paper studies code summarization models. In particular, given an input program for which a model makes a prediction, our goal is to reveal the key features that the model uses to predict the label of the program. We realize our approach in HouYi, which we use to evaluate four prominent code summarization models: extreme summarizer, code2vec, code2seq, and sequence GNN. Results show that all models base their predictions on syntactic and lexical properties with little to no semantic implication. Based on this finding, we present a novel approach to explaining the predictions of code summarization models through the lens of training data. Our work opens up an exciting new direction: studying what models have learned from source code.
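To make the stated goal concrete, the sketch below shows one generic way to look for the features a prediction depends on: greedily delete tokens from the input program and keep only those whose removal changes the predicted label. This is an illustrative assumption, not the algorithm implemented in HouYi. The names `toy_model` and `reduce_to_key_features` are hypothetical, and the toy model is a hand-written stand-in that reacts only to lexical cues, not one of the four models evaluated in the paper.

```python
from typing import Callable, List


def toy_model(tokens: List[str]) -> str:
    """Hypothetical stand-in for a code summarization model.

    It predicts a method name from lexical cues only.
    """
    if "sort" in tokens or "swap" in tokens:
        return "sort"
    if "return" in tokens:
        return "get"
    return "unknown"


def reduce_to_key_features(
    tokens: List[str], predict: Callable[[List[str]], str]
) -> List[str]:
    """Greedily drop tokens while the prediction stays unchanged.

    The result is minimal in the sense that removing any single
    remaining token changes the prediction.
    """
    target = predict(tokens)
    kept = list(tokens)
    i = 0
    while i < len(kept):
        candidate = kept[:i] + kept[i + 1:]
        if predict(candidate) == target:
            kept = candidate  # token i was not needed for the prediction
        else:
            i += 1  # token i is needed; keep it and move on
    return kept


program = "void f ( int [ ] a ) { swap ( a , 0 , 1 ) ; return ; }".split()
print(toy_model(program))
print(reduce_to_key_features(program, toy_model))
```

For this toy model the reduction leaves a single identifier, which mirrors, in a deliberately simplified way, the abstract's observation that predictions can rest on lexical properties rather than on program semantics.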





Code Representation Learning with Prüfer Sequences

An effective and efficient encoding of the source code of a computer pro...

M2TS: Multi-Scale Multi-Modal Approach Based on Transformer for Source Code Summarization

Source code summarization aims to generate natural language descriptions...

Language-Agnostic Representation Learning of Source Code from Structure and Context

Source code (Context) and its parsed abstract syntax tree (AST; Structur...

A Transformer-based Approach for Source Code Summarization

Generating a readable summary that describes the functionality of a prog...

Interpreting Black Box Predictions using Fisher Kernels

Research in both machine learning and psychology suggests that salient e...

Self-Supervised Learning for Code Retrieval and Summarization through Semantic-Preserving Program Transformations

Code retrieval and summarization are useful tasks for developers, but it...

Extending the R Language with a Scalable Matrix Summarization Operator

Analysts prefer simpler interpreted languages to program their computati...