Keeping Pace with Ever-Increasing Data: Towards Continual Learning of Code Intelligence Models

02/07/2023
by Shuzheng Gao, et al.

Previous research on code intelligence usually trains a deep learning model on a fixed dataset in an offline manner. However, in real-world scenarios, new code repositories emerge incessantly, and the new knowledge they carry is beneficial for providing up-to-date code intelligence services to developers. In this paper, we address the following problem: how can code intelligence models continually learn from ever-increasing data? One major challenge is catastrophic forgetting: the model easily forgets knowledge learned from previous datasets when learning from a new one. To tackle this challenge, we propose REPEAT, a novel method for continual learning of code intelligence models. Specifically, REPEAT addresses catastrophic forgetting with representative exemplar replay and adaptive parameter regularization. The representative exemplar replay component selects informative and diverse exemplars from each dataset and uses them to retrain the model periodically. The adaptive parameter regularization component identifies important parameters in the model and adaptively penalizes changes to them, preserving previously learned knowledge. We evaluate the proposed approach on three code intelligence tasks: code summarization, software vulnerability detection, and code clone detection. Extensive experiments demonstrate that REPEAT consistently outperforms baseline methods on all tasks. For example, REPEAT improves over conventional fine-tuning by 1.22, 5.61, and 1.72 on code summarization, vulnerability detection, and clone detection, respectively.
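The two mechanisms in the abstract can be illustrated with a minimal sketch. This is not the paper's actual implementation: the greedy farthest-point selection, the loss-based informativeness score, and the EWC-style quadratic penalty below are assumptions chosen to mirror the described ideas (informative and diverse exemplar selection; penalizing changes to important parameters).

```python
import numpy as np

def select_exemplars(features, losses, k):
    """Pick k replay exemplars that are informative (high loss) and
    diverse (far apart in feature space).  Greedy farthest-point
    heuristic; a stand-in for the paper's selection rule."""
    n = len(features)
    chosen = [int(np.argmax(losses))]          # most informative example first
    while len(chosen) < min(k, n):
        # distance of every candidate to its nearest already-chosen exemplar
        d = np.min(
            np.linalg.norm(features[:, None] - features[chosen], axis=2),
            axis=1,
        )
        d[chosen] = -np.inf                    # never re-pick a chosen index
        # trade off diversity (distance) against informativeness (loss)
        chosen.append(int(np.argmax(d + losses)))
    return chosen

def importance_penalty(theta, theta_old, importance, lam=1.0):
    """EWC-style adaptive regularizer: parameters deemed important for
    earlier datasets are pulled back toward their previous values."""
    return lam * float(np.sum(importance * (theta - theta_old) ** 2))
```

In a training loop, the selected exemplars would be mixed into each new dataset's batches, and `importance_penalty` would be added to the task loss so that updates to high-importance weights are discouraged.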

Related research

- 08/28/2021: Prototypes-Guided Memory Replay for Continual Learning
- 10/02/2020: Continual Learning for Natural Language Generation in Task-oriented Dialog Systems
- 06/02/2021: Online Coreset Selection for Rehearsal-based Continual Learning
- 07/23/2020: ADER: Adaptively Distilled Exemplar Replay Towards Continual Learning for Session-based Recommendation
- 10/21/2021: Center Loss Regularization for Continual Learning
- 06/26/2023: Continual Learning for Out-of-Distribution Pedestrian Detection
- 08/07/2023: Do You Remember? Overcoming Catastrophic Forgetting for Fake Audio Detection
