Cobol2Vec: Learning Representations of Cobol code

01/24/2022
by Ankit Kulshrestha, et al.

There has been steadily growing interest in developing novel methods that learn a representation of input data and then use it for several downstream tasks. The field of natural language processing has seen significant improvement across tasks by incorporating pre-trained embeddings into its pipelines. Recently, these methods have been applied to programming languages with a view to improving developer productivity. In this paper, we present an unsupervised learning approach to encode old mainframe languages into a fixed-dimensional vector space. We use COBOL as our motivating example, create a corpus, and demonstrate the efficacy of our approach on a code-retrieval task over that corpus.
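To make the idea concrete, the following is a minimal sketch of this kind of pipeline: learn fixed-dimensional embeddings for COBOL tokens with a word2vec-style objective, average them into snippet vectors, and retrieve the closest snippet by cosine similarity. The tokenizer, toy corpus, and hyperparameters are illustrative assumptions for this sketch, not the paper's actual training objective or dataset.

import numpy as np
from gensim.models import Word2Vec  # assumes gensim >= 4.0

# Tiny illustrative "corpus" of COBOL statements, tokenized on whitespace.
corpus = [
    "MOVE WS-TOTAL TO WS-OUTPUT".split(),
    "ADD WS-AMOUNT TO WS-TOTAL".split(),
    "PERFORM READ-RECORD UNTIL EOF-FLAG".split(),
    "DISPLAY WS-OUTPUT".split(),
]

# Learn token embeddings (skip-gram) in a fixed-dimensional vector space.
model = Word2Vec(corpus, vector_size=32, window=3, min_count=1, sg=1, epochs=50)

def embed(tokens):
    # Snippet embedding = mean of its token embeddings.
    vecs = [model.wv[t] for t in tokens if t in model.wv]
    return np.mean(vecs, axis=0)

def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Code retrieval: rank corpus snippets against a query snippet.
query = "ADD WS-AMOUNT TO WS-TOTAL".split()
q = embed(query)
ranked = sorted(corpus, key=lambda s: cosine(q, embed(s)), reverse=True)
print(ranked[0])  # best match should be the ADD statement itself

A real system would replace the whitespace tokenizer with a COBOL-aware lexer and evaluate retrieval quality over a full corpus rather than a handful of statements.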
