Creating a Dataset for High-Performance Computing Code Translation: A Bridge Between HPC Fortran and C++

07/15/2023
by Bin Lei, et al.

In this study, we present a novel dataset for training machine learning models that translate between OpenMP Fortran and C++ code. To ensure reliability and applicability, the dataset is first refined using a meticulous code similarity test. The effectiveness of the dataset is assessed with both quantitative (CodeBLEU) and qualitative (human evaluation) methods. We demonstrate that this dataset can significantly improve the translation capabilities of large language models, with improvements of 5.1× for models with no prior coding knowledge and 9.9× for models with some coding familiarity. Our work highlights the potential of this dataset to advance the field of code translation for high-performance computing. The dataset is available at https://github.com/bin123apple/Fortran-CPP-HPC-code-translation-dataset
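
To make the idea of a quantitative similarity score concrete, the following is a minimal sketch of a clipped token n-gram precision between a candidate translation and a reference. It is an illustration only, not the authors' similarity test and not CodeBLEU itself: CodeBLEU combines n-gram matching with keyword-weighted n-gram, syntax-tree, and data-flow matching, none of which is reproduced here. The function names `tokenize` and `ngram_precision` and the sample C++ strings are hypothetical.

```python
import re
from collections import Counter


def tokenize(code):
    """Split code into identifiers, numbers, and single punctuation characters."""
    return re.findall(r"[A-Za-z_]\w*|\d+(?:\.\d+)?|\S", code)


def ngram_precision(candidate, reference, n=2):
    """Fraction of candidate n-grams that also occur in the reference.

    Counts are clipped, so an n-gram repeated in the candidate is credited
    at most as many times as it appears in the reference.
    """
    cand = tokenize(candidate)
    ref = tokenize(reference)
    cand_ngrams = Counter(tuple(cand[i:i + n]) for i in range(len(cand) - n + 1))
    ref_ngrams = Counter(tuple(ref[i:i + n]) for i in range(len(ref) - n + 1))
    total = sum(cand_ngrams.values())
    if total == 0:
        # Candidate is shorter than n tokens: no n-grams to score.
        return 0.0
    overlap = sum(min(count, ref_ngrams[gram]) for gram, count in cand_ngrams.items())
    return overlap / total


if __name__ == "__main__":
    # Hypothetical reference and model output for one OpenMP C++ loop.
    reference = "#pragma omp parallel for\nfor (int i = 0; i < n; i++) a[i] = b[i] + c[i];"
    candidate = "#pragma omp parallel for\nfor (int j = 0; j < n; j++) a[j] = b[j] + c[j];"
    print(ngram_precision(candidate, reference, n=2))
```

A surface score like this penalizes harmless differences such as renamed loop variables, which is one reason a metric that also compares syntax and data flow is paired with human evaluation.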
