Comparing Llama-2 and GPT-3 LLMs for HPC kernels generation

09/12/2023
by   Pedro Valero-Lara, et al.
0

We evaluate the use of the open-source Llama-2 model for generating well-known, high-performance computing kernels (e.g., AXPY, GEMV, GEMM) on different parallel programming models and languages (e.g., C++: OpenMP, OpenMP Offload, OpenACC, CUDA, HIP; Fortran: OpenMP, OpenMP Offload, OpenACC; Python: numpy, Numba, pyCUDA, cuPy; and Julia: Threads, CUDA.jl, AMDGPU.jl). We built upon our previous work that is based on the OpenAI Codex, which is a descendant of GPT-3, to generate similar kernels with simple prompts via GitHub Copilot. Our goal is to compare the accuracy of Llama-2 and our original GPT-3 baseline by using a similar metric. Llama-2 has a simplified model that shows competitive or even superior accuracy. We also report on the differences between these foundational large language models as generative AI continues to redefine human-computer interactions. Overall, Copilot generates codes that are more reliable but less optimized, whereas codes generated by Llama-2 are less reliable but more optimized when correct.

READ FULL TEXT
research
06/27/2023

Evaluation of OpenAI Codex for HPC Parallel Programming Models Kernel Generation

We evaluate AI-assisted generative capabilities on fundamental numerical...
research
07/07/2023

QIGen: Generating Efficient Kernels for Quantized Inference on Large Language Models

We present ongoing work on a new automatic code generation approach for ...
research
06/29/2020

OpenDVC: An Open Source Implementation of the DVC Video Compression Method

We introduce an open source Tensorflow implementation of the Deep Video ...
research
08/18/2022

An Empirical Evaluation of Competitive Programming AI: A Case Study of AlphaCode

AlphaCode is a code generation system for assisting software developers ...
research
11/16/2010

Fast GPGPU Data Rearrangement Kernels using CUDA

Many high performance-computing algorithms are bandwidth limited, hence ...
research
06/25/2021

FLASH 1.0: A Software Framework for Rapid Parallel Deployment and Enhancing Host Code Portability in Heterogeneous Computing

In this paper, we present FLASH 1.0, a C++-based software framework for ...
research
11/15/2019

Corrfunc: Blazing fast correlation functions with AVX512F SIMD Intrinsics

Correlation functions are widely used in extra-galactic astrophysics to ...

Please sign up or login with your details

Forgot password? Click here to reset