LS-CAT: A Large-Scale CUDA AutoTuning Dataset

03/26/2021
by   Lars Bjertnes, et al.

The effectiveness of Machine Learning (ML) methods depends on access to large, suitable datasets. In this article, we present how we built the LS-CAT (Large-Scale CUDA AutoTuning) dataset, sourced from GitHub, for the purpose of training NLP-based ML models. Our dataset includes 19 683 CUDA kernels focused on linear algebra. In addition to the CUDA code, the LS-CAT dataset contains 5 028 536 associated runtimes, covering different combinations of kernels, block sizes and matrix sizes. The runtimes are GPU benchmarks on both Nvidia GTX 980 and Nvidia T4 systems. This information creates a foundation upon which NLP-based models can find correlations between source-code features and the optimal choice of thread block size. Several results can be drawn from the LS-CAT database. For example, our experimental results show that an optimal choice of thread block size yields an average gain of 6%, and that in 10% of cases a performance increase of more than 20% can be achieved by using the optimal block size. A description of current and future work is also included.
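To make the shape of such runtime data concrete, the following is a minimal sketch of how per-kernel runtimes might be grouped to find the fastest thread block size. The record layout, the function name `best_blocks`, the toy values, and the use of the median runtime as the baseline are all assumptions made for illustration. They are not taken from the LS-CAT dataset or its tooling.

```python
from collections import defaultdict
from statistics import median

# Hypothetical records: (kernel, matrix_size, (block_x, block_y), runtime_ms).
# The values are made up for illustration and are not taken from LS-CAT.
records = [
    ("add", 512, (8, 8), 1.20),
    ("add", 512, (16, 16), 1.00),
    ("add", 512, (32, 32), 1.10),
    ("scale", 512, (8, 8), 2.00),
    ("scale", 512, (16, 16), 2.50),
    ("scale", 512, (32, 32), 3.00),
]

def best_blocks(records):
    """Return {(kernel, matrix_size): (best_block, gain_over_median)}."""
    groups = defaultdict(list)
    for kernel, size, block, runtime in records:
        groups[(kernel, size)].append((runtime, block))
    result = {}
    for key, runs in groups.items():
        # Fastest run for this kernel and matrix size.
        best_runtime, best_block = min(runs)
        # Assumed baseline: the median runtime over all tested block sizes.
        baseline = median(r for r, _ in runs)
        result[key] = (best_block, (baseline - best_runtime) / baseline)
    return result

summary = best_blocks(records)
for (kernel, size), (block, gain) in summary.items():
    print(f"{kernel} n={size}: best block {block}, gain {gain:.1%}")
```

A different baseline, such as a fixed default block size, would give different gain figures, so the baseline choice should be stated alongside any reported number.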
