High-Performance Level-1 and Level-2 BLAS

08/04/2021
by Amit Singh, et al.

The introduction of the Basic Linear Algebra Subprograms (BLAS) in the 1970s paved the way for different libraries to solve the same problems with improved approaches and hardware, and new BLAS implementations have driven innovation in High-Performance Computing (HPC). Most of the attention has gone to level-3 BLAS because of its enormous range of applications in fields well beyond computer science, while level-1 and level-2 BLAS have been comparatively neglected. We address this by introducing new algorithms for the vector-vector dot product, the vector-vector outer product, and the matrix-vector product that significantly improve the performance of these operations. We are not introducing a library but algorithms that improve upon the current state of the art. Furthermore, we rely on the FMA instruction, OpenMP, and the compiler to optimize the code rather than implementing the algorithms in assembly. Our current implementation is therefore machine-oblivious and depends on the compiler's ability to optimize the code.
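To make the general strategy concrete, the sketch below shows a level-1 kernel (dot product) and a level-2 kernel (matrix-vector product) written in plain C with OpenMP pragmas, leaving vectorization and FMA generation to the compiler (e.g. gcc -O3 -march=native -ffp-contract=fast -fopenmp). This is only a minimal sketch of the approach described in the abstract, not the paper's actual algorithms; the function names and the row-major storage convention are illustrative assumptions.

#include <stddef.h>

/* Minimal dot-product sketch (level-1). The multiply-add in the loop body
 * is a candidate for FMA contraction when compiled with, e.g.,
 * gcc -O3 -march=native -ffp-contract=fast -fopenmp. */
double dot_sketch(size_t n, const double *x, const double *y)
{
    double sum = 0.0;
#pragma omp parallel for simd reduction(+:sum)
    for (size_t i = 0; i < n; ++i)
        sum += x[i] * y[i];
    return sum;
}

/* Minimal matrix-vector product sketch (level-2): y = A * x,
 * with A stored row-major as an m x n array. */
void gemv_sketch(size_t m, size_t n, const double *A,
                 const double *x, double *y)
{
#pragma omp parallel for
    for (size_t i = 0; i < m; ++i) {
        double acc = 0.0;
#pragma omp simd reduction(+:acc)
        for (size_t j = 0; j < n; ++j)
            acc += A[i * n + j] * x[j];
        y[i] = acc;
    }
}

The reduction clauses keep the parallel and vectorized accumulation well defined while still allowing the compiler to fuse each multiply and add into a single FMA, which reflects the stated reliance on FMA, OpenMP, and the compiler rather than hand-written assembly.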


