Multi-Threaded Dense Linear Algebra Libraries for Low-Power Asymmetric Multicore Processors

11/06/2015
by   Sandra Catalán, et al.

Dense linear algebra libraries, such as BLAS and LAPACK, provide an important collection of numerical tools for many scientific and engineering applications. While high-performance implementations of the BLAS (and LAPACK) functionality exist for many current multi-threaded architectures, the adaptation of these libraries to asymmetric multicore processors (AMPs) is still pending. In this paper we address this challenge by developing an asymmetry-aware implementation of the BLAS, based on the BLIS framework and tailored for AMPs equipped with two types of cores: fast and power-hungry versus slow and energy-efficient. For this purpose, we integrate coarse-grain and fine-grain parallelization strategies into the library routines; the former dynamically distributes the workload between the two core types, and the latter statically partitions that work among the cores of the same type. Our results on an ARM big.LITTLE processor embedded in the Exynos 5422 SoC, using the asymmetry-aware version of the BLAS and a plain migration of the legacy version of LAPACK, experimentally assess the benefits, limitations, and potential of this approach.
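The two-level scheme described in the abstract can be illustrated with a small simulation. This is only a sketch under stated assumptions, not the paper's BLIS-based implementation: the `Cluster` class, the `speed` and `chunk` parameters, and the function names are hypothetical, and the relative speeds are made-up values rather than measurements. Each cluster of same-type cores grabs the next chunk of rows as soon as it is free (dynamic, coarse-grain), and every chunk is then split evenly among the cores of that cluster (static, fine-grain).

```python
from dataclasses import dataclass, field


@dataclass
class Cluster:
    """A group of same-type cores (hypothetical model, not a BLIS structure)."""
    name: str
    n_cores: int
    speed: float              # assumed rows per time unit for the whole cluster
    chunk: int                # rows requested per grab from the shared pool
    busy_until: float = 0.0   # simulated time at which the cluster is free again
    work: list = field(default_factory=list)


def static_partition(start, stop, n_workers):
    """Fine-grain step: split [start, stop) into contiguous near-equal ranges."""
    base, extra = divmod(stop - start, n_workers)
    ranges, lo = [], start
    for w in range(n_workers):
        hi = lo + base + (1 if w < extra else 0)
        ranges.append((lo, hi))
        lo = hi
    return ranges


def dynamic_distribute(n_rows, clusters):
    """Coarse-grain step: whichever cluster is free first takes the next chunk."""
    next_row = 0
    while next_row < n_rows:
        cluster = min(clusters, key=lambda c: c.busy_until)
        lo, hi = next_row, min(next_row + cluster.chunk, n_rows)
        next_row = hi
        cluster.busy_until += (hi - lo) / cluster.speed
        # Each chunk is divided statically among the cluster's own cores.
        cluster.work.append(static_partition(lo, hi, cluster.n_cores))
    return clusters


def rows_assigned(cluster):
    """Total number of rows a cluster ended up processing."""
    return sum(hi - lo for chunk in cluster.work for lo, hi in chunk)


# Assumed 4x speed ratio between clusters; four cores per cluster.
big = Cluster("big", n_cores=4, speed=4.0, chunk=64)
little = Cluster("LITTLE", n_cores=4, speed=1.0, chunk=64)
dynamic_distribute(1000, [big, little])
print(big.name, rows_assigned(big), little.name, rows_assigned(little))
```

Because chunks are claimed on demand, the faster cluster naturally ends up with more rows without the speed ratio having to be known in advance; the static split inside a cluster is reasonable only because its cores are assumed identical.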


research
06/30/2020

Ginkgo: A Modern Linear Operator Algebra Framework for High Performance Computing

In this paper, we present Ginkgo, a modern C++ math library for scientif...
research
04/08/2017

BLASFEO: basic linear algebra subroutines for embedded optimization

BLASFEO is a dense linear algebra library providing high-performance imp...
research
08/07/2021

Asymmetry-aware Scalable Locking

The pursuit of power-efficiency is popularizing asymmetric multicore pro...
research
08/11/2022

Optimizing Irregular-Shaped Matrix-Matrix Multiplication on Multi-Core DSPs

General Matrix Multiplication (GEMM) has a wide range of applications in...
research
04/27/2023

Co-Design of the Dense Linear Algebra Software Stack for Multicore Processors

This paper advocates for an intertwined design of the dense linear algeb...
research
03/14/2019

High-Throughput CNN Inference on Embedded ARM big.LITTLE Multi-Core Processors

IoT Edge intelligence requires Convolutional Neural Network (CNN) infere...
research
04/12/2016

BoxLib with Tiling: An AMR Software Framework

In this paper we introduce a block-structured adaptive mesh refinement (...
