
High performance and energy efficient inference for deep learning on ARM processors
We evolve PyDTNN, a framework for distributed parallel training of Deep ...

Resiliency in Numerical Algorithm Design for Extreme Scale Simulations
This work is based on the seminar titled “Resiliency in Numerical Algori...

Compressed Basis GMRES on High Performance GPUs
Krylov methods provide a fast and highly parallel numerical tool for the...

Ginkgo: A Modern Linear Operator Algebra Framework for High Performance Computing
In this paper, we present Ginkgo, a modern C++ math library for scientif...

Reproducibility of Parallel Preconditioned Conjugate Gradient in Hybrid Programming Environments
The Preconditioned Conjugate Gradient method is often employed for the s...

High Performance and Portable Convolution Operators for ARM-based Multicore Processors
The considerable impact of Convolutional Neural Networks on many Artific...

DMR API: Improving cluster productivity by turning applications into malleable
Adaptive workloads can change on-the-fly the configuration of their jobs...

Exploiting nested task-parallelism in the ℋ-LU factorization
We address the parallelization of the LU factorization of hierarchical m...

Programming Parallel Dense Matrix Factorizations with Look-Ahead and OpenMP
We investigate a parallelization strategy for dense matrix factorization...

Look-Ahead in the Two-Sided Reduction to Compact Band Forms for Symmetric Eigenvalue Problems and the SVD
We address the reduction to compact band forms, via unitary similarity t...

A Case for Malleable Thread-Level Linear Algebra Libraries: The LU Factorization with Partial Pivoting
We propose two novel techniques for overcoming load-imbalance encountere...

Multi-Threaded Dense Linear Algebra Libraries for Low-Power Asymmetric Multicore Processors
Dense linear algebra libraries, such as BLAS and LAPACK, provide a relev...
Enrique S. Quintana-Ortí