Kernel methods through the roof: handling billions of points efficiently
Kernel methods provide an elegant and principled approach to nonparametric learning, but so far could hardly be used in large-scale problems, since naïve implementations scale poorly with data size. Recent advances have shown the benefits of a number of algorithmic ideas, for example combining optimization, numerical linear algebra, and random projections. Here, we push these efforts further to develop and test a solver that takes full advantage of GPU hardware. To this end, we designed a preconditioned gradient solver for kernel methods that exploits both GPU acceleration and parallelization across multiple GPUs, implementing out-of-core variants of common linear algebra operations to guarantee optimal hardware utilization. Further, we optimize the numerical precision of different operations and maximize the efficiency of matrix-vector multiplications. As a result, we experimentally show dramatic speedups on datasets with billions of points, while still guaranteeing state-of-the-art accuracy. Additionally, we make our software available as an easy-to-use library.
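To make the core idea concrete, here is a minimal sketch of the kind of solver the abstract describes: kernel ridge regression with a Gaussian kernel, solved by an iterative gradient (conjugate gradient) method. This is a simplified, CPU-only NumPy illustration, not the paper's implementation — the actual solver adds preconditioning, multi-GPU out-of-core matrix-vector products, and mixed precision; the function names and toy data below are our own.

```python
import numpy as np

def gaussian_kernel(X, Y, sigma=1.0):
    # Pairwise squared distances via ||x - y||^2 = ||x||^2 + ||y||^2 - 2 x·y
    sq = (X**2).sum(1)[:, None] + (Y**2).sum(1)[None, :] - 2 * X @ Y.T
    return np.exp(-sq / (2 * sigma**2))

def krr_cg(X, y, lam=1e-3, sigma=1.0, iters=100, tol=1e-8):
    # Solve (K + n*lam*I) alpha = y with conjugate gradient.
    # In a large-scale solver, the matvec A @ p is the hot spot: it would be
    # computed block by block (out-of-core) on one or more GPUs instead of
    # materializing the full n x n kernel matrix as done here.
    n = X.shape[0]
    K = gaussian_kernel(X, X, sigma)
    A = K + n * lam * np.eye(n)
    alpha = np.zeros(n)
    r = y - A @ alpha          # initial residual
    p = r.copy()               # initial search direction
    rs = r @ r
    for _ in range(iters):
        Ap = A @ p
        step = rs / (p @ Ap)
        alpha += step * p
        r -= step * Ap
        rs_new = r @ r
        if rs_new < tol:
            break
        p = r + (rs_new / rs) * p
        rs = rs_new
    return alpha

# Fit on toy data and predict on the training set
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=200)
alpha = krr_cg(X, y)
pred = gaussian_kernel(X, X) @ alpha
```

Because CG only touches the kernel matrix through matrix-vector products, the solver never needs the full matrix in memory at once, which is what makes the out-of-core, multi-GPU formulation in the paper possible.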