Matrix Factorization on GPUs with Memory Optimization and Approximate Computing

08/11/2018
by   Wei Tan, et al.
0

Matrix factorization (MF) discovers latent features from observations, which has shown great promises in the fields of collaborative filtering, data compression, feature extraction, word embedding, etc. While many problem-specific optimization techniques have been proposed, alternating least square (ALS) remains popular due to its general applicability e.g. easy to handle positive-unlabeled inputs, fast convergence and parallelization capability. Current MF implementations are either optimized for a single machine or with a need of a large computer cluster but still are insufficient. This is because a single machine provides limited compute power for large-scale data while multiple machines suffer from the network communication bottleneck. To address the aforementioned challenge, accelerating ALS on graphics processing units (GPUs) is a promising direction. We propose the novel approach in enhancing the MF efficiency via both memory optimization and approximate computing. The former exploits GPU memory hierarchy to increase data reuse, while the later reduces unnecessary computing without hurting the convergence of learning algorithms. Extensive experiments on large-scale datasets show that our solution not only outperforms the competing CPU solutions by a large margin but also has a 2x-4x performance gain compared to the state-of-the-art GPU solutions. Our implementations are open-sourced and publicly available.

READ FULL TEXT
research
01/02/2019

BMF: Block matrix approach to factorization of large scale data

Matrix Factorization (MF) on large scale matrices is computationally as ...
research
01/27/2022

Efficient hybrid topology optimization using GPU and homogenization based multigrid approach

We propose a new hybrid topology optimization algorithm based on multigr...
research
03/13/2018

CuLDA_CGS: Solving Large-scale LDA Problems on GPUs

Latent Dirichlet Allocation(LDA) is a popular topic model. Given the fac...
research
06/14/2021

Scalable and accurate multi-GPU based image reconstruction of large-scale ptychography data

While the advances in synchrotron light sources, together with the devel...
research
07/31/2023

Accelerating Optimal Power Flow with GPUs: SIMD Abstraction of Nonlinear Programs and Condensed-Space Interior-Point Methods

This paper introduces a novel computational framework for solving altern...
research
06/24/2020

Efficient Matrix Factorization on Heterogeneous CPU-GPU Systems

Matrix Factorization (MF) has been widely applied in machine learning an...
research
04/02/2020

VoxCap: FFT-Accelerated and Tucker-Enhanced Capacitance Extraction Simulator for Voxelized Structures

VoxCap, a fast Fourier transform (FFT)-accelerated and Tucker-enhanced i...

Please sign up or login with your details

Forgot password? Click here to reset