ALX: Large Scale Matrix Factorization on TPUs

12/03/2021
by   Harsh Mehta, et al.

We present ALX, an open-source library for distributed matrix factorization using Alternating Least Squares, written in JAX. Our design allows for efficient use of the TPU architecture and scales well to matrix factorization problems of O(B) rows/columns by scaling the number of available TPU cores. To spur future research on large-scale matrix factorization methods and to illustrate the scalability properties of our own implementation, we also built a real-world web link prediction dataset called WebGraph. This dataset can be easily modeled as a matrix factorization problem. We created several variants of this dataset based on locality and sparsity properties of sub-graphs. The largest variant of WebGraph has around 365M nodes, and training a single epoch finishes in about 20 minutes with 256 TPU cores. We include speed and performance numbers of ALX on all variants of WebGraph. Both the framework code and the dataset are open-sourced.
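To make the Alternating Least Squares idea concrete, here is a minimal single-device sketch in JAX. It is not ALX's API and does not reflect its distributed TPU design: it assumes a small dense matrix with an observation mask, and the names `solve_side`, `als_epoch`, `loss`, and the regularization weight `reg` are hypothetical, chosen only for illustration. Each half-step holds one embedding table fixed and solves a regularized least-squares problem per row of the other.

```python
import jax
import jax.numpy as jnp


def solve_side(fixed, ratings, mask, reg):
    """Solve one regularized least-squares problem per row of `ratings`.

    fixed:   (n, d) embeddings held constant during this half-step
    ratings: (m, n) observed values (zero where unobserved)
    mask:    (m, n) 1.0 where an entry is observed, else 0.0
    """
    d = fixed.shape[1]

    def solve_row(r, w):
        # Normal equations restricted to this row's observed entries.
        gram = (fixed * w[:, None]).T @ fixed + reg * jnp.eye(d)
        rhs = fixed.T @ (w * r)
        return jnp.linalg.solve(gram, rhs)

    # vmap applies the per-row solve to every row at once.
    return jax.vmap(solve_row)(ratings, mask)


def als_epoch(rows, cols, ratings, mask, reg=0.1):
    """One ALS epoch: update row embeddings, then column embeddings."""
    rows = solve_side(cols, ratings, mask, reg)
    cols = solve_side(rows, ratings.T, mask.T, reg)
    return rows, cols


def loss(rows, cols, ratings, mask, reg=0.1):
    """Squared error on observed entries plus L2 regularization."""
    err = mask * (rows @ cols.T - ratings)
    return jnp.sum(err ** 2) + reg * (jnp.sum(rows ** 2) + jnp.sum(cols ** 2))


# Toy problem (synthetic data, for illustration only).
k1, k2, k3, k4 = jax.random.split(jax.random.PRNGKey(0), 4)
m, n, d = 8, 6, 3
mask = (jax.random.uniform(k2, (m, n)) < 0.7).astype(jnp.float32)
ratings = jax.random.normal(k1, (m, n)) * mask
rows = jax.random.normal(k3, (m, d))
cols = jax.random.normal(k4, (n, d))

before = loss(rows, cols, ratings, mask)
for _ in range(5):
    rows, cols = als_epoch(rows, cols, ratings, mask)
after = loss(rows, cols, ratings, mask)
```

Because each half-step exactly minimizes the objective over one table with the other held fixed, the loss should not increase from one epoch to the next. The sketch materializes the full matrix and mask, which is practical only at toy scale.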


