Distributed non-negative RESCAL with Automatic Model Selection for Exascale Data

02/19/2022
by Manish Bhattarai, et al.

With the boom in the development of computer hardware and software, social media, IoT platforms, and communications, the volume of data produced around the world has grown exponentially. Among these data, relational datasets are growing in popularity because they provide unique insights into the evolution of communities and their interactions. Relational datasets are naturally non-negative, sparse, and extra-large. Relational data usually consist of triples (subject, relation, object) and are represented as graphs or multigraphs, called knowledge graphs, which need to be embedded into a low-dimensional dense vector space. Among the various embedding models, RESCAL learns from relational data to extract the posterior distributions over the latent variables and to predict missing relations. However, RESCAL is computationally demanding and requires a fast, distributed implementation to analyze extra-large real-world datasets. Here we introduce pyDRESCALk, a distributed non-negative RESCAL algorithm for heterogeneous CPU/GPU architectures with automatic selection of the number of latent communities (model selection). We demonstrate the correctness of pyDRESCALk on real-world and large synthetic tensors, and its efficacy by showing near-linear scaling consistent with the theoretical complexities. Finally, pyDRESCALk determines the number of latent communities in an 11-terabyte dense and a 9-exabyte sparse synthetic tensor.
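RESCAL models each relation slice of the (relation, subject, object) adjacency tensor as X_k ≈ A R_k A^T, where A holds one row per entity over the latent communities and R_k describes how those communities interact under relation k. The following is a minimal single-node NumPy sketch of the non-negative variant, assuming a Frobenius-norm loss minimized with multiplicative updates. It is not the distributed pyDRESCALk implementation, and the helper names `triples_to_tensor`, `nn_rescal`, and `rel_error` are hypothetical.

```python
import numpy as np


def triples_to_tensor(triples, n_entities, n_relations):
    """Hypothetical helper: dense adjacency tensor from integer (subject, relation, object) triples.

    X[r, s, o] = 1 if the triple (s, r, o) is present, else 0.
    """
    X = np.zeros((n_relations, n_entities, n_entities))
    for s, r, o in triples:
        X[r, s, o] = 1.0
    return X


def nn_rescal(X, rank, n_iter=200, eps=1e-9, seed=0):
    """Non-negative RESCAL sketch: X[k] ~= A @ R[k] @ A.T with A, R >= 0.

    Assumed: Frobenius loss sum_k ||X[k] - A R[k] A^T||^2, multiplicative updates,
    single node (no distribution). `rank` is the number of latent communities.
    """
    rng = np.random.default_rng(seed)
    m, n, _ = X.shape
    A = rng.random((n, rank))           # shared entity factor, non-negative init
    R = rng.random((m, rank, rank))     # one r x r core per relation
    for _ in range(n_iter):
        # Update A: ratio of the negative and positive parts of the gradient,
        # accumulated over all relation slices.
        AtA = A.T @ A
        num = np.zeros_like(A)
        den = np.zeros_like(A)
        for k in range(m):
            num += X[k] @ A @ R[k].T + X[k].T @ A @ R[k]
            den += A @ (R[k] @ AtA @ R[k].T + R[k].T @ AtA @ R[k])
        A *= num / (den + eps)
        # Update each core R[k] using the freshly updated A.
        AtA = A.T @ A
        for k in range(m):
            R[k] *= (A.T @ X[k] @ A) / (AtA @ R[k] @ AtA + eps)
    return A, R


def rel_error(X, A, R):
    """Relative Frobenius reconstruction error over all relation slices."""
    approx = np.einsum("ir,krs,js->kij", A, R, A)  # approx[k] = A @ R[k] @ A.T
    return np.linalg.norm(X - approx) / np.linalg.norm(X)


if __name__ == "__main__":
    # Synthetic non-negative tensor with 3 planted communities, 4 relations, 30 entities.
    rng = np.random.default_rng(1)
    A0, R0 = rng.random((30, 3)), rng.random((4, 3, 3))
    X = np.einsum("ir,krs,js->kij", A0, R0, A0)
    A, R = nn_rescal(X, rank=3, n_iter=300)
    print("relative error:", rel_error(X, A, R))
```

Multiplicative updates keep every entry of A and R non-negative as long as the initialization and data are non-negative, which is why they are a common choice for this kind of sketch. The `rank` argument stands in for the number of latent communities; choosing that number automatically, as pyDRESCALk does, is not part of this example.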
