Out-of-Core and Distributed Algorithms for Dense Subtensor Mining

02/04/2018
by Kijung Shin, et al.

How can we detect fraudulent lockstep behavior in large-scale multi-aspect data (i.e., tensors)? Can we detect it when the data are too large to fit in memory or even on a single disk? Past studies have shown that dense subtensors in real-world tensors (e.g., social media, Wikipedia, and TCP dumps) signal anomalous or fraudulent behavior such as retweet boosting, bot activities, and network attacks. Thus, various approaches, including tensor decomposition and search, have been proposed for detecting dense subtensors rapidly and accurately. However, existing methods either have low accuracy or assume that tensors are small enough to fit in main memory, which is unrealistic in many real-world applications such as social media and the web. To overcome these limitations, we propose D-CUBE, a disk-based dense-subtensor detection method that can also run in a distributed manner across multiple machines. Compared to state-of-the-art methods, D-CUBE is (1) Memory Efficient: it requires up to 1,600× less memory and handles 1,000× larger data (2.6TB), (2) Fast: it is up to 7× faster due to its near-linear scalability, (3) Provably Accurate: it gives a guarantee on the densities of the detected subtensors, and (4) Effective: it spotted network attacks in TCP dumps and synchronized behavior in rating data most accurately.
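To make "dense subtensor" and "density" concrete, here is a minimal in-memory sketch, assuming one density measure used in this line of work: arithmetic average mass, i.e. the total value of the entries in the subtensor divided by the average of its mode sizes. It uses a simple greedy peeling that removes one attribute value at a time and keeps the densest snapshot seen. This is not D-CUBE itself, which is disk-based and distributed; the names `density_ari` and `greedy_peel` and the tuple layout of entries are hypothetical.

```python
from collections import defaultdict

# Hypothetical entry layout: (attr_1, ..., attr_N, value), e.g. (user, item, date, count).


def density_ari(mass, sizes):
    """Arithmetic average mass: mass / (sum of mode sizes / N)."""
    total = sum(sizes)
    return mass * len(sizes) / total if total > 0 else 0.0


def greedy_peel(entries, n_modes):
    """Repeatedly drop the attribute value with the smallest slice mass;
    return the best density seen and the attribute-value sets that achieve it."""
    alive = [set() for _ in range(n_modes)]                    # values still in the subtensor
    slice_mass = [defaultdict(float) for _ in range(n_modes)]  # mass of each value's slice
    by_value = [defaultdict(list) for _ in range(n_modes)]     # entry indices per value
    mass = 0.0
    for idx, e in enumerate(entries):
        mass += e[-1]
        for m in range(n_modes):
            alive[m].add(e[m])
            slice_mass[m][e[m]] += e[-1]
            by_value[m][e[m]].append(idx)

    removed = [False] * len(entries)
    best = density_ari(mass, [len(s) for s in alive])
    best_sets = [set(s) for s in alive]

    while any(alive):
        # Choose the (mode, value) pair whose slice currently carries the least mass.
        m, v = min(((k, x) for k in range(n_modes) for x in alive[k]),
                   key=lambda kx: slice_mass[kx[0]][kx[1]])
        alive[m].discard(v)
        # Drop every entry in that slice and update the slice masses it contributed to.
        for idx in by_value[m][v]:
            if not removed[idx]:
                removed[idx] = True
                e = entries[idx]
                mass -= e[-1]
                for k in range(n_modes):
                    slice_mass[k][e[k]] -= e[-1]
        sizes = [len(s) for s in alive]
        if all(sizes):  # only score non-empty subtensors
            d = density_ari(mass, sizes)
            if d > best:
                best, best_sets = d, [set(s) for s in alive]
    return best, best_sets


# Usage: a 3x3x1 block of count 5 (lockstep-like) plus two isolated noise entries.
entries = [(f"u{a}", f"i{b}", "d1", 5.0) for a in range(1, 4) for b in range(1, 4)]
entries += [("u4", "i4", "d2", 1.0), ("u5", "i5", "d3", 1.0)]
best, sets = greedy_peel(entries, 3)
print(best, sets)
```

In the usage example, peeling first removes the low-mass noise values and recovers the 3×3×1 block, whose density is 45 × 3 / 7. The sketch holds the whole tensor in memory and rescans all remaining values at every step, so it only illustrates the density bookkeeping. It does not reflect D-CUBE's disk-based or distributed design.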

research
06/11/2017

DenseAlert: Incremental Dense-Subtensor Detection in Tensor Streams

Consider a stream of retweet events - how can we spot fraudulent lock-st...
research
03/23/2021

CubeFlow: Money Laundering Detection with Coupled Tensors

Money laundering (ML) is the behavior to conceal the source of money ach...
research
12/03/2020

AugSplicing: Synchronized Behavior Detection in Streaming Tensors

How can we track synchronized behavior in a stream of time-stamped tuple...
research
10/20/2020

a-Tucker: Input-Adaptive and Matricization-Free Tucker Decomposition for Dense Tensors on CPUs and GPUs

Tucker decomposition is one of the most popular models for analyzing and...
research
10/06/2017

Scalable Tucker Factorization for Sparse Tensors - Algorithms and Discoveries

Given sparse multi-dimensional data (e.g., (user, movie, time; rating) f...
research
07/11/2023

Minimum Cost Loop Nests for Contraction of a Sparse Tensor with a Tensor Network

Sparse tensor decomposition and completion are common in numerous applic...
research
02/19/2022

Distributed non-negative RESCAL with Automatic Model Selection for Exascale Data

With the boom in the development of computer hardware and software, soci...
