Utilizing Dynamic Properties of Sharing Bits and Registers to Estimate User Cardinalities over Time

11/22/2018
by Pinghui Wang, et al.

Online monitoring of user cardinalities (or degrees) in graph streams is fundamental to many applications. For example, in a bipartite graph representing user-website visiting activities, user cardinalities (the number of distinct websites each user visits) are monitored to report network anomalies. These real-world graph streams may contain duplicate user-item pairs and a huge number of distinct user-item pairs, so it is infeasible to compute user cardinalities exactly when memory and computation resources are limited. Existing methods estimate user cardinalities approximately, but their accuracy depends heavily on parameters that are not easy to set. Moreover, these methods cannot provide anytime-available estimates, because the user cardinalities are computed only at the end of the data stream, whereas real-time applications such as anomaly detection require user cardinalities to be estimated on the fly. To address these problems, we develop novel bit sharing and register sharing algorithms, which use a bit array and a register array, respectively, to build a compact sketch of all users' connected items. Compared with previous bit and register sharing methods, our algorithms exploit the dynamic properties of the bit and register arrays (e.g., the fraction of zero bits in the bit array at each time) to significantly improve estimation accuracy, and they need only O(1) time to update the estimates each time a new user-item pair is observed. In addition, our algorithms are simple and easy to use, with no parameters to tune. We evaluate the performance of our methods on real-world datasets. The experimental results demonstrate that our methods are several times more accurate and faster than state-of-the-art methods using the same amount of memory.
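To make the bit sharing idea concrete, the following is a minimal Python sketch of one way such an estimator could work. It is an illustration under stated assumptions, not the authors' implementation: the class name `SharedBitSketch`, the use of BLAKE2b as the hash function, and the exact update rule (adding the inverse of the current zero-bit fraction whenever a pair flips a bit from 0 to 1) are all assumptions chosen to match the abstract's description of exploiting the fraction of zero bits and updating in O(1) time.

```python
import hashlib
from collections import defaultdict


class SharedBitSketch:
    """Illustrative shared-bit cardinality sketch (hypothetical, not the paper's code)."""

    def __init__(self, num_bits: int) -> None:
        self.num_bits = num_bits
        # One byte per bit for readability; a real sketch would pack bits.
        self.bits = bytearray(num_bits)
        self.zero_bits = num_bits            # running count of zero bits
        self.estimates = defaultdict(float)  # per-user running estimates

    def _position(self, user: str, item: str) -> int:
        # Assumption: any well-mixed hash of the (user, item) pair works here.
        key = f"{user}\x00{item}".encode()
        digest = hashlib.blake2b(key, digest_size=8).digest()
        return int.from_bytes(digest, "big") % self.num_bits

    def update(self, user: str, item: str) -> float:
        """Process one user-item pair in O(1) and return the user's estimate."""
        pos = self._position(user, item)
        if self.bits[pos] == 0:
            # A previously unseen pair flips a bit only with probability equal
            # to the current zero-bit fraction, so weight the hit by its inverse.
            zero_fraction = self.zero_bits / self.num_bits
            self.estimates[user] += 1.0 / zero_fraction
            self.bits[pos] = 1
            self.zero_bits -= 1
        # A duplicate pair maps to an already-set bit and changes nothing.
        return self.estimates[user]


sketch = SharedBitSketch(num_bits=1 << 16)
for i in range(100):
    sketch.update("alice", f"site{i}")
    sketch.update("alice", f"site{i}")  # duplicate visit, ignored by the sketch
print(round(sketch.estimates["alice"], 2))
```

Because every user shares the same bit array, memory does not grow with the number of users' items, and an estimate is available after every update rather than only at the end of the stream. The single size parameter `num_bits` here is a memory budget for the illustration, not a tuning knob taken from the paper.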

research
01/03/2019

A Fast Sketch Method for Mining User Similarities over Fully Dynamic Graph Streams

Many real-world networks such as Twitter and YouTube are given as fully ...
research
12/16/2019

A new Frequency Estimation Sketch for Data Streams

In data stream applications, one of the critical issues is to estimate t...
research
10/15/2022

Parameter-free Dynamic Graph Embedding for Link Prediction

Dynamic interaction graphs have been widely adopted to model the evoluti...
research
09/04/2018

Fast and Accurate Graph Stream Summarization

A graph stream is a continuous sequence of data items, in which each ite...
research
03/07/2023

Fast and Multi-aspect Mining of Complex Time-stamped Event Streams

Given a huge, online stream of time-evolving events with multiple attrib...
research
01/28/2020

Estimating Descriptors for Large Graphs

Embedding networks into a fixed dimensional Euclidean feature space, whi...
research
01/06/2022

SQUAD: Combining Sketching and Sampling Is Better than Either for Per-item Quantile Estimation

Stream monitoring is fundamental in many data stream applications, such ...
