One Table to Count Them All: Parallel Frequency Estimation on Single-Board Computers

03/02/2019
by   Fatih Taşyaran, et al.
Sketches are probabilistic data structures that provide approximate results within mathematically proven error bounds while using orders of magnitude less memory than traditional approaches. They are well suited to streaming data analysis even on architectures with limited memory, such as the single-board computers widely used for IoT and edge computing. Since these devices offer multiple cores, efficient parallel sketching schemes allow them to handle high-volume data streams. However, since their caches are relatively small, careful parallelization is required. In this work, we focus on the frequency estimation problem and evaluate performance on a high-end server, a 4-core Raspberry Pi, and an 8-core Odroid. As the sketch, we employ the widely used Count-Min Sketch. To hash the stream in parallel and in a cache-friendly way, we apply a novel tabulation approach and rearrange the auxiliary tables into a single one. To parallelize the process efficiently, we modify the workflow and apply a form of buffering between hash computations and sketch updates. Today, many single-board computers have heterogeneous processors that combine slow and fast cores. To utilize all of these cores to their full potential, we propose a dynamic load-balancing mechanism that significantly increases the performance of frequency estimation.
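For readers unfamiliar with the building blocks named in the abstract, the following is a minimal, sequential sketch of a Count-Min Sketch that uses textbook simple tabulation hashing. It is not the authors' implementation: it keeps a separate set of lookup tables per row rather than the single merged table the abstract describes, and it omits the buffering and load balancing. The class name `CountMinSketch`, the parameters `width`, `depth`, and `seed`, and the assumption that keys are non-negative 32-bit integers are all illustrative choices.

```python
import random


class CountMinSketch:
    """Illustrative Count-Min Sketch with simple tabulation hashing.

    Assumption: keys are non-negative integers that fit in 32 bits.
    """

    def __init__(self, width=1024, depth=4, seed=0):
        rng = random.Random(seed)
        self.width = width
        self.depth = depth
        # depth rows of width counters, all starting at zero.
        self.counts = [[0] * width for _ in range(depth)]
        # For each row: 4 lookup tables (one per key byte) of 256 random words.
        self.tables = [
            [[rng.getrandbits(32) for _ in range(256)] for _ in range(4)]
            for _ in range(depth)
        ]

    def _hash(self, row, key):
        # Simple tabulation: XOR the entries selected by each byte of the key.
        h = 0
        for i in range(4):
            h ^= self.tables[row][i][(key >> (8 * i)) & 0xFF]
        return h % self.width

    def add(self, key, count=1):
        # Increment one counter per row.
        for row in range(self.depth):
            self.counts[row][self._hash(row, key)] += count

    def estimate(self, key):
        # The minimum over rows never underestimates the true frequency
        # when all updates are non-negative.
        return min(
            self.counts[row][self._hash(row, key)] for row in range(self.depth)
        )


if __name__ == "__main__":
    sketch = CountMinSketch(width=256, depth=4, seed=42)
    for key in [7, 7, 7, 13, 42, 42]:
        sketch.add(key)
    print(sketch.estimate(7), sketch.estimate(13), sketch.estimate(42))
```

Each update touches one counter per row, and each hash evaluation reads several table entries. This is why table layout and cache behavior matter on devices with small caches.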

research
08/14/2019

(Learned) Frequency Estimation Algorithms under Zipfian Distribution

The frequencies of the elements in a data stream are an important statis...
research
11/06/2021

Frequency Estimation with One-Sided Error

Frequency estimation is one of the most fundamental problems in streamin...
research
04/27/2018

Buffered Count-Min Sketch on SSD: Theory and Experiments

Frequency estimation data structures such as the count-min sketch (CMS) ...
research
04/01/2022

Double-Hashing Algorithm for Frequency Estimation in Data Streams

Frequency estimation of elements is an important task for summarizing da...
research
03/02/2021

DM algorithms in healthindustry

This survey reviews several approaches of data mining (DM) in healthindu...
research
11/20/2019

Parallel Implementations for Computing the Minimum Distance of a Random Linear Code on Multicomputers

The minimum distance of a linear code is a key concept in information th...
research
07/17/2020

Frequency Estimation in Data Streams: Learning the Optimal Hashing Scheme

We present a novel approach for the problem of frequency estimation in d...
