Efficient algorithms for collecting the statistics of large-scale IP address data

08/09/2021
by   Jie Chen, et al.

Compiling the statistics of large-scale IP address data is an essential task in network traffic measurement. The statistical results are used to evaluate the potential impact of user behaviors on network traffic. This requires algorithms that can store and retrieve a high volume of IP addresses within time and memory constraints. In this paper, we present two efficient algorithms for collecting the statistics of large-scale IP addresses that balance time efficiency and memory consumption. The proposed solutions take into account the sparse nature of IP address statistics when building the hash function, and maintain a dynamic balance among layered memory blocks. The first proposed method has two layers, each of which contains a limited number of memory blocks. Each memory block contains 256 elements, for a block size of 256 × 8 bytes on a 64-bit system. In contrast to built-in hash mapping functions, the proposed solution completely avoids expensive hash collisions while retaining the linear time complexity of hash-based solutions. Moreover, the mechanism dynamically determines the hash index length according to the range of IP addresses, which allows it to balance the time and memory constraints. In addition, we propose an efficient parallel scheme to speed up the collection of statistics. Experimental results on several synthetic datasets show that the proposed method substantially outperforms the baselines in both time and memory efficiency.
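To make the idea of collision-free, layered 256-element blocks concrete, the following is a minimal sketch and not the authors' algorithm. It assumes IPv4 addresses, indexes one block per octet (four levels rather than the paper's two layers), and allocates blocks lazily so that sparse address ranges consume little memory. The class name `LayeredIPCounter` and its methods are hypothetical, and the dynamic index length and the parallel scheme described in the abstract are not modeled.

```python
import ipaddress

BLOCK_SIZE = 256  # one slot per possible octet value


class LayeredIPCounter:
    """Hypothetical sketch: count IPv4 addresses with lazily allocated
    256-slot blocks, indexed directly by octet (no hash collisions)."""

    def __init__(self):
        self.root = [None] * BLOCK_SIZE
        self.blocks_allocated = 1  # the root block

    def add(self, address):
        # .packed yields the four octets of the address as bytes
        octets = ipaddress.IPv4Address(address).packed
        block = self.root
        # Walk (and create on demand) the blocks for the first three octets
        for octet in octets[:3]:
            child = block[octet]
            if child is None:
                child = [None] * BLOCK_SIZE
                block[octet] = child
                self.blocks_allocated += 1
            block = child
        # The last octet indexes a counter slot in the leaf block
        last = octets[3]
        block[last] = (block[last] or 0) + 1

    def count(self, address):
        octets = ipaddress.IPv4Address(address).packed
        block = self.root
        for octet in octets[:3]:
            block = block[octet]
            if block is None:
                return 0  # no address with this prefix was ever added
        return block[octets[3]] or 0


counter = LayeredIPCounter()
for ip in ["10.0.0.1", "10.0.0.1", "10.0.0.2", "192.168.1.1"]:
    counter.add(ip)
```

Each insertion or lookup touches a fixed number of blocks, so the cost per address is constant and the total cost is linear in the number of addresses. Addresses that share a prefix share blocks, which is one way the sparsity of real address data can be exploited.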


