From Specific to Generic Learned Sorted Set Dictionaries: A Theoretically Sound Paradigm Yielding Competitive Data Structural Boosters in Practice

09/02/2023
by Domenico Amato, et al.
This research concerns Learned Data Structures, a recent area that has emerged at the crossroads of Machine Learning and Classic Data Structures. It is methodologically important and has a high practical impact. We focus on Learned Indexes, i.e., Learned Sorted Set Dictionaries. The proposals available so far are specific in the sense that they can boost, indeed impressively, the time performance of Table Search Procedures with a sorted layout only, e.g., Binary Search. We propose a novel paradigm that, complementing known specialized ones, can produce Learned versions of any Sorted Set Dictionary, for instance, Balanced Binary Search Trees or Binary Search on layouts other than sorted, e.g., Eytzinger. Theoretically, based on it, we obtain several results of interest, such as (a) the first Learned Optimum Binary Search Forest, with mean access time bounded by the Entropy of the probability distribution of the accesses to the Dictionary; (b) the first Learned Sorted Set Dictionary that, in the Dynamic Case and in an amortized analysis setting, matches the time bounds known for Classic Dictionaries. The latter result holds under widely accepted assumptions regarding the size of the Universe. The experimental part, somewhat complex in terms of software development, clearly indicates the nonobvious finding that the generalization we propose can yield effective and competitive Learned Data Structural Boosters, even with respect to specific benchmark models.
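
To make the notion of a "specific" Learned Index concrete, the following is a minimal sketch of a learned Sorted Table Search: a model predicts the position of a key in a sorted table, and Binary Search is then confined to a window around that prediction. It is an illustration only, not the paradigm proposed in the paper. The function names (`fit_linear_model`, `learned_search`), the choice of a single least-squares line as the model, and the assumption of distinct numeric keys are all ours.

```python
from bisect import bisect_left


def fit_linear_model(keys):
    """Fit a least-squares line key -> rank over a sorted list of distinct keys.

    Returns (predict, eps), where eps is the largest absolute error of the
    clamped prediction over the keys stored in the table.
    """
    n = len(keys)
    mean_k = sum(keys) / n
    mean_r = (n - 1) / 2
    var = sum((k - mean_k) ** 2 for k in keys)
    cov = sum((k - mean_k) * (i - mean_r) for i, k in enumerate(keys))
    slope = 0.0 if var == 0 else cov / var
    intercept = mean_r - slope * mean_k

    def predict(x):
        # Clamp the predicted rank to a valid table position.
        p = int(round(slope * x + intercept))
        return min(max(p, 0), n - 1)

    eps = max(abs(predict(k) - i) for i, k in enumerate(keys))
    return predict, eps


def learned_search(keys, predict, eps, x):
    """Return the index of x in the sorted table keys, or -1 if x is absent."""
    n = len(keys)
    guess = predict(x)
    lo = max(guess - eps, 0)
    hi = min(guess + eps + 1, n)
    # Binary Search restricted to the window [lo, hi) around the prediction.
    i = bisect_left(keys, x, lo, hi)
    return i if i < n and keys[i] == x else -1
```

Every stored key lies within `eps` positions of its prediction, so the restricted search always finds it. A query for an absent key may land anywhere in the window, which is why the final equality check is needed. The speed-up comes from searching a window of at most `2 * eps + 1` positions instead of the whole table.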
