Generalized Deduplication: Bounds, Convergence, and Asymptotic Properties

01/09/2019
by Rasmus Vestergaard, et al.

We study a generalization of deduplication that enables lossless deduplication of highly similar data, and show that standard deduplication with fixed chunk length is a special case. We provide bounds on the expected length of coded sequences for generalized deduplication and show that the coding has asymptotic near-entropy cost under the proposed source model. More importantly, we show that generalized deduplication converges multiple orders of magnitude faster than standard deduplication. Generalized deduplication can therefore provide compression benefits much earlier than standard deduplication, which is key in practical systems. Numerical examples demonstrate our results, showing that our lower bounds are achievable and illustrating the potential gain of using the generalization over standard deduplication. In fact, we show that even for the simplest case of generalized deduplication, the gain in convergence speed is linear in the size of the data chunks.
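To make the relationship between the two schemes concrete, the following is a minimal sketch and not the authors' construction. It assumes that generalized deduplication can be modeled as splitting each fixed-length chunk into a base, which is deduplicated, and a deviation, which is stored verbatim. The names `deduplicate`, `identity_transform`, and `drop_last_byte_transform` are hypothetical, and the last-byte split is an illustrative stand-in for whatever transformation the paper actually uses. With the identity transform the deviation is empty, so the sketch reduces to standard fixed-length deduplication.

```python
from typing import Callable, Dict, List, Tuple

# A transform maps a chunk to (base, deviation). Hypothetical interface,
# chosen only for this illustration.
Transform = Callable[[bytes], Tuple[bytes, bytes]]


def identity_transform(chunk: bytes) -> Tuple[bytes, bytes]:
    """Standard deduplication: the whole chunk is the base, no deviation."""
    return chunk, b""


def drop_last_byte_transform(chunk: bytes) -> Tuple[bytes, bytes]:
    """Illustrative assumption: chunks differing only in their last byte
    share a base; the last byte is kept as the deviation."""
    return chunk[:-1], chunk[-1:]


def deduplicate(
    chunks: List[bytes], transform: Transform
) -> Tuple[List[bytes], List[Tuple[int, bytes]]]:
    """Store each distinct base once; keep (base index, deviation) per chunk."""
    base_index: Dict[bytes, int] = {}
    records: List[Tuple[int, bytes]] = []
    for chunk in chunks:
        base, deviation = transform(chunk)
        # Assign the next free index the first time a base is seen.
        index = base_index.setdefault(base, len(base_index))
        records.append((index, deviation))
    # Dicts preserve insertion order, so list position equals assigned index.
    return list(base_index), records


def restore(bases: List[bytes], records: List[Tuple[int, bytes]]) -> List[bytes]:
    """Lossless inverse for both transforms above: base followed by deviation."""
    return [bases[index] + deviation for index, deviation in records]


if __name__ == "__main__":
    data = [b"ABCD1", b"ABCD2", b"ABCD3", b"WXYZ1"]
    for transform in (identity_transform, drop_last_byte_transform):
        bases, records = deduplicate(data, transform)
        assert restore(bases, records) == data
        print(transform.__name__, "stores", len(bases), "bases")
```

In this toy input the near-identical chunks never repeat exactly, so the identity transform finds no duplicates, while the base/deviation split lets similar chunks share one stored base. This shows only the intuition behind the early compression gain; it says nothing about the bounds or convergence rates derived in the paper.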


