Graceful Forgetting II. Data as a Process

11/20/2022
by Alain de Cheveigné, et al.

Data are rapidly growing in size and importance for society, a trend motivated by their enabling power. The accumulation of new data, sustained by progress in technology, leads to a boundless expansion of stored data, in some cases with an exponential increase in the accrual rate itself. Massive data are hard to process, transmit, store, and exploit, and it is particularly hard to keep abreast of the data store as a whole. This paper distinguishes three phases in the life of data: acquisition, curation, and exploitation. Each phase involves a distinct process, which may be separated from the others in time and which has its own set of priorities. The function of the second phase, curation, is to maximize the future value of the data given limited storage. I argue that this requires that (a) the data take the form of summary statistics and (b) these statistics follow an endless process of rescaling. The summary may be more compact than the original data, but its data structure is more complex, and it requires an ongoing computational process that is much more sophisticated than mere storage. Rescaling results in dimensionality reduction that may be beneficial for learning, but it must be carefully controlled to preserve relevance. Rescaling may be tuned based on feedback from usage, with the proviso that our memory of the past serves the future, the needs of which are not fully known.
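To make points (a) and (b) concrete, here is a minimal sketch in Python of a store that keeps only summary statistics and rescales them when storage runs out. It is an illustration under stated assumptions, not the method of the paper: the names `Summary`, `merge`, and `RescalingStore`, the choice of count, mean, and variance as statistics, and the policy of always merging the two oldest bins are all hypothetical. The merge step uses the standard pairwise update for mean and sum of squared deviations.

```python
from dataclasses import dataclass
from functools import reduce


@dataclass
class Summary:
    """Summary statistics for one time bin (hypothetical structure)."""
    n: int        # number of samples summarized
    mean: float   # sample mean
    m2: float     # sum of squared deviations from the mean

    @property
    def variance(self) -> float:
        # Population variance of the samples covered by this bin.
        return self.m2 / self.n if self.n else 0.0


def summarize(samples) -> Summary:
    """Replace a block of raw samples by its summary statistics."""
    n = len(samples)
    mean = sum(samples) / n
    m2 = sum((x - mean) ** 2 for x in samples)
    return Summary(n, mean, m2)


def merge(a: Summary, b: Summary) -> Summary:
    """Combine two summaries without access to the raw samples."""
    n = a.n + b.n
    delta = b.mean - a.mean
    mean = a.mean + delta * b.n / n
    m2 = a.m2 + b.m2 + delta ** 2 * a.n * b.n / n
    return Summary(n, mean, m2)


class RescalingStore:
    """Holds at most `capacity` bins, oldest first.

    Assumed policy (deliberately crude): when the store overflows,
    the two oldest bins are merged, so old data lose temporal
    resolution while recent data keep theirs.
    """

    def __init__(self, capacity: int):
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self.capacity = capacity
        self.bins = []

    def add(self, samples) -> None:
        # Acquisition: raw samples are summarized and then discarded.
        self.bins.append(summarize(samples))
        # Curation: rescale until the storage budget is met.
        while len(self.bins) > self.capacity:
            self.bins[:2] = [merge(self.bins[0], self.bins[1])]

    def total(self) -> Summary:
        """Statistics over everything ever added (assumes a non-empty store)."""
        return reduce(merge, self.bins)


store = RescalingStore(capacity=3)
for block in ([1, 2], [3, 4], [5, 6], [7, 8]):
    store.add(block)
```

After the fourth block the store still holds three bins: the two oldest blocks have been merged into one coarser bin, yet `store.total()` still yields the count, mean, and variance of all eight samples. What is lost is the ability to tell the first two blocks apart, which is the kind of controlled loss the abstract refers to. A realistic policy would choose which bins to merge based on expected future use rather than age alone.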


