On Data-centric Myths

11/22/2021
by Antonia Marcu, et al.

The community lacks theory-informed guidelines for building good data sets. We analyse theoretical directions relating to what aspects of the data matter and conclude that the intuitions derived from the existing literature are incorrect and misleading. Using empirical counter-examples, we show that 1) data dimension should not necessarily be minimised and 2) when manipulating data, preserving the distribution is inessential. This calls for a more data-aware theoretical understanding. Although not explored in this work, we propose the study of the impact of data modification on learned representations as a promising research direction.
