Measuring Dataset Granularity

12/21/2019
by Yin Cui et al.

Despite the increasing visibility of fine-grained recognition in our field, "fine-grained" has thus far lacked a precise definition. In this work, building upon clustering theory, we pursue a framework for measuring dataset granularity. We argue that dataset granularity should depend not only on the data samples and their labels, but also on the distance function we choose. We propose an axiomatic framework to capture desired properties for a dataset granularity measure and provide examples of measures that satisfy these properties. We assess each measure via experiments on datasets with hierarchical labels of varying granularity. When measuring granularity in commonly used datasets with our measure, we find that certain datasets that are widely considered fine-grained in fact contain subsets of considerable size that are substantially more coarse-grained than datasets generally regarded as coarse-grained. We also investigate the interplay between dataset granularity and a variety of factors, and find that fine-grained datasets are more difficult to learn from, more difficult to transfer to, more difficult to perform few-shot learning with, and more vulnerable to adversarial attacks.
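The abstract's central claim is that granularity is a function of the samples, the labels, and the chosen distance function. The sketch below illustrates only that dependence, using a simple ratio of mean intra-class distance to mean inter-class distance. This ratio is an assumption chosen for illustration; it is not the measure proposed in the paper, and the names `granularity_ratio`, `euclidean`, and `second_coord` are hypothetical.

```python
import math
from itertools import combinations
from typing import Callable, Sequence

Vector = Sequence[float]
Distance = Callable[[Vector, Vector], float]


def euclidean(a: Vector, b: Vector) -> float:
    """Standard Euclidean distance."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def second_coord(a: Vector, b: Vector) -> float:
    """Hypothetical distance that ignores every feature except the second one."""
    return abs(a[1] - b[1])


def granularity_ratio(points: Sequence[Vector], labels: Sequence[int], dist: Distance) -> float:
    """Mean intra-class distance divided by mean inter-class distance.

    Illustrative only (NOT the paper's measure): values near 0 mean classes are
    far apart relative to their spread (coarse-grained); values near or above 1
    mean classes overlap under `dist` (fine-grained).
    """
    intra, inter = [], []
    for i, j in combinations(range(len(points)), 2):
        d = dist(points[i], points[j])
        # Route each pairwise distance by whether the pair shares a label.
        (intra if labels[i] == labels[j] else inter).append(d)
    if not intra or not inter:
        raise ValueError("need at least one same-class pair and one cross-class pair")
    mean_inter = sum(inter) / len(inter)
    if mean_inter == 0:
        return math.inf  # classes are indistinguishable under this distance
    return (sum(intra) / len(intra)) / mean_inter


# Same samples and labels, two different distance functions.
pts = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
labels = [0, 0, 1, 1]
print(round(granularity_ratio(pts, labels, euclidean), 4))     # → 0.0998
print(round(granularity_ratio(pts, labels, second_coord), 4))  # → 2.0
```

Under Euclidean distance the two classes are well separated and the toy dataset looks coarse-grained; under a distance that only sees the second feature, the same classes overlap and the dataset looks fine-grained.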
