A Bag-of-Prototypes Representation for Dataset-Level Applications

03/23/2023
by Weijie Tu, et al.

This work investigates dataset vectorization for two dataset-level tasks: assessing training set suitability and test set difficulty. The former measures how suitable a training set is for a target domain, while the latter studies how challenging a test set is for a learned model. Central to both tasks is measuring the underlying relationship between datasets. This requires a dataset vectorization scheme that preserves as much discriminative dataset information as possible, so that the distance between the resulting dataset vectors reflects dataset-to-dataset similarity. To this end, we propose a bag-of-prototypes (BoP) dataset representation that extends the image-level bag of patch descriptors to a dataset-level bag of semantic prototypes. Specifically, we build a codebook of K prototypes clustered from a reference dataset. Given a dataset to be encoded, we quantize each of its image features to a prototype in the codebook and obtain a K-dimensional histogram. Without assuming access to dataset labels, the BoP representation provides a rich characterization of the dataset's semantic distribution. Furthermore, BoP representations work well with the Jensen-Shannon divergence for measuring dataset-to-dataset similarity. Although very simple, BoP consistently outperforms existing representations on a series of benchmarks for the two dataset-level tasks.
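The pipeline described above (codebook, quantization, histogram, Jensen-Shannon divergence) can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the abstract only says the prototypes are "clustered", so plain k-means is an assumption here, as are nearest-prototype assignment by Euclidean distance, the base-2 logarithm, and every function and parameter name (`build_codebook`, `quantize`, `bop_histogram`, `js_divergence`, `n_iter`, `seed`). Image features are assumed to be already extracted as rows of a 2-D array.

```python
import numpy as np


def quantize(features, codebook):
    """Index of the nearest prototype (Euclidean, an assumption) per feature."""
    # Pairwise squared distances, shape (n_features, K).
    d2 = ((features[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)


def build_codebook(ref_features, k, n_iter=20, seed=0):
    """Cluster reference features into K prototypes with plain k-means.

    k-means is an assumed choice; the abstract does not name the algorithm.
    """
    rng = np.random.default_rng(seed)
    init = rng.choice(len(ref_features), size=k, replace=False)
    centers = ref_features[init].astype(float)
    for _ in range(n_iter):
        assign = quantize(ref_features, centers)
        for j in range(k):
            members = ref_features[assign == j]
            if len(members) > 0:  # keep the old center if a cluster empties
                centers[j] = members.mean(axis=0)
    return centers


def bop_histogram(features, codebook):
    """K-dimensional normalized histogram of prototype assignments."""
    counts = np.bincount(quantize(features, codebook), minlength=len(codebook))
    return counts / counts.sum()


def js_divergence(p, q):
    """Jensen-Shannon divergence (base 2, so the value lies in [0, 1])."""
    m = 0.5 * (p + q)

    def kl(a, b):
        mask = a > 0  # terms with a == 0 contribute nothing
        return float(np.sum(a[mask] * np.log2(a[mask] / b[mask])))

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)
```

A usage sketch under the same assumptions: call `build_codebook` once on the reference features, encode each dataset with `bop_histogram`, and compare two datasets with `js_divergence`. No labels are needed at any step, which matches the label-free setting described above.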

