Low-Budget Unsupervised Label Query through Domain Alignment Enforcement

01/01/2020
by Jurandy Almeida, et al.

The deep learning revolution happened thanks to the availability of massive amounts of labelled data, which have contributed to the development of models with extraordinary inference capabilities. Despite the public availability of a large number of datasets, it is often necessary to generate a new set of labelled data to address specific requirements. In addition, producing labels is costly and sometimes requires specific expertise. In this work, we introduce a new problem called low-budget unsupervised label query, in which a model is trained to suggest to the user a set of samples to be labelled, from a completely unlabelled dataset, so as to maximize the classification accuracy on that dataset. We propose to adopt a domain alignment model, modified to enforce consistency, to align a known dataset (source) and the dataset to be labelled (target). Finally, we propose a novel sample selection method based on uniform entropy sampling, named UNFOLD, which is deterministic and consistently outperforms other baselines as well as competing models on a large variety of publicly available datasets.
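
The abstract does not spell out how UNFOLD selects samples, so the following is only a rough illustration of one plausible reading of "uniform entropy sampling", not the authors' implementation. It assumes the aligned model yields softmax probabilities for each target sample, ranks samples by prediction entropy, and picks the labelling budget at evenly spaced positions along that ranking, which makes the selection deterministic. The function names `prediction_entropy` and `uniform_entropy_query` are hypothetical.

```python
import numpy as np


def prediction_entropy(probs):
    """Shannon entropy (in nats) of each row of class probabilities."""
    probs = np.asarray(probs, dtype=float)
    eps = 1e-12  # avoids log(0) for one-hot predictions
    return -np.sum(probs * np.log(probs + eps), axis=1)


def uniform_entropy_query(probs, budget):
    """Pick `budget` sample indices spread evenly over the entropy ranking.

    Assumption (not from the paper): "uniform" means evenly spaced ranks
    between the most confident and the most uncertain target samples.
    """
    entropy = prediction_entropy(probs)
    # Stable sort keeps ties in their original order, so the result is deterministic.
    order = np.argsort(entropy, kind="stable")
    n = len(order)
    if budget >= n:
        return order
    positions = np.round(np.linspace(0, n - 1, num=budget)).astype(int)
    return order[positions]


if __name__ == "__main__":
    # Toy softmax outputs for five unlabelled target samples, two classes.
    toy_probs = [[0.5, 0.5], [0.9, 0.1], [1.0, 0.0], [0.7, 0.3], [0.6, 0.4]]
    print(uniform_entropy_query(toy_probs, budget=3))
```

Under this reading, the queried set covers confident, intermediate, and uncertain predictions rather than only the most uncertain ones, and repeated runs on the same predictions return the same samples.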


