- Deep Active Learning via Open Set Recognition: In many applications, data is easy to acquire but expensive and time con...
- Task-Aware Variational Adversarial Active Learning: Deep learning has achieved remarkable performance in various tasks thank...
- Exposing Shallow Heuristics of Relation Extraction Models with Challenge Data: The process of collecting and annotating training data may introduce dis...
- Active Learning for Visual Question Answering: An Empirical Study: We present an empirical study of active learning for Visual Question Ans...
- Active Learning under Label Shift: Distribution shift poses a challenge for active data collection in the r...
- Prob2Vec: Mathematical Semantic Embedding for Problem Retrieval in Adaptive Tutoring: We propose a new application of embedding techniques for problem retriev...
- Diverse Complexity Measures for Dataset Curation in Self-driving: Modern self-driving autonomy systems heavily rely on deep learning. As a...
On the Importance of Adaptive Data Collection for Extremely Imbalanced Pairwise Tasks
Many pairwise classification tasks, such as paraphrase detection and open-domain question answering, naturally have extreme label imbalance (e.g., 99.99% of examples are negatives). In contrast, many recent datasets heuristically choose examples to ensure label balance. We show that these heuristics lead to trained models that generalize poorly: state-of-the-art models trained on QQP and WikiQA each have only 2.4% average precision when evaluated on realistically imbalanced test data. We instead collect training data with active learning, using a BERT-based embedding model to efficiently retrieve uncertain points from a very large pool of unlabeled utterance pairs. By creating balanced training data with more informative negative examples, active learning greatly improves average precision to 32.5% on QQP and 20.1% on WikiQA.
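The core selection step the abstract describes, retrieving the most uncertain points from a large unlabeled pool, can be sketched as follows. This is a minimal illustration, not the paper's implementation: the BERT-based embedding model is replaced by a hypothetical list of predicted positive probabilities, and uncertainty is taken as closeness to 0.5.

```python
def select_uncertain(pool_scores, k):
    """Return indices of the k pool points whose predicted positive
    probability is closest to 0.5, i.e. the most uncertain points."""
    ranked = sorted(range(len(pool_scores)),
                    key=lambda i: abs(pool_scores[i] - 0.5))
    return ranked[:k]

# Hypothetical model scores for 8 unlabeled utterance pairs; in the
# paper these would come from a BERT-based scoring model over a pool
# that is overwhelmingly negative.
scores = [0.01, 0.97, 0.48, 0.55, 0.03, 0.92, 0.50, 0.10]
print(select_uncertain(scores, 3))  # -> [6, 2, 3]
```

The selected pairs would then be sent for labeling, yielding balanced training batches whose negatives are hard (near the decision boundary) rather than trivially dissimilar.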