Temporal-aware Language Representation Learning From Crowdsourced Labels

07/15/2021
by Yang Hao, et al.

Learning effective language representations from crowdsourced labels is crucial for many real-world machine learning tasks. A challenging aspect of this problem is that crowdsourced labels suffer from high intra- and inter-observer variability. Because high-capacity deep neural networks can easily memorize all disagreements among crowdsourced labels, directly applying existing supervised language representation learning algorithms may yield suboptimal solutions. In this paper, we propose TACMA, a temporal-aware language representation learning heuristic for crowdsourced labels with multiple annotators. The proposed approach (1) explicitly models the intra-observer variability with an attention mechanism and (2) computes and aggregates per-sample confidence scores from multiple workers to address the inter-observer disagreements. The heuristic is easy to implement, requiring around 5 lines of code, and is evaluated on four synthetic and four real-world data sets. The results show that our approach outperforms a wide range of state-of-the-art baselines in terms of prediction accuracy and AUC. To encourage reproducibility, we make our code publicly available at <https://github.com/CrowdsourcingMining/TACMA>.
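To make the idea of per-sample confidence aggregation more concrete, the following is a minimal sketch and not the authors' implementation. It assumes binary labels, estimates each sample's confidence as a smoothed agreement rate among its annotators, and uses that confidence to weight a log loss. The function names and the pseudo-counts `alpha` and `beta` are hypothetical and chosen only for illustration; the attention component for intra-observer variability is not shown.

```python
import math
from typing import List, Tuple


def aggregate_confidence(
    votes: List[List[int]], alpha: float = 1.0, beta: float = 1.0
) -> List[Tuple[int, float]]:
    """Return (aggregated_label, confidence) for each sample.

    votes[i] holds the binary labels (0 or 1) that annotators gave sample i.
    alpha and beta are hypothetical smoothing pseudo-counts for this sketch.
    """
    results = []
    for sample_votes in votes:
        positives = sum(sample_votes)
        total = len(sample_votes)
        # Smoothed estimate of the probability that the true label is 1.
        p_pos = (positives + alpha) / (total + alpha + beta)
        label = 1 if p_pos >= 0.5 else 0
        # Confidence is the smoothed support for the chosen label.
        confidence = p_pos if label == 1 else 1.0 - p_pos
        results.append((label, confidence))
    return results


def confidence_weighted_log_loss(
    predicted: List[float], aggregated: List[Tuple[int, float]]
) -> float:
    """Average log loss in which each sample is weighted by its confidence."""
    weighted_sum = 0.0
    weight_total = 0.0
    for p, (label, confidence) in zip(predicted, aggregated):
        p = min(max(p, 1e-12), 1.0 - 1e-12)  # guard against log(0)
        loss = -math.log(p) if label == 1 else -math.log(1.0 - p)
        weighted_sum += confidence * loss
        weight_total += confidence
    return weighted_sum / weight_total


# Three samples, five annotators each (made-up votes for illustration).
example_votes = [[1, 1, 1, 1, 1], [1, 1, 0, 0, 1], [0, 0, 0, 1, 0]]
example_aggregated = aggregate_confidence(example_votes)
example_loss = confidence_weighted_log_loss([0.9, 0.6, 0.2], example_aggregated)
```

In this sketch, a sample on which all annotators agree contributes more to the loss than one with a 3-to-2 split, so contested labels pull less on the model during training.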


research
03/21/2020

NeuCrowd: Neural Sampling Network for Representation Learning with Crowdsourced Labels

Representation learning approaches require a massive amount of discrimin...
research
07/18/2019

Learning Effective Embeddings From Crowdsourced Labels: An Educational Case Study

Learning representation has been proven to be helpful in numerous machin...
research
09/23/2020

Representation Learning from Limited Educational Data with Crowdsourced Labels

Representation learning has been proven to play an important role in the...
research
03/01/2022

Towards IID representation learning and its application on biomedical data

Due to the heterogeneity of real-world data, the widely accepted indepen...
research
09/02/2022

Neighborhood-aware Scalable Temporal Network Representation Learning

Temporal networks have been widely used to model real-world complex syst...
research
02/09/2022

Learning to Bootstrap for Combating Label Noise

Deep neural networks are powerful tools for representation learning, but...
research
05/06/2021

Multi-Perspective LSTM for Joint Visual Representation Learning

We present a novel LSTM cell architecture capable of learning both intra...
