1 Introduction
While real-world social media data is abundant, annotating it for important issues such as toxicity, hate speech, and various facets of social bias is inherently challenging patton2019annotating ; modha2020tracking . These challenges stem from the sheer number of nuanced and evolving dimensions liu2019finding , as well as inherent ambiguities in interpretation chen2018using , which lead to noisy labeling wang2016sentiment . Active and transfer learning approaches can lower annotation effort by intelligently selecting the most informative examples to annotate farinneya2021active or by reusing existing labeled datasets zhuang2020comprehensive . However, most active learning approaches yield too few samples (on the order of hundreds) to feasibly fine-tune large deep language models wang2016cost ; kasai2019low . In terms of transfer, fine-tuning on out-of-domain data can lead to detrimental domain shift ma2019domain . Fine-tuning can also lead to over-fitting, especially with smaller training sets, and to catastrophic forgetting of knowledge present in the pre-trained model fatemi2021improving . Hence, prior work mostly did not use PLMs in this setup farinneya2021active ; zhao2021active .
Our Approach: In this work, we propose the use of few-shot instructions (textual prompts) with PLMs as a fine-tuning-free alternative. Our approach can be effective with few samples and is robust against the labeling noise inherent to social media prabhumoye2021few , which is not well handled by fine-tuning-based approaches song2022learning . We propose an Active Transfer Few-shot Instructions (ATF) method that combines active learning (for selecting fewer samples to label) with transfer learning (for leveraging existing labeled datasets) under few-shot instructions with PLMs. Our method leverages the capacity of PLMs to 1) learn from a few examples in a few-shot fashion without fine-tuning and 2) transfer task knowledge from datasets already labeled under different definitions to further reduce the need for costly annotation. We experiment with transfer scenarios on three datasets across eight labeling dimensions, labeled by crowd-sourcing and by an existing state-of-the-art commercial tool, Perspective API PerspectiveAPI:online .
Prior work: Several recent works have studied the use of few-shot instructions and in-context learning for lowering annotation effort. They, however, focused either on sample-selection strategies su2022selective ; yu2022cold or on improving the few-shot performance of smaller models wang2021entailment ; gao2020making ; mishra2021reframing , and did not study transfer from existing pre-labeled datasets. Several works also employed various fine-tuning approaches in low-resource settings kasai2019low ; lee2021good . Works attempting transfer with PLMs again turn to fine-tuning in a text-to-text format raffel2020exploring or attempt transfer in a few-shot setting by framing the problem as instruction tuning, where PLMs are fine-tuned on a collection of datasets described via instructions wei2021finetuned ; min2021metaicl . Our work differs from all these approaches, as we focus on transfer from pre-labeled external datasets via few-shot instructions without fine-tuning, under a basic random active learning setting. In fact, our ATF method can be used in synergy with the better sample-selection strategies proposed in su2022selective ; yu2022cold .
Findings: We find that using pre-labeled source-domain data can improve classification when very few target-domain examples are labeled. We further observe two scenarios: positive and negative transfer (Figure 3). When positive transfer occurs, it leads to high AUC gains that are consistently sustained across model sizes (12.94% for 1.3b to 10.49% for 22b) and annotation sizes (16.02% for 100 to 10.19% for 2000 annotations). Negative transfer leads to small, inconsistent gains that can turn into losses with larger model size (1.91% for 1.3b to -3.30% for 8.3b). We observe that as few as 100 target-domain annotations can aid transfer, increasing the mean gain from 3.64% to 6.73%. However, the transfer gain diminishes with more labeled examples from the target domain (at the expense of annotation effort), falling from 6.73% to 4.97% (Figure 3). Finally, we investigate the reasons behind positive and negative transfer and find that the higher the AUC the PLM can achieve with only target-domain data, the less gain can be expected from the transfer (a statistically significant negative Pearson correlation).
Contributions: In this work we offer the following contributions:
- Novel adaptation of few-shot instructions to facilitate transfer learning without fine-tuning on PLMs under limited annotation resources: Active Transfer Few-shot Instructions (ATF).
- Insights into the reasons for negative and positive transfer when attempting transfer learning with few-shot instructions and no fine-tuning.
2 Methodology
Few-shot Instructions:
We adapt the few-shot prompting approach detailed in prabhumoye2021few as shown in Figure 1; that is, given a query post from the test set, we present the PLM with an input that concatenates the tags Post, Question, and Answer (the post text, the labeling definition phrased as a question, and an empty answer slot). We then compute the conditional probabilities of the tokens Yes and No given this input; the token with the higher probability is considered the prediction for the post. We define a support set as a set of labeled examples (containing a mix of source- and target-domain posts) from which the shots for few-shot instructions are selected. To select the 32 shots (same as in prabhumoye2021few ), we sample class-balanced exemplars from the support set based on their semantic similarity to the query post, computed using cosine similarity in a common embedding space encoded with a Term Frequency-Inverse Document Frequency (TF-IDF) representation TfIdfSciKit:online . We present the PLM with these exemplars as in-context samples using the same Post/Question/Answer structure as the query input. To adapt this approach to the transfer learning setting, we present each shot with the definition under which it was originally labeled (Table 2).
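To make the pipeline concrete, the sketch below shows one way the prompt construction, TF-IDF-based shot selection, and Yes/No scoring could be implemented. The function names (`build_prompt`, `select_shots`, `predict`) and the `plm_token_prob` scoring interface are our own illustrations rather than part of a released implementation; the structure follows the description above.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity


def build_prompt(shots, query_post, query_definition):
    """Concatenate the in-context exemplars and the query using the
    Post / Question / Answer tags; each shot keeps the definition under
    which it was originally labeled."""
    parts = []
    for post, definition, label in shots:  # label is "Yes" or "No"
        parts.append(f"Post: {post}\nQuestion: {definition}\nAnswer: {label}\n")
    parts.append(f"Post: {query_post}\nQuestion: {query_definition}\nAnswer:")
    return "\n".join(parts)


def select_shots(support_set, query_post, k=32):
    """Pick k class-balanced exemplars most similar to the query post
    under TF-IDF cosine similarity."""
    texts = [ex["post"] for ex in support_set]
    vectorizer = TfidfVectorizer().fit(texts + [query_post])
    sims = cosine_similarity(vectorizer.transform([query_post]),
                             vectorizer.transform(texts))[0]
    shots, counts = [], {"Yes": 0, "No": 0}
    for i in np.argsort(-sims):  # most similar first, greedy class balancing
        ex = support_set[i]
        if counts[ex["label"]] < k // 2:
            shots.append((ex["post"], ex["definition"], ex["label"]))
            counts[ex["label"]] += 1
        if len(shots) == k:
            break
    return shots


def predict(plm_token_prob, support_set, query_post, query_definition):
    """The answer token with the higher conditional probability given the
    prompt is taken as the prediction for the query post."""
    prompt = build_prompt(select_shots(support_set, query_post),
                          query_post, query_definition)
    p_yes = plm_token_prob(prompt, " Yes")  # assumed PLM scoring interface
    p_no = plm_token_prob(prompt, " No")
    return "Yes" if p_yes > p_no else "No"
```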


Transfer learning setup:
For the baselines, the support set consists of only a few target-domain examples, selected from a large unlabeled target-domain pool and labeled by an oracle (Figure 1). For the transfer learning experiments, we augment these target-domain examples with the entirety of the pre-labeled source-domain training dataset; shots are then selected from this augmented support set.
Active learning setup:
We utilize an active learning setup to evaluate the performance of the model with varying target-domain annotation sizes (Figure 1). We use the unlabeled-pool scenario, in which we have a small amount of labeled target-domain data and a large unlabeled target-domain dataset. We simulate this scenario by first randomly sampling 100 examples from the unlabeled pool to be labeled by an oracle. In subsequent iterations, we randomly sample from the remaining unlabeled data to provide the model with support sets containing 1,000 and 2,000 labeled target-domain examples. As we are randomly sampling from a large dataset, we repeat the entire pipeline five times for each experiment to ensure that our results are stable, and we fix the random seed for each iteration of the pipeline for reproducibility and consistency across experiments.
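A minimal sketch of this simulation under the assumptions above (the oracle simply reveals gold labels, the annotation budgets follow the 100/1k/2k schedule from Table 1, and all names are illustrative):

```python
import random

ANNOTATION_BUDGETS = [100, 1000, 2000]  # target-domain annotations per round
N_REPEATS = 5                           # pipeline repetitions to stabilize results


def simulate_rounds(unlabeled_target, source_train, oracle_label, run_id):
    """Yield (budget, support_set) pairs: labeled target examples drawn at
    random from the unlabeled pool, augmented with the full source train set."""
    random.seed(run_id)          # fixed seed per repetition for reproducibility
    pool = list(unlabeled_target)
    random.shuffle(pool)
    labeled_target = []
    for budget in ANNOTATION_BUDGETS:
        # randomly draw from the remaining unlabeled pool up to the budget
        while len(labeled_target) < budget and pool:
            post = pool.pop()
            labeled_target.append({**post, "label": oracle_label(post)})
        # transfer support set = labeled target examples + full source train set
        yield budget, labeled_target + list(source_train)
```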
Models:
For each task, we use off-the-shelf PLMs. We utilize the Megatron 1.3B parameter model (Meg1.3b), Megatron 8.3B parameter model (Meg8.3b), and Megatron 22B parameter model (Meg22b), all of which have been pre-trained using the toolkit in shoeybi2019megatron .
Metrics:
We evaluate the performance of the model using the area under the ROC curve (AUC) AUCSciKit:online , which measures how well a classifier can discriminate between classes; the tasks we consider are all binary but have varying percentages of positive ("Yes") labels.
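For instance, with scikit-learn's `roc_auc_score` the Yes-token probability can serve directly as the ranking score (a toy sketch; the normalization shown is one reasonable choice, not necessarily the exact one used in our pipeline):

```python
from sklearn.metrics import roc_auc_score

# toy example: gold labels (1 = "Yes") and PLM token probabilities per test post
y_true = [1, 0, 1, 0]
p_yes = [0.62, 0.20, 0.55, 0.40]
p_no = [0.30, 0.70, 0.35, 0.45]

# normalize the "Yes" probability so it can serve as a ranking score;
# AUC is threshold-free, which suits tasks with varying positive rates
scores = [py / (py + pn) for py, pn in zip(p_yes, p_no)]
print(roc_auc_score(y_true, scores))
```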
Perspective API labeling:
To perform transfer experiments in a controlled manner, we obtained an external and consistent set of labels related to hate speech and toxicity for all our data. We used Perspective API, a state-of-the-art pretrained toolkit PerspectiveAPI:online , which has been used in schick2021self ; hartvigsen2022toxigen . While Perspective API has been reported to have limitations in relation to biased classification and limited labeling dimensions bender2021dangers , it still represents a good off-the-shelf baseline.
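For reference, the Perspective API labels we use can be obtained through its public REST endpoint roughly as follows (a sketch based on the public documentation; `TOXICITY` and `SEXUALLY_EXPLICIT` are real Perspective attributes, while the API-key handling and the 0.5 binarization threshold are our own illustrative choices):

```python
import requests

API_KEY = "YOUR_PERSPECTIVE_API_KEY"  # placeholder
URL = ("https://commentanalyzer.googleapis.com/v1alpha1/"
       f"comments:analyze?key={API_KEY}")


def perspective_label(text, threshold=0.5):
    """Request toxicity / sexually-explicit scores for one post and
    binarize them with a simple threshold (illustrative choice)."""
    body = {
        "comment": {"text": text},
        "languages": ["en"],
        "requestedAttributes": {"TOXICITY": {}, "SEXUALLY_EXPLICIT": {}},
    }
    scores = requests.post(URL, json=body).json()["attributeScores"]
    return {
        attr: scores[attr]["summaryScore"]["value"] >= threshold
        for attr in ("TOXICITY", "SEXUALLY_EXPLICIT")
    }
```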
Source | Target | Megatron1.3b: gain% / AUC @100 | @1k | @2k | Megatron22b: gain% / AUC @100 | @1k | @2k
---|---|---|---|---|---|---|---
None | MeToo "Sexually Explicit" | 54.0 | 49.5 | 53.4 | 57.7 | 57.6 | 58.7
SBIC "Lewd" | | 17% / 58.2 | 5% / 55.9 | 18% / 59.8 | 7% / 61.9 | 9% / 62.9 | 10% / 64.7
SBIC "Group" | | 25% / 61.7 | 12% / 59.8 | 6% / 53.8 | 17% / 67.6 | 10% / 63.0 | 8% / 63.6
SBIC "Intent" | | 24% / 61.3 | 13% / 60.3 | 17% / 59.5 | 6% / 61.3 | 10% / 63.6 | 15% / 67.3
SBIC "Offensive" | | 26% / 62.5 | 16% / 61.8 | 14% / 57.8 | 20% / 69.0 | 20% / 69.1 | 14% / 67.1
HASOC "HOF" | | 23% / 61.0 | 5% / 56.6 | 18% / 60.1 | 22% / 70.3 | 18% / 67.8 | 12% / 66.0
HASOC "Target" | | 28% / 63.2 | 17% / 62.3 | 22% / 61.9 | 12% / 64.5 | 17% / 67.6 | 9% / 64.0
None | MeToo "Toxicity" | 51.5 | 53.5 | 53.3 | 61.0 | 60.8 | 60.5
SBIC "Lewd" | | 11% / 57.1 | 0% / 53.6 | 4% / 55.4 | 5% / 57.7 | 1% / 59.9 | 3% / 58.7
SBIC "Group" | | 1% / 51.2 | 1% / 54.1 | 2% / 52.3 | 7% / 56.5 | 5% / 57.9 | 5% / 57.4
SBIC "Intent" | | 6% / 54.8 | 2% / 54.4 | 0% / 53.6 | 6% / 57.5 | 12% / 53.6 | 8% / 54.8
SBIC "Offensive" | | 7% / 55.3 | 1% / 53.1 | 0% / 53.2 | 3% / 59.2 | 5% / 57.8 | 3% / 58.4
HASOC "HOF" | | 8% / 55.7 | 0% / 53.2 | 7% / 56.9 | 6% / 57.4 | 2% / 62.3 | 5% / 63.4
HASOC "Target" | | 4% / 53.8 | 2% / 54.5 | 6% / 56.3 | 6% / 57.3 | 3% / 62.4 | 8% / 65.2
3 Datasets and Results
3.1 Datasets
We use three datasets and a total of eight labeling dimensions for our experiments: SBIC sap-etal-2020-social , HASOC mandl-etal-2019-hasoc , and #MeToo srikanth2021dynamic . We report correlations between labeling dimensions for these datasets in Figure 6, Appendix A, and an estimate of the distributional difference between them in Figure 7, Appendix B.
Social Bias Frames (SBIC):
This dataset sap-etal-2020-social contains 34k documents in the training set labeled under categories in which people project social biases and stereotypes onto others. We use four binary classification tasks, which were all labeled by crowd-workers. These tasks have the following labels and definitions: (1) offensive ( positive labels): whether a post could be considered "offensive" to anyone, (2) intent ( positive): whether the perceived motivation of the author was to offend, (3) lewd ( positive): whether a post contains sexual references, (4) group ( positive): whether a post is offensive toward a group. We also use Perspective API to label: (1) toxicity ( positive): whether a post is rude, disrespectful, or unreasonable and likely to make people leave a discussion and (2) sexually explicit ( positive): whether a post contains references to sexual acts, body parts, or other lewd content.
Hate Speech and Offensive Content Identification (HASOC):
The HASOC dataset mandl-etal-2019-hasoc contains documents from Twitter and Facebook and was developed for identifying hate speech and offensive content. The dataset contains documents in three languages, but we use only the English tasks, which consist of 6k human-labeled documents. We utilize the two binary classification tasks in the English dataset, defined as follows: (1) HOF ( positive): whether a post contains hate, offensive, or profane content, (2) Target ( positive): whether a post contains an insult (targeted or untargeted). We also label this dataset under the toxicity ( positive) and sexually explicit ( positive) Perspective API tasks.
#MeToo Twitter Dataset:
The #MeToo women’s rights movement became popular very quickly on Twitter after gaining exposure from a tweet by Alyssa Milano in October 2017. There were many anecdotal stories that women who participated in the #MeToo movement on Twitter were subjected to harassment and trolling. Thus, in this paper we use Twitter data from the #MeToo movement to investigate toxicity and harassment directed at movement participants. We utilize a dataset created by collecting the tweets from January to September that contain #MeToo related keywords srikanth2021dynamic . This #MeToo dataset consists of million documents after preprocessing. We label this dataset using the toxicity ( positive) and sexually explicit ( positive) dimensions from Perspective API.


3.2 Results
Transfer Effectiveness:
The results of the transfer experiments with two model sizes (Meg1.3b and Meg22b), two target dimensions from the #MeToo dataset (“Sexually Explicit” and “Toxicity”), and six source dimensions from SBIC and HASOC are presented in Table 1. All results are reported as absolute AUC scores under a growing target annotation size. Next to each AUC score we show the relative gain or loss compared to the no-transfer baseline that uses only the annotated target samples. We also ran an intermediate-sized model (Meg8.3b) and a target annotation size of 0, which are not shown in the table; their averaged results are presented in Figure 3 across model sizes and in Figure 5 across annotation sizes, and we include them in our summaries.
The examples represent positive and negative transfer scenarios. In the positive transfer case (#MeToo “Sexually Explicit”), the gains are sustained across model sizes (Figure 3) as well as across target-domain annotation sizes (Figure 5). Furthermore, the gains from different source dimensions do not vary much, with the lowest average relative AUC gain of 6.74% for SBIC “Group” and the highest of 13.46% for HASOC “Target”. For the negative transfer case (#MeToo “Toxicity”), the impact with the smallest model (Meg1.3b) is mixed: transfer from SBIC “Lewd” offers a small mean gain of 4.67%, while transfer from SBIC “Group” results in a small mean loss of -1.30%. It is worth noting that these two annotation dimensions are the least correlated on the SBIC dataset (r=0.10). This mixed impact turns into a mean loss of -3.30% with the intermediate-size model (Meg8.3b) and a minor gain of 0.94% with the largest model (Meg22b). The transfer impact for Meg22b varies between a -3.91% loss for SBIC “Intent” and a 4.74% gain for HASOC “HOF”. Comparing the two scenarios, we can also see that the initial baseline performance of the models is consistently higher for the negative transfer scenario (mean AUC of 58.9) than for the positive one (mean AUC of 55.6).
Active Learning Effectiveness:
Looking at the average relative gain across annotation sizes in Figure 3, we can see that without any annotated target samples (i.e., no active learning), the gains from transfer are small but appear in both scenarios (4.6% for “Sexually Explicit” and 2.7% for “Toxicity”). Annotating just 100 target samples differentiates the scenarios, leading to a large average gain of 16.0% for positive transfer and a small average loss of -2.6% for negative transfer. Looking at absolute AUCs in Figure 5, we observe that mixing a small number of target-domain samples into the transfer regime can lead to a large AUC gain, from 56.6 to 63.5 (12.1%), comparing transfer without any target annotations to transfer with just 100 annotations. We further observe that as annotation size increases, the relative gain from using external data decreases by 26.10% from 100 to 2k annotated target samples (Figure 3). The largest drop (a 20.48% decrease in relative AUC gain) takes place between 100 and 1k annotated examples, which is also a 10-fold increase in the size of the annotated target data. Annotating an additional 1k examples, representing just a 2-fold increase, has a much smaller impact (a 7.07% decrease in gain). We can also see that a higher proportion of target-domain samples is used as shots as the annotation size increases (Figure 5). Finally, we observe that active learning alone provides small gains, from an AUC of 54.3 for zero-shot to 56.8 for 2k target annotations (a relative gain of 4.9%) for “Sexually Explicit” and from 58.5 for zero-shot to 59.2 for 2k annotations (a relative gain of 1.3%) for “Toxicity”.
The main takeaways from these results are that: 1) once positive or negative transfer occurs, it is retained across model and target annotation sizes, 2) a higher initial baseline AUC likely contributes to negative transfer, and 3) transfer effectiveness can increase with a small target-domain annotation size but diminishes as the number of annotations grows.
Correlations between datasets and labeling properties:
We perform additional analysis to understand the nature of positive and negative transfer. First, we examine whether the sheer amount of external data from the source domain impacts transfer effectiveness. We find that the smaller HASOC dataset (6k) actually offers a higher mean gain of 7.54% compared to the much larger SBIC (34k), which offers a mean gain of 4.76% in the same setup. It is worth noting that we add these datasets to our support set in their entirety, but the TF-IDF shot selection still picks the most relevant examples from this pool. We find that the difference in label imbalance between the source and target datasets is not correlated with the AUC gain from the transfer. We also find that the correlation between source and target labeling dimensions, estimated on the source dataset (i.e., SBIC or HASOC), is only weakly related to the AUC gain. We find, however, that the higher the initial performance of the PLM with a given annotation size (i.e., without source-domain data), the lower the AUC gain from the transfer. Finally, we estimate the distributional difference between the datasets with respect to labels following the approach from zhao2021active : we train an SVM classifier to tell the datasets apart under the aligned labels (i.e., positive-class posts put into the same set) from the source and target domain tasks (separability in Figure 7), but we find only a weak correlation with the AUC gain.
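The separability probe amounts to training a classifier to distinguish the two datasets; a sketch with our own naming, illustrative TF-IDF features, and a linear SVM is shown below.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import cross_val_score
from sklearn.svm import LinearSVC


def separability(source_positive_posts, target_positive_posts):
    """Train an SVM to tell the datasets apart under aligned (positive-class)
    labels; higher cross-validated accuracy means more distinguishable data."""
    texts = list(source_positive_posts) + list(target_positive_posts)
    domain = [0] * len(source_positive_posts) + [1] * len(target_positive_posts)
    features = TfidfVectorizer(min_df=2).fit_transform(texts)
    return cross_val_score(LinearSVC(), features, domain, cv=5).mean()
```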
4 Discussion and Future Work
Impact of model size:
We observe that as the model size increases, the gains from positive transfer tend to decrease only slightly (a 2.45% gap between the gains of Meg1.3b and Meg22b), and the overall effectiveness of transfer is largely retained (Figure 3). In the negative transfer scenario, however, small gains can be inconsistent and turn into losses (a 1.91% gain for Meg1.3b, a -3.3% loss for Meg8.3b, and a 0.9% gain for Meg22b). It is well documented that as model size increases, capabilities on standard NLP tasks tend to increase min2022rethinking . While the better performance and lower need for external data of larger models are not surprising, the difference in performance across tasks may suggest that larger models do not gain capabilities uniformly (i.e., a large model may become much better at detecting “Toxicity” but improve only slightly in detecting “Sexually Explicit” content).
Impact of annotation size:
As reported in the results, the relative gain from transfer decreases as annotation size increases (Figure 3). The decrease is due to the support set containing an increasingly higher proportion of target examples; hence, these target examples are more likely to be used as shots, as can be seen in Figure 5. In effect, the performance approaches the baseline (where all the shots are from the target domain). In our shot selection, we currently do not control for the proportion of source- and target-domain documents being used (i.e., we only balance labels). An additional set of experiments could explore label- and domain-balanced shot selection, which could mitigate this behavior.
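One way such label- and domain-balanced selection could look is sketched below (a hypothetical extension we have not evaluated; the quotas and names are illustrative):

```python
def balanced_shots(ranked_examples, k=32, target_share=0.5):
    """Greedily fill k slots while balancing both the Yes/No label and the
    source/target domain of the exemplars."""
    quota = {}
    for label in ("Yes", "No"):
        quota[(label, "target")] = (k // 2) * target_share
        quota[(label, "source")] = (k // 2) * (1 - target_share)
    shots = []
    for ex in ranked_examples:  # assumed sorted by TF-IDF similarity to the query
        key = (ex["label"], ex["domain"])
        if quota.get(key, 0) >= 1:
            shots.append(ex)
            quota[key] -= 1
        if len(shots) == k:
            break
    return shots
```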
Understanding positive & negative transfers:
Our results suggest that negative transfer is more likely to happen if 1) the initial PLM baseline on the task is higher, and 2) the source dataset supplies examples that provide little new information on top of the already used target data. The first reason is intuitive, as a higher baseline is harder to beat; the initial high baseline also reflects how well the PLM's internal knowledge already informs the target-domain task. The second finding is currently only anecdotal (the correlations are weak) and much less intuitive, and it should be interpreted within the space of datasets used in our experiments. Taken at face value, it suggests that the more different the data, the higher the gain from the transfer, which is unlikely to be true in general. While our datasets and labels are different at the task level (i.e., “Lewd” content is likely slightly different than “Sexually Explicit” content), they also represent a similar broader domain of hate speech, toxicity, and stereotypical bias on social media. In that sense, they come from a similar domain and capture similar tasks (we report label similarities in Appendix A and distributional differences in Appendix B). In this interpretation, we are likely observing the benefits of diversity and novelty of the external data used for shots within the broader related domain, similar to the benefits of domain-adaptive pretraining gururangan2020don . Future work should examine source datasets from an entirely different broader domain (e.g., the Enron email dataset EnronEma54:online ), which are unlikely to lead to positive transfer.
Limitations & Practical application:
One limitation of our work is that the datasets we use rely on untrained crowd-sourced labeling, which can be noisy and based on personal biases and perceptions binns2017like . Perspective API labeling has known limitations of its own bender2021dangers . Furthermore, PLMs can be biased and toxic themselves when prompted gehman2020realtoxicityprompts , which also likely allows them to detect these dimensions based on their internal knowledge schick2021self . Our proposed method can, unfortunately, be misused intentionally or unintentionally weidinger2021ethical ; we specifically see the danger of using our approach for censorship ullmann2020quarantining . Some future applications of our ATF method involve noisy pre-labeling of unlabeled datasets and selecting samples for future fine-tuning (e.g., via disagreement-based active learning hanneke2014theory ). With limited initial human labeling of as few as 100 random documents, if the baseline few-shot performance is poor, using pre-labeled out-of-domain data can improve the AUC without expending more human annotation effort. We also plan to use this method to efficiently label custom dimensions of toxicity relevant to #MeToo and other real-world data, which are currently not supported by tools such as Perspective API.
5 Conclusion
In this paper, we present ATF, a novel adaptation of few-shot instructions that facilitates transfer learning without fine-tuning of PLMs in a setting with limited labeling resources. We demonstrate that our method can lead to consistently high AUC gains across model and annotation sizes with a small amount of annotated data from the target dimension. We also observe positive and negative transfer scenarios and find that a higher AUC of the PLM without any pre-annotated source-domain data is correlated with a smaller AUC gain from the transfer. Our results motivate future work on understanding when ATF is useful and how it can be improved, as well as practical applications including noisy pre-labeling and sample selection for fine-tuning.
Acknowledgments
We would like to thank the Caltech SURF program for contributing to the funding of this project and especially the named donor Carolyn Ash. This material is based upon work supported by the National Science Foundation under Grant # 2030859 to the Computing Research Association for the CIFellows Project. Anima Anandkumar is partially supported by the Bren Named Chair Professorship at Caltech and is a paid employee of Nvidia. Sara Kangaslahti was a paid part-time intern at Nvidia during this project.
References
- [1] E. M. Bender, T. Gebru, A. McMillan-Major, and S. Shmitchell. On the dangers of stochastic parrots: Can language models be too big? In Proceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency, pages 610–623, 2021.
- [2] R. Binns, M. Veale, M. V. Kleek, and N. Shadbolt. Like trainer, like bot? inheritance of bias in algorithmic content moderation. In International conference on social informatics, pages 405–415. Springer, 2017.
- [3] N.-C. Chen, M. Drouhard, R. Kocielnik, J. Suh, and C. R. Aragon. Using machine learning to support qualitative coding in social science: Shifting the focus to ambiguity. ACM Transactions on Interactive Intelligent Systems (TiiS), 8(2):1–20, 2018.
- [4] CMU. Enron email dataset. https://www.cs.cmu.edu/~./enron/. (Accessed on 09/19/2022).
- [5] P. Farinneya, M. M. A. Pour, S. Hamidian, and M. Diab. Active learning for rumor identification on social media. In Findings of the Association for Computational Linguistics: EMNLP 2021, pages 4556–4565, 2021.
- [6] Z. Fatemi, C. Xing, W. Liu, and C. Xiong. Improving gender fairness of pre-trained language models without catastrophic forgetting. arXiv preprint arXiv:2110.05367, 2021.
- [7] T. Gao, A. Fisch, and D. Chen. Making pre-trained language models better few-shot learners. arXiv preprint arXiv:2012.15723, 2020.
- [8] S. Gehman, S. Gururangan, M. Sap, Y. Choi, and N. A. Smith. Realtoxicityprompts: Evaluating neural toxic degeneration in language models. arXiv preprint arXiv:2009.11462, 2020.
- [9] Google and Jigsaw. Perspective api - attributes and languages. https://support.perspectiveapi.com/s/about-the-api-attributes-and-languages. (Accessed on 09/16/2022).
- [10] S. Gururangan, A. Marasović, S. Swayamdipta, K. Lo, I. Beltagy, D. Downey, and N. A. Smith. Don’t stop pretraining: adapt language models to domains and tasks. arXiv preprint arXiv:2004.10964, 2020.
- [11] S. Hanneke et al. Theory of disagreement-based active learning. Foundations and Trends® in Machine Learning, 7(2-3):131–309, 2014.
- [12] T. Hartvigsen, S. Gabriel, H. Palangi, M. Sap, D. Ray, and E. Kamar. Toxigen: A large-scale machine-generated dataset for adversarial and implicit hate speech detection. arXiv preprint arXiv:2203.09509, 2022.
- [13] J. Kasai, K. Qian, S. Gurajada, Y. Li, and L. Popa. Low-resource deep entity resolution with transfer and active learning. arXiv preprint arXiv:1906.08042, 2019.
- [14] D.-H. Lee, M. Agarwal, A. Kadakia, J. Pujara, and X. Ren. Good examples make a faster learner: Simple demonstration-based learning for low-resource ner. arXiv preprint arXiv:2110.08454, 2021.
- [15] A. Liu, M. Srikanth, N. Adams-Cohen, R. M. Alvarez, and A. Anandkumar. Finding social media trolls: Dynamic keyword selection methods for rapidly-evolving online debates. arXiv preprint arXiv:1911.05332, 2019.
- [16] X. Ma, P. Xu, Z. Wang, R. Nallapati, and B. Xiang. Domain adaptation with bert-based domain classification and data selection. In Proceedings of the 2nd Workshop on Deep Learning Approaches for Low-Resource NLP (DeepLo 2019), pages 76–83, 2019.
- [17] T. Mandl, S. Modha, P. Majumder, D. Patel, M. Dave, C. Mandlia, and A. Patel. Overview of the hasoc track at fire 2019: Hate speech and offensive content identification in indo-european languages. In Proceedings of the 11th Forum for Information Retrieval Evaluation, FIRE ’19, page 14–17, New York, NY, USA, 2019. Association for Computing Machinery.
- [18] S. Min, M. Lewis, L. Zettlemoyer, and H. Hajishirzi. Metaicl: Learning to learn in context. arXiv preprint arXiv:2110.15943, 2021.
- [19] S. Min, X. Lyu, A. Holtzman, M. Artetxe, M. Lewis, H. Hajishirzi, and L. Zettlemoyer. Rethinking the role of demonstrations: What makes in-context learning work? arXiv preprint arXiv:2202.12837, 2022.
- [20] S. Mishra, D. Khashabi, C. Baral, Y. Choi, and H. Hajishirzi. Reframing instructional prompts to gptk’s language. arXiv preprint arXiv:2109.07830, 2021.
- [21] S. Modha, T. Mandl, P. Majumder, and D. Patel. Tracking hate in social media: evaluation, challenges and approaches. SN Computer Science, 1(2):1–16, 2020.
- [22] D. Patton, P. Blandfort, W. Frey, M. Gaskell, and S. Karaman. Annotating social media data from vulnerable populations: Evaluating disagreement between domain experts and graduate student annotators. Proceedings of the 52nd Hawaii International Conference on System Sciences | 2019, 2019.
- [23] S. Prabhumoye, R. Kocielnik, M. Shoeybi, A. Anandkumar, and B. Catanzaro. Few-shot instruction prompts for pretrained language models to detect social biases. arXiv preprint arXiv:2112.07868, 2021.
- [24] C. Raffel, N. Shazeer, A. Roberts, K. Lee, S. Narang, M. Matena, Y. Zhou, W. Li, P. J. Liu, et al. Exploring the limits of transfer learning with a unified text-to-text transformer. J. Mach. Learn. Res., 21(140):1–67, 2020.
- [25] M. Sap, S. Gabriel, L. Qin, D. Jurafsky, N. A. Smith, and Y. Choi. Social bias frames: Reasoning about social and power implications of language. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pages 5477–5490, Online, July 2020. Association for Computational Linguistics.
- [26] T. Schick, S. Udupa, and H. Schütze. Self-diagnosis and self-debiasing: A proposal for reducing corpus-based bias in nlp. Transactions of the Association for Computational Linguistics, 9:1408–1424, 2021.
- [27] Scikit-learn. Roc-auc-score. https://scikit-learn.org/stable/modules/generated/sklearn.metrics.roc_auc_score.html, 3 2022. (Accessed on 04/13/2022).
- [28] Scikit-learn. Tfidfvectorizer. https://tinyurl.com/scikit-tfidf, 3 2022. (Accessed on 04/13/2022).
- [29] M. Shoeybi, M. Patwary, R. Puri, P. LeGresley, J. Casper, and B. Catanzaro. Megatron-lm: Training multi-billion parameter language models using model parallelism. arXiv preprint arXiv:1909.08053, 2019.
- [30] H. Song, M. Kim, D. Park, Y. Shin, and J.-G. Lee. Learning from noisy labels with deep neural networks: A survey. IEEE Transactions on Neural Networks and Learning Systems, 2022.
- [31] M. Srikanth, A. Liu, N. Adams-Cohen, J. Cao, R. M. Alvarez, and A. Anandkumar. Dynamic social media monitoring for fast-evolving online discussions. In Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining, pages 3576–3584, 2021.
- [32] H. Su, J. Kasai, C. H. Wu, W. Shi, T. Wang, J. Xin, R. Zhang, M. Ostendorf, L. Zettlemoyer, N. A. Smith, et al. Selective annotation makes language models better few-shot learners. arXiv preprint arXiv:2209.01975, 2022.
- [33] S. Ullmann and M. Tomalin. Quarantining online hate speech: technical and ethical perspectives. Ethics and Information Technology, 22(1):69–80, 2020.
- [34] K. Wang, D. Zhang, Y. Li, R. Zhang, and L. Lin. Cost-effective active learning for deep image classification. IEEE Transactions on Circuits and Systems for Video Technology, 27(12):2591–2600, 2016.
- [35] S. Wang, H. Fang, M. Khabsa, H. Mao, and H. Ma. Entailment as few-shot learner. arXiv preprint arXiv:2104.14690, 2021.
- [36] Y. Wang, Y. Rao, X. Zhan, H. Chen, M. Luo, and J. Yin. Sentiment and emotion classification over noisy labels. Knowledge-Based Systems, 111:207–216, 2016.
- [37] J. Wei, M. Bosma, V. Y. Zhao, K. Guu, A. W. Yu, B. Lester, N. Du, A. M. Dai, and Q. V. Le. Finetuned language models are zero-shot learners. arXiv preprint arXiv:2109.01652, 2021.
- [38] L. Weidinger, J. Mellor, M. Rauh, C. Griffin, J. Uesato, P.-S. Huang, M. Cheng, M. Glaese, B. Balle, A. Kasirzadeh, et al. Ethical and social risks of harm from language models. arXiv preprint arXiv:2112.04359, 2021.
- [39] Y. Yu, R. Zhang, R. Xu, J. Zhang, J. Shen, and C. Zhang. Cold-start data selection for few-shot language model fine-tuning: A prompt-based uncertainty propagation approach. arXiv preprint arXiv:2209.06995, 2022.
- [40] E. Zhao, A. Liu, A. Anandkumar, and Y. Yue. Active learning under label shift. In International Conference on Artificial Intelligence and Statistics, pages 3412–3420. PMLR, 2021.
- [41] F. Zhuang, Z. Qi, K. Duan, D. Xi, Y. Zhu, H. Zhu, H. Xiong, and Q. He. A comprehensive survey on transfer learning. Proceedings of the IEEE, 109(1):43–76, 2020.
Appendix A - Correlations between labels on #MeToo, SBIC and HASOC datasets



Appendix B - Separability & semantic similarity between the dataset-annotation pairs


Appendix C - Labeling definitions used with few-shot instructions
Dataset | Task | Definition | Tokens
---|---|---|---
SBIC | Offensive | Does this post contain offensive language? | Yes, No
SBIC | Intent | Does this post contain intentional insults? | Yes, No
SBIC | Lewd | Does this post contain sexual content? | Yes, No
SBIC | Group | Does this post contain offense to a group? | Yes, No
HASOC | HOF | Does this post contain any form of non-acceptable language such as hate speech, offensiveness, aggression, profanity? | Yes, No
HASOC | Target | Does this post contain an insult/threat to an individual, group, or others? | Yes, No
All datasets | Toxicity | Does this post contain rude, disrespectful, or unreasonable language? | Yes, No
All datasets | Sexually Explicit | Does this post contain sexually explicit language? | Yes, No