Investigating Pre-trained Language Models on Cross-Domain Datasets, a Step Closer to General AI

by Mohamad Ballout, et al.
Universität Osnabrück

Pre-trained language models have recently emerged as a powerful tool for fine-tuning on a variety of language tasks. Ideally, when models are pre-trained on large amounts of data, they are expected to gain implicit knowledge. In this paper, we investigate the ability of pre-trained language models to generalize to different non-language tasks. In particular, we test them on tasks from different domains, such as computer vision, reasoning on hierarchical data, and protein fold prediction. The four pre-trained models that we used, T5, BART, BERT, and GPT-2, achieve outstanding results. They all perform similarly and outperform transformers trained from scratch by a large margin. For instance, pre-trained language models perform better on the ListOps dataset, with an average accuracy of 58.7%, compared to transformers trained from scratch, which reach an average accuracy of 29.0%. The significant improvement demonstrated across three types of datasets suggests that pre-training on language helps the models acquire general knowledge, bringing us a step closer to general AI. We also show that reducing the number of parameters in pre-trained language models has only a minor impact: performance drops slightly when using T5-Small instead of T5-Base. In fact, even when using only 2% of the parameters, we achieved a substantial improvement compared to training from scratch. Finally, in contrast to prior work, we find that using pre-trained embeddings for the input layer is necessary to achieve the desired results.
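To make the hierarchical-reasoning benchmark concrete: ListOps examples are nested list operations over the digits 0–9 that the model must reduce to a single digit. As a minimal sketch (the operator set here, MIN, MAX, MED, and SM for sum modulo 10, follows the original ListOps task; the tokenization is a simplified illustration, not the authors' exact pipeline), here is a reference evaluator that computes the ground-truth label for such an expression:

```python
# Reference evaluator for ListOps-style expressions.
# Operators: MIN, MAX, MED (median), SM (sum modulo 10) over digits 0-9.
import statistics


def eval_listops(tokens):
    """Recursively evaluate a tokenized ListOps expression.

    `tokens` is a flat list like ["[MAX", "2", "[MIN", "7", "3", "]", "]"].
    Returns the single digit (0-9) the expression reduces to.
    """
    ops = {
        "[MIN": min,
        "[MAX": max,
        "[MED": lambda xs: int(statistics.median(xs)),
        "[SM": lambda xs: sum(xs) % 10,
    }

    def parse(i):
        # tokens[i] must be an opening operator token.
        op = ops[tokens[i]]
        args = []
        i += 1
        while tokens[i] != "]":
            if tokens[i] in ops:        # nested sub-expression
                val, i = parse(i)
            else:                       # plain digit
                val, i = int(tokens[i]), i + 1
            args.append(val)
        return op(args), i + 1          # skip the closing "]"

    value, _ = parse(0)
    return value


# MAX(2, MIN(7, 3), SM(4, 8)) = MAX(2, 3, 2) = 3
expr = "[MAX 2 [MIN 7 3 ] [SM 4 8 ] ]".split()
print(eval_listops(expr))  # -> 3
```

A model sees only the token sequence and the target digit, so solving the task requires implicitly recovering this nested structure, which is why ListOps serves as a probe of hierarchical reasoning.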


