ClimateBert: A Pretrained Language Model for Climate-Related Text

10/22/2021
by Nicolas Webersinke, et al.

Over recent years, large pretrained language models (LMs) have revolutionized the field of natural language processing (NLP). However, while pretraining on general language has been shown to work very well for common language, it has been observed that niche language poses problems. In particular, climate-related texts include specific language that common LMs cannot represent accurately. We argue that this shortcoming of today's LMs limits the applicability of modern NLP to the broad field of text processing of climate-related texts. As a remedy, we propose ClimateBert, a transformer-based language model that is further pretrained on over 1.6 million paragraphs of climate-related texts, crawled from various sources such as common news, research articles, and climate reporting of companies. We find that ClimateBert leads to a 46% improvement on a masked language model objective which, in turn, leads to lowering error rates by 3.57% for various climate-related downstream tasks like text classification, sentiment analysis, and fact-checking.
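The further pretraining described above optimizes a masked language model objective over the climate corpus. As a rough illustration of how that objective corrupts its inputs, the sketch below implements the standard BERT-style masking scheme (select ~15% of tokens; of the selected tokens, replace 80% with a mask symbol, 10% with a random token, and leave 10% unchanged). It is a minimal, self-contained sketch on word-level tokens with a toy vocabulary; the paper's actual tokenizer, vocabulary, and training code are not shown here.

```python
import random

MASK = "[MASK]"
# Toy vocabulary for the 10% random-replacement case (illustrative only).
VOCAB = ["climate", "carbon", "emissions", "risk", "policy"]

def mask_tokens(tokens, mask_prob=0.15, seed=0):
    """BERT-style masking for the masked-LM objective.

    Selects roughly `mask_prob` of the tokens; of those, 80% become
    [MASK], 10% become a random vocabulary token, and 10% stay as-is.
    Returns (corrupted, labels): labels hold the original token at
    selected positions (the prediction targets) and None elsewhere.
    """
    rng = random.Random(seed)
    corrupted, labels = [], []
    for tok in tokens:
        if rng.random() < mask_prob:
            labels.append(tok)  # the model must recover this token
            r = rng.random()
            if r < 0.8:
                corrupted.append(MASK)        # 80%: mask symbol
            elif r < 0.9:
                corrupted.append(rng.choice(VOCAB))  # 10%: random token
            else:
                corrupted.append(tok)         # 10%: left unchanged
        else:
            labels.append(None)  # not selected: no loss at this position
            corrupted.append(tok)
    return corrupted, labels
```

During pretraining, the loss is computed only at positions where the label is not None, so the model learns to reconstruct domain-specific tokens from their context.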

