Learning A Foundation Language Model for Geoscience Knowledge Understanding and Utilization

06/08/2023
by Cheng Deng, et al.

Large language models (LLMs) have achieved great success in general domains of natural language processing. In this paper, we bring LLMs to the realm of geoscience, with the objective of advancing research and applications in this field. To this end, we present the first-ever LLM in geoscience, K2, alongside a suite of resources developed to further promote LLM research within geoscience. For instance, we have curated the first geoscience instruction tuning dataset, GeoSignal, which aims to align LLM responses to geoscience-related user queries. Additionally, we have established the first geoscience benchmark, GeoBenchmark, to evaluate LLMs in the context of geoscience. In this work, we experiment with a complete recipe to adapt a pretrained general-domain LLM to the geoscience domain. Specifically, we further train the LLaMA-7B model on over 1 million pieces of geoscience literature and then fine-tune it on GeoSignal's supervised data. Moreover, we share a protocol that can efficiently gather domain-specific data and construct domain-supervised data, even in situations where manpower is scarce. Experiments conducted on GeoBenchmark demonstrate the effectiveness of our approach and datasets.
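
The two-stage recipe in the abstract (further training on raw literature, then supervised fine-tuning on instruction data) implies two different input formats. The following is a minimal sketch of how such inputs might be prepared, not the authors' actual pipeline: the names `InstructionExample`, `chunk_corpus`, `format_for_sft`, and the prompt template are all hypothetical, and the word-based chunking stands in for whatever tokenizer-based splitting a real setup would use.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class InstructionExample:
    """One supervised record; the field names are hypothetical."""
    instruction: str
    response: str


# Hypothetical prompt template; the source does not specify one.
PROMPT_TEMPLATE = "### Instruction:\n{instruction}\n\n### Response:\n"


def chunk_corpus(text: str, max_words: int) -> List[str]:
    """Stage 1 input: split raw literature into chunks of at most
    max_words words for further (self-supervised) training."""
    words = text.split()
    return [
        " ".join(words[i:i + max_words])
        for i in range(0, len(words), max_words)
    ]


def format_for_sft(example: InstructionExample) -> Dict[str, str]:
    """Stage 2 input: render one record as a prompt/completion pair
    for supervised fine-tuning."""
    return {
        "prompt": PROMPT_TEMPLATE.format(instruction=example.instruction),
        "completion": example.response,
    }
```

In this sketch, stage 1 consumes only unlabeled text chunks, while stage 2 consumes prompt/completion pairs, which is the distinction the abstract draws between the literature corpus and GeoSignal.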


