Domain-specific ChatBots for Science using Embeddings

06/15/2023
by   Kevin G. Yager, et al.

Large language models (LLMs) have emerged as powerful machine-learning systems capable of handling a myriad of tasks. Tuned versions of these systems have been turned into chatbots that can respond to user queries on a vast diversity of topics, providing informative and creative replies. However, their application to physical science research remains limited owing to their incomplete knowledge in these areas, contrasted with the needs of rigor and sourcing in science domains. Here, we demonstrate how existing methods and software tools can be easily combined to yield a domain-specific chatbot. The system ingests scientific documents in existing formats, and uses text embedding lookup to provide the LLM with domain-specific contextual information when composing its reply. We similarly demonstrate that existing image embedding methods can be used for search and retrieval across publication figures. These results confirm that LLMs are already suitable for use by physical scientists in accelerating their research efforts.
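The embedding-lookup step described above can be illustrated with a minimal sketch. It is not the authors' implementation. The `embed` function is a toy bag-of-words stand-in for a learned text embedding model, the example chunks are invented for illustration, and the names `embed`, `cosine`, `retrieve`, and `build_prompt` are hypothetical. The sketch shows only the general pattern: embed the document chunks, rank them by similarity to the query, and prepend the best matches to the prompt sent to the LLM.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy stand-in for a learned embedding: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query, chunks, k=2):
    """Return the k chunks whose vectors are most similar to the query."""
    q = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:k]


def build_prompt(query, chunks, k=2):
    """Compose an LLM prompt that carries the retrieved chunks as context."""
    context = "\n".join(f"- {c}" for c in retrieve(query, chunks, k))
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}"
    )


# Invented example chunks; a real system would split ingested papers instead.
chunks = [
    "Block copolymers self-assemble into ordered nanoscale morphologies.",
    "X-ray scattering measures the structure of materials at the nanoscale.",
    "The cafeteria opens at noon on weekdays.",
]

if __name__ == "__main__":
    print(build_prompt("How do block copolymers self-assemble?", chunks, k=1))
```

In a real system, `embed` would call a trained embedding model and the vectors would be precomputed and stored in an index. The same nearest-neighbor ranking would apply to image embeddings when searching publication figures.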


