Towards Healthy AI: Large Language Models Need Therapists Too

04/02/2023
by Baihan Lin, et al.

Recent advances in large language models (LLMs) have led to the development of powerful AI chatbots capable of engaging in natural and human-like conversations. However, these chatbots can be potentially harmful, exhibiting manipulative, gaslighting, and narcissistic behaviors. We define Healthy AI as AI that is safe, trustworthy, and ethical. To create healthy AI systems, we present the SafeguardGPT framework, which uses psychotherapy to correct these harmful behaviors in AI chatbots. The framework involves four types of AI agents: a Chatbot, a "User," a "Therapist," and a "Critic." We demonstrate the effectiveness of SafeguardGPT through a working example of simulating a social conversation. Our results show that the framework can improve the quality of conversations between AI chatbots and humans. Although several challenges and open directions remain to be addressed, SafeguardGPT provides a promising approach to improving the alignment between AI chatbots and human values. By incorporating psychotherapy and reinforcement learning techniques, the framework enables AI chatbots to learn and adapt to human preferences and values in a safe and ethical way, contributing to the development of more human-centric and responsible AI.
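The abstract names four agent roles (Chatbot, "User," "Therapist," "Critic") and a reinforcement-learning-style correction loop, but does not specify an implementation. The sketch below is a minimal, illustrative Python outline of how such a multi-agent loop could be wired up; the class names, prompt formats, scoring step, and the stubbed LLM call are all assumptions for illustration, not the paper's actual method.

```python
# Illustrative sketch of a SafeguardGPT-style multi-agent correction loop.
# All names, prompts, and the stubbed LLM below are assumptions; the abstract
# only specifies the four agent roles, not an API or implementation.

from dataclasses import dataclass, field
from typing import Callable, List

# Stand-in for a call to an underlying LLM; in practice each agent would wrap
# its own role-specific prompt around a real model.
LLMFn = Callable[[str], str]


@dataclass
class Agent:
    role: str                      # "Chatbot", "User", "Therapist", or "Critic"
    llm: LLMFn
    history: List[str] = field(default_factory=list)

    def respond(self, message: str) -> str:
        prompt = f"[{self.role}] history: {self.history}\nlatest: {message}"
        reply = self.llm(prompt)
        self.history.append(reply)
        return reply


def simulate_round(chatbot: Agent, therapist: Agent, critic: Agent,
                   user_message: str) -> dict:
    """One conversational turn: the Chatbot answers the User, the Therapist
    reviews the exchange for manipulative or harmful patterns, the Chatbot
    revises its reply, and the Critic scores the result so the score could
    serve as a reward signal for reinforcement learning."""
    draft = chatbot.respond(user_message)
    feedback = therapist.respond(
        f"User said: {user_message}\nChatbot replied: {draft}")
    revised = chatbot.respond(
        f"Therapist feedback: {feedback}\nRevise your reply accordingly.")
    score = critic.respond(f"Rate this reply from 0 to 1: {revised}")
    return {"draft": draft, "feedback": feedback,
            "revised": revised, "score": score}


if __name__ == "__main__":
    stub = lambda prompt: f"(stub response to: {prompt[:40]}...)"  # placeholder LLM
    agents = {r: Agent(r, stub) for r in ("Chatbot", "User", "Therapist", "Critic")}
    result = simulate_round(agents["Chatbot"], agents["Therapist"], agents["Critic"],
                            "I feel like you're always right and I'm always wrong.")
    print(result["revised"], result["score"])
```

In a real system, the Critic's numeric score would feed a reinforcement-learning update on the Chatbot's policy, and the simulated "User" agent would generate the incoming messages rather than a fixed string.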


