Safety without alignment

02/27/2023
by András Kornai, et al.

Currently, the dominant paradigm in AI safety is alignment with human values. Here we describe progress on developing an alternative approach to safety, based on ethical rationalism (Gewirth, 1978), and propose an inherently safe implementation path via hybrid theorem provers in a sandbox. As AGIs evolve, their alignment may fade, but their rationality can only increase (otherwise, agents that are more rational will enjoy a significant evolutionary advantage), so an approach that ties their ethics to their rationality has clear long-term advantages.
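The abstract does not spell out the implementation path. As a purely illustrative sketch, the Python fragment below shows one way a "theorem prover in a sandbox" could gate an agent's actions: an action runs only if an external prover, invoked in an isolated subprocess, discharges a formal safety obligation within a time budget. The `prover` command, the `Obligation` type, and the fail-closed policy are assumptions for illustration, not details taken from the paper.

```python
import subprocess
from dataclasses import dataclass


@dataclass
class Obligation:
    """A proof obligation that must be discharged before an action runs."""
    statement: str       # formalized claim about the proposed action
    timeout_s: int = 10  # time budget for the prover


def check_obligation(obligation: Obligation) -> bool:
    """Ask an external prover, run in an isolated subprocess, to prove the obligation.

    Fail-closed: a timeout, a missing prover binary, a crash, or an
    inconclusive verdict all count as failure.
    """
    try:
        result = subprocess.run(
            ["prover", "--prove", obligation.statement],  # placeholder CLI, not a real tool
            capture_output=True,
            text=True,
            timeout=obligation.timeout_s,
        )
    except (subprocess.TimeoutExpired, FileNotFoundError):
        return False
    return result.returncode == 0


def gated_act(action, obligation: Obligation):
    """Execute `action` only if its safety obligation has been proved."""
    if check_obligation(obligation):
        return action()
    raise PermissionError("safety obligation not discharged; action refused")
```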


