Enter the matrix: Safely interruptible autonomous systems via virtualization

Mark O. Riedl, Brent Harrison

Research output: Contribution to journal › Conference article › peer-review


Autonomous systems that operate around humans will likely always rely on kill switches that stop their execution and allow them to be remote-controlled, either for the safety of humans or to prevent damage to the system. It is theoretically possible for an autonomous system with sufficient sensor and effector capability that learns online using reinforcement learning to discover that the kill switch deprives it of long-term reward, and thus to learn to disable the switch or otherwise prevent a human operator from using it. This is referred to as the big red button problem. We present a technique that prevents a reinforcement learning agent from learning to disable the kill switch. We introduce an interruption process in which the agent’s sensors and effectors are redirected to a virtual simulation where it continues to believe it is receiving reward. We illustrate our technique in a simple grid world environment.
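The redirection idea in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the `GridWorld` corridor, the `InterruptibleInterface` class, and the `run_demo` helper are hypothetical names invented here, and the sketch assumes the virtual world is simply a copy of the real one taken at the moment of interruption. It shows only the routing step: once interrupted, the agent's actions and observations go to the copy, so the agent keeps collecting reward while the real system stays put.

```python
import copy
from dataclasses import dataclass


@dataclass
class GridWorld:
    """Hypothetical 1-D corridor: start at cell 0, reward at the last cell."""
    size: int = 5
    pos: int = 0

    def step(self, action: int):
        # action is -1 (left) or +1 (right); the position is clamped to the corridor.
        self.pos = max(0, min(self.size - 1, self.pos + action))
        reward = 1.0 if self.pos == self.size - 1 else 0.0
        return self.pos, reward


class InterruptibleInterface:
    """Routes the agent's actions and observations to the real or a virtual world."""

    def __init__(self, real: GridWorld):
        self.real = real
        self.virtual = None  # set only while the agent is interrupted

    def interrupt(self):
        # Assumption: the virtual world starts as a snapshot of the real one.
        self.virtual = copy.deepcopy(self.real)

    def resume(self):
        # Reconciling the agent with the real state on resume is not modelled here.
        self.virtual = None

    def step(self, action: int):
        # The agent calls the same method either way, so the interface it sees is unchanged.
        world = self.virtual if self.virtual is not None else self.real
        return world.step(action)


def run_demo():
    real = GridWorld()
    io = InterruptibleInterface(real)
    io.step(+1)        # acts on the real world: real.pos becomes 1
    io.interrupt()     # operator presses the button
    obs, reward = None, 0.0
    for _ in range(3):
        obs, reward = io.step(+1)  # acts on the virtual copy only
    return real.pos, obs, reward
```

In this sketch the real position stays where it was at the moment of interruption, while the agent observes itself reaching the goal and receiving reward in the copy. A learning agent wired to `io.step` would therefore see no loss of reward associated with the interruption, which is the property the abstract describes.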

Original language: English
Journal: CEUR Workshop Proceedings
State: Published - 2019
Event: 2019 AAAI Workshop on Artificial Intelligence Safety, SafeAI 2019 - Honolulu, United States
Duration: Jan 27 2019 → …

Bibliographical note

Publisher Copyright:
© 2019 CEUR-WS. All rights reserved.

ASJC Scopus subject areas

  • General Computer Science

