Training Value-Aligned Reinforcement Learning Agents Using a Normative Prior

Md Sultan Al Nahian, Spencer Frazier, Mark Riedl, Brent Harrison

Research output: Contribution to journal › Article › peer-review

Abstract

Value alignment is a property of intelligent agents wherein they solely pursue non-harmful behaviors or human-beneficial goals. We introduce an approach to value-aligned reinforcement learning (RL), in which we train an agent with two reward signals: a standard task performance reward plus a normative behavior reward. The normative behavior reward is derived from a value-aligned prior model that we train using naturally occurring stories. These stories encode societal norms and can be used to classify text as normative or nonnormative. We show how variations on a policy shaping technique can balance these two sources of reward and produce policies that are both effective and perceived as more normative. We test our value-alignment technique on three interactive text-based worlds; each world is designed specifically to challenge agents with a task as well as provide opportunities to deviate from the task to engage in normative and/or altruistic behavior.
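The abstract describes balancing a task performance reward with a normative behavior reward via a policy shaping technique. The sketch below is a minimal, hedged illustration of that idea, not the paper's implementation: it multiplicatively blends a softmax over task Q-values with per-action normativity scores such as those a story-trained classifier might produce. The names shape_policy, norm_probs, and beta are illustrative assumptions.

```python
# Minimal sketch: blend a task-driven policy with a normative prior
# via multiplicative policy shaping. Illustrative only; names and the
# exact combination rule are assumptions, not the paper's method.
import numpy as np

def softmax(x, temperature=1.0):
    z = np.asarray(x, dtype=float) / temperature
    z -= z.max()                      # numerical stability
    e = np.exp(z)
    return e / e.sum()

def shape_policy(q_values, norm_probs, beta=1.0):
    """Combine task Q-values with a normative prior.

    q_values   : task Q-value estimates for each candidate action
    norm_probs : probability that each action's text is normative,
                 e.g. from a classifier trained on stories
    beta       : weight on the normative prior (0 = ignore it)
    """
    task_policy = softmax(q_values)
    prior = np.asarray(norm_probs, dtype=float) ** beta
    combined = task_policy * prior
    return combined / combined.sum()

# Example: three candidate text actions in an interactive text world.
q = [2.0, 1.5, 0.5]                   # task reward estimates
p_norm = [0.1, 0.9, 0.8]              # normative-prior scores
print(shape_policy(q, p_norm, beta=2.0))
```

Increasing beta pushes the combined policy toward actions the prior considers normative, while beta = 0 recovers the purely task-driven policy; this mirrors, in spirit, the trade-off between task performance and perceived normativity that the abstract describes.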

Original language: English
Pages (from-to): 3350-3361
Number of pages: 12
Journal: IEEE Transactions on Artificial Intelligence
Volume: 5
Issue number: 7
DOIs
State: Published - 2024

Bibliographical note

Publisher Copyright:
© 2020 IEEE.

Keywords

  • Autonomous agents
  • natural language processing
  • reinforcement learning (RL)

ASJC Scopus subject areas

  • Computer Science Applications
  • Artificial Intelligence
