Policy shaping in domains with multiple optimal policies

Himanshu Sahni, Brent Harrison, Kaushik Subramanian, Thomas Cederborg, Charles Isbell, Andrea Thomaz

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

4 Scopus citations

Abstract

In many domains, there exist multiple ways for an agent to achieve optimal performance. Feedback may be provided along one or more of them to aid learning. In this work, we investigate whether humans have a preference towards providing feedback along one optimal policy over another in two gridworld domains. We find that for the domain with significant risk to exploration, 60% of our participants prefer to discourage the agent's exploration along the risky portion of the state space, while 40% state that they have no preference. We also use the interactive reinforcement learning algorithm Policy Shaping to evaluate the performance of simulated oracles with a number of feedback strategies. We find that certain domain traits, such as risk during exploration and the number of optimal policies, play an important role in determining the best-performing feedback strategy.
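As a rough illustration of the kind of feedback model the abstract refers to, the sketch below follows the commonly cited Policy Shaping formulation, in which the probability that an action is optimal is estimated from the net feedback it has received and then multiplied into the agent's own policy. It is not taken from this paper: the function names, the action labels, the feedback counts, and the consistency value of 0.8 are all assumptions chosen for the example.

```python
from typing import Dict


def feedback_policy(delta: Dict[str, int], consistency: float) -> Dict[str, float]:
    """Estimate, per action, the probability that the action is optimal.

    delta[a] is the number of positive labels minus the number of negative
    labels given to action a in one state (hypothetical counts).
    consistency is the assumed probability that a label is correct,
    and must lie strictly between 0 and 1.
    """
    if not 0.0 < consistency < 1.0:
        raise ValueError("consistency must be strictly between 0 and 1")
    return {
        a: consistency ** d / (consistency ** d + (1.0 - consistency) ** d)
        for a, d in delta.items()
    }


def combine(agent_policy: Dict[str, float], fb: Dict[str, float]) -> Dict[str, float]:
    """Multiply the agent's action distribution by the feedback estimate and renormalize."""
    product = {a: agent_policy[a] * fb[a] for a in agent_policy}
    total = sum(product.values())
    return {a: p / total for a, p in product.items()}


# Hypothetical state with two equally optimal actions. The simulated oracle
# has only given feedback along one of them ("left"), so "right" has no labels.
delta = {"left": 2, "right": 0}
fb = feedback_policy(delta, consistency=0.8)

# The agent's own policy is assumed uniform in this state.
shaped = combine({"left": 0.5, "right": 0.5}, fb)
```

In this toy state the action with no feedback keeps a neutral estimate of 0.5, so the shaped policy leans towards the action the oracle chose to label even though both are optimal. That is one way to picture why the choice of feedback strategy can matter when several optimal policies exist.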

Original language: English
Title of host publication: AAMAS 2016 - Proceedings of the 2016 International Conference on Autonomous Agents and Multiagent Systems
Pages: 1455-1456
Number of pages: 2
ISBN (Electronic): 9781450342391
State: Published - 2016
Event: 15th International Conference on Autonomous Agents and Multiagent Systems, AAMAS 2016 - Singapore, Singapore
Duration: May 9 2016 to May 13 2016

Publication series

Name: Proceedings of the International Joint Conference on Autonomous Agents and Multiagent Systems, AAMAS
ISSN (Print): 1548-8403
ISSN (Electronic): 1558-2914

Conference

Conference: 15th International Conference on Autonomous Agents and Multiagent Systems, AAMAS 2016
Country/Territory: Singapore
City: Singapore
Period: 5/9/16 to 5/13/16

Bibliographical note

Publisher Copyright:
Copyright © 2016, International Foundation for Autonomous Agents and Multiagent Systems (www.ifaamas.org). All rights reserved.

Keywords

  • Interactive machine learning
  • Learning from critique
  • Policy shaping
  • Reinforcement learning

ASJC Scopus subject areas

  • Artificial Intelligence
  • Software
  • Control and Systems Engineering
