Abstract
In many domains, an agent can achieve optimal performance in more than one way. Feedback may be provided along one or more of these ways to aid learning. In this work, we investigate whether humans prefer to provide feedback along one optimal policy rather than another in two gridworld domains. We find that in the domain where exploration carries significant risk, 60% of our participants prefer to discourage the agent from exploring the risky portion of the state space, while 40% state that they have no preference. We also use the interactive reinforcement learning algorithm Policy Shaping to evaluate the performance of simulated oracles that follow a number of feedback strategies. We find that certain domain traits, such as the risk involved in exploration and the number of optimal policies, play an important role in determining which feedback strategy performs best.
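As a rough illustration of how Policy Shaping turns feedback into action preferences, the sketch below follows the formulation commonly used for the algorithm: the net feedback Δ for an action (positive minus negative labels) and an assumed oracle consistency C give the probability that the action is optimal as C^Δ / (C^Δ + (1 - C)^Δ), and this is multiplied into the agent's own action distribution and renormalized. The consistency value of 0.8, the four-action state, and the `oracle_label` helper (a simulated oracle that endorses only one of two optimal actions) are hypothetical choices for illustration, not the experimental setup used in this work.

```python
import numpy as np

def feedback_policy(delta, consistency):
    # Probability that each action is optimal given its net feedback delta
    # (positive minus negative labels) and an assumed oracle consistency C:
    # C**delta / (C**delta + (1 - C)**delta).
    delta = np.asarray(delta, dtype=float)
    num = consistency ** delta
    return num / (num + (1.0 - consistency) ** delta)

def shaped_policy(agent_probs, delta, consistency):
    # Combine the agent's own action distribution with the feedback
    # distribution by multiplication, then renormalize to sum to 1.
    combined = np.asarray(agent_probs, dtype=float) * feedback_policy(delta, consistency)
    return combined / combined.sum()

def oracle_label(action, preferred_actions):
    # Hypothetical simulated oracle: +1 for actions on its preferred
    # optimal policy, -1 for every other action.
    return 1 if action in preferred_actions else -1

# Hypothetical state with 4 actions; actions 0 and 1 are both optimal,
# but this oracle's strategy is to endorse only action 0.
delta = np.zeros(4)
for _ in range(3):
    for a in range(4):
        delta[a] += oracle_label(a, preferred_actions={0})

# Start from a uniform agent policy and shape it with the feedback.
policy = shaped_policy([0.25] * 4, delta, consistency=0.8)
```

Because the oracle labels action 1 negatively even though it is also optimal, the shaped policy concentrates almost all probability on action 0; an oracle that endorsed both optimal actions would instead split the mass between them.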
| Original language | English |
|---|---|
| Host publication title | AAMAS 2016 - Proceedings of the 2016 International Conference on Autonomous Agents and Multiagent Systems |
| Pages | 1455-1456 |
| Number of pages | 2 |
| ISBN (electronic) | 9781450342391 |
| Status | Published - 2016 |
| Event | 15th International Conference on Autonomous Agents and Multiagent Systems, AAMAS 2016 - Singapore, Singapore. Duration: May 9, 2016 → May 13, 2016 |
Publication series
| Name | Proceedings of the International Joint Conference on Autonomous Agents and Multiagent Systems, AAMAS |
|---|---|
| ISSN (print) | 1548-8403 |
| ISSN (electronic) | 1558-2914 |
Conference
| Conference | 15th International Conference on Autonomous Agents and Multiagent Systems, AAMAS 2016 |
|---|---|
| Country/Territory | Singapore |
| City | Singapore |
| Period | 5/9/16 → 5/13/16 |
Bibliographical note
Publisher Copyright: Copyright © 2016, International Foundation for Autonomous Agents and Multiagent Systems (www.ifaamas.org). All rights reserved.
Funding
This work was funded under ONR grant number N000141410003.
| Funders | Funder number |
|---|---|
| Office of Naval Research | N000141410003 |
ASJC Scopus subject areas
- Artificial Intelligence
- Software
- Control and Systems Engineering
Fingerprint
Dive into the research topics of 'Policy shaping in domains with multiple optimal policies'. Together they form a unique fingerprint.