Abstract
We present a novel technique for learning behaviors from a human-provided feedback signal that is distorted by systematic bias. Our technique, which we refer to as BASIL, models the feedback signal as separable into a heuristic evaluation of the utility of an action and a bias value drawn from a parametric distribution whose parameters are unknown. We present the general form of the technique as well as a specific algorithm that integrates it with the TAMER algorithm for bias values drawn from a normal distribution. We test our algorithm against standard TAMER in the domain of Tetris, using a synthetic oracle that provides feedback under varying levels of distortion. We find that our algorithm can learn very quickly under bias distortions that entirely stymie the learning of standard TAMER.
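As a rough illustration of the separable feedback model described in the abstract, the following sketch generates synthetic feedback as a heuristic utility plus a normally distributed bias and then recovers the bias parameters. It is not the BASIL algorithm, whose details the abstract does not give. The function names, the parameter values (`2.0`, `0.5`), and the assumption that the true utilities are known to the estimator are all hypothetical and chosen only to keep the example small.

```python
import random
import statistics


def oracle_feedback(utility, bias_mean, bias_std, rng):
    """Hypothetical synthetic oracle: feedback = heuristic utility + bias draw."""
    return utility + rng.gauss(bias_mean, bias_std)


def estimate_bias_params(utilities, feedback):
    """Maximum-likelihood mean and standard deviation of a normal bias.

    Assumes the heuristic utilities are known, which holds only in a
    simulation; a real learner would have to infer them jointly.
    """
    residuals = [f - u for u, f in zip(utilities, feedback)]
    return statistics.fmean(residuals), statistics.pstdev(residuals)


rng = random.Random(0)  # fixed seed so the run is repeatable

# Assumed utilities for 5000 actions, and the biased feedback they receive.
utilities = [rng.uniform(-1.0, 1.0) for _ in range(5000)]
feedback = [oracle_feedback(u, 2.0, 0.5, rng) for u in utilities]

mu_hat, sigma_hat = estimate_bias_params(utilities, feedback)

# Subtracting the estimated mean bias removes the systematic offset;
# the zero-mean noise component remains.
corrected = [f - mu_hat for f in feedback]

print(f"estimated bias mean: {mu_hat:.3f}, std: {sigma_hat:.3f}")
```

In a real setting the utilities are exactly what the learner is trying to discover, so the bias parameters and the utility model would need to be estimated together rather than read off known residuals as they are here.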
Original language | English |
---|---|
Journal | Proceedings of the International Florida Artificial Intelligence Research Society Conference, FLAIRS |
Volume | 34 |
State | Published - 2021 |
Event | 34th International Florida Artificial Intelligence Research Society Conference, FLAIRS-34 2021 - North Miami Beach, United States. Duration: May 16 2021 → May 19 2021 |
Bibliographical note
Publisher Copyright: © 2021 by the authors. All rights reserved.
ASJC Scopus subject areas
- Artificial Intelligence
- Software