We present a novel technique for learning behaviors from a human-provided feedback signal that is distorted by systematic bias. Our technique, which we refer to as BASIL, models the feedback signal as separable into a heuristic evaluation of the utility of an action and a bias value drawn from a parametric distribution whose parameters are unknown. We present the general form of the technique as well as a specific algorithm that integrates it with the TAMER algorithm for bias values drawn from a normal distribution. We test our algorithm against standard TAMER in the domain of Tetris, using a synthetic oracle that provides feedback under varying levels of distortion. We find that our algorithm can learn very quickly under bias distortions that entirely stymie the learning of classic TAMER.
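The idea of separating feedback into a utility term and a normally distributed bias can be illustrated with a small sketch. This is not the BASIL algorithm and does not involve TAMER. It assumes, purely for illustration, that the bias is additive, that utility is linear in some action features, and that ordinary least squares with an intercept is an acceptable estimator. All function names, feature dimensions, and parameter values are hypothetical.

```python
import numpy as np

def simulate_feedback(features, weights, bias_mean, bias_std, rng):
    """Hypothetical oracle: linear utility plus an additive, normally distributed bias."""
    utility = features @ weights
    bias = rng.normal(bias_mean, bias_std, size=len(features))
    return utility + bias

def fit_utility_and_bias(features, feedback):
    """Least-squares fit: the intercept absorbs the bias mean (an assumed estimator)."""
    # Append a column of ones so the intercept is estimated alongside the weights.
    design = np.column_stack([features, np.ones(len(features))])
    coef, *_ = np.linalg.lstsq(design, feedback, rcond=None)
    weights_hat, bias_mean_hat = coef[:-1], coef[-1]
    # The residual spread estimates the bias standard deviation.
    residuals = feedback - design @ coef
    bias_std_hat = residuals.std(ddof=design.shape[1])
    return weights_hat, bias_mean_hat, bias_std_hat

rng = np.random.default_rng(0)
features = rng.normal(size=(2000, 3))       # hypothetical action features
true_weights = np.array([1.0, -2.0, 0.5])   # hypothetical utility weights
feedback = simulate_feedback(features, true_weights,
                             bias_mean=5.0, bias_std=1.0, rng=rng)
weights_hat, bias_mean_hat, bias_std_hat = fit_utility_and_bias(features, feedback)
```

Under these assumptions, the utility weights remain recoverable even when the bias mean is large, because the intercept captures the systematic offset instead of letting it distort the weight estimates.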
|Journal|Proceedings of the International Florida Artificial Intelligence Research Society Conference, FLAIRS|
|State|Published - 2021|
|Event|34th International Florida Artificial Intelligence Research Society Conference, FLAIRS-34 2021 - North Miami Beach, United States|
Duration: May 16 2021 → May 19 2021
Bibliographical note
Publisher Copyright:
© 2021 by the authors. All rights reserved.
ASJC Scopus subject areas
- Artificial Intelligence