Bias Adaptive Statistical Inference Learning Agents for Learning from Human Feedback

Jonathan Watson, Brent Harrison

Research output: Contribution to journal › Conference article › peer-review

Abstract

We present a novel technique for learning behaviors from a human-provided feedback signal that is distorted by systematic bias. Our technique, which we refer to as BASIL, models the feedback signal as separable into a heuristic evaluation of the utility of an action and a bias value drawn from a parametric distribution whose parameters are unknown. We present the general form of the technique as well as a specific algorithm that integrates it with the TAMER algorithm for bias values drawn from a normal distribution. We test our algorithm against standard TAMER in the domain of Tetris using a synthetic oracle that provides feedback under varying levels of distortion. We find that our algorithm can learn very quickly under bias distortions that entirely stymie the learning of classic TAMER.
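The separable feedback model described in the abstract can be illustrated with a minimal sketch. This is not the BASIL algorithm itself, which the abstract does not spell out. It only simulates feedback as a utility plus a normally distributed bias and recovers the bias parameters from residuals. The function names, the parameter values, and the assumption that true utilities are available for a calibration set are all hypothetical and chosen purely for illustration.

```python
import random
import statistics


def biased_feedback(utility, mu, sigma, rng):
    """Hypothetical oracle: feedback = utility + bias, with bias ~ Normal(mu, sigma)."""
    return utility + rng.gauss(mu, sigma)


def estimate_bias_params(utilities, feedback):
    """Estimate (mu, sigma) from residuals.

    Assumption (not from the source): the utilities are known for these samples.
    """
    residuals = [f - u for u, f in zip(utilities, feedback)]
    return statistics.fmean(residuals), statistics.stdev(residuals)


rng = random.Random(0)            # fixed seed so the run is repeatable
true_mu, true_sigma = 2.0, 0.5    # illustrative bias parameters, unknown to the learner

# Simulated action utilities and the distorted feedback an oracle would return.
utilities = [rng.uniform(-1.0, 1.0) for _ in range(5000)]
feedback = [biased_feedback(u, true_mu, true_sigma, rng) for u in utilities]

mu_hat, sigma_hat = estimate_bias_params(utilities, feedback)

# Subtracting the estimated mean bias leaves a noisy but centered utility signal.
corrected = [f - mu_hat for f in feedback]
```

In a realistic setting the utilities are not observed, so the bias parameters would have to be inferred jointly with the learned utility model; the sketch sidesteps that to keep the decomposition itself visible.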

Original language: English
Journal: Proceedings of the International Florida Artificial Intelligence Research Society Conference, FLAIRS
Volume: 34
State: Published - 2021
Event: 34th International Florida Artificial Intelligence Research Society Conference, FLAIRS-34 2021 - North Miami Beach, United States
Duration: May 16 2021 to May 19 2021

Bibliographical note

Publisher Copyright:
© 2021 by the authors. All rights reserved.

ASJC Scopus subject areas

  • Artificial Intelligence
  • Software
