Learning From Explanations Using Sentiment and Advice in RL

Samantha Krening, Brent Harrison, Karen M. Feigh, Charles Lee Isbell, Mark Riedl, Andrea Thomaz

Research output: Contribution to journal › Article › peer-review

50 Scopus citations

Abstract

In order for robots to learn from people with no machine learning expertise, robots should learn from natural human instruction. Most machine learning techniques that incorporate explanations require people to use a limited vocabulary and to provide state information, even if it is not intuitive. This paper discusses a software agent that learned to play the Mario Bros. game using explanations. Our goals to improve learning from explanations were twofold: 1) to filter explanations into advice and warnings and 2) to learn policies from sentences without state information. We used sentiment analysis to filter explanations into advice about what to do and warnings about what to avoid. We developed object-focused advice to represent what actions the agent should take when dealing with objects. A reinforcement learning agent used object-focused advice to learn policies that maximized its reward. After mitigating false negatives, using sentiment as a filter was approximately 85% accurate. Object-focused advice performed better than when no advice was given, the agent learned where to apply the advice, and the agent could recover from adversarial advice. We also found that the method of interaction should be designed to ease the cognitive load of the human teacher, or the advice may be of poor quality.
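The pipeline the abstract describes, filtering free-form explanations by sentiment and reducing them to object-action pairs, can be illustrated with a minimal Python sketch. The toy lexicon, object list, and action names below are invented stand-ins, not the authors' actual sentiment analyzer or Mario Bros. vocabulary; they only show the shape of the idea.

import re

# Toy sentiment lexicon: a stand-in for a real sentiment analyzer.
POSITIVE = {"jump", "grab", "collect", "good", "safe"}
NEGATIVE = {"avoid", "bad", "dangerous", "don't", "never"}

# Illustrative object and action vocabularies (hypothetical, not the paper's).
GAME_OBJECTS = {"goomba", "coin", "pit", "mushroom"}
ACTIONS = {"jump", "duck", "run", "move_left", "move_right"}

def tokenize(sentence):
    return re.findall(r"[a-z_']+", sentence.lower())

def sentiment(sentence):
    # Crude polarity score in [-1, 1] from lexicon hits.
    words = tokenize(sentence)
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    return 0.0 if pos + neg == 0 else (pos - neg) / (pos + neg)

def filter_explanation(sentence):
    # Positive sentiment -> advice (what to do);
    # negative sentiment -> warning (what to avoid).
    return "advice" if sentiment(sentence) >= 0 else "warning"

def object_focused_advice(sentence):
    # Pair each mentioned object with each mentioned action, so the
    # teacher never has to supply state information explicitly.
    words = set(tokenize(sentence))
    return {obj: act
            for obj in words & GAME_OBJECTS
            for act in words & ACTIONS}

if __name__ == "__main__":
    for text in ("Jump over the goomba", "Avoid the pit, it is dangerous"):
        print(f"{text!r}: {filter_explanation(text)} "
              f"{object_focused_advice(text)}")

An RL agent could then bias its action selection toward the advised action whenever the paired object appears nearby, which is roughly the role object-focused advice plays in the paper's learning agent.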

Original language: English
Article number: 7742965
Pages (from-to): 44-55
Number of pages: 12
Journal: IEEE Transactions on Cognitive and Developmental Systems
Volume: 9
Issue number: 1
DOIs
State: Published - Mar 2017

Bibliographical note

Publisher Copyright:
© 2017 IEEE.

Keywords

  • Advice
  • reinforcement learning (RL)
  • sentiment

ASJC Scopus subject areas

  • Software
  • Artificial Intelligence
