S&AS: FND: COLLAB: Learning from Stories: Practical Value Alignment and Taskability for Autonomous Systems

Grants and Contracts Details


As artificial intelligence (AI) systems and other types of autonomous systems become more common, it becomes more important that they can safely interact with humans. When AI systems work in close proximity to humans, the risk of unintentional harm increases. This harm can be physical, such as an autonomous vehicle getting into a collision, or psychological, such as a conversational agent raising an unsettling or upsetting subject. In this work, we plan to explore how stories can be used to train autonomous agents to perform tasks in safe, humanlike ways. To do this, we have identified the following subtasks, which will be completed in collaboration with researchers at the Georgia Institute of Technology.

1) Create a hybrid model of value-aligned story events: While stories contain a great deal of cultural value information, it can be difficult to determine which actions in a story are moral and which are not. During this task, we will create supervised models of ethical behavior using a dataset composed of Goofus and Gallant comics and combine them with an unsupervised model of story event relationships to construct activity scripts that detail the ethical way to complete various tasks.

2) Learn value-aligned policies: Using this hybrid model, we will convert its event predictions into reward functions that can be used to train reinforcement learning agents. This will require addressing the correspondence problem, since our reward function will exist in the space of story events and natural language while the agent operates in its own environment of states and actions.

3) Incorporate a human in the loop: Naturally occurring story corpora are likely to be noisy; characters in stories often perform actions that could be harmful to humans. To reduce the likelihood that agents trained on stories learn these behaviors, we plan to incorporate a human in the loop via an arbiter, a system that maintains two policies and dictates which policy the agent should follow at a given time. The two policies in this work will be one trained offline by reading stories and another trained online using human feedback.

4) Investigate adversarial training attacks: Training techniques that use stories are especially susceptible to poisoning attacks. If a malicious user injects stories that describe undesirable behavior into the training set, our method could teach agents to exhibit those behaviors. The last part of our proposed work is to evaluate how robust our system is to such poisoning attacks and how the combination of human feedback and attention mechanisms can be used to purge poisoned samples from the training set.
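The reward-shaping idea in subtask 2 can be illustrated with a toy sketch. This is not the project's actual method: the surface-similarity measure, the `0.6` threshold, and the assumption that the agent's transitions have already been mapped into natural-language event descriptions (i.e., that the correspondence problem is solved by some external `describe(state, action)` mapping) are all illustrative placeholders.

```python
from difflib import SequenceMatcher

def event_similarity(a, b):
    """Crude surface similarity between two event descriptions.
    A real system would use a learned story-event model instead."""
    return SequenceMatcher(None, a, b).ratio()

def story_shaped_reward(agent_event, expected_events, threshold=0.6, bonus=1.0):
    """Give a shaping bonus when the agent's action, described in the
    story-event space, matches the next event the hybrid story model
    expects. Returns (reward, advanced); `advanced` tells the caller
    to move on to the next expected event in the activity script."""
    if not expected_events:
        return 0.0, False
    if event_similarity(agent_event, expected_events[0]) >= threshold:
        return bonus, True
    return 0.0, False

# Hypothetical activity script from a story model for a shopping task.
script = ["pay the cashier", "exit the store"]
r, advanced = story_shaped_reward("pay the cashier", script)
```

Matching events earn a bonus and advance the script; off-script actions (e.g., "steal the item") earn nothing, which is the sense in which the story model steers the agent toward value-aligned behavior.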
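The arbiter in subtask 3 can likewise be sketched minimally. The switching rule below (defer to the human-feedback policy once a state has accumulated enough feedback, otherwise fall back on the story-trained policy) is an illustrative assumption, not the proposed system's actual arbitration criterion.

```python
class Arbiter:
    """Toy arbiter that holds two policies -- one trained offline from
    stories, one trained online from human feedback -- and decides which
    one the agent follows in each state."""

    def __init__(self, story_policy, feedback_policy, min_feedback=3):
        self.story_policy = story_policy        # offline, story-trained
        self.feedback_policy = feedback_policy  # online, human-trained
        self.min_feedback = min_feedback        # assumed switching threshold
        self.feedback_counts = {}               # state -> human signals seen

    def record_feedback(self, state):
        """Log one human feedback signal for this state."""
        self.feedback_counts[state] = self.feedback_counts.get(state, 0) + 1

    def act(self, state):
        """Follow the human-feedback policy where humans have weighed in
        enough; otherwise trust the story-trained policy."""
        if self.feedback_counts.get(state, 0) >= self.min_feedback:
            return self.feedback_policy(state)
        return self.story_policy(state)
```

In use, the human feedback gathered online can override story-derived behavior exactly where the story corpus turned out to be noisy, while the story policy still covers states no human has reviewed.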
Effective start/end date: 6/1/19 – 5/31/23


  • National Science Foundation: $291,307.00

