The complexity of policy evaluation for finite-horizon partially-observable Markov decision processes

Martin Mundhenk, Judy Goldsmith, Eric Allender

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review


Abstract

A partially-observable Markov decision process (POMDP) generalizes a Markov decision process by allowing incomplete information about the state of the system. We consider several flavors of finite-horizon POMDPs. Our results concern the complexity of the policy evaluation and policy existence problems, which we characterize in terms of completeness for complexity classes. We prove a new upper bound for the policy evaluation problem for POMDPs, showing that it is complete for probabilistic logspace (PL). From this, we prove that the policy existence problem for several variants of unobservable, succinctly represented MDPs is complete for NP^PP, a class for which few natural complete problems are known.
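To make the policy evaluation problem studied in the paper concrete, the sketch below evaluates a fixed (history-dependent) policy on a tiny finite-horizon POMDP by exact recursion over belief states. The model numbers and the specific belief-update recursion are illustrative assumptions for this sketch, not taken from the paper itself.

```python
# Illustrative sketch: exact policy evaluation for a tiny finite-horizon POMDP.
# All model parameters below are made up for illustration.

S = [0, 1]          # states
O = [0, 1]          # observations

# T[s][a][s2]: probability of moving to state s2 from state s under action a
T = {0: {0: [0.9, 0.1], 1: [0.5, 0.5]},
     1: {0: [0.2, 0.8], 1: [0.6, 0.4]}}
# Z[s2][o]: probability of observing o after landing in state s2
Z = {0: [0.8, 0.2], 1: [0.3, 0.7]}
# R[s][a]: immediate reward for taking action a in state s
R = {0: {0: 1.0, 1: 0.0}, 1: {0: 0.0, 1: 2.0}}

def evaluate(policy, belief, horizon):
    """Expected total reward of `policy` (a map from observation history
    to action), computed by exhaustive recursion over observations."""
    def rec(hist, b, t):
        if t == horizon:
            return 0.0
        a = policy(hist)
        # expected immediate reward under the current belief
        val = sum(b[s] * R[s][a] for s in S)
        # branch on each possible observation and update the belief (Bayes rule)
        for o in O:
            p_o = sum(b[s] * T[s][a][s2] * Z[s2][o] for s in S for s2 in S)
            if p_o == 0.0:
                continue
            b2 = [sum(b[s] * T[s][a][s2] for s in S) * Z[s2][o] / p_o
                  for s2 in S]
            val += p_o * rec(hist + (o,), b2, t + 1)
        return val
    return rec((), list(belief), 0)

# Evaluate a simple memoryless policy (always take action 0) for horizon 2.
print(evaluate(lambda hist: 0, [0.5, 0.5], 2))  # expected total reward, here 1.05
```

Note that the recursion branches over all observation histories, so its running time grows exponentially with the horizon; the paper's point is precisely that, despite this apparent blow-up, the evaluation problem admits a tight complexity characterization.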

Original language: English
Title of host publication: Mathematical Foundations of Computer Science 1997 - 22nd International Symposium, MFCS 1997, Proceedings
Editors: Igor Privara, Peter Ruzicka
Pages: 129-138
Number of pages: 10
DOIs
State: Published - 1997
Event: 22nd International Symposium on Mathematical Foundations of Computer Science, MFCS 1997 - Bratislava, Slovakia
Duration: Aug 25, 1997 - Aug 29, 1997

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 1295
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 22nd International Symposium on Mathematical Foundations of Computer Science, MFCS 1997
Country/Territory: Slovakia
City: Bratislava
Period: 8/25/97 - 8/29/97

Bibliographical note

Publisher Copyright:
© Springer-Verlag Berlin Heidelberg 1997.

ASJC Scopus subject areas

  • Theoretical Computer Science
  • General Computer Science
