Finding best k policies

Peng Dai, Judy Goldsmith

Research output: Chapter in Book/Report/Conference proceedingConference contributionpeer-review

2 Scopus citations

Abstract

An optimal probabilistic-planning algorithm solves a problem, usually modeled by a Markov decision process, by finding its optimal policy. In this paper, we study the k best policies problem. The problem is to find the k best policies. The k best policies, k > 1, cannot be found directly using dynamic programming. Naïvely, finding the k-th best policy can be Turing reduced to the optimal planning problem, but the number of problems queried in the naïve algorithm is exponential in k. We show empirically that solving k best policy problem by using this reduction requires unreasonable amounts of time even when k = 3. We then provide a new algorithm, based on our theoretical contribution to prove that the k-th best policy differs from the i-th policy, for some i < k, on exactly one state. We show that the time complexity of the algorithm is quadratic in k, but the number of optimal planning problems it solves is linear in k. We demonstrate empirically that the new algorithm has good scalability.

Original languageEnglish
Title of host publicationAlgorithmic Decision Theory - First International Conference, ADT 2009, Proceedings
Pages144-155
Number of pages12
DOIs
StatePublished - 2009
Event1st International Conference on Algorithmic Decision Theory, ADT 2009 - Venice, Italy
Duration: Oct 20 2009Oct 23 2009

Publication series

NameLecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume5783 LNAI
ISSN (Print)0302-9743
ISSN (Electronic)1611-3349

Conference

Conference1st International Conference on Algorithmic Decision Theory, ADT 2009
Country/TerritoryItaly
CityVenice
Period10/20/0910/23/09

ASJC Scopus subject areas

  • Theoretical Computer Science
  • General Computer Science

Fingerprint

Dive into the research topics of 'Finding best k policies'. Together they form a unique fingerprint.

Cite this