Ranking policies in discrete Markov decision processes

Peng Dai, Judy Goldsmith

Research output: Contribution to journal › Article › peer-review

3 Scopus citations

Abstract

An optimal probabilistic-planning algorithm solves a problem, usually modeled by a Markov decision process, by finding an optimal policy. In this paper, we study the k best policies problem: finding the k best policies of a discrete Markov decision process. The k best policies, k > 1, cannot be found directly using dynamic programming. Naïvely, finding the k-th best policy can be Turing reduced to the optimal planning problem, but the number of problems queried in the naïve algorithm is exponential in k. We show empirically that solving the k best policies problem by using this reduction requires unreasonable amounts of time even when k = 3. We then provide two new algorithms. The first is a complete algorithm, based on our theoretical contribution that the k-th best policy differs from the i-th best policy, for some i < k, on exactly one state. The second is an approximate algorithm that skips many less useful policies. We show that both algorithms have good scalability. We also show that the approximate algorithm runs much faster and finds interesting, high-quality policies.
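The structural result in the abstract (the k-th best policy differs from some higher-ranked policy on exactly one state) suggests a simple search: starting from the optimal policy, each next-best policy can be found among single-state deviations of the policies ranked so far. The sketch below illustrates that idea on a toy two-state MDP; the MDP, the iterative policy evaluation, and ranking policies by the value of a designated start state are all illustrative assumptions, not the paper's actual algorithm or benchmarks.

```python
import itertools

# Hypothetical 2-state, 2-action MDP (for illustration only).
# P[(s, a)] = list of (next_state, probability); R[(s, a)] = immediate reward.
STATES = [0, 1]
ACTIONS = [0, 1]
GAMMA = 0.9
P = {
    (0, 0): [(0, 0.5), (1, 0.5)],
    (0, 1): [(1, 1.0)],
    (1, 0): [(0, 1.0)],
    (1, 1): [(1, 1.0)],
}
R = {(0, 0): 1.0, (0, 1): 0.0, (1, 0): 0.5, (1, 1): 2.0}

def evaluate(policy, iters=500):
    """Iterative policy evaluation; ranks policies by the value of state 0."""
    V = {s: 0.0 for s in STATES}
    for _ in range(iters):
        V = {s: R[(s, policy[s])]
                + GAMMA * sum(p * V[s2] for s2, p in P[(s, policy[s])])
             for s in STATES}
    return V[0]

def k_best_policies(k):
    """Rank the k best policies using single-state deviations of ranked policies."""
    all_policies = [dict(zip(STATES, acts))
                    for acts in itertools.product(ACTIONS, repeat=len(STATES))]
    ranked = [max(all_policies, key=evaluate)]  # start from the optimal policy
    while len(ranked) < k:
        # Candidates: every unranked policy that deviates from a ranked
        # policy on exactly one state (the paper's structural result).
        candidates = []
        for pi in ranked:
            for s in STATES:
                for a in ACTIONS:
                    if a != pi[s]:
                        dev = dict(pi)
                        dev[s] = a
                        if dev not in ranked:
                            candidates.append(dev)
        if not candidates:
            break
        ranked.append(max(candidates, key=evaluate))
    return ranked
```

On this toy MDP the fourth-best policy differs from the optimal one on both states, yet it is still reached because it deviates on exactly one state from the second-best policy, which shows why the candidate set must be built from all ranked policies, not just the optimal one.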

Original language: English
Pages (from-to): 107-123
Number of pages: 17
Journal: Annals of Mathematics and Artificial Intelligence
Volume: 59
Issue number: 1
DOIs
State: Published - Jun 2010

Bibliographical note

Funding Information:
Acknowledgements Dai was partially supported by Office of Naval Research grant N00014-06-1-0147.

Keywords

  • Markov decision process
  • Policy ranking
  • Probabilistic planning

ASJC Scopus subject areas

  • Artificial Intelligence
  • Applied Mathematics
