Abstract
An optimal probabilistic-planning algorithm solves a problem, usually modeled as a Markov decision process, by finding an optimal policy. In this paper, we study the k best policies problem: finding the k best policies of a discrete Markov decision process. For k > 1, the k best policies cannot be found directly by dynamic programming. Naïvely, finding the k-th best policy can be Turing-reduced to the optimal planning problem, but the number of problems queried in the naïve algorithm is exponential in k. We show empirically that solving the k best policies problem via this reduction requires unreasonable amounts of time even for k = 3. We then provide two new algorithms. The first is a complete algorithm, based on our theoretical contribution that the k-th best policy differs from the i-th best policy, for some i < k, on exactly one state. The second is an approximate algorithm that skips many less useful policies. We show that both algorithms scale well, and that the approximate algorithm runs much faster and finds interesting, high-quality policies.
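The structural result above — that the second-best policy differs from the optimal one on exactly one state — can be illustrated on a toy MDP. The sketch below is not the paper's algorithm; it brute-forces all policies of a small, hypothetical 3-state, 2-action discounted MDP (the transition and reward numbers are invented for illustration), ranks them by exact policy evaluation, and compares the top two:

```python
import numpy as np
from itertools import product

# Hypothetical 3-state, 2-action discounted MDP (numbers are illustrative only).
GAMMA = 0.9
# P[a][s][s'] = transition probability; R[s][a] = immediate reward.
P = np.array([
    [[0.8, 0.2, 0.0], [0.1, 0.8, 0.1], [0.0, 0.2, 0.8]],  # action 0
    [[0.5, 0.5, 0.0], [0.0, 0.5, 0.5], [0.5, 0.0, 0.5]],  # action 1
])
R = np.array([[1.0, 0.5], [0.0, 1.0], [0.2, 0.8]])

def evaluate(policy):
    """Exact policy evaluation: solve V = r_pi + GAMMA * P_pi V."""
    n = len(policy)
    P_pi = np.array([P[policy[s]][s] for s in range(n)])
    r_pi = np.array([R[s][policy[s]] for s in range(n)])
    V = np.linalg.solve(np.eye(n) - GAMMA * P_pi, r_pi)
    return V.sum()  # rank policies by total value over all states

def neighbours(policy, n_actions=2):
    """All policies differing from `policy` in exactly one state."""
    for s in range(len(policy)):
        for a in range(n_actions):
            if a != policy[s]:
                yield policy[:s] + (a,) + policy[s + 1:]

# Brute-force ranking of all 2^3 = 8 deterministic policies.
ranked = sorted(product(range(2), repeat=3), key=evaluate, reverse=True)
best, second = ranked[0], ranked[1]
```

For k = 2 the paper's theorem predicts that `second` appears among `neighbours(best)`, which is exactly what the complete algorithm exploits: instead of re-solving exponentially many modified planning problems, candidates for the next-best policy are generated by single-state deviations from already-ranked policies.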
| Original language | English |
|---|---|
| Pages (from-to) | 107-123 |
| Number of pages | 17 |
| Journal | Annals of Mathematics and Artificial Intelligence |
| Volume | 59 |
| Issue number | 1 |
| State | Published - Jun 2010 |
Bibliographical note
Funding Information: Dai was partially supported by Office of Naval Research grant N00014-06-1-0147.
Keywords
- Markov decision process
- Policy ranking
- Probabilistic planning
ASJC Scopus subject areas
- Artificial Intelligence
- Applied Mathematics