Publication information

Finite-Memory Near-Optimal Learning for Markov Decision Processes with Long-Run Average Reward

Authors

KŘETÍNSKÝ Jan, MICHEL Fabian, MICHEL Lukáš, PÉREZ Guillermo A.

Year of publication: 2020
Type: Article in proceedings
Conference: Proceedings of the Thirty-Sixth Conference on Uncertainty in Artificial Intelligence, UAI 2020, virtual online, August 3-6, 2020
Faculty / Department, MU

Faculty of Informatics

Abstract: We consider learning policies online in Markov decision processes with the long-run average reward (a.k.a. mean payoff). To ensure that the policies are implementable, we focus on policies with finite memory. First, we show that near-optimality can be achieved almost surely, using an unintuitive gadget we call forgetfulness. Second, we extend the approach to a setting with partial knowledge of the system topology, introducing two optimality measures and providing near-optimal algorithms for these cases as well.
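To illustrate what the long-run average reward (mean payoff) of a policy means, here is a minimal sketch. It assumes a hypothetical two-state MDP and considers only memoryless policies, the simplest finite-memory policies. Fixing such a policy induces a Markov chain, and the mean payoff is approximated by averaging the expected per-step reward over a finite horizon n. The names `T`, `R`, `induced_chain` and `mean_payoff` are illustrative only. This is not the learning algorithm from the paper, and it does not model the forgetfulness gadget.

```python
def induced_chain(T, R, policy):
    """Fix a memoryless policy (one action per state) in an MDP.

    T[s][a] is the list of next-state probabilities, R[s][a] the reward.
    Returns the induced Markov chain P and its per-state reward vector r.
    """
    P = [T[s][policy[s]] for s in range(len(T))]
    r = [R[s][policy[s]] for s in range(len(T))]
    return P, r


def mean_payoff(P, r, start, n):
    """Approximate the long-run average reward by (1/n) * sum of E[r(s_t)], t < n."""
    k = len(P)
    dist = [0.0] * k
    dist[start] = 1.0  # the chain starts deterministically in `start`
    total = 0.0
    for _ in range(n):
        # Expected reward collected at the current step.
        total += sum(p * ri for p, ri in zip(dist, r))
        # Advance one step: dist' = dist * P.
        dist = [sum(dist[i] * P[i][j] for i in range(k)) for j in range(k)]
    return total / n


# Hypothetical MDP with states {0, 1} and actions {0, 1}.
T = [
    [[0.5, 0.5], [1.0, 0.0]],  # state 0: action 0 mixes, action 1 stays
    [[0.2, 0.8], [0.0, 1.0]],  # state 1: action 0 may return, action 1 stays
]
R = [
    [1.0, 0.5],  # rewards in state 0
    [0.0, 0.0],  # rewards in state 1
]

for policy in ([0, 0], [1, 0], [0, 1]):
    P, r = induced_chain(T, R, policy)
    print(policy, round(mean_payoff(P, r, start=0, n=10_000), 4))
```

For a finite Markov chain these finite-horizon averages converge as n grows. Under policy [0, 0] the stationary distribution is (2/7, 5/7), so the mean payoff is about 0.2857. Policy [1, 0] stays in state 0 and earns 0.5 per step, and policy [0, 1] is eventually absorbed in state 1 with payoff 0. A learner that does not know `T` would have to estimate such values from sampled runs, which is the setting the abstract addresses.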

