Introduction to Artificial Intelligence: Key Points

Finite-time Analysis of the Multiarmed Bandit Problem

Abstract

Reinforcement learning policies face the exploration versus exploitation dilemma, i.e.