Exercise 5.2: MAB-EW

  • Due Apr 14, 2022 at 3:30pm
  • Points 20
  • Questions 3
  • Available Apr 14, 2022 at 12pm - Apr 14, 2022 at 5pm (5 hours)
  • Time Limit None

Instructions

Setup:

  • payoffs in [0,h]
  • apply the multi-armed-bandit (MAB) reduction to the exponential weights (EW) algorithm
  • recall theorem:

\[ \mathbf{E}\left[\text{MAB}\right] \geq (1-2\epsilon)\,\text{OPT} - \tfrac{h\,k}{\epsilon^2} \ln k \]

  • optimally tune the learning rate ε for n rounds
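
One possible way to carry out the tuning is sketched here. It assumes OPT ≤ h·n (each of the n rounds pays at most h) and treats ε as a free real parameter, ignoring the requirement that it be a valid probability, which roughly needs n to be large relative to k ln k. The symbol ε* and the name "regret" for OPT − E[MAB] are introduced only for this sketch.

```latex
\begin{align*}
\text{OPT} - \mathbf{E}[\text{MAB}]
  &\leq 2\epsilon\,\text{OPT} + \tfrac{h\,k}{\epsilon^2}\ln k
   \leq 2\epsilon\,h\,n + \tfrac{h\,k}{\epsilon^2}\ln k
   && \text{(using } \text{OPT} \leq h\,n\text{)} \\
0 &= 2\,h\,n - \tfrac{2\,h\,k}{\epsilon^3}\ln k
   \;\Longrightarrow\; \epsilon^{*} = \left(\tfrac{k \ln k}{n}\right)^{1/3}
   && \text{(set the derivative in } \epsilon \text{ to zero)} \\
\text{regret}(\epsilon^{*}) &\leq 3\,h\,n^{2/3}\,(k \ln k)^{1/3} \\
\tfrac{1}{n}\,\text{regret}(\epsilon^{*}) &\leq 3\,h\left(\tfrac{k \ln k}{n}\right)^{1/3}
\end{align*}
```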

Questions: Analyze the per-round regret. What is its dependence on the maximum payoff h, the number of rounds n, and the number of actions k?
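
A numerical check can help build intuition for these questions. The Python sketch below is only an illustration, not a solution method prescribed by the course: it assumes OPT ≤ h·n, rewrites the theorem as a bound on total regret, and grid-searches ε over (0, 1/2]. The function names `regret_bound`, `tune_eps`, and `loglog_slope` are hypothetical helpers invented for this example.

```python
import math

def regret_bound(eps, h, n, k):
    """Upper bound on total regret implied by the theorem,
    2*eps*OPT + (h*k/eps^2)*ln k, with the assumption OPT <= h*n."""
    return 2.0 * eps * h * n + h * k * math.log(k) / eps ** 2

def tune_eps(h, n, k, grid=20000):
    """Grid-search eps in (0, 1/2]; return (best_eps, per_round_regret_bound)."""
    best_eps, best_val = None, float("inf")
    for i in range(1, grid + 1):
        eps = 0.5 * i / grid
        val = regret_bound(eps, h, n, k)
        if val < best_val:
            best_eps, best_val = eps, val
    return best_eps, best_val / n

def loglog_slope(f, x0, x1):
    """Empirical exponent of f between x0 and x1 (slope on a log-log plot)."""
    return (math.log(f(x1)) - math.log(f(x0))) / (math.log(x1) - math.log(x0))

# Vary one parameter at a time and read off how the tuned bound scales.
h0, n0, k0 = 1.0, 10**6, 10
slope_n = loglog_slope(lambda n: tune_eps(h0, int(n), k0)[1], n0, 8 * n0)
slope_h = loglog_slope(lambda h: tune_eps(h, n0, k0)[1], h0, 4 * h0)
print("exponent in n:", round(slope_n, 3))
print("exponent in h:", round(slope_h, 3))
for k in (2, 10, 100):
    print("k =", k, "per-round bound:", tune_eps(h0, n0, k)[1])
```

The log-log slope gives a clean exponent only when the dependence is a pure power law, so the dependence on k is printed as raw values instead of being summarized by a single slope.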
