
Exercise 5.2: MAB-EW

Due: Apr 14, 2022 at 3:30pm
Points: 20
Questions: 3
Available: Apr 14, 2022, 12pm to 5pm (5 hours)
Time Limit: None

Instructions

Setup:

payoffs in [0,h]
apply the multi-armed-bandit (MAB) reduction to the exponential weights (EW) algorithm
recall theorem:

$\mathbf{E}\left[\text{MAB}\right] \geq (1-2\epsilon)\,\text{OPT} - \tfrac{h\,k}{\epsilon^2} \ln k$

optimally tune the learning rate ε for n rounds
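
For reference, the setup above can be sketched in Python. This is a minimal sketch under assumptions, not the course's reference implementation: the name `mab_ew`, the callback `payoff_fn(i, j)` (payoff of action j in round i), and the `rng` argument are hypothetical. It assumes the usual form of the reduction (explore uniformly with probability ε and feed EW the importance-weighted estimate k·v/ε; otherwise play EW's distribution and feed it zeros) and an EW rule with weights proportional to (1+ε)^(cumulative payoff / maximum payoff).

```python
import random


def mab_ew(payoff_fn, k, n, h, eps, rng=None):
    """Run the (assumed) MAB reduction to exponential weights for n rounds.

    payoff_fn(i, j) is a hypothetical callback returning the payoff in [0, h]
    of action j in round i. Returns the total payoff collected.
    """
    rng = rng or random.Random(0)
    h_est = h * k / eps      # estimated payoffs lie in [0, h*k/eps]
    cum_est = [0.0] * k      # cumulative estimated payoff per action
    total = 0.0
    for i in range(n):
        if rng.random() < eps:
            # Explore: uniform action, report the unbiased estimate k*v/eps.
            j = rng.randrange(k)
            v = payoff_fn(i, j)
            cum_est[j] += k * v / eps
        else:
            # Exploit: sample from EW on the estimates, report all zeros.
            # Subtracting the max only rescales the weights (numerical safety).
            m = max(cum_est)
            weights = [(1 + eps) ** ((c - m) / h_est) for c in cum_est]
            j = rng.choices(range(k), weights=weights)[0]
            v = payoff_fn(i, j)
        total += v
    return total
```

Calling, say, `mab_ew(lambda i, j: 1.0 if j == 2 else 0.0, k=3, n=5000, h=1.0, eps=0.1)` runs the sketch on a toy instance where one action always pays h; comparing the result with the theorem's bound for a few values of ε is one way to sanity-check a proposed tuning.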

Questions: Analyze the per-round regret. What is its dependence on the maximum payoff h? On the number of rounds n? On the number of actions k?
