Exercise 5.2: MAB-EW
- Due Apr 14, 2022 at 3:30pm
- Points 20
- Questions 3
- Available Apr 14, 2022 at 12pm to Apr 14, 2022 at 5pm (5 hours)
- Time Limit None
Instructions
Setup:
- payoffs in [0,h]
- apply the multi-armed-bandit (MAB) reduction to the exponential weights (EW) algorithm
- recall theorem:
- optimally tune the learning rate ε for n rounds
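For experimenting with the setup above, here is a minimal simulation sketch. It is not the course's reference implementation. It assumes one common form of the reduction: mix the exponential weights distribution with uniform exploration, play one action, and feed back an importance-weighted payoff estimate for that action only. The function name mab_exponential_weights, the callback payoff_fn, and the choice to reuse ε as the exploration probability by default are all assumptions made for illustration.

```python
import math
import random


def mab_exponential_weights(payoff_fn, k, n, h, eps, explore=None, seed=0):
    """Simulate n rounds of a bandit version of exponential weights.

    payoff_fn(t, j) is a hypothetical callback returning the payoff in [0, h]
    of action j in round t. Only the chosen action's payoff is observed.
    Returns (total realized payoff, list of chosen actions).
    """
    if explore is None:
        explore = eps  # assumption: exploration probability equals eps
    rng = random.Random(seed)
    # Each estimate is v / prob with prob >= explore / k, so it is <= h * k / explore.
    h_est = h * k / explore
    cum_est = [0.0] * k  # cumulative estimated payoff per action
    total = 0.0
    choices = []
    for t in range(n):
        # Exponential weights on estimated payoffs, normalized by their range.
        # Subtracting the max keeps every exponent <= 0 for numerical stability.
        m = max(cum_est)
        weights = [(1.0 + eps) ** ((c - m) / h_est) for c in cum_est]
        s = sum(weights)
        # Mix with the uniform distribution so every action has prob >= explore / k.
        probs = [(1.0 - explore) * w / s + explore / k for w in weights]
        j = rng.choices(range(k), weights=probs)[0]
        v = payoff_fn(t, j)
        total += v
        # Importance-weighted estimate: unbiased for the chosen action's payoff.
        cum_est[j] += v / probs[j]
        choices.append(j)
    return total, choices


if __name__ == "__main__":
    # Toy instance (assumed for illustration): action 0 always pays h, others pay 0.
    n, k, h = 2000, 3, 1.0
    total, choices = mab_exponential_weights(
        lambda t, j: h if j == 0 else 0.0, k=k, n=n, h=h, eps=0.1
    )
    per_round_regret = (n * h - total) / n
    print("per-round regret:", per_round_regret)
```

Varying h, n, and k in the toy instance gives an empirical check on whatever dependence the analysis predicts; the simulation does not replace the analysis.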
Questions: Analyze the per-round regret. What is its dependence on the maximum payoff h, the number of rounds n, and the number of actions k?