Data-Driven Upper Confidence Bounds with Near-Optimal Regret for Heavy-Tailed Bandits
Authors: Ambrus Tamás, Szabolcs Szentpéteri, Balázs Csanád Csáji
Summary: Stochastic multi-armed bandits (MABs) provide a fundamental reinforcement learning model for studying sequential decision making in uncertain environments. The upper confidence bounds (UCB) algorithm gave birth to the renaissance of bandit algorithms, as it achieves near-optimal regret rates under various moment assumptions. Until recently, most UCB methods relied on concentration inequalities leading to confidence bounds that depend on moment parameters, such as the variance proxy, which are usually unknown in practice. In this paper, we propose a new distribution-free, data-driven UCB algorithm for symmetric reward distributions, which needs no moment information. The key idea is to combine a refined, one-sided version of the recently developed resampled median-of-means (RMM) method with UCB. We prove a near-optimal regret bound for the proposed anytime, parameter-free RMM-UCB method, even for heavy-tailed distributions.
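
As a rough illustration of the general idea of pairing a robust mean estimator with a UCB-type selection rule, below is a minimal Python sketch of a plain median-of-means UCB index. It is not the authors' RMM-UCB construction (which builds one-sided, resampling-based confidence bounds for symmetric rewards and is anytime and parameter-free); the block count n_blocks, the logarithmic exploration bonus, and the heavy-tailed test arms are illustrative assumptions, not taken from the paper.

import math
import random


def median_of_means(samples, n_blocks):
    # Split the samples into n_blocks contiguous blocks (leftover samples
    # are dropped) and return the median of the block means.
    n_blocks = max(1, min(n_blocks, len(samples)))
    block_size = len(samples) // n_blocks
    means = sorted(
        sum(samples[i * block_size:(i + 1) * block_size]) / block_size
        for i in range(n_blocks)
    )
    mid = len(means) // 2
    return means[mid] if len(means) % 2 else (means[mid - 1] + means[mid]) / 2


def mom_ucb(reward_fns, horizon, n_blocks=5):
    # Pull each arm once, then repeatedly pull the arm with the largest
    # median-of-means estimate plus an exploration bonus.
    k = len(reward_fns)
    rewards = [[fn()] for fn in reward_fns]
    for t in range(k + 1, horizon + 1):
        def index(a):
            bonus = math.sqrt(2.0 * math.log(t) / len(rewards[a]))  # illustrative bonus
            return median_of_means(rewards[a], n_blocks) + bonus
        arm = max(range(k), key=index)
        rewards[arm].append(reward_fns[arm]())
    return [len(r) for r in rewards]  # number of pulls per arm


if __name__ == "__main__":
    random.seed(0)
    # Two symmetric, heavy-tailed arms with means 0.0 and 0.5: a standard
    # Gaussian scaled by 1/sqrt(Uniform(0,1]), which has infinite variance.
    arms = [
        lambda: 0.0 + random.gauss(0.0, 1.0) / math.sqrt(1.0 - random.random()),
        lambda: 0.5 + random.gauss(0.0, 1.0) / math.sqrt(1.0 - random.random()),
    ]
    print(mom_ucb(arms, horizon=2000))

Running the sketch prints the pull counts per arm; a working index should allocate most pulls to the arm with the larger mean despite the infinite-variance noise.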