跳转至

Understanding Memory-Regret Trade-Off for Streaming Stochastic Multi-Armed Bandits

张驰豪,4 月 14 日 11:30

We will talk about the stochastic multi-armed bandit problem in the \(P\)-pass streaming model. In this problem, the \(n\) arms are present in a stream and at most \(m<n\) arms and their statistics can be stored in the memory. We give a complete characterization of the optimal regret in terms of \(m, n\) and \(P\). Specifically, we design an algorithm with \(\tilde O\left((n-m)^{1+\frac{2^{P}-2}{2^{P+1}-1}} n^{\frac{2-2^{P+1}}{2^{P+1}-1}} T^{\frac{2^P}{2^{P+1}-1}}\right)\) regret and complement it with a \(\tilde \Omega\left((n-m)^{1+\frac{2^{P}-2}{2^{P+1}-1}} n^{\frac{2-2^{P+1}}{2^{P+1}-1}} T^{\frac{2^P}{2^{P+1}-1}}\right)\) lower bound when the number of rounds \(T\) is sufficiently large. Our results are tight up to a logarithmic factor in \(n\) and \(P\).

Back