Motivated by practical problems, I explore alternatives to the classical UCB and Thompson Sampling algorithms for Multi-Armed Bandits (MAB). I have mainly studied two families of algorithms. The first consists of sub-sampling approaches (SDA), which allow for “greedy” comparisons between arms with strong theoretical guarantees in both the MAB and non-stationary MAB settings. The second is a generalization of Thompson Sampling to bounded distributions, oriented towards practitioners by addressing risk-awareness and sensitivity to model misspecification.
I am also generally interested in reinforcement learning, statistics, and machine learning.
PhD Student, 2019-present
CNRS/Inria Lille, SequeL/ScooL team
MSc Mathématiques Vision Apprentissage (MVA), 2019
MSc in Statistics and Computer Science, 2019