If you’re interested in the full write-up, see Master Thesis.pdf.
- Title: Bayesian Bandits for Algorithm Selection: Latent-State Modeling and Spatial Reward Structures
- Supervisors: Prof. Christian Brownlees, Prof. David Rossell
- Institution: Universitat Pompeu Fabra & Barcelona School of Economics
- Objective: Develop and evaluate bandit algorithms that dynamically select forecasting models under non-stationary and spatially structured rewards.
- Team Members: Marvin Ernst, Oriol Gelabert, Melisa Vadenja
Real-world decision environments are frequently non-stationary and spatially correlated. We design bandit methods that adapt to hidden latent-state dynamics and spatial smoothness among arms.
- Setting: At each time $t$, pick one arm (algorithm) out of $K$ and observe its reward.
- Goal: Minimize cumulative regret $R_T$ relative to the best dynamic/spatial policy (see the sketch after this list).
- Extensions:
  - Dynamic Bandits (hidden regimes / HMMs)
  - Spatial Bandits (arms in a continuous space with smooth reward structure)
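To make the protocol concrete, here is a minimal Python sketch of the interaction loop and of cumulative regret $R_T$. The Gaussian arm means, the noise level, and the uniformly random policy are placeholder assumptions for illustration; the thesis compares against the best dynamic/spatial policy, while this toy uses a single static best arm.

```python
import numpy as np

rng = np.random.default_rng(42)

K, T = 10, 1000
true_means = rng.normal(0.0, 1.0, size=K)      # hypothetical stationary arm means
best_mean = true_means.max()                    # static oracle for this toy example

cum_regret = 0.0
for t in range(T):
    arm = rng.integers(K)                       # placeholder policy: uniform random pulls
    reward = rng.normal(true_means[arm], 1.0)   # observe a noisy reward for the chosen arm
    cum_regret += best_mean - true_means[arm]   # per-round regret vs. the best arm

print(f"R_T after {T} rounds of random play: {cum_regret:.1f}")
```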
- Dynamic Bandits
  - Two Bayesian latent-state models:
    - M1: Arm-specific HMMs
    - M2: Globally shared latent HMM
  - Baselines: AR bandits, classical UCB/TS
  - Result: M1-TS adapts best to regime switches and dominates across settings (the latent-state filtering idea is sketched after this list).
- Spatial Bandits
  - Benchmarked GP-UCB / GP-TS vs. Zoom-In (tree-based region refinement for Lipschitz bandits) and classical UCB.
  - Showed kernel/length-scale sensitivity for GPs; quantified the robustness of Zoom-In.
- Hybrid Strategies (Future)
  - Combine GP exploration with Zoom-In-style refinement to balance accuracy and scalability.
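The thesis fits M1/M2 with full Bayesian inference in JAGS; as a much smaller illustration of the arm-specific latent-regime idea behind M1-TS, the sketch below runs a discrete Bayes filter per arm and Thompson-samples a regime from each filtered belief. The two-regime transition matrix, regime means, and noise level are assumed known here and are purely illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical two-regime environment; all numbers are illustrative.
K = 3
T_mat = np.array([[0.95, 0.05],      # P(next regime | current regime)
                  [0.05, 0.95]])
mu = np.array([[0.2, 0.8],           # mean reward of arm k in regime 0 / regime 1
               [0.5, 0.5],
               [0.9, 0.1]])
sigma = 0.1                          # reward noise (assumed known here)

def gaussian_lik(r, means, sd):
    return np.exp(-0.5 * ((r - means) / sd) ** 2)

states = rng.integers(0, 2, size=K)  # true hidden regime of each arm
beliefs = np.full((K, 2), 0.5)       # filtered P(regime | history), one row per arm

horizon, rewards = 2000, []
for t in range(horizon):
    beliefs = beliefs @ T_mat                         # prediction step for every arm
    sampled = np.array([rng.choice(2, p=b) for b in beliefs])
    arm = int(np.argmax(mu[np.arange(K), sampled]))   # Thompson step: act on sampled regimes

    states = np.array([rng.choice(2, p=T_mat[s]) for s in states])  # regimes evolve
    r = rng.normal(mu[arm, states[arm]], sigma)                     # noisy reward of pulled arm
    rewards.append(r)

    post = beliefs[arm] * gaussian_lik(r, mu[arm], sigma)           # likelihood only for pulled arm
    beliefs[arm] = post / post.sum()

print(f"average reward over {horizon} rounds: {np.mean(rewards):.3f}")
```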
Evaluation metrics (a sketch of how they can be computed from a logged run follows below):
- Cumulative Regret
- Instantaneous Regret
- Euclidean Distance to Best Arm (spatial)
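A small helper along these lines can recover all three metrics from a logged run; the function name and the assumption that the per-round oracle means and the best arm's location are known (true in simulation) are mine, not code from the thesis.

```python
import numpy as np

def evaluate_run(chosen_means, oracle_means, chosen_locs=None, best_loc=None):
    """Per-round metrics given the true mean of each pulled arm and the per-round
    oracle mean; arm locations are only needed for the spatial distance metric."""
    chosen_means = np.asarray(chosen_means, dtype=float)
    oracle_means = np.asarray(oracle_means, dtype=float)
    inst_regret = oracle_means - chosen_means            # instantaneous regret
    metrics = {"instantaneous": inst_regret,
               "cumulative": np.cumsum(inst_regret)}     # cumulative regret R_T
    if chosen_locs is not None and best_loc is not None:
        diffs = np.asarray(chosen_locs, dtype=float) - np.asarray(best_loc, dtype=float)
        metrics["dist_to_best_arm"] = np.linalg.norm(diffs, axis=-1)  # Euclidean distance
    return metrics

# Toy usage: three rounds in a 1-D arm space with a static oracle mean of 1.0.
m = evaluate_run([0.4, 0.7, 0.9], [1.0, 1.0, 1.0],
                 chosen_locs=[[0.1], [0.3], [0.45]], best_loc=[0.5])
print(m["cumulative"][-1], m["dist_to_best_arm"])
```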
Methods implemented:
- Bayesian inference (JAGS) for HMM-style models (M1/M2)
- Autoregressive bandits
- GP-UCB, GP-TS (a GP-UCB round is sketched below)
- Zoom-In (UCB0/UCB1 and TS scoring)
- Classical UCB and Thompson Sampling
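As one concrete instance from the list above, a GP-UCB round can be sketched with scikit-learn. The RBF length scale, the exploration weight `beta`, the noise level, and the candidate grid are illustrative choices (the kernel is deliberately kept fixed so that length-scale sensitivity is visible), not the thesis configuration.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

rng = np.random.default_rng(7)

# Candidate arms on a 1-D grid; the true mean is a smooth (hence Lipschitz) function.
arms = np.linspace(0.0, 1.0, 200).reshape(-1, 1)

def true_mean(x):
    return np.sin(6 * x).ravel()

def gp_ucb_step(X_hist, y_hist, beta=2.0, length_scale=0.05):
    """Fit a GP to the rewards seen so far and pick the arm maximizing mean + beta * std."""
    gp = GaussianProcessRegressor(kernel=RBF(length_scale=length_scale),
                                  alpha=0.1 ** 2,       # observation noise variance
                                  optimizer=None,       # keep the length scale fixed
                                  normalize_y=True)
    gp.fit(X_hist, y_hist)
    mean, std = gp.predict(arms, return_std=True)
    return int(np.argmax(mean + beta * std))

# Warm start with a few random pulls, then run GP-UCB for a handful of rounds.
idx = rng.integers(0, len(arms), size=5)
X, y = arms[idx], true_mean(arms[idx]) + rng.normal(0, 0.1, size=len(idx))
for t in range(20):
    a = gp_ucb_step(X, y)
    r = true_mean(arms[[a]])[0] + rng.normal(0, 0.1)
    X, y = np.vstack([X, arms[[a]]]), np.append(y, r)

print("last arm pulled:", float(arms[a, 0]),
      "true best arm:", float(arms[np.argmax(true_mean(arms)), 0]))
```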
All figures below are pulled directly from the imgs/ folder used in the presentation.
- Global latent-state (M2)
- Arm-specific latent-state (M1)
- Arm space and setup
- Zoom-In vs. Standard (UCB / TS)
- Zoom-In vs. GP methods (K = 1000)
- Arm space (ℓ = 0.05)
- Zoom-In vs. GP-UCB (ℓ = 0.05)
- Varying length scales (ℓ = 1.0 vs. 0.05)
- Desautels et al. (2014) - Gaussian Process Bandits with Exploration-Exploitation
- Chowdhury & Gopalan (2017) - Bandit Optimization with Gaussian Processes
- Salgia et al. (2021) - Lipschitz Bandits without the Lipschitz Constant
- Kandasamy et al. (2018) - Parallelised Bayesian Optimisation via TS
- Kleinberg et al. (2008) - Regret Bounds for Restless/Lipschitz Bandits
- Lattimore & Szepesvári (2020) - Bandit Algorithms
- Hamilton (1994) - Time Series Analysis
We provide empirical evidence for structured bandits in non-stationary and spatial settings, highlighting:
- Bayesian models: high fidelity, high compute
- GP methods: strong but scale poorly with $K, T$
- Zoom-In: competitive and robust in Lipschitz settings
See the full thesis for details: Master Thesis.pdf