
Bandits for Algorithm Selection

If you’re interested in the full write-up, see Master Thesis.pdf.

Project Overview

  • Title: Bayesian Bandits for Algorithm Selection: Latent-State Modeling and Spatial Reward Structures
  • Supervisors: Prof. Christian Brownlees, Prof. David Rossell
  • Institution: Universitat Pompeu Fabra & Barcelona School of Economics
  • Objective: Develop and evaluate bandit algorithms that dynamically select forecasting models under non-stationary and spatially structured rewards.
  • Team Members: Marvin Ernst, Oriol Gelabert, Melisa Vadenja

Motivation

Real-world decision environments are frequently non-stationary and spatially correlated. We design bandit methods that adapt to hidden latent-state dynamics and exploit spatial smoothness among arms.


Problem Formulation

  • Setting: At each time $t$, pick one arm (algorithm) out of $K$ and observe a reward.
  • Goal: Minimize cumulative regret $R_T$ relative to the best dynamic/spatial policy (a standard formulation is written out after this list).
  • Extensions:
    • Dynamic Bandits (hidden regimes / HMMs)
    • Spatial Bandits (arms in a continuous space with smooth reward structure)
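
For the non-stationary setting, a standard way to write this regret (the thesis may use a slightly different comparator) is

$$
R_T = \sum_{t=1}^{T} \left( \mu_t^{*} - \mu_{a_t, t} \right), \qquad \mu_t^{*} = \max_{k \in \{1,\dots,K\}} \mu_{k,t},
$$

where $\mu_{k,t}$ is the expected reward of arm $k$ at time $t$ (time-varying under regime switches), $a_t$ is the arm pulled at round $t$, and the summand $\mu_t^{*} - \mu_{a_t, t}$ is the instantaneous regret.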

Contributions

  1. Dynamic Bandits

    • Two Bayesian latent-state models:
      • M1: Arm-specific HMMs
      • M2: Globally shared latent HMM
    • Baselines: AR bandits, classical UCB/TS
    • Result: M1-TS adapts best to regime switches and dominates across settings (a simplified sketch of the M1-TS idea follows this list).
  2. Spatial Bandits

    • Benchmark GP-UCB / GP-TS against Zoom-In (tree-based region refinement for Lipschitz bandits) and classical UCB.
    • Show kernel/length-scale sensitivity for the GP methods; quantify the robustness of Zoom-In.
  3. Hybrid Strategies (Future)

    • Combine GP exploration with Zoom-In style refinement to balance accuracy and scalability.
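
As a rough illustration of the M1 idea from item 1 above, the sketch below runs Thompson Sampling on top of a per-arm two-state Gaussian HMM, tracking each arm's hidden regime with exact forward filtering. It is a deliberately simplified stand-in for the thesis's JAGS-based Bayesian inference: the HMM parameters (transition matrix, state-dependent means, noise level) are treated as known rather than inferred, and all variable names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

K, T = 5, 2000                               # arms, horizon
P = np.array([[0.98, 0.02],                  # assumed-known 2-state transition matrix
              [0.02, 0.98]])
mu = rng.normal(0.0, 1.0, size=(K, 2))       # state-dependent mean reward per arm
sigma = 0.5                                  # observation noise (known)

true_state = rng.integers(0, 2, size=K)      # hidden regime per arm (simulator only)
belief = np.full((K, 2), 0.5)                # filtered P(state | history), per arm

def gauss_lik(x, m, s):
    return np.exp(-0.5 * ((x - m) / s) ** 2) / (s * np.sqrt(2.0 * np.pi))

for t in range(T):
    # Thompson step: sample a regime for each arm from its filtered belief,
    # score the arm by that regime's mean, and pull the highest-scoring arm.
    sampled = np.array([rng.choice(2, p=belief[k]) for k in range(K)])
    a = int(np.argmax(mu[np.arange(K), sampled]))

    # Environment: hidden regimes evolve, the pulled arm emits a noisy reward.
    for k in range(K):
        true_state[k] = rng.choice(2, p=P[true_state[k]])
    r = rng.normal(mu[a, true_state[a]], sigma)

    # Forward filtering: propagate every belief through the transition matrix,
    # then condition the pulled arm's belief on the observed reward.
    belief = belief @ P
    belief[a] *= gauss_lik(r, mu[a], sigma)
    belief[a] /= belief[a].sum()
```

In the thesis, the model parameters are themselves inferred via JAGS rather than assumed known; the sketch above only illustrates the latent-regime filtering part of the problem.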

Evaluation Metrics

  • Cumulative Regret
  • Instantaneous Regret
  • Euclidean Distance to Best Arm (spatial)

Key Algorithms Implemented

  • Bayesian inference (JAGS) for HMM-style models (M1/M2)
  • Autoregressive bandits
  • GP-UCB, GP-TS (a minimal GP-UCB sketch follows this list)
  • Zoom-In (UCB0/UCB1 and TS scoring)
  • Classical UCB and Thompson Sampling
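
For context on the GP methods in the list above, here is a minimal GP-UCB loop over a fixed finite arm set on $[0, 1]$, assuming an RBF kernel with known length scale and noise level. It illustrates the general acquisition rule only; the kernel, $\beta$, and the toy reward surface are assumptions made for this example, not the thesis configuration.

```python
import numpy as np

rng = np.random.default_rng(1)

def rbf(A, B, length_scale=0.05):
    """Squared-exponential kernel between two sets of 1-D arm locations."""
    d = A[:, None] - B[None, :]
    return np.exp(-0.5 * (d / length_scale) ** 2)

K, T, noise, beta = 200, 300, 0.1, 2.0
X = np.linspace(0.0, 1.0, K)                      # arm locations
f = np.sin(6 * np.pi * X) * np.exp(-2.0 * X)      # hidden mean-reward surface (toy)

X_obs, y_obs = [], []
for t in range(T):
    if not X_obs:
        a = int(rng.integers(K))                  # first pull at random
    else:
        Xo, yo = np.array(X_obs), np.array(y_obs)
        Koo = rbf(Xo, Xo) + noise**2 * np.eye(len(Xo))
        Kxo = rbf(X, Xo)
        mean = Kxo @ np.linalg.solve(Koo, yo)     # GP posterior mean at every arm
        var = 1.0 - np.einsum("ij,ji->i", Kxo, np.linalg.solve(Koo, Kxo.T))
        ucb = mean + beta * np.sqrt(np.clip(var, 0.0, None))
        a = int(np.argmax(ucb))                   # GP-UCB: optimism under uncertainty
    X_obs.append(X[a])
    y_obs.append(f[a] + noise * rng.standard_normal())

print("last arm pulled:", X[a], "| true optimum:", X[int(np.argmax(f))])
```

GP-TS differs only in the acquisition step: each arm is scored by a draw from the joint GP posterior instead of the UCB score. The per-round cost grows with both the number of arms $K$ (kernel evaluations) and the number of observations $t$ (an $O(t^3)$ solve), which is why GP methods scale poorly with $K$ and $T$, as noted under Final Remarks.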

Figures from the Presentation

All figures below are pulled directly from the imgs/ folder used in the presentation.

Plate Diagrams (Dynamic Models)

Arm-dependent latent-state plate diagram (M1)    Global latent-state plate diagram (M2)


Dynamic Bandits - Results (Global vs. Local HMMs)

Global latent-state (M2)

TS variants - global latent-state cumulative regret

UCB variants - global latent-state cumulative regret

Arm-specific latent-state (M1)

TS variants - local latent-state cumulative regret

UCB variants - local latent-state cumulative regret


Baseline (Static) Comparison

Baseline static setting: cumulative reward over time


Spatial Bandits - Lipschitz Setting

Arm space and setup

Lipschitz bandits - arm space

Zoom-In vs. Standard (UCB / TS)

Lipschitz - Zoom-In vs standard cumulative regret

Zoom-In vs. GP methods (K = 1000)

Lipschitz - Zoom-In vs GP cumulative regret

Lipschitz - Zoom-In vs GP instantaneous regret


Spatial Bandits - Gaussian Process Setting

Arm space (ℓ = 0.05)

GP bandits - arm space (ℓ = 0.05)

Zoom-In vs. GP-UCB (ℓ = 0.05)

GP bandits - cumulative regret

GP bandits - instantaneous regret

GP bandits - distance to best arm

Varying length scales (ℓ = 1.0 vs. 0.05)

Arm space: ℓ = 1.0 vs. ℓ = 0.05

Cumulative regret: ℓ = 1.0 vs. ℓ = 0.05

Distance to best arm: ℓ = 1.0 vs. ℓ = 0.05


GP Bandits - Misspecification Studies

GP misspecification - cumulative regret

GP misspecification - instantaneous regret

GP misspecification - distance to best arm


GP-TS vs. GP-UCB+

GP-TS vs GP-UCB+ cumulative regret

GP-TS vs GP-UCB+ instantaneous regret

GP-TS vs GP-UCB+ distance to optimum


Literature

  • Desautels et al. (2014) - Gaussian Process Bandits with Exploration-Exploitation
  • Chowdhury & Gopalan (2017) - Bandit Optimization with Gaussian Processes
  • Salgia et al. (2021) - Lipschitz Bandits without the Lipschitz Constant
  • Kandasamy et al. (2018) - Parallelised Bayesian Optimisation via TS
  • Kleinberg et al. (2008) - Regret Bounds for Restless/Lipschitz Bandits
  • Lattimore & Szepesvári (2020) - Bandit Algorithms
  • Hamilton (1994) - Time Series Analysis

Final Remarks

We provide empirical evidence for structured bandits in non-stationary and spatial settings, highlighting:

  • Bayesian latent-state models (M1/M2): high fidelity, but computationally heavy
  • GP methods: strong performance, but scale poorly with $K$ and $T$
  • Zoom-In: competitive and robust in Lipschitz settings

See the full thesis for details: Master Thesis.pdf

About

Bayesian and spatial bandit algorithms for adaptive model selection under non-stationary reward dynamics, featuring latent-state inference and Gaussian Process exploration.
