Continuous contextual bandit experiments for the SLOPE estimator
The estimator is developed in the paper "Adaptive Estimator Selection for Off-Policy Evaluation". This repository was used for the experiments presented in Section 4 of that paper.
The repository contains a simulator for continuous contextual bandits (CB), several estimators for off-policy evaluation in the continuous CB setting, and scripts for running the experiments.
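To make the setting concrete, here is a minimal sketch of off-policy evaluation with continuous actions. It is not the code in this repository and it is not the SLOPE estimator itself. It assumes a toy simulator and a kernel-smoothed inverse propensity scoring estimate with a boxcar kernel, which is one common family of estimators in this setting. All names (`simulate_logged_data`, `smoothed_ips`, the bandwidth `h`) and the reward model are hypothetical and chosen only for illustration.

```python
import numpy as np


def simulate_logged_data(n, rng):
    """Toy simulator (an assumption, not the repository's simulator)."""
    # Contexts stay away from the edges of the action range [0, 1]
    # so the kernel window below is never cut off at a boundary.
    x = rng.uniform(0.25, 0.75, size=n)
    # Logging policy: actions drawn uniformly on [0, 1], so density = 1.
    a = rng.uniform(0.0, 1.0, size=n)
    p = np.ones(n)
    # Reward is highest when the action matches the context.
    r = 1.0 - np.abs(a - x) + rng.normal(0.0, 0.1, size=n)
    return x, a, p, r


def smoothed_ips(x, a, p, r, target_policy, h):
    """Kernel-smoothed IPS estimate of a deterministic target policy."""
    u = target_policy(x) - a
    # Boxcar kernel of bandwidth h: weight 1/(2h) inside the window.
    k = (np.abs(u) <= h) / (2.0 * h)
    return float(np.mean(k * r / p))


rng = np.random.default_rng(0)
x, a, p, r = simulate_logged_data(200_000, rng)

# Target policy: play the action equal to the context (true value is 1.0).
for h in (0.02, 0.1, 0.25):
    print(h, smoothed_ips(x, a, p, r, lambda ctx: ctx, h))
```

In this toy model a smaller bandwidth reduces bias but uses fewer logged samples, which raises variance. A larger bandwidth does the opposite. Choosing among such bandwidths from data is the kind of estimator selection problem the paper studies.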
If you are building on this repository, please cite: Yi Su, Pavithra Srinath, Akshay Krishnamurthy. Adaptive Estimator Selection for Off-Policy Evaluation. arXiv, 2020.