Patricio Cerda, Gaël Varoquaux, Balázs Kégl. 2018.
Accepted for publication in the Machine Learning journal (Springer).
https://hal.inria.fr/hal-01806175
To be presented at ECML PKDD 2018: http://www.ecmlpkdd2018.org
Requirements:
- numpy
- scipy
- pandas
- scikit-learn
- dirty_cat (implementation of the similarity encoder: https://dirty-cat.github.io)
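To give a rough idea of what the similarity encoder does, here is a minimal NumPy-only sketch. It is not dirty_cat's API. It assumes that each string value is encoded as its vector of similarities to a set of reference categories, using a character 3-gram similarity equal to the number of shared n-grams divided by the number of distinct n-grams in both strings. The helper names (`ngrams`, `ngram_similarity`, `similarity_encode`), the space padding, and the example categories are hypothetical, chosen only for illustration.

```python
import numpy as np


def ngrams(s, n=3):
    """Set of character n-grams of s (padding with spaces is an assumption)."""
    s = f" {s} "
    return {s[i:i + n] for i in range(len(s) - n + 1)}


def ngram_similarity(a, b, n=3):
    """Shared n-grams divided by all distinct n-grams of the two strings."""
    ga, gb = ngrams(a, n), ngrams(b, n)
    union = ga | gb
    if not union:  # both strings too short to contain any n-gram
        return 1.0 if a == b else 0.0
    return len(ga & gb) / len(union)


def similarity_encode(values, categories, n=3):
    """One row per value, one column per reference category (hypothetical helper)."""
    return np.array([[ngram_similarity(v, c, n) for c in categories]
                     for v in values])


# Hypothetical reference categories, e.g. the categories seen during training.
categories = ["police officer", "firefighter", "accountant"]
X = similarity_encode(["police officer", "Police Officer II", "fire fighter"],
                      categories)
print(X.round(2))
```

Unlike one-hot encoding, a misspelled or variant entry such as "fire fighter" gets a large value in the "firefighter" column instead of an all-zero row.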
Citation:
Patricio Cerda, Gaël Varoquaux, Balázs Kégl (2018). Similarity encoding for learning with dirty categorical variables. Manuscript accepted for publication in the Machine Learning journal (Springer).