Awesome Decision Science

A curated list of resources on everything related to decision-making: videos, tutorials, books, papers, theses, articles, datasets, and open-source libraries. Click the hamburger menu 🍔 to navigate more easily.

👍 Like it? A like or a share would greatly help the project! Let's share knowledge!

⚠️ Disclaimer: almost all of these resources are available freely and legally. I earn nothing from sales of the few paid resources, which are listed simply because I consider them valuable.

About

I'm Miguel 👋 I help B2B leaders (consulting, bancassurance, SMEs) by turning decision-making into an exact science! Companies such as Accuracy, Crédit Agricole, and Lizeo already make 95% safer choices thanks to my financial forecasting methods. ☎️ How about we discuss how to make the most of your data assets?

Contents

🤖 Artificial Intelligence, Computational Intelligence, and Machine Learning

Books

Computational Intelligence

Engelbrecht, Andries P. Computational intelligence: an introduction. John Wiley & Sons, 2007. [Link]

Deep Learning

Bishop, Christopher M., and Hugh Bishop. "Deep learning: foundations and concepts." Springer, 2024. [Link]
Deisenroth, Marc Peter, A. Aldo Faisal, and Cheng Soon Ong. Mathematics for machine learning. Cambridge University Press, 2020. [Link]
Goodfellow, Ian, Yoshua Bengio, and Aaron Courville. Deep learning. MIT Press, 2016. [Link]
Grohs, Philipp, and Gitta Kutyniok, eds. Mathematical aspects of deep learning. Cambridge University Press, 2022. [Link]
Prince, Simon JD. Understanding Deep Learning. MIT press, 2023. [Link]
Zhang, Aston, et al. Dive into deep learning. Cambridge University Press, 2023. [Link]

Explainable AI

Biecek, Przemyslaw, and Tomasz Burzykowski. Explanatory model analysis: explore, explain, and examine predictive models. CRC Press, 2021. [Link]
Hall, Patrick, James Curtis, and Parul Pandey. Machine Learning for High-Risk Applications. O'Reilly Media, 2023. [Link]
Molnar, Christoph. Interpretable machine learning. Lulu.com, 2020. [Link]

Machine Learning

Bishop, Christopher M. Pattern recognition and machine learning. New York: Springer, 2006. [Link]
Deisenroth, Marc Peter, A. Aldo Faisal, and Cheng Soon Ong. Mathematics for machine learning. Cambridge University Press, 2020. [Link]
Efron, Bradley, and Trevor Hastie. Computer age statistical inference, student edition: algorithms, evidence, and data science. Vol. 6. Cambridge University Press, 2021. [Link]
Hastie, Trevor, Robert Tibshirani, and Martin Wainwright. Statistical learning with sparsity: the lasso and generalizations. CRC press, 2015. [Link]
Huber, Martin. Causal analysis: Impact evaluation and Causal Machine Learning with applications in R. MIT Press, 2023. [Link]
James, G., Witten, D., Hastie, T., Tibshirani, R., Taylor, J. An Introduction to Statistical Learning: With Applications in Python; Springer: Berlin/Heidelberg, Germany, 2023. [Link]
Katsov, Ilya. Introduction to algorithmic marketing: Artificial intelligence for marketing operations. Grid Dynamics, 2017. [Link]
MacKay, David JC. Information theory, inference and learning algorithms. Cambridge university press, 2003. [Link]
Murphy, Kevin P. Probabilistic machine learning: Advanced topics. MIT Press, 2023. [Link]
Murphy, Kevin P. Probabilistic machine learning: an introduction. MIT Press, 2022. [Link]
Siddiqi, Naeem. Intelligent credit scoring: Building and implementing better credit risk scorecards. John Wiley & Sons, 2017. [Link]

Courses and lecture notes, posts

Deep Learning

Lippe, Phillip. UvA Deep Learning Tutorials. 2022. [Link]
Ollion, Charles, and Olivier Grisel. Deep Learning course: lecture slides and lab notebooks. Institut Polytechnique de Paris, 2017. [Link]

Explainable AI

Galli, Soledad. Interpreting Machine Learning Models [Link]
Lakkaraju, Hima, et al. Explainable Artificial Intelligence: From Simple Predictors to Complex Generative Models. Harvard University, 2023. [Link]

Machine Learning

Christensen, Henrik I. Support Vector Machines - SVM & RVM. Georgia Institute of Technology. [Link]
Inria. Machine learning in Python with scikit-learn. FUN, 2023. [Link]
MLU-Explain Team. MLU-Explain. Amazon (2021). [Link]

Reinforcement Learning and Control Theory

Dimitri Bertsekas. Reinforcement Learning and Optimal Control. [Link]
Elad Hazan, Karan Singh. Introduction to Online Nonstochastic Control. [Link]

Datasets

Andreas Luttens, et al. Large-scale Docking Datasets for Machine Learning. 2, Zenodo, 8 May 2023. [Link]
Randal S. Olson, William La Cava, Patryk Orzechowski, Ryan J. Urbanowicz, and Jason H. Moore (2017). PMLB: a large benchmark suite for machine learning evaluation and comparison. BioData Mining 10, page 36. [Paper] [Code]

Packages

Data loading

mlx-data. Efficient framework-agnostic data loading. Apple, 2023. [Link]

Explainable AI

Alibi explain. Open-source interpretability library supporting black box, white box, global and local interpretability methods. [Link]
Dalex. Responsible Machine Learning in Python. [Link]
Scikit-explain. User-friendly Python module for machine learning explainability with a comprehensive toolset of interpretability methods. [Link]
Shapash. Shapash: User-friendly Explainability and Interpretability to Develop Reliable and Transparent Machine Learning Models. MAIF, 2021. [Link]
Sudjianto, Agus, et al. "PiML Toolbox for Interpretable Machine Learning Model Development and Validation." arXiv preprint arXiv:2305.04214

Feature Engineering

Feature_engine. Feature engineering package with sklearn like functionality. [Link]

Hyperparameter optimization

Optuna. A hyperparameter optimization framework. [Link]

Machine Learning techniques

Catboost. A fast, scalable, high-performance Gradient Boosting on Decision Trees library used for ranking, classification, regression, and other machine learning tasks for Python, R, Java, and C++. Supports computation on CPU and GPU. [Link]
Khuat, Thanh Tung, and Bogdan Gabrys. "hyperbox-brain: A Toolbox for Hyperbox-based Machine Learning Algorithms." arXiv preprint arXiv:2210.02704 (2022). [Link]
quantile-forest. Quantile Regression Forests compatible with scikit-learn. [Link]

Papers

Deep Learning

Bayesian approaches

Arbel, Julyan, et al. A Primer on Bayesian Neural Networks: Review and Debates. arXiv preprint arXiv:2309.16314 (2023). [Link]
Hellström, Fredrik, et al. Generalization bounds: perspectives from information theory and PAC-Bayes. arXiv preprint arXiv:2309.04381 (2023). [Link]
Kingma, Diederik P., and Max Welling. "Auto-encoding variational bayes." arXiv preprint arXiv:1312.6114 (2013). [Link]
Nalisnick, Eric, and Padhraic Smyth. "Stick-breaking variational autoencoders." arXiv preprint arXiv:1605.06197 (2016). [Link]

Generative aspects

Coste, Simon. Diffusion. University of Paris, 2023. [Link]
Galerne, Bruno, and Valentin De Bortoli. Generative Modelling. ENS Paris-Saclay, 2023. [Link]

Mathematical aspects: approximation and generalization

Bartlett, Peter L., Andrea Montanari, and Alexander Rakhlin. Deep learning: a statistical viewpoint. Acta numerica 30 (2021): 87-201. [Link]
Berner, Julius, et al. The modern mathematics of deep learning. arXiv preprint arXiv:2105.04026 (2021): 86-114. [Link]
DeVore, Ronald, Boris Hanin, and Guergana Petrova. Neural network approximation. Acta Numerica 30 (2021): 327-444. [Link]
Jacot, Arthur, Franck Gabriel, and Clément Hongler. "Neural tangent kernel: Convergence and generalization in neural networks." Advances in neural information processing systems 31 (2018). [Link]
Hornik, Kurt. "Approximation capabilities of multilayer feedforward networks." Neural networks 4.2 (1991): 251-257. [Link]
Hornik, Kurt, Maxwell Stinchcombe, and Halbert White. "Multilayer feedforward networks are universal approximators." Neural networks 2.5 (1989): 359-366. [Link]
Petersen, Philipp Christian. Neural network theory. University of Vienna 535 (2020). [Link]

Mathematical aspects: optimization

Khaled, Ahmed, and Peter Richtárik. "Better theory for SGD in the nonconvex world." arXiv preprint arXiv:2002.03329 (2020). [Link]
Sun, Ruoyu. Optimization for deep learning: theory and algorithms. arXiv preprint arXiv:1912.08957 (2019). [Link]

Machine Learning

Conformal Prediction

Angelopoulos, Anastasios N., and Stephen Bates. "A gentle introduction to conformal prediction and distribution-free uncertainty quantification." arXiv preprint arXiv:2107.07511 (2021). [Link]
Fontana, Matteo, Gianluca Zeni, and Simone Vantini. "Conformal prediction: a unified review of theory and new challenges." arXiv preprint arXiv:2005.07972 (2020). [Link]

Explainable AI

Bilodeau, Blair, et al. "Impossibility theorems for feature attribution." Proceedings of the National Academy of Sciences 121.2 (2024): e2304406120. [Link]
Ibrahim Amoukou, Salim. Trustworthy machine learning: explainability and distribution-free uncertainty quantification. Diss. université Paris-Saclay, 2023. [Link]
Huang, Xuanxiang, and Joao Marques-Silva. "The inadequacy of Shapley values for explainability." arXiv preprint arXiv:2302.08160 (2023). [Link]

Fuzzy sets

Khuat, Thanh Tung, Dymitr Ruta, and Bogdan Gabrys. "Hyperbox-based machine learning algorithms: a comprehensive survey." Soft Computing 25.2 (2021): 1325-1363. [Link]

Imbalanced data problems

Elor, Yotam, and Hadar Averbuch-Elor. "To SMOTE, or not to SMOTE?." arXiv preprint arXiv:2201.08528 (2022). [Link]
van den Goorbergh, Ruben, et al. "The harm of class imbalance corrections for risk prediction models: illustration and simulation using logistic regression." Journal of the American Medical Informatics Association 29.9 (2022): 1525-1534. [Link]

Training ML models

Mirzasoleiman, Baharan, Jeff Bilmes, and Jure Leskovec. "Coresets for data-efficient training of machine learning models." International Conference on Machine Learning. PMLR, 2020. [Link]

Posts and threads

Explainable AI (XAI)

Of Models and Meanings. SHAP is the Blockchain of xAI. Of Models and Meanings, 2022. [Link]
Of Models and Meanings. What You Could Do with the Shapley Computation. Of Models and Meanings, 2022. [Link]

Imbalanced data problems

Mougan, Carl. Why SMOTE is not used in prize-winning Kaggle solutions?. Data Science, 2021. [Link]

Talks, conferences, and videos

Dieng, Adji B. Learning From Data: The Two Cultures. Association for Computing Machinery, 2021. [Link]
Rich, DJ. Mutual Information. True Theta LLC, 2020. [Link]

📊 Business Intelligence, Data Visualization, Communicating and Reporting

Books

Duarte, Nancy. Resonate: Present visual stories that transform audiences. John Wiley & Sons, 2013. [Link]
Duarte, Nancy. Slide:ology: The art and science of creating great presentations. Vol. 1. Sebastopol: O'Reilly Media, 2008. [Link]
Knaflic, Cole Nussbaumer. Storytelling with data: A data visualization guide for business professionals. John Wiley & Sons, 2015. [Link]
Knaflic, Cole Nussbaumer. Storytelling with data: let's practice!. John Wiley & Sons, 2019. [Link]
Wexler, Steve, Jeffrey Shaffer, and Andy Cotgreave. The big book of dashboards: visualizing your data using real-world business scenarios. John Wiley & Sons, 2017. [Link]
Wilke, Claus O. Fundamentals of data visualization: a primer on making informative and compelling figures. O'Reilly Media, 2019. [Link]

Courses and lecture notes, posts

Datasets

Packages

Data structures

Python

Polars. Dataframes powered by a multithreaded, vectorized query engine, written in Rust. [Link]

Data Visualization and Reporting

Julia

Genie. 🧞The highly productive Julia web framework. [Link]

Python

Marimo. marimo is an open-source reactive notebook for Python — reproducible, git-friendly, executable as a script, and shareable as an app. [Link]
PyGWalker. Turn your pandas dataframe into an interactive UI for visual analysis. [Link]
Streamlit. A faster way to build and share data apps. [Link]
Vizro. Vizro is a toolkit for creating modular data visualization applications. [Link]

Papers

Posts and threads

Talks, conferences, and videos

💻 Computer Science and Software Engineering

Books

Algorithmics, data structures, and programming languages

Downey, Allen. Think complexity: complexity science and computational modeling. O'Reilly Media, 2018. [Link]
Downey, Allen. Think data structures: algorithms and information retrieval in Java. O'Reilly Media, 2017. [Link]
Downey, Allen. Think Python. O'Reilly Media, 2012. [Link]
Johnston, Nathaniel, and Dave Greene. Conway's Game of Life: Mathematics and Construction. Self-published, 2022. [Link]
Miller, Brad, and David Ranum. Problem-solving with algorithms and data structures. University of Auckland, 2013. [Link] [Website]
Nipkow, Tobias. "Functional Data Structures and Algorithms A Proof Assistant Approach." (2023). [Link]

Scientific programming

Blondel, Mathieu, and Vincent Roulet. "The Elements of Differentiable Programming." arXiv preprint arXiv:2403.14606 (2024). [Link]

Software development

Chacon, Scott, and Ben Straub. Pro git. Springer Nature, 2014. [Link]

Databases

Petrov, Alex. Database Internals: A deep dive into how distributed data systems work. O'Reilly Media, 2019. [Link]

Courses and lecture notes, posts

Algorithms

Roughgarden, Tim. Lecture Notes. Columbia University. [Link]

Scientific programming

Raschka, Sebastian. Scientific Computing in Python: Introduction to NumPy and Matplotlib. sebastianraschka.com, 2020. [Link]

Software engineering

Atlassian. Gitflow workflow. [Link]
Atlassian. Trunk-based development. [Link]
Shvets, Alexander. Refactoring Guru. 2014. [Link]

Packages

Python

Data processing

Bytewax. Python Stream Processing. [Link]

GUI

Textual. The lean application framework for Python. Build sophisticated user interfaces with a simple Python API. Run your apps in the terminal and a web browser. [Link]

Papers

Posts and threads

Talks, conferences, and videos

🗺️ Geospatial Analysis

Books

Lovelace, Robin, Jakub Nowosad, and Jannes Muenchow. Geocomputation with R. CRC Press, 2019. [Link]
Moraga, Paula. Geospatial health data: Modeling and visualization with R-INLA and shiny. CRC Press, 2019. [Link]
Moraga, Paula. Spatial Statistics for Data Science: Theory and Practice with R. CRC Press, 2023. [Link]

Courses and lecture notes, posts

Datasets

Packages

Papers

Posts and threads

Talks, conferences, and videos

👩‍🔬 Mathematics, Operations Research, Game Theory, and Simulations

Books

Algebra

Axler, Sheldon. Linear algebra done right. Springer Nature, 2023. [Link]

Applied Mathematics

Isoz, Vincent. Opera Magistris (Elements of Applied Mathematics). Sciences.ch, 2016. [Link]

Game Theory and Simulations

Downey, Allen B. Modeling and Simulation in Python: An Introduction for Scientists and Engineers. No Starch Press, 2023. [Link]

Graph Theory

McNulty, Keith. Handbook of graphs and networks in people analytics: with examples in R and Python. CRC Press, 2022. [Link]
Sargent, Thomas J., and John Stachurski. Economic Networks: Theory and Computation. QuantEcon, 2022. [Link]

Optimization

Boumal, Nicolas. An Introduction to Optimization on Smooth Manifolds. Cambridge University Press, 2023. [Link]
Boyd, Stephen P., and Lieven Vandenberghe. Convex optimization. Cambridge University Press, 2004. [Link]
Kwon, Changhyun. Julia Programming for Operations Research. Changhyun Kwon, 2019. [Link]
Martins, J. R. R. A. and Ning, A., Engineering Design Optimization, Cambridge University Press, 2022. [Link]
Nesterov, Yurii. Lectures on convex optimization. Vol. 137. Berlin: Springer, 2018. [Link]
Sargent, Thomas J., and John Stachurski. Dynamic Programming Volume 1. QuantEcon, 2023. [Link]

Sequential Problems

Powell, Warren B. Sequential decision analytics and modeling: modeling with Python. Now, 2022. [Link]

Courses and lecture notes, posts

Mathematical Finance

Kempthorne, Peter, et al. "Topics in mathematics with applications in finance." Massachusetts Institute of Technology: MIT OpenCourseWare, 2013. [Link]
Roncalli, Thierry, Course 2023-2024 in Portfolio Allocation and Asset Management. SSRN, 2024. [Link]

Probability

Arya, Nisha. Learn Probability in Computer Science with Stanford University for FREE. KDNuggets, 2023. [Link]

Datasets

Packages

Optimization

Diamond, Steven, and Stephen Boyd. "CVXPY: A Python-embedded modeling language for convex optimization." Journal of Machine Learning Research 17.83 (2016): 1-5. [Link to the paper] [Link to the package]
PyPortfolioOpt. Financial portfolio optimisation in python, including classical efficient frontier, Black-Litterman, Hierarchical Risk Parity. [Link]
scikit-portfolio. A portfolio optimization tool with scikit-learn interface. Hyperparameters selection and easy plotting of efficient frontiers. [Link]

Sensitivity analysis

SALib. Sensitivity Analysis Library in Python. Contains Sobol, Morris, FAST, and other methods. [Link]

Papers

Posts and threads

Optimization

Jones, Andy. Natural gradients. Andy Jones. [Link]

Talks, conferences, and videos

MATLAB. Why Padé Approximations Are Great! | Control Systems in Practice. YouTube, 2022. [Link]

🤯 Methodology, interactions, and philosophical aspects of Science

Building theories

Jaccard, James, and Jacob Jacoby. Theory construction and model-building skills: A practical guide for social scientists. Guilford publications, 2019. [Link] [Website]

Computational Science

Judd, Kenneth. The Potential Partnership Between Economics and Computational Science. PyData Chicago, 2021. [Link]

Machine Learning and Statistics

Breiman, Leo. "Statistical modeling: The two cultures (with comments and a rejoinder by the author)." Statistical science 16.3 (2001): 199-231. [Link]
Harrell, Frank. "Classification vs. Prediction". Statistical Thinking, 2017. [Link]

Mathematics

Polya, George. How to solve it: A new aspect of mathematical method. Vol. 85. Princeton university press, 2004. [Link]

Scientific approaches

Wolfram, Stephen. A new kind of science. Vol. 5. Champaign, IL: Wolfram media, 2002. [Link]

📈 Statistics, Econometrics, and Data Mining

Books

Clustering

Govaert, Gérard, and Mohamed Nadif. Co-clustering: models, algorithms and applications. John Wiley & Sons, 2013. [Link]
Scrucca, Luca, et al. Model-Based Clustering, Classification, and Density Estimation Using mclust in R. Chapman and Hall/CRC, 2023. [Link]

Econometrics

Ding, Peng. "Linear Model and Extensions." arXiv preprint arXiv:2401.00649 (2024). [Link]
Evans, Richard W., Computational Methods for Economists using Python, Open access Jupyter Book, v#.#.#, 2023. [Link]
Wooldridge, Jeffrey M. Introductory Econometrics: A Modern Approach. Brazil, Cengage Learning, 2020. [Link]

Statistics

Bayesian Statistics

Martin, Osvaldo A., Ravin Kumar, and Junpeng Lao. Bayesian modeling and computation in Python. CRC Press, 2021. [Link]
McElreath, Richard. Statistical rethinking: A Bayesian course with examples in R and Stan. Chapman and Hall/CRC, 2020. [Link]

Exponential family

Agresti, Alan. Categorical data analysis. Vol. 792. John Wiley & Sons, 2012. [Link]
Efron, Bradley. Exponential families in theory and practice. Cambridge University Press, 2022. [Link]

Historical aspects

Fischer, Hans. A history of the central limit theorem: from classical to modern probability theory. Vol. 4. New York: Springer, 2011. [Link]

Inference and mathematical aspects

Soch, Joram, et al. StatProofBook/StatProofBook.Github.Io: StatProofBook 2021. 2021, Zenodo, 2022. [Link]
Wasserman, Larry. All of nonparametric statistics. Springer Science & Business Media, 2006. [Link]
Wasserman, Larry. All of statistics: a concise course in statistical inference. Vol. 26. New York: Springer, 2004. [Link]

Missing data

Van Buuren, Stef. Flexible imputation of missing data. CRC Press, 2018. [Link]

Regression modeling

McNulty, Keith. Handbook of regression modeling in people analytics: with examples in R and Python. CRC Press, 2021. [Link]

Statistical software

Kuhn, Max, and Julia Silge. Tidy modeling with R. O'Reilly Media, 2022. [Link]
Wickham, H., Çetinkaya-Rundel, M., & Grolemund, G. (2023). R for data science. O'Reilly Media. [Link]

Time Series

Cochrane, John H. "Time series for macroeconomics and finance." (1997). [Link]
Hyndman, R.J., & Athanasopoulos, G. (2021) Forecasting: principles and practice, 3rd edition, OTexts: Melbourne, Australia. [Link]
Neusser, Klaus. Time series econometrics. Springer publication, 2016. [Link]

Courses and lecture notes, posts

Causal Inference

Cunningham, Scott et al. Mixtape Sessions: Causal Inference. 2022. [Link]
Ding, Peng. "A First Course in Causal Inference." arXiv preprint arXiv:2305.18793 (2023). [Link]

Econometrics

Canay, Ivan. Econ 480-3 - Introduction to Econometrics. Northwestern University, 2021. [Link]
De Haan, Monique. ECON4150 - Introductory Econometrics. University of Oslo, 2018. [Link]

Statistics & Probability

Dunn, Peter K. The Theory of Distributions, 2023. [Link]
Kozyrkov, Cassie. Statistical Thinking. YouTube, 2019. [Link]
Kunin, Daniel, et al. Seeing Theory. Brown University, 2016. [Link]

Forecasting

Manani, Kishan, and Soledad Galli. Feature Engineering for Time Series Forecasting, 2022. [Link]

Datasets

Forecasting

Godahewa, Rakshitha, et al. "Monash time series forecasting archive." arXiv preprint arXiv:2105.06643 (2021). [Link]
Lotsa Data. Salesforce, Hugging Face (2024). [Link]

Marketing applications

"6 Free, High-Quality, Marketing Mix Modeling Datasets | Forecastegy." Web. 10/14/2023 [Link]
Gaël Bernard and Periklis Andritsos. Datasets Simulating Customer Journeys. [Link]

Packages

Python

Time Series

Alexandrov, Alexander, et al. "Gluonts: Probabilistic and neural time series modeling in python." The Journal of Machine Learning Research 21.1 (2020): 4629-4634. [Link]
Salvador, Stan, and Philip Chan. "Toward accurate dynamic time warping in linear time and space." Intelligent Data Analysis 11.5 (2007): 561-580. [Link]
Fold. Fast Adaptive Time Series ML Engine. [Link]
Functime. Time-series machine learning at scale. Built on Polars for embarrassingly parallel feature engineering and forecasts. [Link]
HierarchicalForecast. Probabilistic Hierarchical forecasting 👑 with statistical and econometric methods. [Link]
MFLES. A Specific implementation from ThymeBoost written with the help of Numba. [Link]
mlforecast. Scalable machine 🤖 learning for time series forecasting. [Link]
NeuralForecast. Scalable and user-friendly neural 🧠 forecasting algorithms. [Link]
SKForecast. Simplifies using sklearn models to do single and multistep forecasting and backtesting. [Link]
StatsForecast. Lightning ⚡️ fast forecasting with statistical and econometric models. [Link]
ThymeBoost. Forecasting with Gradient Boosted Time Series Decomposition. [Link]
vectorbt. Find your trading edge, using the fastest engine for backtesting, algorithmic trading, and research. [Link]

R

Ross, Gordon J., and Dean Markwick. "dirichletprocess: An R package for fitting complex Bayesian nonparametric models." (2018). [Link]
van Buuren, S., and K. Groothuis-Oudshoorn. “Mice: Multivariate Imputation by Chained Equations in R”. Journal of Statistical Software, vol. 45, no. 3, Dec. 2011, pp. 1-67, doi:10.18637/jss.v045.i03. [Paper] [Package]

Papers

Clustering

Keribin, Christine, Gilles Celeux, and Valérie Robert. "The latent block model: a useful model for high dimensional data." ISI 2017-61st world statistics congress. 2017. [Link]
Pham, Tung, et al. "Fast support vector clustering." Vietnam Journal of Computer Science 4 (2017): 13-21. [Link]
Pham, Tung, Trung Le, and Hang Dang. "Scalable support vector clustering using budget." arXiv preprint arXiv:1709.06444 (2017).

Probabilistic Graphical Models and associated optimization techniques

Blei, David M. Build, compute, critique, repeat: Data analysis with latent variable models. Annual Review of Statistics and Its Application 1 (2014): 203-232. [Link]
Blei, David M., Alp Kucukelbir, and Jon D. McAuliffe. "Variational inference: A review for statisticians." Journal of the American Statistical Association 112.518 (2017): 859-877. [Link]
Dieng, Adji Bousso. Deep Probabilistic Graphical Modeling. Columbia University, 2020. [Link]
Figurnov, Mikhail, Shakir Mohamed, and Andriy Mnih. "Implicit reparameterization gradients." Advances in neural information processing systems 31 (2018). [Link]
Gelman, Andrew, Xiao-Li Meng, and Hal Stern. "Posterior predictive assessment of model fitness via realized discrepancies." Statistica sinica (1996): 733-760. [Link]
Kim, Kyurae, et al. "Black-Box Variational Inference Converges." arXiv preprint arXiv:2305.15349 (2023). [Link]

Statistics

Bayesian Statistics

Clarke, Bertrand, and Yuling Yao. "A Cheat Sheet for Bayesian Prediction." arXiv preprint arXiv:2304.12218 (2023). [Link]

Causality

Assaad, Charles K., Emilie Devijver, and Eric Gaussier. "Survey and evaluation of causal discovery methods for time series." Journal of Artificial Intelligence Research 73 (2022): 767-819. [Link]

Distributions

Leemis, Lawrence M., and Jacquelyn T. McQueston. "Univariate distribution relationships." The American Statistician 62.1 (2008): 45-53. [Paper] [Website].
Olszewski, Adrian. Challenging the cult of the prevalent normal distribution in nature. 2KMM, 2023. [Link]

Statistical hypothesis testing (NHST)

Gelman, Andrew. “Commentary: P Values and Statistical Practice.” Epidemiology, vol. 24, no. 1, 2013, pp. 69–72. JSTOR. Accessed 10 Dec. 2023. [Link]
Greenland, Sander et al. “Statistical tests, P values, confidence intervals, and power: a guide to misinterpretations.” European journal of epidemiology vol. 31,4 (2016): 337-50. doi:10.1007/s10654-016-0149-3 [Link]
Lakens, Daniël. “Equivalence Tests: A Practical Primer for t Tests, Correlations, and Meta-Analyses.” Social psychological and personality science vol. 8,4 (2017): 355-362. doi:10.1177/1948550617697177 [Link]
Lin, Mingfeng, et al. “Research Commentary: Too Big to Fail: Large Samples and the p-Value Problem.” Information Systems Research, vol. 24, no. 4, 2013, pp. 906–17. JSTOR. Accessed 10 Dec. 2023. [Link]
Lumley, Thomas et al. “The importance of the normality assumption in large public health data sets.” Annual review of public health vol. 23 (2002): 151-69. doi:10.1146/annurev.publhealth.23.100901.140546 [Link]
Mohd Razali, Nornadiah, and Bee Yap. ‘Power Comparisons of Shapiro-Wilk, Kolmogorov-Smirnov, Lilliefors and Anderson-Darling Tests’. J. Stat. Model. Analytics, vol. 2, 01 2011. [Link]
Morey, Richard D et al. “The fallacy of placing confidence in confidence intervals.” Psychonomic bulletin & review vol. 23,1 (2016): 103-23. doi:10.3758/s13423-015-0947-8 [Link]
Olszewski, Adrian. Mann-Whitney (Wilcoxon) and Kruskal-Wallis FAIL to compare medians in general. Quantile regression should be used to compare medians instead. [Link]
Olszewski, Adrian. On the p-values - links library significance ditching. Adrian Olszewski, 2022. [Link]
Olszewski, Adrian. Testing hypotheses through statistical models opens a universe of new possibilities. Learn how to improve your daily work with this approach. [Link]
Pernet, Cyril. “Null hypothesis significance testing: a short tutorial.” F1000Research vol. 4 621. 25 Aug. 2015, doi:10.12688/f1000research.6963.3 [Link]
Serdar, Ceyhan Ceran et al. “Sample size, power and effect size revisited: simplified and practical approaches in pre-clinical, clinical and laboratory studies.” Biochemia medica vol. 31,1 (2021): 010502. doi:10.11613/BM.2021.010502 [Link]
The American Statistician, Volume 73, Issue sup1 (2019) [Link]
Verhagen, Arianne P., et al. ‘Is the p Value Really so Significant?*’. Australian Journal of Physiotherapy, vol. 50, no. 4, 2004, pp. 261–262. [Link]

Posts and threads

Bayesian Statistics

Camara-Escudero, Mauro. Variational Auto-Encoders and the Expectation-Maximization Algorithm. Mauro Camara-Escudero, 2020. [Link]
Patacchiola, Massimiliano. Evidence, KL-divergence, and ELBO. Massimiliano Patacchiola, 2021. [Link]
Yao, Yuling. Bayes is guaranteed to overfit, for any model, any prior, and every data point. Yuling Yao, 2023. [Link]

General topics

Harrell, Frank. Classification vs. Prediction. Statistical Thinking, 2017. [Link]

Variable selection / Feature selection

gung Reinstate Monica (https://stats.stackexchange.com/users/7290/gung-reinstate-monica). Algorithms for Automatic Model Selection. Cross Validated, https://stats.stackexchange.com/q/20856. [Link]
Shtoff, Alex. "Are polynomial features the root of all evil?" Alex Shtoff, 2024. [Link]
Sribney, Bill. What are some of the problems with stepwise regression? StataCorp, 1996. [Link]

Talks, conferences, and videos

Bayesian Statistics

Chopin, Nicolas, et al. "Bayesian Causal Inference for Real World Interactive Systems." Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining. 2021. [Link]
Jordan, Michael. Nonparametric Bayesian Methods: Models, Algorithms, and Applications II. UC Berkeley, 2017 [Link]
Maxim Kochurov. State of Bayes Lecture Series. PyMC Labs, 2023. [Link]
Pragmatic Data Scientists. Making Informed Decisions with Bayesianism: A Conversation with Kenneth, Statistician at Meta. Pragmatic Data Scientist, 2023. [Link]

Stochastic Processes

Hakenes, Hendrik. Ito's Lemma -- Some intuitive explanations on the solution of stochastic differential equations. University of Bonn, 2021. [Link]

📄 Text Mining and Natural Language Processing

Books

Silge, Julia, and David Robinson. Text mining with R: A tidy approach. O'Reilly Media, 2017. [Link]

Courses and lecture notes, posts

Datasets

Horwood, Graham V. Humanitarian Assistance and Disaster Relief (HA/DR) Articles and Lexicon. V1, Harvard Dataverse, 2017, doi:10.7910/DVN/TGOPRU. [Link]

Packages

Papers

Goldberg, Yoav. "A primer on neural network models for natural language processing." Journal of Artificial Intelligence Research 57 (2016): 345-420. [Link]
Minaee, Shervin, et al. "Large Language Models: A Survey." arXiv preprint arXiv:2402.06196 (2024). [Link]
