A list of resources on everything related to decision-making: videos, tutorials, books, documents, theses, articles, datasets and open-source libraries.


RoyAalekh/Awesome-Decision-Science

 
 


Awesome Decision Science

A curated list of resources on everything related to decision-making: videos, tutorials, books, documents, theses, articles, datasets and open-source libraries. Click the hamburger menu 🍔 to navigate more easily.

👍 Like it? A like or a share would greatly help the project! Let's share knowledge!

⚠️ Disclaimer: almost all of the resources are available free of charge and legally. I earn nothing from sales of the few paid resources, which are listed simply because I consider them valuable.

About

I'm Miguel 👋 I help B2B leaders (consulting, bancassurance, SMEs) by turning decision-making into an exact science! Companies such as Accuracy, Crédit Agricole and Lizeo are already making choices that are 95% safer thanks to my financial forecasting methods. ☎️ How about we discuss how to get more value from your data assets?

Contents

🤖 Artificial Intelligence, Computational Intelligence, and Machine Learning

Books

Computational Intelligence

  • Engelbrecht, Andries P. Computational intelligence: an introduction. John Wiley & Sons, 2007. [Link]

Deep Learning

  • Bishop, Christopher M., and Hugh Bishop. "Deep learning: foundations and concepts." Springer, 2024. [Link]
  • Deisenroth, Marc Peter, A. Aldo Faisal, and Cheng Soon Ong. Mathematics for machine learning. Cambridge University Press, 2020. [Link]
  • Goodfellow, Ian, Yoshua Bengio, and Aaron Courville. Deep learning. MIT Press, 2016. [Link]
  • Grohs, Philipp, and Gitta Kutyniok, eds. Mathematical aspects of deep learning. Cambridge University Press, 2022. [Link]
  • Prince, Simon JD. Understanding Deep Learning. MIT press, 2023. [Link]
  • Zhang, Aston, et al. Dive into deep learning. Cambridge University Press, 2023. [Link]

Explainable AI

  • Biecek, Przemyslaw, and Tomasz Burzykowski. Explanatory model analysis: explore, explain, and examine predictive models. CRC Press, 2021. [Link]
  • Hall, Patrick, James Curtis, and Parul Pandey. Machine Learning for High-Risk Applications. O'Reilly, 2023. [Link]
  • Molnar, Christoph. Interpretable machine learning. Lulu.com, 2020. [Link]

Machine Learning

  • Bishop, Christopher M. Pattern recognition and machine learning. New York: Springer, 2006. [Link]
  • Deisenroth, Marc Peter, A. Aldo Faisal, and Cheng Soon Ong. Mathematics for machine learning. Cambridge University Press, 2020. [Link]
  • Efron, Bradley, and Trevor Hastie. Computer age statistical inference, student edition: algorithms, evidence, and data science. Vol. 6. Cambridge University Press, 2021. [Link]
  • Hastie, Trevor, Robert Tibshirani, and Martin Wainwright. Statistical learning with sparsity: the lasso and generalizations. CRC press, 2015. [Link]
  • Huber, Martin. Causal analysis: Impact evaluation and Causal Machine Learning with applications in R. MIT Press, 2023. [Link]
  • James, G., Witten, D., Hastie, T., Tibshirani, R., Taylor, J. An Introduction to Statistical Learning: With Applications in Python; Springer: Berlin/Heidelberg, Germany, 2023. [Link]
  • Katsov, Ilya. Introduction to algorithmic marketing: Artificial intelligence for marketing operations. Grid Dynamics, 2017. [Link]
  • MacKay, David JC. Information theory, inference and learning algorithms. Cambridge university press, 2003. [Link]
  • Murphy, Kevin P. Probabilistic machine learning: Advanced topics. MIT Press, 2023. [Link]
  • Murphy, Kevin P. Probabilistic machine learning: an introduction. MIT Press, 2022. [Link]
  • Siddiqi, Naeem. Intelligent credit scoring: Building and implementing better credit risk scorecards. John Wiley & Sons, 2017. [Link]

Courses and lecture notes, posts

Deep Learning

  • Lippe, Phillip. UvA Deep Learning Tutorials. 2022. [Link]
  • Ollion, Charles, and Olivier Grisel. Deep Learning course: lecture slides and lab notebooks. Institut Polytechnique de Paris, 2017. [Link]

Explainable AI

  • Galli, Soledad. Interpreting Machine Learning Models [Link]
  • Lakkaraju, Hima, et al. Explainable Artificial Intelligence: From Simple Predictors to Complex Generative Models. Harvard University, 2023. [Link]

Machine Learning

  • Christensen, Henrik I. Support Vector Machines - SVM & RVM. Georgia Institute of Technology. [Link]
  • Inria. Machine learning in Python with scikit-learn. FUN, 2023. [Link]
  • MLU-Explain Team. MLU-Explain. Amazon (2021). [Link]

Reinforcement Learning and Control Theory

  • Dimitri Bertsekas. Reinforcement Learning and Optimal Control. [Link]
  • Elad Hazan, Karan Singh. Introduction to Online Nonstochastic Control. [Link]

Datasets

  • Andreas Luttens, et al. Large-scale Docking Datasets for Machine Learning. 2, Zenodo, 8 May 2023. [Link]
  • Randal S. Olson, William La Cava, Patryk Orzechowski, Ryan J. Urbanowicz, and Jason H. Moore (2017). PMLB: a large benchmark suite for machine learning evaluation and comparison. BioData Mining 10, page 36. [Paper] [Code]

Packages

Data loading

  • mlx-data. Efficient framework-agnostic data loading. Apple, 2023. [Link]

Explainable AI

  • Alibi explain. Open-source interpretability library supporting black box, white box, global and local interpretability methods. [Link]
  • Dalex. Responsible Machine Learning in Python. [Link]
  • Scikit-explain. User-friendly Python module for machine learning explainability with a comprehensive toolset of interpretability methods. [Link]
  • Shapash. Shapash: User-friendly Explainability and Interpretability to Develop Reliable and Transparent Machine Learning Models. MAIF, 2021. [Link]
  • Sudjianto, Agus, et al. "PiML Toolbox for Interpretable Machine Learning Model Development and Validation." arXiv preprint arXiv:2305.04214

Feature Engineering

  • Feature_engine. Feature engineering package with sklearn like functionality. [Link]

Hyperparameter optimization

  • Optuna. A hyperparameter optimization framework. [Link]

Machine Learning techniques

  • Catboost. A fast, scalable, high-performance Gradient Boosting on Decision Trees library used for ranking, classification, regression, and other machine learning tasks for Python, R, Java, and C++. Supports computation on CPU and GPU. [Link]
  • Khuat, Thanh Tung, and Bogdan Gabrys. "hyperbox-brain: A Toolbox for Hyperbox-based Machine Learning Algorithms." arXiv preprint arXiv:2210.02704 (2022). [Link]
  • quantile-forest. Quantile Regression Forests compatible with scikit-learn. [Link]

Papers

Deep Learning

Bayesian approaches
  • Arbel, Julyan, et al. A Primer on Bayesian Neural Networks: Review and Debates. arXiv preprint arXiv:2309.16314 (2023). [Link]
  • Hellström, Fredrik, et al. Generalization bounds: perspectives from information theory and PAC-Bayes. arXiv preprint arXiv:2309.04381 (2023). [Link]
  • Kingma, Diederik P., and Max Welling. "Auto-encoding variational bayes." arXiv preprint arXiv:1312.6114 (2013). [Link]
  • Nalisnick, Eric, and Padhraic Smyth. "Stick-breaking variational autoencoders." arXiv preprint arXiv:1605.06197 (2016). [Link]
Generative aspects
  • Coste, Simon. Diffusion. University of Paris, 2023. [Link]
  • Galerne, Bruno, and Valentin De Bortoli. Generative Modelling. ENS Paris-Saclay, 2023. [Link]
Mathematical aspects: approximation and generalization
  • Bartlett, Peter L., Andrea Montanari, and Alexander Rakhlin. Deep learning: a statistical viewpoint. Acta numerica 30 (2021): 87-201. [Link]
  • Berner, Julius, et al. The modern mathematics of deep learning. arXiv preprint arXiv:2105.04026 (2021): 86-114. [Link]
  • DeVore, Ronald, Boris Hanin, and Guergana Petrova. Neural network approximation. Acta Numerica 30 (2021): 327-444. [Link]
  • Jacot, Arthur, Franck Gabriel, and Clément Hongler. "Neural tangent kernel: Convergence and generalization in neural networks." Advances in neural information processing systems 31 (2018). [Link]
  • Hornik, Kurt. "Approximation capabilities of multilayer feedforward networks." Neural networks 4.2 (1991): 251-257. [Link]
  • Hornik, Kurt, Maxwell Stinchcombe, and Halbert White. "Multilayer feedforward networks are universal approximators." Neural networks 2.5 (1989): 359-366. [Link]
  • Petersen, Philipp Christian. Neural network theory. University of Vienna 535 (2020). [Link]
Mathematical aspects: optimization
  • Khaled, Ahmed, and Peter Richtárik. "Better theory for SGD in the nonconvex world." arXiv preprint arXiv:2002.03329 (2020). [Link]
  • Sun, Ruoyu. Optimization for deep learning: theory and algorithms. arXiv preprint arXiv:1912.08957 (2019). [Link]

Machine Learning

Conformal Prediction
  • Angelopoulos, Anastasios N., and Stephen Bates. "A gentle introduction to conformal prediction and distribution-free uncertainty quantification." arXiv preprint arXiv:2107.07511 (2021). [Link]
  • Fontana, Matteo, Gianluca Zeni, and Simone Vantini. "Conformal prediction: a unified review of theory and new challenges." arXiv preprint arXiv:2005.07972 (2020). [Link]
Explainable AI
  • Bilodeau, Blair, et al. "Impossibility theorems for feature attribution." Proceedings of the National Academy of Sciences 121.2 (2024): e2304406120. [Link]
  • Ibrahim Amoukou, Salim. Trustworthy machine learning: explainability and distribution-free uncertainty quantification. Diss. université Paris-Saclay, 2023. [Link]
  • Huang, Xuanxiang, and Joao Marques-Silva. "The inadequacy of Shapley values for explainability." arXiv preprint arXiv:2302.08160 (2023). [Link]
Fuzzy sets
  • Khuat, Thanh Tung, Dymitr Ruta, and Bogdan Gabrys. "Hyperbox-based machine learning algorithms: a comprehensive survey." Soft Computing 25.2 (2021): 1325-1363. [Link]
Imbalanced data problems
  • Elor, Yotam, and Hadar Averbuch-Elor. "To SMOTE, or not to SMOTE?." arXiv preprint arXiv:2201.08528 (2022). [Link]
  • van den Goorbergh, Ruben, et al. "The harm of class imbalance corrections for risk prediction models: illustration and simulation using logistic regression." Journal of the American Medical Informatics Association 29.9 (2022): 1525-1534. [Link]
Training ML models
  • Mirzasoleiman, Baharan, Jeff Bilmes, and Jure Leskovec. "Coresets for data-efficient training of machine learning models." International Conference on Machine Learning. PMLR, 2020. [Link]

Posts and threads

Explainable AI (XAI)

  • Of Models and Meanings. SHAP is the Blockchain of xAI. Of Models and Meanings, 2022. [Link]
  • Of Models and Meanings. What You Could Do with the Shapley Computation. Of Models and Meanings, 2022. [Link]

Imbalanced data problems

  • Mougan, Carl. Why SMOTE is not used in prize-winning Kaggle solutions?. Data Science, 2021. [Link]

Talks, conferences, and videos

  • Dieng, Adji B. Learning From Data: The Two Cultures. Association for Computing Machinery, 2021. [Link]
  • Rich, DJ. Mutual Information. True Theta LLC, 2020. [Link]

📊 Business Intelligence, Data Visualization, Communicating and Reporting

Books

  • Duarte, Nancy. Resonate: Present visual stories that transform audiences. John Wiley & Sons, 2013. [Link]
  • Duarte, Nancy. slide:ology: The art and science of creating great presentations. Vol. 1. Sebastopol: O'Reilly Media, 2008. [Link]
  • Knaflic, Cole Nussbaumer. Storytelling with data: A data visualization guide for business professionals. John Wiley & Sons, 2015. [Link]
  • Knaflic, Cole Nussbaumer. Storytelling with data: let's practice!. John Wiley & Sons, 2019. [Link]
  • Wexler, Steve, Jeffrey Shaffer, and Andy Cotgreave. The big book of dashboards: visualizing your data using real-world business scenarios. John Wiley & Sons, 2017. [Link]
  • Wilke, Claus O. Fundamentals of data visualization: a primer on making informative and compelling figures. O'Reilly Media, 2019. [Link]

Courses and lecture notes, posts

Datasets

Packages

Data structures

Python
  • Polars. Dataframes powered by a multithreaded, vectorized query engine, written in Rust. [Link]

Data Visualization and Reporting

Julia
  • Genie. 🧞The highly productive Julia web framework. [Link]
Python
  • Marimo. marimo is an open-source reactive notebook for Python — reproducible, git-friendly, executable as a script, and shareable as an app. [Link]
  • PyGWalker. Turn your pandas dataframe into an interactive UI for visual analysis. [Link]
  • Streamlit. A faster way to build and share data apps. [Link]
  • Vizro. Vizro is a toolkit for creating modular data visualization applications. [Link]

Papers

Posts and threads

Talks, conferences, and videos

💻 Computer Science and Software Engineering

Books

Algorithmics, data structures, and programming languages

  • Downey, Allen. Think complexity: complexity science and computational modeling. O'Reilly Media, Inc., 2018. [Link]
  • Downey, Allen. Think data structures: algorithms and information retrieval in Java. O'Reilly Media, Inc., 2017. [Link]
  • Downey, Allen. Think Python. O'Reilly Media, Inc., 2012. [Link]
  • Johnston, Nathaniel, and Dave Greene. Conway's Game of Life: Mathematics and Construction. Self-published, 2022. [Link]
  • Miller, Brad, and David Ranum. Problem-solving with algorithms and data structures. University of Auckland, 2013. [Link] [Website]
  • Nipkow, Tobias. "Functional Data Structures and Algorithms A Proof Assistant Approach." (2023). [Link]

Scientific programming

  • Blondel, Mathieu, and Vincent Roulet. "The Elements of Differentiable Programming." arXiv preprint arXiv:2403.14606 (2024). [Link]

Software development

  • Chacon, Scott, and Ben Straub. Pro git. Springer Nature, 2014. [Link]

Databases

  • Petrov, Alex. Database Internals: A deep dive into how distributed data systems work. O'Reilly Media, 2019. [Link]

Courses and lecture notes, posts

Algorithms

  • Roughgarden, Tim. Lecture Notes. Columbia University. [Link]

Scientific programming

  • Raschka, Sebastian. Scientific Computing in Python: Introduction to NumPy and Matplotlib. sebastianraschka.com, 2020. [Link]

Software engineering

  • Atlassian. Gitflow workflow. [Link]
  • Atlassian. Trunk-based development. [Link]
  • Shvets, Alexander. Refactoring Guru. 2014. [Link]

Packages

Python

Data processing
  • Bytewax. Python Stream Processing. [Link]
GUI
  • Textual. The lean application framework for Python. Build sophisticated user interfaces with a simple Python API. Run your apps in the terminal and a web browser. [Link]

Papers

Posts and threads

Talks, conferences, and videos

🗺️ Geospatial Analysis

Books

  • Lovelace, Robin, Jakub Nowosad, and Jannes Muenchow. Geocomputation with R. CRC Press, 2019. [Link]
  • Moraga, Paula. Geospatial health data: Modeling and visualization with R-INLA and shiny. CRC Press, 2019. [Link]
  • Moraga, Paula. Spatial Statistics for Data Science: Theory and Practice with R. CRC Press, 2023. [Link]

Courses and lecture notes, posts

Datasets

Packages

Papers

Posts and threads

Talks, conferences, and videos

👩‍🔬 Mathematics, Operations Research, Game Theory, and Simulations

Books

Algebra

  • Axler, Sheldon. Linear algebra done right. Springer Nature, 2023. [Link]

Applied Mathematics

  • Isoz, Vincent. Opera Magistris (Elements of Applied Mathematics). Sciences.ch, 2016. [Link]

Game Theory and Simulations

  • Downey, Allen B. Modeling and Simulation in Python: An Introduction for Scientists and Engineers. No Starch Press, 2023. [Link]

Graph Theory

  • McNulty, Keith. Handbook of graphs and networks in people analytics: with examples in R and Python. CRC Press, 2022. [Link]
  • Sargent, Thomas J., and John Stachurski. Economic Networks: Theory and Computation. QuantEcon, 2022. [Link]

Optimization

  • Boumal, Nicolas. An Introduction to Optimization on Smooth Manifolds. Cambridge University Press, 2023. [Link]
  • Boyd, Stephen P., and Lieven Vandenberghe. Convex optimization. Cambridge University Press, 2004. [Link]
  • Kwon, Changhyun. Julia Programming for Operations Research. Changhyun Kwon, 2019. [Link]
  • Martins, J. R. R. A. and Ning, A., Engineering Design Optimization, Cambridge University Press, 2022. [Link]
  • Nesterov, Yurii. Lectures on convex optimization. Vol. 137. Berlin: Springer, 2018. [Link]
  • Sargent, Thomas J., and John Stachurski. Dynamic Programming Volume 1. QuantEcon, 2023. [Link]

Sequential Problems

  • Powell, Warren B. Sequential decision analytics and modeling: modeling with Python. Now, 2022. [Link]

Courses and lecture notes, posts

Mathematical Finance

  • Kempthorne, Peter, et al. "Topics in mathematics with applications in finance." Massachusetts Institute of Technology: MIT OpenCourseWare, 2013. [Link]
  • Roncalli, Thierry, Course 2023-2024 in Portfolio Allocation and Asset Management. SSRN, 2024. [Link]

Probability

  • Arya, Nisha. Learn Probability in Computer Science with Stanford University for FREE. KDNuggets, 2023. [Link]

Datasets

Packages

Optimization

  • Diamond, Steven, and Stephen Boyd. "CVXPY: A Python-embedded modeling language for convex optimization." Journal of Machine Learning Research 17.83 (2016): 1-5. [Link to the paper] [Link to the package]
  • PyPortfolioOpt. Financial portfolio optimisation in python, including classical efficient frontier, Black-Litterman, Hierarchical Risk Parity. [Link]
  • scikit-portfolio. A portfolio optimization tool with scikit-learn interface. Hyperparameters selection and easy plotting of efficient frontiers. [Link]

Sensitivity analysis

  • SALib. Sensitivity Analysis Library in Python. Contains Sobol, Morris, FAST, and other methods. [Link]

Papers

Posts and threads

Optimization

  • Jones, Andy. Natural gradients. Andy Jones. [Link]

Talks, conferences, and videos

  • MATLAB. Why Padé Approximations Are Great! | Control Systems in Practice. YouTube, 2022. [Link]

🤯 Methodology, interactions, and philosophical aspects of Science

Building theories

  • Jaccard, James, and Jacob Jacoby. Theory construction and model-building skills: A practical guide for social scientists. Guilford publications, 2019. [Link] [Website]

Computational Science

  • Judd, Kenneth. The Potential Partnership Between Economics and Computational Science. PyData Chicago, 2021. [Link]

Machine Learning and Statistics

  • Breiman, Leo. "Statistical modeling: The two cultures (with comments and a rejoinder by the author)." Statistical science 16.3 (2001): 199-231. [Link]
  • Harrell, Frank. "Classification vs. Prediction". Statistical Thinking, 2017. [Link]

Mathematics

  • Polya, George. How to solve it: A new aspect of mathematical method. Vol. 85. Princeton university press, 2004. [Link]

Scientific approaches

  • Wolfram, Stephen. A new kind of science. Vol. 5. Champaign, IL: Wolfram media, 2002. [Link]

📈 Statistics, Econometrics, and Data Mining

Books

Clustering

  • Govaert, Gérard, and Mohamed Nadif. Co-clustering: models, algorithms and applications. John Wiley & Sons, 2013. [Link]
  • Scrucca, Luca, et al. Model-Based Clustering, Classification, and Density Estimation Using mclust in R. Chapman and Hall/CRC, 2023. [Link]

Econometrics

  • Ding, Peng. "Linear Model and Extensions." arXiv preprint arXiv:2401.00649 (2024). [Link]
  • Evans, Richard W., Computational Methods for Economists using Python, Open access Jupyter Book, v#.#.#, 2023. [Link]
  • Wooldridge, Jeffrey M. Introductory Econometrics: A Modern Approach. Brazil, Cengage Learning, 2020. [Link]

Statistics

Bayesian Statistics
  • Martin, Osvaldo A., Ravin Kumar, and Junpeng Lao. Bayesian modeling and computation in Python. CRC Press, 2021. [Link]
  • McElreath, Richard. Statistical rethinking: A Bayesian course with examples in R and Stan. Chapman and Hall/CRC, 2020. [Link]
Exponential family
  • Agresti, Alan. Categorical data analysis. Vol. 792. John Wiley & Sons, 2012. [Link]
  • Efron, Bradley. Exponential families in theory and practice. Cambridge University Press, 2022. [Link]
Historical aspects
  • Fischer, Hans. A history of the central limit theorem: from classical to modern probability theory. Vol. 4. New York: Springer, 2011. [Link]
Inference and mathematical aspects
  • Soch, Joram, et al. StatProofBook/StatProofBook.Github.Io: StatProofBook 2021. 2021, Zenodo, 2022. [Link]
  • Wasserman, Larry. All of nonparametric statistics. Springer Science & Business Media, 2006. [Link]
  • Wasserman, Larry. All of statistics: a concise course in statistical inference. Vol. 26. New York: Springer, 2004. [Link]
Missing data
  • Van Buuren, Stef. Flexible imputation of missing data. CRC Press, 2018. [Link]
Regression modeling
  • McNulty, Keith. Handbook of regression modeling in people analytics: with examples in R and Python. CRC Press, 2021. [Link]
Statistical software
  • Kuhn, Max, and Julia Silge. Tidy modeling with R. O'Reilly Media, Inc., 2022. [Link]
  • Wickham, H., Çetinkaya-Rundel, M., & Grolemund, G. (2023). R for data science. O'Reilly Media, Inc. [Link]

Time Series

  • Cochrane, John H. "Time series for macroeconomics and finance." (1997). [Link]
  • Hyndman, R.J., & Athanasopoulos, G. (2021) Forecasting: principles and practice, 3rd edition, OTexts: Melbourne, Australia. [Link]
  • Neusser, Klaus. Time series econometrics. Springer publication, 2016. [Link]

Courses and lecture notes, posts

Causal Inference

  • Cunningham, Scott et al. Mixtape Sessions: Causal Inference. 2022. [Link]
  • Ding, Peng. "A First Course in Causal Inference." arXiv preprint arXiv:2305.18793 (2023). [Link]

Econometrics

  • Canay, Ivan. Econ 480-3 - Introduction to Econometrics. Northwestern University, 2021. [Link]
  • De Haan, Monique. ECON4150 - Introductory Econometrics. University of Oslo, 2018. [Link]

Statistics & Probability

  • Dunn, Peter K. The Theory of Distributions, 2023. [Link]
  • Kozyrkov, Cassie. Statistical Thinking. YouTube, 2019. [Link]
  • Kunin, Daniel, et al. Seeing Theory. Brown University, 2016. [Link]

Forecasting

  • Manani, Galli. Feature Engineering for Time Series Forecasting, 2022. [Link]

Datasets

Forecasting

  • Godahewa, Rakshitha, et al. "Monash time series forecasting archive." arXiv preprint arXiv:2105.06643 (2021). [Link]
  • Lotsa Data. Salesforce, Hugging Face (2024). [Link]

Marketing applications

  • "6 Free, High-Quality, Marketing Mix Modeling Datasets | Forecastegy." Web. 10/14/2023 [Link]
  • Gaël Bernard and Periklis Andritsos. Datasets Simulating Customer Journeys. [Link]

Packages

Python

Time Series
  • Alexandrov, Alexander, et al. "Gluonts: Probabilistic and neural time series modeling in python." The Journal of Machine Learning Research 21.1 (2020): 4629-4634. [Link]
  • Salvador, Stan, and Philip Chan. "Toward accurate dynamic time warping in linear time and space." Intelligent Data Analysis 11.5 (2007): 561-580. [Link]
  • Fold. Fast Adaptive Time Series ML Engine. [Link]
  • Functime. Time-series machine learning at scale. Built on Polars for embarrassingly parallel feature engineering and forecasts. [Link]
  • HierarchicalForecast. Probabilistic Hierarchical forecasting 👑 with statistical and econometric methods. [Link]
  • MFLES. A Specific implementation from ThymeBoost written with the help of Numba. [Link]
  • mlforecast. Scalable machine 🤖 learning for time series forecasting. [Link]
  • NeuralForecast. Scalable and user-friendly neural 🧠 forecasting algorithms. [Link]
  • SKForecast. Simplifies using sklearn models to do single and multistep forecasting and backtesting. [Link]
  • StatsForecast. Lightning ⚡️ fast forecasting with statistical and econometric models. [Link]
  • ThymeBoost. Forecasting with Gradient Boosted Time Series Decomposition. [Link]
  • vectorbt. Find your trading edge, using the fastest engine for backtesting, algorithmic trading, and research. [Link]

R

  • Ross, Gordon J., and Dean Markwick. "dirichletprocess: An R package for fitting complex Bayesian nonparametric models." (2018). [Link]
  • van Buuren, S., and K. Groothuis-Oudshoorn. “Mice: Multivariate Imputation by Chained Equations in R”. Journal of Statistical Software, vol. 45, no. 3, Dec. 2011, pp. 1-67, doi:10.18637/jss.v045.i03. [Paper] [Package]

Papers

Clustering

  • Keribin, Christine, Gilles Celeux, and Valérie Robert. "The latent block model: a useful model for high dimensional data." ISI 2017-61st world statistics congress. 2017. [Link]
  • Pham, Tung, et al. "Fast support vector clustering." Vietnam Journal of Computer Science 4 (2017): 13-21. [Link]
  • Pham, Tung, Trung Le, and Hang Dang. "Scalable support vector clustering using budget." arXiv preprint arXiv:1709.06444 (2017).

Probabilistic Graphical Models and associated optimization techniques

  • Blei, David M. Build, compute, critique, repeat: Data analysis with latent variable models. Annual Review of Statistics and Its Application 1 (2014): 203-232. [Link]
  • Blei, David M., Alp Kucukelbir, and Jon D. McAuliffe. "Variational inference: A review for statisticians." Journal of the American Statistical Association 112.518 (2017): 859-877. [Link]
  • Dieng, Adji Bousso. Deep Probabilistic Graphical Modeling. Columbia University, 2020. [Link]
  • Figurnov, Mikhail, Shakir Mohamed, and Andriy Mnih. "Implicit reparameterization gradients." Advances in neural information processing systems 31 (2018). [Link]
  • Gelman, Andrew, Xiao-Li Meng, and Hal Stern. "Posterior predictive assessment of model fitness via realized discrepancies." Statistica sinica (1996): 733-760. [Link]
  • Kim, Kyurae, et al. "Black-Box Variational Inference Converges." arXiv preprint arXiv:2305.15349 (2023). [Link]

Statistics

Bayesian Statistics
  • Clarke, Bertrand, and Yuling Yao. "A Cheat Sheet for Bayesian Prediction." arXiv preprint arXiv:2304.12218 (2023). [Link]
Causality
  • Assaad, Charles K., Emilie Devijver, and Eric Gaussier. "Survey and evaluation of causal discovery methods for time series." Journal of Artificial Intelligence Research 73 (2022): 767-819. [Link]
Distributions
  • Leemis, Lawrence M., and Jacquelyn T. McQueston. "Univariate distribution relationships." The American Statistician 62.1 (2008): 45-53. [Paper] [Website].
  • Olszewski, Adrian. Challenging the cult of the prevalent normal distribution in nature. 2KMM, 2023. [Link]
Statistical hypothesis testing (NHST)
  • Gelman, Andrew. “Commentary: P Values and Statistical Practice.” Epidemiology, vol. 24, no. 1, 2013, pp. 69–72. JSTOR. Accessed 10 Dec. 2023. [Link]
  • Greenland, Sander et al. “Statistical tests, P values, confidence intervals, and power: a guide to misinterpretations.” European journal of epidemiology vol. 31,4 (2016): 337-50. doi:10.1007/s10654-016-0149-3 [Link]
  • Lakens, Daniël. “Equivalence Tests: A Practical Primer for t Tests, Correlations, and Meta-Analyses.” Social psychological and personality science vol. 8,4 (2017): 355-362. doi:10.1177/1948550617697177 [Link]
  • Lin, Mingfeng, et al. “Research Commentary: Too Big to Fail: Large Samples and the p-Value Problem.” Information Systems Research, vol. 24, no. 4, 2013, pp. 906–17. JSTOR. Accessed 10 Dec. 2023. [Link]
  • Lumley, Thomas et al. “The importance of the normality assumption in large public health data sets.” Annual review of public health vol. 23 (2002): 151-69. doi:10.1146/annurev.publhealth.23.100901.140546 [Link]
  • Mohd Razali, Nornadiah, and Bee Yap. ‘Power Comparisons of Shapiro-Wilk, Kolmogorov-Smirnov, Lilliefors and Anderson-Darling Tests’. J. Stat. Model. Analytics, vol. 2, 01 2011. [Link]
  • Morey, Richard D et al. “The fallacy of placing confidence in confidence intervals.” Psychonomic bulletin & review vol. 23,1 (2016): 103-23. doi:10.3758/s13423-015-0947-8 [Link]
  • Olszewski, Adrian. Mann-Whitney (Wilcoxon) and Kruskal-Wallis FAIL to compare medians in general. Quantile regression should be used to compare medians instead. [Link]
  • Olszewski, Adrian. On the p-values - links library significance ditching. Adrian Olszewski, 2022. [Link]
  • Olszewski, Adrian. Testing hypotheses through statistical models opens a universe of new possibilities. Learn how to improve your daily work with this approach. [Link]
  • Pernet, Cyril. “Null hypothesis significance testing: a short tutorial.” F1000Research vol. 4 621. 25 Aug. 2015, doi:10.12688/f1000research.6963.3 [Link]
  • Serdar, Ceyhan Ceran et al. “Sample size, power and effect size revisited: simplified and practical approaches in pre-clinical, clinical and laboratory studies.” Biochemia medica vol. 31,1 (2021): 010502. doi:10.11613/BM.2021.010502 [Link]
  • The American Statistician, Volume 73, Issue sup1 (2019) [Link]
  • Verhagen, Arianne P., et al. ‘Is the p Value Really so Significant?*’. Australian Journal of Physiotherapy, vol. 50, no. 4, 2004, pp. 261–262. [Link]

Posts and threads

Bayesian Statistics

  • Camara-Escudero, Mauro. Variational Auto-Encoders and the Expectation-Maximization Algorithm. Mauro Camara-Escudero, 2020. [Link]
  • Patacchiola, Massimiliano. Evidence, KL-divergence, and ELBO. Massimiliano Patacchiola, 2021. [Link]
  • Yao, Yuling. Bayes is guaranteed to overfit, for any model, any prior, and every data point. Yuling Yao, 2023. [Link]

General topics

  • Harrell, Frank. Classification vs. Prediction. Statistical Thinking, 2017. [Link]

Variable selection / Feature selection

Talks, conferences, and videos

Bayesian Statistics

  • Chopin, Nicolas, et al. "Bayesian Causal Inference for Real World Interactive Systems." Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining. 2021. [Link]
  • Jordan, Michael. Nonparametric Bayesian Methods: Models, Algorithms, and Applications II. UC Berkeley, 2017. [Link]
  • Maxim Kochurov. State of Bayes Lecture Series. PyMC Labs, 2023. [Link]
  • Pragmatic Data Scientists. Making Informed Decisions with Bayesianism: A Conversation with Kenneth, Statistician at Meta. Pragmatic Data Scientist, 2023. [Link]

Stochastic Processes

  • Hakenes, Hendrik. Ito's Lemma -- Some intuitive explanations on the solution of stochastic differential equations. University of Bonn, 2021. [Link]

📄 Text Mining and Natural Language Processing

Books

  • Silge, Julia, and David Robinson. Text mining with R: A tidy approach. O'Reilly Media, Inc., 2017. [Link]

Courses and lecture notes, posts

Datasets

  • Horwood, Graham V. Humanitarian Assistance and Disaster Relief (HA/DR) Articles and Lexicon. V1, Harvard Dataverse, 2017, doi:10.7910/DVN/TGOPRU. [Link]

Packages

Papers

  • Goldberg, Yoav. "A primer on neural network models for natural language processing." Journal of Artificial Intelligence Research 57 (2016): 345-420. [Link]
  • Minaee, Shervin, et al. "Large Language Models: A Survey." arXiv preprint arXiv:2402.06196 (2024). [Link]

Posts and threads

Talks, conferences, and videos
