From 0386df97c05448930d2d35d6c17a412db6ea783d Mon Sep 17 00:00:00 2001 From: Chris Endemann Date: Tue, 13 May 2025 13:53:48 -0500 Subject: [PATCH 1/6] add NLP workshop --- index.md | 3 +++ 1 file changed, 3 insertions(+) diff --git a/index.md b/index.md index a3246178..f0a52968 100644 --- a/index.md +++ b/index.md @@ -25,6 +25,9 @@ The [Introduction to machine learning in Python with scikit-learn lesson](https: introduces practical machine learning using Python. It is a good lesson to follow in preparation for this lesson, since basic knowledge of machine learning and Python programming skills are required for this lesson. +#### [Introduction to text analysis and natural language processing in Python](https://carpentries-incubator.github.io/python-text-analysis/index.html) +This lesson provides a practical introduction to working with unstructured text data, such as survey responses, clinical notes, academic papers, or historical documents. It covers key natural language processing (NLP) techniques including preprocessing, tokenization, feature extraction (e.g., bag-of-words and TF-IDF), and basic topic modeling. Learners will also be introduced to Word2Vec and simple neural networks in the context of language modeling. The skills taught in this lesson offer a strong foundation for more advanced topics such as knowledge extraction, working with large text corpora, and building applications that involve large language models (LLMs). + :::::::::::::::::: checklist ## Prerequisites From 021652afe773d1ac507e5876907c51a7f7b73115 Mon Sep 17 00:00:00 2001 From: Chris Endemann Date: Tue, 13 May 2025 14:00:03 -0500 Subject: [PATCH 2/6] add trustworthy AI --- index.md | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/index.md b/index.md index f0a52968..c0abb4a0 100644 --- a/index.md +++ b/index.md @@ -26,7 +26,10 @@ introduces practical machine learning using Python. 
It is a good lesson to follow in preparation for this lesson, since basic knowledge of machine learning and Python programming skills are required for this lesson. #### [Introduction to text analysis and natural language processing in Python](https://carpentries-incubator.github.io/python-text-analysis/index.html) -This lesson provides a practical introduction to working with unstructured text data, such as survey responses, clinical notes, academic papers, or historical documents. It covers key natural language processing (NLP) techniques including preprocessing, tokenization, feature extraction (e.g., bag-of-words and TF-IDF), and basic topic modeling. Learners will also be introduced to Word2Vec and simple neural networks in the context of language modeling. The skills taught in this lesson offer a strong foundation for more advanced topics such as knowledge extraction, working with large text corpora, and building applications that involve large language models (LLMs). +This lesson provides a practical introduction to working with unstructured text data, such as survey responses, clinical notes, academic papers, or historical documents. It covers key natural language processing (NLP) techniques including preprocessing, tokenization, feature extraction (e.g., TF-IDF, word2vec, and BERT), and basic topic modeling. The skills taught in this lesson offer a strong foundation for more advanced topics such as knowledge extraction, working with large text corpora, and building applications that involve large language models (LLMs). + +#### [Trustworthy AI: Validity, fairness, explainability, and uncertainty assessments](https://carpentries-incubator.github.io/fair-explainable-ml/index.html) +This lesson introduces tools and practices for building and evaluating machine learning models that are fair, transparent, and reliable across multiple data types, including tabular data, text, and images. 
Learners explore model evaluation, fairness audits, explainability methods (such as linear probes and GradCAM), and strategies for handling uncertainty and detecting out-of-distribution (OOD) data. It is especially relevant for researchers working with NLP, computer vision, or structured data who are interested in integrating ethical and reproducible ML practices into their workflows—including those working with large language models (LLMs) or planning to release models for public or collaborative use. :::::::::::::::::: checklist From 56f213233665683bdf6705a80eb190376c83bb63 Mon Sep 17 00:00:00 2001 From: Chris Endemann Date: Tue, 13 May 2025 14:04:50 -0500 Subject: [PATCH 3/6] add AWS --- index.md | 3 +++ 1 file changed, 3 insertions(+) diff --git a/index.md b/index.md index c0abb4a0..a17ed232 100644 --- a/index.md +++ b/index.md @@ -31,6 +31,9 @@ This lesson provides a practical introduction to working with unstructured text #### [Trustworthy AI: Validity, fairness, explainability, and uncertainty assessments](https://carpentries-incubator.github.io/fair-explainable-ml/index.html) This lesson introduces tools and practices for building and evaluating machine learning models that are fair, transparent, and reliable across multiple data types, including tabular data, text, and images. Learners explore model evaluation, fairness audits, explainability methods (such as linear probes and GradCAM), and strategies for handling uncertainty and detecting out-of-distribution (OOD) data. It is especially relevant for researchers working with NLP, computer vision, or structured data who are interested in integrating ethical and reproducible ML practices into their workflows—including those working with large language models (LLMs) or planning to release models for public or collaborative use. 
+#### [Intro to AWS SageMaker for predictive ML/AI](https://carpentries-incubator.github.io/ML_with_AWS_SageMaker/index.html) +This lesson focuses on training and tuning neural networks (and other ML models) using Amazon SageMaker, and is a natural next step for learners who've outgrown local setups. If your deep learning models are becoming too large or slow to run on a laptop, SageMaker provides scalable infrastructure with access to GPUs and support for parallelized hyperparameter tuning. Participants learn to use SageMaker notebooks to manage data via S3, launch training jobs, monitor compute usage, and keep experiments cost-effective. While the examples center on small to mid-sized models, the workflow is directly applicable to scaling up deep learning and LLM-related experiments in research. + :::::::::::::::::: checklist ## Prerequisites From 287e0be6033f090b386d1d03569219fff8207d8b Mon Sep 17 00:00:00 2001 From: Chris Endemann Date: Tue, 13 May 2025 14:09:16 -0500 Subject: [PATCH 4/6] fix headers/link formatting to match pre-existing examples --- index.md | 12 ++++++------ 1 file changed, 6 insertions(+), 6 deletions(-) diff --git a/index.md b/index.md index a17ed232..a47dc9d3 100644 --- a/index.md +++ b/index.md @@ -25,14 +25,14 @@ The [Introduction to machine learning in Python with scikit-learn lesson](https: introduces practical machine learning using Python. It is a good lesson to follow in preparation for this lesson, since basic knowledge of machine learning and Python programming skills are required for this lesson. -#### [Introduction to text analysis and natural language processing in Python](https://carpentries-incubator.github.io/python-text-analysis/index.html) -This lesson provides a practical introduction to working with unstructured text data, such as survey responses, clinical notes, academic papers, or historical documents. 
It covers key natural language processing (NLP) techniques including preprocessing, tokenization, feature extraction (e.g., TF-IDF, word2vec, and BERT), and basic topic modeling. The skills taught in this lesson offer a strong foundation for more advanced topics such as knowledge extraction, working with large text corpora, and building applications that involve large language models (LLMs). +#### Introduction to text analysis and natural language processing (NLP) in Python +The [Introduction to text analysis and natural language processing in Python](https://carpentries-incubator.github.io/python-text-analysis/index.html) lesson provides a practical introduction to working with unstructured text data, such as survey responses, clinical notes, academic papers, or historical documents. It covers key natural language processing (NLP) techniques including preprocessing, tokenization, feature extraction (e.g., TF-IDF, word2vec, and BERT), and basic topic modeling. The skills taught in this lesson offer a strong foundation for more advanced topics such as knowledge extraction, working with large text corpora, and building applications that involve large language models (LLMs). -#### [Trustworthy AI: Validity, fairness, explainability, and uncertainty assessments](https://carpentries-incubator.github.io/fair-explainable-ml/index.html) -This lesson introduces tools and practices for building and evaluating machine learning models that are fair, transparent, and reliable across multiple data types, including tabular data, text, and images. Learners explore model evaluation, fairness audits, explainability methods (such as linear probes and GradCAM), and strategies for handling uncertainty and detecting out-of-distribution (OOD) data. 
It is especially relevant for researchers working with NLP, computer vision, or structured data who are interested in integrating ethical and reproducible ML practices into their workflows—including those working with large language models (LLMs) or planning to release models for public or collaborative use. +#### Trustworthy AI: Validity, fairness, explainability, and uncertainty assessments +The [Trustworthy AI](https://carpentries-incubator.github.io/fair-explainable-ml/index.html) lesson introduces tools and practices for building and evaluating machine learning models that are fair, transparent, and reliable across multiple data types, including tabular data, text, and images. Learners explore model evaluation, fairness audits, explainability methods (such as linear probes and GradCAM), and strategies for handling uncertainty and detecting out-of-distribution (OOD) data. It is especially relevant for researchers working with NLP, computer vision, or structured data who are interested in integrating ethical and reproducible ML practices into their workflows—including those working with large language models (LLMs) or planning to release models for public or collaborative use. -#### [Intro to AWS SageMaker for predictive ML/AI](https://carpentries-incubator.github.io/ML_with_AWS_SageMaker/index.html) -This lesson focuses on training and tuning neural networks (and other ML models) using Amazon SageMaker, and is a natural next step for learners who've outgrown local setups. If your deep learning models are becoming too large or slow to run on a laptop, SageMaker provides scalable infrastructure with access to GPUs and support for parallelized hyperparameter tuning. Participants learn to use SageMaker notebooks to manage data via S3, launch training jobs, monitor compute usage, and keep experiments cost-effective. 
While the examples center on small to mid-sized models, the workflow is directly applicable to scaling up deep learning and LLM-related experiments in research. +#### Intro to AWS SageMaker for predictive ML/AI +The [Intro to AWS SageMaker for predictive ML/AI](https://carpentries-incubator.github.io/ML_with_AWS_SageMaker/index.html) lesson focuses on training and tuning neural networks (and other ML models) using Amazon SageMaker, and is a natural next step for learners who've outgrown local setups. If your deep learning models are becoming too large or slow to run on a laptop, SageMaker provides scalable infrastructure with access to GPUs and support for parallelized hyperparameter tuning. Participants learn to use SageMaker notebooks to manage data via S3, launch training jobs, monitor compute usage, and keep experiments cost-effective. While the examples center on small to mid-sized models, the workflow is directly applicable to scaling up deep learning and LLM-related experiments in research. :::::::::::::::::: checklist From d978106872432e551ffe8f4dc6aabff3d4eadd1a Mon Sep 17 00:00:00 2001 From: Chris Endemann Date: Wed, 14 May 2025 08:21:02 -0500 Subject: [PATCH 5/6] move prereqs before other related lessons --- index.md | 20 ++++++++++---------- 1 file changed, 10 insertions(+), 10 deletions(-) diff --git a/index.md b/index.md index a47dc9d3..a884694d 100644 --- a/index.md +++ b/index.md @@ -14,6 +14,16 @@ We start with explaining the basic concepts of neural networks, and then go thro Learners will learn how to prepare data for deep learning, how to implement a basic deep learning model in Python with Keras, how to monitor and troubleshoot the training process and how to implement different layer types such as convolutional layers. +:::::::::::::::::: checklist + +## Prerequisites +Learners are expected to have the following knowledge: + +- Basic Python programming skills and familiarity with the Pandas package. 
+- Basic knowledge of machine learning, including the following concepts: data cleaning, train & test split, types of problems (regression, classification), overfitting & underfitting, and metrics (accuracy, recall, etc.). + +:::::::::::::::::::::::::::: + ### Other related lessons #### Introduction to artificial neural networks in Python The [Introduction to artificial neural networks in Python lesson](https://carpentries-incubator.github.io/machine-learning-neural-python/) @@ -34,16 +44,6 @@ The [Trustworthy AI](https://carpentries-incubator.github.io/fair-explainable-ml #### Intro to AWS SageMaker for predictive ML/AI The [Intro to AWS SageMaker for predictive ML/AI](https://carpentries-incubator.github.io/ML_with_AWS_SageMaker/index.html) lesson focuses on training and tuning neural networks (and other ML models) using Amazon SageMaker, and is a natural next step for learners who've outgrown local setups. If your deep learning models are becoming too large or slow to run on a laptop, SageMaker provides scalable infrastructure with access to GPUs and support for parallelized hyperparameter tuning. Participants learn to use SageMaker notebooks to manage data via S3, launch training jobs, monitor compute usage, and keep experiments cost-effective. While the examples center on small to mid-sized models, the workflow is directly applicable to scaling up deep learning and LLM-related experiments in research. -:::::::::::::::::: checklist - -## Prerequisites -Learners are expected to have the following knowledge: - -- Basic Python programming skills and familiarity with the Pandas package. -- Basic knowledge on machine learning, including the following concepts: Data cleaning, train & test split, type of problems (regression, classification), overfitting & underfitting, metrics (accuracy, recall, etc.). 
- -:::::::::::::::::::::::::::: - ::: instructor ## We can help you out with teaching this lesson From 75a2e6d439e7f23d4c5195e961c74549e8d26f6d Mon Sep 17 00:00:00 2001 From: Chris Endemann Date: Wed, 14 May 2025 08:22:13 -0500 Subject: [PATCH 6/6] add spoiler for other related lessons --- index.md | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/index.md b/index.md index a884694d..9b8374b8 100644 --- a/index.md +++ b/index.md @@ -24,6 +24,8 @@ Learners are expected to have the following knowledge: :::::::::::::::::::::::::::: +::: spoiler + ### Other related lessons #### Introduction to artificial neural networks in Python The [Introduction to artificial neural networks in Python lesson](https://carpentries-incubator.github.io/machine-learning-neural-python/) @@ -44,6 +46,8 @@ The [Trustworthy AI](https://carpentries-incubator.github.io/fair-explainable-ml #### Intro to AWS SageMaker for predictive ML/AI The [Intro to AWS SageMaker for predictive ML/AI](https://carpentries-incubator.github.io/ML_with_AWS_SageMaker/index.html) lesson focuses on training and tuning neural networks (and other ML models) using Amazon SageMaker, and is a natural next step for learners who've outgrown local setups. If your deep learning models are becoming too large or slow to run on a laptop, SageMaker provides scalable infrastructure with access to GPUs and support for parallelized hyperparameter tuning. Participants learn to use SageMaker notebooks to manage data via S3, launch training jobs, monitor compute usage, and keep experiments cost-effective. While the examples center on small to mid-sized models, the workflow is directly applicable to scaling up deep learning and LLM-related experiments in research. +::: + ::: instructor ## We can help you out with teaching this lesson