ml-agent-skills

a collection of machine learning skills for ai agents. this repository provides structured instructions (SKILL.md files) that enable ai agents to perform end-to-end ml workflows correctly.

philosophy

"teach the agent, don't script it."

SKILL.md files contain senior data scientist knowledge: best practices, code patterns, and decision rules
agents write their own code following the instructions
the guidance stays flexible and adapts to any codebase or context

quick start

# clone the repository
git clone https://github.com/your-username/ml-agent-skills.git
cd ml-agent-skills

# install python dependencies
pip install pandas numpy scikit-learn xgboost matplotlib seaborn joblib

skills overview

skill             purpose                     what it teaches
ml-eda-viz        exploratory data analysis   distributions, correlations, leakage detection
ml-data-prep      data cleaning & splitting   stratified splits, imputation, encoding
ml-train-tabular  model training              pipelines, cross-validation, early stopping
ml-evaluate       model evaluation            metrics selection, threshold tuning, diagnostics

recommended workflow

┌─────────────┐     ┌─────────────┐     ┌─────────────┐     ┌─────────────┐
│  ml-eda-viz │ ──▶ │ml-data-prep │ ──▶ │ml-train-    │ ──▶ │ ml-evaluate │
│  (explore)  │     │  (clean &   │     │  tabular    │     │   (test)    │
│             │     │   split)    │     │  (train)    │     │             │
└─────────────┘     └─────────────┘     └─────────────┘     └─────────────┘
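
the four stages above can be sketched end to end in a few lines. this is a minimal illustration, not the code shipped in the SKILL.md files: the synthetic dataset, the logistic regression model, the threshold grid, and every variable name are assumptions chosen for the example. the 70/15/15 split follows the default stated in this readme.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

SEED = 42  # fixed seed for reproducibility

# assumption: a synthetic, imbalanced dataset stands in for real data
X, y = make_classification(
    n_samples=2000, n_features=10, weights=[0.8, 0.2], random_state=SEED
)

# explore: check the class balance before choosing metrics
class_share = np.bincount(y) / len(y)

# clean & split: stratified 70/15/15 (train / validation / test)
X_train, X_tmp, y_train, y_tmp = train_test_split(
    X, y, test_size=0.30, stratify=y, random_state=SEED
)
X_val, X_test, y_val, y_test = train_test_split(
    X_tmp, y_tmp, test_size=0.50, stratify=y_tmp, random_state=SEED
)

# train: the scaler is fitted on training rows only, inside the pipeline
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
model.fit(X_train, y_train)

# tune the decision threshold on the validation set, never on the test set
val_scores = model.predict_proba(X_val)[:, 1]
thresholds = np.linspace(0.1, 0.9, 17)
best_threshold = max(
    thresholds, key=lambda t: f1_score(y_val, (val_scores >= t).astype(int))
)

# test: evaluate once on the held-out test set
test_scores = model.predict_proba(X_test)[:, 1]
test_f1 = f1_score(y_test, (test_scores >= best_threshold).astype(int))
test_auc = roc_auc_score(y_test, test_scores)

print(f"class share: {class_share.round(2)}")
print(f"threshold={best_threshold:.2f}  f1={test_f1:.3f}  roc-auc={test_auc:.3f}")
```

the test set is touched exactly once, after the model and the threshold are fixed, so the reported metrics are not tuned against the data they are measured on.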

for ai agents

if you're an ai agent (claude code, cursor, etc.), read the AGENTS.md file for:

workflow order and trigger phrases
critical rules to prevent data leakage
output conventions

each skill folder contains a SKILL.md with:

detailed best practices and code patterns
example code you can adapt
common pitfalls to avoid
checklists to verify correct implementation

best practices embedded

this repository encodes senior data scientist knowledge:

data leakage prevention: split before fitting any statistics (imputation values, scaling parameters, encodings)
stratified splitting: preserve class distributions (default 70/15/15 train/validation/test)
cross-validation: use stratified k-fold within the training set
pipelines: fit preprocessing inside each cv fold, not before cross-validation
early stopping: prevent overfitting in boosting models
proper evaluation: never evaluate on training data; use f1/roc-auc for imbalanced datasets
reproducibility: always set random_state
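
several of these rules can be shown together in one short sketch: preprocessing lives inside a pipeline, the pipeline is scored with stratified k-fold on the training set only, and every random component receives a fixed random_state. the dataset, the choice of median imputation and logistic regression, and the names used here are assumptions for illustration, not the repository's own code.

```python
from sklearn.datasets import make_classification
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

SEED = 42  # reproducibility: one seed reused everywhere

# assumption: a synthetic, imbalanced dataset stands in for real data
X, y = make_classification(
    n_samples=1000, n_features=8, weights=[0.85, 0.15], random_state=SEED
)

# hold out the test set first; it plays no part in cross-validation
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.15, stratify=y, random_state=SEED
)

# preprocessing is part of the estimator, so each cv fold re-fits the
# imputer and scaler on that fold's training portion only
pipeline = Pipeline([
    ("impute", SimpleImputer(strategy="median")),
    ("scale", StandardScaler()),
    ("clf", LogisticRegression(max_iter=1000)),
])

# stratified k-fold keeps the class ratio stable across folds
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=SEED)
cv_f1 = cross_val_score(pipeline, X_train, y_train, cv=cv, scoring="f1")
cv_auc = cross_val_score(pipeline, X_train, y_train, cv=cv, scoring="roc_auc")

print(f"f1 per fold:      {cv_f1.round(3)}")
print(f"roc-auc per fold: {cv_auc.round(3)}")
```

scaling or imputing the full training set before calling cross_val_score would let each validation fold influence the fitted statistics, which is the leakage the pipeline rule is meant to prevent.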

requirements

python 3.10+
pandas, numpy, scikit-learn, xgboost, matplotlib, seaborn, joblib

license

MIT
