marklikesyou/ML-Agent-Skills

ml-agent-skills

a collection of machine learning skills for ai agents. this repository provides structured instructions (SKILL.md files) that enable ai agents to perform end-to-end ml workflows correctly.

philosophy

"teach the agent, don't script it."

  • SKILL.md files contain senior data scientist knowledge: best practices, code patterns, and decision rules
  • agents write their own code following the instructions
  • flexible and adaptable to any codebase or context

quick start

# clone the repository
git clone https://github.com/your-username/ml-agent-skills.git
cd ml-agent-skills

# install python dependencies
pip install pandas numpy scikit-learn xgboost matplotlib seaborn joblib

skills overview

| skill            | purpose                   | what it teaches                                  |
|------------------|---------------------------|--------------------------------------------------|
| ml-eda-viz       | exploratory data analysis | distributions, correlations, leakage detection   |
| ml-data-prep     | data cleaning & splitting | stratified splits, imputation, encoding          |
| ml-train-tabular | model training            | pipelines, cross-validation, early stopping      |
| ml-evaluate      | model evaluation          | metrics selection, threshold tuning, diagnostics |

recommended workflow

┌─────────────┐     ┌─────────────┐     ┌─────────────┐     ┌─────────────┐
│  ml-eda-viz │ ──▶ │ml-data-prep │ ──▶ │ml-train-    │ ──▶ │ ml-evaluate │
│  (explore)  │     │  (clean &   │     │  tabular    │     │   (test)    │
│             │     │   split)    │     │  (train)    │     │             │
└─────────────┘     └─────────────┘     └─────────────┘     └─────────────┘

for ai agents

if you're an ai agent (claude code, cursor, etc.), read the AGENTS.md file for:

  • workflow order and trigger phrases
  • critical rules to prevent data leakage
  • output conventions

each skill folder contains a SKILL.md with:

  • detailed best practices and code patterns
  • example code you can adapt
  • common pitfalls to avoid
  • checklists to verify correct implementation

best practices embedded

this repository encodes senior data scientist knowledge:

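  • data leakage prevention: split before fitting any preprocessing statistics (imputation values, scaling parameters, encodings), then fit them on the training split only
  • stratified splitting: preserve class distributions across splits (default 70/15/15 train/validation/test)
  • cross-validation: use stratified k-fold within the training split
  • pipelines: fit preprocessing inside each cv fold, not on the full training set beforehand
  • early stopping: prevent overfitting in boosting models
  • proper evaluation: never evaluate on training data; prefer f1 or roc-auc over accuracy for imbalanced datasets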
  • reproducibility: always set random_state
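
a minimal sketch of how several of these rules can fit together with scikit-learn. it is not taken from the SKILL.md files: the synthetic dataset, the `SEED` constant, the median imputer, and the logistic regression model are all assumptions chosen for illustration, and the 70/15/15 split is read here as train/validation/test.

```python
from sklearn.datasets import make_classification
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score, roc_auc_score
from sklearn.model_selection import StratifiedKFold, cross_val_score, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

SEED = 42  # hypothetical constant: one seed reused everywhere for reproducibility

# synthetic, imbalanced stand-in data (roughly 80/20 classes); not from the repository
X, y = make_classification(
    n_samples=1000, n_features=10, weights=[0.8, 0.2], random_state=SEED
)

# stratified 70/15/15 split: hold out 30%, then halve it into validation and test
X_train, X_hold, y_train, y_hold = train_test_split(
    X, y, test_size=0.30, stratify=y, random_state=SEED
)
X_val, X_test, y_val, y_test = train_test_split(
    X_hold, y_hold, test_size=0.50, stratify=y_hold, random_state=SEED
)

# preprocessing lives inside the pipeline, so every cv fold fits the imputer
# and scaler on its own training portion only (no statistics leak across folds)
model = Pipeline([
    ("impute", SimpleImputer(strategy="median")),
    ("scale", StandardScaler()),
    ("clf", LogisticRegression(max_iter=1000, random_state=SEED)),
])

# stratified k-fold is run within the training split only
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=SEED)
cv_auc = cross_val_score(model, X_train, y_train, cv=cv, scoring="roc_auc")

# final fit on the training split; the test split is scored once, at the end
model.fit(X_train, y_train)
test_f1 = f1_score(y_test, model.predict(X_test))
test_auc = roc_auc_score(y_test, model.predict_proba(X_test)[:, 1])

print(f"cv roc-auc: {cv_auc.mean():.3f} +/- {cv_auc.std():.3f}")
print(f"test f1: {test_f1:.3f}, test roc-auc: {test_auc:.3f}")
```

the validation split (`X_val`, `y_val`) is left unused in this sketch; it is the natural place for early stopping or threshold tuning, so that the test split stays untouched until the final evaluation.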

requirements

  • python 3.10+
  • pandas, numpy, scikit-learn, xgboost, matplotlib, seaborn, joblib

license

MIT
