A repository to consolidate my explorations and learnings in the field of artificial intelligence.
- CNN: Backpropagation Applied to Handwritten Zip Code Recognition
- AlexNet: ImageNet Classification with Deep Convolutional Neural Networks
- Transformer Architecture: Attention Is All You Need
- Tensor Parallelism (TP)
- Sequence/Context Parallelism (SP/CP)
- Expert Parallelism (EP)
- Pipeline Parallelism (PP)
- Data Parallelism (DP)
- Fully Sharded Data Parallelism (FSDP)
- Zero Redundancy Optimizer (ZeRO)
- Supervised Fine-Tuning (SFT)
- Parameter-Efficient Fine-Tuning (PEFT)
- Low-Rank Adaptation (LoRA)
- Reinforcement Learning (RL)
- Reinforcement Learning from Human Feedback (RLHF)
- Proximal Policy Optimization (PPO)
- Group Relative Policy Optimization (GRPO)
- Direct Preference Optimization (DPO)
- KV Cache
- PagedAttention: Efficient Memory Management for Large Language Model Serving with PagedAttention
- Speculative Decoding: Speculative Decoding for Efficient LLM Inference, Better & Faster Large Language Models via Multi-token Prediction
- Batching
- Quantization
- Effective context engineering for AI agents
- Context Engineering
- Context Engineering 2.0: The Context of Context Engineering
- Context Engineering Guide
- Context Engineering for AI Agents: Lessons from Building Manus
- A Comprehensive Mechanistic Interpretability Explainer & Glossary
- Neuronpedia
- Circuit-Tracer
- SAELens
- Towards Monosemanticity: Decomposing Language Models With Dictionary Learning
- Scaling Monosemanticity: Extracting Interpretable Features from Claude 3 Sonnet
- Tiny Recursive Models (TRM): TinyRecursiveModels
- MuonClip Optimizer: Kimi-K2
- Linear Attention: Kimi-Linear
- Sparse Attention: Native Sparse Attention: Hardware-Aligned and Natively Trainable Sparse Attention
- Compressed Attention (Multi-head Latent Attention, MLA): FlashMLA, DeepSeek-V2
- Attention Residuals: Attention-Residuals
- State Space Models (SSM): Mamba
- Joint Embedding Predictive Architecture (JEPA): JEPA, I-JEPA
- Ouro: Scaling Latent Reasoning via Looped Language Models
- 1-bit LLMs: BitNet
- Repeat Your Self (RYS): LLM Neuroanatomy: How I Topped the AI Leaderboard Without Changing a Single Weight