This repository contains code for the O'Reilly Live Online Training for Deploying GPT & LLMs.
This course is designed to equip software engineers, data scientists, and machine learning professionals with the skills and knowledge needed to deploy AI models effectively in production environments. As AI continues to reshape industries, the ability to deploy, manage, and optimize AI applications at scale is becoming increasingly important. The course covers the full spectrum of deployment considerations, from working with tools and formats such as Kubernetes, llama.cpp, and GGUF to managing cost, optimizing compute, and quantizing models.
Introduction to LLMs and Prompting
- Introduction to 3rd Party Providers - Using Together.ai, HuggingFace, and Groq to run LLMs
- Prompt Injection Examples - See how three kinds of prompt injection attacks can attempt to jailbreak an LLM
Cleaning Data and Monitoring Drift
- Cleaning Data using Deep Learning - Using AUM and Cosine Similarity to clean data
- Combating AI drift - Using Online Learning to combat drift
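
As a rough illustration of the cosine-similarity half of the data cleaning item above, the sketch below drops near-duplicate rows from a set of embedding vectors. It is not the notebook's code: the function names, the toy vectors, and the 0.95 threshold are all assumptions chosen for the example, and in practice the vectors would come from an embedding model.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat a zero vector as similar to nothing
    return dot / (norm_a * norm_b)


def drop_near_duplicates(embeddings, threshold=0.95):
    """Return the indices of rows to keep.

    A row is skipped when its similarity to any already-kept row
    reaches the (assumed) threshold.
    """
    kept = []
    for i, vec in enumerate(embeddings):
        if all(cosine_similarity(vec, embeddings[j]) < threshold for j in kept):
            kept.append(i)
    return kept


# Toy vectors standing in for real text embeddings (hypothetical data).
toy_embeddings = [
    [1.0, 0.0, 0.0],
    [0.99, 0.05, 0.0],  # nearly parallel to row 0
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
]
kept_indices = drop_near_duplicates(toy_embeddings)
print(kept_indices)
```

The pairwise loop is quadratic in the number of rows, which is fine for a small demo; a larger dataset would likely call for a vectorized or approximate nearest-neighbor approach.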
Evaluating Agents
- Evaluating AI Agents: Task Automation and Tool Integration - A basic case study on tool selection accuracy
- Positional Bias on Agent Response Evaluation - Identifying and evaluating positional bias on multiple LLMs
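
The two evaluation ideas above can be sketched with a few lines of plain Python. This is a minimal sketch under stated assumptions, not the notebooks' implementation: the function names, the tool names, and the verdict data are hypothetical. It assumes each pair of responses is shown to a judge twice, once in each order, and that the judge's pick is recorded as a position (1 or 2).

```python
def tool_selection_accuracy(expected_tools, chosen_tools):
    """Fraction of tasks where the agent chose the expected tool."""
    if not expected_tools or len(expected_tools) != len(chosen_tools):
        raise ValueError("inputs must be non-empty and of equal length")
    correct = sum(e == c for e, c in zip(expected_tools, chosen_tools))
    return correct / len(expected_tools)


def positional_bias_report(position_picks):
    """Summarize judge verdicts for pairs shown in both orders.

    position_picks holds one (pick_in_order_ab, pick_in_order_ba) tuple
    per response pair, each pick being the winning position, 1 or 2.
    A judge that ignores position picks 1 in one order and 2 in the other.
    """
    if not position_picks:
        raise ValueError("position_picks must be non-empty")
    total_verdicts = 2 * len(position_picks)
    first_picks = sum((a == 1) + (b == 1) for a, b in position_picks)
    same_position_twice = sum(a == b for a, b in position_picks)
    return {
        # Near 0.5 when the judge has no preference for the first slot.
        "first_position_rate": first_picks / total_verdicts,
        # Share of pairs where the verdict followed the slot, not the content.
        "order_dependent_rate": same_position_twice / len(position_picks),
    }


# Hypothetical data for illustration only.
accuracy = tool_selection_accuracy(
    ["search", "calculator", "search", "calendar"],
    ["search", "search", "search", "calendar"],
)
report = positional_bias_report([(1, 2), (1, 1), (1, 1), (2, 1)])
print(accuracy, report)
```

Showing every pair in both orders keeps the comparison balanced, so a first-position rate well above 0.5 points to the judge rather than to the responses themselves.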
Advanced Deployment Techniques
- Speculative Decoding - Using an assistant model to aid token decoding
- Prompt Caching Llama 3 - Replicating prompt caching with HuggingFace tools
- Distilling BERT - Distilling models to optimize for speed/memory
- Quantizing Llama-3 dynamically - Using bitsandbytes to quantize nearly any LLM on HuggingFace
- Working with GGUF (no GPU) - Using llama.cpp to work with models
- Working with GGUF (with a GPU) - Using llama.cpp to work with models
- DeepSeek Model on GGUF - Running a DeepSeek Distilled Llama model using llama.cpp
- See this directory for a K8s demo of using embedding models and Llama 3 with GGUF on a GPU
- Fine-Tuning LLMs
- Prompt Engineering
- RAG
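
To make the speculative decoding item in the list above more concrete, here is a toy, greedy version of the idea in plain Python. It is a sketch under assumptions rather than the notebook's code: `target_next` and `draft_next` are hypothetical stand-ins for a large model and a small assistant model, each mapping a token list to the next token. A real system would verify all drafted tokens in one batched forward pass of the large model; the toy calls the target once per token and only counts verification rounds.

```python
def greedy_decode(next_token, prompt, max_new_tokens):
    """Baseline: ask the model for one token at a time."""
    tokens = list(prompt)
    for _ in range(max_new_tokens):
        tokens.append(next_token(tokens))
    return tokens


def speculative_decode(target_next, draft_next, prompt, max_new_tokens, k=4):
    """Greedy speculative decoding with exact-match verification."""
    tokens = list(prompt)
    rounds = 0  # each round stands in for one target forward pass
    while len(tokens) - len(prompt) < max_new_tokens:
        # 1. The cheap draft model proposes k tokens.
        draft = []
        for _ in range(k):
            draft.append(draft_next(tokens + draft))
        # 2. The target checks them in order and keeps the agreeing prefix.
        rounds += 1
        accepted = []
        for proposed in draft:
            expected = target_next(tokens + accepted)
            if proposed == expected:
                accepted.append(proposed)
            else:
                accepted.append(expected)  # fix the first mismatch, then stop
                break
        else:
            # All k drafts accepted: the same pass yields one extra token.
            accepted.append(target_next(tokens + accepted))
        tokens.extend(accepted)
    return tokens[: len(prompt) + max_new_tokens], rounds


# Toy "models" (hypothetical): the target counts upward modulo 10,
# and the draft agrees except right after the token 2.
def target_next(tokens):
    return (tokens[-1] + 1) % 10


def draft_next(tokens):
    return 9 if tokens[-1] == 2 else (tokens[-1] + 1) % 10


spec_tokens, spec_rounds = speculative_decode(target_next, draft_next, [0], 12)
base_tokens = greedy_decode(target_next, [0], 12)
print(spec_tokens == base_tokens, spec_rounds)
```

With exact-match verification the output is identical to plain greedy decoding from the target model; the potential saving is in how many target passes are needed, which depends on how often the draft model agrees with the target.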
Sinan Ozdemir is the Founder and CTO of LoopGenius, where he uses state-of-the-art AI to help people create and run their businesses. Sinan is a former lecturer of Data Science at Johns Hopkins University and the author of multiple textbooks on data science and machine learning. Additionally, he is the founder of the recently acquired Kylie.ai, an enterprise-grade conversational AI platform with RPA capabilities. He holds a master’s degree in Pure Mathematics from Johns Hopkins University and is based in San Francisco, CA.