Deploying GPT & Large Language Models

This repository contains code for the O'Reilly Live Online Training for Deploying GPT & LLMs.

This course is designed to equip software engineers, data scientists, and machine learning professionals with the skills and knowledge needed to deploy AI models effectively in production environments. As AI continues to revolutionize industries, the ability to deploy, manage, and optimize AI applications at scale is becoming increasingly crucial. This course covers the full spectrum of deployment considerations, from tools and formats such as Kubernetes, llama.cpp, and GGUF to cost management, compute optimization, and model quantization.

Base Notebooks

Introduction to LLMs and Prompting

Introduction to 3rd Party Providers - Using Together.ai, HuggingFace, and Groq to run LLMs
Prompt Injection Examples - See how three kinds of prompt injection attacks can attempt to jailbreak an LLM

Cleaning Data and Monitoring Drift

Cleaning Data using Deep Learning - Using AUM and Cosine Similarity to clean data
Combating AI drift - Using Online Learning to combat drift
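
As a rough illustration of the cosine-similarity side of data cleaning, the sketch below flags pairs of near-duplicate examples by comparing their embedding vectors. It is not the notebook's code: the function names, the 0.95 threshold, and the toy 3-dimensional vectors are all assumptions made for this example, and real embeddings would come from an embedding model.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat a zero vector as similar to nothing
    return dot / (norm_a * norm_b)


def find_near_duplicates(embeddings, threshold=0.95):
    """Return index pairs (i, j), i < j, whose similarity meets the threshold."""
    pairs = []
    for i in range(len(embeddings)):
        for j in range(i + 1, len(embeddings)):
            if cosine_similarity(embeddings[i], embeddings[j]) >= threshold:
                pairs.append((i, j))
    return pairs


# Toy "embeddings" (hypothetical values chosen for illustration).
toy_embeddings = [
    [1.0, 0.0, 0.0],
    [0.99, 0.05, 0.0],  # nearly parallel to row 0
    [0.0, 1.0, 0.0],    # orthogonal to row 0
]
duplicate_pairs = find_near_duplicates(toy_embeddings)
```

The pairwise loop is quadratic in the number of examples, which is fine for a small sample; a larger dataset would likely need a vectorized or approximate nearest-neighbor search instead.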

Evaluating Agents

Evaluating AI Agents: Task Automation and Tool Integration - A basic case study on tool selection accuracy
- Positional Bias on Agent Response Evaluation - Identifying and evaluating positional bias on multiple LLMs
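
One simple way to probe positional bias, sketched here under stated assumptions, is to show a judge the same two responses in both orders and count how often it picks the same slot each time. The `judge` callable, its 0/1 return convention, and the two stand-in judges are hypothetical; a real judge would wrap an LLM call, and the notebook may measure bias differently.

```python
def positional_bias_rate(judge, pairs):
    """Fraction of pairs where the judge picks the same slot in both orders.

    judge(first, second) must return 0 if it prefers the first argument
    and 1 if it prefers the second.
    """
    if not pairs:
        return 0.0
    inconsistent = 0
    for a, b in pairs:
        original = judge(a, b)  # a is shown first
        swapped = judge(b, a)   # b is shown first
        # A position-independent judge prefers the same response both times,
        # so the preferred slot index flips when the order flips.
        if original == swapped:
            inconsistent += 1
    return inconsistent / len(pairs)


# Two stand-in judges (hypothetical; a real judge would call an LLM).
def always_first(first, second):
    return 0


def prefers_longer(first, second):
    return 0 if len(first) >= len(second) else 1


response_pairs = [("short", "a much longer answer"), ("detailed reply", "ok")]
biased_rate = positional_bias_rate(always_first, response_pairs)
fair_rate = positional_bias_rate(prefers_longer, response_pairs)
```

A rate near 0 suggests the judge is responding to content, while a rate near 1 suggests it is responding to position.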

Advanced Deployment Techniques

Speculative Decoding - Using an assistant model to aid token decoding
Prompt Caching Llama 3 - Replicating prompt caching with HuggingFace tools
Distilling BERT - Distilling models to optimize for speed/memory
Quantizing Llama-3 dynamically - Using bitsandbytes to quantize nearly any LLM on HuggingFace
Working with GGUF (no GPU) - Using Llama.cpp to work with models
Working with GGUF (with a GPU) - Using Llama.cpp to work with models
DeepSeek Model on GGUF - Running a DeepSeek Distilled Llama model using Llama.cpp
See this directory for a K8s demo of using embedding models and Llama 3 with GGUF on a GPU
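
To make the idea behind speculative decoding concrete, here is a toy greedy version that works on plain token lists. It is a sketch, not the notebook's implementation: `target_next` and `draft_next` are hypothetical stand-ins for a large model and a small assistant model, and the toy counting "models" exist only so the example runs without any downloads. It also omits the extra "bonus" token that full implementations take from the target model when every draft token is accepted.

```python
def speculative_decode(target_next, draft_next, prompt, max_new_tokens, k=4):
    """Greedy speculative decoding over token lists.

    target_next(tokens) and draft_next(tokens) each return the next token
    for a given prefix. The result matches plain greedy decoding with
    target_next.
    """
    tokens = list(prompt)
    goal = len(prompt) + max_new_tokens
    while len(tokens) < goal:
        # 1. The draft model proposes up to k tokens autoregressively.
        proposal = []
        for _ in range(min(k, goal - len(tokens))):
            proposal.append(draft_next(tokens + proposal))
        # 2. The target model checks each proposed position. A real
        #    implementation scores all positions in one forward pass.
        for i, guess in enumerate(proposal):
            expected = target_next(tokens + proposal[:i])
            if guess != expected:
                # Keep the accepted prefix plus the target's correction.
                tokens.extend(proposal[:i])
                tokens.append(expected)
                break
        else:
            tokens.extend(proposal)  # every draft token was accepted
    return tokens


# Toy models (hypothetical): the target counts upward modulo 10, and the
# draft agrees except that it guesses 0 after seeing a 4.
def target_next(tokens):
    return (tokens[-1] + 1) % 10


def draft_next(tokens):
    return 0 if tokens[-1] == 4 else (tokens[-1] + 1) % 10


decoded = speculative_decode(target_next, draft_next, [0], max_new_tokens=8)
```

Because every emitted token is either confirmed or supplied by the target, the output is identical to decoding with the target alone; any speedup comes from verifying several draft tokens per target pass.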

More

Fine-Tuning LLMs
Prompt Engineering
- Introduction to Prompt Engineering
- Advanced Prompt Engineering
RAG
- Semantic Search
- A basic RAG Bot using GPT and Pinecone

Instructor

Sinan Ozdemir is the Founder and CTO of LoopGenius, where he uses state-of-the-art AI to help people create and run their businesses. Sinan is a former lecturer of Data Science at Johns Hopkins University and the author of multiple textbooks on data science and machine learning. Additionally, he is the founder of the recently acquired Kylie.ai, an enterprise-grade conversational AI platform with RPA capabilities. He holds a master’s degree in Pure Mathematics from Johns Hopkins University and is based in San Francisco, CA.

