
OCI AI Blueprints

Deploy, scale, and monitor AI workloads with the OCI AI Blueprints platform, and reduce your GPU onboarding time from weeks to minutes.

OCI AI Blueprints is a streamlined, no-code solution for deploying and managing Generative AI workloads on Kubernetes Engine (OKE). By providing opinionated hardware recommendations, pre-packaged software stacks, and out-of-the-box observability tooling, OCI AI Blueprints helps you get your AI applications running quickly and efficiently—without wrestling with the complexities of infrastructure decisions, software compatibility, and MLOps best practices.


Table of Contents

Getting Started

About OCI AI Blueprints

API Reference

Additional Resources

Getting Started

Install OCI AI Blueprints by clicking the button below:

Install OCI AI Blueprints

Blueprints

Blueprints go beyond basic Terraform templates. Each blueprint:

  • Offers validated hardware suggestions (e.g., optimal shapes, CPU/GPU configurations),
  • Includes end-to-end application stacks customized for different GenAI use cases, and
  • Comes with monitoring, logging, and auto-scaling configured out of the box.

After you install OCI AI Blueprints to an OKE cluster in your tenancy, you can deploy these pre-built blueprints:

  • LLM Inference with vLLM: Deploy Llama 2/3/3.1 7B/8B models using NVIDIA GPU shapes and the vLLM inference engine, with auto-scaling.
  • Fine-Tuning Benchmarking: Run MLCommons quantized Llama-2 70B LoRA fine-tuning on A100 GPUs for performance benchmarking.
  • LoRA Fine-Tuning: LoRA fine-tuning of custom or HuggingFace models using any dataset, with flexible hyperparameter tuning.
  • Health Check: Comprehensive evaluation of GPU performance to ensure the hardware is ready before initiating any intensive computational workload.
  • CPU Inference: Leverage Ollama to test CPU-based inference with models like Mistral, Gemma, and more.
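As a rough sketch of what a blueprint deployment request could look like, the snippet below builds a JSON payload for the vLLM inference blueprint. Every field name here (`recipe_id`, `deployment_name`, `compute`, `autoscaling`) is an illustrative assumption, not the platform's documented schema; consult the API Reference for the actual request format.

```python
import json

# Hypothetical payload for deploying the "LLM Inference with vLLM" blueprint.
# All field names are illustrative assumptions, not the documented
# OCI AI Blueprints schema -- see the API Reference for the real format.
deployment_request = {
    "recipe_id": "llm_inference_vllm",      # which blueprint to deploy (assumed name)
    "deployment_name": "llama3-8b-demo",    # user-chosen label for this deployment
    "compute": {
        "shape": "VM.GPU.A10.1",            # example OCI GPU shape
        "replicas": 1,                      # initial replica count
    },
    "model": "meta-llama/Meta-Llama-3-8B-Instruct",  # HuggingFace model ID
    "autoscaling": {
        "min_replicas": 1,
        "max_replicas": 4,
    },
}

print(json.dumps(deployment_request, indent=2))
```

A payload like this would typically be POSTed to the OCI AI Blueprints control plane installed in your OKE cluster, which then provisions the GPU nodes, inference engine, and monitoring stack described by the blueprint.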

Support & Contact

If you have any questions, issues, or feedback, contact [email protected] or [email protected].
