51 changes: 26 additions & 25 deletions README.md
@@ -1,41 +1,42 @@
## Introduction
This repo aggregates and showcases different ways NVIDIA NIMs can be deployed. It contains reference implementations, deployment guides, examples, and architecture guidance that can be used as a starting point for deploying multiple NIMs and other NVIDIA microservices into Kubernetes and other production deployment environments. Many of the most common NIM deployment and lifecycle scenarios covered here may eventually be handled by capabilities of the [NVIDIA NIM Operator](https://github.com/NVIDIA/k8s-nim-operator) as it progresses.

> **Note**
> The content in this repository is designed to provide reference architectures and best practices for production-grade deployments and product integrations; however, the code is not validated on all platforms and does not come with any level of enterprise support. While the deployments should perform well, please treat this codebase as experimental and a collaborative sandbox. For long-term production deployments that require enterprise support from NVIDIA, look to the official releases on [NVIDIA NGC](https://ngc.nvidia.com/), which are based on the code in this repo.

# Deployment Options

**Tools & Guides**

| Category             | Type             | Description |
|----------------------|------------------|-------------|
| Open Source          | Helm Chart(s)    | [LLM NIM](https://github.com/NVIDIA/nim-deploy/tree/main/helm/nim-llm) |
| Open Source Platform | Deployment Guide | [KServe](https://github.com/NVIDIA/nim-deploy/tree/main/kserve) |
| Commercial Platform  | Deployment Guide | [Run.ai](https://github.com/NVIDIA/nim-deploy/tree/main/docs/runai) |
| Commercial Platform  | Deployment Guide | [Hugging Face NIM Deployment](https://github.com/NVIDIA/nim-deploy/tree/main/docs/hugging-face-nim-deployment) |
|                      |                  | LLM NIM on OpenShift Container Platform (coming soon) |

**Managed Cloud Services**

| Service               | Type             | Description |
|-----------------------|------------------|-------------|
| Microsoft Azure       | Deployment Guide | [AKS Managed Kubernetes](https://github.com/NVIDIA/nim-deploy/tree/main/cloud-service-providers/azure/aks) |
| Microsoft Azure       | Deployment Guide | [Azure ML](https://github.com/NVIDIA/nim-deploy/tree/main/cloud-service-providers/azure/azureml) |
| Microsoft Azure       | Deployment Guide | [Azure prompt flow](https://github.com/NVIDIA/nim-deploy/tree/main/cloud-service-providers/azure/promptflow) |
| Amazon Web Services   | Deployment Guide | [EKS Managed Kubernetes](https://github.com/NVIDIA/nim-deploy/tree/main/cloud-service-providers/aws/eks) |
| Amazon Web Services   | Deployment Guide | [Amazon SageMaker](https://github.com/NVIDIA/nim-deploy/tree/main/cloud-service-providers/aws/sagemaker) |
| Google Cloud Platform | Deployment Guide | [GKE Managed Kubernetes](https://github.com/NVIDIA/nim-deploy/tree/main/cloud-service-providers/google-cloud/gke) |
| Google Cloud Platform | Deployment Guide | [Google Cloud Vertex AI](https://github.com/NVIDIA/nim-deploy/tree/main/cloud-service-providers/google-cloud/vertexai/python) |
| Google Cloud Platform | Deployment Guide | [Cloud Run](https://github.com/NVIDIA/nim-deploy/tree/main/cloud-service-providers/google-cloud/cloudrun) |
| NVIDIA DGX Cloud      | Deployment Guide | [NVIDIA Cloud Functions](https://github.com/NVIDIA/nim-deploy/tree/main/cloud-service-providers/nvidia/nvcf) |
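As a concrete starting point, the LLM NIM Helm chart listed above can be installed from a clone of this repo. The invocation below is a hypothetical sketch (`my-values.yaml` is a placeholder; chart values vary by model, and the chart's own README documents what it actually requires):

```bash
# Clone the repo and install the LLM NIM chart into its own namespace.
# `my-values.yaml` stands in for your model- and cluster-specific values.
git clone https://github.com/NVIDIA/nim-deploy.git
cd nim-deploy/helm
helm install my-nim ./nim-llm -n nim --create-namespace -f my-values.yaml
```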

## Contributions
Contributions are welcome. To contribute, open a [pull request](https://help.github.com/en/articles/about-pull-requests) and agree to the terms in [CONTRIBUTING.MD](CONTRIBUTING.MD).


## Support and Getting Help

Please open an issue on the GitHub project for any questions. All feedback is appreciated, including issues, feature requests, and new deployment scenarios.
81 changes: 81 additions & 0 deletions docs/runai/README.md
@@ -0,0 +1,81 @@
# Deploy NVIDIA NIM microservices on RunAI

This document describes the procedure for deploying a NIM microservice with Helm on a RunAI cluster.

## Prerequisites
1. A conformant Kubernetes cluster ([RunAI K8s requirements](https://docs.run.ai/latest/admin/overview-administrator/))
2. RunAI installed (version 2.18 or later)
3. [NVIDIA GPU Operator](https://github.com/NVIDIA/gpu-operator) installed
4. General NIM requirements: [NIM Prerequisites](https://docs.nvidia.com/nim/large-language-models/latest/getting-started.html#prerequisites)
5. [Helm](https://helm.sh/docs/) installed locally
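
Before proceeding, it can help to sanity-check the prerequisites. A minimal sketch (the GPU Operator namespace is an assumption; adjust to your install):

```bash
# Helm available locally
helm version

# GPU Operator pods healthy (namespace may differ in your install)
kubectl get pods -n gpu-operator

# GPUs advertised to the Kubernetes scheduler
kubectl get nodes -o custom-columns='NAME:.metadata.name,GPUS:.status.allocatable.nvidia\.com/gpu'
```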

## Integration features

| Feature                             | Supported          |
|------------------------------------|--------------------|
| Deploy through helm CLI | :white_check_mark: |
| Engine capabilities (Scheduling) | :white_check_mark: |
| Visibility (UI + CLI) | :white_check_mark: |
| Submit through RunAI Workload API | |
| Submit through RunAI UI | |

## Preparation

The following initial steps are required:

### RunAI

1. Create a new project or select an existing one to deploy the NIM into, for example `team-a`.
2. Enforce the RunAI scheduler in the project's namespace: `kubectl annotate ns runai-team-a runai/enforce-scheduler-name=true` (see the sketch below). For additional background, see the [RunAI Documentation](https://docs.run.ai/v2.18/admin/runai-setup/config/default-scheduler/).
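
Taken together, the RunAI preparation for a hypothetical `team-a` project looks roughly like this (RunAI prefixes project namespaces with `runai-`, hence `runai-team-a`):

```bash
# Confirm the project namespace created by RunAI exists
kubectl get ns runai-team-a

# Enforce the RunAI scheduler for workloads in the namespace
kubectl annotate ns runai-team-a runai/enforce-scheduler-name=true
```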

### NVIDIA NGC
1. Create an API key: follow the guidance in the [NVIDIA NIM Getting Started](https://docs.nvidia.com/nim/large-language-models/latest/getting-started.html#option-2-from-ngc) documentation to generate a properly scoped API key if you haven't already. For illustration purposes, the generated key is shown as `XXXYYYZZZ` below.
2. Add the NIM Helm repository to deploy NIM charts: `helm repo add nemo-ms "https://helm.ngc.nvidia.com/ohlfw0olaadg/ea-participants" --username=\$oauthtoken --password=XXXYYYZZZ`
3. Create a docker-registry secret to pull NIM images: `kubectl create secret docker-registry -n runai-team-a registry-secret --docker-username=\$oauthtoken --docker-password=XXXYYYZZZ`
4. Create a generic secret used to download models: `kubectl create secret generic ngc-api -n runai-team-a --from-literal=NGC_CLI_API_KEY=XXXYYYZZZ` (these steps are consolidated in the sketch below)
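
Taken together, the NGC preparation looks roughly like the following sketch (`NGC_API_KEY` is an illustrative variable; substitute your real key):

```bash
# Key generated from NGC; XXXYYYZZZ is a placeholder
export NGC_API_KEY=XXXYYYZZZ

# Add the Helm repository that hosts the NIM charts
# ($oauthtoken is the literal username, hence the single quotes)
helm repo add nemo-ms "https://helm.ngc.nvidia.com/ohlfw0olaadg/ea-participants" \
  --username='$oauthtoken' --password="$NGC_API_KEY"

# Image pull secret for the NIM container images
kubectl create secret docker-registry registry-secret -n runai-team-a \
  --docker-username='$oauthtoken' --docker-password="$NGC_API_KEY"

# Generic secret the NIM uses to download model weights from NGC
kubectl create secret generic ngc-api -n runai-team-a \
  --from-literal=NGC_CLI_API_KEY="$NGC_API_KEY"
```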

## Deployment

For any given NIM you want to deploy, prepare a `values.yaml` file, adjusting the values as needed:
```yaml
initContainers:
  ngcInit:
    # Init container that downloads the model from NGC before the NIM starts
    imageName: nvcr.io/ohlfw0olaadg/ea-participants/nim_llm
    imageTag: 24.06
    secretName: ngc-api   # generic secret created during preparation
    env:
      STORE_MOUNT_PATH: /model-store
      NGC_CLI_ORG: ohlfw0olaadg
      NGC_CLI_TEAM: ea-participants
      NGC_MODEL_NAME: llama2-13b-chat
      NGC_MODEL_VERSION: a100x2_fp16_24.06
      NGC_EXE: ngc
      DOWNLOAD_NGC_CLI: "true"
      NGC_CLI_VERSION: "3.34.1"
      MODEL_NAME: llama2-13b-chat

image:
  repository: nvcr.io/ohlfw0olaadg/ea-participants/nim_llm
  tag: 24.06

imagePullSecrets:
  - name: registry-secret   # docker-registry secret created during preparation

model:
  numGpus: 2   # matches the a100x2 profile in NGC_MODEL_VERSION
  name: llama2-13b-chat
  openai_port: 9999   # port for the OpenAI-compatible API
```

Run the following command:
```bash
helm -n runai-team-a install llama2-13b-chat-nim nemo-ms/nemollm-inference -f values.yaml
```
> [!IMPORTANT]
> - The namespace the Helm chart is deployed into is the RunAI project namespace (`runai-team-a`).
> - For other models, consult the [NVIDIA NIM Supported Models](https://docs.nvidia.com/nim/large-language-models/latest/support-matrix.html#supported-models) matrix.
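
Once the release is installed, one way to sanity-check it is to port-forward the service and query the OpenAI-compatible endpoint on `openai_port` (9999 above). A minimal sketch; the service name below is an assumption, so list the services in the namespace to find the actual one:

```bash
# Watch the pods start; the init container downloads the model first,
# which can take a while for a 13B model
kubectl get pods -n runai-team-a -w

# Find the service created by the chart, then forward its OpenAI port
kubectl get svc -n runai-team-a
kubectl port-forward -n runai-team-a svc/llama2-13b-chat-nim-nemollm-inference 9999:9999

# In another terminal, send a test completion request
curl -s http://localhost:9999/v1/completions \
  -H 'Content-Type: application/json' \
  -d '{"model": "llama2-13b-chat", "prompt": "Hello,", "max_tokens": 16}'
```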

## View the model within the RunAI UI

![nim on runai screenshot](runai_nim.png)

Binary file added docs/runai/runai_nim.png