4 changes: 2 additions & 2 deletions docs/docs/providers/inference/remote_azure.mdx
@@ -24,15 +24,15 @@ https://learn.microsoft.com/en-us/azure/ai-foundry/openai/overview
| `allowed_models` | `list[str] \| None` | No | | List of models that should be registered with the model registry. If None, all models are allowed. |
| `refresh_models` | `bool` | No | False | Whether to refresh models periodically from the provider |
| `api_key` | `SecretStr \| None` | No | | Authentication credential for the provider |
-| `api_base` | `HttpUrl` | No | | Azure API base for Azure (e.g., https://your-resource-name.openai.azure.com) |
+| `base_url` | `HttpUrl \| None` | No | | Azure API base for Azure (e.g., https://your-resource-name.openai.azure.com/openai/v1) |
| `api_version` | `str \| None` | No | | Azure API version for Azure (e.g., 2024-12-01-preview) |
| `api_type` | `str \| None` | No | azure | Azure API type for Azure (e.g., azure) |

## Sample Configuration

```yaml
api_key: ${env.AZURE_API_KEY:=}
-api_base: ${env.AZURE_API_BASE:=}
+base_url: ${env.AZURE_API_BASE:=}
api_version: ${env.AZURE_API_VERSION:=}
api_type: ${env.AZURE_API_TYPE:=}
```
4 changes: 2 additions & 2 deletions docs/docs/providers/inference/remote_cerebras.mdx
@@ -17,11 +17,11 @@ Cerebras inference provider for running models on Cerebras Cloud platform.
| `allowed_models` | `list[str] \| None` | No | | List of models that should be registered with the model registry. If None, all models are allowed. |
| `refresh_models` | `bool` | No | False | Whether to refresh models periodically from the provider |
| `api_key` | `SecretStr \| None` | No | | Authentication credential for the provider |
-| `base_url` | `str` | No | https://api.cerebras.ai | Base URL for the Cerebras API |
+| `base_url` | `HttpUrl \| None` | No | https://api.cerebras.ai/v1 | Base URL for the Cerebras API |

## Sample Configuration

```yaml
-base_url: https://api.cerebras.ai
+base_url: https://api.cerebras.ai/v1
api_key: ${env.CEREBRAS_API_KEY:=}
```
4 changes: 2 additions & 2 deletions docs/docs/providers/inference/remote_databricks.mdx
@@ -17,11 +17,11 @@ Databricks inference provider for running models on Databricks' unified analytic
| `allowed_models` | `list[str] \| None` | No | | List of models that should be registered with the model registry. If None, all models are allowed. |
| `refresh_models` | `bool` | No | False | Whether to refresh models periodically from the provider |
| `api_token` | `SecretStr \| None` | No | | The Databricks API token |
-| `url` | `str \| None` | No | | The URL for the Databricks model serving endpoint |
+| `base_url` | `HttpUrl \| None` | No | | The URL for the Databricks model serving endpoint (should include /serving-endpoints path) |

## Sample Configuration

```yaml
-url: ${env.DATABRICKS_HOST:=}
+base_url: ${env.DATABRICKS_HOST:=}
api_token: ${env.DATABRICKS_TOKEN:=}
```
4 changes: 2 additions & 2 deletions docs/docs/providers/inference/remote_fireworks.mdx
@@ -17,11 +17,11 @@ Fireworks AI inference provider for Llama models and other AI models on the Fire
| `allowed_models` | `list[str] \| None` | No | | List of models that should be registered with the model registry. If None, all models are allowed. |
| `refresh_models` | `bool` | No | False | Whether to refresh models periodically from the provider |
| `api_key` | `SecretStr \| None` | No | | Authentication credential for the provider |
-| `url` | `str` | No | https://api.fireworks.ai/inference/v1 | The URL for the Fireworks server |
+| `base_url` | `HttpUrl \| None` | No | https://api.fireworks.ai/inference/v1 | The URL for the Fireworks server |

## Sample Configuration

```yaml
-url: https://api.fireworks.ai/inference/v1
+base_url: https://api.fireworks.ai/inference/v1
api_key: ${env.FIREWORKS_API_KEY:=}
```
4 changes: 2 additions & 2 deletions docs/docs/providers/inference/remote_groq.mdx
@@ -17,11 +17,11 @@ Groq inference provider for ultra-fast inference using Groq's LPU technology.
| `allowed_models` | `list[str] \| None` | No | | List of models that should be registered with the model registry. If None, all models are allowed. |
| `refresh_models` | `bool` | No | False | Whether to refresh models periodically from the provider |
| `api_key` | `SecretStr \| None` | No | | Authentication credential for the provider |
-| `url` | `str` | No | https://api.groq.com | The URL for the Groq AI server |
+| `base_url` | `HttpUrl \| None` | No | https://api.groq.com/openai/v1 | The URL for the Groq AI server |

## Sample Configuration

```yaml
-url: https://api.groq.com
+base_url: https://api.groq.com/openai/v1
api_key: ${env.GROQ_API_KEY:=}
```
@@ -17,11 +17,11 @@ Llama OpenAI-compatible provider for using Llama models with OpenAI API format.
| `allowed_models` | `list[str] \| None` | No | | List of models that should be registered with the model registry. If None, all models are allowed. |
| `refresh_models` | `bool` | No | False | Whether to refresh models periodically from the provider |
| `api_key` | `SecretStr \| None` | No | | Authentication credential for the provider |
-| `openai_compat_api_base` | `str` | No | https://api.llama.com/compat/v1/ | The URL for the Llama API server |
+| `base_url` | `HttpUrl \| None` | No | https://api.llama.com/compat/v1/ | The URL for the Llama API server |

## Sample Configuration

```yaml
-openai_compat_api_base: https://api.llama.com/compat/v1/
+base_url: https://api.llama.com/compat/v1/
api_key: ${env.LLAMA_API_KEY}
```
6 changes: 2 additions & 4 deletions docs/docs/providers/inference/remote_nvidia.mdx
@@ -17,15 +17,13 @@ NVIDIA inference provider for accessing NVIDIA NIM models and AI services.
| `allowed_models` | `list[str] \| None` | No | | List of models that should be registered with the model registry. If None, all models are allowed. |
| `refresh_models` | `bool` | No | False | Whether to refresh models periodically from the provider |
| `api_key` | `SecretStr \| None` | No | | Authentication credential for the provider |
-| `url` | `str` | No | https://integrate.api.nvidia.com | A base url for accessing the NVIDIA NIM |
+| `base_url` | `HttpUrl \| None` | No | https://integrate.api.nvidia.com/v1 | A base url for accessing the NVIDIA NIM |
| `timeout` | `int` | No | 60 | Timeout for the HTTP requests |
-| `append_api_version` | `bool` | No | True | When set to false, the API version will not be appended to the base_url. By default, it is true. |
| `rerank_model_to_url` | `dict[str, str]` | No | `{'nv-rerank-qa-mistral-4b:1': 'https://ai.api.nvidia.com/v1/retrieval/nvidia/reranking', 'nvidia/nv-rerankqa-mistral-4b-v3': 'https://ai.api.nvidia.com/v1/retrieval/nvidia/nv-rerankqa-mistral-4b-v3/reranking', 'nvidia/llama-3.2-nv-rerankqa-1b-v2': 'https://ai.api.nvidia.com/v1/retrieval/nvidia/llama-3_2-nv-rerankqa-1b-v2/reranking'}` | Mapping of rerank model identifiers to their API endpoints. |

## Sample Configuration

```yaml
-url: ${env.NVIDIA_BASE_URL:=https://integrate.api.nvidia.com}
+base_url: ${env.NVIDIA_BASE_URL:=https://integrate.api.nvidia.com/v1}
api_key: ${env.NVIDIA_API_KEY:=}
-append_api_version: ${env.NVIDIA_APPEND_API_VERSION:=True}
```
4 changes: 2 additions & 2 deletions docs/docs/providers/inference/remote_ollama.mdx
@@ -16,10 +16,10 @@ Ollama inference provider for running local models through the Ollama runtime.
|-------|------|----------|---------|-------------|
| `allowed_models` | `list[str] \| None` | No | | List of models that should be registered with the model registry. If None, all models are allowed. |
| `refresh_models` | `bool` | No | False | Whether to refresh models periodically from the provider |
-| `url` | `str` | No | http://localhost:11434 | |
+| `base_url` | `HttpUrl \| None` | No | http://localhost:11434/v1 | |

## Sample Configuration

```yaml
-url: ${env.OLLAMA_URL:=http://localhost:11434}
+base_url: ${env.OLLAMA_URL:=http://localhost:11434/v1}
```
2 changes: 1 addition & 1 deletion docs/docs/providers/inference/remote_openai.mdx
@@ -17,7 +17,7 @@ OpenAI inference provider for accessing GPT models and other OpenAI services.
| `allowed_models` | `list[str] \| None` | No | | List of models that should be registered with the model registry. If None, all models are allowed. |
| `refresh_models` | `bool` | No | False | Whether to refresh models periodically from the provider |
| `api_key` | `SecretStr \| None` | No | | Authentication credential for the provider |
-| `base_url` | `str` | No | https://api.openai.com/v1 | Base URL for OpenAI API |
+| `base_url` | `HttpUrl \| None` | No | https://api.openai.com/v1 | Base URL for OpenAI API |

## Sample Configuration

4 changes: 2 additions & 2 deletions docs/docs/providers/inference/remote_passthrough.mdx
@@ -17,11 +17,11 @@ Passthrough inference provider for connecting to any external inference service
| `allowed_models` | `list[str] \| None` | No | | List of models that should be registered with the model registry. If None, all models are allowed. |
| `refresh_models` | `bool` | No | False | Whether to refresh models periodically from the provider |
| `api_key` | `SecretStr \| None` | No | | Authentication credential for the provider |
-| `url` | `str` | No | | The URL for the passthrough endpoint |
+| `base_url` | `HttpUrl \| None` | No | | The URL for the passthrough endpoint |

## Sample Configuration

```yaml
-url: ${env.PASSTHROUGH_URL}
+base_url: ${env.PASSTHROUGH_URL}
api_key: ${env.PASSTHROUGH_API_KEY}
```
4 changes: 2 additions & 2 deletions docs/docs/providers/inference/remote_runpod.mdx
@@ -17,11 +17,11 @@ RunPod inference provider for running models on RunPod's cloud GPU platform.
| `allowed_models` | `list[str] \| None` | No | | List of models that should be registered with the model registry. If None, all models are allowed. |
| `refresh_models` | `bool` | No | False | Whether to refresh models periodically from the provider |
| `api_token` | `SecretStr \| None` | No | | The API token |
-| `url` | `str \| None` | No | | The URL for the Runpod model serving endpoint |
+| `base_url` | `HttpUrl \| None` | No | | The URL for the Runpod model serving endpoint |

## Sample Configuration

```yaml
-url: ${env.RUNPOD_URL:=}
+base_url: ${env.RUNPOD_URL:=}
api_token: ${env.RUNPOD_API_TOKEN}
```
4 changes: 2 additions & 2 deletions docs/docs/providers/inference/remote_sambanova.mdx
@@ -17,11 +17,11 @@ SambaNova inference provider for running models on SambaNova's dataflow architec
| `allowed_models` | `list[str] \| None` | No | | List of models that should be registered with the model registry. If None, all models are allowed. |
| `refresh_models` | `bool` | No | False | Whether to refresh models periodically from the provider |
| `api_key` | `SecretStr \| None` | No | | Authentication credential for the provider |
-| `url` | `str` | No | https://api.sambanova.ai/v1 | The URL for the SambaNova AI server |
+| `base_url` | `HttpUrl \| None` | No | https://api.sambanova.ai/v1 | The URL for the SambaNova AI server |

## Sample Configuration

```yaml
-url: https://api.sambanova.ai/v1
+base_url: https://api.sambanova.ai/v1
api_key: ${env.SAMBANOVA_API_KEY:=}
```
4 changes: 2 additions & 2 deletions docs/docs/providers/inference/remote_tgi.mdx
@@ -16,10 +16,10 @@ Text Generation Inference (TGI) provider for HuggingFace model serving.
|-------|------|----------|---------|-------------|
| `allowed_models` | `list[str] \| None` | No | | List of models that should be registered with the model registry. If None, all models are allowed. |
| `refresh_models` | `bool` | No | False | Whether to refresh models periodically from the provider |
-| `url` | `str` | No | | The URL for the TGI serving endpoint |
+| `base_url` | `HttpUrl \| None` | No | | The URL for the TGI serving endpoint (should include /v1 path) |

## Sample Configuration

```yaml
-url: ${env.TGI_URL:=}
+base_url: ${env.TGI_URL:=}
```
4 changes: 2 additions & 2 deletions docs/docs/providers/inference/remote_together.mdx
@@ -17,11 +17,11 @@ Together AI inference provider for open-source models and collaborative AI devel
| `allowed_models` | `list[str] \| None` | No | | List of models that should be registered with the model registry. If None, all models are allowed. |
| `refresh_models` | `bool` | No | False | Whether to refresh models periodically from the provider |
| `api_key` | `SecretStr \| None` | No | | Authentication credential for the provider |
-| `url` | `str` | No | https://api.together.xyz/v1 | The URL for the Together AI server |
+| `base_url` | `HttpUrl \| None` | No | https://api.together.xyz/v1 | The URL for the Together AI server |

## Sample Configuration

```yaml
-url: https://api.together.xyz/v1
+base_url: https://api.together.xyz/v1
api_key: ${env.TOGETHER_API_KEY:=}
```
4 changes: 2 additions & 2 deletions docs/docs/providers/inference/remote_vllm.mdx
@@ -17,14 +17,14 @@ Remote vLLM inference provider for connecting to vLLM servers.
| `allowed_models` | `list[str] \| None` | No | | List of models that should be registered with the model registry. If None, all models are allowed. |
| `refresh_models` | `bool` | No | False | Whether to refresh models periodically from the provider |
| `api_token` | `SecretStr \| None` | No | | The API token |
-| `url` | `str \| None` | No | | The URL for the vLLM model serving endpoint |
+| `base_url` | `HttpUrl \| None` | No | | The URL for the vLLM model serving endpoint |
| `max_tokens` | `int` | No | 4096 | Maximum number of tokens to generate. |
| `tls_verify` | `bool \| str` | No | True | Whether to verify TLS certificates. Can be a boolean or a path to a CA certificate file. |

## Sample Configuration

```yaml
-url: ${env.VLLM_URL:=}
+base_url: ${env.VLLM_URL:=}
max_tokens: ${env.VLLM_MAX_TOKENS:=4096}
api_token: ${env.VLLM_API_TOKEN:=fake}
tls_verify: ${env.VLLM_TLS_VERIFY:=true}
4 changes: 2 additions & 2 deletions docs/docs/providers/inference/remote_watsonx.mdx
@@ -17,14 +17,14 @@ IBM WatsonX inference provider for accessing AI models on IBM's WatsonX platform
| `allowed_models` | `list[str] \| None` | No | | List of models that should be registered with the model registry. If None, all models are allowed. |
| `refresh_models` | `bool` | No | False | Whether to refresh models periodically from the provider |
| `api_key` | `SecretStr \| None` | No | | Authentication credential for the provider |
-| `url` | `str` | No | https://us-south.ml.cloud.ibm.com | A base url for accessing the watsonx.ai |
+| `base_url` | `HttpUrl \| None` | No | https://us-south.ml.cloud.ibm.com | A base url for accessing the watsonx.ai |
| `project_id` | `str \| None` | No | | The watsonx.ai project ID |
| `timeout` | `int` | No | 60 | Timeout for the HTTP requests |

## Sample Configuration

```yaml
-url: ${env.WATSONX_BASE_URL:=https://us-south.ml.cloud.ibm.com}
+base_url: ${env.WATSONX_BASE_URL:=https://us-south.ml.cloud.ibm.com}
api_key: ${env.WATSONX_API_KEY:=}
project_id: ${env.WATSONX_PROJECT_ID:=}
```
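For configs written against the old field names, the rename shown in the provider docs above could be applied mechanically. The following is a minimal Python sketch, not code from this PR: `rename_url_key` and `LEGACY_URL_KEYS` are hypothetical names. The sketch only renames the keys that appear in the removed lines (`url`, `api_base`, `openai_compat_api_base`). It does not add the `/v1` path that several of the new defaults include.

```python
# Hypothetical migration helper; not part of llama-stack.
# Legacy key names are taken from the removed lines in the docs diffs.
LEGACY_URL_KEYS = ("url", "api_base", "openai_compat_api_base")


def rename_url_key(config: dict) -> dict:
    """Return a copy of a provider config with legacy URL keys renamed to base_url."""
    migrated = dict(config)  # shallow copy so the caller's dict is untouched
    for key in LEGACY_URL_KEYS:
        if key in migrated:
            value = migrated.pop(key)
            # Keep an explicit base_url if the config already has one.
            migrated.setdefault("base_url", value)
    return migrated


if __name__ == "__main__":
    old = {"url": "https://api.together.xyz/v1", "api_key": "secret"}
    print(rename_url_key(old))
```

Values are passed through unchanged, so a config that relied on the old Groq or Cerebras default host would still need its path updated by hand.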
4 changes: 2 additions & 2 deletions scripts/docker.sh
@@ -287,9 +287,9 @@ start_container() {
# On macOS/Windows, use host.docker.internal to reach host from container
# On Linux with --network host, use localhost
if [[ "$(uname)" == "Darwin" ]] || [[ "$(uname)" == *"MINGW"* ]]; then
-OLLAMA_URL="${OLLAMA_URL:-http://host.docker.internal:11434}"
+OLLAMA_URL="${OLLAMA_URL:-http://host.docker.internal:11434/v1}"
else
-OLLAMA_URL="${OLLAMA_URL:-http://localhost:11434}"
+OLLAMA_URL="${OLLAMA_URL:-http://localhost:11434/v1}"
fi
DOCKER_ENV_VARS="$DOCKER_ENV_VARS -e OLLAMA_URL=$OLLAMA_URL"

2 changes: 1 addition & 1 deletion scripts/install.sh
@@ -640,7 +640,7 @@ cmd=( run -d "${PLATFORM_OPTS[@]}" --name llama-stack \
--network llama-net \
-p "${PORT}:${PORT}" \
"${server_env_opts[@]}" \
--e OLLAMA_URL="http://ollama-server:${OLLAMA_PORT}" \
+-e OLLAMA_URL="http://ollama-server:${OLLAMA_PORT}/v1" \
"${SERVER_IMAGE}" --port "${PORT}")

log "🦙 Starting Llama Stack..."
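Both scripts now default `OLLAMA_URL` to a value ending in `/v1`. An environment that still exports the older form, without the path, would presumably need the suffix added. A minimal Python sketch of that normalization follows; `ensure_path_suffix` is a hypothetical helper name, and the assumption that the suffix is always `/v1` holds only for the Ollama values shown in this diff.

```python
from urllib.parse import urlsplit, urlunsplit


def ensure_path_suffix(url: str, suffix: str = "/v1") -> str:
    """Append `suffix` to the URL path unless the path already ends with it.

    Hypothetical helper for illustration; not part of llama-stack.
    """
    parts = urlsplit(url)
    path = parts.path.rstrip("/")  # treat ".../v1/" and ".../v1" alike
    if not path.endswith(suffix):
        path += suffix
    return urlunsplit((parts.scheme, parts.netloc, path, parts.query, parts.fragment))


if __name__ == "__main__":
    # Old-style value from the removed lines, then a new-style value.
    print(ensure_path_suffix("http://localhost:11434"))
    print(ensure_path_suffix("http://localhost:11434/v1"))
```

The helper is idempotent, so it can be applied to values that already carry the suffix.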
@@ -17,32 +17,32 @@ providers:
- provider_id: ${env.CEREBRAS_API_KEY:+cerebras}
provider_type: remote::cerebras
config:
-base_url: https://api.cerebras.ai
+base_url: https://api.cerebras.ai/v1
api_key: ${env.CEREBRAS_API_KEY:=}
- provider_id: ${env.OLLAMA_URL:+ollama}
provider_type: remote::ollama
config:
-url: ${env.OLLAMA_URL:=http://localhost:11434}
+base_url: ${env.OLLAMA_URL:=http://localhost:11434/v1}
- provider_id: ${env.VLLM_URL:+vllm}
provider_type: remote::vllm
config:
-url: ${env.VLLM_URL:=}
+base_url: ${env.VLLM_URL:=}
max_tokens: ${env.VLLM_MAX_TOKENS:=4096}
api_token: ${env.VLLM_API_TOKEN:=fake}
tls_verify: ${env.VLLM_TLS_VERIFY:=true}
- provider_id: ${env.TGI_URL:+tgi}
provider_type: remote::tgi
config:
-url: ${env.TGI_URL:=}
+base_url: ${env.TGI_URL:=}
- provider_id: fireworks
provider_type: remote::fireworks
config:
-url: https://api.fireworks.ai/inference/v1
+base_url: https://api.fireworks.ai/inference/v1
api_key: ${env.FIREWORKS_API_KEY:=}
- provider_id: together
provider_type: remote::together
config:
-url: https://api.together.xyz/v1
+base_url: https://api.together.xyz/v1
api_key: ${env.TOGETHER_API_KEY:=}
- provider_id: bedrock
provider_type: remote::bedrock
@@ -52,9 +52,8 @@ providers:
- provider_id: ${env.NVIDIA_API_KEY:+nvidia}
provider_type: remote::nvidia
config:
-url: ${env.NVIDIA_BASE_URL:=https://integrate.api.nvidia.com}
+base_url: ${env.NVIDIA_BASE_URL:=https://integrate.api.nvidia.com/v1}
api_key: ${env.NVIDIA_API_KEY:=}
-append_api_version: ${env.NVIDIA_APPEND_API_VERSION:=True}
- provider_id: openai
provider_type: remote::openai
config:
@@ -76,18 +75,18 @@ providers:
- provider_id: groq
provider_type: remote::groq
config:
-url: https://api.groq.com
+base_url: https://api.groq.com/openai/v1
api_key: ${env.GROQ_API_KEY:=}
- provider_id: sambanova
provider_type: remote::sambanova
config:
-url: https://api.sambanova.ai/v1
+base_url: https://api.sambanova.ai/v1
api_key: ${env.SAMBANOVA_API_KEY:=}
- provider_id: ${env.AZURE_API_KEY:+azure}
provider_type: remote::azure
config:
api_key: ${env.AZURE_API_KEY:=}
-api_base: ${env.AZURE_API_BASE:=}
+base_url: ${env.AZURE_API_BASE:=}
api_version: ${env.AZURE_API_VERSION:=}
api_type: ${env.AZURE_API_TYPE:=}
- provider_id: sentence-transformers
21 changes: 10 additions & 11 deletions src/llama_stack/distributions/ci-tests/run.yaml
@@ -17,32 +17,32 @@ providers:
- provider_id: ${env.CEREBRAS_API_KEY:+cerebras}
provider_type: remote::cerebras
config:
-base_url: https://api.cerebras.ai
+base_url: https://api.cerebras.ai/v1
api_key: ${env.CEREBRAS_API_KEY:=}
- provider_id: ${env.OLLAMA_URL:+ollama}
provider_type: remote::ollama
config:
-url: ${env.OLLAMA_URL:=http://localhost:11434}
+base_url: ${env.OLLAMA_URL:=http://localhost:11434/v1}
- provider_id: ${env.VLLM_URL:+vllm}
provider_type: remote::vllm
config:
-url: ${env.VLLM_URL:=}
+base_url: ${env.VLLM_URL:=}
max_tokens: ${env.VLLM_MAX_TOKENS:=4096}
api_token: ${env.VLLM_API_TOKEN:=fake}
tls_verify: ${env.VLLM_TLS_VERIFY:=true}
- provider_id: ${env.TGI_URL:+tgi}
provider_type: remote::tgi
config:
-url: ${env.TGI_URL:=}
+base_url: ${env.TGI_URL:=}
- provider_id: fireworks
provider_type: remote::fireworks
config:
-url: https://api.fireworks.ai/inference/v1
+base_url: https://api.fireworks.ai/inference/v1
api_key: ${env.FIREWORKS_API_KEY:=}
- provider_id: together
provider_type: remote::together
config:
-url: https://api.together.xyz/v1
+base_url: https://api.together.xyz/v1
api_key: ${env.TOGETHER_API_KEY:=}
- provider_id: bedrock
provider_type: remote::bedrock
@@ -52,9 +52,8 @@ providers:
- provider_id: ${env.NVIDIA_API_KEY:+nvidia}
provider_type: remote::nvidia
config:
-url: ${env.NVIDIA_BASE_URL:=https://integrate.api.nvidia.com}
+base_url: ${env.NVIDIA_BASE_URL:=https://integrate.api.nvidia.com/v1}
api_key: ${env.NVIDIA_API_KEY:=}
-append_api_version: ${env.NVIDIA_APPEND_API_VERSION:=True}
- provider_id: openai
provider_type: remote::openai
config:
@@ -76,18 +75,18 @@ providers:
- provider_id: groq
provider_type: remote::groq
config:
-url: https://api.groq.com
+base_url: https://api.groq.com/openai/v1
api_key: ${env.GROQ_API_KEY:=}
- provider_id: sambanova
provider_type: remote::sambanova
config:
-url: https://api.sambanova.ai/v1
+base_url: https://api.sambanova.ai/v1
api_key: ${env.SAMBANOVA_API_KEY:=}
- provider_id: ${env.AZURE_API_KEY:+azure}
provider_type: remote::azure
config:
api_key: ${env.AZURE_API_KEY:=}
-api_base: ${env.AZURE_API_BASE:=}
+base_url: ${env.AZURE_API_BASE:=}
api_version: ${env.AZURE_API_VERSION:=}
api_type: ${env.AZURE_API_TYPE:=}
- provider_id: sentence-transformers
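The run configs above rely on two placeholder forms: `${env.NAME:=default}` for values and `${env.NAME:+value}` for conditional `provider_id` entries. As a rough illustration of how such placeholders might resolve, here is a small Python sketch. It is not the llama-stack resolver. The semantics are an assumption modeled on shell-style parameter expansion (`:=` falls back to the default when the variable is unset or empty, `:+` yields the value only when the variable is set and non-empty), and `resolve` is a hypothetical name.

```python
import re

# Matches ${env.NAME}, ${env.NAME:=default} and ${env.NAME:+value}.
_ENV_REF = re.compile(r"\$\{env\.([A-Za-z_][A-Za-z0-9_]*)(?::([=+])([^}]*))?\}")


def resolve(text: str, env: dict) -> str:
    """Expand env placeholders in `text` using the mapping `env`.

    Assumed semantics (shell-like); illustrative only.
    """

    def _sub(match: re.Match) -> str:
        name, op, arg = match.group(1), match.group(2), match.group(3)
        value = env.get(name, "")
        if op == "=":
            # Use the default when the variable is unset or empty.
            return value or arg
        if op == "+":
            # Emit the argument only when the variable is set and non-empty.
            return arg if value else ""
        if not value:
            raise KeyError(f"environment variable {name} is required")
        return value

    return _ENV_REF.sub(_sub, text)


if __name__ == "__main__":
    print(resolve("${env.OLLAMA_URL:=http://localhost:11434/v1}", {}))
    print(resolve("${env.CEREBRAS_API_KEY:+cerebras}", {"CEREBRAS_API_KEY": "k"}))
```

Under these assumed semantics, an unset `OLLAMA_URL` would resolve `base_url` to the new `/v1` default, while the `:+` form in `provider_id` would resolve to an empty string.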