
Commit d5cd0ee (parent: 91f1b35)

feat!: standardize base_url for inference (#4177)

# What does this PR do?

Completes #3732 by removing runtime URL transformations and requiring users to provide full URLs in configuration. All providers now use `base_url` consistently and respect the exact URL provided, without appending paths like `/v1` or `/openai/v1` at runtime.

BREAKING CHANGE: Users must update configs to include full URL paths (e.g., `http://localhost:11434/v1` instead of `http://localhost:11434`).

Closes #3732

## Test Plan

Existing tests should pass even with the URL changes, since the default URLs have been updated accordingly. A unit test is added to enforce URL standardization across remote inference providers (it verifies that all use a `base_url` field of type `HttpUrl | None`).

Signed-off-by: Charlie Doern <[email protected]>


67 files changed (+282, −227 lines)

docs/docs/providers/inference/remote_azure.mdx (2 additions, 2 deletions)

@@ -24,15 +24,15 @@ https://learn.microsoft.com/en-us/azure/ai-foundry/openai/overview
 | `allowed_models` | `list[str] \| None` | No | | List of models that should be registered with the model registry. If None, all models are allowed. |
 | `refresh_models` | `bool` | No | False | Whether to refresh models periodically from the provider |
 | `api_key` | `SecretStr \| None` | No | | Authentication credential for the provider |
-| `api_base` | `HttpUrl` | No | | Azure API base for Azure (e.g., https://your-resource-name.openai.azure.com) |
+| `base_url` | `HttpUrl \| None` | No | | Azure API base for Azure (e.g., https://your-resource-name.openai.azure.com/openai/v1) |
 | `api_version` | `str \| None` | No | | Azure API version for Azure (e.g., 2024-12-01-preview) |
 | `api_type` | `str \| None` | No | azure | Azure API type for Azure (e.g., azure) |

 ## Sample Configuration

 ```yaml
 api_key: ${env.AZURE_API_KEY:=}
-api_base: ${env.AZURE_API_BASE:=}
+base_url: ${env.AZURE_API_BASE:=}
 api_version: ${env.AZURE_API_VERSION:=}
 api_type: ${env.AZURE_API_TYPE:=}
 ```

docs/docs/providers/inference/remote_cerebras.mdx (2 additions, 2 deletions)

@@ -17,11 +17,11 @@ Cerebras inference provider for running models on Cerebras Cloud platform.
 | `allowed_models` | `list[str] \| None` | No | | List of models that should be registered with the model registry. If None, all models are allowed. |
 | `refresh_models` | `bool` | No | False | Whether to refresh models periodically from the provider |
 | `api_key` | `SecretStr \| None` | No | | Authentication credential for the provider |
-| `base_url` | `str` | No | https://api.cerebras.ai | Base URL for the Cerebras API |
+| `base_url` | `HttpUrl \| None` | No | https://api.cerebras.ai/v1 | Base URL for the Cerebras API |

 ## Sample Configuration

 ```yaml
-base_url: https://api.cerebras.ai
+base_url: https://api.cerebras.ai/v1
 api_key: ${env.CEREBRAS_API_KEY:=}
 ```

docs/docs/providers/inference/remote_databricks.mdx (2 additions, 2 deletions)

@@ -17,11 +17,11 @@ Databricks inference provider for running models on Databricks' unified analytic
 | `allowed_models` | `list[str] \| None` | No | | List of models that should be registered with the model registry. If None, all models are allowed. |
 | `refresh_models` | `bool` | No | False | Whether to refresh models periodically from the provider |
 | `api_token` | `SecretStr \| None` | No | | The Databricks API token |
-| `url` | `str \| None` | No | | The URL for the Databricks model serving endpoint |
+| `base_url` | `HttpUrl \| None` | No | | The URL for the Databricks model serving endpoint (should include /serving-endpoints path) |

 ## Sample Configuration

 ```yaml
-url: ${env.DATABRICKS_HOST:=}
+base_url: ${env.DATABRICKS_HOST:=}
 api_token: ${env.DATABRICKS_TOKEN:=}
 ```

docs/docs/providers/inference/remote_fireworks.mdx (2 additions, 2 deletions)

@@ -17,11 +17,11 @@ Fireworks AI inference provider for Llama models and other AI models on the Fire
 | `allowed_models` | `list[str] \| None` | No | | List of models that should be registered with the model registry. If None, all models are allowed. |
 | `refresh_models` | `bool` | No | False | Whether to refresh models periodically from the provider |
 | `api_key` | `SecretStr \| None` | No | | Authentication credential for the provider |
-| `url` | `str` | No | https://api.fireworks.ai/inference/v1 | The URL for the Fireworks server |
+| `base_url` | `HttpUrl \| None` | No | https://api.fireworks.ai/inference/v1 | The URL for the Fireworks server |

 ## Sample Configuration

 ```yaml
-url: https://api.fireworks.ai/inference/v1
+base_url: https://api.fireworks.ai/inference/v1
 api_key: ${env.FIREWORKS_API_KEY:=}
 ```

docs/docs/providers/inference/remote_groq.mdx (2 additions, 2 deletions)

@@ -17,11 +17,11 @@ Groq inference provider for ultra-fast inference using Groq's LPU technology.
 | `allowed_models` | `list[str] \| None` | No | | List of models that should be registered with the model registry. If None, all models are allowed. |
 | `refresh_models` | `bool` | No | False | Whether to refresh models periodically from the provider |
 | `api_key` | `SecretStr \| None` | No | | Authentication credential for the provider |
-| `url` | `str` | No | https://api.groq.com | The URL for the Groq AI server |
+| `base_url` | `HttpUrl \| None` | No | https://api.groq.com/openai/v1 | The URL for the Groq AI server |

 ## Sample Configuration

 ```yaml
-url: https://api.groq.com
+base_url: https://api.groq.com/openai/v1
 api_key: ${env.GROQ_API_KEY:=}
 ```

docs/docs/providers/inference/remote_llama-openai-compat.mdx (2 additions, 2 deletions)

@@ -17,11 +17,11 @@ Llama OpenAI-compatible provider for using Llama models with OpenAI API format.
 | `allowed_models` | `list[str] \| None` | No | | List of models that should be registered with the model registry. If None, all models are allowed. |
 | `refresh_models` | `bool` | No | False | Whether to refresh models periodically from the provider |
 | `api_key` | `SecretStr \| None` | No | | Authentication credential for the provider |
-| `openai_compat_api_base` | `str` | No | https://api.llama.com/compat/v1/ | The URL for the Llama API server |
+| `base_url` | `HttpUrl \| None` | No | https://api.llama.com/compat/v1/ | The URL for the Llama API server |

 ## Sample Configuration

 ```yaml
-openai_compat_api_base: https://api.llama.com/compat/v1/
+base_url: https://api.llama.com/compat/v1/
 api_key: ${env.LLAMA_API_KEY}
 ```

docs/docs/providers/inference/remote_nvidia.mdx (2 additions, 4 deletions)

@@ -17,15 +17,13 @@ NVIDIA inference provider for accessing NVIDIA NIM models and AI services.
 | `allowed_models` | `list[str] \| None` | No | | List of models that should be registered with the model registry. If None, all models are allowed. |
 | `refresh_models` | `bool` | No | False | Whether to refresh models periodically from the provider |
 | `api_key` | `SecretStr \| None` | No | | Authentication credential for the provider |
-| `url` | `str` | No | https://integrate.api.nvidia.com | A base url for accessing the NVIDIA NIM |
+| `base_url` | `HttpUrl \| None` | No | https://integrate.api.nvidia.com/v1 | A base url for accessing the NVIDIA NIM |
 | `timeout` | `int` | No | 60 | Timeout for the HTTP requests |
-| `append_api_version` | `bool` | No | True | When set to false, the API version will not be appended to the base_url. By default, it is true. |
 | `rerank_model_to_url` | `dict[str, str]` | No | `{'nv-rerank-qa-mistral-4b:1': 'https://ai.api.nvidia.com/v1/retrieval/nvidia/reranking', 'nvidia/nv-rerankqa-mistral-4b-v3': 'https://ai.api.nvidia.com/v1/retrieval/nvidia/nv-rerankqa-mistral-4b-v3/reranking', 'nvidia/llama-3.2-nv-rerankqa-1b-v2': 'https://ai.api.nvidia.com/v1/retrieval/nvidia/llama-3_2-nv-rerankqa-1b-v2/reranking'}` | Mapping of rerank model identifiers to their API endpoints. |

 ## Sample Configuration

 ```yaml
-url: ${env.NVIDIA_BASE_URL:=https://integrate.api.nvidia.com}
+base_url: ${env.NVIDIA_BASE_URL:=https://integrate.api.nvidia.com/v1}
 api_key: ${env.NVIDIA_API_KEY:=}
-append_api_version: ${env.NVIDIA_APPEND_API_VERSION:=True}
 ```
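The NVIDIA section is the only one that also drops a flag: `append_api_version` previously controlled whether the API version was appended to the URL at runtime, and after this change the configured URL is used verbatim, with the default moving to `https://integrate.api.nvidia.com/v1`. A sketch of that behavioral difference; the function names are illustrative, not the project's actual code, and the old behavior is approximated from the flag's description:

```python
def old_endpoint(url: str, append_api_version: bool = True) -> str:
    # Pre-change behavior (approximated): optionally append /v1 at runtime.
    return url.rstrip("/") + "/v1" if append_api_version else url


def new_endpoint(base_url: str) -> str:
    # Post-change behavior: the configured URL is respected exactly.
    return base_url


# The old default config plus runtime appending...
assert old_endpoint("https://integrate.api.nvidia.com") == "https://integrate.api.nvidia.com/v1"
# ...produces the same endpoint as the new default config used verbatim.
assert new_endpoint("https://integrate.api.nvidia.com/v1") == "https://integrate.api.nvidia.com/v1"
```

Updating the default to include `/v1` is what lets existing tests keep passing, as noted in the Test Plan.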

docs/docs/providers/inference/remote_ollama.mdx (2 additions, 2 deletions)

@@ -16,10 +16,10 @@ Ollama inference provider for running local models through the Ollama runtime.
 |-------|------|----------|---------|-------------|
 | `allowed_models` | `list[str] \| None` | No | | List of models that should be registered with the model registry. If None, all models are allowed. |
 | `refresh_models` | `bool` | No | False | Whether to refresh models periodically from the provider |
-| `url` | `str` | No | http://localhost:11434 | |
+| `base_url` | `HttpUrl \| None` | No | http://localhost:11434/v1 | |

 ## Sample Configuration

 ```yaml
-url: ${env.OLLAMA_URL:=http://localhost:11434}
+base_url: ${env.OLLAMA_URL:=http://localhost:11434/v1}
 ```
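Because URLs are no longer rewritten at runtime, existing configs that point at a bare host (the commit message's `http://localhost:11434` example above) must gain the provider's path suffix. A hypothetical migration helper, not part of the project, sketching the idempotent rewrite:

```python
from urllib.parse import urlparse


def migrate_base_url(url: str, suffix: str = "/v1") -> str:
    """Append the API path suffix to a legacy bare-host URL, idempotently."""
    parsed = urlparse(url)
    path = parsed.path.rstrip("/")
    if path.endswith(suffix):
        return url  # already migrated, leave untouched
    return parsed._replace(path=path + suffix).geturl()


assert migrate_base_url("http://localhost:11434") == "http://localhost:11434/v1"
assert migrate_base_url("http://localhost:11434/v1") == "http://localhost:11434/v1"
```

Note that the suffix varies by provider: Ollama and NVIDIA use `/v1`, while Groq and Azure use `/openai/v1`, e.g. `migrate_base_url("https://api.groq.com", "/openai/v1")`.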

docs/docs/providers/inference/remote_openai.mdx (1 addition, 1 deletion)

@@ -17,7 +17,7 @@ OpenAI inference provider for accessing GPT models and other OpenAI services.
 | `allowed_models` | `list[str] \| None` | No | | List of models that should be registered with the model registry. If None, all models are allowed. |
 | `refresh_models` | `bool` | No | False | Whether to refresh models periodically from the provider |
 | `api_key` | `SecretStr \| None` | No | | Authentication credential for the provider |
-| `base_url` | `str` | No | https://api.openai.com/v1 | Base URL for OpenAI API |
+| `base_url` | `HttpUrl \| None` | No | https://api.openai.com/v1 | Base URL for OpenAI API |

 ## Sample Configuration

docs/docs/providers/inference/remote_passthrough.mdx (2 additions, 2 deletions)

@@ -17,11 +17,11 @@ Passthrough inference provider for connecting to any external inference service
 | `allowed_models` | `list[str] \| None` | No | | List of models that should be registered with the model registry. If None, all models are allowed. |
 | `refresh_models` | `bool` | No | False | Whether to refresh models periodically from the provider |
 | `api_key` | `SecretStr \| None` | No | | Authentication credential for the provider |
-| `url` | `str` | No | | The URL for the passthrough endpoint |
+| `base_url` | `HttpUrl \| None` | No | | The URL for the passthrough endpoint |

 ## Sample Configuration

 ```yaml
-url: ${env.PASSTHROUGH_URL}
+base_url: ${env.PASSTHROUGH_URL}
 api_key: ${env.PASSTHROUGH_API_KEY}
 ```

Comments (0)