
Commit 69b4e0b

feat!: standardize base_url for inference

Completes #3732 by removing runtime URL transformations and requiring users to provide full URLs in configuration. All providers now use `base_url` consistently and respect the exact URL provided, without appending paths like `/v1` or `/openai/v1` at runtime.

Adds a unit test to enforce URL standardization across remote inference providers (verifies all use a `base_url` field with `HttpUrl | None` type).

BREAKING CHANGE: Users must update configs to include full URL paths (e.g., `http://localhost:11434/v1` instead of `http://localhost:11434`).

Signed-off-by: Charlie Doern <[email protected]>

1 parent 0128eff


67 files changed (+282 −227 lines)

docs/docs/providers/inference/remote_azure.mdx

Lines changed: 2 additions & 2 deletions

@@ -24,15 +24,15 @@ https://learn.microsoft.com/en-us/azure/ai-foundry/openai/overview
| `allowed_models` | `list[str \| None` | No | | List of models that should be registered with the model registry. If None, all models are allowed. |
| `refresh_models` | `<class 'bool'>` | No | False | Whether to refresh models periodically from the provider |
| `api_key` | `pydantic.types.SecretStr \| None` | No | | Authentication credential for the provider |
-| `api_base` | `<class 'pydantic.networks.HttpUrl'>` | No | | Azure API base for Azure (e.g., https://your-resource-name.openai.azure.com) |
+| `base_url` | `pydantic.networks.HttpUrl \| None` | No | | Azure API base for Azure (e.g., https://your-resource-name.openai.azure.com/openai/v1) |
| `api_version` | `str \| None` | No | | Azure API version for Azure (e.g., 2024-12-01-preview) |
| `api_type` | `str \| None` | No | azure | Azure API type for Azure (e.g., azure) |

## Sample Configuration

```yaml
api_key: ${env.AZURE_API_KEY:=}
-api_base: ${env.AZURE_API_BASE:=}
+base_url: ${env.AZURE_API_BASE:=}
api_version: ${env.AZURE_API_VERSION:=}
api_type: ${env.AZURE_API_TYPE:=}
```

docs/docs/providers/inference/remote_cerebras.mdx

Lines changed: 2 additions & 2 deletions

@@ -17,11 +17,11 @@ Cerebras inference provider for running models on Cerebras Cloud platform.
| `allowed_models` | `list[str \| None` | No | | List of models that should be registered with the model registry. If None, all models are allowed. |
| `refresh_models` | `<class 'bool'>` | No | False | Whether to refresh models periodically from the provider |
| `api_key` | `pydantic.types.SecretStr \| None` | No | | Authentication credential for the provider |
-| `base_url` | `<class 'str'>` | No | https://api.cerebras.ai | Base URL for the Cerebras API |
+| `base_url` | `pydantic.networks.HttpUrl \| None` | No | https://api.cerebras.ai/v1 | Base URL for the Cerebras API |

## Sample Configuration

```yaml
-base_url: https://api.cerebras.ai
+base_url: https://api.cerebras.ai/v1
api_key: ${env.CEREBRAS_API_KEY:=}
```

docs/docs/providers/inference/remote_databricks.mdx

Lines changed: 2 additions & 2 deletions

@@ -17,11 +17,11 @@ Databricks inference provider for running models on Databricks' unified analytic
| `allowed_models` | `list[str \| None` | No | | List of models that should be registered with the model registry. If None, all models are allowed. |
| `refresh_models` | `<class 'bool'>` | No | False | Whether to refresh models periodically from the provider |
| `api_token` | `pydantic.types.SecretStr \| None` | No | | The Databricks API token |
-| `url` | `str \| None` | No | | The URL for the Databricks model serving endpoint |
+| `base_url` | `pydantic.networks.HttpUrl \| None` | No | | The URL for the Databricks model serving endpoint (should include /serving-endpoints path) |

## Sample Configuration

```yaml
-url: ${env.DATABRICKS_HOST:=}
+base_url: ${env.DATABRICKS_HOST:=}
api_token: ${env.DATABRICKS_TOKEN:=}
```

docs/docs/providers/inference/remote_fireworks.mdx

Lines changed: 2 additions & 2 deletions

@@ -17,11 +17,11 @@ Fireworks AI inference provider for Llama models and other AI models on the Fire
| `allowed_models` | `list[str \| None` | No | | List of models that should be registered with the model registry. If None, all models are allowed. |
| `refresh_models` | `<class 'bool'>` | No | False | Whether to refresh models periodically from the provider |
| `api_key` | `pydantic.types.SecretStr \| None` | No | | Authentication credential for the provider |
-| `url` | `<class 'str'>` | No | https://api.fireworks.ai/inference/v1 | The URL for the Fireworks server |
+| `base_url` | `pydantic.networks.HttpUrl \| None` | No | https://api.fireworks.ai/inference/v1 | The URL for the Fireworks server |

## Sample Configuration

```yaml
-url: https://api.fireworks.ai/inference/v1
+base_url: https://api.fireworks.ai/inference/v1
api_key: ${env.FIREWORKS_API_KEY:=}
```

docs/docs/providers/inference/remote_groq.mdx

Lines changed: 2 additions & 2 deletions

@@ -17,11 +17,11 @@ Groq inference provider for ultra-fast inference using Groq's LPU technology.
| `allowed_models` | `list[str \| None` | No | | List of models that should be registered with the model registry. If None, all models are allowed. |
| `refresh_models` | `<class 'bool'>` | No | False | Whether to refresh models periodically from the provider |
| `api_key` | `pydantic.types.SecretStr \| None` | No | | Authentication credential for the provider |
-| `url` | `<class 'str'>` | No | https://api.groq.com | The URL for the Groq AI server |
+| `base_url` | `pydantic.networks.HttpUrl \| None` | No | https://api.groq.com/openai/v1 | The URL for the Groq AI server |

## Sample Configuration

```yaml
-url: https://api.groq.com
+base_url: https://api.groq.com/openai/v1
api_key: ${env.GROQ_API_KEY:=}
```

docs/docs/providers/inference/remote_llama-openai-compat.mdx

Lines changed: 2 additions & 2 deletions

@@ -17,11 +17,11 @@ Llama OpenAI-compatible provider for using Llama models with OpenAI API format.
| `allowed_models` | `list[str \| None` | No | | List of models that should be registered with the model registry. If None, all models are allowed. |
| `refresh_models` | `<class 'bool'>` | No | False | Whether to refresh models periodically from the provider |
| `api_key` | `pydantic.types.SecretStr \| None` | No | | Authentication credential for the provider |
-| `openai_compat_api_base` | `<class 'str'>` | No | https://api.llama.com/compat/v1/ | The URL for the Llama API server |
+| `base_url` | `pydantic.networks.HttpUrl \| None` | No | https://api.llama.com/compat/v1/ | The URL for the Llama API server |

## Sample Configuration

```yaml
-openai_compat_api_base: https://api.llama.com/compat/v1/
+base_url: https://api.llama.com/compat/v1/
api_key: ${env.LLAMA_API_KEY}
```

docs/docs/providers/inference/remote_nvidia.mdx

Lines changed: 2 additions & 4 deletions

@@ -17,15 +17,13 @@ NVIDIA inference provider for accessing NVIDIA NIM models and AI services.
| `allowed_models` | `list[str \| None` | No | | List of models that should be registered with the model registry. If None, all models are allowed. |
| `refresh_models` | `<class 'bool'>` | No | False | Whether to refresh models periodically from the provider |
| `api_key` | `pydantic.types.SecretStr \| None` | No | | Authentication credential for the provider |
-| `url` | `<class 'str'>` | No | https://integrate.api.nvidia.com | A base url for accessing the NVIDIA NIM |
+| `base_url` | `pydantic.networks.HttpUrl \| None` | No | https://integrate.api.nvidia.com/v1 | A base url for accessing the NVIDIA NIM |
| `timeout` | `<class 'int'>` | No | 60 | Timeout for the HTTP requests |
-| `append_api_version` | `<class 'bool'>` | No | True | When set to false, the API version will not be appended to the base_url. By default, it is true. |
| `rerank_model_to_url` | `dict[str, str` | No | `{'nv-rerank-qa-mistral-4b:1': 'https://ai.api.nvidia.com/v1/retrieval/nvidia/reranking', 'nvidia/nv-rerankqa-mistral-4b-v3': 'https://ai.api.nvidia.com/v1/retrieval/nvidia/nv-rerankqa-mistral-4b-v3/reranking', 'nvidia/llama-3.2-nv-rerankqa-1b-v2': 'https://ai.api.nvidia.com/v1/retrieval/nvidia/llama-3_2-nv-rerankqa-1b-v2/reranking'}` | Mapping of rerank model identifiers to their API endpoints. |

## Sample Configuration

```yaml
-url: ${env.NVIDIA_BASE_URL:=https://integrate.api.nvidia.com}
+base_url: ${env.NVIDIA_BASE_URL:=https://integrate.api.nvidia.com/v1}
api_key: ${env.NVIDIA_API_KEY:=}
-append_api_version: ${env.NVIDIA_APPEND_API_VERSION:=True}
```
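The NVIDIA diff illustrates the migration pattern for providers that previously appended an API version at runtime: with `append_api_version` removed, the `/v1` suffix must now appear in the configured URL itself. A before/after sketch, using the documented defaults:

```yaml
# Before: runtime appended /v1 to the url when append_api_version was true
url: https://integrate.api.nvidia.com
append_api_version: true

# After: the full path is provided explicitly; no runtime transformation
base_url: https://integrate.api.nvidia.com/v1
```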

docs/docs/providers/inference/remote_ollama.mdx

Lines changed: 2 additions & 2 deletions

@@ -16,10 +16,10 @@ Ollama inference provider for running local models through the Ollama runtime.
|-------|------|----------|---------|-------------|
| `allowed_models` | `list[str \| None` | No | | List of models that should be registered with the model registry. If None, all models are allowed. |
| `refresh_models` | `<class 'bool'>` | No | False | Whether to refresh models periodically from the provider |
-| `url` | `<class 'str'>` | No | http://localhost:11434 | |
+| `base_url` | `pydantic.networks.HttpUrl \| None` | No | http://localhost:11434/v1 | |

## Sample Configuration

```yaml
-url: ${env.OLLAMA_URL:=http://localhost:11434}
+base_url: ${env.OLLAMA_URL:=http://localhost:11434/v1}
```

docs/docs/providers/inference/remote_openai.mdx

Lines changed: 1 addition & 1 deletion

@@ -17,7 +17,7 @@ OpenAI inference provider for accessing GPT models and other OpenAI services.
| `allowed_models` | `list[str \| None` | No | | List of models that should be registered with the model registry. If None, all models are allowed. |
| `refresh_models` | `<class 'bool'>` | No | False | Whether to refresh models periodically from the provider |
| `api_key` | `pydantic.types.SecretStr \| None` | No | | Authentication credential for the provider |
-| `base_url` | `<class 'str'>` | No | https://api.openai.com/v1 | Base URL for OpenAI API |
+| `base_url` | `pydantic.networks.HttpUrl \| None` | No | https://api.openai.com/v1 | Base URL for OpenAI API |

## Sample Configuration

docs/docs/providers/inference/remote_passthrough.mdx

Lines changed: 2 additions & 2 deletions

@@ -17,11 +17,11 @@ Passthrough inference provider for connecting to any external inference service
| `allowed_models` | `list[str \| None` | No | | List of models that should be registered with the model registry. If None, all models are allowed. |
| `refresh_models` | `<class 'bool'>` | No | False | Whether to refresh models periodically from the provider |
| `api_key` | `pydantic.types.SecretStr \| None` | No | | Authentication credential for the provider |
-| `url` | `<class 'str'>` | No | | The URL for the passthrough endpoint |
+| `base_url` | `pydantic.networks.HttpUrl \| None` | No | | The URL for the passthrough endpoint |

## Sample Configuration

```yaml
-url: ${env.PASSTHROUGH_URL}
+base_url: ${env.PASSTHROUGH_URL}
api_key: ${env.PASSTHROUGH_API_KEY}
```
