diff --git a/src/content/docs/workers-ai/platform/limits.mdx b/src/content/docs/workers-ai/platform/limits.mdx
index 3a88f2032f1ef60..a1ed737afee39c5 100644
--- a/src/content/docs/workers-ai/platform/limits.mdx
+++ b/src/content/docs/workers-ai/platform/limits.mdx
@@ -3,10 +3,9 @@ pcx_content_type: configuration
 title: Limits
 sidebar:
   order: 2
-
 ---
 
-import { Render } from "~/components"
+import { Render } from "~/components";
 
 Workers AI is now Generally Available. We've updated our rate limits to reflect this.
 
@@ -20,48 +19,63 @@ Rate limits are default per task type, with some per-model limits defined as fol
 
 ### [Automatic Speech Recognition](/workers-ai/models/#automatic-speech-recognition)
 
-* 720 requests per minute
+- 720 requests per minute
 
 ### [Image Classification](/workers-ai/models/#image-classification)
 
-* 3000 requests per minute
+- 3000 requests per minute
 
 ### [Image-to-Text](/workers-ai/models/#image-to-text)
 
-* 720 requests per minute
+- 720 requests per minute
 
 ### [Object Detection](/workers-ai/models/#object-detection)
 
-* 3000 requests per minute
+- 3000 requests per minute
 
 ### [Summarization](/workers-ai/models/#summarization)
 
-* 1500 requests per minute
+- 1500 requests per minute
 
 ### [Text Classification](/workers-ai/models/#text-classification)
 
-* 2000 requests per minute
+- 2000 requests per minute
 
 ### [Text Embeddings](/workers-ai/models/#text-embeddings)
 
-* 3000 requests per minute
-* [@cf/baai/bge-large-en-v1.5](/workers-ai/models/bge-large-en-v1.5/) is 1500 requests per minute
+- 3000 requests per minute
+- [@cf/baai/bge-large-en-v1.5](/workers-ai/models/bge-large-en-v1.5/) is 1500 requests per minute
+
+#### Additional limits for Embedding Models
+
+When using `@cf/baai/bge` embedding models, the following limits apply:
+
+- The maximum token limit per input is 512 tokens.
+- The maximum batch size is 100 inputs per request.
+  - The total number of tokens across all inputs in the batch must not exceed internal processing limits.
+  - Larger inputs (closer to 512 tokens) may reduce the maximum batch size due to these constraints.
+
+#### Behavior and constraints
+
+1. Exceeding the batch size limit: If more than 100 inputs are provided, a `400 Bad Request` error is returned.
+2. Exceeding the token limit per input: If a single input exceeds 512 tokens, the request will fail with a `400 Bad Request` error.
+3. Combined constraints: Requests with both a high batch size and large token inputs may fail due to exceeding the model's processing limits.
 
 ### [Text Generation](/workers-ai/models/#text-generation)
 
-* 300 requests per minute
-* [@hf/thebloke/mistral-7b-instruct-v0.1-awq](/workers-ai/models/mistral-7b-instruct-v0.1-awq/) is 400 requests per minute
-* [@cf/microsoft/phi-2](/workers-ai/models/phi-2/) is 720 requests per minute
-* [@cf/qwen/qwen1.5-0.5b-chat](/workers-ai/models/qwen1.5-0.5b-chat/) is 1500 requests per minute
-* [@cf/qwen/qwen1.5-1.8b-chat](/workers-ai/models/qwen1.5-1.8b-chat/) is 720 requests per minute
-* [@cf/qwen/qwen1.5-14b-chat-awq](/workers-ai/models/qwen1.5-14b-chat-awq/) is 150 requests per minute
-* [@cf/tinyllama/tinyllama-1.1b-chat-v1.0](/workers-ai/models/tinyllama-1.1b-chat-v1.0/) is 720 requests per minute
+- 300 requests per minute
+- [@hf/thebloke/mistral-7b-instruct-v0.1-awq](/workers-ai/models/mistral-7b-instruct-v0.1-awq/) is 400 requests per minute
+- [@cf/microsoft/phi-2](/workers-ai/models/phi-2/) is 720 requests per minute
+- [@cf/qwen/qwen1.5-0.5b-chat](/workers-ai/models/qwen1.5-0.5b-chat/) is 1500 requests per minute
+- [@cf/qwen/qwen1.5-1.8b-chat](/workers-ai/models/qwen1.5-1.8b-chat/) is 720 requests per minute
+- [@cf/qwen/qwen1.5-14b-chat-awq](/workers-ai/models/qwen1.5-14b-chat-awq/) is 150 requests per minute
+- [@cf/tinyllama/tinyllama-1.1b-chat-v1.0](/workers-ai/models/tinyllama-1.1b-chat-v1.0/) is 720 requests per minute
 
 ### [Text-to-Image](/workers-ai/models/#text-to-image)
 
-* 720 requests per minute
-* [@cf/runwayml/stable-diffusion-v1-5-img2img](/workers-ai/models/stable-diffusion-v1-5-img2img/) is 1500 requests per minute
+- 720 requests per minute
+- [@cf/runwayml/stable-diffusion-v1-5-img2img](/workers-ai/models/stable-diffusion-v1-5-img2img/) is 1500 requests per minute
 
 ### [Translation](/workers-ai/models/#translation)
 
-* 720 requests per minute
+- 720 requests per minute
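The embedding batch limits added in this change can be respected client-side by chunking inputs before each request. The helper below is a hypothetical sketch (`chunkInputs` is not part of the Workers AI API); it only handles the 100-inputs-per-request cap and assumes each input has already been kept under the 512-token limit.

```typescript
// Hypothetical helper (not part of the Workers AI API): split a list of
// texts into batches that stay within the documented limit of 100 inputs
// per request. Per-input token counting (the 512-token cap) is assumed
// to have been enforced upstream.
const MAX_BATCH_SIZE = 100;

function chunkInputs(texts: string[], batchSize: number = MAX_BATCH_SIZE): string[][] {
  const batches: string[][] = [];
  for (let i = 0; i < texts.length; i += batchSize) {
    batches.push(texts.slice(i, i + batchSize));
  }
  return batches;
}
```

Inside a Worker, each batch could then be sent as its own call, for example `await env.AI.run("@cf/baai/bge-large-en-v1.5", { text: batch })`, so no single request triggers the `400 Bad Request` batch-size error.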