[Workers AI]Add embedding model token and batch size limits #18975

Closed
daisyfaithauma wants to merge 1 commit

@daisyfaithauma daisyfaithauma commented Dec 31, 2024

Summary

  • Add embedding model token and batch size limits

Documentation checklist

  • The documentation style guide has been adhered to.
  • If this is a larger change, such as adding a new page, an issue has been opened in relation to any incorrect or out-of-date information that this PR fixes.
  • Files which have changed name or location have been allocated redirects.

@github-actions github-actions bot added the product:workers-ai Workers AI: https://developers.cloudflare.com/workers-ai/ label Dec 31, 2024
@daisyfaithauma daisyfaithauma changed the title [AIG]Add embedding model token and batch size limits [Workers AI]Add embedding model token and batch size limits Dec 31, 2024

Deploying cloudflare-docs with Cloudflare Pages

Latest commit: 7689359
Status: ✅  Deploy successful!
Preview URL: https://fe6ae089.cloudflare-docs-7ou.pages.dev
Branch Preview URL: https://clarify-input-limits-workers.cloudflare-docs-7ou.pages.dev

@craigsdennis craigsdennis left a comment

I didn't know this page existed! @mchenco, can you weigh in on this page? Shouldn't this come from the API?

- [@cf/qwen/qwen1.5-0.5b-chat](/workers-ai/models/qwen1.5-0.5b-chat/) is 1500 requests per minute
- [@cf/qwen/qwen1.5-1.8b-chat](/workers-ai/models/qwen1.5-1.8b-chat/) is 720 requests per minute
- [@cf/qwen/qwen1.5-14b-chat-awq](/workers-ai/models/qwen1.5-14b-chat-awq/) is 150 requests per minute
- [@cf/tinyllama/tinyllama-1.1b-chat-v1.0](/workers-ai/models/tinyllama-1.1b-chat-v1.0/) is 720 requests per minute

There are a bunch more models here

I want to check whether this page is still accurate and still wanted. I wonder if this should come from the API instead.

@daisyfaithauma (Contributor Author)

Closed because it's no longer relevant.

Labels
product:workers-ai Workers AI: https://developers.cloudflare.com/workers-ai/ size/s
5 participants