feat: add configurable TTFT timeout fallback #576
Open
NaNomicon wants to merge 9 commits into decolua:master from
Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent)
Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
Verification note: I validated the feature behavior on the fork branches before splitting these PRs. When re-verifying the cherry-picked upstream branches locally, the existing upstream …
Summary
- Add `ttftTimeoutMs` and `ttftCooldownMs` settings so that slow streaming requests can abort before the first token arrives and fall back to the next eligible account or combo member.
- Support `ttft_timeout` in fallback handling, race request start against first-byte arrival in `chatCore`, and defer stream success bookkeeping until the first chunk actually arrives.

Why
Some upstream providers can hold a streaming request open for too long before the first byte arrives. When that happens today, fallback is delayed even if healthier accounts or combo members are available. This change lets operators cap time-to-first-token and recover automatically instead of waiting on a slow first response.
Notes
- `0` keeps the TTFT timeout disabled.
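To make the summary concrete, here is a minimal sketch of the described flow: race the first chunk against a TTFT deadline, treat a miss as a `ttft_timeout`, put the slow candidate into cooldown, fall back to the next eligible one, and only record success once data actually arrives. This is not the PR's `chatCore` code. TypeScript is assumed, and every name here (`Candidate`, `TtftTimeoutError`, `firstChunk`, `streamWithFallback`, `cooldownUntil`, `onSuccess`) is hypothetical. It also assumes each account or combo member can be modeled as something that opens an `AsyncIterable` of chunks and honors an `AbortSignal`.

```typescript
// Hypothetical sketch, not the PR's implementation.
// Assumes each candidate (account or combo member) opens an async stream of chunks.

class TtftTimeoutError extends Error {
  readonly reason = "ttft_timeout";
  constructor(readonly timeoutMs: number) {
    super(`no first chunk within ${timeoutMs} ms`);
    this.name = "TtftTimeoutError";
  }
}

// Checks the `reason` tag rather than relying only on instanceof.
function isTtftTimeout(err: unknown): err is TtftTimeoutError {
  return (err as { reason?: unknown } | null)?.reason === "ttft_timeout";
}

interface Candidate {
  id: string;
  open(signal: AbortSignal): AsyncIterable<string>;
}

interface TtftOptions {
  ttftTimeoutMs: number; // 0 (or less) disables the TTFT timeout
  ttftCooldownMs: number; // how long a timed-out candidate is skipped
}

// Race the first iterator result against the TTFT deadline.
async function firstChunk(
  it: AsyncIterator<string>,
  timeoutMs: number,
  controller: AbortController,
): Promise<IteratorResult<string>> {
  const next = it.next();
  if (timeoutMs <= 0) return next; // timeout disabled: just wait
  let timer: ReturnType<typeof setTimeout> | undefined;
  const deadline = new Promise<never>((_, reject) => {
    timer = setTimeout(() => {
      controller.abort(); // ask the slow upstream request to stop
      reject(new TtftTimeoutError(timeoutMs));
    }, timeoutMs);
  });
  try {
    return await Promise.race([next, deadline]);
  } finally {
    clearTimeout(timer); // no stray timer once the race settles
  }
}

async function* streamWithFallback(
  candidates: Candidate[],
  opts: TtftOptions,
  cooldownUntil: Map<string, number>,
  onSuccess: (id: string) => void,
): AsyncGenerator<string> {
  for (const c of candidates) {
    // Skip candidates still cooling down from an earlier TTFT timeout.
    if ((cooldownUntil.get(c.id) ?? 0) > Date.now()) continue;
    const controller = new AbortController();
    const it = c.open(controller.signal)[Symbol.asyncIterator]();
    let first: IteratorResult<string>;
    try {
      first = await firstChunk(it, opts.ttftTimeoutMs, controller);
    } catch (err) {
      if (!isTtftTimeout(err)) throw err; // other errors are not TTFT fallbacks
      cooldownUntil.set(c.id, Date.now() + opts.ttftCooldownMs);
      continue; // fall back to the next eligible candidate
    }
    // Success bookkeeping is deferred until the first result (chunk or end of stream).
    onSuccess(c.id);
    if (!first.done) yield first.value;
    for (let r = await it.next(); !r.done; r = await it.next()) yield r.value;
    return;
  }
  throw new Error("all candidates timed out or are cooling down");
}
```

In this sketch, a candidate that misses the deadline is never marked successful. It is skipped for `ttftCooldownMs`, and the caller only sees chunks from the candidate that produced data first. Setting `ttftTimeoutMs` to `0` removes the race entirely, which matches the note that `0` keeps the timeout disabled.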