integrate management hardening and executor usage fixes#2234
integrate management hardening and executor usage fixes#2234shenshuoyaoyouguang wants to merge 15 commits intorouter-for-me:mainfrom
Conversation
…d state access - remove request-wide StateMiddleware locking from management routes - add runtime snapshots and applyConfigMutation for config writes - migrate management config/auth/log/usage paths to short-lived state access - fix oversized logs-max-total-size-mb acceptance in PUT /config.yaml - add regression coverage for lock-free config updates and oauth session registration
Finish merging the management hardening changes on top of PR router-for-me#1926 and prevent malformed amp mapping delete requests from clearing all mappings. Co-Authored-By: Claude Opus 4.6 <[email protected]>
Summary of ChangesHello, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request integrates several critical improvements to the system's stability and functionality. It refactors the management API to handle configuration state more robustly, ensuring data consistency and preventing concurrency issues. Additionally, it enhances the accuracy and reliability of usage reporting across various model executors, particularly for streaming responses. A specific endpoint for managing model mappings has also been made more resilient to malformed requests, improving overall API robustness. Highlights
🧠 New Feature in Public Preview: You can now enable Memory to help Gemini Code Assist learn from your team's feedback. This makes future code reviews more consistent and personalized to your project's style. Click here to enable Memory in your admin console. Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here. Footnotes
|
|
Warning Gemini encountered an error creating the review. You can try again by commenting |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: e851cba048
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
|
Warning Gemini is experiencing higher than usual traffic and was unable to create the review. Please try again in a few hours by commenting |
xkonjin
left a comment
There was a problem hiding this comment.
Code Review
Large PR covering three concerns: (1) management handler hardening via runtimeSnapshot(), (2) executor usage tracking fixes, and (3) defensive DELETE endpoint for model mappings.
Positives
- Replacing direct
h.authManager/h.cfgfield access withruntimeSnapshot()is a significant reliability improvement — eliminates data races during config reloads - The
findAuthByNameextraction reduces duplicated lookup logic across 4+ handlers - DELETE model-mappings hardening is excellent: empty body, null value, invalid JSON, and missing value all properly rejected. Only
{"value":[]}clears mappings. ensurePublishedon Qwen executor stream close prevents usage under-reporting- Stream usage parser now handles both
prompt_tokens/input_tokensnaming conventions
Issues to consider
1. tokenStoreWithBaseDir has a TOCTOU race
store := snapshot.tokenStore
if store == nil {
store = sdkAuth.GetTokenStore()
h.stateMu.Lock()
if h.tokenStore == nil {
h.tokenStore = store
}
h.stateMu.Unlock()
}The snapshot read and the subsequent mutex-guarded write are not atomic. Two goroutines could both read snapshot.tokenStore == nil, both call sdkAuth.GetTokenStore(), and the second one's store gets discarded. If GetTokenStore() is idempotent and returns a singleton, this is fine. If it creates a new instance each time, you could end up with a discarded store that already has state. Worth verifying.
2. managementCallbackURL dereferences h before nil check
snapshot, err := h.runtimeSnapshot()
if h == nil || err != nil || snapshot.cfg == nil || snapshot.cfg.Port <= 0 {If h is nil, h.runtimeSnapshot() panics before the nil check runs. The original code had the same issue (h.cfg), but this is an opportunity to fix it: check h == nil first.
3. Usage parsing simplification removes intentional zero-value distinction
The old code:
if cached.Exists() {
detail.CachedTokens = cached.Int()
}Now becomes:
detail.CachedTokens = cached.Int()When the field does not exist, gjson returns 0. The old code left the field at its zero value too, so functionally equivalent. But if the struct ever gains pointer fields or the zero value becomes meaningful (e.g., "0 cached tokens" vs "caching not reported"), this distinction matters. Low risk but worth noting.
4. No test for the Qwen ensurePublished addition
The new ensurePublished call on Qwen stream close is important for usage accuracy, but there is no corresponding test. Consider adding a test that verifies usage is recorded even when no explicit usage chunk appears in the stream.
Well-structured hardening PR. The snapshot migration is the right pattern. The DELETE endpoint fixes prevent a class of accidental data loss. Worth a quick check on the TOCTOU in token store before merge.
shenshuoyaoyouguang
left a comment
There was a problem hiding this comment.
Looks good to me after the latest fixes.
Notes
- The OAuth callback routes now fail closed instead of returning a false success page when callback persistence fails.
PUT /config.yamlnow normalizeslogs-max-total-size-mbbefore persistence, so the runtime config and the saved file stay consistent.- Added regression coverage for the Qwen streaming fallback usage path when no usage chunk is emitted.
Verified
go test ./internal/apigo test ./internal/api/handlers/managementgo test ./internal/runtime/executor
Non-blocking follow-ups
state == ""on callback routes still falls through to the success-page path (pre-existing behavior).PUT /config.yamlstill usesio.ReadAllwithout a request size limit.
Approve with notes.
Fail OAuth callback routes closed when callback persistence fails, persist normalized config.yaml values so runtime and saved config stay consistent, and add regression coverage for the Qwen fallback usage path. Co-Authored-By: Claude Opus 4.6 <[email protected]>
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: a20c2fcebf
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
Reconnect management config writes to the server runtime apply path so successful updates now mean disk state and live behavior have switched together. Also prune removed ampcode keys during YAML-preserving saves so clearing upstream API key mappings persists across reload. Constraint: Management config writes must reuse Server.UpdateClients as the single runtime side-effect entry point Constraint: Successful responses must only be returned after persisted config has been reloaded and applied Rejected: Apply the mutated in-memory clone directly | would bypass reload-time normalization and persisted-state semantics Rejected: Reimplement hot-reload side effects per endpoint | would duplicate and drift from UpdateClients behavior Directive: Keep management config commits flowing through commitConfig/reloadCommittedConfig and Server.UpdateClients Confidence: high Scope-risk: moderate Co-Authored-By: Claude Opus 4.6 <[email protected]>
|
@gemini review |
There was a problem hiding this comment.
Code Review
This pull request introduces significant improvements to the management API's robustness and thread safety by refactoring state management to use a snapshot-based approach. It also integrates fixes for executor usage reporting and include_usage handling in streaming requests. A critical bug that could lead to accidental deletion of all model mappings has been fixed and is now well-tested. The code changes are extensive but consistent, and the addition of targeted tests for the new logic is commendable. My review includes a suggestion to improve the specificity of error messages in list deletion handlers for better diagnostics.
| if idx < 0 || idx >= len(*items) { | ||
| return fmt.Errorf("missing index or value") | ||
| } |
There was a problem hiding this comment.
The error message "missing index or value" is a bit generic for an out-of-bounds index error. Providing a more specific message would improve diagnostics for API clients. This pattern of generic error messages for index-based deletions appears in several other handlers in this file (e.g., DeleteGeminiKey, DeleteClaudeKey).
| if idx < 0 || idx >= len(*items) { | |
| return fmt.Errorf("missing index or value") | |
| } | |
| if idx < 0 || idx >= len(*items) { | |
| return fmt.Errorf("index %d is out of bounds for list of length %d", idx, len(*items)) | |
| } |
|
@xkonjin review |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: cb61b6f631
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| if err := writePendingOAuthCallbackFile(provider, state, code, errStr); err != nil { | ||
| log.Errorf("persist %s oauth callback failed: %v", provider, err) | ||
| c.Header("Content-Type", "text/html; charset=utf-8") | ||
| c.String(http.StatusInternalServerError, oauthCallbackFailureHTML) |
There was a problem hiding this comment.
Mark failed OAuth callbacks as errored before returning 500
If persisting the callback file fails here (for example because the auth dir is missing, read-only, or out of space), the browser gets a failure page but the session stays pending. The login goroutines started in RequestAnthropicToken/RequestCodexToken/RequestGeminiCLIToken only stop when IsOAuthSessionPending becomes false, so they keep waiting until the timeout and the UI keeps reporting an in-progress auth even though the callback already failed. Setting the session error before returning would let those flows fail immediately.
Useful? React with 👍 / 👎.
internal/config/config.go
Outdated
|
|
||
| pruneMappingToGeneratedKeys(original.Content[0], generated.Content[0], "oauth-excluded-models") | ||
| pruneMappingToGeneratedKeys(original.Content[0], generated.Content[0], "oauth-model-alias") | ||
| pruneAmpCodeGeneratedKeys(original.Content[0], generated.Content[0]) |
There was a problem hiding this comment.
Preserve unrecognized
ampcode keys when rewriting config
This new prune step removes every child under ampcode that is not present in the current AmpCode struct before merging the generated YAML back. As a result, any forward-compatible or vendor-specific ampcode setting that this binary does not know about will be silently deleted the next time a management endpoint saves the config, even if the edit was unrelated. SaveConfigPreserveComments is otherwise designed to preserve unknown keys, so this introduces data loss specifically for shared/multi-version configs that carry extra ampcode fields.
Useful? React with 👍 / 👎.
Mark OAuth sessions as failed when callback persistence breaks so pending auth flows stop waiting immediately, and narrow amp config pruning to the managed upstream-api-keys field so unknown ampcode settings survive config rewrites. Constraint: OAuth callback failures must transition sessions out of pending without changing successful callback behavior Constraint: YAML-preserving saves must clear removed upstream-api-keys without deleting unknown ampcode extensions Rejected: Leave callback persistence failures as pending | keeps auth polling and UI state stuck until timeout Rejected: Prune the full ampcode mapping | would silently drop forward-compatible or vendor-specific keys Directive: Keep ampcode pruning targeted to explicitly managed keys unless broader schema-aware pruning is introduced Confidence: high Scope-risk: narrow Co-Authored-By: Claude Opus 4.6 <[email protected]>
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: a49b9ef60b
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| if err := mutate(nextCfg); err != nil { | ||
| c.JSON(http.StatusBadRequest, gin.H{"error": err.Error()}) | ||
| return false |
There was a problem hiding this comment.
Preserve 404s when config-list items are missing
applyConfigMutation now turns every mutation error into 400 Bad Request. Several list handlers in this commit use fmt.Errorf("item not found") for missing resources—for example PatchCodexKey in internal/api/handlers/management/config_lists.go used to return 404 before this refactor. As a result, a request that targets a non-existent entry is now indistinguishable from a malformed payload, which breaks clients/UI flows that rely on 404 Not Found to detect stale selections and refresh state.
Useful? React with 👍 / 👎.
| if err != nil { | ||
| managementHandlers.SetOAuthSessionError(state, strings.TrimSpace(errStr)) |
There was a problem hiding this comment.
Don't fail the active OAuth session on wrong callback route
This helper marks the session errored for every WriteOAuthCallbackFileForPendingSession failure, including provider mismatches. If an Anthropic login is pending and the browser lands on /codex/callback?state=... (for example via a stale redirect URL or manual retry), WriteOAuthCallbackFileForPendingSession returns errOAuthSessionNotPending, then SetOAuthSessionError flips the real session out of pending. The wait loops in RequestAnthropicToken/RequestCodexToken exit as soon as IsOAuthSessionPending becomes false, so the legitimate login is aborted even though the real callback never failed.
Useful? React with 👍 / 👎.
|
@gemini review |
There was a problem hiding this comment.
Code Review
This pull request introduces a major refactoring of the management API handlers to improve thread safety and configuration management. The new approach uses state snapshots and a transactional configuration mutation workflow, which is a significant improvement in robustness. The PR also includes fixes for executor usage reporting and enhances compatibility with OpenAI-compatible providers by adding intelligent retry logic for include_usage stream options. Additionally, it hardens the DELETE /v0/management/ampcode/model-mappings endpoint to prevent accidental data loss from malformed requests. The changes are well-implemented and accompanied by thorough tests. I have one minor suggestion to improve API error messages.
| Value *[]string `json:"value"` | ||
| } | ||
| if err := c.ShouldBindJSON(&body); err != nil || body.Value == nil { | ||
| c.JSON(400, gin.H{"error": "invalid body"}) |
There was a problem hiding this comment.
The error message "invalid body" is a bit generic. It could be more specific to help API consumers debug their requests, especially since this check handles both malformed JSON and a missing/null value field.
| c.JSON(400, gin.H{"error": "invalid body"}) | |
| c.JSON(400, gin.H{"error": "invalid body: 'value' field is missing, null, or malformed"}) |
Summary
include_usagefixes from PR fix(executor): track usage consistently and harden openai compat streaming #1926DELETE /v0/management/ampcode/model-mappingsrequests can no longer clear all mappingsMerge notes
internal/api/handlers/management/config_lists.goapplyConfigMutationmanagement state model introduced by PR fix(management): harden state updates and config limits #1923value,value: null, and explicitvalue: []Test plan
go test ./internal/runtime/executorgo test ./internal/api/handlers/management ./internal/api ./internal/configgo test ./test -run "TestDeleteAmpModelMappings_(EmptyBody|InvalidJSON|MissingValue|NullValue|EmptyArrayClearsAll|Specific|VerifyState|NonExistent)"go test ./internal/api/handlers/management ./internal/api ./internal/config ./internal/runtime/executor ./test🤖 Generated with Claude Code