feat(usage statistics): add configurable usage stats persistence and management APIs#2125
feat(usage statistics): add configurable usage stats persistence and management APIs#2125shaoyuanyu wants to merge 1 commit intorouter-for-me:mainfrom
Conversation
Summary of ChangesHello, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request enhances the system's operational visibility by introducing a mechanism to persist usage statistics to disk. Previously, usage metrics were entirely memory-based and would be lost upon process restarts. The new feature allows for configurable, disk-based storage of these metrics, ensuring their retention and availability across server sessions. It provides a robust way to manage and monitor this persistence through new APIs and integrates seamlessly with the existing configuration and server lifecycle. Highlights
🧠 New Feature in Public Preview: You can now enable Memory to help Gemini Code Assist learn from your team's feedback. This makes future code reviews more consistent and personalized to your project's style. Click here to enable Memory in your admin console. Changelog
Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here. You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension. Footnotes
|
There was a problem hiding this comment.
Code Review
This pull request introduces a valuable feature for persisting usage statistics, enhancing operational visibility. The implementation, including the new management APIs and integration into the server lifecycle, is well-executed. I have identified a few areas for improvement concerning error handling, dependency management, and code maintainability. Specifically, some critical persistence errors are not logged, there's some redundancy in object creation and configuration, and certain default values are hardcoded across multiple files. I've provided detailed suggestions in the comments to address these points, which should further strengthen the implementation.
| if needStart || shouldRestart { | ||
| _, _ = m.LoadNow() | ||
| go m.run() | ||
| } |
There was a problem hiding this comment.
The error from m.LoadNow() is ignored. If loading the persisted usage statistics fails when applying a new configuration, it happens silently. This could lead to a state where the server is running with incomplete data without any indication in the logs. Critical operations like this should have their errors logged for visibility.
| if needStart || shouldRestart { | |
| _, _ = m.LoadNow() | |
| go m.run() | |
| } | |
| if _, err := m.LoadNow(); err != nil { | |
| log.WithError(err).Error("Failed to load usage statistics on config apply") | |
| } | |
| go m.run() |
| for { | ||
| select { | ||
| case <-ticker.C: | ||
| _, _ = m.SaveNow() |
There was a problem hiding this comment.
The error returned by m.SaveNow() is ignored within the periodic saving loop. If periodic persistence fails, it will do so silently, which could lead to data loss over time. This error should be logged to ensure that any persistent storage issues are visible.
| _, _ = m.SaveNow() | |
| if _, err := m.SaveNow(); err != nil { | |
| log.WithError(err).Error("Failed to save usage statistics periodically") | |
| } |
| m.mu.Unlock() | ||
|
|
||
| if flush { | ||
| _, _ = m.SaveNow() |
There was a problem hiding this comment.
The error from m.SaveNow() is ignored during the Stop operation. This is a critical point where data should be flushed to disk. A silent failure here would result in data loss for the last operational period before shutdown. The error should be logged.
if _, err := m.SaveNow(); err != nil {
log.WithError(err).Error("Failed to save usage statistics on stop")
}| failedAttempts: make(map[string]*attemptInfo), | ||
| authManager: manager, | ||
| usageStats: usage.GetRequestStatistics(), | ||
| usagePersistence: usage.NewPersistenceManager(usage.GetRequestStatistics(), filepath.Dir(configFilePath)), |
There was a problem hiding this comment.
The PersistenceManager is instantiated here within NewHandler, but it is immediately replaced by the Server via SetUsagePersistenceManager. This creates a temporary, unused PersistenceManager instance on every NewHandler call, which is inefficient and confusing. The management.Handler should receive its dependencies rather than creating them.
| func (h *Handler) SetConfig(cfg *config.Config) { | ||
| h.cfg = cfg | ||
| if h != nil && h.usagePersistence != nil && cfg != nil { | ||
| h.usagePersistence.ApplyConfig(cfg.UsagePersistence) | ||
| } | ||
| } |
There was a problem hiding this comment.
The ApplyConfig method for usagePersistence is called here within SetConfig, and also in Server.UpdateClients. This results in a redundant call during a hot reload. The Server should be the single source of truth for orchestrating configuration updates to its components.
func (h *Handler) SetConfig(cfg *config.Config) {
h.cfg = cfg
}| cfg.UsagePersistence.FilePath = "usage-statistics.json" | ||
| cfg.UsagePersistence.IntervalSeconds = 30 |
There was a problem hiding this comment.
The default file path "usage-statistics.json" and interval 30 for usage persistence are hardcoded as magic values in multiple files (internal/config/config.go, internal/api/handlers/management/config_basic.go, internal/usage/persistence.go). This makes the configuration harder to maintain and increases the risk of inconsistencies. These default values should be defined as exported constants in the config package and used throughout the codebase.
| if err = tmpFile.Close(); err != nil { | ||
| return m.Status(), writeErr(err) | ||
| } |
There was a problem hiding this comment.
There is a subtle bug in the error handling of SaveNow. If tmpFile.Close() fails, the writeErr helper is called. However, writeErr itself calls tmpFile.Close() again, leading to a double-close on the file descriptor.
if err = tmpFile.Close(); err != nil {
_ = os.Remove(tmpName)
m.recordError(err)
return m.Status(), err
}There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: a0b0530d9c
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| if s.usagePersistence != nil { | ||
| s.usagePersistence.ApplyConfig(cfg.UsagePersistence) | ||
| s.mgmt.SetUsagePersistenceManager(s.usagePersistence) |
There was a problem hiding this comment.
Stop old persistence manager before swapping handler manager
When usage-persistence.enabled is true at startup, management.NewHandler has already created and configured its own PersistenceManager (which can start autosave), and this assignment then swaps in s.usagePersistence without stopping the original one. That leaves an orphaned goroutine writing with stale settings, so you can get duplicate/background saves and persistence may continue even after later disabling the active manager.
Useful? React with 👍 / 👎.
| if flush { | ||
| _, _ = m.SaveNow() |
There was a problem hiding this comment.
Skip shutdown flush when usage persistence is disabled
Stop(true) always calls SaveNow() regardless of m.enabled. Because server shutdown always calls mgmt.Stop(), instances running with the default usage-persistence.enabled: false still persist usage data to disk during shutdown, which breaks the advertised opt-in behavior and can unexpectedly write telemetry data.
Useful? React with 👍 / 👎.
| s.mgmt.Stop() | ||
| } | ||
|
|
||
| // Shutdown the HTTP server. | ||
| if err := s.server.Shutdown(ctx); err != nil { |
There was a problem hiding this comment.
Flush persistence after HTTP shutdown to avoid lost requests
The stop sequence flushes usage persistence before s.server.Shutdown(ctx). During graceful shutdown, in-flight requests can still finish and update usage statistics after this early flush, but there is no second flush afterward, so tail requests are dropped from persisted stats under active traffic.
Useful? React with 👍 / 👎.
xkonjin
left a comment
There was a problem hiding this comment.
Code Review
Summary: This PR adds configurable usage statistics persistence with background saving and management APIs. The implementation looks solid overall.
Strengths
- Clean API design with separate GET/PUT endpoints for persistence config
- Proper error handling and validation (e.g., interval-seconds > 0 check)
- Good use of atomic write pattern (temp file + rename)
- Proper resource cleanup with Stop() method and graceful shutdown
- Thread-safe with mutex protection on shared state
Issues & Suggestions
-
Missing nil checks in race conditions: In
ApplyConfig(), you checkif m == nilat the start but accessm.muandm.pathafter unlocking. If another goroutine setsm = nilconcurrently, this could panic. Consider structuring differently or using RWMutex. -
Resource leak in
NewPersistenceManager(): IfApplyConfig()fails to start the background goroutine (e.g., network issues duringLoadNow()), the stopCh is never created but the manager may be left in an inconsistent state. -
Magic version number: The persistence payload uses
Version: 1but also checks forVersion: 0. This is unclear - are you supporting legacy data? Document this or remove the v0 check. -
Error handling in
LoadNow(): The function returnsnilfor file-not-found errors (which is good) but also clearslastError. However, a missing file might indicate a config issue worth surfacing to the user. -
Test coverage: No tests visible in the diff. Consider adding tests for:
- Concurrent config updates
- File corruption recovery
- Graceful shutdown with pending saves
- Version migration
Security
- ✅ File path validation with
TrimSpace()and empty string defaults - ✅ Directory creation with
os.MkdirAll()uses safe permissions - ✅ No injection vulnerabilities evident
Minor
- The
writeErrclosure pattern inSaveNow()is clever but makes the control flow harder to follow. Consider extracting to a helper function.
luispater
left a comment
There was a problem hiding this comment.
Summary:
This is a useful feature, but the current lifecycle handling in the persistence manager has a few correctness issues that make the behavior diverge from the advertised opt-in semantics.
Blocking findings:
- Disabling usage persistence does not actually stop the autosave loop unless the path or interval also changes.
NewHandlerstarts one persistence manager andNewServerthen creates a second one and swaps it in, which leaves the first instance orphaned when persistence is enabled at startup.- Shutdown currently stops and flushes persistence before
server.Shutdown()finishes, so in-flight requests can be missed, and the unconditionalStop(true)also writes a snapshot even when persistence is disabled.
Test plan:
- Not run locally.
- Please add coverage for startup-enabled, enable->disable, reconfigure, and graceful-shutdown flows.
Summary
This PR adds configurable usage statistics persistence and corresponding management APIs, so usage data can be safely persisted to disk and restored across restarts.
Motivation
Usage metrics are currently memory-based, which makes operational visibility weaker after process restarts. This change introduces a lightweight persistence mechanism controlled via
config.yaml, with runtime status exposed in management APIs.Changes
usage-persistenceconfig section:enabledfile-pathinterval-secondsPersistenceManagerfor usage stats:last saved,last loaded,last error)GET /v0/management/usage-persistencePUT/PATCH /v0/management/usage-persistenceGET /v0/management/usage/persistence-statusPOST /v0/management/usage/savePOST /v0/management/usage/loadusage-persistence.*changes.Backward Compatibility
enabled: falseby default).Validation