
feat(usage statistics): add configurable usage stats persistence and management APIs #2125

Open
shaoyuanyu wants to merge 1 commit into router-for-me:main from shaoyuanyu:main


@shaoyuanyu

Summary

This PR adds configurable usage statistics persistence and corresponding management APIs, so usage data can be safely persisted to disk and restored across restarts.

Motivation

Usage metrics are currently held only in memory, so they are lost whenever the process restarts, which weakens operational visibility. This change introduces a lightweight persistence mechanism controlled via config.yaml, with runtime status exposed through the management APIs.

Changes

  • Introduced usage-persistence config section:
    • enabled
    • file-path
    • interval-seconds
  • Added PersistenceManager for usage stats:
    • periodic autosave
    • manual save/load
    • runtime status (last saved, last loaded, last error)
    • graceful shutdown flush
  • Added management endpoints:
    • GET /v0/management/usage-persistence
    • PUT/PATCH /v0/management/usage-persistence
    • GET /v0/management/usage/persistence-status
    • POST /v0/management/usage/save
    • POST /v0/management/usage/load
  • Wired persistence into server lifecycle:
    • startup initialization
    • hot-reload config apply
    • shutdown stop/flush
  • Added watcher diff output for usage-persistence.* changes.
  • Updated default/sanitized persistence interval to 30s.
  • .gitignore: ignore local server binary.

Backward Compatibility

  • Fully backward compatible.
  • Persistence is opt-in (enabled: false by default).
  • Missing/invalid fields are sanitized to safe defaults.
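The opt-in and sanitization rules can be sketched as follows. The sketch assumes the field names from the changelog (Enabled, FilePath, IntervalSeconds) and the defaults quoted later in review (usage-statistics.json, 30 seconds). The constant names and sanitizeUsagePersistence are hypothetical; the PR's real sanitization lives in internal/config/config.go.

```go
package main

import (
	"fmt"
	"strings"
)

// Hypothetical exported defaults; the review notes the PR repeats
// "usage-statistics.json" and 30 in several files.
const (
	DefaultUsagePersistenceFilePath        = "usage-statistics.json"
	DefaultUsagePersistenceIntervalSeconds = 30
)

// UsagePersistenceConfig mirrors the usage-persistence section of config.yaml;
// the yaml tags use the key names listed in the PR description.
type UsagePersistenceConfig struct {
	Enabled         bool   `yaml:"enabled"`
	FilePath        string `yaml:"file-path"`
	IntervalSeconds int    `yaml:"interval-seconds"`
}

// sanitizeUsagePersistence replaces a blank path or non-positive interval
// with the defaults. Enabled is never changed, so the zero value stays off.
func sanitizeUsagePersistence(c UsagePersistenceConfig) UsagePersistenceConfig {
	c.FilePath = strings.TrimSpace(c.FilePath)
	if c.FilePath == "" {
		c.FilePath = DefaultUsagePersistenceFilePath
	}
	if c.IntervalSeconds <= 0 {
		c.IntervalSeconds = DefaultUsagePersistenceIntervalSeconds
	}
	return c
}

func main() {
	// A config.yaml without a usage-persistence section decodes to the zero value.
	fmt.Printf("%+v\n", sanitizeUsagePersistence(UsagePersistenceConfig{}))
	// {Enabled:false FilePath:usage-statistics.json IntervalSeconds:30}
}
```

Sanitization never touches Enabled, which keeps the feature opt-in: a config that omits the section stays disabled.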

Validation

  • Management route wiring and config apply path verified.
  • Persistence save/load/status flow validated.
  • No new errors in modified backend files.

@gemini-code-assist
Contributor

Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request improves operational visibility by persisting usage statistics to disk. Previously, usage metrics were held only in memory and were lost when the process restarted. The change adds configurable, disk-based storage so these metrics survive across server sessions, management APIs to configure and monitor persistence, and integration with the existing configuration hot-reload and server lifecycle.

Highlights

  • Configurable Usage Statistics Persistence: Introduced a new configuration section usage-persistence in config.yaml to control the persistence of usage statistics, including enabled, file-path, and interval-seconds settings.
  • New Persistence Manager: Added a PersistenceManager responsible for periodic autosaving, manual saving and loading of usage statistics, and exposing runtime status (last saved, last loaded, last error).
  • Management Endpoints: Implemented new management API endpoints to interact with the usage persistence feature, allowing users to get/set persistence configuration, retrieve persistence status, and trigger manual save/load operations.
  • Server Lifecycle Integration: Integrated the usage persistence mechanism into the server's lifecycle, ensuring proper initialization on startup, application of configuration changes on hot-reload, and graceful flushing of data on shutdown.
  • Configuration Watcher Updates: Updated the configuration watcher to detect and report changes in usage-persistence.* settings, providing better visibility into dynamic configuration updates.
  • Backward Compatibility and Opt-in: Ensured full backward compatibility, with persistence being opt-in (disabled by default), and robust sanitization of missing or invalid configuration fields to safe defaults.


Changelog
  • internal/api/handlers/management/config_basic.go
    • Added API handlers for getting and setting usage persistence configuration.
    • Implemented validation for interval-seconds in the PutUsagePersistence handler.
  • internal/api/handlers/management/handler.go
    • Added a usagePersistence field to the Handler struct.
    • Initialized the PersistenceManager in NewHandler and applied its configuration.
    • Modified SetConfig to apply UsagePersistence configuration during hot-reloads.
    • Added SetUsagePersistenceManager to allow replacing the persistence manager.
    • Implemented a Stop method to gracefully shut down the usagePersistence manager.
  • internal/api/handlers/management/usage.go
    • Added API handlers for retrieving usage persistence status.
    • Implemented API handlers for manually saving usage statistics.
    • Implemented API handlers for manually loading usage statistics.
  • internal/api/server.go
    • Added a usagePersistence field to the Server struct.
    • Initialized the PersistenceManager in NewServer and configured it.
    • Registered new management routes for usage persistence status, save, load, and configuration.
    • Called the Stop method of the management handler during server shutdown.
    • Updated UpdateClients to apply UsagePersistence configuration changes and set the UsagePersistenceManager.
  • internal/config/config.go
    • Added a UsagePersistence field to the main Config struct.
    • Defined the UsagePersistenceConfig struct with Enabled, FilePath, and IntervalSeconds fields.
    • Set default values for UsagePersistence fields in LoadConfigOptional.
    • Implemented sanitization logic for UsagePersistence fields, ensuring valid FilePath and IntervalSeconds.
  • internal/usage/persistence.go
    • Created a new PersistenceManager to manage the saving and loading of usage statistics.
    • Implemented ApplyConfig to dynamically update persistence settings.
    • Developed SaveNow and LoadNow methods for immediate persistence operations.
    • Provided a Status method to report the current state of persistence.
    • Included a Stop method for graceful shutdown, with an option to flush data.
    • Added a recordError method to track the last encountered error.
  • internal/watcher/diff/config_diff.go
    • Updated the BuildConfigChangeDetails function to include diffing for usage-persistence.enabled, usage-persistence.file-path, and usage-persistence.interval-seconds.
Contributor

@gemini-code-assist bot left a comment


Code Review

This pull request introduces a valuable feature for persisting usage statistics, enhancing operational visibility. The implementation, including the new management APIs and integration into the server lifecycle, is well-executed. I have identified a few areas for improvement concerning error handling, dependency management, and code maintainability. Specifically, some critical persistence errors are not logged, there's some redundancy in object creation and configuration, and certain default values are hardcoded across multiple files. I've provided detailed suggestions in the comments to address these points, which should further strengthen the implementation.

Comment on lines +95 to +98
	if needStart || shouldRestart {
		_, _ = m.LoadNow()
		go m.run()
	}
Priority: high

The error from m.LoadNow() is ignored. If loading the persisted usage statistics fails when applying a new configuration, it happens silently. This could lead to a state where the server is running with incomplete data without any indication in the logs. Critical operations like this should have their errors logged for visibility.

Suggested change:

	if needStart || shouldRestart {
		if _, err := m.LoadNow(); err != nil {
			log.WithError(err).Error("Failed to load usage statistics on config apply")
		}
		go m.run()
	}

	for {
		select {
		case <-ticker.C:
			_, _ = m.SaveNow()
Priority: high

The error returned by m.SaveNow() is ignored within the periodic saving loop. If periodic persistence fails, it will do so silently, which could lead to data loss over time. This error should be logged to ensure that any persistent storage issues are visible.

Suggested change, replacing _, _ = m.SaveNow():

			if _, err := m.SaveNow(); err != nil {
				log.WithError(err).Error("Failed to save usage statistics periodically")
			}

	m.mu.Unlock()

	if flush {
		_, _ = m.SaveNow()
Priority: high

The error from m.SaveNow() is ignored during the Stop operation. This is a critical point where data should be flushed to disk. A silent failure here would result in data loss for the last operational period before shutdown. The error should be logged.

		if _, err := m.SaveNow(); err != nil {
			log.WithError(err).Error("Failed to save usage statistics on stop")
		}

	failedAttempts:   make(map[string]*attemptInfo),
	authManager:      manager,
	usageStats:       usage.GetRequestStatistics(),
	usagePersistence: usage.NewPersistenceManager(usage.GetRequestStatistics(), filepath.Dir(configFilePath)),
Priority: medium

The PersistenceManager is instantiated here within NewHandler, but it is immediately replaced by the Server via SetUsagePersistenceManager. This creates a temporary, unused PersistenceManager instance on every NewHandler call, which is inefficient and confusing. The management.Handler should receive its dependencies rather than creating them.

Comment on lines +113 to +118
func (h *Handler) SetConfig(cfg *config.Config) {
	h.cfg = cfg
	if h != nil && h.usagePersistence != nil && cfg != nil {
		h.usagePersistence.ApplyConfig(cfg.UsagePersistence)
	}
}
Priority: medium

The ApplyConfig method for usagePersistence is called here within SetConfig, and also in Server.UpdateClients. This results in a redundant call during a hot reload. The Server should be the single source of truth for orchestrating configuration updates to its components.

func (h *Handler) SetConfig(cfg *config.Config) {
	h.cfg = cfg
}

Comment on lines +570 to +571
	cfg.UsagePersistence.FilePath = "usage-statistics.json"
	cfg.UsagePersistence.IntervalSeconds = 30
Priority: medium

The default file path "usage-statistics.json" and interval 30 for usage persistence are hardcoded as magic values in multiple files (internal/config/config.go, internal/api/handlers/management/config_basic.go, internal/usage/persistence.go). This makes the configuration harder to maintain and increases the risk of inconsistencies. These default values should be defined as exported constants in the config package and used throughout the codebase.

Comment on lines +176 to +178
	if err = tmpFile.Close(); err != nil {
		return m.Status(), writeErr(err)
	}
Priority: medium

There is a subtle bug in the error handling of SaveNow. If tmpFile.Close() fails, the writeErr helper is called. However, writeErr itself calls tmpFile.Close() again, leading to a double-close on the file descriptor.

	if err = tmpFile.Close(); err != nil {
		_ = os.Remove(tmpName)
		m.recordError(err)
		return m.Status(), err
	}


@chatgpt-codex-connector bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: a0b0530d9c


Comment on lines +269 to +271
	if s.usagePersistence != nil {
		s.usagePersistence.ApplyConfig(cfg.UsagePersistence)
		s.mgmt.SetUsagePersistenceManager(s.usagePersistence)


P1: Stop old persistence manager before swapping handler manager

When usage-persistence.enabled is true at startup, management.NewHandler has already created and configured its own PersistenceManager (which can start autosave), and this assignment then swaps in s.usagePersistence without stopping the original one. That leaves an orphaned goroutine writing with stale settings, so you can get duplicate/background saves and persistence may continue even after later disabling the active manager.
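One way to address this, sketched under assumptions: the handler receives its manager instead of building one (as the earlier review comment also suggests), and SetUsagePersistenceManager stops whatever it replaces. Handler, NewHandler, and the stopper interface below are trimmed, hypothetical stand-ins; the real types carry more state.

```go
package main

import (
	"fmt"
	"sync"
)

// stopper is the only method this sketch needs from a persistence manager.
type stopper interface {
	Stop(flush bool)
}

// Handler is a trimmed, hypothetical stand-in for management.Handler.
// It receives its persistence manager instead of constructing one.
type Handler struct {
	mu               sync.Mutex
	usagePersistence stopper
}

func NewHandler(p stopper) *Handler {
	return &Handler{usagePersistence: p}
}

// SetUsagePersistenceManager installs p and stops the manager it replaces,
// so no orphaned autosave goroutine keeps writing with stale settings.
func (h *Handler) SetUsagePersistenceManager(p stopper) {
	h.mu.Lock()
	old := h.usagePersistence
	h.usagePersistence = p
	h.mu.Unlock()
	if old != nil && old != p {
		old.Stop(false) // the replacement now owns persistence, so no flush
	}
}

// fakeManager records whether Stop was called.
type fakeManager struct {
	stopped bool
}

func (f *fakeManager) Stop(bool) { f.stopped = true }

func main() {
	first, second := &fakeManager{}, &fakeManager{}
	h := NewHandler(first)
	h.SetUsagePersistenceManager(second)
	fmt.Println(first.stopped, second.stopped) // true false
}
```

The old manager is stopped without a flush because the replacement is responsible for persistence from that point on.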


Comment on lines +270 to +271
	if flush {
		_, _ = m.SaveNow()


P1: Skip shutdown flush when usage persistence is disabled

Stop(true) always calls SaveNow() regardless of m.enabled. Because server shutdown always calls mgmt.Stop(), instances running with the default usage-persistence.enabled: false still persist usage data to disk during shutdown, which breaks the advertised opt-in behavior and can unexpectedly write telemetry data.
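Here is a minimal sketch of the intended rule, assuming a manager that tracks an enabled flag and a stop channel for its autosave loop. The manager type and its fields are hypothetical. Stop always ends the loop but writes a final snapshot only when persistence is enabled:

```go
package main

import (
	"fmt"
	"sync"
)

// manager is a hypothetical, trimmed-down persistence manager that only
// demonstrates the shutdown rule.
type manager struct {
	mu      sync.Mutex
	enabled bool
	stopCh  chan struct{} // non-nil while an autosave loop is running
	saves   int           // counts SaveNow calls for the example
}

func (m *manager) SaveNow() error {
	m.mu.Lock()
	defer m.mu.Unlock()
	m.saves++ // a real implementation would write the snapshot here
	return nil
}

// Stop ends the autosave loop (if any) and, when flush is true and
// persistence is enabled, writes one final snapshot.
func (m *manager) Stop(flush bool) error {
	m.mu.Lock()
	if m.stopCh != nil {
		close(m.stopCh)
		m.stopCh = nil
	}
	enabled := m.enabled
	m.mu.Unlock() // release before SaveNow, which takes the lock itself

	if flush && enabled {
		return m.SaveNow()
	}
	return nil
}

func main() {
	off := &manager{}
	on := &manager{enabled: true, stopCh: make(chan struct{})}
	_ = off.Stop(true)
	_ = on.Stop(true)
	fmt.Println(off.saves, on.saves) // 0 1
}
```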


Comment on lines +848 to 852
		s.mgmt.Stop()
	}

	// Shutdown the HTTP server.
	if err := s.server.Shutdown(ctx); err != nil {


P2: Flush persistence after HTTP shutdown to avoid lost requests

The stop sequence flushes usage persistence before s.server.Shutdown(ctx). During graceful shutdown, in-flight requests can still finish and update usage statistics after this early flush, but there is no second flush afterward, so tail requests are dropped from persisted stats under active traffic.


@xkonjin left a comment


Code Review

Summary: This PR adds configurable usage statistics persistence with background saving and management APIs. The implementation looks solid overall.

Strengths

  • Clean API design with separate GET/PUT endpoints for persistence config
  • Proper error handling and validation (e.g., interval-seconds > 0 check)
  • Good use of atomic write pattern (temp file + rename)
  • Proper resource cleanup with Stop() method and graceful shutdown
  • Thread-safe with mutex protection on shared state

Issues & Suggestions

  1. Missing nil checks in race conditions: In ApplyConfig(), you check if m == nil at the start but access m.mu and m.path after unlocking. If another goroutine sets m = nil concurrently, this could panic. Consider structuring differently or using RWMutex.

  2. Resource leak in NewPersistenceManager(): If ApplyConfig() fails to start the background goroutine (e.g., network issues during LoadNow()), the stopCh is never created but the manager may be left in an inconsistent state.

  3. Magic version number: The persistence payload uses Version: 1 but also checks for Version: 0. This is unclear - are you supporting legacy data? Document this or remove the v0 check.

  4. Error handling in LoadNow(): The function returns nil for file-not-found errors (which is good) but also clears lastError. However, a missing file might indicate a config issue worth surfacing to the user.

  5. Test coverage: No tests visible in the diff. Consider adding tests for:

    • Concurrent config updates
    • File corruption recovery
    • Graceful shutdown with pending saves
    • Version migration

Security

  • ✅ File path validation with TrimSpace() and empty string defaults
  • ✅ Directory creation with os.MkdirAll() uses safe permissions
  • ✅ No injection vulnerabilities evident

Minor

  • The writeErr closure pattern in SaveNow() is clever but makes the control flow harder to follow. Consider extracting to a helper function.

Collaborator

@luispater left a comment


Summary:
This is a useful feature, but the current lifecycle handling in the persistence manager has a few correctness issues that make the behavior diverge from the advertised opt-in semantics.

Blocking findings:

  • Disabling usage persistence does not actually stop the autosave loop unless the path or interval also changes.
  • NewHandler starts one persistence manager and NewServer then creates a second one and swaps it in, which leaves the first instance orphaned when persistence is enabled at startup.
  • Shutdown currently stops and flushes persistence before server.Shutdown() finishes, so in-flight requests can be missed, and the unconditional Stop(true) also writes a snapshot even when persistence is disabled.
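The first blocking finding comes down to comparing the desired state with the running state on every ApplyConfig call. Below is a hypothetical sketch of that rule. Manager, Config, and the save callback are illustrative stand-ins, and real file I/O and the initial LoadNow are omitted.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// Config mirrors the three usage-persistence fields from the PR and is
// assumed to be sanitized already (IntervalSeconds > 0).
type Config struct {
	Enabled         bool
	FilePath        string
	IntervalSeconds int
}

// Manager sketches only the lifecycle rules; saving is a callback.
type Manager struct {
	mu     sync.Mutex
	cfg    Config
	stopCh chan struct{} // non-nil exactly while the autosave loop runs
	save   func(path string)
}

// ApplyConfig stops the loop when persistence is disabled, restarts it when
// the path or interval changes, and starts it when newly enabled.
func (m *Manager) ApplyConfig(cfg Config) {
	m.mu.Lock()
	defer m.mu.Unlock()
	running := m.stopCh != nil
	changed := cfg.FilePath != m.cfg.FilePath || cfg.IntervalSeconds != m.cfg.IntervalSeconds
	m.cfg = cfg

	if running && (!cfg.Enabled || changed) {
		close(m.stopCh)
		m.stopCh = nil
		running = false
	}
	if cfg.Enabled && !running {
		m.stopCh = make(chan struct{})
		go m.run(m.stopCh, time.Duration(cfg.IntervalSeconds)*time.Second, cfg.FilePath)
	}
}

func (m *Manager) run(stop <-chan struct{}, every time.Duration, path string) {
	t := time.NewTicker(every)
	defer t.Stop()
	for {
		select {
		case <-t.C:
			m.save(path)
		case <-stop:
			return
		}
	}
}

// Running reports whether an autosave loop is active.
func (m *Manager) Running() bool {
	m.mu.Lock()
	defer m.mu.Unlock()
	return m.stopCh != nil
}

func main() {
	m := &Manager{save: func(p string) { fmt.Println("saved", p) }}
	m.ApplyConfig(Config{Enabled: true, FilePath: "usage-statistics.json", IntervalSeconds: 30})
	fmt.Println(m.Running()) // true
	m.ApplyConfig(Config{Enabled: false, FilePath: "usage-statistics.json", IntervalSeconds: 30})
	fmt.Println(m.Running()) // false: disabling alone stops the loop
}
```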

Test plan:

  • Not run locally.
  • Please add coverage for startup-enabled, enable->disable, reconfigure, and graceful-shutdown flows.
