Fix massive memory allocation and high CPU usage when health-scanning a large number of codex accounts #2251

Open
extremk wants to merge 3 commits into router-for-me:main from extremk:main

@extremk extremk commented Mar 20, 2026

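Memory Leak Analysis Report
Problem overview
The heap profile shows that memory usage is concentrated in the following hotspots:

Hotspot 1: auth.(*Manager).List is the largest memory consumer (~13.4 MB currently in use, 5.6 GB allocated cumulatively)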

40: 13434880 [16790: 5639290880] @ auth.(*Manager).List
3369: 1940544 [1607287: 925797312] @ auth.(*Auth).Clone (called from List)
2435: 1012960 [1160282: 482677312] @ auth.(*Auth).Clone
1662: 478656 [801513: 230835744] @ auth.(*Auth).Clone
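
For readers unfamiliar with the record layout above: in Go's text heap profile (the debug=1 format), the counters read as in-use objects: in-use bytes [allocated objects: allocated bytes]. The following is a minimal sketch for pulling those four numbers out of one record; heapRecord and parseHeapRecord are hypothetical helper names for illustration, not part of this PR.

```go
package main

import "fmt"

// heapRecord holds the four counters at the start of one record in Go's
// text heap profile. The type name is hypothetical, for illustration only.
type heapRecord struct {
	InUseObjects, InUseBytes int64 // live at the time of the snapshot
	AllocObjects, AllocBytes int64 // cumulative since the process started
}

// parseHeapRecord reads the leading "a: b [c: d]" counters of a record line
// and ignores the stack information that follows the closing bracket.
func parseHeapRecord(line string) (heapRecord, error) {
	var r heapRecord
	_, err := fmt.Sscanf(line, "%d: %d [%d: %d]",
		&r.InUseObjects, &r.InUseBytes, &r.AllocObjects, &r.AllocBytes)
	return r, err
}

func main() {
	r, err := parseHeapRecord("40: 13434880 [16790: 5639290880] @ auth.(*Manager).List")
	if err != nil {
		panic(err)
	}
	// In-use bytes correspond to the "current" figure in the report,
	// allocated bytes to the "cumulative" figure.
	fmt.Printf("in use: %.1f MB, allocated in total: %.2f GB\n",
		float64(r.InUseBytes)/1e6, float64(r.AllocBytes)/1e9)
}
```

The large gap between the two byte counts is what points at allocation churn: most of what List allocates is freed again shortly afterwards.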
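Root cause:

The List() method at conductor.go:1981 deep-copies (Clone) every auth on each call. When the health of a large number of codex accounts is scanned frequently, this creates enormous memory allocation pressure.

Key call path:

management.(*Handler).authByIndex → auth.(*Manager).List → Clone every auth

authByIndex is called at api_tools.go:620. It walks every auth just to find the one whose index matches, so every API call performs a full List+Clone.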

Proposed fix
Fix 1 (critical): authByIndex should look up the auth by index instead of calling the full List

// Add this method in conductor.go
func (m *Manager) GetByIndex(index string) (*Auth, bool) {
	index = strings.TrimSpace(index)
	if index == "" {
		return nil, false
	}
	m.mu.RLock()
	defer m.mu.RUnlock()
	for _, auth := range m.auths {
		if auth == nil {
			continue
		}
		auth.EnsureIndex()
		if auth.Index == index {
			return auth.Clone(), true
		}
	}
	return nil, false
}
Then modify api_tools.go:620:

func (h *Handler) authByIndex(authIndex string) *coreauth.Auth {
	authIndex = strings.TrimSpace(authIndex)
	if authIndex == "" || h == nil || h.authManager == nil {
		return nil
	}
	// Use a direct lookup instead of the full copy made by List()
	if auth, ok := h.authManager.GetByIndex(authIndex); ok {
		return auth
	}
	return nil
}

Expected effect: memory allocated per API call drops from O(N×auth_size) to O(1×auth_size). With 1000 accounts, each request saves roughly 800 KB to 1 MB of allocations.
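
To make the O(N) versus O(1) clone count concrete, here is a rough, self-contained sketch. Auth, Manager, newManager, cloneCount and clonesPerLookup are simplified stand-ins invented for this illustration; the project's real types carry more fields, and indexes are assumed to be pre-populated so EnsureIndex is left out.

```go
package main

import (
	"fmt"
	"strconv"
	"sync"
)

// Auth is a simplified stand-in for the project's auth type.
type Auth struct {
	Index    string
	Metadata map[string]string
}

// cloneCount counts deep copies. The demo runs on a single goroutine,
// so a plain int is enough here.
var cloneCount int

// Clone returns a deep copy and records that a copy was made.
func (a *Auth) Clone() *Auth {
	cloneCount++
	c := &Auth{Index: a.Index, Metadata: make(map[string]string, len(a.Metadata))}
	for k, v := range a.Metadata {
		c.Metadata[k] = v
	}
	return c
}

// Manager is a simplified stand-in holding auths behind a RWMutex.
type Manager struct {
	mu    sync.RWMutex
	auths map[string]*Auth
}

func newManager(n int) *Manager {
	m := &Manager{auths: make(map[string]*Auth, n)}
	for i := 0; i < n; i++ {
		idx := strconv.Itoa(i)
		m.auths[idx] = &Auth{Index: idx, Metadata: map[string]string{"provider": "codex"}}
	}
	return m
}

// List deep-copies every auth, which is the old lookup path.
func (m *Manager) List() []*Auth {
	m.mu.RLock()
	defer m.mu.RUnlock()
	out := make([]*Auth, 0, len(m.auths))
	for _, a := range m.auths {
		out = append(out, a.Clone())
	}
	return out
}

// GetByIndex scans under the read lock and clones only the match.
func (m *Manager) GetByIndex(index string) (*Auth, bool) {
	m.mu.RLock()
	defer m.mu.RUnlock()
	for _, a := range m.auths {
		if a.Index == index {
			return a.Clone(), true
		}
	}
	return nil, false
}

// clonesPerLookup reports how many clones one lookup costs on each path.
func clonesPerLookup(n int) (viaList, viaGetByIndex int) {
	m := newManager(n)
	target := strconv.Itoa(n - 1)

	cloneCount = 0
	for _, a := range m.List() {
		if a.Index == target {
			break
		}
	}
	viaList = cloneCount

	cloneCount = 0
	m.GetByIndex(target)
	viaGetByIndex = cloneCount
	return viaList, viaGetByIndex
}

func main() {
	viaList, direct := clonesPerLookup(1000)
	fmt.Println("clones via List:", viaList, "clones via GetByIndex:", direct)
}
```

The scan itself is still linear in both paths; only the number of deep copies changes, which is where the allocations come from.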

@gemini-code-assist

Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

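This pull request aims to resolve the severe memory allocation and CPU usage problems that occur when the system health-scans a large number of codex accounts. The core improvement optimizes how auth objects are looked up, avoiding unnecessary deep copies of the full list, which significantly reduces resource consumption and improves system stability and efficiency.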

Highlights

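  • Memory optimization: Resolves the massive memory allocation and high CPU usage caused by auth.(*Manager).List deep-copying every auth object during health scans of a large number of codex accounts.
  • Performance improvement: By introducing the GetByIndex method and modifying the authByIndex function, memory allocated per API call drops from O(N×auth_size) to O(1×auth_size), significantly improving performance.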

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

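This change introduces a GetByIndex method to optimize index-based auth lookup, effectively resolving the excessive memory allocation and high CPU usage caused by cloning the full auth list during health checks across many accounts. This is a good performance optimization.

However, the new GetByIndex implementation has a serious data race. The function modifies shared data while holding only a read lock, which can lead to unpredictable behavior under concurrency. I have left a concrete fix suggestion on the relevant lines; please make sure to address it.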
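
One possible way to avoid the race described here is to take the exclusive lock whenever the lookup may write to an auth. The sketch below assumes that EnsureIndex lazily writes the Index field; the Auth and Manager types and the "idx-" derivation are hypothetical stand-ins, not the project's actual code, and this is not necessarily the fix that was suggested inline.

```go
package main

import (
	"fmt"
	"sync"
)

// Auth is a simplified stand-in for the project's auth type.
type Auth struct {
	ID    string
	Index string
}

// EnsureIndex lazily fills Index. This is a hypothetical stand-in; the key
// property assumed here is that it WRITES to the receiver.
func (a *Auth) EnsureIndex() {
	if a.Index == "" {
		a.Index = "idx-" + a.ID
	}
}

// Clone returns a copy so callers never share the stored value.
func (a *Auth) Clone() *Auth {
	c := *a
	return &c
}

// Manager is a simplified stand-in holding auths behind a RWMutex.
type Manager struct {
	mu    sync.RWMutex
	auths map[string]*Auth
}

// GetByIndex takes the write lock because EnsureIndex may mutate an auth.
// Under RLock, two concurrent callers could write the same field at once.
func (m *Manager) GetByIndex(index string) (*Auth, bool) {
	if index == "" {
		return nil, false
	}
	m.mu.Lock()
	defer m.mu.Unlock()
	for _, a := range m.auths {
		if a == nil {
			continue
		}
		a.EnsureIndex()
		if a.Index == index {
			return a.Clone(), true
		}
	}
	return nil, false
}

func main() {
	m := &Manager{auths: map[string]*Auth{"a": {ID: "a"}, "b": {ID: "b"}}}

	// Concurrent lookups; run with `go run -race` to check for races.
	var wg sync.WaitGroup
	for i := 0; i < 8; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			m.GetByIndex("idx-b")
		}()
	}
	wg.Wait()

	a, ok := m.GetByIndex("idx-b")
	fmt.Println(ok, a.ID)
}
```

The trade-off is that lookups now serialize on the write lock. Populating the index once when an auth is registered would let the lookup stay read-only under RLock instead.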

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

@xkonjin xkonjin left a comment

Code Review

Summary: Replaces the authByIndex() full-list iteration with a direct GetByIndex() lookup on the auth manager, avoiding O(n) list copies during health scans with many Codex accounts.

Issues Found

✅ Good: Clone on return prevents data races
GetByIndex() returns auth.Clone() while holding the mutex, which is the right pattern to prevent callers from mutating shared state.

⚠️ Minor: EnsureIndex() called under lock
GetByIndex() calls auth.EnsureIndex() inside the critical section (m.mu.Lock()). If EnsureIndex() is expensive or does I/O, this could increase lock contention under heavy concurrent access. Looking at the original authByIndex(), it also called EnsureIndex() on each element but without holding the manager lock, so this is technically a behavior change. Worth verifying EnsureIndex() is cheap (likely just a string computation).

⚠️ Minor: Indentation style
The new code uses spaces for indentation while Go convention and the rest of the file use tabs. Run gofmt before merge.

✅ Logic is correct

  • Early return on empty index matches the original.
  • The linear scan within the locked section is fine for correctness; if this becomes hot, an index map could be added later.
  • The original callers get the same semantics (nil return on miss).
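
If the linear scan does become hot, the index map mentioned above could look roughly like the following. This is only a sketch under assumptions: the byIndex field, NewManager and Register are hypothetical names, and the real Manager may add, update and remove auths through different paths that would all need to keep the map in sync.

```go
package main

import (
	"fmt"
	"sync"
)

// Auth is a simplified stand-in for the project's auth type.
type Auth struct {
	ID    string
	Index string
}

// Clone returns a copy so callers never share the stored value.
func (a *Auth) Clone() *Auth {
	c := *a
	return &c
}

// Manager keeps a secondary map keyed by Index next to the primary map.
type Manager struct {
	mu      sync.RWMutex
	auths   map[string]*Auth // keyed by ID
	byIndex map[string]*Auth // keyed by Index (hypothetical field)
}

func NewManager() *Manager {
	return &Manager{
		auths:   make(map[string]*Auth),
		byIndex: make(map[string]*Auth),
	}
}

// Register stores a copy of the auth and updates both maps under the write
// lock, dropping a stale index entry when an existing ID is replaced.
func (m *Manager) Register(a *Auth) {
	m.mu.Lock()
	defer m.mu.Unlock()
	if old, ok := m.auths[a.ID]; ok {
		delete(m.byIndex, old.Index)
	}
	stored := a.Clone()
	m.auths[stored.ID] = stored
	m.byIndex[stored.Index] = stored
}

// GetByIndex is a single map lookup plus one clone, with no scan and no
// writes, so a read lock is sufficient.
func (m *Manager) GetByIndex(index string) (*Auth, bool) {
	m.mu.RLock()
	defer m.mu.RUnlock()
	a, ok := m.byIndex[index]
	if !ok {
		return nil, false
	}
	return a.Clone(), true
}

func main() {
	m := NewManager()
	m.Register(&Auth{ID: "a", Index: "idx-a"})
	got, ok := m.GetByIndex("idx-a")
	fmt.Println(ok, got.ID)
}
```

Because the index is assigned before the auth is stored, the lookup path performs no mutation at all, which also sidesteps the question of how expensive EnsureIndex is under the lock.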

Verdict

Clean, targeted optimization. The mutex-held EnsureIndex() call is the only thing to double-check. LGTM with the gofmt pass.
