feat: add lightweight monitoring scripts#2255

Closed
dev-Flyblue wants to merge 1 commit into router-for-me:main from dev-Flyblue:codex/dev

Conversation

@dev-Flyblue

No description provided.

@gemini-code-assist
Contributor

Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

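This pull request introduces a new lightweight monitoring solution for the CLIProxyAPI project. Implemented as shell scripts, it checks API health, system resources, Docker containers, networking, security configuration (UFW, fail2ban), and TLS certificates. It also includes an automated installation process and integrates with Feishu to send alert notifications, improving the project's operational reliability.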

Highlights

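  • Lightweight monitoring system: Introduces a lightweight monitoring and alerting system that uses Feishu, intended to provide comprehensive health checks for the system and the application.
  • Multi-dimensional checks: Covers API liveness and response time, Docker container status, CPU/memory/disk usage, network errors, UFW status, fail2ban, TLS certificate expiry, and a direct-to-origin check.
  • Enhancements: Includes a heartbeat log, summary output (OK/WARN/FAIL), failure-recovery notifications, a separate cron log, automatic log rotation, and flock to prevent concurrent re-entry.
  • Automated installation and configuration: Provides an install.sh script that sets up the monitoring scripts, generates the configuration, and configures the cron job and log rotation, simplifying deployment.
  • Configurable thresholds: Supports configurable thresholds for the monitored metrics, such as API failure count, API response time, CPU/memory/disk usage, and days until certificate expiry, to suit different alerting needs.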
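The flock-based re-entry guard mentioned in the highlights could look roughly like the sketch below. The PR's actual locking code is not shown in this conversation, so the lock path and the `acquire_lock` helper are assumptions made for illustration; only `flock -n` on an open file descriptor is the standard util-linux usage.

```shell
#!/usr/bin/env bash
# Hypothetical lock path for this sketch; the PR's real path is not shown here.
LOCK_FILE="${LOCK_FILE:-/tmp/monitor-sketch.lock}"

# Open the lock file on fd 9 and try to take an exclusive lock without waiting.
# Returns non-zero if another run already holds the lock.
acquire_lock() {
  exec 9>"$LOCK_FILE" || return 1
  flock -n 9
}

if acquire_lock; then
  echo "lock acquired, running checks"
else
  echo "previous run still active, skipping"
fi
```

The lock is released automatically when the script exits and fd 9 is closed, so no explicit cleanup is needed.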

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

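This PR adds a set of lightweight monitoring scripts. The overall implementation is fairly solid: for example, it uses flock to prevent concurrent execution and implements an alert cooldown. My review focuses on improving the scripts' robustness and correcting one logic issue. Specifically, I suggest lowering the cron job frequency to match the "lightweight" positioning, fixing a potentially misleading alert message in the API check, and hardening the key-value read/write functions used for state management to prevent potential data corruption.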

else
record_warn "API" "接口异常但未达到告警阈值,HTTP=${http_code},连续失败=${fail_count}"
fi
send_recovery "api_slow" "API" "响应时间已恢复正常"

high

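When the API check fails (http_code != "200"), the script sends a recovery notification saying "响应时间已恢复正常" ("response time has returned to normal"). This is incorrect: at that point the API is in a failed state, not recovering from slow responses, so the message will confuse users. I suggest deleting this line so that the api_slow state is cleared only when the API has actually recovered and is responding normally.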
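A minimal sketch of the branch structure this comment asks for is shown below. The names `record_warn` and `send_recovery` and their argument order are taken from the excerpt above; their bodies here are stand-in stubs, and `record_ok`, `check_api_result`, and the `slow_ms` parameter are hypothetical names introduced only for this illustration. The alert-threshold branch from the PR is omitted.

```shell
#!/usr/bin/env bash
# Stand-in stubs so the sketch runs on its own; the PR's real helpers are not shown.
record_ok()     { echo "OK [$1] $2"; }
record_warn()   { echo "WARN [$1] $2"; }
send_recovery() { echo "RECOVERY [$2] $3 (state key: $1)"; }

# Hypothetical wrapper: classify one API probe result.
check_api_result() {
  local http_code="$1" elapsed_ms="$2" slow_ms="$3"

  if [ "$http_code" != "200" ]; then
    # Failure branch: report the failure and leave the api_slow state alone.
    record_warn "API" "unhealthy, HTTP=${http_code}"
    return
  fi

  if [ "$elapsed_ms" -gt "$slow_ms" ]; then
    record_warn "API" "slow response: ${elapsed_ms}ms"
    return
  fi

  # Only a healthy and fast response clears the api_slow state.
  record_ok "API" "HTTP=200 in ${elapsed_ms}ms"
  send_recovery "api_slow" "API" "response time has returned to normal"
}

check_api_result 500 0 2000
check_api_result 200 120 2000
```

With this layout the recovery message can only be emitted from the success path, which is the behavior the review comment describes.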

touch /var/log/cliproxyapi-monitor-run.log

cat > "$CRON_FILE" <<'EOF'
* * * * * root /opt/CLIProxyAPI/monitor/monitor.sh >> /var/log/cliproxyapi-monitor-run.log 2>&1

medium

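The cron job is currently set to run every minute. For a "lightweight" monitoring script this may be too frequent, especially since some checks (such as certificate expiry) do not need to run often and the CPU check itself includes a 1-second sleep. Running this often may consume system resources unnecessarily. I suggest reducing the frequency to once every 5 minutes.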

Suggested change
- * * * * * root /opt/CLIProxyAPI/monitor/monitor.sh >> /var/log/cliproxyapi-monitor-run.log 2>&1
+ */5 * * * * root /opt/CLIProxyAPI/monitor/monitor.sh >> /var/log/cliproxyapi-monitor-run.log 2>&1
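The comment also observes that some checks, such as certificate expiry, do not need to run on every invocation. One possible way to handle that inside the script, independent of the cron schedule, is a per-check timestamp gate. This is only a sketch and is not part of the PR or the suggested change; the `due` helper and the stamp-file path are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed only if at least `interval` seconds have passed
# since the last time this check ran, and record the new run time.
due() {
  local stamp="$1" interval="$2" now last
  now=$(date +%s)
  last=$(cat "$stamp" 2>/dev/null)
  last=${last:-0}   # a missing or empty stamp file means "never ran"
  if [ $((now - last)) -ge "$interval" ]; then
    echo "$now" > "$stamp"
    return 0
  fi
  return 1
}

stamp_dir=$(mktemp -d)   # illustration only; a real script would use a fixed state dir
if due "$stamp_dir/cert.stamp" 86400; then
  echo "running certificate expiry check"
else
  echo "certificate check ran recently, skipping"
fi
```

Cheap checks would still run on every cron tick, while slow or rarely changing ones run at most once per interval.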

Comment on lines +79 to +86
set_kv() {
  local file="$1" key="$2" value="$3"
  if grep -qE "^${key}=" "$file"; then
    sed -i "s#^${key}=.*#${key}=${value}#g" "$file"
  else
    echo "${key}=${value}" >> "$file"
  fi
}

medium

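The current set_kv implementation is not robust enough and has several problems:

  1. It is not atomic. An interruption between the grep and the sed/echo can leave the state file inconsistent.
  2. The sed command is sensitive to special characters in value (such as &), which can corrupt the state file.
  3. The grep command is likewise sensitive to regular-expression metacharacters in key.

The current usage appears to be safe, but for future maintainability and robustness I suggest a safer approach, such as awk combined with an atomic file replacement (mv).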
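The second point can be reproduced in a few lines. The snippet below reuses the set_kv quoted above unchanged and assumes GNU sed (for `sed -i` without a suffix argument); the key name `last_msg` and the temp state file are made up for the demonstration.

```shell
#!/usr/bin/env bash
# set_kv exactly as quoted in the comment above.
set_kv() {
  local file="$1" key="$2" value="$3"
  if grep -qE "^${key}=" "$file"; then
    sed -i "s#^${key}=.*#${key}=${value}#g" "$file"
  else
    echo "${key}=${value}" >> "$file"
  fi
}

state=$(mktemp)                    # throwaway state file for the demo
echo "last_msg=old" > "$state"

# In a sed replacement, an unescaped & expands to the whole matched text,
# so the stored value is not the literal string "a&b".
set_kv "$state" "last_msg" "a&b"
cat "$state"
```

Here the stored line comes out as `last_msg=alast_msg=oldb` instead of `last_msg=a&b`, because `&` is replaced by the matched line `last_msg=old`.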

set_kv() {
  local file="$1" key="$2" value="$3"
  local temp_file
  temp_file=$(mktemp)

  # Use awk to robustly update or append the key. It handles special characters
  # in values and avoids complex shell quoting issues with sed.
  # It uses index() to match the start of the line to avoid regex issues with `key`
  # and to correctly handle keys that are prefixes of other keys.
  awk -v k="$key" -v v="$value" '
    BEGIN { found = 0 }
    index($0, k "=") == 1 {
      print k "=" v
      found = 1
      next
    }
    { print }
    END {
      if (!found) {
        print k "=" v
      }
    }
  ' "$file" > "$temp_file"

  # Atomically replace the original file
  mv "$temp_file" "$file"
}
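Two details of the suggestion above may be worth tightening, sketched here as a variation rather than as the reviewer's or the author's code. First, a bare `mktemp` creates the file under the default temp directory, which can sit on a different filesystem from the state file, in which case `mv` is a copy plus delete rather than an atomic rename. Second, `awk -v` applies backslash-escape processing to the assigned value. The variant below creates the temp file next to the target and passes the key and value through the environment; the `KV_KEY` and `KV_VALUE` variable names are assumptions of this sketch.

```shell
#!/usr/bin/env bash
set_kv() {
  local file="$1" key="$2" value="$3"
  local temp_file

  # awk needs an existing input file; start from an empty one if necessary.
  [ -e "$file" ] || : > "$file"

  # Temp file in the same directory, so mv is a rename on one filesystem.
  temp_file=$(mktemp "${file}.XXXXXX") || return 1

  # ENVIRON values are taken literally; awk -v would interpret sequences like \n.
  KV_KEY="$key" KV_VALUE="$value" awk '
    BEGIN { k = ENVIRON["KV_KEY"]; v = ENVIRON["KV_VALUE"]; found = 0 }
    index($0, k "=") == 1 { print k "=" v; found = 1; next }
    { print }
    END { if (!found) print k "=" v }
  ' "$file" > "$temp_file" || { rm -f "$temp_file"; return 1; }

  mv "$temp_file" "$file"
}
```

Note that the replaced file takes on the permissions of the file created by `mktemp` (owner read/write only), which may or may not be what the state file should have.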
