Division of Responsibilities Between the Listener and GitHub Actions, and Sharded Transfer

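### Overview
This issue describes an end-to-end design that can be implemented directly. It covers **the split of responsibilities between the server and GitHub Actions, script modularization, and the shard transfer protocol**. The server reliably fetches and packages the full context and decides whether to shard. Actions handles the complex filtering, builds LLM-readable markdown, enforces the permission firewall, and performs the final execution. Sharding is an automatic fallback, enabled only when the payload exceeds the limit or a single piece is too large.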

---

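### Responsibilities and high-level flow
**Server responsibilities**
- Actively poll `GET /notifications`, deduplicate, and mark notifications as read.
- Infer the notification type and fetch the full context (PR: comments, reviews, review comments; Issue: body + comments; Discussion: GraphQL thread).
- Locate the trigger node and build the ContextEnvelope (raw context, trigger node, basic metadata).
- Check the size and attempt compression (gzip → base64); if the serialized result is ≤ SAFE_LIMIT, send it directly via `repository_dispatch`.
- If it exceeds the limit, enter the automatic sharding fallback: split into shards, generate shard metadata, and either dispatch each shard or upload the shards to temporary external storage and dispatch the list of URLs.
- Write audit logs and retry failed dispatches.

**GitHub Actions responsibilities**
- Receive a ContextEnvelope or a shard dispatch.
- On a single envelope: filter the context, identify batches, trim long discussions, build the markdown context, run the firewall check based on `trust_level`, call the LLM, execute the approved gh/git operations, and write the audit record.
- On a shard: save it as an artifact (named `<jobid>-shard-<index>`); once all shards have arrived, trigger the merge workflow, then merge, verify, and continue with the same processing chain as above.
- Provide idempotent handling, retries, and alerting.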

---

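### Server-side modular design and interfaces
A single script, but modular: each module has a clear responsibility, which makes unit testing and replacement easy. The suggested file or module breakdown follows.

#### Module list
- **poller**
  - Purpose: poll `GET /notifications`, deduplicate, and write to the event queue.
  - Interface: `poll_notifications()` → yields `Notification` objects.

- **classifier**
  - Purpose: map a notification to an event type (pr/issue/discussion/review_comment, etc.).
  - Interface: `classify(notification) -> EventType`.

- **context_fetcher**
  - Purpose: fetch the full context for the given type (REST + GraphQL).
  - Interface: `fetch_context(event) -> RawContext`.
  - Watch for: pagination, rate limiting, concurrency control.

- **trigger_locator**
  - Purpose: locate the trigger node in the RawContext (comment id / review id / discussion reply id).
  - Interface: `locate_trigger(raw_context, notification) -> TriggerRef`.

- **envelope_builder**
  - Purpose: build the ContextEnvelope JSON (see the schema in the next section).
  - Interface: `build_envelope(raw_context, trigger_ref, metadata) -> envelope_json`.

- **size_checker**
  - Purpose: serialize and compute the byte length, then attempt gzip compression and measure again.
  - Interface: `check_size(envelope_json) -> {size, compressed_size}`.

- **shard_splitter**
  - Purpose: when the compressed envelope still exceeds SAFE_LIMIT, split it into shards (by bytes, or preferring semantic boundaries) and generate the shard payloads.
  - Interface: `split_into_shards(envelope_json, shard_size) -> list[Shard]`.
  - Each output shard contains `jobid, shard_index, total_shards, encoding, checksum`, plus either `data_b64` or `data_url`.

- **dispatcher**
  - Purpose: send a `repository_dispatch` for a single envelope, or one dispatch per shard; supports concurrency and retries.
  - Interface: `dispatch(payload)`.
  - Records the response status code and writes an audit entry.

- **audit_logger**
  - Purpose: record every fetch, dispatch, shard split, retry, error, and final status.
  - Interface: `log(event_type, details)`.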
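As an illustration of the `size_checker` interface, the following is a minimal sketch, not a definitive implementation. It assumes the envelope arrives as a JSON string. The extra `encoded_size` field is an assumption that is not part of the original interface; it is included because base64 inflates the gzipped bytes by roughly a third, so the post-base64 length is the one that should be compared with SAFE_LIMIT.

```python
import base64
import gzip


def check_size(envelope_json: str) -> dict:
    """Return byte counts for the raw, gzipped, and gzip+base64 forms."""
    raw = envelope_json.encode("utf-8")
    # mtime=0 keeps the gzip header deterministic across runs.
    compressed = gzip.compress(raw, mtime=0)
    encoded = base64.b64encode(compressed)
    return {
        "size": len(raw),
        "compressed_size": len(compressed),
        # Assumption: extra field, not in the original interface.
        # This is the length of what would actually be sent.
        "encoded_size": len(encoded),
    }
```

Comparing `encoded_size` rather than `compressed_size` against SAFE_LIMIT avoids dispatching a payload that only fits before encoding.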

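#### Key pseudocode snippet
```python
def handle_notification(notification, metadata):
    """Process one notification: fetch its context, then dispatch it whole or sharded."""
    event = classify(notification)
    raw = fetch_context(event)
    trigger = locate_trigger(raw, notification)
    envelope = build_envelope(raw, trigger, metadata)
    # compressed_size should reflect the bytes actually sent (after gzip and base64).
    size_info = check_size(envelope)
    if size_info['compressed_size'] <= SAFE_LIMIT:
        # Fits in a single repository_dispatch.
        payload = maybe_compress(envelope)
        dispatch(payload)
    else:
        # Fallback: split into shards and dispatch each one.
        shards = split_into_shards(envelope, shard_size=SHARD_BYTES)
        for shard in shards:
            dispatch(shard.to_dispatch_payload())
```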
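The `split_into_shards` step in the pseudocode could look roughly like the sketch below. It makes several assumptions that the issue does not specify: the envelope is gzipped once as a whole and then cut into fixed-size byte slices (the "by bytes" strategy), each checksum covers the raw slice before base64 encoding, and the `Shard` dataclass is a hypothetical container for the fields listed for the shard_splitter module.

```python
import base64
import gzip
import hashlib
import uuid
from dataclasses import asdict, dataclass


@dataclass
class Shard:
    """Hypothetical container mirroring the shard fields named in the module list."""
    jobid: str
    shard_index: int
    total_shards: int
    encoding: str
    checksum: str
    data_b64: str

    def to_dispatch_payload(self) -> dict:
        return {"event_type": "agent_shard", "client_payload": asdict(self)}


def split_into_shards(envelope_json: str, shard_size: int) -> list[Shard]:
    """Gzip the whole envelope, then slice the compressed bytes into shards."""
    if shard_size <= 0:
        raise ValueError("shard_size must be positive")
    blob = gzip.compress(envelope_json.encode("utf-8"), mtime=0)
    chunks = [blob[i:i + shard_size] for i in range(0, len(blob), shard_size)]
    jobid = str(uuid.uuid4())
    return [
        Shard(
            jobid=jobid,
            shard_index=index,
            total_shards=len(chunks),
            encoding="gzip+base64",
            # Assumption: checksum is over the raw slice, before base64.
            checksum=hashlib.sha256(chunk).hexdigest(),
            data_b64=base64.b64encode(chunk).decode("ascii"),
        )
        for index, chunk in enumerate(chunks)
    ]
```

Under this scheme a single shard is not decompressible on its own; the receiver has to concatenate the decoded slices in index order before calling gzip. Note that `shard_size` counts bytes before base64, so the encoded `data_b64` is about a third larger.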

---

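### Shard protocol and JSON schema
**Design goals**: simple, idempotent, verifiable, easy to merge, audit-friendly.

#### Shard metadata fields
- **jobid** (string): UUID that uniquely identifies the job
- **shard_index** (integer): starts at 0
- **total_shards** (integer)
- **encoding** (string): e.g. `gzip+base64` or `plain`
- **checksum** (string): SHA256 hex digest of each shard
- **context_type** (string): `pr|issue|discussion`
- **trigger_ref** (object): points to the trigger node (id, kind)
- **manifest_hint** (optional string): points to a central manifest or an external URL
- **data_b64** (string): present when the shard data is carried directly in the dispatch
- **data_url** (string): present when the shard is uploaded to external storage; holds a short-lived signed URL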
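The field list above can be checked on the receiving side before a shard is stored. The sketch below is one possible validator under stated assumptions: the function name `validate_shard_metadata` is hypothetical, only the fields that appear in the shard dispatch example are treated as required, and "exactly one of `data_b64` or `data_url`" is an interpretation of the two field descriptions rather than a rule stated in the issue.

```python
import string

# Assumption: these five fields are mandatory on every shard payload.
REQUIRED_FIELDS = {
    "jobid": str,
    "shard_index": int,
    "total_shards": int,
    "encoding": str,
    "checksum": str,
}


def validate_shard_metadata(payload: dict) -> list[str]:
    """Return a list of problems; an empty list means the payload looks valid."""
    errors = []
    for name, expected in REQUIRED_FIELDS.items():
        if not isinstance(payload.get(name), expected):
            errors.append(f"{name}: missing or not {expected.__name__}")
    if not errors:
        if payload["total_shards"] < 1:
            errors.append("total_shards: must be at least 1")
        elif not 0 <= payload["shard_index"] < payload["total_shards"]:
            errors.append("shard_index: out of range for total_shards")
        checksum = payload["checksum"]
        if len(checksum) != 64 or any(c not in string.hexdigits for c in checksum):
            errors.append("checksum: not a SHA256 hex digest")
    # Interpretation: data travels either inline or by URL, never both.
    if ("data_b64" in payload) == ("data_url" in payload):
        errors.append("exactly one of data_b64 or data_url is required")
    return errors
```

Returning a list of messages instead of raising lets the workflow write every problem to the audit log in one pass.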

#### Condensed ContextEnvelope schema example
```json
{
  "version": 1,
  "type": "pr",
  "repo": "owner/name",
  "actor": "alice",
  "trust_level": "collaborator",
  "trigger": {"kind":"review_comment","id":123456},
  "pr": {"number":42,"head_repo":"llm-fork/name","head_branch":"llm/fix-xyz"},
  "thread": {"items":[{"kind":"comment","id":1,"author":"bob","body":"...","created_at":"..."}]},
  "timestamp":"2026-01-30T12:00:00Z"
}
```

#### dispatch payload example (single envelope)
```json
{
  "event_type":"agent_context",
  "client_payload":{
    "jobid":"uuid",
    "envelope_b64":"<gzip+base64>",
    "encoding":"gzip+base64",
    "context_type":"pr",
    "shard_index":0,
    "total_shards":1,
    "checksum":"sha256hex"
  }
}
```
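To make the single-envelope payload concrete, here is a hedged sketch of how it might be produced on the server and decoded in the workflow. The function names are hypothetical, and the choice to compute `checksum` over the gzipped bytes (before base64) is an assumption, since the issue does not say which representation is hashed. The resulting dictionary is the body for GitHub's `POST /repos/{owner}/{repo}/dispatches` endpoint; GitHub documents limits on `client_payload`, which should be checked when choosing SAFE_LIMIT.

```python
import base64
import gzip
import hashlib
import json


def build_single_payload(envelope: dict, jobid: str) -> dict:
    """Wrap a whole envelope in a repository_dispatch body (sketch)."""
    raw = json.dumps(envelope, separators=(",", ":"), ensure_ascii=False).encode("utf-8")
    compressed = gzip.compress(raw, mtime=0)
    return {
        "event_type": "agent_context",
        "client_payload": {
            "jobid": jobid,
            "envelope_b64": base64.b64encode(compressed).decode("ascii"),
            "encoding": "gzip+base64",
            "context_type": envelope["type"],
            "shard_index": 0,
            "total_shards": 1,
            # Assumption: hash of the gzipped bytes, before base64.
            "checksum": hashlib.sha256(compressed).hexdigest(),
        },
    }


def decode_single_payload(payload: dict) -> dict:
    """Reverse build_single_payload and verify the checksum."""
    body = payload["client_payload"]
    compressed = base64.b64decode(body["envelope_b64"])
    if hashlib.sha256(compressed).hexdigest() != body["checksum"]:
        raise ValueError("checksum mismatch")
    return json.loads(gzip.decompress(compressed))
```

Keeping `shard_index` and `total_shards` in the single-envelope case, as the example above does, lets the receiver treat it as a one-shard job if that is convenient.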

#### dispatch payload example (shard)
```json
{
  "event_type":"agent_shard",
  "client_payload":{
    "jobid":"uuid",
    "shard_index":2,
    "total_shards":5,
    "encoding":"gzip+base64",
    "checksum":"sha256hex",
    "data_b64":"<...>"
  }
}
```

---

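### GitHub Actions workflows and modular implementation
The repository should provide three workflow templates together with supporting modular scripts.

#### workflows
- **receive-context.yml** `on: repository_dispatch`, `types: [agent_context]`
  - Steps: parse the payload → decode the envelope → save it to a temporary file → call the `context_builder` module to filter and build the markdown → call `firewall` to validate → call `llm_runner` → run `executor` → upload the audit record.

- **receive-shard.yml** `on: repository_dispatch`, `types: [agent_shard]`
  - Steps: parse the payload → save the shard as an artifact named `<jobid>-shard-<index>` → update the manifest artifact or call the manifest API → if this is the last shard, or the manifest shows all shards have arrived, trigger the merge workflow.

- **merge-shards.yml** `on: repository_dispatch`, `types: [agent_merge]`, or schedule/manual
  - Steps: list artifacts through the API and filter by `<jobid>-shard-*` → download and unpack each shard → merge by index and verify the overall checksum → produce the merged envelope artifact `<jobid>-merged` → trigger the internal processing logic of `receive-context.yml` or call context_builder directly.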
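The merge step of `merge-shards.yml` can be sketched as below. This is an illustration under assumptions, not the prescribed implementation: the function name `merge_shards` is hypothetical, shards are assumed to be byte slices of one gzip stream, each `checksum` is assumed to cover the decoded slice, and the input is the list of shard `client_payload` objects already loaded from artifacts. The overall-checksum check mentioned above is left out because the shard metadata does not define a field for it.

```python
import base64
import gzip
import hashlib


def merge_shards(payloads: list[dict]) -> str:
    """Reassemble shard payloads into the original envelope JSON string."""
    if not payloads:
        raise ValueError("no shards supplied")
    total = payloads[0]["total_shards"]
    # Keying by index makes redelivered (duplicate) shards harmless.
    by_index = {p["shard_index"]: p for p in payloads}
    missing = sorted(set(range(total)) - set(by_index))
    if missing:
        raise ValueError(f"missing shards: {missing}")
    parts = []
    for index in range(total):
        shard = by_index[index]
        chunk = base64.b64decode(shard["data_b64"])
        if hashlib.sha256(chunk).hexdigest() != shard["checksum"]:
            raise ValueError(f"checksum mismatch in shard {index}")
        parts.append(chunk)
    # Assumption: the concatenated slices form a single gzip stream.
    return gzip.decompress(b"".join(parts)).decode("utf-8")
```

Raising on a missing or corrupt shard leaves the retry and alerting decision to the calling workflow step.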

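#### Modular Actions scripts
Place these in the `agent/` directory, organized as follows:
- **agent/context_builder.py** Takes an envelope or merged artifact and performs batch identification, review aggregation, long-discussion trimming, and markdown context generation. Outputs `context.md` and `task` (the trigger body).
- **agent/firewall.py** Uses `context.trust_level` and policy.yaml to decide the set of permitted operations; returns the allow/reject lists and the reasons for rejection.
- **agent/llm_runner.py** Calls the LLM (in a controlled container or an external service) with `context.md` and `task` as input; outputs a command array or an intent object.
- **agent/executor.py** Hands the command array to the host agent or runs it with the gh CLI (using the runner's token), captures stdout/stderr, and writes the audit record.
- **agent/artifact_utils.py** Helper functions for downloading, unpacking, uploading, and merging artifacts.
- **agent/audit.py** Writes audit events to an artifact or an external logging system in a uniform way.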
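The allow/reject behaviour described for `agent/firewall.py` might be sketched as follows. Everything concrete here is an assumption for illustration: the `POLICY` mapping stands in for the contents of policy.yaml, the trust levels other than `collaborator` and all operation names are invented, and `check_operations` is a hypothetical function name.

```python
# Hypothetical stand-in for policy.yaml: trust_level -> permitted operations.
POLICY = {
    "owner": {"comment", "label", "push_branch", "open_pr", "merge"},
    "collaborator": {"comment", "label", "push_branch", "open_pr"},
    "contributor": {"comment", "label"},
    "none": set(),
}


def check_operations(
    trust_level: str, requested: list[str]
) -> tuple[list[str], list[tuple[str, str]]]:
    """Split requested operations into allowed ones and (operation, reason) rejections."""
    # Unknown trust levels fall back to an empty set: deny by default.
    permitted = POLICY.get(trust_level, set())
    allowed = []
    rejected = []
    for op in requested:
        if op in permitted:
            allowed.append(op)
        else:
            reason = f"'{op}' is not permitted for trust_level '{trust_level}'"
            rejected.append((op, reason))
    return allowed, rejected
```

For example, `check_operations("contributor", ["comment", "merge"])` would allow `comment` and reject `merge` with a reason that can go straight into the audit log.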

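#### Suggested merge trigger strategy
- **Preferred**: send an `agent_merge` event along with the last shard dispatch (the sender triggers it after uploading the final shard).
- **Fallback**: a merge coordinator workflow checks the artifacts periodically (schedule) and triggers the merge.
- **manifest** (optional): update a central manifest artifact after each shard upload; the merge workflow reads the manifest to decide whether the job is complete.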
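Whichever trigger is used, the merge workflow needs to decide whether every shard has arrived. One way to do that from a list of artifact names is sketched below, assuming the `<jobid>-shard-<index>` naming convention from this issue; both function names are hypothetical, and obtaining the names (for example from the artifacts listing API) is outside the sketch.

```python
def received_indices(artifact_names: list[str], jobid: str) -> set[int]:
    """Collect shard indices for one job from artifact names."""
    prefix = f"{jobid}-shard-"
    indices = set()
    for name in artifact_names:
        suffix = name[len(prefix):]
        # Ignore other jobs, the merged artifact, and malformed names.
        if name.startswith(prefix) and suffix.isdigit():
            indices.add(int(suffix))
    return indices


def missing_shards(artifact_names: list[str], jobid: str, total_shards: int) -> list[int]:
    """Return the sorted shard indices that have not arrived yet."""
    return sorted(set(range(total_shards)) - received_indices(artifact_names, jobid))
```

An empty result from `missing_shards` means the merge can start; a non-empty one gives the exact indices to report in a missing-shard alert. Using a set also means a shard that was delivered twice is counted once.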

---

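### Fault tolerance, idempotency, monitoring, and testing
**Idempotency**
- Deterministic artifact names make handling idempotent: an artifact with the same name is overwritten, or deleted and then re-uploaded. Since each artifact belongs to the workflow run that uploaded it, a redelivered shard can show up as a second artifact with the same name, so on receipt Actions checks whether the shard already exists and skips duplicate processing.

**Retries**
- The dispatcher retries on any non-2xx HTTP response (exponential backoff, at most 5 times).
- Actions retries failed artifact downloads and logs them.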
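The dispatcher's retry rule can be illustrated with a small sketch. It assumes "at most 5 times" means five attempts in total and a one-second base delay, neither of which the issue pins down. The `send` callable is a hypothetical hook that performs the HTTP request and returns the status code, and `sleep` is injectable so the logic can be tested without waiting.

```python
import time
from typing import Callable


def dispatch_with_retry(
    send: Callable[[dict], int],
    payload: dict,
    max_attempts: int = 5,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Call send(payload) until it returns a 2xx status or attempts run out."""
    last_status = None
    for attempt in range(1, max_attempts + 1):
        last_status = send(payload)
        if 200 <= last_status < 300:
            return last_status
        if attempt < max_attempts:
            # Exponential backoff: base_delay, 2x, 4x, ...
            sleep(base_delay * 2 ** (attempt - 1))
    raise RuntimeError(
        f"dispatch failed after {max_attempts} attempts (last status {last_status})"
    )
```

In production it is usually worth adding jitter to the delay and honoring any rate-limit headers the API returns; both are omitted here to keep the sketch short.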

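**Missing shards**
- The merge workflow compares `total_shards` with the number of shards actually downloaded. If shards are missing, it retries several times and raises an alert; if they are still missing, it records the failure and notifies the sender.

**Security**
- Store least-privilege tokens in Secrets; do not store sensitive information in plaintext in artifacts; verify every shard and the merged result with SHA256.

**Monitoring**
- Metrics: dispatch success rate, artifact upload success rate, merge success rate, missing-shard rate, average merge latency.
- Alerts: merge timeout, checksum failure, missing-shard retransmissions above a threshold.

**Testing**
- Unit tests: every server-side module and every Actions module.
- Integration tests: construct extreme samples (a single comment of 38k characters) to verify the sharding fallback, artifact upload, merge, and checksum verification.
- Staged rollout: run in a test repository for a week while watching the metrics, then enable in production.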

---

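### Deployment and incremental migration
1. Implement and verify in a test repository: build `envelope_builder`, `size_checker`, `shard_splitter`, and `dispatcher`, and deploy `receive-shard.yml` and `merge-shards.yml`.
2. Enable the automatic fallback: no sharding by default; shard only when the envelope exceeds SAFE_LIMIT.
3. Run end-to-end tests (including extremely long text) and tune SAFE_LIMIT and shard_size.
4. Turn on monitoring and alerting; after a week of observation, adjust the concurrency and retry policies.
5. Switch over production repositories gradually, keeping a rollback path (manually triggered merge, or falling back to non-sharded mode).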

