[Feature] Enable Tracing Mechanism (Phase 1) #1068
Mustafa974 wants to merge 20 commits into LazyAGI:main
Signed-off-by: Mustafa974 <[email protected]>
Signed-off-by: Mustafa974 <[email protected]>
Summary of Changes
Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!
This pull request introduces a foundational tracing mechanism into the LazyLLM framework, enabling developers to gain deeper insights into the execution flow of their LLM applications. By integrating with tracing backends like Langfuse, it provides a proof-of-concept for monitoring and debugging modules and pipelines. The changes include adding global configuration options for tracing, defining a new LazyTracingHook to manage span lifecycles, and enhancing the core call methods of ModuleBase and FlowBase to support these hooks and improve error reporting. This work lays the groundwork for comprehensive observability within LazyLLM.
Code Review
This pull request introduces a tracing mechanism, a significant feature for observability. It integrates with Langfuse using OpenTelemetry. The core changes involve refactoring ModuleBase.__call__ and LazyLLMFlowsBase.__call__ to support hooks with robust error handling via try...except...finally blocks. A new LazyTracingHook is added to capture execution spans, inputs, outputs, and errors. The implementation is well-structured within a new lazyllm/tracing package. My review includes a suggestion to improve thread safety in the new tracing configuration module to prevent potential race conditions.
Signed-off-by: Mustafa974 <[email protected]>
Signed-off-by: Mustafa974 <[email protected]>
Signed-off-by: Mustafa974 <[email protected]>
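This could be wrapped in a function that calls lazyllm.globals internally; users would then call the function directly, which is clearer than manipulating globals directly.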
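A minimal sketch of the wrapper idea suggested above. It does not use the real lazyllm.globals object: a threading.local stand-in plays that role, and the helper names set_trace_context / get_trace_context as well as the session_id field are hypothetical, chosen only for illustration.

```python
import threading

# Stand-in for lazyllm.globals (assumption): one plain dict per thread.
_state = threading.local()


def _globals():
    if not hasattr(_state, 'data'):
        _state.data = {'trace': {}}
    return _state.data


def set_trace_context(**fields):
    """Hypothetical helper: merge fields into the current thread's trace state."""
    trace = _globals()['trace']
    trace.update(fields)
    return dict(trace)  # shallow copy, so callers do not hold the live dict


def get_trace_context():
    """Hypothetical helper: read the current thread's trace state."""
    return dict(_globals()['trace'])


# Callers never touch the globals object directly.
set_trace_context(session_id='demo-session', request_tags=['poc', 'simple-rag'])
print(get_trace_context())
```

With a wrapper like this, the storage location can change later without touching call sites.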
"request_tags": ["poc", "simple-rag"]: what are these two?
Signed-off-by: Mustafa974 <[email protected]>
Signed-off-by: Mustafa974 <[email protected]>
Signed-off-by: Mustafa974 <[email protected]>
Signed-off-by: Mustafa974 <[email protected]>
Signed-off-by: Mustafa974 <[email protected]>
Signed-off-by: Mustafa974 <[email protected]>
wzh1994 left a comment
Test review body (debug)
wzh1994 left a comment
PR Summary:
Purpose:
This PR introduces Phase 1 of a tracing mechanism for LazyLLM, adding infrastructure to trace execution of flows and modules via hooks and spans, along with global state support and documentation.
Files/Modules Changed:
- lazyllm/common/globals.py: Adds a new trace={} entry to the global thread-safe dictionary, providing per-thread storage for tracing state.
- lazyllm/configs.py: Minor reformatting: wraps the config chain in parentheses for cleaner multi-line formatting. No new config keys added in the visible diff.
- lazyllm/docs/__init__.py: Registers the new tracing docs module so documentation is initialized alongside other modules.
- lazyllm/docs/hook.py: Adds documentation (Chinese and English) for several new APIs: LazyLLMHook.on_error (error-handling lifecycle hook), HookPhaseError (exception raised when strict hooks fail during a phase), LazyTracingHook (the concrete tracing hook that creates/updates/finishes spans), and presumably more (diff is truncated).
The actual implementation files for LazyTracingHook, HookPhaseError, and the span/tracing infrastructure are not visible in the truncated diff but are referenced by the documentation.
Key Design Decisions:
- Hook lifecycle extension: The existing LazyLLMHook base class gains an on_error callback, enabling hooks to react to exceptions, which is important for marking spans as errored.
- Strict vs. non-strict hooks: HookPhaseError aggregates failures from multiple strict-mode hooks in a single phase, allowing lenient hooks to fail silently while strict ones propagate errors. This is a pragmatic trade-off between observability reliability and application resilience.
- Global thread-local trace state: Using the existing ThreadSafeDict globals mechanism for trace context keeps the design consistent with other cross-cutting concerns (user_id, usage, etc.) and avoids introducing a separate context-propagation mechanism.
- Phase 1 scope: This appears to be foundational infrastructure (hooks, spans, global state, docs) without yet wiring tracing into all flows/modules, suggesting an incremental rollout.
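The strict vs. non-strict behaviour described above can be sketched roughly as follows. This is not the PR's code: PhaseError stands in for HookPhaseError, run_phase for whatever runner the PR uses, and the per-hook strict attribute is an assumption about how strictness might be declared.

```python
import logging

LOG = logging.getLogger('hooks-sketch')


class PhaseError(Exception):
    """Stand-in for HookPhaseError: carries every strict-hook failure of one phase."""

    def __init__(self, phase, failures):
        self.phase = phase
        self.failures = failures  # list of (hook, exception) pairs
        names = ', '.join(type(hook).__name__ for hook, _ in failures)
        super().__init__(f'{len(failures)} strict hook(s) failed in {phase}: {names}')


def run_phase(hooks, phase, *args):
    """Run one lifecycle phase on every hook, aggregating strict failures."""
    failures = []
    for hook in hooks:
        try:
            getattr(hook, phase)(*args)
        except Exception as exc:
            if getattr(hook, 'strict', False):   # assumed flag
                failures.append((hook, exc))
            else:
                # Lenient hooks only log, so the application keeps running.
                LOG.warning('%s.%s failed', type(hook).__name__, phase, exc_info=True)
    if failures:
        raise PhaseError(phase, failures)


class Lenient:
    strict = False

    def pre_hook(self, *args):
        raise RuntimeError('tracing backend unreachable')


class Strict:
    strict = True

    def pre_hook(self, *args):
        raise ValueError('missing required input')


run_phase([Lenient()], 'pre_hook', 'x')            # logged, nothing raised
try:
    run_phase([Lenient(), Strict()], 'pre_hook', 'x')
except PhaseError as err:
    print(err)
```

All hooks still run even after one fails, so a single aggregated error reports every strict failure of the phase at once.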
Potential Risk Areas:
- The truncated diff hides the core tracing implementation — the actual span creation, context propagation, and hook registration logic needs careful review for thread safety, async compatibility, and performance overhead.
- The trace={} global is a mutable dict default; need to verify ThreadSafeDict properly deep-copies defaults per thread to avoid cross-thread contamination.
- HookPhaseError aggregating multiple exceptions: callers must handle this composite error type correctly, especially in existing error-handling paths that may not expect it.
- Performance impact of tracing hooks on hot paths (every flow/module call) should be benchmarked, even when tracing is disabled.
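The mutable-default concern can be illustrated with a small, self-contained experiment. It makes no claim about how ThreadSafeDict actually copies its defaults; per_thread_state and run_two_threads are hypothetical helpers that only contrast a shallow copy with a deep copy.

```python
import copy
import threading


def per_thread_state(defaults, deep):
    """Build one thread's state from shared defaults, shallow- or deep-copied."""
    return copy.deepcopy(defaults) if deep else dict(defaults)


def run_two_threads(defaults, deep):
    states = {}

    def worker(name):
        state = per_thread_state(defaults, deep)
        state['trace'][name] = True   # each thread writes only its own key
        states[name] = state

    threads = [threading.Thread(target=worker, args=(n,)) for n in ('a', 'b')]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return states


shallow = run_two_threads({'trace': {}}, deep=False)
deep = run_two_threads({'trace': {}}, deep=True)
print(sorted(shallow['a']['trace']))  # ['a', 'b']: the inner dict is shared
print(sorted(deep['a']['trace']))     # ['a']: each thread has its own dict
```

With a shallow copy, both threads end up writing into the same inner dict, which is exactly the cross-thread contamination the review asks to rule out.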
Findings:
- total_issues: 22
- exception: 6
- logic: 6
- type: 4
- safety: 2
- concurrency: 1
- design: 1
- performance: 1
- style: 1
auto reviewed by BOT (claude-opus-4-6)
wzh1994 left a comment
PR Summary:
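Purpose:
This PR introduces Phase 1 of a tracing mechanism for the LazyLLM framework. The hook system automatically creates, updates, and finishes tracing spans during the flow/module execution lifecycle, and trace data is stored in the global thread-safe dictionary.
Files Changed and Why:
- lazyllm/common/globals.py: Adds a trace={} field to the global thread-safe dictionary __global_attrs__ to store tracing data for the current thread/request.
- lazyllm/configs.py: Wraps the config call chain in parentheses (the whole expression is enclosed in ()). This is a code-style fix with no functional change.
- lazyllm/docs/__init__.py: Adds the tracing module to the documentation initialization import list.
- lazyllm/docs/hook.py: Adds three sets of Chinese and English documentation for the hook system: LazyLLMHook.on_error (exception-handling hook), HookPhaseError (exception class for hook phase errors), and LazyTracingHook (tracing hook).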
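Key Design Decisions:
- Reusing the hook mechanism: tracing is not a standalone subsystem. LazyTracingHook plugs into the existing pre_hook / post_hook lifecycle as a hook plugin, which preserves extensibility and decoupling.
- Introducing the on_error hook: adds an exception-handling phase on top of the existing pre_hook/post_hook/report, so tracing can capture the status of spans whose execution failed.
- Introducing HookPhaseError: distinguishes strict from non-strict hooks. In strict mode a hook failure raises an aggregated exception (carrying information about every failed hook). This is an important choice of error-propagation policy.
- Global trace dictionary: uses the existing ThreadSafeDict mechanism to store trace data, with the same access pattern as usage, chat_history, and similar entries.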
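Potential Risk Areas:
- Truncated diff: the actual implementation of LazyTracingHook is not fully shown in the diff. Span creation/finish logic, parent-child span linkage, and the interaction with call_stack need to be confirmed correct.
- HookPhaseError strict mode: check which hooks are strict by default. If the tracing hook is strict, its own failure could interrupt the business flow.
- Lifecycle management of the trace dictionary: confirm that trace data is properly cleaned up after a request ends, to avoid memory leaks.
- Parenthesization refactor in configs.py: although it is a pure formatting change, the call chain is long, so confirm that the closing parentheses match.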
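One way to address the trace-dictionary lifecycle risk listed above is to scope the state to a request and clear it unconditionally. The sketch below is an illustration only: trace_state stands in for the per-request globals['trace'] entry, and trace_scope and trace_id are hypothetical names, not something the PR is known to provide.

```python
from contextlib import contextmanager

trace_state = {}  # stand-in for the per-request globals['trace'] entry (assumption)


@contextmanager
def trace_scope(**initial):
    """Hypothetical helper: populate trace state for one request, always clear it after."""
    trace_state.update(initial)
    try:
        yield trace_state
    finally:
        trace_state.clear()  # runs on success and on exception


with trace_scope(trace_id='t-1') as state:
    print(state)        # {'trace_id': 't-1'}
print(trace_state)      # {}
```

Because the cleanup sits in a finally block, a failing request cannot leave stale trace data behind for the next one.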
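Suggestions:
- config = _NamespaceConfig().add('mode', ...
+ config = (_NamespaceConfig().add('mode', ...
+ ))
For the parenthesization change in configs.py, confirm in CI that all attribute accesses on the config object still work, to avoid subtle problems introduced by a change in operator precedence.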
Findings:
- total_issues: 13
- style: 4
- maintainability: 3
- logic: 2
- concurrency: 1
- design: 1
- safety: 1
- type: 1
auto reviewed by BOT (claude-opus-4-6)
Signed-off-by: Mustafa974 <[email protected]>
Signed-off-by: Mustafa974 <[email protected]>
Signed-off-by: Mustafa974 <[email protected]>
Signed-off-by: Mustafa974 <[email protected]>
'LazyTracingHook',

# tracing
'TracingSetupError',
The tracing-related capabilities aren't in tools, are they? Are you sure they can be imported this way?

Changed to explicit imports and removed the other unneeded parts.
lazyllm/hook.py
Outdated
def resolve_default_hooks(obj):
    trace_cfg = globals.get('trace', {})
lazyllm/flow/flow.py
Outdated
self._sync = False
self._hooks = set()
self._hooks = []
register_hooks(self, resolve_default_hooks(self))
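Implementing this as a hook means the module sits at the "top level". The flag should be checked inside "trace", which then registers itself with flow and module. Alternatively, put the corresponding cls in a shared space, and each time a flow or module is instantiated, take the cls from that shared space and register it, rather than the other way around, with flow deciding which hooks to register.

Fixed.
- Added a global provider registry in hook.py that manages all hook providers in one place (which makes it easier to add other kinds of hooks later).
- The tracing provider is registered in tracing/hook.py (it decides per object whether a tracing hook should be registered, returning [] or [LazyTracingHook]).
- flow.py/module.py call the resolve_builtin_hooks() function in hook.py, instead of flow/module deciding whether to register the tracing hook.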
    LOG.warning('Flow on_error hook failed', exc_info=True)
    raise
else:
    run_hooks(hook_objs, 'post_hook', r)
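Should post_hook go in else or in finally? If it is in else and resources need to be released, they cannot be released once an exception occurs.

post_hook should go in else, where it handles post-processing on the success path (for example, finishing the span's record).
Releasing resources, cleanup, and similar operations should go in report, which is forced to run in finally (whether or not execution succeeded).
Semantically, a hook's post_hook and report should be kept distinct. If other hooks are added later, they should also release resources in the report function.
To avoid ambiguity later on, I renamed the report function to finalize.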
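The split between post_hook (success path, in else) and finalize (always, in finally) discussed above maps onto Python's try statement as sketched below. This is a simplified illustration, not the PR's __call__ implementation; call_with_hooks and Recorder are hypothetical names.

```python
import logging

LOG = logging.getLogger('lifecycle-sketch')


def call_with_hooks(func, hooks, *args):
    for h in hooks:
        h.pre_hook(*args)
    try:
        result = func(*args)
    except Exception as err:
        for h in hooks:
            try:
                h.on_error(err)
            except Exception:
                # A failing on_error hook must not mask the original exception.
                LOG.warning('on_error hook failed', exc_info=True)
        raise
    else:
        # Success path only. Errors raised here are not caught by the except
        # clause above, so they are not mistaken for failures of func.
        for h in hooks:
            h.post_hook(result)
        return result
    finally:
        # Cleanup runs on both paths.
        for h in hooks:
            try:
                h.finalize()
            except Exception:
                LOG.warning('finalize hook failed', exc_info=True)


class Recorder:
    """Hook that records the order in which lifecycle methods are called."""

    def __init__(self):
        self.events = []

    def pre_hook(self, *args):
        self.events.append('pre_hook')

    def post_hook(self, result):
        self.events.append('post_hook')

    def on_error(self, err):
        self.events.append('on_error')

    def finalize(self):
        self.events.append('finalize')


ok = Recorder()
call_with_hooks(lambda x: x + 1, [ok], 1)
print(ok.events)   # ['pre_hook', 'post_hook', 'finalize']

bad = Recorder()
try:
    call_with_hooks(lambda x: 1 / x, [bad], 0)
except ZeroDivisionError:
    pass
print(bad.events)  # ['pre_hook', 'on_error', 'finalize']
```

Keeping span completion in post_hook and resource release in finalize means a failed call still cleans up, while only successful calls are recorded as completed.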
lazyllm/module/module.py
Outdated
self._use_cache: Union[bool, str] = False
self._hooks = set()
self._hooks = []
register_hooks(self, resolve_default_hooks(self))
    LOG.warning('Module on_error hook failed', exc_info=True)
    raise err from None
else:
    run_hooks(hook_objs, 'post_hook', r)
lazyllm/hook.py
Outdated
except StopIteration: pass


class LazyTracingHook(LazyLLMHook):
Signed-off-by: Mustafa974 <[email protected]>
Signed-off-by: Mustafa974 <[email protected]>
Signed-off-by: Mustafa974 <[email protected]>
Signed-off-by: Mustafa974 <[email protected]>
📌 PR Description
This PR introduces Phase 1 of the LazyLLM tracing mechanism, delivering a minimal working end-to-end loop, plus a round of stability and maintainability fixes.
Core goals:
✅ Main Changes
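1. New tracing infrastructure
- lazyllm/tracing/ package, providing the tracing runtime, configuration, and backend abstraction
- TracingBackend abstract interface and LangfuseBackend implementation
- trace_enabled
- trace_backend
- trace_content_enabled
- globals['trace'] as the carrier of tracing state for the current request / execution context
2. Hook system enhancements
- LazyLLMHook gains an on_error lifecycle phase
- LazyTracingHook, which maintains spans across the pre_hook / post_hook / on_error / report lifecycle
- prepare_hooks
- register_hooks
- resolve_default_hooks
- run_hooks
- HookPhaseError, which unifies how hook phase failures are expressed as exceptions
3. Flow / Module tracing integration
- LazyLLMFlowsBase.__call__
- ModuleBase.__call__
4. Stability fixes
- post_hook exceptions being misclassified as main business exceptions
- on_error / report hooks overriding the original business exception
- ignore changed to warn, for clearer semantics
5. Documentation and dependencies
- LazyTracingHook lifecycle method documentation: pre_hook, post_hook, on_error, report
- langfuse
- opentelemetry-api
- opentelemetry-sdk
- opentelemetry-exporter-otlp-proto-http
✅ Type of Change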
⚡ Usage After Update