feat: optimize context compression and enhance monitoring dashboard #1550
Open
Uygniqoar wants to merge 1 commit into mofa-org:feature/mofa-rs from
Conversation
- Optimize `SlidingWindowCompressor` to use token-based adaptive truncation.
- Refactor ReAct Agent compression to maximize token budget utilization.
- Implement `MetricsCollectorTrait` and enhance `LLMMetrics` dimensions.
- Add ReAct Agent trace and tool latency histogram components (Leptos).
- Update embedded dashboard assets for real-time ReAct and token monitoring.

Closes mofa-org#1549
feat: Optimize context compression and enhance monitoring dashboard
📋 Summary
This PR addresses critical issues in the context compression logic and significantly improves the observability of the ReAct Agent and LLM token usage. By moving from simple message counting to token-aware adaptive truncation, we ensure that agents can handle long-running conversations without exceeding model context limits. Additionally, the enhanced monitoring dashboard provides granular insights into agent reasoning steps and tool performance distribution.
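The token-aware adaptive truncation described above can be sketched roughly as follows. This is a hypothetical illustration, not the actual mofa-rs implementation: `estimate_tokens` stands in for whatever tokenizer the crate uses, and the function and type names are assumptions.

```rust
// Crude stand-in tokenizer: ~4 characters per token.
fn estimate_tokens(text: &str) -> usize {
    text.len().div_ceil(4)
}

/// "Backwards-filling" truncation: walk the history from newest to oldest,
/// keeping messages while they fit within `max_tokens`, then restore
/// chronological order. Oldest messages are dropped first.
fn compress(messages: &[String], max_tokens: usize) -> Vec<String> {
    let mut budget = max_tokens;
    let mut kept: Vec<String> = Vec::new();
    for msg in messages.iter().rev() {
        let cost = estimate_tokens(msg);
        if cost > budget {
            break; // everything older than this no longer fits
        }
        budget -= cost;
        kept.push(msg.clone());
    }
    kept.reverse();
    kept
}

fn main() {
    let history = vec![
        "old long message that should be dropped".to_string(),
        "recent".to_string(),
        "latest".to_string(),
    ];
    // With a 4-token budget only the two newest messages survive.
    println!("{:?}", compress(&history, 4));
}
```

The key property, in contrast to counting messages, is that one oversized message cannot blow past the model's context limit: it is costed in tokens like everything else.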
🔗 Related Issues
Closes #1549
🧠 Context
Previously, the `SlidingWindowCompressor` only considered the number of messages, which often led to context overflow when individual messages were large. The `ReActAgent` also lacked deep observability into its internal reasoning steps and the performance of tool calls. These changes are necessary to provide a more robust and transparent agentic experience, preventing unexpected model errors and providing better diagnostic data.

🛠️ Changes
- Refactored `SlidingWindowCompressor` to implement a "backwards-filling" strategy based on token budget, ensuring the most recent context is preserved while strictly adhering to `max_tokens`.
- Refactored `ReActAgent` internal compression to prioritize keeping the initial task description and recent reasoning steps.
- Introduced `MetricsCollectorTrait` to standardize the metrics reporting pipeline.
- Extended `LLMMetrics` with new dimensions: `avg_tokens_per_call`, `prompt_tokens`, `completion_tokens`, and `error_rate`.
- Added `ReactAgentMetrics` for real-time trace monitoring and `ToolLatencyChart` for tool execution latency distribution.
- Updated embedded dashboard assets (`INDEX_HTML`, `APP_JS`) to support the visualization of these new metrics.

🧪 How you Tested
- Ran `cargo test` for `mofa-foundation` to verify the new token-based truncation logic handles both small and oversized messages correctly.
- Exercised `ReActAgent` compression during a simulated 20-step conversation to ensure it doesn't crash on context limits.

🧹 Checklist
Code Quality
- `cargo fmt` run
- `cargo clippy` passes without warnings

Testing

- `cargo test` passes locally without any error

Documentation

PR Hygiene

- main
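For reviewers unfamiliar with the metrics changes, here is a minimal sketch of the shape `MetricsCollectorTrait` and the new `LLMMetrics` dimensions might take. Only the names listed in the Changes section (`MetricsCollectorTrait`, `LLMMetrics`, `avg_tokens_per_call`, `prompt_tokens`, `completion_tokens`, `error_rate`) come from this PR; every other field and method name below is an assumption, not the actual mofa-rs API.

```rust
// Assumed accumulator struct; only the token/error dimensions are from the PR.
#[derive(Debug, Default, Clone)]
struct LLMMetrics {
    calls: u64,
    errors: u64,
    prompt_tokens: u64,
    completion_tokens: u64,
}

impl LLMMetrics {
    /// New dimension from the PR: average total tokens per LLM call.
    fn avg_tokens_per_call(&self) -> f64 {
        if self.calls == 0 { return 0.0; }
        (self.prompt_tokens + self.completion_tokens) as f64 / self.calls as f64
    }

    /// New dimension from the PR: fraction of calls that errored.
    fn error_rate(&self) -> f64 {
        if self.calls == 0 { return 0.0; }
        self.errors as f64 / self.calls as f64
    }
}

/// Assumed shape of the trait standardizing the metrics reporting pipeline.
trait MetricsCollectorTrait {
    fn record_call(&mut self, prompt_tokens: u64, completion_tokens: u64, is_error: bool);
    fn snapshot(&self) -> LLMMetrics;
}

impl MetricsCollectorTrait for LLMMetrics {
    fn record_call(&mut self, prompt_tokens: u64, completion_tokens: u64, is_error: bool) {
        self.calls += 1;
        self.prompt_tokens += prompt_tokens;
        self.completion_tokens += completion_tokens;
        if is_error { self.errors += 1; }
    }
    fn snapshot(&self) -> LLMMetrics { self.clone() }
}

fn main() {
    let mut m = LLMMetrics::default();
    m.record_call(100, 50, false);
    m.record_call(200, 150, true);
    let s = m.snapshot();
    println!("avg={} err={}", s.avg_tokens_per_call(), s.error_rate());
}
```

A trait boundary like this lets the dashboard components consume one snapshot type regardless of which collector produced it.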