feat: optimize context compression and enhance monitoring dashboard #1550

Open
Uygniqoar wants to merge 1 commit into mofa-org:feature/mofa-rs from Uygniqoar:feature/mofa-rs

Uygniqoar commented Mar 30, 2026

feat: Optimize context compression and enhance monitoring dashboard

📋 Summary

This PR addresses critical issues in the context compression logic and significantly improves the observability of the ReAct Agent and LLM token usage. By moving from simple message counting to token-aware adaptive truncation, we ensure that agents can handle long-running conversations without exceeding model context limits. Additionally, the enhanced monitoring dashboard provides granular insights into agent reasoning steps and tool performance distribution.

🔗 Related Issues

Closes #1549


🧠 Context

Previously, the SlidingWindowCompressor only considered the number of messages, which often led to context overflow when messages were large. The ReActAgent also lacked deep observability into its internal reasoning steps and the performance of tool calls. These changes are necessary to provide a more robust and transparent agentic experience, preventing unexpected model errors and providing better diagnostic data.
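
To make the difference from message counting concrete, here is a minimal sketch of token-budget truncation that fills the window from the newest message backwards. It is not the code in this PR: the `Message` struct, `estimate_tokens` (a crude 4-characters-per-token guess), and `backfill_truncate` are hypothetical names introduced for illustration, and the real `SlidingWindowCompressor` may differ.

```rust
/// Hypothetical message type; the real mofa-foundation type may differ.
#[derive(Debug, Clone, PartialEq)]
struct Message {
    role: String,
    content: String,
}

/// Crude token estimate (about 4 characters per token, rounded up).
/// This is an assumption for illustration; a real tokenizer gives other counts.
fn estimate_tokens(msg: &Message) -> usize {
    (msg.content.chars().count() + 3) / 4
}

/// Walk the history from newest to oldest, keeping messages while they fit
/// into `max_tokens`, then restore chronological order.
fn backfill_truncate(history: &[Message], max_tokens: usize) -> Vec<Message> {
    let mut kept = Vec::new();
    let mut used = 0usize;
    for msg in history.iter().rev() {
        let cost = estimate_tokens(msg);
        if used + cost > max_tokens {
            // Stop at the first message that does not fit so the kept
            // window stays contiguous.
            break;
        }
        used += cost;
        kept.push(msg.clone());
    }
    kept.reverse();
    kept
}

fn main() {
    let history = vec![
        Message { role: "user".into(), content: "a".repeat(400) },
        Message { role: "assistant".into(), content: "b".repeat(40) },
        Message { role: "user".into(), content: "c".repeat(20) },
    ];
    // A count-based window of 3 messages would keep the 400-character
    // message; a 20-token budget drops it and keeps the two newest.
    let kept = backfill_truncate(&history, 20);
    for m in &kept {
        println!("{}: ~{} tokens", m.role, estimate_tokens(m));
    }
}
```

In this sketch a single message larger than the whole budget leaves the window empty; a production compressor would more likely truncate or summarize such a message instead.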


🛠️ Changes

  • Context Compression Optimization:
    • Refactored SlidingWindowCompressor to implement a "backwards-filling" strategy based on token budget, ensuring the most recent context is preserved while strictly adhering to max_tokens.
    • Updated ReActAgent internal compression to prioritize keeping the initial task description and recent reasoning steps.
  • Monitoring & Observability Enhancement:
    • Introduced MetricsCollectorTrait to standardize the metrics reporting pipeline.
    • Enhanced LLMMetrics with new dimensions: avg_tokens_per_call, prompt_tokens, completion_tokens, and error_rate.
    • Developed new Leptos dashboard components: ReactAgentMetrics for real-time trace monitoring and ToolLatencyChart for tool execution latency distribution.
    • Updated embedded dashboard assets (INDEX_HTML, APP_JS) to support the visualization of these new metrics.
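
As an illustration of how the new `LLMMetrics` dimensions relate to each other, the sketch below derives `avg_tokens_per_call` and `error_rate` from raw counters. The `LlmCounters` struct, its `calls` and `failed_calls` fields, and the `CollectMetrics` trait are assumptions made for this example; they are not the PR's `LLMMetrics` or `MetricsCollectorTrait` definitions.

```rust
/// Hypothetical raw counters. `prompt_tokens` and `completion_tokens` mirror
/// the dimension names in the PR; `calls` and `failed_calls` are assumed.
#[derive(Debug, Default, Clone)]
struct LlmCounters {
    calls: u64,
    failed_calls: u64,
    prompt_tokens: u64,
    completion_tokens: u64,
}

impl LlmCounters {
    /// Record one LLM call and whether it succeeded.
    fn record(&mut self, prompt_tokens: u64, completion_tokens: u64, ok: bool) {
        self.calls += 1;
        self.prompt_tokens += prompt_tokens;
        self.completion_tokens += completion_tokens;
        if !ok {
            self.failed_calls += 1;
        }
    }

    /// Mean of prompt plus completion tokens per call; 0.0 before any call.
    fn avg_tokens_per_call(&self) -> f64 {
        if self.calls == 0 {
            return 0.0;
        }
        (self.prompt_tokens + self.completion_tokens) as f64 / self.calls as f64
    }

    /// Fraction of calls that failed; 0.0 before any call.
    fn error_rate(&self) -> f64 {
        if self.calls == 0 {
            return 0.0;
        }
        self.failed_calls as f64 / self.calls as f64
    }
}

/// Hypothetical collector interface: each source reports named gauge values.
trait CollectMetrics {
    fn collect(&self) -> Vec<(String, f64)>;
}

impl CollectMetrics for LlmCounters {
    fn collect(&self) -> Vec<(String, f64)> {
        vec![
            ("prompt_tokens".to_string(), self.prompt_tokens as f64),
            ("completion_tokens".to_string(), self.completion_tokens as f64),
            ("avg_tokens_per_call".to_string(), self.avg_tokens_per_call()),
            ("error_rate".to_string(), self.error_rate()),
        ]
    }
}

fn main() {
    let mut counters = LlmCounters::default();
    counters.record(100, 50, true);
    counters.record(200, 50, false);
    for (name, value) in counters.collect() {
        println!("{name} = {value}");
    }
}
```

Keeping only raw counters and deriving the ratios at read time avoids storing values that can drift out of sync; whether the PR takes this approach is not stated in the description.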

🧪 How you Tested

  1. Unit Testing: Ran cargo test for mofa-foundation to verify the new token-based truncation logic handles both small and oversized messages correctly.
  2. Integration Testing: Verified the ReActAgent compression during a simulated 20-step conversation to ensure it doesn't crash on context limits.
  3. UI Verification: Inspected the updated dashboard via the gateway to confirm that ReAct active traces and token usage trends are correctly updated via WebSockets.
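
The 20-step scenario in step 2 could be approximated by a check like the one below. It is a sketch under stated assumptions, not the test shipped in this PR: `compress_trace` and `estimate_tokens` are hypothetical helpers, the token estimate is a rough 4-characters-per-token guess, and the 64-token budget is arbitrary. The helper always keeps the task description and then adds as many of the most recent steps as fit.

```rust
/// Rough token estimate, about 4 characters per token (an assumption).
fn estimate_tokens(text: &str) -> usize {
    (text.chars().count() + 3) / 4
}

/// Keep the task description first, then as many of the most recent steps
/// as fit in the remaining budget. If the task alone exceeds `max_tokens`,
/// it is still returned and the result goes over budget.
fn compress_trace(task: &str, steps: &[String], max_tokens: usize) -> Vec<String> {
    let mut budget = max_tokens.saturating_sub(estimate_tokens(task));
    let mut recent = Vec::new();
    for step in steps.iter().rev() {
        let cost = estimate_tokens(step);
        if cost > budget {
            break; // older steps are dropped once the budget is exhausted
        }
        budget -= cost;
        recent.push(step.clone());
    }
    recent.reverse();

    let mut out = vec![task.to_string()];
    out.extend(recent);
    out
}

fn main() {
    let task = "Find the population of the largest city in France.";
    let max_tokens = 64;
    let mut steps: Vec<String> = Vec::new();

    // Simulate a 20-step run and check the invariants after every step.
    for i in 1..=20 {
        steps.push(format!("Step {}: thought, action and observation text", i));
        let ctx = compress_trace(task, &steps, max_tokens);
        let total: usize = ctx.iter().map(|s| estimate_tokens(s)).sum();
        assert!(total <= max_tokens, "context exceeded the budget");
        assert_eq!(ctx[0], task, "task description must be preserved");
    }
    println!("20 simulated steps stayed within {} tokens", max_tokens);
}
```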

⚠️ Breaking Changes

  • No breaking changes

🧹 Checklist

Code Quality

  • Code follows Rust idioms and project conventions
  • cargo fmt run
  • cargo clippy passes without warnings

Testing

  • Tests added/updated
  • cargo test passes locally without any error

Documentation

  • Public APIs documented

PR Hygiene

  • PR is small and focused (one logical change)
  • Branch is up to date with main
  • Commit messages explain why, not only what

- Optimize SlidingWindowCompressor to use token-based adaptive truncation.
- Refactor ReAct Agent compression to maximize token budget utilization.
- Implement MetricsCollectorTrait and enhance LLMMetrics dimensions.
- Add ReAct Agent trace and tool latency histogram components (Leptos).
- Update embedded dashboard assets for real-time ReAct and token monitoring.

Closes mofa-org#1549
