Benchmarking Showdown: Memanto vs Competitor — Issue #639 by cibaxhilajiao · Pull Request #740 · moorcheh-ai/memanto

cibaxhilajiao · 2026-06-15T12:58:27Z

Implements dual-scenario benchmarking as specified in #639:

Scenario A: Context-Overhead Sprint (latency, token efficiency)
Scenario B: Shifting Persona (accuracy, staleness detection)

All 18 tests pass.

Summary by CodeRabbit

Release Notes

New Features
- Added a benchmarking suite that simulates two efficiency stress-test scenarios with customizable inputs.
- Computes detailed metrics, including cost/ingestion-tax projections and scenario deltas.
- Produces an AI-ready structured prompt and a complete Markdown benchmark report with tables, narrative, and active assumptions.
Tests
- Added end-to-end tests validating scenario outputs, metric structure/relationships, prompt content, and Markdown report sections (including delta “x faster” style values).

Closes #639

coderabbitai · 2026-06-15T12:58:42Z

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: 9169bd7d-834b-4f63-897d-b043bbf80934

📥 Commits

Reviewing files that changed from the base of the PR and between b762217 and c9b9b67.

📒 Files selected for processing (1)

benchmark_showdown.py

🚧 Files skipped from review as they are similar to previous changes (1)

benchmark_showdown.py

📝 Walkthrough

Adds benchmark_showdown.py, a new standalone script that deterministically simulates two benchmark scenarios (context overhead/latency and persona/temporal tracking), computes a nested metrics structure with ingestion-cost estimation, and renders both an LLM-ready prompt and a full Markdown report. A complete test suite in test_benchmark_showdown.py covers all public functions and dataclasses.

Changes

Memanto Benchmarking Showdown

| Layer / File(s) | Summary |
| --- | --- |
| Module description and ASSUMPTIONS config `benchmark_showdown.py` | Adds the module-level docstring describing the two benchmark scenarios and a centralized `ASSUMPTIONS` dict holding token/character conversion ratios, session and latency parameters, and cost-rate inputs used throughout the script. |
| Scenario A and B dataclasses and simulation functions `benchmark_showdown.py`, `test_benchmark_showdown.py` | Defines `ScenarioAResult` and `ScenarioBResult` dataclasses; implements `run_scenario_a` (token totals, retrieval overhead, latency speedup, token savings) and `run_scenario_b` (precision/staleness rates, mutation counts, context-pollution savings). `TestScenarioA` and `TestScenarioB` validate result shapes, default/custom parameter propagation, and numeric relationships. |
| `compute_metrics` orchestration `benchmark_showdown.py`, `test_benchmark_showdown.py` | Implements `compute_metrics`, which conditionally runs the two scenario simulations, calls `estimate_ingestion_cost` from Scenario A token totals, and returns a nested dict with scenario sub-dicts, deltas, ingestion-tax breakdown, timestamp, and `ASSUMPTIONS` snapshot. `TestComputeMetrics` verifies structure, savings logic, precomputed scenario pass-through, and assumptions metadata. |
| LLM prompt and Markdown report builders `benchmark_showdown.py`, `test_benchmark_showdown.py` | Implements `build_llm_prompt` (structures metrics into a downstream LLM instruction prompt) and `build_report_markdown` (generates a full Markdown report with Scenario A/B tables, ingestion-tax section, narrative, and Method & Assumptions listing). `TestBuildLLMPrompt` and `TestBuildReport` assert prompt content, report headings, table headers, assumption fields, and delta-derived speedup text. |
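
For readers who have not opened the diff, the orchestration described in the table can be sketched roughly as follows. This is a minimal illustration only: the function, class, and `ASSUMPTIONS` names come from the walkthrough, but every field name, dict key, numeric value, and formula below is a hypothetical stand-in and is not taken from `benchmark_showdown.py`.

```python
from dataclasses import dataclass, asdict
from datetime import datetime, timezone
from typing import Optional

# Hypothetical values; the real ASSUMPTIONS dict is not shown on this page.
ASSUMPTIONS = {
    "tokens_per_session": 500,
    "cost_per_1k_tokens_usd": 0.002,
}


@dataclass
class ScenarioAResult:
    sessions: int
    baseline_tokens: int  # assumed field name
    memory_tokens: int    # assumed field name


@dataclass
class ScenarioBResult:
    mutations: int        # assumed field name
    stale_answers: int    # assumed field name


def run_scenario_a(sessions: int = 10) -> ScenarioAResult:
    # The review comment states that run_scenario_a ensures sessions >= 1.
    sessions = max(1, sessions)
    per = ASSUMPTIONS["tokens_per_session"]
    # Assumed model: the baseline replays the full history every session,
    # while the memory layer sends a fixed-size slice per session.
    baseline = sum(per * i for i in range(1, sessions + 1))
    memory = per * sessions
    return ScenarioAResult(sessions, baseline, memory)


def run_scenario_b(mutations: int = 5) -> ScenarioBResult:
    # Deterministic placeholder; the real staleness logic is not shown here.
    return ScenarioBResult(mutations=mutations, stale_answers=0)


def estimate_ingestion_cost(tokens: int) -> float:
    return tokens / 1000 * ASSUMPTIONS["cost_per_1k_tokens_usd"]


def compute_metrics(
    a: Optional[ScenarioAResult] = None,
    b: Optional[ScenarioBResult] = None,
) -> dict:
    # Run a scenario only when no precomputed result was passed in.
    a = a if a is not None else run_scenario_a()
    b = b if b is not None else run_scenario_b()
    return {
        "scenario_a": asdict(a),
        "scenario_b": asdict(b),
        "deltas": {"token_savings": a.baseline_tokens - a.memory_tokens},
        "ingestion_tax": {"usd": estimate_ingestion_cost(a.memory_tokens)},
        "generated_at": datetime.now(timezone.utc).isoformat(),
        "assumptions": dict(ASSUMPTIONS),  # snapshot, not a live reference
    }
```

Passing a precomputed `ScenarioAResult` skips the simulation, which is also the path that lets a manually constructed result with `sessions=0` reach the report builders.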

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Poem

🐇 Hop hop, the benchmarks race,
Scenario A sprints with grace,
Persona B shifts through time's embrace,
Metrics nested, tables in place,
A Markdown report the rabbit will face—
Faster, leaner, winning the chase! 🏆

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Linked Issues check | ⚠️ Warning | The implementation covers Scenario A (latency/token metrics) and Scenario B (precision/staleness metrics), but lacks competitor framework specification, environment documentation, reproducibility artifacts (requirements.txt/pyproject.toml), and social media amplification requirements. | Document which competitor framework was compared against, add requirements.txt/pyproject.toml, include environment configuration, provide datasets, and include social media amplification details per issue `#639` requirements. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title clearly identifies the PR as implementing a benchmarking comparison between Memanto and a competitor, directly corresponding to the main objective of the changeset. |
| Out of Scope Changes check | ✅ Passed | All changes are in-scope: benchmark implementation and comprehensive test suite directly support the benchmarking challenge objectives outlined in issue `#639`. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%. |

coderabbitai

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@benchmark_showdown.py`:
- Line 370: The format string at line 370 in benchmark_showdown.py performs
integer division by `sa['sessions']` without guarding against a zero value,
which would raise a `ZeroDivisionError`. Although `run_scenario_a` ensures
`sessions >= 1`, the `compute_metrics` function could receive a manually
constructed `ScenarioAResult` with `sessions=0`. Add a conditional check before
this line to verify that `sa['sessions']` is greater than 0 before performing
the division, and provide an appropriate fallback message or value when
`sessions` is 0 to prevent the error.
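
One way to apply the guard described above is sketched here. It is an illustration under stated assumptions, not the code at line 370: the helper name `tokens_per_session_text` and the `total_tokens` key are hypothetical, and only the `sa['sessions']` lookup comes from the review comment.

```python
def tokens_per_session_text(sa: dict) -> str:
    """Format a per-session token figure without dividing by zero.

    `sa` stands in for the Scenario A sub-dict; `total_tokens` is an
    assumed key used only for this illustration.
    """
    sessions = sa.get("sessions", 0)
    if sessions <= 0:
        # Fallback text instead of raising ZeroDivisionError.
        return "n/a (no sessions recorded)"
    return f"{sa['total_tokens'] // sessions:,} tokens/session"
```

Checking the divisor at the formatting site keeps the report builder safe even when the result object was constructed by hand rather than by `run_scenario_a`.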

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: ac8b98dd-a3ce-4b6e-bbd8-f0c81199ac9d

📥 Commits

Reviewing files that changed from the base of the PR and between 262db90 and b762217.

📒 Files selected for processing (2)

benchmark_showdown.py
test_benchmark_showdown.py

Add files via upload

b762217

coderabbitai Bot reviewed Jun 15, 2026

Comment thread benchmark_showdown.py Outdated

Update benchmark_showdown.py

c9b9b67

cibaxhilajiao wants to merge 2 commits into moorcheh-ai:main from cibaxhilajiao:main
