27 changes: 27 additions & 0 deletions examples/benchmarks/temporal-memory-benchmark/README.md
@@ -0,0 +1,27 @@
# Temporal Memory Benchmark (Illustrative Framework)

This benchmark provides a framework to evaluate `memanto` against a baseline vector database approach for handling temporal reasoning and memory tasks.

> [!NOTE]
> **Disclaimer:** The metrics and script provided below currently serve as **illustrative placeholders and examples** demonstrating how the benchmarking pipeline is structured. A real dataset and active API integration are required to generate live metrics.

## Metrics

We measure three primary dimensions:
1. **P95 Latency**: The 95th-percentile retrieval time, i.e. the latency that 95% of retrieval requests stay at or under.
2. **Token Efficiency**: The number of LLM tokens consumed during retrieval.
3. **Retrieval Accuracy**: The percentage of temporal facts recalled correctly.
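
As a rough illustration of how these three metrics could be computed once real measurements exist, here is a minimal Python sketch. The function names (`p95`, `retrieval_accuracy`, `approx_token_count`) are hypothetical and not part of `memanto`. The percentile uses the nearest-rank method, and the whitespace word count is only an assumed stand-in for a real tokenizer.

```python
def p95(latencies):
    """Nearest-rank 95th percentile of a non-empty list of latencies (seconds)."""
    if not latencies:
        raise ValueError("need at least one latency sample")
    ordered = sorted(latencies)
    # 1-based nearest-rank index: ceil(0.95 * n), done in integer arithmetic
    rank = (95 * len(ordered) + 99) // 100
    return ordered[rank - 1]


def retrieval_accuracy(predictions, expected):
    """Fraction of queries whose retrieved answer equals the expected fact."""
    if len(predictions) != len(expected):
        raise ValueError("predictions and expected must have the same length")
    if not expected:
        raise ValueError("need at least one query")
    hits = sum(1 for got, want in zip(predictions, expected) if got == want)
    return hits / len(expected)


def approx_token_count(text):
    """ASSUMPTION: whitespace-separated words as a crude proxy for LLM tokens."""
    return len(text.split())
```

With 100 latency samples, the nearest-rank P95 is the 95th smallest value; with very few samples it approaches the maximum, so a real run would likely want a reasonably large query set.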

## Example Results (Simulated Placeholder Data)

| Metric | Memanto | Baseline Vector DB |
| --- | --- | --- |
| Retrieval Accuracy | 96% | 68% |
| Token Usage (LLM tokens) | 450 | 15000 |
| P95 Latency | ~0.06 s | ~0.9 s |

## How to run the simulated benchmark

```bash
python benchmark.py
```
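
The script currently prints placeholder values only. As a sketch of what a real measurement loop might look like, the following times each call to a retrieval function with `time.perf_counter`. Both `time_queries` and `fake_retrieve` are hypothetical names introduced here for illustration; a real run would substitute actual calls to the system under test, which this sketch does not attempt to model.

```python
import time


def time_queries(retrieve, queries):
    """Call retrieve(query) for each query; return (results, latencies_in_seconds)."""
    results, latencies = [], []
    for query in queries:
        start = time.perf_counter()
        results.append(retrieve(query))
        latencies.append(time.perf_counter() - start)
    return results, latencies


def fake_retrieve(query):
    """HYPOTHETICAL stand-in; a real benchmark would query the memory backend here."""
    return query.upper()


if __name__ == "__main__":
    results, latencies = time_queries(fake_retrieve, ["q1", "q2", "q3"])
    print(results)
    print([f"{t:.6f}s" for t in latencies])
```

Passing the retrieval function as an argument lets the same loop time both systems under comparison without duplicating the timing code.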
26 changes: 26 additions & 0 deletions examples/benchmarks/temporal-memory-benchmark/benchmark.py
@@ -0,0 +1,26 @@
import random
import time


def benchmark_memanto():
    """Print simulated placeholder metrics; no real retrieval is performed."""
    print("Starting Memanto Benchmark (Illustrative Framework)...")
    start_time = time.perf_counter()

    # NOTE: This is a simulated placeholder framework.
    # Real measurements require loading a temporal dataset and integrating actual API calls.
    time.sleep(0.5)  # stands in for the real benchmark workload

    # Generate illustrative placeholder metrics (random values, not measurements)
    latency_memanto = 0.05 + random.uniform(0, 0.02)
    latency_baseline = 0.8 + random.uniform(0, 0.2)

    tokens_memanto = 450
    tokens_baseline = 15000

    # Labels match the README table; each value is one simulated number, not a real percentile
    print(f"Memanto P95 Latency (simulated): {latency_memanto:.3f}s")
    print(f"Baseline P95 Latency (simulated): {latency_baseline:.3f}s")
    print(f"Memanto Token Footprint: {tokens_memanto}")
    print(f"Baseline Token Footprint: {tokens_baseline}")

    elapsed = time.perf_counter() - start_time
    print(f"Benchmark completed in {elapsed:.2f}s.")


if __name__ == "__main__":
    benchmark_memanto()