Merged
128 commits
1220572
feat: check nodes existence
Nyakult Jul 25, 2025
1396919
Merge remote-tracking branch 'upstream/dev' into dev
Nyakult Jul 25, 2025
85b89bb
Merge remote-tracking branch 'upstream/dev' into dev
Nyakult Jul 28, 2025
c8c1488
Merge remote-tracking branch 'upstream/dev' into dev
Nyakult Jul 28, 2025
8baf5c6
Merge remote-tracking branch 'upstream/dev' into dev
Nyakult Jul 29, 2025
9982782
Merge remote-tracking branch 'upstream/dev' into dev
Nyakult Jul 30, 2025
f3dd6e7
Merge remote-tracking branch 'upstream/dev' into dev
Nyakult Jul 30, 2025
4471790
Merge remote-tracking branch 'upstream/dev' into dev
Nyakult Aug 1, 2025
0f9ccd4
Merge remote-tracking branch 'upstream/dev' into dev
Nyakult Aug 1, 2025
27196ef
Merge remote-tracking branch 'upstream/dev' into dev
Nyakult Aug 1, 2025
70d0a4a
Merge remote-tracking branch 'upstream/dev' into dev
Nyakult Aug 5, 2025
5dd9662
Merge remote-tracking branch 'upstream/dev' into dev
Nyakult Aug 6, 2025
27203e7
Merge remote-tracking branch 'upstream/dev' into dev
Nyakult Aug 6, 2025
b2cd7f0
Merge remote-tracking branch 'upstream/dev' into dev
Nyakult Aug 8, 2025
6fa6af7
feat: use different template for different language input
Nyakult Aug 8, 2025
b641c51
feat: use different template for different language input
Nyakult Aug 8, 2025
9f5aca1
Merge remote-tracking branch 'upstream/dev' into dev
Nyakult Aug 12, 2025
5eafce4
Merge remote-tracking branch 'upstream/dev' into dev
Nyakult Aug 13, 2025
5c2e637
Merge remote-tracking branch 'upstream/dev' into dev
Nyakult Aug 18, 2025
332bab6
Merge remote-tracking branch 'upstream/dev' into dev
Nyakult Aug 26, 2025
d3dca58
fix: eval script
Nyakult Aug 26, 2025
45cee24
Merge remote-tracking branch 'upstream/dev' into dev
Nyakult Aug 27, 2025
b1b448e
Merge remote-tracking branch 'upstream/dev' into dev
Nyakult Sep 1, 2025
ffb034e
Merge remote-tracking branch 'upstream/dev' into dev
Nyakult Sep 3, 2025
4551297
Merge remote-tracking branch 'upstream/dev' into dev
Nyakult Sep 4, 2025
204b545
Merge remote-tracking branch 'upstream/dev' into dev
Nyakult Sep 5, 2025
298d155
Merge remote-tracking branch 'upstream/dev' into dev
Nyakult Sep 25, 2025
84d421d
feat: memos-api eval scripts
Nyakult Sep 25, 2025
6956cf0
Merge remote-tracking branch 'upstream/main' into eval/0925
Nyakult Sep 25, 2025
6a465f4
feat: mem reader
Nyakult Sep 25, 2025
bcfa7c9
feat: implement prefeval memos-api evaluation scripts
2Rant Sep 25, 2025
eac7984
Merge pull request #2 from 2Rant/prefeval
Nyakult Sep 25, 2025
035d1a1
refactor:format code
Nyakult Sep 25, 2025
9c2ab81
feat: add PersonaMem eval scripts
Nyakult Sep 25, 2025
92b78c1
docs(evaluation): update PersonaMem eval readme
Nyakult Sep 25, 2025
82fecee
feat:memos-api ingest batch message
Nyakult Sep 25, 2025
1ca5ead
feat: refactor search
Nyakult Sep 28, 2025
405a162
feat: refactor search
Nyakult Sep 28, 2025
7628da8
update: add api for memory
fridayL Sep 28, 2025
b12db2f
Merge pull request #4 from fridayL/searchupdate
Nyakult Sep 28, 2025
235125a
feat: add memory api return memory and memory type
Nyakult Sep 28, 2025
81bc1f6
refactor(server): refactor the server router module to optimize memory management
Nyakult Sep 28, 2025
71f357a
format: ruff format code
Nyakult Sep 29, 2025
c04ed79
feat(server): increase the LLM max token count
Nyakult Sep 29, 2025
aaa5d18
test
Nyakult Sep 29, 2025
880f60c
fix: user query embedding for search
Nyakult Sep 29, 2025
cce9f6c
count memory_size by user
Nyakult Sep 29, 2025
9b46589
fix(server): fix list flattening issue in the memory read logic
Nyakult Sep 29, 2025
9a45f60
feat(nebular): optimize graph database query performance
Nyakult Oct 14, 2025
0887a6b
Merge branch 'feat/search1' into eval/0929-test
Nyakult Oct 15, 2025
e53e810
refactor(memory):
Nyakult Oct 15, 2025
4936907
feat: remove user idx_memory_user_name
Nyakult Oct 15, 2025
c74fe37
Merge branch 'feat/search1' into eval/0929-test
Nyakult Oct 15, 2025
d51885b
feat(graph): optimize Nebula graph database query performance
Nyakult Oct 15, 2025
50163bb
Merge branch 'feat/search1' into eval/0929-test
Nyakult Oct 15, 2025
1e5021b
feat: rollback remove_oldest_memory
Nyakult Oct 15, 2025
b69a35e
Merge remote-tracking branch 'upstream/test' into eval/0929-test
Nyakult Oct 15, 2025
b300cd2
Merge remote-tracking branch 'upstream/test' into feat/search1
Nyakult Oct 15, 2025
2bfde7d
feat:nebula gql add index
Nyakult Oct 15, 2025
3499003
Merge branch 'feat/search1' into eval/0929-test
Nyakult Oct 15, 2025
07751cd
feat: align code
Nyakult Oct 15, 2025
0857bb1
Merge branch 'feat/search1' into eval/0929-test
Nyakult Oct 15, 2025
756550d
feat: update memos_api
Nyakult Oct 16, 2025
59ceadf
feat: update memos_api
Nyakult Oct 16, 2025
1567987
feat: update default options
Nyakult Oct 16, 2025
1edfebe
feat:memory client
Nyakult Oct 16, 2025
18a63e2
feat:refactor lme
Nyakult Oct 16, 2025
de11c9b
feat: memu & supermemory client
Nyakult Oct 16, 2025
279c4d9
feat: locomo memu
Nyakult Oct 17, 2025
55515c3
feat: locomo supermemory
Nyakult Oct 17, 2025
b810bd9
New 'add' and 'process' modes.
2Rant Oct 17, 2025
4980c62
Merge pull request #5 from 2Rant/eval/0929-test
Nyakult Oct 17, 2025
94c6661
feat: lme supermemory & memu
Nyakult Oct 17, 2025
f32dabd
feat: default args
Nyakult Oct 17, 2025
9ef548e
api and local
2Rant Oct 17, 2025
043d260
api and local
2Rant Oct 17, 2025
5c66068
memobase fix
Nyakult Oct 20, 2025
ccc865e
Merge remote-tracking branch 'upstream/test' into eval/0929-test
Nyakult Oct 20, 2025
7715343
memos fix
Nyakult Oct 20, 2025
f19af21
default args
Nyakult Oct 20, 2025
e9fa1ed
fix memos-api search data
Nyakult Oct 20, 2025
496387f
Merge pull request #6 from 2Rant/eval/0929-test
Nyakult Oct 20, 2025
8babcad
Merge remote-tracking branch 'upstream/dev' into eval/0929-test
Nyakult Oct 20, 2025
9b0e7ef
prefeval pipeline
2Rant Oct 20, 2025
b1f8d4d
fix lme memos-api
Nyakult Oct 21, 2025
697fc60
Merge pull request #7 from 2Rant/eval/1020
Nyakult Oct 21, 2025
627ee18
personamem pipeline
2Rant Oct 21, 2025
23404c8
personamem pipeline
2Rant Oct 21, 2025
80240dd
Merge pull request #8 from 2Rant/eval/1020
Nyakult Oct 21, 2025
750afad
lme scrips
Nyakult Oct 21, 2025
c2c9246
Merge remote-tracking branch 'upstream/dev' into eval/0929-test
Nyakult Oct 21, 2025
1b03c14
align dev
Nyakult Oct 21, 2025
78f0e99
format code
Nyakult Oct 21, 2025
5ed5d56
refactor: remove old files
Nyakult Oct 21, 2025
cdd9447
format code
Nyakult Oct 21, 2025
6109d96
pm and prefeval pipeline
2Rant Oct 21, 2025
4af98e6
format code
Nyakult Oct 21, 2025
b703fa8
pm and prefeval pipeline
2Rant Oct 21, 2025
28ecba5
format code
Nyakult Oct 21, 2025
a2e7b02
pm and prefeval pipeline
2Rant Oct 21, 2025
21d9d37
pm and prefeval pipeline
2Rant Oct 21, 2025
35963cf
pm and prefeval pipeline
2Rant Oct 21, 2025
67b6eec
Merge pull request #11 from 2Rant/eval/1020
Nyakult Oct 21, 2025
5385eba
pm and prefeval pipeline
2Rant Oct 21, 2025
d58cd0d
format code
Nyakult Oct 21, 2025
f7a229f
format code
Nyakult Oct 21, 2025
7f03ff5
pref pipeline
2Rant Oct 22, 2025
023897d
add search response mode
2Rant Oct 22, 2025
7ee470a
add search response mode
2Rant Oct 22, 2025
e9b7fff
add search response mode
2Rant Oct 22, 2025
25746a2
Merge pull request #13 from 2Rant/eval/1020
Nyakult Oct 22, 2025
c9acde2
update readme and example
Nyakult Oct 22, 2025
6c98a62
update mem0 api
Nyakult Oct 22, 2025
e42fb9d
pm mem0
2Rant Oct 22, 2025
6745310
Merge branch 'eval/0929-test' of https://github.com/Nyakult/MemOS int…
2Rant Oct 22, 2025
d7146f5
fix MEMOBASE api
Nyakult Oct 23, 2025
0fd983f
update pm and prefeval pipepline for frames
2Rant Oct 23, 2025
531d4ab
update pm and prefeval readme
2Rant Oct 23, 2025
f07863f
Merge pull request #14 from 2Rant/eval/1020
Nyakult Oct 23, 2025
5450882
format code
Nyakult Oct 23, 2025
8363c2f
fix memobase api
Nyakult Oct 23, 2025
0a369ca
fix memobase api
Nyakult Oct 23, 2025
d92e021
Merge branch 'dev' into eval/0929-test
Nyakult Oct 23, 2025
e6c90ed
format code
Nyakult Oct 23, 2025
6efc45f
format code
Nyakult Oct 23, 2025
2cca27a
fix format
Nyakult Oct 24, 2025
5b90421
fix format
Nyakult Oct 24, 2025
5a5dfc9
fix format
Nyakult Oct 24, 2025
21 changes: 14 additions & 7 deletions evaluation/.env-example
@@ -3,21 +3,28 @@ MODEL="gpt-4o-mini"
OPENAI_API_KEY="sk-***REDACTED***"
OPENAI_BASE_URL="http://***.***.***.***:3000/v1"

MEM0_API_KEY="m0-***REDACTED***"

ZEP_API_KEY="z_***REDACTED***"

# response model
CHAT_MODEL="gpt-4o-mini"
CHAT_MODEL_BASE_URL="http://***.***.***.***:3000/v1"
CHAT_MODEL_API_KEY="sk-***REDACTED***"

# memos
MEMOS_KEY="Token mpg-xxxxx"
MEMOS_URL="https://apigw-pre.memtensor.cn/api/openmem/v1"
PRE_SPLIT_CHUNK=false # pre split chunk in client end
MEMOS_URL="http://127.0.0.1:8001"
MEMOS_ONLINE_URL="https://memos.memtensor.cn/api/openmem/v1"

# other memory agents
MEM0_API_KEY="m0-xxx"
ZEP_API_KEY="z_xxx"
MEMU_API_KEY="mu_xxx"
SUPERMEMORY_API_KEY="sm_xxx"
MEMOBASE_API_KEY="xxx"
MEMOBASE_PROJECT_URL="http://***.***.***.***:8019"

# eval settings
PRE_SPLIT_CHUNK=false

MEMOBASE_API_KEY="xxxxx"
MEMOBASE_PROJECT_URL="http://xxx.xxx.xxx.xxx:8019"

# Configuration Only For Scheduler
# RabbitMQ Configuration
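
For reference, a minimal sketch of how an evaluation script might read the settings above. The helper names `env_flag` and `load_eval_settings` are hypothetical and not part of this PR; the variable names and the local default URL come from the `.env-example` shown here, and treating `PRE_SPLIT_CHUNK` as a case-insensitive boolean string is an assumption.

```python
import os


def env_flag(name: str, default: bool = False, env=None) -> bool:
    """Interpret an environment variable such as PRE_SPLIT_CHUNK=false as a bool."""
    env = os.environ if env is None else env
    raw = env.get(name)
    if raw is None:
        return default
    # Assumption: only these spellings count as true; anything else is false.
    return raw.strip().lower() in {"1", "true", "yes", "on"}


def load_eval_settings(env=None) -> dict:
    """Collect the MemOS-related settings named in .env-example (hypothetical helper)."""
    env = os.environ if env is None else env
    return {
        "memos_url": env.get("MEMOS_URL", "http://127.0.0.1:8001"),
        "memos_online_url": env.get("MEMOS_ONLINE_URL"),
        "memos_key": env.get("MEMOS_KEY"),
        "pre_split_chunk": env_flag("PRE_SPLIT_CHUNK", env=env),
    }
```

Passing a plain dict as `env` keeps the helper testable without touching the real process environment.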
21 changes: 19 additions & 2 deletions evaluation/README.md
@@ -21,7 +21,14 @@ This repository provides tools and scripts for evaluating the LoCoMo dataset usi

2. Copy the `configs-example/` directory to a new directory named `configs/`, and modify the configuration files inside it as needed. This directory contains model and API-specific settings.

## Setup MemOS
```bash
#start server
uvicorn memos.api.server_api:app --host 0.0.0.0 --port 8001 --workers 8

# modify .env file
MEMOS_URL="http://127.0.0.1:8001"
```
## Evaluation Scripts

### LoCoMo Evaluation
@@ -45,10 +52,20 @@ First prepare the dataset `longmemeval_s` from https://huggingface.co/datasets/x
./scripts/run_lme_eval.sh
```

### prefEval Evaluation
### PrefEval Evaluation
To evaluate the **Prefeval** dataset using one of the supported memory frameworks — `memos`, `mem0`, or `zep` — run the following [script](./scripts/run_prefeval_eval.sh):

### personaMem Evaluation
```bash
# Edit the configuration in ./scripts/run_prefeval_eval.sh
# Specify the model and memory backend you want to use (e.g., mem0, zep, etc.)
./scripts/run_prefeval_eval.sh
```

### PersonaMem Evaluation
get `questions_32k.csv` and `shared_contexts_32k.jsonl` from https://huggingface.co/datasets/bowen-upenn/PersonaMem and save them at `data/personamem/`
```bash
# Edit the configuration in ./scripts/run_pm_eval.sh
# Specify the model and memory backend you want to use (e.g., mem0, zep, etc.)
# If you want to use MIRIX, edit the configuration in ./scripts/personamem/config.yaml
./scripts/run_pm_eval.sh
```
46 changes: 31 additions & 15 deletions evaluation/scripts/PrefEval/pref_eval.py
@@ -15,10 +15,6 @@
 API_KEY = os.getenv("OPENAI_API_KEY")
 API_URL = os.getenv("OPENAI_BASE_URL")

-INPUT_FILE = "./results/prefeval/pref_memos_process.jsonl"
-OUTPUT_FILE = "./results/prefeval/eval_pref_memos.jsonl"
-OUTPUT_EXCEL_FILE = "./results/prefeval/eval_pref_memos_summary.xlsx"
-

 async def call_gpt4o_mini_async(client: OpenAI, prompt: str) -> str:
     messages = [{"role": "user", "content": prompt}]
@@ -255,9 +251,10 @@ def generate_excel_summary(
     avg_search_time: float,
     avg_context_tokens: float,
     avg_add_time: float,
+    output_excel_file: str,
     model_name: str = "gpt-4o-mini",
 ):
-    print(f"Generating Excel summary at {OUTPUT_EXCEL_FILE}...")
+    print(f"Generating Excel summary at {output_excel_file}...")

     def get_pct(key):
         return summary_results.get(key, {}).get("percentage", 0)
@@ -282,7 +279,7 @@ def get_pct(key):

     df = pd.DataFrame(data)

-    with pd.ExcelWriter(OUTPUT_EXCEL_FILE, engine="xlsxwriter") as writer:
+    with pd.ExcelWriter(output_excel_file, engine="xlsxwriter") as writer:
         df.to_excel(writer, index=False, sheet_name="Summary")

         workbook = writer.book
@@ -300,10 +297,10 @@ def get_pct(key):
         bold_pct_format = workbook.add_format({"num_format": "0.0%", "bold": True})
         worksheet.set_column("F:F", 18, bold_pct_format)

-    print(f"Successfully saved summary to {OUTPUT_EXCEL_FILE}")
+    print(f"Successfully saved summary to {output_excel_file}")


-async def main(concurrency_limit: int):
+async def main(concurrency_limit: int, input_file: str, output_file: str, output_excel_file: str):
     semaphore = asyncio.Semaphore(concurrency_limit)
     error_counter = Counter()

@@ -313,17 +310,17 @@ async def main(concurrency_limit: int):
     total_add_time = 0

     print(f"Starting evaluation with a concurrency limit of {concurrency_limit}...")
-    print(f"Input file: {INPUT_FILE}")
-    print(f"Output JSONL: {OUTPUT_FILE}")
-    print(f"Output Excel: {OUTPUT_EXCEL_FILE}")
+    print(f"Input file: {input_file}")
+    print(f"Output JSONL: {output_file}")
+    print(f"Output Excel: {output_excel_file}")

     client = OpenAI(api_key=API_KEY, base_url=API_URL)

     try:
-        with open(INPUT_FILE, "r", encoding="utf-8") as f:
+        with open(input_file, "r", encoding="utf-8") as f:
             lines = f.readlines()
     except FileNotFoundError:
-        print(f"Error: Input file not found at '{INPUT_FILE}'")
+        print(f"Error: Input file not found at '{input_file}'")
         return

     if not lines:
@@ -332,7 +329,7 @@ async def main(concurrency_limit: int):

     tasks = [process_line(line, client, semaphore) for line in lines]

-    with open(OUTPUT_FILE, "w", encoding="utf-8") as outfile:
+    with open(output_file, "w", encoding="utf-8") as outfile:
         pbar = tqdm(
             asyncio.as_completed(tasks),
             total=len(tasks),
@@ -382,13 +379,19 @@ async def main(concurrency_limit: int):
             avg_search_time,
             avg_context_tokens,
             avg_add_time,
+            output_excel_file,
         )
     except Exception as e:
         print(f"\nFailed to generate Excel file: {e}")


 if __name__ == "__main__":
     parser = argparse.ArgumentParser(description="Evaluate assistant responses from a JSONL file.")
+
+    parser.add_argument(
+        "--input", type=str, required=True, help="Path to the input JSONL file from pref_memos.py."
+    )
+
     parser.add_argument(
         "--concurrency-limit",
         type=int,
@@ -397,4 +400,17 @@ async def main(concurrency_limit: int):
     )
     args = parser.parse_args()

-    asyncio.run(main(concurrency_limit=args.concurrency_limit))
+    input_path = args.input
+    output_dir = os.path.dirname(input_path)
+
+    output_jsonl_path = os.path.join(output_dir, "eval_pref_memos.jsonl")
+    output_excel_path = os.path.join(output_dir, "eval_pref_memos_summary.xlsx")
+
+    asyncio.run(
+        main(
+            concurrency_limit=args.concurrency_limit,
+            input_file=input_path,
+            output_file=output_jsonl_path,
+            output_excel_file=output_excel_path,
+        )
+    )
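
The new `__main__` block derives both output paths from the directory of `--input`. The sketch below isolates that step so its behavior is easy to check. `derive_output_paths` is a hypothetical name used only for illustration; the two output file names are copied from the diff.

```python
import os


def derive_output_paths(input_path: str) -> tuple[str, str]:
    """Mirror the __main__ block: write results into the input file's directory."""
    output_dir = os.path.dirname(input_path)
    return (
        os.path.join(output_dir, "eval_pref_memos.jsonl"),
        os.path.join(output_dir, "eval_pref_memos_summary.xlsx"),
    )
```

Two consequences follow from this logic. A bare file name such as `in.jsonl` has an empty directory part, so the outputs land in the current working directory. And because the output names are fixed, evaluating two input files that sit in the same directory writes to the same two output paths.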
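
`main` pairs an `asyncio.Semaphore` with `asyncio.as_completed` to cap the number of in-flight requests. The body of `process_line` is not shown in this diff, so the sketch below assumes the semaphore is acquired inside each task; `run_bounded` and `worker` are hypothetical names, not code from the PR.

```python
import asyncio


async def run_bounded(items, worker, concurrency_limit):
    """Run worker(item) for every item with at most concurrency_limit in flight."""
    semaphore = asyncio.Semaphore(concurrency_limit)

    async def guarded(item):
        # Each task waits here until one of the concurrency slots is free.
        async with semaphore:
            return await worker(item)

    results = []
    # as_completed yields awaitables in completion order, not submission order.
    for finished in asyncio.as_completed([guarded(item) for item in items]):
        results.append(await finished)
    return results
```

Because results arrive in completion order, anything that must line up with the input order needs an identifier carried inside each result.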