Closed
77 commits
1220572
feat: check nodes existence
Nyakult Jul 25, 2025
1396919
Merge remote-tracking branch 'upstream/dev' into dev
Nyakult Jul 25, 2025
85b89bb
Merge remote-tracking branch 'upstream/dev' into dev
Nyakult Jul 28, 2025
c8c1488
Merge remote-tracking branch 'upstream/dev' into dev
Nyakult Jul 28, 2025
8baf5c6
Merge remote-tracking branch 'upstream/dev' into dev
Nyakult Jul 29, 2025
9982782
Merge remote-tracking branch 'upstream/dev' into dev
Nyakult Jul 30, 2025
f3dd6e7
Merge remote-tracking branch 'upstream/dev' into dev
Nyakult Jul 30, 2025
4471790
Merge remote-tracking branch 'upstream/dev' into dev
Nyakult Aug 1, 2025
0f9ccd4
Merge remote-tracking branch 'upstream/dev' into dev
Nyakult Aug 1, 2025
27196ef
Merge remote-tracking branch 'upstream/dev' into dev
Nyakult Aug 1, 2025
70d0a4a
Merge remote-tracking branch 'upstream/dev' into dev
Nyakult Aug 5, 2025
5dd9662
Merge remote-tracking branch 'upstream/dev' into dev
Nyakult Aug 6, 2025
27203e7
Merge remote-tracking branch 'upstream/dev' into dev
Nyakult Aug 6, 2025
b2cd7f0
Merge remote-tracking branch 'upstream/dev' into dev
Nyakult Aug 8, 2025
6fa6af7
feat: use different template for different language input
Nyakult Aug 8, 2025
b641c51
feat: use different template for different language input
Nyakult Aug 8, 2025
9f5aca1
Merge remote-tracking branch 'upstream/dev' into dev
Nyakult Aug 12, 2025
5eafce4
Merge remote-tracking branch 'upstream/dev' into dev
Nyakult Aug 13, 2025
5c2e637
Merge remote-tracking branch 'upstream/dev' into dev
Nyakult Aug 18, 2025
332bab6
Merge remote-tracking branch 'upstream/dev' into dev
Nyakult Aug 26, 2025
d3dca58
fix: eval script
Nyakult Aug 26, 2025
45cee24
Merge remote-tracking branch 'upstream/dev' into dev
Nyakult Aug 27, 2025
b1b448e
Merge remote-tracking branch 'upstream/dev' into dev
Nyakult Sep 1, 2025
ffb034e
Merge remote-tracking branch 'upstream/dev' into dev
Nyakult Sep 3, 2025
4551297
Merge remote-tracking branch 'upstream/dev' into dev
Nyakult Sep 4, 2025
204b545
Merge remote-tracking branch 'upstream/dev' into dev
Nyakult Sep 5, 2025
fca2542
mirix funcs
Nyakult Sep 5, 2025
8a96a79
mirix funcs
Nyakult Sep 5, 2025
ce69868
mirix funcs
Nyakult Sep 5, 2025
56c685d
mirix funcs
Nyakult Sep 8, 2025
f0adff6
mirix funcs
Nyakult Sep 9, 2025
837456f
mirix funcs
Nyakult Sep 9, 2025
0318c5b
mirix funcs
Nyakult Sep 9, 2025
5e8a52c
mirix funcs
Nyakult Sep 9, 2025
3a0de05
mirix funcs
Nyakult Sep 10, 2025
12a4945
mirix funcs
Nyakult Sep 10, 2025
eac1032
mirix funcs
Nyakult Sep 10, 2025
3eacbd3
mirix funcs
Nyakult Sep 11, 2025
2fe7351
mirix funcs
Nyakult Sep 11, 2025
03313e1
mirix funcs
Nyakult Sep 11, 2025
53464d4
mirix funcs
Nyakult Sep 11, 2025
fa9da46
mirix funcs
Nyakult Sep 11, 2025
9bfb9b9
mirix funcs
Nyakult Sep 16, 2025
00593da
mirix funcs
Nyakult Sep 16, 2025
9723ea3
mirix funcs
Nyakult Sep 16, 2025
8bcba1d
mirix funcs
Nyakult Sep 16, 2025
c3489cc
mirix funcs
Nyakult Sep 17, 2025
2a9c356
mirix funcs
Nyakult Sep 18, 2025
eeb0f09
Merge remote-tracking branch 'upstream/test' into eval/0910
Nyakult Sep 19, 2025
15b6a3c
```
Nyakult Sep 23, 2025
b05aa8c
feat(evaluation): update LongMemEval evaluation configuration and scripts - add MEMOS_KEY and MEMOS_URL to .env-ex…
Nyakult Sep 24, 2025
7d4a8ea
Merge remote-tracking branch 'upstream/test' into eval/0910
Nyakult Sep 24, 2025
dbc59ab
refactor(locomo): refactor path imports and memory handling logic
Nyakult Sep 24, 2025
8af0457
feat(evaluation): update environment configuration and client initialization logic
Nyakult Sep 24, 2025
f8776da
Merge remote-tracking branch 'upstream/test' into eval/0910
Nyakult Sep 24, 2025
514a6f6
feat(evaluation): update longmemeval and locomo evaluation scripts - change the search result field from openmem_searc…
Nyakult Sep 24, 2025
facc15d
refactor: ruff format code
Nyakult Sep 24, 2025
58d434c
chore: bump version to v1.1.0 (#340)
fridayL Sep 24, 2025
06d4250
change version to 1.1.0
fridayL Sep 24, 2025
8db7584
Merge branch 'main' into dev
fridayL Sep 24, 2025
c312757
chore: bump version to v1.1.0 (#345)
fridayL Sep 24, 2025
f72869c
change: version to v1.1.1
fridayL Sep 24, 2025
da0617d
Merge branch 'main' into dev
fridayL Sep 24, 2025
07a0197
feat: add api client (#316)
CarltonXiang Sep 19, 2025
a5ad654
docker start (#324)
pursues Sep 23, 2025
2411836
feat: api client (#334)
CarltonXiang Sep 24, 2025
62d16e2
change version to 1.1.0
fridayL Sep 24, 2025
3767638
change: version to v1.1.1
fridayL Sep 24, 2025
f411f75
Merge remote-tracking branch 'upstream/dev' into eval/0910
Nyakult Sep 25, 2025
382312e
feat: memos-api evaluation scripts
Nyakult Aug 8, 2025
ec759e3
feat: add api client (#316)
CarltonXiang Sep 19, 2025
11b36ca
feat: api client (#334)
CarltonXiang Sep 24, 2025
4934e50
change version to 1.1.0
fridayL Sep 24, 2025
923f59f
change: version to v1.1.1
fridayL Sep 24, 2025
1199026
change version to 1.1.0
fridayL Sep 24, 2025
a8fe8bd
change: version to v1.1.1
fridayL Sep 24, 2025
44147b9
Merge remote-tracking branch 'origin/eval/0910' into eval/0910
Nyakult Sep 25, 2025
8 changes: 8 additions & 0 deletions evaluation/.env-example
@@ -1,3 +1,4 @@
# memory process model
MODEL="gpt-4o-mini"
OPENAI_API_KEY="sk-***REDACTED***"
OPENAI_BASE_URL="http://***.***.***.***:3000/v1"
@@ -6,6 +7,13 @@ MEM0_API_KEY="m0-***REDACTED***"

ZEP_API_KEY="z_***REDACTED***"

# response model
CHAT_MODEL="gpt-4o-mini"
CHAT_MODEL_BASE_URL="http://***.***.***.***:3000/v1"
CHAT_MODEL_API_KEY="sk-***REDACTED***"

MEMOS_KEY="Token mpg-xxxxx"
MEMOS_URL="https://apigw-pre.memtensor.cn/api/openmem/v1"

MEMOBASE_API_KEY="xxxxx"
MEMOBASE_PROJECT_URL="http://xxx.xxx.xxx.xxx:8019"
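`.env-example` is a template; the usual convention is to copy it to `.env` and fill in real values. A minimal sketch, assuming the simple `KEY="value"` line format used above, of checking that the MemOS API settings were actually filled in before a run. `parse_env_text`, `missing_keys`, and the "xxxxx means placeholder" rule are hypothetical illustrations, not part of this PR; a library such as python-dotenv would normally handle the parsing.

```python
from typing import Dict, Iterable, List

# Hypothetical check: assumes these two keys are required for memos-api runs
REQUIRED_MEMOS_KEYS = ("MEMOS_KEY", "MEMOS_URL")


def parse_env_text(text: str) -> Dict[str, str]:
    """Parse simple KEY="value" lines; blank lines and # comments are skipped."""
    env: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        value = value.strip()
        # Drop one pair of matching surrounding quotes, keeping inner spaces ("Token mpg-...")
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
            value = value[1:-1]
        env[key.strip()] = value
    return env


def missing_keys(env: Dict[str, str], required: Iterable[str] = REQUIRED_MEMOS_KEYS) -> List[str]:
    """Return required keys that are unset, empty, or still hold an 'xxxxx' placeholder."""
    return [k for k in required if not env.get(k) or "xxxxx" in env[k]]
```

On an unedited copy of the example, this check reports `["MEMOS_KEY"]`, since its value still contains the `xxxxx` placeholder while `MEMOS_URL` is already a concrete URL.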
14 changes: 14 additions & 0 deletions evaluation/README.md
@@ -34,3 +34,17 @@ This repository provides tools and scripts for evaluating the LoCoMo dataset usi
```

✍️ For evaluating OpenAI's native memory feature with the LoCoMo dataset, please refer to the detailed guide: [OpenAI Memory on LoCoMo - Evaluation Guide](./scripts/locomo/openai_memory_locomo_eval_guide.md).

### LongMemEval Evaluation
First, download the `longmemeval_s` dataset from https://huggingface.co/datasets/xiaowu0162/longmemeval-cleaned and save it as `data/longmemeval/longmemeval_s.json`.

```bash
# Edit the configuration in ./scripts/run_lme_eval.sh
# Specify the model and memory backend to use (e.g., mem0 or zep)
./scripts/run_lme_eval.sh
```
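Before running `run_lme_eval.sh`, it can help to confirm the dataset landed at the expected path. A minimal sketch, assuming `longmemeval_s.json` is a single JSON array of question records and that each record carries a `question_id` field (as in the public LongMemEval release); `load_longmemeval` is a hypothetical helper, not one of the scripts in this PR.

```python
import json
from pathlib import Path
from typing import Any, Dict, List

DEFAULT_PATH = Path("data/longmemeval/longmemeval_s.json")


def load_longmemeval(path: Path = DEFAULT_PATH) -> List[Dict[str, Any]]:
    """Load the dataset and fail early with a readable message if it is missing or malformed."""
    if not path.is_file():
        raise FileNotFoundError(
            f"{path} not found; download longmemeval_s from Hugging Face and save it there"
        )
    with path.open(encoding="utf-8") as f:
        data = json.load(f)
    if not isinstance(data, list):
        raise ValueError(f"expected a JSON array of questions, got {type(data).__name__}")
    # Assumed field name; adjust if the dataset schema differs
    missing = [i for i, item in enumerate(data) if "question_id" not in item]
    if missing:
        raise ValueError(f"{len(missing)} records lack 'question_id' (first at index {missing[0]})")
    return data
```

Failing before any memory backend is called avoids paying for ingestion calls on a run that would crash at load time anyway.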

### prefEval Evaluation

### personaMem Evaluation
9 changes: 9 additions & 0 deletions evaluation/configs-example/mirix_config.yaml
@@ -0,0 +1,9 @@
agent_name: mirix
model_name: gpt-4o-mini
model_endpoint: http://***.***.***.***:3000/v1
api_key: sk-***REDACTED***
embedding_model_name: text-embedding-3-small
generation_config:
  temperature: 0.8
  max_tokens: 16192
  context_window: 32768
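A minimal sketch of validating these settings once the YAML has been parsed into a dict (for example with PyYAML's `yaml.safe_load`). The `MirixEvalConfig` class and its checks are hypothetical; the field names mirror the example file, and reading `temperature`, `max_tokens`, and `context_window` from a nested `generation_config` mapping is an assumption based on the example layout. The most useful check here is that `max_tokens` (16192 in the example) fits inside `context_window` (32768).

```python
from dataclasses import dataclass
from typing import Any, Dict


@dataclass(frozen=True)
class MirixEvalConfig:
    agent_name: str
    model_name: str
    model_endpoint: str
    api_key: str
    embedding_model_name: str
    temperature: float
    max_tokens: int
    context_window: int

    @classmethod
    def from_dict(cls, raw: Dict[str, Any]) -> "MirixEvalConfig":
        # Assumed layout: generation settings nested under generation_config
        gen = raw.get("generation_config") or {}
        cfg = cls(
            agent_name=raw["agent_name"],
            model_name=raw["model_name"],
            model_endpoint=raw["model_endpoint"],
            api_key=raw["api_key"],
            embedding_model_name=raw["embedding_model_name"],
            temperature=float(gen["temperature"]),
            max_tokens=int(gen["max_tokens"]),
            context_window=int(gen["context_window"]),
        )
        cfg.validate()
        return cfg

    def validate(self) -> None:
        # OpenAI-style temperature range (assumption for this backend)
        if not 0.0 <= self.temperature <= 2.0:
            raise ValueError(f"temperature {self.temperature} outside [0, 2]")
        if not 0 < self.max_tokens <= self.context_window:
            raise ValueError("max_tokens must be positive and no larger than context_window")
```

A frozen dataclass keeps the loaded settings immutable for the whole run, so concurrent evaluation workers cannot change them mid-run.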
12 changes: 7 additions & 5 deletions evaluation/scripts/locomo/locomo_eval.py
@@ -7,6 +7,7 @@

 import nltk
 import numpy as np
+import tiktoken
 import transformers

 from bert_score import score as bert_score
@@ -23,7 +24,7 @@

 logging.basicConfig(level=logging.CRITICAL)
 transformers.logging.set_verbosity_error()
-
+encoding = tiktoken.get_encoding("cl100k_base")
 # Download necessary NLTK resources
 try:
     nltk.download("wordnet", quiet=True)
@@ -173,7 +174,7 @@ def calculate_nlp_metrics(gold_answer, response, context, options=None):
     gold_answer = str(gold_answer) if gold_answer is not None else ""
     response = str(response) if response is not None else ""

-    metrics = {"context_tokens": len(nltk.word_tokenize(context)) if context else 0}
+    metrics = {"context_tokens": len(encoding.encode(context)) if context else 0}

     if "lexical" in options:
         gold_tokens = nltk.word_tokenize(gold_answer.lower())
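This hunk switches `context_tokens` from NLTK word tokens to BPE tokens from `tiktoken`'s `cl100k_base` encoding (the GPT-4 / GPT-3.5-turbo encoding; GPT-4o-family models such as `gpt-4o-mini` use `o200k_base`, so for them this is an approximation). Counts produced before and after this change measure different units and should not be compared across runs. A minimal sketch of the same logic with the encoder injected, so it can be exercised without downloading BPE files; `count_context_tokens` and `whitespace_encode` are hypothetical names.

```python
from typing import Callable, Optional, Sequence

# Any callable mapping text to a sequence of token ids, e.g. tiktoken's Encoding.encode
Encoder = Callable[[str], Sequence[int]]


def count_context_tokens(context: Optional[str], encode: Encoder) -> int:
    """Token count of the retrieved context; None or empty context counts as 0."""
    return len(encode(context)) if context else 0


def whitespace_encode(text: str) -> Sequence[int]:
    """Stand-in encoder for tests: one 'token' per whitespace-separated word."""
    return list(range(len(text.split())))
```

With `tiktoken` installed, pass the real encoder instead: `count_context_tokens(ctx, tiktoken.get_encoding("cl100k_base").encode)`. Injecting the encoder also makes switching to `o200k_base` a one-argument change.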
@@ -363,11 +364,12 @@ async def limited_task(task):
         "--lib",
         type=str,
         choices=["zep", "memos", "mem0", "mem0_graph", "openai", "memos-api", "memobase"],
+        default="memos-api",
     )
     parser.add_argument(
         "--version",
         type=str,
-        default="default",
+        default="0917-test",
         help="Version identifier for loading results (e.g., 1010)",
     )
     parser.add_argument(
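With these defaults, a bare run of `locomo_eval.py` grades `memos-api` results tagged `0917-test`. A minimal stdlib sketch reproducing only the arguments visible in this diff (`--lib`, `--version`, `--options`, `--workers`) with their new defaults. `build_parser` and `parse` are hypothetical helpers, and the grader-runs argument (default 3) is omitted because its flag name is not shown in the diff.

```python
import argparse
from typing import List, Optional

LIBS = ["zep", "memos", "mem0", "mem0_graph", "openai", "memos-api", "memobase"]


def build_parser() -> argparse.ArgumentParser:
    # Only the arguments visible in this diff, with the new defaults
    parser = argparse.ArgumentParser(description="LoCoMo evaluation (sketch)")
    parser.add_argument("--lib", type=str, choices=LIBS, default="memos-api")
    parser.add_argument(
        "--version",
        type=str,
        default="0917-test",
        help="Version identifier for loading results (e.g., 1010)",
    )
    # Empty default: the `if "lexical" in options` branch in calculate_nlp_metrics
    # is skipped unless the caller asks for it
    parser.add_argument("--options", nargs="+", default=[])
    parser.add_argument(
        "--workers", type=int, default=10, help="Number of concurrent workers for processing groups"
    )
    return parser


def parse(argv: Optional[List[str]] = None) -> argparse.Namespace:
    return build_parser().parse_args(argv)
```

Passing `--version default --options lexical semantic --workers 4` restores the previous values of those three arguments; `--lib` previously had no default at all.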
@@ -376,9 +378,9 @@ async def limited_task(task):
         default=3,
         help="Number of times to run the LLM grader for each question",
     )
-    parser.add_argument("--options", nargs="+", default=["lexical", "semantic"])
+    parser.add_argument("--options", nargs="+", default=[])
     parser.add_argument(
-        "--workers", type=int, default=4, help="Number of concurrent workers for processing groups"
+        "--workers", type=int, default=10, help="Number of concurrent workers for processing groups"
     )
     args = parser.parse_args()
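`--workers` now defaults to 10 concurrent groups. The hunk headers point at an `async def limited_task(task)` helper whose body is not part of this diff; a common way to write such a helper is to gate each coroutine behind an `asyncio.Semaphore`. A minimal sketch of that pattern, assuming one coroutine per conversation group; `run_limited` is a hypothetical name, and this is not necessarily how `locomo_eval.py` implements it.

```python
import asyncio
from typing import Awaitable, Iterable, List, TypeVar

T = TypeVar("T")


async def run_limited(tasks: Iterable[Awaitable[T]], workers: int) -> List[T]:
    """Await all tasks with at most `workers` in flight; results keep input order.

    Pass un-started coroutine objects: an already-scheduled asyncio.Task runs
    regardless of the semaphore.
    """
    semaphore = asyncio.Semaphore(workers)

    async def limited_task(task: Awaitable[T]) -> T:
        # The wrapped coroutine only starts once a semaphore slot is free
        async with semaphore:
            return await task

    return list(await asyncio.gather(*(limited_task(t) for t in tasks)))
```

Because `asyncio.gather` preserves input order, results line up with their groups no matter which finishes first. Raising `workers` mainly increases pressure on the memory backend and LLM grader, so rate limits are the practical ceiling.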
