@butsugiri butsugiri commented Nov 10, 2025

This PR adds a new argument, --num_repeats, to the flexeval_lm command.

What this PR does

For example, given the following command:

flexeval_lm \
  --language_model HuggingFaceLM \
  --language_model.model "sbintuitions/tiny-lm" \
  --eval_setups="./tests/dummy_modules/configs/eval_suite.jsonnet" \
  --save_dir "results-multi" \
  --force=true \
  --num_repeats=3

The final directory structure is as follows:

.
├── generation
│   ├── run0
│   │   ├── config.json
│   │   ├── metrics.json
│   │   └── outputs.jsonl
│   ├── run1
│   │   ├── config.json
│   │   ├── metrics.json
│   │   └── outputs.jsonl
│   └── run2
│       ├── config.json
│       ├── metrics.json
│       └── outputs.jsonl
├── multiple_choice
│   ├── run0
│   │   ├── config.json
│   │   ├── metrics.json
│   │   └── outputs.jsonl
│   ├── run1
│   │   ├── config.json
│   │   ├── metrics.json
│   │   └── outputs.jsonl
│   └── run2
│       ├── config.json
│       ├── metrics.json
│       └── outputs.jsonl
└── perplexity
    ├── run0
    │   ├── config.json
    │   └── metrics.json
    ├── run1
    │   ├── config.json
    │   └── metrics.json
    └── run2
        ├── config.json
        └── metrics.json

What this PR does not do

  • Aggregate the results from the multiple runs
    • I am wondering whether we should include this feature in this PR
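
Since aggregation is not part of this PR, here is a minimal sketch of how the per-run metrics could be combined afterwards. It is not part of flexeval: the function name aggregate_runs is hypothetical, and it assumes each metrics.json is a JSON object whose top-level numeric entries are the metrics of interest (the actual schema is not shown in this PR).

```python
import json
import statistics
from collections import defaultdict
from pathlib import Path


def aggregate_runs(setup_dir):
    """Summarize numeric metrics found in <setup_dir>/run*/metrics.json.

    Hypothetical helper, not part of flexeval. Assumes metrics.json is a
    JSON object; only top-level numeric values are aggregated.
    """
    values = defaultdict(list)
    for metrics_path in sorted(Path(setup_dir).glob("run*/metrics.json")):
        with metrics_path.open(encoding="utf-8") as f:
            metrics = json.load(f)
        for name, value in metrics.items():
            # Skip non-numeric entries (strings, nested objects, booleans).
            if isinstance(value, (int, float)) and not isinstance(value, bool):
                values[name].append(float(value))
    return {
        name: {
            "mean": statistics.mean(vals),
            # Sample standard deviation needs at least two runs.
            "stdev": statistics.stdev(vals) if len(vals) > 1 else 0.0,
            "n": len(vals),
        }
        for name, vals in values.items()
    }
```

With the directory layout above, it could be called once per eval setup, for example aggregate_runs("results-multi/generation").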

@junya-takayama junya-takayama left a comment

Nice feature👏
LGTM!

@butsugiri butsugiri merged commit 956d3cf into main Nov 11, 2025
7 checks passed
@butsugiri butsugiri deleted the multiple-generations branch November 11, 2025 05:46