
Add preliminary answer generation with zero shot and correct chunk #35


Open
wants to merge 50 commits into main

Conversation

sumukshashidhar
Member

Add answer generation with two settings:

  • zero_shot (to test knowledge capabilities)
  • with_correct_chunk (to test potential upper bound)

@sumukshashidhar
Member Author

I can fix CQ after merge

@clefourrier
Member

No, please fix before - it's bad practice to merge PRs where checks do not pass

@sumukshashidhar
Member Author

fixed

Comment on lines 65 to 78
# Single-shot
_generate_answers_for_questions(
    config=config,
    source_subset="single_shot_questions",
    scenario=scenario,
    output_subset="single_shot_questions_with_answers",
)

# Multi-hop
_generate_answers_for_questions(
    config=config,
    source_subset="multi_hop_questions",
    scenario=scenario,
    output_subset="multi_hop_questions_with_answers",
)
Member

Why do you select both here? It should depend on the config and whether we generated multihop questions & single shot ones, no?

Member Author

Updated the code so that single-shot answers are only generated if single_shot_question_generation.run is true, and similarly multi-hop answers only if multi_hop_question_generation.run is true. This way, we only generate answers for the question types actually produced.

Of course, we'll need to add further validation later for special cases (multiple config files, later runs, etc.), but this should do as officially supported functionality.

        row_map.append(i)
    else:
        logger.warning(f"Unrecognized scenario '{scenario}' (row={i}), skipping.")
Member

Should fail way earlier as it will affect all rows

Member

(btw, no longer an issue if you invert the if and for clauses ^^)

Member Author

Now raises a ValueError if the scenario is unrecognized (not in ("zero_shot", "with_correct_chunk")) before processing any rows, instead of issuing repeated per-row warnings.

return calls, row_map

doc_meta_map = {}
if scenario == "with_correct_chunk":
Member

You need to split the if here and iterate on the rows inside the clauses - otherwise you're adding an extra if check for each new row, which is a useless extra operation.

if scenario == "zero_shot":
    for row in rows:
        # do the thing
elif scenario == "with_correct_chunk":
    # load dataset
    for row in rows:
        # do the thing
else:
    # etc.

Member

I'll redo a review once this is fixed

Member Author

The _build_inference_calls_scenario() function now has a top-level branch on scenario (if scenario == "zero_shot": ... elif scenario == "with_correct_chunk": ...), so we avoid repeated scenario checks within the row loop.


# Single-shot uses 'chunk_id', multi-hop uses 'source_chunk_ids'
if source_subset.startswith("single_shot"):
    chunk_id = row.get("chunk_id", "")
Member

Why don't you use an extraction function here?

Member Author

Added _extract_chunk_ids_for_row(...), which pulls out document_id and the chunk IDs for single-hop vs. multi-hop rows.


for model_name, model_responses in responses_dict.items():
    logger.info(f"Processing {len(model_responses)} responses from model={model_name} for scenario={scenario}.")
    n_common = min(len(model_responses), len(row_map))
Member

What is the role of n_common? (The name is not clear to me; please add one line of comment.)

Member Author

added comment
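The added comment might read along these lines (a sketch; the PR's exact wording is not shown in this thread):

```python
model_responses = ["r1", "r2", "r3"]  # example: model returned 3 responses
row_map = [0, 1]                      # example: only 2 question rows

# n_common: number of (response, row) pairs that can be safely aligned.
# Truncating to the shorter of the two lists avoids index errors when a
# model returned fewer (or more) responses than there are question rows.
n_common = min(len(model_responses), len(row_map))
```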


    A column 'answer_fashion' indicates which scenario was used to generate the answer.
    """
    stage_cfg = config.get("pipeline", {}).get("answer_generation", {})
Collaborator
@alozowski Mar 26, 2025

A possible situation here: if the config structure is wrong, this could silently fail or behave incorrectly. I would add a "TODO" here to integrate config validation for the whole project, something like:
TODO: Add global config schema validation across the project

Member Author
added
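One way the TODO plus a defensive lookup could look (the wrapper function and the warning are illustrative assumptions, not necessarily what the PR does):

```python
import logging

logger = logging.getLogger(__name__)

def get_stage_cfg(config: dict) -> dict:
    # TODO: Add global config schema validation across the project
    # Chained .get() calls fail silently on a malformed config, so at
    # least surface an empty/missing section loudly before proceeding.
    stage_cfg = config.get("pipeline", {}).get("answer_generation", {})
    if not stage_cfg:
        logger.warning("Missing 'pipeline.answer_generation' config; using defaults.")
    return stage_cfg
```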
