
Add preliminary answer generation with zero shot and correct chunk #35


Open
wants to merge 50 commits into main

Conversation

sumukshashidhar
Member

Add answer generation with two settings:

  • zero_shot (to test knowledge capabilities)
  • with_correct_chunk (to test potential upper bound)

@sumukshashidhar
Member Author

I can fix CQ after merge

@clefourrier
Member

No, please fix before - it's bad practice to merge PRs where checks do not pass

@sumukshashidhar
Member Author

fixed

Comment on lines 65 to 78
# Single-shot
_generate_answers_for_questions(
    config=config,
    source_subset="single_shot_questions",
    scenario=scenario,
    output_subset="single_shot_questions_with_answers",
)

# Multi-hop
_generate_answers_for_questions(
    config=config,
    source_subset="multi_hop_questions",
    scenario=scenario,
    output_subset="multi_hop_questions_with_answers",
)
Member

Why do you select both here? It should depend on the config and whether we generated multihop questions & single shot ones, no?

Member Author

Updated the code so that single-shot answers are only generated if single_shot_question_generation.run is true, and similarly multi-hop answers only if multi_hop_question_generation.run is true. This way, we only generate answers for the question types actually produced.

Of course, we'll need to add further validation later for special cases (multiple config files, later runs, etc.), but this should do as officially supported functionality.

        row_map.append(i)
    else:
        logger.warning(f"Unrecognized scenario '{scenario}' (row={i}), skipping.")
Member

Should fail way earlier as it will affect all rows

Member

(btw, no longer an issue if you invert the if and for clauses ^^)

Member Author

Now raises a ValueError if the scenario is unrecognized (not in ("zero_shot", "with_correct_chunk")) before processing any rows, instead of issuing repeated per-row warnings.

return calls, row_map

doc_meta_map = {}
if scenario == "with_correct_chunk":
Member

You need to split the if here and iterate on the rows inside the clauses - otherwise you're adding an extra if check for each new row, which is a useless extra operation.

if scenario == "zero_shot":
    for row in rows:
        # do the thing
elif scenario == "with_correct_chunk":
    # load dataset
    for row in rows:
        # do the thing
else:
    # etc.

Member

I'll redo a review once this is fixed

Member Author

The _build_inference_calls_scenario() function now has a top-level branch on scenario (if scenario == "zero_shot": ... elif scenario == "with_correct_chunk": ...), so we avoid repeated scenario checks within the row loop.


# Single-shot uses 'chunk_id', multi-hop uses 'source_chunk_ids'
if source_subset.startswith("single_shot"):
    chunk_id = row.get("chunk_id", "")
Member

Why don't you use an extraction function here?

Member Author

Added _extract_chunk_ids_for_row(...), which pulls out document_id and the chunk IDs for single-hop vs. multi-hop rows.


for model_name, model_responses in responses_dict.items():
    logger.info(f"Processing {len(model_responses)} responses from model={model_name} for scenario={scenario}.")
    n_common = min(len(model_responses), len(row_map))
Member

What is the role of n_common? (The name is not clear to me; please add one line of comment.)

Member Author

added comment
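The added comment might read along these lines (a sketch; the PR's exact wording is not shown in this thread):

```python
model_responses = ["r1", "r2", "r3"]  # example: model returned 3 responses
row_map = [0, 1]                      # example: only 2 question rows

# n_common: number of (response, row) pairs that can be safely aligned.
# Truncating to the shorter of the two lists avoids index errors when a
# model returned fewer (or more) responses than there are question rows.
n_common = min(len(model_responses), len(row_map))
```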


    A column 'answer_fashion' indicates which scenario was used to generate the answer.
    """
    stage_cfg = config.get("pipeline", {}).get("answer_generation", {})
Collaborator
@alozowski Mar 26, 2025

A possible situation here: if the config structure is wrong, this could silently fail or behave incorrectly. I would add a "TODO" here to integrate config validation for the whole project, something like:
TODO: Add global config schema validation across the project

Member Author
added
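One way the TODO plus a defensive lookup could look (the wrapper function and the warning are illustrative assumptions, not necessarily what the PR does):

```python
import logging

logger = logging.getLogger(__name__)

def get_stage_cfg(config: dict) -> dict:
    # TODO: Add global config schema validation across the project
    # Chained .get() calls fail silently on a malformed config, so at
    # least surface an empty/missing section loudly before proceeding.
    stage_cfg = config.get("pipeline", {}).get("answer_generation", {})
    if not stage_cfg:
        logger.warning("Missing 'pipeline.answer_generation' config; using defaults.")
    return stage_cfg
```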
