This repository contains code and data for "Sample, Align, Synthesize: Graph-Based Response Synthesis with ConGrs" by Sayan Ghosh, Shahzaib Saqib Warraich, Dhruv Tarsadiya, Gregory Yauney, and Swabha Swayamdipta.
Biography generation:
- For our list of biography entities, refer to the scripts/factscore_eval/bio_entities.txt file.
- Use the prompt: "Tell me a paragraph bio of " + entity + ". ", where entity is each line in the bio_entities.txt file.
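As a sketch, the biography prompts can be assembled like this (the helper name `build_bio_prompt` is illustrative, not a function from the repository):

```python
def build_bio_prompt(entity: str) -> str:
    # Prompt template from the biography setting described above.
    return "Tell me a paragraph bio of " + entity + ". "

# Example usage against the entity list (one name per line):
# with open("scripts/factscore_eval/bio_entities.txt") as f:
#     prompts = [build_bio_prompt(line.strip()) for line in f if line.strip()]
```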
PopQA generation:
- For our list of PopQA entities, refer to the scripts/factscore_eval/popqa_entities.json file.
- Use the prompt: "Provide me with a paragraph detailing some facts related to " + wiki_title + ". ", where wiki_title comes from each dictionary in the list stored in the popqa_entities.json file.
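A matching sketch for the PopQA prompts, assuming each entry in popqa_entities.json carries the title under a "wiki_title" key (check the file for the exact field name):

```python
def build_popqa_prompt(wiki_title: str) -> str:
    # Prompt template from the PopQA setting described above.
    return "Provide me with a paragraph detailing some facts related to " + wiki_title + ". "

# Example usage (field name "wiki_title" is an assumption):
# import json
# with open("scripts/factscore_eval/popqa_entities.json") as f:
#     entries = json.load(f)
# prompts = [build_popqa_prompt(d["wiki_title"]) for d in entries]
```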
All generated responses that we synthesize from are included in the data directory.
We use FActScore for evaluation. Download the Wikipedia database and other required files using the script at https://github.com/shmsw25/FActScore/blob/main/factscore/download_data.py
Sample bio generation script for the 8-bit Qwen72B-Instruct model: scripts/sample_bio_gen.py
Sample LM consensus generation script: scripts/baseline_generation/cons.py
Execution example:
cd scripts/baseline_generation
python cons.py --input <INPUT_FILE> --output <OUTPUT_FILE>
Construction script: scripts/test_poa_bios_batch.ipynb
ConGrs construction saves .pkl files. Provide the path to the generated samples in the notebook.
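Since ConGrs are saved as pickle files, downstream scripts can reload them with the standard library. A minimal sketch (`load_congr` is a hypothetical helper, and the object layout is whatever the construction notebook pickled):

```python
import pickle

def load_congr(path: str):
    # Load a saved ConGr from disk. The structure of the returned object
    # is defined by the construction notebook, not by this helper.
    with open(path, "rb") as f:
        return pickle.load(f)
```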
Decoding script: scripts/test_poa_bios_batch_decode.ipynb
Response Synthesis with Consensus Decoding
- Provide the path to your saved ConGr .pkl files in the script.
- Specify the task code in the script:
  - popqa → PopQA
  - bio → Biographies
Note: run the FActScore eval first, then the HALoGEN eval; the output of the FActScore eval is the input to the HALoGEN eval.
Sample FActScore eval input file: scripts/factscore_eval/sample_fs_input.json
Sample FActScore eval output file: scripts/factscore_eval/sample_fs_output.json
Execution example:
cd scripts/factscore_eval
python factscore_eval_run.py --data_path <INPUT_FILE> --result_path <OUTPUT_FILE>
Sample HALoGEN eval input file: scripts/halogen_eval/sample_factuality_input.json
Add OpenAI API keys in the config.yml file.
Execution example:
cd scripts/halogen_eval
python bio_scorer.py --input_dir <INPUT_FILE> --output_dir <OUTPUT_FILE>
Postprocessing example:
Add the path of the output file from the previous step in the following script:
python factuality_postprocess.py
Add the path of the output file from the previous step in the following script; it generates the results for Tables 1 and 2 in the main paper:
python factuality_results.py
False presuppositions generation:
For our list of prompts, refer to the scripts/halogen_eval/fp.json file.
Scientific attributions generation:
For our list of prompts, refer to the scripts/halogen_eval/refs.json file.
Historical events generation:
For our list of prompts, refer to the scripts/halogen_eval/he.json file.
All generated responses that we synthesize from are included in the data directory.
Sample fp (false presuppositions) generation script for the 8-bit Qwen72B-Instruct model: scripts/sample_fp_gen.py
Sample LM consensus generation script: scripts/baseline_generation/cons.py
Execution example:
cd scripts/baseline_generation
python cons.py --input <INPUT_FILE> --output <OUTPUT_FILE>
Construction script: scripts/test_poa_bios_batch.ipynb
ConGrs construction saves .pkl files. Provide the path to the generated samples in the notebook.
Decoding script: scripts/test_poa_bios_batch_decode.ipynb
Response Synthesis with Consensus Decoding
- Provide the path to your saved ConGr .pkl files in the script.
- Specify the task code in the script:
  - fp → False Presuppositions
  - refs → Scientific References
Sample HALoGEN hallucination generation input file: scripts/halogen_eval/sample_refusal_halc_input.json
Add Semantic Scholar and OpenAI API keys in the config.yml file.
Execution example:
cd scripts/halogen_eval
python evaluate_hallucinations.py --input_dir <INPUT_FILE> --output_dir <OUTPUT_FILE> --scientific_attribution
Sample HALoGEN eval input file: scripts/halogen_eval/sample_refusal_input.json
Execution example:
cd scripts/halogen_eval
python reference_scorer.py --input_dir <INPUT_FILE> --output_dir <OUTPUT_FILE>
Postprocessing example:
Add the path of the output file from the previous step in the following script; it generates the results for Table 5 in the main paper:
python refusal_results.py
We use two benchmark datasets, MATH and AIME.
Inference scripts:
- MATH: scripts/math_sample_generation.py
- AIME: scripts/aime_sample_generation.py
Each script generates 5 samples per instance for all models. Provide the path to the benchmark data in the scripts.
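The per-instance sampling loop can be sketched as follows. Here `generate` stands in for whatever model call the sampling scripts actually make, and the "id"/"prompt" field names are assumptions for illustration:

```python
def sample_responses(instances, generate, n_samples=5):
    # Draw n_samples responses per benchmark instance.
    # `generate` is a placeholder for the real model call; field names
    # ("id", "prompt") are illustrative, not the scripts' actual schema.
    return {
        inst["id"]: [generate(inst["prompt"]) for _ in range(n_samples)]
        for inst in instances
    }
```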
Construction and decoding script: scripts/process_math.ipynb
ConGrs construction and response synthesis with guided self-verification. Provide the path to the generated samples in the notebook.
Evaluation script: scripts/matheval.py
Measures accuracy across the dataset. Provide the path to the final decoded responses file in the script.
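The accuracy measurement reduces to a simple ratio, sketched below. The "pred"/"gold" field names are assumptions for illustration; matheval.py defines the actual schema of the decoded responses file:

```python
def accuracy(records):
    # Fraction of instances whose decoded answer matches the gold answer.
    # Field names ("pred", "gold") are illustrative placeholders.
    if not records:
        return 0.0
    return sum(r["pred"] == r["gold"] for r in records) / len(records)
```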
@inproceedings{ghosh-etal-2025-sample-align-synthesize,
  title={Sample, Align, Synthesize: Graph-Based Response Synthesis with {C}on{G}rs},
  author={Sayan Ghosh and Shahzaib Saqib Warraich and Dhruv Tarsadiya and Gregory Yauney and Swabha Swayamdipta},
  booktitle={Workshop on Test-time Scaling and Reasoning Models at COLM 2025},
  year={2025}
}