This repository contains code and data for "Sample, Align, Synthesize: Graph-Based Response Synthesis with ConGrs" by Sayan Ghosh, Shahzaib Saqib Warraich, Dhruv Tarsadiya, Gregory Yauney, and Swabha Swayamdipta.
Biography generation:
- For our list of biography entities, refer to the scripts/factscore_eval/bio_entities.txt file.
- Use the prompt: "Tell me a paragraph bio of " + entity + ". ", where entity is each line in the bio_entities.txt file.
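As a sketch, the biography prompts can be assembled like this (the helper name `build_bio_prompt` is illustrative, not a function from the repository):

```python
def build_bio_prompt(entity: str) -> str:
    # Prompt template from the biography setting described above.
    return "Tell me a paragraph bio of " + entity + ". "

# Example usage against the entity list (one name per line):
# with open("scripts/factscore_eval/bio_entities.txt") as f:
#     prompts = [build_bio_prompt(line.strip()) for line in f if line.strip()]
```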
PopQA generation:
- For our list of PopQA entities, refer to the scripts/factscore_eval/popqa_entities.json file.
- Use the prompt: "Provide me with a paragraph detailing some facts related to " + wiki_title + ". ", where wiki_title comes from each dictionary in the list stored in the popqa_entities.json file.
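A matching sketch for the PopQA prompts, assuming each entry in popqa_entities.json carries the title under a "wiki_title" key (check the file for the exact field name):

```python
def build_popqa_prompt(wiki_title: str) -> str:
    # Prompt template from the PopQA setting described above.
    return "Provide me with a paragraph detailing some facts related to " + wiki_title + ". "

# Example usage (field name "wiki_title" is an assumption):
# import json
# with open("scripts/factscore_eval/popqa_entities.json") as f:
#     entries = json.load(f)
# prompts = [build_popqa_prompt(d["wiki_title"]) for d in entries]
```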
All generated responses that we synthesize from are included in the data directory.
We use FActScore for evaluation. Download the Wikipedia database and other required files using the script at https://github.com/shmsw25/FActScore/blob/main/factscore/download_data.py
Sample bio generation script for the 8-bit Qwen72B-Instruct model: scripts/sample_bio_gen.py
Sample LM consensus generation script: scripts/baseline_generation/cons.py
Execution example:
cd scripts/baseline_generation
python cons.py --input <INPUT_FILE> --output <OUTPUT_FILE>
Construction script: scripts/test_poa_bios_batch.ipynb
ConGrs construction saves .pkl files. Provide the path to the generated samples in the notebook.
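Since ConGrs are saved as pickle files, downstream scripts can reload them with the standard library. A minimal sketch (`load_congr` is a hypothetical helper, and the object layout is whatever the construction notebook pickled):

```python
import pickle

def load_congr(path: str):
    # Load a saved ConGr from disk. The structure of the returned object
    # is defined by the construction notebook, not by this helper.
    with open(path, "rb") as f:
        return pickle.load(f)
```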
Decoding script: scripts/test_poa_bios_batch_decode.ipynb
Response Synthesis with Consensus Decoding
- Provide the path to your saved ConGr .pkl files in the script.
- Specify the task code in the script:
  - popqa → PopQA
  - bio → Biographies
Note: run the FActScore eval first, then the HALoGEN eval; the output of the FActScore eval is the input to the HALoGEN eval.
Sample FActScore eval input file: scripts/factscore_eval/sample_fs_input.json
Sample FActScore eval output file: scripts/factscore_eval/sample_fs_output.json
Execution example:
cd scripts/factscore_eval
python factscore_eval_run.py --data_path <INPUT_FILE> --result_path <OUTPUT_FILE>
Sample HALoGEN eval input file: scripts/halogen_eval/sample_factuality_input.json
Add OpenAI API keys in the config.yml file.
Execution example:
cd scripts/halogen_eval
python bio_scorer.py --input_dir <INPUT_FILE> --output_dir <OUTPUT_FILE>
Postprocessing example:
Add the path of the output file from the previous step in the following script:
python factuality_postprocess.py
Add the path of the output file from the previous step in the following script; it generates the results for Tables 1 and 2 in the main paper:
python factuality_results.py
False presuppositions generation:
For our list of prompts, refer to the scripts/halogen_eval/fp.json file.
Scientific attributions generation:
For our list of prompts, refer to the scripts/halogen_eval/refs.json file.
Historical events generation:
For our list of prompts, refer to the scripts/halogen_eval/he.json file.
All generated responses that we synthesize from are included in the data directory.
Sample fp (false presuppositions) generation script for the 8-bit Qwen72B-Instruct model: scripts/sample_fp_gen.py
Sample LM consensus generation script: scripts/baseline_generation/cons.py
Execution example:
cd scripts/baseline_generation
python cons.py --input <INPUT_FILE> --output <OUTPUT_FILE>
Construction script: scripts/test_poa_bios_batch.ipynb
ConGrs construction saves .pkl files. Provide the path to the generated samples in the notebook.
Decoding script: scripts/test_poa_bios_batch_decode.ipynb
Response Synthesis with Consensus Decoding
- Provide the path to your saved ConGr .pkl files in the script.
- Specify the task code in the script:
  - fp → False Presuppositions
  - refs → Scientific References
Sample HALoGEN hallucination generation input file: scripts/halogen_eval/sample_refusal_halc_input.json
Add Semantic Scholar and OpenAI API keys in the config.yml file.
Execution example:
cd scripts/halogen_eval
python evaluate_hallucinations.py --input_dir <INPUT_FILE> --output_dir <OUTPUT_FILE> --scientific_attribution
Sample HALoGEN eval input file: scripts/halogen_eval/sample_refusal_input.json
Execution example:
cd scripts/halogen_eval
python reference_scorer.py --input_dir <INPUT_FILE> --output_dir <OUTPUT_FILE>
Postprocessing example:
Add the path of the output file from the previous step in the following script; it generates the results for Table 5 in the main paper:
python refusal_results.py
We use two benchmark datasets, MATH and AIME.
Inference scripts:
- MATH: scripts/math_sample_generation.py
- AIME: scripts/aime_sample_generation.py
Each script generates 5 samples per instance for all models. Provide the path to the benchmark data in the scripts.
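The per-instance sampling loop can be sketched as follows. Here `generate` stands in for whatever model call the sampling scripts actually make, and the "id"/"prompt" field names are assumptions for illustration:

```python
def sample_responses(instances, generate, n_samples=5):
    # Draw n_samples responses per benchmark instance.
    # `generate` is a placeholder for the real model call; field names
    # ("id", "prompt") are illustrative, not the scripts' actual schema.
    return {
        inst["id"]: [generate(inst["prompt"]) for _ in range(n_samples)]
        for inst in instances
    }
```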
Construction and decoding script: scripts/process_math.ipynb
ConGrs construction and response synthesis with guided self-verification. Provide the path to the generated samples in the notebook.
Evaluation script: scripts/matheval.py
Measures accuracy across the dataset. Provide the path to the final decoded responses file in the script.
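The accuracy measurement reduces to a simple ratio, sketched below. The "pred"/"gold" field names are assumptions for illustration; matheval.py defines the actual schema of the decoded responses file:

```python
def accuracy(records):
    # Fraction of instances whose decoded answer matches the gold answer.
    # Field names ("pred", "gold") are illustrative placeholders.
    if not records:
        return 0.0
    return sum(r["pred"] == r["gold"] for r in records) / len(records)
```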
@inproceedings{ghosh-etal-2025-sample-align-synthesize,
  title={Sample, Align, Synthesize: Graph-Based Response Synthesis with {C}on{G}rs},
  author={Sayan Ghosh and Shahzaib Saqib Warraich and Dhruv Tarsadiya and Gregory Yauney and Swabha Swayamdipta},
  booktitle={Workshop on Test-time Scaling and Reasoning Models at COLM 2025},
  year={2025}
}