Sample, Align, Synthesize: Graph-Based Response Synthesis with ConGrs

This repository contains code and data for "Sample, Align, Synthesize: Graph-Based Response Synthesis with ConGrs" by Sayan Ghosh, Shahzaib Saqib Warraich, Dhruv Tarsadiya, Gregory Yauney, and Swabha Swayamdipta.

Factuality Tasks

1. Data

Biography generation:

  1. For our list of biography entities, refer to the scripts/factscore_eval/bio_entities.txt file.
  2. Use the prompt: "Tell me a paragraph bio of " + entity + ". ", where the entity is each line in the bio_entities.txt file.
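The prompt construction above can be sketched in a few lines of Python. This is a minimal illustration, not the repository's own code: the helper names `build_bio_prompts` and `load_bio_prompts` are hypothetical, and skipping blank lines is an assumption about how the entity file should be read.

```python
from pathlib import Path

# Prompt pieces copied from the README; the trailing space is part of the prompt.
BIO_PROMPT_PREFIX = "Tell me a paragraph bio of "
BIO_PROMPT_SUFFIX = ". "


def build_bio_prompts(entities):
    """Turn an iterable of entity names into biography prompts.

    Blank lines are skipped (an assumption, not confirmed by the repository).
    """
    prompts = []
    for line in entities:
        entity = line.strip()
        if entity:
            prompts.append(BIO_PROMPT_PREFIX + entity + BIO_PROMPT_SUFFIX)
    return prompts


def load_bio_prompts(path="scripts/factscore_eval/bio_entities.txt"):
    """Read one entity per line from the entity file and build the prompts."""
    with Path(path).open(encoding="utf-8") as handle:
        return build_bio_prompts(handle)
```

Calling `load_bio_prompts()` from the repository root should give one prompt per entity, in file order.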

PopQA generation:

  1. For our list of PopQA entities, refer to the scripts/factscore_eval/popqa_entities.json file.
  2. Use the prompt: "Provide me with a paragraph detailing some facts related to " + wiki_title + ". ", where wiki_title is taken from each dictionary in the list stored in the popqa_entities.json file.
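A comparable sketch for the PopQA prompts follows. It assumes that popqa_entities.json holds a list of dictionaries and that each dictionary stores the title under a `"wiki_title"` key; check the file before relying on that. The helper names are hypothetical.

```python
import json

# Prompt pieces copied from the README; the trailing space is part of the prompt.
POPQA_PROMPT_PREFIX = "Provide me with a paragraph detailing some facts related to "
POPQA_PROMPT_SUFFIX = ". "


def build_popqa_prompts(records, key="wiki_title"):
    """Build one prompt per record.

    `key` is assumed to be "wiki_title"; a missing key raises KeyError so that
    a mismatch with the real file layout is noticed immediately.
    """
    return [POPQA_PROMPT_PREFIX + record[key] + POPQA_PROMPT_SUFFIX for record in records]


def load_popqa_prompts(path="scripts/factscore_eval/popqa_entities.json"):
    """Load the entity list from JSON and build the prompts."""
    with open(path, encoding="utf-8") as handle:
        return build_popqa_prompts(json.load(handle))
```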

All generated responses that we synthesize from are included in the data directory.

We use FActScore for evaluation. Download the Wikipedia database and relevant files here: https://github.com/shmsw25/FActScore/blob/main/factscore/download_data.py

2. Generate Samples / Generate Baselines

Sample bio generation script for the 8-bit Qwen72B-Instruct model: scripts/sample_bio_gen.py

Sample LM consensus generation script: scripts/baseline_generation/cons.py

Execution example:

  cd scripts/baseline_generation
  python cons.py --input <INPUT_FILE> --output <OUTPUT_FILE>

3. ConGrs Construction

Construction script: scripts/test_poa_bios_batch.ipynb

ConGrs construction saves .pkl files. Provide the path to the generated samples in the script.

4. Decoding

Decoding script: scripts/test_poa_bios_batch_decode.ipynb

Response Synthesis with Consensus Decoding

  • Provide the path to your saved ConGr .pkl files in the script.
  • Specify the task code in the script:
    • popqa: PopQA
    • bio: Biographies
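As a rough illustration of the two settings above, the sketch below validates a task code and loads every .pkl file from a directory. It is not taken from the decoding notebook: `load_congrs` is a hypothetical helper, the directory layout is an assumption, and nothing is assumed about the objects stored inside the pickles.

```python
import pickle
from pathlib import Path

# Task codes listed in the README for the factuality tasks.
FACTUALITY_TASKS = {"popqa": "PopQA", "bio": "Biographies"}


def load_congrs(directory, task):
    """Load all pickled ConGrs in `directory`, keyed by file name without extension.

    Only unpickle files you created yourself: pickle can execute arbitrary code.
    """
    if task not in FACTUALITY_TASKS:
        raise ValueError(f"unknown task code {task!r}; expected one of {sorted(FACTUALITY_TASKS)}")
    graphs = {}
    for path in sorted(Path(directory).glob("*.pkl")):
        with path.open("rb") as handle:
            graphs[path.stem] = pickle.load(handle)
    return graphs
```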

5. Eval

Note: Run the FActScore eval first, then the HALoGEN eval. The output of the FActScore eval is the input to the HALoGEN eval.

Sample FActScore eval input file: scripts/factscore_eval/sample_fs_input.json

Sample FActScore eval output file: scripts/factscore_eval/sample_fs_output.json

Execution example:

  cd scripts/factscore_eval
  python factscore_eval_run.py --data_path <INPUT_FILE> --result_path <OUTPUT_FILE>

Sample HALoGEN eval input file: scripts/halogen_eval/sample_factuality_input.json

Add your OpenAI API key to the config.yml file.

Execution example:

    cd scripts/halogen_eval
    python bio_scorer.py --input_dir <INPUT_FILE> --output_dir <OUTPUT_FILE>
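The ordering constraint noted earlier (FActScore first, then HALoGEN on its output) can be sketched as a small driver. The script names and flags are the ones shown in this README; the helper functions, the use of absolute paths, and running both steps from the repository root are assumptions made for illustration.

```python
import subprocess
from pathlib import Path


def factuality_eval_steps(fs_input, fs_output, halogen_output):
    """Return (working_directory, argv) pairs in the order they must run.

    Paths are made absolute because each step runs from a different directory.
    """
    fs_input, fs_output, halogen_output = (
        str(Path(p).resolve()) for p in (fs_input, fs_output, halogen_output)
    )
    return [
        ("scripts/factscore_eval",
         ["python", "factscore_eval_run.py",
          "--data_path", fs_input, "--result_path", fs_output]),
        # The FActScore output file is passed on as the HALoGEN input.
        ("scripts/halogen_eval",
         ["python", "bio_scorer.py",
          "--input_dir", fs_output, "--output_dir", halogen_output]),
    ]


def run_factuality_eval(fs_input, fs_output, halogen_output):
    """Run both steps in sequence, stopping at the first failure."""
    for cwd, argv in factuality_eval_steps(fs_input, fs_output, halogen_output):
        subprocess.run(argv, cwd=cwd, check=True)
```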

Postprocessing example:

Add the path of the output file from the previous step to the following script:

    python factuality_postprocess.py

Add the path of the output file from the previous step to the following script, which generates the results for Tables 1 and 2 in the main paper:

    python factuality_results.py

Refusal-based tasks

1. Data

False presuppositions generation: For our list of prompts, refer to the scripts/halogen_eval/fp.json file.

Scientific attributions generation: For our list of prompts, refer to the scripts/halogen_eval/refs.json file.

Historical events generation: For our list of prompts, refer to the scripts/halogen_eval/he.json file.

All generated responses that we synthesize from are included in the data directory.

2. Generate Samples / Generate Baselines

Sample false presuppositions (fp) generation script for the 8-bit Qwen72B-Instruct model: scripts/sample_fp_gen.py

Sample LM consensus generation script: scripts/baseline_generation/cons.py

Execution example:

  cd scripts/baseline_generation
  python cons.py --input <INPUT_FILE> --output <OUTPUT_FILE>

3. ConGrs Construction

Construction script: scripts/test_poa_bios_batch.ipynb

ConGrs construction saves .pkl files. Provide the path to the generated samples in the script.

4. Decoding

Decoding script: scripts/test_poa_bios_batch_decode.ipynb

Response Synthesis with Consensus Decoding

  • Provide the path to your saved ConGr .pkl files in the script.
  • Specify the task code in the script:
    • fp: False Presuppositions
    • refs: Scientific References

5. Eval

Sample HALoGEN hallucination generation input file: scripts/halogen_eval/sample_refusal_halc_input.json

Add your Semantic Scholar and OpenAI API keys to the config.yml file.

Execution example:

  cd scripts/halogen_eval
  python evaluate_hallucinations.py --input_dir <INPUT_FILE> --output_dir <OUTPUT_FILE> --scientific_attribution

Sample HALoGEN eval input file: scripts/halogen_eval/sample_refusal_input.json

Execution example:

  cd scripts/halogen_eval
  python reference_scorer.py --input_dir <INPUT_FILE> --output_dir <OUTPUT_FILE>

Postprocessing example:

Add the path of the output file from the previous step to the following script, which generates the results for Table 5 in the main paper:

  python refusal_results.py

Reasoning tasks

1. Data

We use two benchmark datasets: MATH and AIME.

2. Generate Samples

Inference scripts:

  • MATH: scripts/math_sample_generation.py
  • AIME: scripts/aime_sample_generation.py

These scripts generate 5 samples per instance for all models. Provide the path to the benchmark data in the scripts.
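The sampling pattern described above can be sketched independently of any particular model. In the snippet below, `generate` stands in for whatever model call the inference scripts make; it and `collect_samples` are hypothetical names, and only the count of 5 comes from this README.

```python
def collect_samples(prompts, generate, n_samples=5):
    """Call `generate(prompt)` n_samples times for every prompt.

    `generate` is any callable that returns one sampled response as a string;
    it is a placeholder for the model call, not part of the repository's API.
    """
    results = []
    for prompt in prompts:
        samples = [generate(prompt) for _ in range(n_samples)]
        results.append({"prompt": prompt, "samples": samples})
    return results
```

Keeping all samples for an instance together in one record makes it straightforward to hand them to the construction step as a group.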

3. ConGrs Construction and Guided Self-Verification

Construction and decoding script: scripts/process_math.ipynb

This notebook performs ConGrs construction and response synthesis with guided self-verification. Provide the path to the generated samples in the script.

4. Evaluation

Evaluation script: scripts/matheval.py

Measures accuracy across the dataset. Provide the path to the final decoded responses file in the script.
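For reference, dataset-level accuracy is the fraction of instances whose final answer matches the gold answer. The sketch below uses a simple normalized string comparison; this matching rule and both function names are assumptions for illustration, and scripts/matheval.py may compare answers differently.

```python
def normalize_answer(answer):
    """Lowercase, drop all whitespace, and strip a trailing period (assumed rule)."""
    return "".join(str(answer).split()).rstrip(".").lower()


def accuracy(predictions, references):
    """Fraction of predictions that equal their reference after normalization."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        return 0.0
    correct = sum(
        normalize_answer(pred) == normalize_answer(ref)
        for pred, ref in zip(predictions, references)
    )
    return correct / len(references)
```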

Citation

@inproceedings{
  ghosh-etal-2025-sample-align-synthesize,
  title={Sample, Align, Synthesize: Graph-Based Response Synthesis with {C}on{G}rs},
  author={Sayan Ghosh and Shahzaib Saqib Warraich and Dhruv Tarsadiya and Gregory Yauney and Swabha Swayamdipta},
  booktitle={Workshop on Test-time Scaling and Reasoning Models at COLM 2025},
  year={2025}
}
