The Community Alignment dataset is a culturally diverse, multilingual, multi-turn dataset of preferences for LLM alignment


Community Alignment

Hugging Face   |   Paper  

Demo video: community_alignment.mov

Dataset

Community Alignment is a large-scale, open-source, multilingual, multi-turn preference dataset for aligning LLMs with human preferences across cultures. It features prompt-level overlap in annotators, enabling social-choice-based and distributional approaches to LLM alignment, as well as natural language explanations for choices.

  • [Large-scale] ~200,000 comparisons of LLM responses, collected from >3,000 unique annotators who provided feedback at an individual level.
  • [Multilingual] Contains comparisons in English, French, Italian, Hindi, and Portuguese. 63% of comparisons are non-English.
  • [Prompt-level overlap] 2,599 prompts have at least 10 annotations per comparison, with annotators overlapping across prompts.
  • [High-quality natural language explanations] For 27% of prompts, annotators provided detailed explanations of why they preferred one response over the other.
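To illustrate what prompt-level annotator overlap makes possible, here is a minimal sketch of a social-choice-style aggregation: when several annotators judge the same comparison, a plurality rule can pick a per-prompt winner instead of averaging preferences away. The toy records and field names below are hypothetical, not the dataset's actual schema.

```python
from collections import Counter

# Toy annotations: several annotators judging the same prompts.
# Field names (prompt_id, annotator_id, preferred) are illustrative only.
annotations = [
    {"prompt_id": "p1", "annotator_id": "a1", "preferred": "response_A"},
    {"prompt_id": "p1", "annotator_id": "a2", "preferred": "response_B"},
    {"prompt_id": "p1", "annotator_id": "a3", "preferred": "response_A"},
    {"prompt_id": "p2", "annotator_id": "a1", "preferred": "response_B"},
    {"prompt_id": "p2", "annotator_id": "a3", "preferred": "response_B"},
]

def plurality_winner(annotations, prompt_id):
    """Return the response that the most annotators preferred for a prompt."""
    votes = Counter(
        a["preferred"] for a in annotations if a["prompt_id"] == prompt_id
    )
    return votes.most_common(1)[0][0]

print(plurality_winner(annotations, "p1"))  # response_A
print(plurality_winner(annotations, "p2"))  # response_B
```

Plurality is only the simplest such rule; the same per-prompt vote counts would also feed richer social-choice or distributional aggregation methods.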

License

Community Alignment is released under the Creative Commons Attribution 4.0 International License (CC-BY-4.0). For details, see the LICENSE in this repository.

Codebook

Please see Appendix H of the paper for the codebook.

Usage

In ~27% of the conversations in our dataset, annotators initiate the dialogue with their own prompts. These prompts do not reflect the position of Meta or its employees. Users must implement appropriate filtering and moderation measures when utilizing this dataset for training purposes to ensure that the generated outputs adhere to their own content standards. The user-initiated conversations can be easily filtered out of the dataset using the is_pregenerated_first_prompt flag.
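As a minimal sketch of that filtering step, the snippet below drops user-initiated conversations using the `is_pregenerated_first_prompt` flag. The toy records stand in for real rows; only the flag name comes from this README.

```python
# Toy conversation records; only the is_pregenerated_first_prompt
# flag name is taken from the dataset documentation.
conversations = [
    {"conversation_id": "c1", "is_pregenerated_first_prompt": True},
    {"conversation_id": "c2", "is_pregenerated_first_prompt": False},  # user-initiated
    {"conversation_id": "c3", "is_pregenerated_first_prompt": True},
]

# Keep only conversations that begin with a pre-generated prompt.
pregenerated_only = [
    c for c in conversations if c["is_pregenerated_first_prompt"]
]
print([c["conversation_id"] for c in pregenerated_only])  # ['c1', 'c3']
```

If you load the dataset with the Hugging Face `datasets` library, the equivalent operation would presumably be `ds.filter(lambda x: x["is_pregenerated_first_prompt"])`.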

Attribution

When using this dataset in any publications or research output, please cite the accompanying paper. For BibTeX, use

@article{zhang2025cultivating,
  title   = {Cultivating Pluralism In Algorithmic Monoculture: The Community Alignment Dataset},
  author  = {Lily Hong Zhang and Smitha Milli and Karen Jusko and Jonathan Smith and Brandon Amos and Wassim Bouaziz and Manon Revel and Jack Kussman and Lisa Titus and Bhaktipriya Radharapu and Jane Yu and Vidya Sarma and Kris Rose and Maximilian Nickel},
  year    = {2025},
  journal = {arXiv preprint arXiv:2507.09650}
}

For in-text citations, use

Zhang, L. H., Milli, S., Jusko, K., Smith, J., Amos, B., Bouaziz, W., Revel, M., Kussman, J., Titus, L., Radharapu, B., Yu, J., Sarma, V., Rose, K., & Nickel, M. (2025). Cultivating Pluralism In Algorithmic Monoculture: The Community Alignment Dataset. arXiv preprint arXiv:2507.09650.

Feedback

If you use Community Alignment, we would love to know (a) what you found valuable in it and (b) what features you wish it had, as well as any other feedback you may have. This will help support and guide future projects of this kind. Additionally, if you encounter any issues, such as the presence of personally identifiable information (PII) or requests from participants for data removal, please let us know. You can contact us at [email protected].
