BUG harm categories for PKU-SafeRLHF Dataset aren't added yet #736

Open
eugeniavkim opened this issue Feb 25, 2025 · 0 comments
Labels
bug (Something isn't working), datasets (Pulling in external datasets into PyRIT), help wanted (Extra attention is needed)

Comments

@eugeniavkim
Contributor

Similar to #730, but in this case the harm categories are explicitly included in the repo.

The fetch_adv_bench_dataset function currently does not apply any harm categories to its prompts. We want to use this dataset with harm category filters, which requires copying the values from its category labels onto the dataset entries used in PyRIT. The original Hugging Face dataset has harm_categories that we can take them from.
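To make the request concrete, here is a minimal sketch of the mapping step, not PyRIT's actual implementation. It assumes each upstream record stores its labels as a dict of category name to boolean flag; the field names `response_0_harm_category` and `response_1_harm_category`, the category names, the output dict shape, and both helper functions are illustrative assumptions and should be checked against the real dataset schema and PyRIT's prompt classes.

```python
from typing import Dict, List

# Assumed label fields on each upstream record (hypothetical names).
LABEL_FIELDS = ["response_0_harm_category", "response_1_harm_category"]


def extract_harm_categories(record: Dict, fields: List[str]) -> List[str]:
    """Return the sorted names of every category flagged True in any field."""
    found = set()
    for field in fields:
        flags = record.get(field) or {}  # tolerate missing or null fields
        found.update(name for name, flagged in flags.items() if flagged)
    return sorted(found)


def filter_by_harm_category(prompts: List[Dict], wanted: str) -> List[Dict]:
    """Keep only the prompts tagged with the wanted harm category."""
    return [p for p in prompts if wanted in p["harm_categories"]]


# Placeholder records standing in for rows of the upstream dataset.
records = [
    {
        "prompt": "example prompt A",
        "response_0_harm_category": {"Cybercrime": True, "Violence": False},
        "response_1_harm_category": {"Cybercrime": False, "Violence": False},
    },
    {
        "prompt": "example prompt B",
        "response_0_harm_category": {"Cybercrime": False, "Violence": True},
        "response_1_harm_category": {"Cybercrime": True, "Violence": True},
    },
    {
        "prompt": "example prompt C",
        "response_0_harm_category": {"Cybercrime": False, "Violence": False},
        "response_1_harm_category": {"Cybercrime": False, "Violence": False},
    },
]

# Attach the extracted categories to each prompt entry.
prompts = [
    {"value": r["prompt"], "harm_categories": extract_harm_categories(r, LABEL_FIELDS)}
    for r in records
]

violent = filter_by_harm_category(prompts, "Violence")
```

Taking the union of flags across both label fields is one possible design choice; a fetcher could equally keep the categories per response if the filter needs that granularity.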

@eugeniavkim added the bug and datasets labels Feb 25, 2025
@romanlutz added the help wanted label Feb 25, 2025