update refusal prompt #1083
DCO Assistant Lite bot: All contributors have signed the DCO ✍️ ✅
Thanks a lot for this. It is in the queue and we're looking forward to integrating it as soon as we can!
Tell us what this change does. If you're fixing a bug, please mention the GitHub issue number.
I swapped out a prompt for a new one. To assess the quality of the new prompt, I took a list of known prompts and outputs that should be refused and compared the results from the current prompt with those from the proposed prompt.
The two key additions are examples for the ratings and the categories of safety concerns. The categories are the same ones used in Aegis 2.0.
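The comparison described above could be sketched roughly as follows. This is only an illustration: the actual evaluation script is not shown in this PR, and nothing here uses garak's API. The names `flag_rate`, `compare_prompts`, and `keyword_judge` are hypothetical, and `keyword_judge` is a toy stand-in for a real model call that would apply the judge prompt.

```python
from typing import Callable, Dict, Sequence, Tuple

# Hypothetical judge signature: (judge_prompt, user_prompt, model_output) -> bool.
# Returns True when the judge flags the output as one that should have been refused.
Judge = Callable[[str, str, str], bool]
Case = Tuple[str, str]  # (user_prompt, model_output)


def flag_rate(judge: Judge, judge_prompt: str, cases: Sequence[Case]) -> float:
    """Fraction of known should-refuse cases that the judge flags."""
    if not cases:
        raise ValueError("cases must not be empty")
    flagged = sum(1 for user_prompt, output in cases
                  if judge(judge_prompt, user_prompt, output))
    return flagged / len(cases)


def compare_prompts(judge: Judge, current_prompt: str, proposed_prompt: str,
                    cases: Sequence[Case]) -> Dict[str, float]:
    """Run the same should-refuse cases through both judge prompts."""
    return {
        "current": flag_rate(judge, current_prompt, cases),
        "proposed": flag_rate(judge, proposed_prompt, cases),
    }


def keyword_judge(judge_prompt: str, user_prompt: str, model_output: str) -> bool:
    """Toy stand-in for a model-backed judge (assumption, not a real detector).

    Treats the judge prompt as a comma-separated keyword list and flags the
    output if any keyword appears in it.
    """
    keywords = [k.strip().lower() for k in judge_prompt.split(",") if k.strip()]
    return any(k in model_output.lower() for k in keywords)
```

Because every case in the list is one that should have been refused, a higher flag rate for the proposed prompt suggests it catches more of the known failures. A fuller check would also include benign cases to measure false positives.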
Please ensure you are submitting from a unique branch in your repository to main upstream.

Verification
List the steps needed to make sure this thing works
garak -m <model_type> -n <model_name>
python -m pytest tests/
If you are opening a PR for a new plugin that targets a specific piece of hardware or requires a complex or hard-to-find testing environment, we recommend that you send us as much detail as possible.
Specific Hardware Examples:
cuda / mps (Please note cuda via ROCm if related)
Complex Software Examples: