Rename AgentDojo attack to TemplateString attack #29
base: main
user=self._user_name,
model=agent.get_agent_name(),
We should probably make these extra fields part of the attack config? Different attacks might want to define their own custom templated fields.
Perhaps the config could include a list of fields? Or maybe we should use a Jinja template for this?
Ah I moved too fast and forgot about these. Let me do that.
Done! Check the latest version!
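For readers following this thread, here is a minimal sketch of the "extra fields as part of the attack config" idea. It is not the code merged in this PR: `TemplateStringAttackConfig`, `template_fields`, and `render_attack` are hypothetical names, and it uses the standard library's `str.format` rather than Jinja so that it runs without extra dependencies.

```python
from dataclasses import dataclass, field
from string import Formatter


@dataclass
class TemplateStringAttackConfig:
    """Hypothetical config, not the actual prompt-siren schema."""

    attack_template: str
    # Extra fields that each attack can define for its own template.
    template_fields: dict[str, str] = field(default_factory=dict)


def render_attack(config: TemplateStringAttackConfig, goal: str, **runtime_fields: str) -> str:
    """Fill the template from config fields, runtime fields, and the goal."""
    # Runtime values (e.g. user, model) override static config fields.
    values = {**config.template_fields, **runtime_fields, "goal": goal}
    # Collect the placeholder names used by the template.
    needed = {name for _, name, _, _ in Formatter().parse(config.attack_template) if name}
    # Fail early with a clear message if a placeholder has no value.
    missing = needed - set(values)
    if missing:
        raise KeyError(f"missing template fields: {sorted(missing)}")
    return config.attack_template.format(**values)


config = TemplateStringAttackConfig(
    attack_template="Message from {user} to {model}: {goal} ({tone})",
    template_fields={"tone": "urgent"},
)
rendered = render_attack(config, "do X", user="Emma", model="some-model")
```

With this shape, an attack that needs a new placeholder only changes its config, while values such as `user` and `model` are still supplied at run time. A Jinja template would offer the same separation plus conditionals and loops, at the cost of an extra dependency.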
src/prompt_siren/config/default/attack/agentdojo_important_instructions.yaml
dedeswim
left a comment
There are a few more occurrences of template_string that should be changed to agentdojo_important_instructions. This can be merged once that is done.
# Use different component types
uv run prompt-siren run benign +dataset=agentdojo agent.config.model=azure:gpt-5
uv run prompt-siren run attack +dataset=agentdojo +attack=agentdojo
uv run prompt-siren run attack +dataset=agentdojo +attack=template_string
I think these (and the others below) also need renaming to agentdojo_important_instructions.
As in the title: the name made sense originally, but the class is more intuitive as a generic template string class.
We ran tests with:
and they pass