Skip to content

Conversation

rucpande
Copy link

@rucpande rucpande commented Oct 1, 2025

Description

Add support for Cisco AI Defense Security, Privacy, and Safety guardrails as a third party API.

Features

  • Input Protection: Inspect user prompts before processing
  • Output Protection: Inspect bot responses before delivery
  • Configuration: Environment-based API configuration
  • Rails Exceptions: Support for enable_rails_exceptions mode
  • Logging: Logging for debugging and monitoring

Related Issue(s)

Checklist

  • I've read the CONTRIBUTING guidelines.
  • I've updated the documentation if applicable.
  • I've added tests if applicable.
  • @cparisien thanks for reviewing!

- Add AI Defense action for input/output protection
- Add documentation for setup and configuration
- Support for environment-based API key configuration

Fixes NVIDIA-NeMo#1420
Copy link
Contributor

github-actions bot commented Oct 1, 2025

Documentation preview

https://nvidia-nemo.github.io/Guardrails/review/pr-1433

@codecov-commenter
Copy link

Codecov Report

✅ All modified and coverable lines are covered by tests.

📢 Thoughts on this report? Let us know!

@Pouyanpi Pouyanpi requested a review from Copilot October 2, 2025 13:44
Copy link

@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR adds Cisco AI Defense integration to NeMo Guardrails, providing security guardrails for input and output protection. The integration enables inspection of user prompts and bot responses through Cisco's AI Defense API to detect and block potentially harmful content.

Key changes:

  • Implementation of AI Defense inspection actions and flows for input/output protection
  • Configuration support through environment variables (API key and endpoint)
  • Comprehensive test coverage including unit, integration, and error handling tests

Reviewed Changes

Copilot reviewed 9 out of 9 changed files in this pull request and generated 3 comments.

Show a summary per file
File Description
nemoguardrails/library/ai_defense/actions.py Core AI Defense inspection action with HTTP client for API calls
nemoguardrails/library/ai_defense/flows.v1.co Colang v1.0 flow definitions for input/output protection
nemoguardrails/library/ai_defense/flows.co Colang v2.0 flow definitions for input/output protection
tests/test_ai_defense.py Comprehensive test suite covering unit, integration, and error scenarios
docs/user-guides/community/ai-defense.md User documentation for setup and usage
docs/user-guides/guardrails-library.md Integration into main guardrails library documentation
examples/configs/ai_defense/config.yml Example configuration file
examples/configs/ai_defense/README.md Example documentation

Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.

- Remove placeholder comment in test_real_api_call_with_safe_output
- Remove debug print statements from test code
- Fix incorrect docstring in ai_defense_text_mapping function~
Copy link
Collaborator

@tgasser-nv tgasser-nv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great! Mostly naming nits to address.

Could you also run a local integration test (pytest -m integration) with AI_DEFENSE_API_ENDPOINT and AI_DEFENSE_API_KEY set and copy the result into the description?

log = logging.getLogger(__name__)


def ai_defense_text_mapping(result: dict) -> bool:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you add a type-annotation here (would dict[str, Any] work with the response?)

nit: Maybe rename to indicate the polarity of the bool returned, i.e. is_ai_defense_text_blocked()? So a True means blocked and False is ok

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the review! I've made the suggested changes.

user_prompt: Optional[str] = None, bot_response: Optional[str] = None, **kwargs
):
api_key = os.environ.get("AI_DEFENSE_API_KEY")
if api_key is None:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Maybe change to if not api_key to catch if the AI_DEFENSE_API_KEY is a falsy value like ""?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good point

Expects result to be a dict with:
- "is_blocked": a boolean indicating if the prompt or response sent to AI Defense should be blocked.
Returns:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is it the intent?

# default to not blocked (safe/fail-open) if is_blocked is missing

then shouldn't we change the default value to False?

is_blocked = result.get("is_blocked", False)

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cleaning this up to remove is_blocked and keep jsut a single value and have it default to fail closed.

else:
msg = "Either user_prompt or bot_response must be provided"
log.error(msg)
raise ValueError(msg)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No timeout configured. If we expect the AI Defense API hangs for any reason, this will block indefinitely.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You may have started your review before my most recent changes where I added a timeout (and a fail open config). Please let me know if that's not the case and I'm still missing something.


payload: Dict[str, Any] = {"messages": messages}
if metadata:
payload["metadata"] = metadata
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

code assumes data.get("is_safe") exists but doesn't validate the response structure. If API returns unexpected format, this could fail silently.

Copy link
Author

@rucpande rucpande Oct 7, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as above, added those checks in my commit from yesterday.

if fail_open:
# Fail open: allow content when API call fails
log.warning(
"AI Defense API call failed, but fail_open=True, allowing content"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

both is_blocked and is_safe are returned, which are redundant (one is just not of the other). Are you expecting this to evolve differently?

Copy link
Author

@rucpande rucpande Oct 7, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

will clean it up so it only uses is_safe.


# Action error should be handled by the runtime and surface as a generic error message
chat >> "Hello"
chat << "I'm sorry, an internal error has occurred."
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice test 👍🏻 I suggest to have a similar test for Colang 2.0.

I expect to see some issues which are not necessarily related to your PR but we might be able to resolve it.

you can see some example of colang 2.0 configs:

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

feature: Add support for Cisco AI Defense API as a guardrail provider for both input (prompt) and output (response) protection in NeMo Guardrails.
4 participants