@gautamvarmadatla gautamvarmadatla commented Oct 1, 2025

Summary

State serialization in colang/v2_x type-tagged every dataclass (e.g., {"__type":"Foo","value":...}). That breaks decoding for classes outside Guardrails’ known types, because name_to_class is built only from colang_ast + flows. When such JSON is round-tripped through json_to_state, decoding raises Unknown d_type: Foo.

This PR keeps type tags for known Guardrails classes (so they still round-trip as structured objects) and encodes unknown dataclasses as plain dicts ({"__type":"dict","value":...}), so user-land or third-party dataclasses decode without errors. It also adds a unit test to prevent regressions.

What’s affected (scope in the framework)

  • Module: nemoguardrails/colang/v2_x/runtime/serialization.py

    • encode_to_dict (encoding path) — changed
    • decode_from_dict (decoding path) — unchanged, but now protected from unknown dataclass tags
    • state_to_json / json_to_state — behavior preserved; round-trip is more resilient
  • Runtime surfaces that rely on state JSON:

    • LLM rails logging & tracing (state snapshots emitted during generation/execution)
    • Action/tool logging (e.g., passthrough and tool-calling paths that serialize intermediate state)
    • Persistence/telemetry/debugging that stores or reloads State JSON

Changes

  • Encoding rule for dataclasses:

    • If type(obj).__name__ is in name_to_class (i.e., Guardrails’ own Colang/flows types) → retain type tag ({"__type":"ClassName","value":...}) to enable full object reconstruction.
    • If not in name_to_class (unknown/user-land dataclass) → encode as dict ({"__type":"dict","value":...}) to avoid Unknown d_type on decode.
  • Tests: tests/v2_x/test_serialization_dataclass.py checks that an unknown dataclass is encoded as a dict payload and decodes safely.
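
The encoding rule above can be sketched in isolation. This is a simplified, self-contained illustration and not the code in serialization.py: KnownEvent, CustomPayload, and round_trip are hypothetical names, name_to_class here is a hand-built stand-in for the real mapping, and only dataclasses, dicts, and primitives are handled.

```python
import json
from dataclasses import dataclass, fields, is_dataclass
from typing import Any, Dict


@dataclass
class KnownEvent:  # hypothetical stand-in for a Guardrails-native type
    name: str


@dataclass
class CustomPayload:  # hypothetical user-land dataclass, not registered
    score: int


# Stand-in for the real mapping, which is built from colang_ast + flows.
name_to_class: Dict[str, type] = {"KnownEvent": KnownEvent}


def encode_to_dict(obj: Any) -> Any:
    """Simplified encoder: tag known dataclasses, downgrade unknown ones to dict."""
    if is_dataclass(obj) and not isinstance(obj, type):
        value = {f.name: encode_to_dict(getattr(obj, f.name)) for f in fields(obj)}
        cls_name = type(obj).__name__
        # Keep the class tag only if the decoder can map it back.
        tag = cls_name if cls_name in name_to_class else "dict"
        return {"__type": tag, "value": value}
    if isinstance(obj, dict):
        return {"__type": "dict", "value": {k: encode_to_dict(v) for k, v in obj.items()}}
    return obj  # primitives pass through unchanged


def decode_from_dict(data: Any) -> Any:
    """Simplified decoder: rebuilds known classes, fails on unknown tags."""
    if isinstance(data, dict) and "__type" in data:
        d_type = data["__type"]
        value = {k: decode_from_dict(v) for k, v in data["value"].items()}
        if d_type == "dict":
            return value
        if d_type in name_to_class:
            return name_to_class[d_type](**value)
        raise ValueError(f"Unknown d_type: {d_type}")
    return data


def round_trip(obj: Any) -> Any:
    """Encode, pass through JSON text, and decode again."""
    return decode_from_dict(json.loads(json.dumps(encode_to_dict(obj))))
```

In this sketch, round_trip(KnownEvent("start")) returns an equal KnownEvent, while round_trip(CustomPayload(3)) returns the plain dict {"score": 3}. Feeding the decoder an old-style payload such as {"__type": "CustomPayload", "value": {"score": 3}} still raises Unknown d_type, which is the failure the encoder change avoids.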

Rationale

  • Real-world states can contain custom dataclasses from actions, tools, or integration code. The previous behavior emitted {"__type":"CustomClass"}, which decode_from_dict cannot map back (name_to_class only covers Guardrails’ own types), causing hard failures when logs are reloaded or states are restored.
  • This change preserves the lossless round-trip for Guardrails’ native types and keeps everything else JSON-safe and decode-safe. The trade-off is that unknown dataclasses come back as plain dicts with the same field values, not as instances of their original class.

Testing

  • Unit test (new):

    • python -m pytest tests/v2_x/test_serialization_dataclass.py -q

Backward compatibility & risk

  • BC-safe: Known Guardrails classes still produce type-tagged JSON and decode to original objects as before.
  • Safer defaults: Unknown dataclasses previously produced JSON that could not be decoded; now they decode to plain dicts with the same field values.
  • Schema note: For unknown dataclasses, the on-wire shape remains the project’s typed envelope ({"__type":"dict","value":...}), so downstream consumers that already tolerate dict-encoded nodes remain compatible.
  • Performance: Negligible; the change only affects the dataclass branch during encoding.

Developer notes

  • name_to_class is populated from colang_ast_module and flows_module. The new rule relies solely on that mapping to decide when to keep a class tag vs. downgrade to dict.
  • If future modules add decodable types, they will naturally benefit from the keep-tag path without changes here.
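
To illustrate why new decodable types pick up the keep-tag path automatically, here is a rough sketch of how a name-to-class mapping could be collected from modules. It is an assumption about the general approach, not a copy of the actual code: build_name_to_class, fake_ast, fake_flows, Spec, and FlowState are hypothetical, and filtering on is_dataclass is a choice made for this example only.

```python
import inspect
import types
from dataclasses import dataclass, is_dataclass
from typing import Dict


def build_name_to_class(*modules: types.ModuleType) -> Dict[str, type]:
    """Collect every dataclass exposed by the given modules, keyed by attribute name."""
    mapping: Dict[str, type] = {}
    for module in modules:
        for name, cls in inspect.getmembers(module, inspect.isclass):
            if is_dataclass(cls):
                mapping[name] = cls
    return mapping


@dataclass
class Spec:  # hypothetical AST-like type
    name: str


@dataclass
class FlowState:  # hypothetical flows-like type
    uid: str


# Throwaway modules standing in for colang_ast_module and flows_module.
fake_ast = types.ModuleType("fake_ast")
fake_ast.Spec = Spec
fake_flows = types.ModuleType("fake_flows")
fake_flows.FlowState = FlowState

name_to_class = build_name_to_class(fake_ast, fake_flows)
```

Any dataclass later added to one of the scanned modules would appear in the mapping on the next build, so the encoder's membership check would keep its tag without further changes.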

@gautamvarmadatla gautamvarmadatla force-pushed the fix/dataclass-serialization branch from 689f847 to 787b7f4 Compare October 1, 2025 05:02
@codecov-commenter

Codecov Report

✅ All modified and coverable lines are covered by tests.

Successfully merging this pull request may close these issues.

bug: state_to_json() doesn't correctly serialize dataclasses