feat(schema-generator): deduplicate equivalent enum entity types across tools#93
Conversation
…ss tools When `deduplicate_entity_types` is enabled, enum entity types with identical names and variant lists are consolidated into a single definition placed at the lowest common ancestor (LCA) namespace. Deduplication is skipped (enums stay local) when: - The target LCA namespace already contains a type with the same name - Multiple fingerprints with the same name but different variants would target the same LCA The request generator resolves deduplicated enums by verifying the source namespace membership, preventing false matches between same-named enums with different variants. Signed-off-by: Liana Hadarean <hadarean@amazon.com>
Signed-off-by: Liana Hadarean <hadarean@amazon.com>
Coverage ReportHead Commit: Base Commit: Download the full coverage report. Coverage of Added or Modified Lines of Rust CodeRequired coverage: 80.00% Actual coverage: 80.44% Status: PASSED ✅ Details
Coverage of All Lines of Rust CodeRequired coverage: 80.00% Actual coverage: 90.63% Status: PASSED ✅ Details
|
Coverage ReportHead Commit: Base Commit: Download the full coverage report. Coverage of Added or Modified Lines of Rust CodeRequired coverage: 80.00% Actual coverage: 80.44% Status: PASSED ✅ Details
Coverage of All Lines of Rust CodeRequired coverage: 80.00% Actual coverage: 90.63% Status: PASSED ✅ Details
|
| } | ||
|
|
||
| // Pass 1: Collect enum occurrences for deduplication | ||
| if self.config.deduplicate_entity_types { |
There was a problem hiding this comment.
I wonder if this could have been implemented as a separate pass on the schema resulting from the generator. Currently, as far as I understand, the deduplication logic is interleaved with the generation logic. It looks fine for the enum entities, but as we add more types to it (records), the generation+dedup logic will get complicated.
There was a problem hiding this comment.
I refactored it into it's own method, and in the process realized we weren't applying deduplication to the outputs so fixed that too. Deduplication needs to happen before handling the common types, so it can't be completely outside of the generation. I guess another option would be to mutate the resulting Cedar schema. We would need to do enough book-keeping to make sure that still works with the request generator.
There was a problem hiding this comment.
Yes, I think it would require changing somewhat the request generator. I wonder if the request generator could be written to accept any Cedar schema that is sufficient to properly typecheck a given JSON request -- i.e. not actually require the bookkeeping.
| action call_tool; | ||
|
|
||
| // Pre-existing enum entity type that will collide with dedup placement | ||
| entity status enum ["active", "inactive"]; |
There was a problem hiding this comment.
Just a thought, but I would have seen the ability to reuse enum defined by the user in the generated schema as a nice feature, maybe in another PR.
There was a problem hiding this comment.
Fixed that, it should now reuse the existing type if it matches.
Signed-off-by: Liana Hadarean <hadarean@amazon.com>
Coverage ReportHead Commit: Base Commit: Download the full coverage report. Coverage of Added or Modified Lines of Rust CodeRequired coverage: 80.00% Actual coverage: 82.12% Status: PASSED ✅ Details
Coverage of All Lines of Rust CodeRequired coverage: 80.00% Actual coverage: 90.68% Status: PASSED ✅ Details
|
…assertions. Signed-off-by: Liana Hadarean <hadarean@amazon.com>
victornicolet
left a comment
There was a problem hiding this comment.
I think it needs a cargo fmt reformatting and then it looks good to me.
| .iter() | ||
| .map(|ns| { | ||
| // Safe to unwrap: we checked for None above | ||
| #[expect(clippy::unwrap_used, reason = "None case handled above")] |
There was a problem hiding this comment.
I might try to adjust the is_none check to avoid this unwrap
There was a problem hiding this comment.
Good point, will fix.
| .collect(); | ||
|
|
||
| // Split each string into path segments | ||
| let segment_lists: Vec<Vec<&str>> = name_strings |
There was a problem hiding this comment.
Going through strings here shouldn't be necessary. We should be able to get the already parsed Ids out of the Names
There was a problem hiding this comment.
Good point, using the Name namespaces.
| EntityTypeKind::Enum { choices } => { | ||
| let existing: Vec<&SmolStr> = choices.iter().collect(); | ||
| let candidate: Vec<&SmolStr> = variants.iter().collect(); | ||
| existing == candidate |
There was a problem hiding this comment.
Is everything sorted here? I think an earlier comment mentions EntityTypeFingerprint variants will be sorted, I'm not sure choices.
There was a problem hiding this comment.
They are not sorted, I think the comment said ordered to mean we keep the order. We don't dedup if the same variants are in a different order. I added a test to show this.
There was a problem hiding this comment.
Hmm, that's surprising to me. I wouldn't expect variant order to matter.
Signed-off-by: Liana Hadarean <hadarean@amazon.com>
Coverage ReportHead Commit: Base Commit: Download the full coverage report. Coverage of Added or Modified Lines of Rust CodeRequired coverage: 80.00% Actual coverage: 82.62% Status: PASSED ✅ Details
Coverage of All Lines of Rust CodeRequired coverage: 80.00% Actual coverage: 90.71% Status: PASSED ✅ Details
|
Description of changes
When
deduplicate_entity_typesis enabled, enum entity types with identical names and variant lists are consolidated into a single definition placed at the lowest common ancestor (LCA) namespace.Deduplication is skipped (enums stay local) when:
The request generator resolves deduplicated enums by verifying the source namespace membership, preventing false matches between same-named enums with different variants.
Issue #, if available
Checklist for requesting a review
The change in this PR is (choose one, and delete the other options):
I confirm that this PR (choose one, and delete the other options):