OSAC-463: Enforce VirtualNetwork child referential integrity via DB triggers#688
OSAC-463: Enforce VirtualNetwork child referential integrity via DB triggers#688tchughesiv wants to merge 9 commits into
Conversation
|
Skipping CI for Draft Pull Request. |
|
@tchughesiv: This pull request references OSAC-463 which is a valid jira issue. Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the bug to target the "5.0.0" version, but no target version was set. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Repository: osac-project/coderabbit/.coderabbit.yaml Review profile: ASSERTIVE Plan: Enterprise Run ID: 📒 Files selected for processing (4)
💤 Files with no reviewable changes (1)
WalkthroughMigration 55 adds two partial indexes and three PL/pgSQL trigger functions enforcing referential integrity between ChangesVirtualNetwork Deletion Referential Integrity
Sequence Diagram(s)sequenceDiagram
participant Client
participant PrivateVirtualNetworksServer
participant virtual_networks DB
participant subnets DB
participant security_groups DB
Client->>PrivateVirtualNetworksServer: DeleteVirtualNetwork(id)
PrivateVirtualNetworksServer->>virtual_networks DB: UPDATE set deletion_timestamp (soft-delete)
virtual_networks DB->>subnets DB: COUNT active WHERE spec.virtual_network = id (FOR SHARE)
virtual_networks DB->>security_groups DB: COUNT active WHERE spec.virtual_network = id (FOR SHARE)
alt active children exist
virtual_networks DB-->>PrivateVirtualNetworksServer: RAISE Z0003 (count + child type)
PrivateVirtualNetworksServer-->>Client: FailedPrecondition error
else no active children
virtual_networks DB-->>PrivateVirtualNetworksServer: OK
PrivateVirtualNetworksServer-->>Client: Deleted
end
Estimated code review effort🎯 4 (Complex) | ⏱️ ~50 minutes Suggested labels
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 10 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (10 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@internal/servers/private_virtual_networks_server_test.go`:
- Around line 1443-1445: Remove the unnecessary metadata.annotations entry
"osac.io/owner-reference": vn.GetId() from the Subnet and SecurityGroup test
fixtures; locate the fixtures in private_virtual_networks_server_test.go where
Annotations: map[string]string{ "osac.io/owner-reference": vn.GetId(), } is set
and delete that Annotations map (or at least that key), leaving
Spec.VirtualNetwork populated as the sole relationship indicator (the
referential checks use spec.virtual_network). Ensure no other tests rely on that
annotation and update any fixture construction sites that contain the same
pattern at the other noted locations.
In `@internal/servers/private_virtual_networks_server.go`:
- Line 229: Validate the incoming virtualNetworkID (obtained from
request.GetId()) as a UUID before building the CEL filter: call
uuid.Parse(virtualNetworkID) (from github.com/google/uuid) and if it returns an
error, return a gRPC InvalidArgument (e.g., status.Errorf(codes.InvalidArgument,
"invalid virtual network id: %v", err)); only after successful parse construct
the filter variable (filter := fmt.Sprintf("this.spec.virtual_network == %q",
virtualNetworkID)) and proceed with the existing DAO/List logic so malformed IDs
are rejected early.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository: osac-project/coderabbit/.coderabbit.yaml
Review profile: ASSERTIVE
Plan: Enterprise
Run ID: 6c5094ef-517f-4c89-8940-9278bcb6bfd6
📒 Files selected for processing (2)
internal/servers/private_virtual_networks_server.gointernal/servers/private_virtual_networks_server_test.go
Add docstrings for Delete and checkNoChildReferences, and remove unnecessary owner-reference annotations from deletion guard tests.
|
Addressed CodeRabbit feedback in b100e1f: Pre-merge checks
Inline comments
Focused tests still pass: |
|
@coderabbitai review |
✅ Action performedReview finished.
|
|
@coderabbitai full review |
✅ Action performedFull review finished. |
|
/lgtm |
|
|
||
| subnetResponse, err := s.subnetDao.List(). | ||
| SetFilter(filter). | ||
| SetLimit(1). |
There was a problem hiding this comment.
If you aren't interested in the items but only on the count you can put 0 here.
There was a problem hiding this comment.
Thanks... we only need GetTotal(), not the returned items. I kept SetLimit(1) because in our DAO SetLimit(0) maps to defaultLimit (100), not “return zero rows” — see also the test “Interprets zero limit as requesting the default number of items”.
SetLimit(1) is the minimal row fetch today (the count(*) query always runs regardless). Same pattern as daoReferenceChecker.hasReference().
If we want count-only behavior repo-wide, I can follow up with a small DAO change (skip the SELECT when only the count is needed) — happy to do that in a separate commit on this branch or a follow-up PR, whichever the team prefers.
There was a problem hiding this comment.
no longer relevant if we keep DB trigger approach
There was a problem hiding this comment.
You are correct; it is a negative value that means "count only":
fulfillment-service/internal/database/dao/generic_dao_test.go
Lines 1159 to 1166 in dc987c9
I opened https://redhat.atlassian.net/browse/OSAC-1541 to improve that.
| return grpcstatus.Errorf(grpccodes.FailedPrecondition, | ||
| "cannot delete VirtualNetwork '%s': %d Subnet(s) still reference it", | ||
| virtualNetworkID, subnetResponse.GetTotal()) | ||
| } |
There was a problem hiding this comment.
What happens if a new subnet referencing the virtual network is created after this check but before the delete is sent to the database?
There was a problem hiding this comment.
Same pattern as PrivateClusterCatalogItemsServer.Delete + daoReferenceChecker.hasReference(): check then delete inside one RPC handler, both DAO calls using the transaction from TxInterceptor.UnaryServer (List reads tx from context).
Under PostgreSQL READ COMMITTED, a concurrent create that commits between the List count and generic.Delete could theoretically slip through... same tradeoff we already accept for catalog-item delete.
OSAC-463 is the application-layer guard for the common case (delete VN while children still exist). Hard guarantees would need DB triggers like migration 52 (subnet ↔ compute instance). We should probably treat that as a separate design decision, not part of this PR unless you want it scoped in.
There was a problem hiding this comment.
looking closer at #646 ... it used DB triggers. i'll add a new commit to this PR implementing similar approach and we can decide if we want to keep it or not.
There was a problem hiding this comment.
adopted the OSAC-879 / migration 52 pattern in migration 54: bidirectional triggers with FOR SHARE on child insert. removed the app-layer checkNoChildReferences... enforcement is DB-only, same as subnet ↔ compute instance.
| return grpcstatus.Errorf(grpccodes.FailedPrecondition, | ||
| "cannot delete VirtualNetwork '%s': %d SecurityGroup(s) still reference it", | ||
| virtualNetworkID, sgResponse.GetTotal()) | ||
| } |
There was a problem hiding this comment.
Same, what happens if a security group referencing this virtual network is created after this check but before the delete?
Replace the application-layer delete guard with bidirectional PostgreSQL triggers (migration 54): block virtual network soft-delete when active subnets or security groups reference it, and validate child inserts with FOR SHARE locking to close TOCTOU races.
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In
`@internal/database/migrations/54_add_virtual_network_child_ref_triggers_test.go`:
- Around line 292-314: Add a new test case in the same test block to validate
that creating a security group referencing a non-existent virtual network (not
just soft-deleted) is prevented. Create a test similar to the existing "Prevents
creating a security group referencing a soft-deleted virtual network" test but
without inserting the virtual network at all, and directly attempt to insert a
security group referencing a non-existent virtual network ID. Verify the
operation fails with the expected error code Z0002 and that the error message
contains the referenced virtual network ID. This covers the non-existent
VirtualNetwork path that is currently untested, complementing the existing
soft-deleted scenario validation.
- Around line 73-89: The test loop checking for the indexes
"subnets_by_virtual_network" and "security_groups_by_virtual_network" only
validates that indexes with these names exist, but does not verify their actual
definition or predicate. Strengthen each iteration by querying the indexdef
column from pg_indexes (or an equivalent approach) to also assert that the index
definition is correct and matches what the migration is supposed to create, not
just that an index with the right name exists.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository: osac-project/coderabbit/.coderabbit.yaml
Review profile: ASSERTIVE
Plan: Enterprise
Run ID: 80859a88-4f17-4a82-89d5-ea38820833e6
📒 Files selected for processing (3)
internal/database/migrations/54_add_virtual_network_child_ref_triggers.up.sqlinternal/database/migrations/54_add_virtual_network_child_ref_triggers_test.gointernal/servers/private_virtual_networks_server_test.go
Migration 54's subnet insert trigger requires the referenced VirtualNetwork to exist. Seed test-vnet in BeforeEach before creating test-subnet.
Assert index expression/predicate via pg_indexes.indexdef and add SecurityGroup non-existent VirtualNetwork insert test.
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: eranco74, jhernand, tchughesiv The full list of commands accepted by this bot can be found here. The pull request process is described here DetailsNeeds approval from an approver in each of these files:
Approvers can indicate their approval by writing |
|
/test e2e-vmaas |
|
@tchughesiv: The following test failed, say
Full PR test history. Your PR dashboard. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
|
New changes are detected. LGTM label has been removed. |
main added 54_create_baremetal_tables; duplicate migration 54 broke CI BeforeSuite. Renumber OSAC-463 triggers to migration 55.
|
New changes are detected. LGTM label has been removed. |
OSAC-463: VirtualNetwork can be deleted while still referenced by Subnet
Jira: https://redhat.atlassian.net/browse/OSAC-463
Story type: Bug
Summary
Enforces VirtualNetwork ↔ Subnet/SecurityGroup referential integrity at the database layer using migration 54, following the same trigger pattern as OSAC-879 (migration 52 for Subnet ↔ ComputeInstance).
virtual_networks: block soft-delete when active Subnets or SecurityGroups reference the VN (FailedPrecondition/ Z0003)subnetsandsecurity_groups: validate the referenced VirtualNetwork exists and is not deleted (InvalidArgument/ Z0002)Soft-deleted children no longer block VN deletion — the delete trigger counts only rows with
deletion_timestamp = 'epoch'.This PR supersedes #586. Please close #586 in favor of this PR.
Changes
internal/database/migrations/54_add_virtual_network_child_ref_triggers.up.sqldata->'spec'->>'virtual_network'internal/database/migrations/54_add_virtual_network_child_ref_triggers_test.gointernal/servers/private_virtual_networks_server_test.go"Deletion validation"suite (server-level delete behavior via generic DAO + triggers)internal/servers/compute_instances_server_test.go,private_compute_instances_server_test.gotest-vnetbeforetest-subnetin BeforeEach (required by insert trigger)No app-layer
checkNoChildReferences()guard —PrivateVirtualNetworksServer.Delete()delegates togeneric.Delete()like other resources under OSAC-879.Public API inherits enforcement via delegation — no changes to
virtual_networks_server.go.Testing
"Deletion validation"suite on private VirtualNetworks serverValidation run locally:
ginkgo run --focus="Compute instances server|Private compute instances server" internal/servers— 67/67 passginkgo run --focus="Deletion validation" internal/servers— passginkgo run internal/database/migrations --focus="54_add_virtual_network_child_ref_triggers"— passAcceptance Criteria
FailedPrecondition)FailedPrecondition)Related
Summary by CodeRabbit