fix(controller): map NetworkPolicy to its GVR so agent provisioning isn't wedged (v0.10.0 blocker)#630

Closed
bussyjd wants to merge 1 commit into main from fix/agent-provisioning-rc16
bussyjd commented Jun 12, 2026

v0.10.0 release blocker — fixes agent-backed selling

The rc16 cut absorbed the #625 agent-isolation NetworkPolicy, but the controller's resourceFor() had no case for NetworkPolicy and fell through to the ConfigMap default. On a real apiserver the apply fails:

apply NetworkPolicy/agent-isolation: NetworkPolicy in version "v1" cannot be handled as a ConfigMap

That error aborts the agent reconcile before the remote-signer is provisioned, so a CRD-declared agent (obol agent new / obol sell agent) never gets a wallet, its offer never reaches Ready, and the 402 probe returns storefront HTML. The default stack agent uses a different (netpol-free) path — which is exactly why flow-04 (default) was green but flow-16 (new agent) was red.
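
The fall-through described above can be sketched as a kind-to-GVR switch. This is a minimal illustration, not the controller's actual code: the `GVR` struct is a local stand-in for `schema.GroupVersionResource` so the example runs with the standard library alone, and the function names `resourceForBuggy` and `resourceForFixed` and their signatures are hypothetical.

```go
package main

import "fmt"

// GVR is a local stand-in for schema.GroupVersionResource
// (k8s.io/apimachinery), used here so the sketch has no dependencies.
type GVR struct {
	Group, Version, Resource string
}

var (
	// ConfigMap lives in the core API group (empty group name), version v1.
	ConfigMapGVR = GVR{Group: "", Version: "v1", Resource: "configmaps"}
	// NetworkPolicy lives in networking.k8s.io, version v1.
	NetworkPolicyGVR = GVR{Group: "networking.k8s.io", Version: "v1", Resource: "networkpolicies"}
)

// resourceForBuggy (hypothetical) has no arm for NetworkPolicy, so that
// kind silently falls through to the ConfigMap default.
func resourceForBuggy(kind string) GVR {
	switch kind {
	case "ConfigMap":
		return ConfigMapGVR
	default:
		return ConfigMapGVR
	}
}

// resourceForFixed (hypothetical) adds the missing arm.
func resourceForFixed(kind string) GVR {
	switch kind {
	case "NetworkPolicy":
		return NetworkPolicyGVR
	case "ConfigMap":
		return ConfigMapGVR
	default:
		return ConfigMapGVR
	}
}

func main() {
	fmt.Println(resourceForBuggy("NetworkPolicy").Resource) // prints "configmaps"
	fmt.Println(resourceForFixed("NetworkPolicy").Resource) // prints "networkpolicies"
}
```

With the buggy mapping, a NetworkPolicy object is sent to the `configmaps` endpoint, which is the mismatch a real apiserver rejects.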

Fix

  • Add monetizeapi.NetworkPolicyGVR and a resourceFor case for NetworkPolicy.
  • Register NetworkPolicyGVR in the provisioning test harness — its absence is why a fake client tolerated the wrong GVR and the unit tests stayed green through the regression.
  • TestResourceFor_NetworkPolicyUsesNetworkPolicyGVR asserts the object lands under the NetworkPolicy GVR, not ConfigMap.
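
Why a lenient fake can hide this class of bug is easiest to see in a toy model. The sketch below is an assumption-laden illustration only: `lenientFake`, `strictServer`, `Object`, and the local `GVR` type are all hypothetical, and it does not reproduce how the project's test harness or client-go's fake client work internally. It shows only the general idea that a store which accepts any object under any GVR will not catch a wrong mapping, while a server that checks the kind against the endpoint will.

```go
package main

import "fmt"

// GVR is a local stand-in for schema.GroupVersionResource.
type GVR struct{ Group, Version, Resource string }

var (
	ConfigMapGVR     = GVR{Version: "v1", Resource: "configmaps"}
	NetworkPolicyGVR = GVR{Group: "networking.k8s.io", Version: "v1", Resource: "networkpolicies"}
)

// Object is a minimal stand-in for an unstructured Kubernetes object.
type Object struct{ Kind, Name string }

// lenientFake (hypothetical) stores whatever it is given under whatever
// GVR the caller names, without checking that the kind belongs there.
type lenientFake struct {
	store map[GVR][]Object
}

func (f *lenientFake) Apply(gvr GVR, obj Object) error {
	f.store[gvr] = append(f.store[gvr], obj)
	return nil
}

// strictServer (hypothetical) rejects an object whose kind does not match
// the kind served at the target resource, as a real apiserver would.
type strictServer struct {
	kindFor map[GVR]string
}

func (s *strictServer) Apply(gvr GVR, obj Object) error {
	want, ok := s.kindFor[gvr]
	if !ok {
		return fmt.Errorf("no resource registered for %q", gvr.Resource)
	}
	if obj.Kind != want {
		return fmt.Errorf("%s cannot be handled as a %s", obj.Kind, want)
	}
	return nil
}

func main() {
	netpol := Object{Kind: "NetworkPolicy", Name: "agent-isolation"}

	fake := &lenientFake{store: map[GVR][]Object{}}
	fmt.Println(fake.Apply(ConfigMapGVR, netpol)) // prints "<nil>": the wrong GVR goes unnoticed

	strict := &strictServer{kindFor: map[GVR]string{
		ConfigMapGVR:     "ConfigMap",
		NetworkPolicyGVR: "NetworkPolicy",
	}}
	fmt.Println(strict.Apply(ConfigMapGVR, netpol))     // prints "NetworkPolicy cannot be handled as a ConfigMap"
	fmt.Println(strict.Apply(NetworkPolicyGVR, netpol)) // prints "<nil>"
}
```

A guard test that asserts on the GVR itself, as the last bullet describes, does not depend on how strict the fake is.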

Validation (live, on rc16 + this fix)

  • Reproduced the failure live (controller logged the conversion error every ~3s; signer Deployment never created).
  • After the fix: agent reaches Ready, remote-signer 1/1, NetworkPolicy applied, wallet set.
  • flow-16-sell-agent: 15 PASS / 0 FAIL.
  • Full release-smoke (wopus via Ollama /v1): 11 PASS / 2 non-gating SKIP / 0 FAIL → "Release smoke passed".
  • Full unit suite: 35 packages green.

The #625 security hardening is intact, not relaxed — both Hermes and the signer are Ready with the NetworkPolicy in place (the netpol does not block kubelet health probes).

Do not publish v0.10.0 without this.

…sn't wedged

The agent-isolation NetworkPolicy was added to the agent manifest set, but
resourceFor() had no case for kind=NetworkPolicy and fell through to the
ConfigMap default. On a real apiserver that apply fails with 'NetworkPolicy in
version v1 cannot be handled as a ConfigMap', which errors the whole agent
reconcile BEFORE the remote-signer is provisioned — so a CRD-declared agent
(obol agent new / obol sell agent) never gets a wallet, its offer never
reaches Ready, and the 402 probe returns the storefront HTML. The default
stack agent uses a different path with no netpol, which is why it was
unaffected (flow-04 green, flow-16 red). Reproduced live on rc16.

- Add monetizeapi.NetworkPolicyGVR and a resourceFor case for it.
- Register NetworkPolicyGVR in the provisioning test harness (its absence is
  why a fake client tolerated the wrong GVR and unit tests stayed green).
- TestResourceFor_NetworkPolicyUsesNetworkPolicyGVR asserts the object lands
  under the NetworkPolicy GVR, not ConfigMap.

Verified live: agent Ready, remote-signer 1/1, netpol applied, wallet set;
flow-16-sell-agent 15/0/0 and full release-smoke green. Security hardening
intact (both hermes and signer are Ready WITH the netpol in place).

OisinKyne force-pushed the fix/agent-provisioning-rc16 branch from a574932 to 66c2257 on June 12, 2026 10:37

@OisinKyne

What's the difference between this PR and #628? Is it just a test? The case block looks unchanged. Did you run the smoke on the latest rc17?

bussyjd commented Jun 12, 2026

Superseded by #628 (merged to main as d16d167), which landed the same resourceFor NetworkPolicy→GVR fix plus the fake-client harness entry. Rebased on top of #628, this PR now carries a duplicate case "NetworkPolicy" arm (two identical arms → duplicate-case compile error), so it can't merge as-is. The only net-new content was the regression guard test TestResourceFor_NetworkPolicyUsesNetworkPolicyGVR, now in #631 off main. Closing as superseded.

@bussyjd bussyjd closed this Jun 12, 2026
OisinKyne pushed a commit that referenced this pull request Jun 12, 2026
…egression)

main carries the fix (#628 / d16d167: NetworkPolicyGVR + the resourceFor
case + the fake-client harness entry) but no test pinning it. Add
TestResourceFor_NetworkPolicyUsesNetworkPolicyGVR, salvaged from the
superseded #630, so a future refactor can't silently route the
agent-isolation NetworkPolicy back to the ConfigMap default — a fake
client tolerates the wrong GVR, which is how the rc16 regression hid.