Skip to content

Commit abfedff

Browse files
davidepasquerodavidepasquero1
authored andcommitted
Merge pull request #13 from davidepasquero/codex/analyze-last-49-commits-and-list-modified-files-j2mb8k
docs: remove architecture image
2 parents 32db5f7 + 8af510e commit abfedff

File tree

1 file changed

+164
-0
lines changed

1 file changed

+164
-0
lines changed
Lines changed: 164 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,164 @@
1+
# Refactoring for Scalability and Stability
2+
3+
**Authors:** davidepasquero
4+
**Status:** Draft
5+
**Created At:** 2025-09-15
6+
**Last Updated:** 2025-09-15
7+
8+
---
9+
10+
## Summary
11+
12+
This set of changes introduces a fundamental architectural evolution for the vm-dhcp-controller, transforming it into a more robust, scalable, and resilient solution. The main update shifts the agent management model from one pod per IPPool to a unified Deployment, while also introducing a leader election mechanism to ensure high availability.
13+
The most significant change is the move away from the "one pod per pool" model. Now, a single Deployment manages all the agent pods required for the active networks.
14+
15+
Unifies the DHCP deployment by running a single agent deployment configured through JSON environment variables so it can serve multiple IPPools concurrently, replacing per-pool flags and pods with configuration generated by the controller.【F:chart/templates/agent-deployment.yaml†L1-L58】【F:pkg/controller/ippool/controller.go†L247-L452】
16+
17+
---
18+
19+
## Motivation
20+
21+
Operating one DHCP agent per IPPool required repetitive manifests, manual Multus annotations, and static CLI arguments, making it costly to scale networks and keep leases synchronized. Centralizing configuration in the controller and letting a shared agent read structured JSON removes this duplication while enabling the DHCP allocator to reason about pool-specific leases.【F:pkg/controller/ippool/controller.go†L229-L452】【F:pkg/dhcp/dhcp.go†L18-L409】
22+
23+
---
24+
25+
## Goals / Non-goals
26+
27+
- **Goals**
28+
- Deliver a first-class IPPool CRD with lifecycle conditions and IPv4 configuration to describe DHCP networks authoritatively.【F:pkg/apis/network.harvesterhci.io/v1alpha1/ippool.go†L9-L130】
29+
- Reconcile a single agent deployment that receives per-pool configuration and Multus attachments from the controller.【F:pkg/controller/ippool/controller.go†L247-L452】
30+
- Let the agent bootstrap interfaces, watchers, and DHCP servers directly from `AGENT_NETWORK_CONFIGS` and `IPPOOL_REFS_JSON` without CLI flags.【F:cmd/agent/root.go†L17-L102】【F:pkg/agent/agent.go†L23-L317】
31+
- Store leases keyed by IPPool reference so DHCP handlers can safely multiplex multiple networks.【F:pkg/dhcp/dhcp.go†L57-L224】
32+
- **Non-goals**
33+
- Change IP allocation algorithms or introduce IPv6 handling (the CRD and allocator remain IPv4-only in this revision).【F:pkg/apis/network.harvesterhci.io/v1alpha1/ippool.go†L47-L130】
34+
- Redesign controller metrics, webhook, or webhook deployment beyond what is required to drive the shared agent.【F:chart/templates/deployment.yaml†L1-L132】
35+
36+
---
37+
38+
## Proposal / Design
39+
40+
### Modified Components
41+
42+
- **Helm chart** – Creates a dedicated agent deployment, ensuring `NET_ADMIN` capability, default JSON configuration values, and removal of obsolete CLI arguments or per-pool environment variables.【F:chart/templates/agent-deployment.yaml†L1-L59】
43+
- **Controller deployment** – Exposes the agent deployment/container names through environment variables so runtime logic can patch a single target.【F:chart/templates/deployment.yaml†L35-L58】
44+
- **Values schema** – Adds `agent.*` knobs (image, RBAC, scheduling) to configure the shared agent from Helm values.【F:chart/values.yaml†L13-L43】
45+
- **Agent CLI/runtime** – Loads structured JSON, instantiates per-pool event handlers, configures interfaces, and starts DHCP servers under leader election or dry-run mode.【F:cmd/agent/root.go†L51-L102】【F:pkg/agent/agent.go†L63-L317】【F:cmd/agent/run.go†L22-L103】
46+
- **Controller CLI/runtime** – Discovers its namespace from `POD_NAMESPACE`, exposes cache-dump APIs, and wraps IPPool callbacks with deployment reconciliation logic.【F:cmd/controller/run.go†L26-L110】
47+
- **IPPool watcher** – Adds per-pool controllers with `initialSyncDone` barriers so the agent only advertises DHCP services after caches warm up.【F:pkg/agent/ippool/event.go†L28-L197】【F:pkg/agent/ippool/controller.go†L19-L173】
48+
- **Agent/controller contract** – Defines JSON-bearing options for the agent and records controller namespace in the management context.【F:pkg/config/context.go†L52-L176】
49+
- **IPPool CRD** – Declares status conditions and IPv4 schema for CIDR, server IP, reservation, and DNS/NTP hints.【F:pkg/apis/network.harvesterhci.io/v1alpha1/ippool.go†L9-L130】
50+
- **DHCP allocator** – Reworks lease storage to a nested map keyed by IPPool reference and exposes multi-interface `Run`/`DryRun` flows with graceful cleanup.【F:pkg/dhcp/dhcp.go†L57-L484】
51+
52+
### Data / Control Flow
53+
54+
1. Users author IPPool resources containing IPv4 ranges and metadata.【F:pkg/apis/network.harvesterhci.io/v1alpha1/ippool.go†L34-L130】
55+
2. The controller watches IPPools, updates cache/metrics, then aggregates active pools to patch agent annotations and JSON environment payloads.【F:pkg/controller/ippool/controller.go†L143-L452】
56+
3. Agent pods deserialize the JSON, configure interfaces, wait for per-pool caches to synchronise, and start DHCP services for every network.【F:pkg/agent/agent.go†L117-L317】
57+
4. DHCP handlers respond on each interface using leases stored by IPPool reference, ensuring isolation across networks.【F:pkg/dhcp/dhcp.go†L226-L409】
58+
59+
### Interaction with Existing Entities
60+
61+
- **NetworkAttachmentDefinitions** – Controller still labels and references NADs so the agent joins the right Multus networks before configuration.【F:pkg/controller/ippool/controller.go†L323-L341】
62+
- **IPAM / Cache allocators** – Existing allocation logic persists; reconciliation recomputes usage metrics and reserves server/router/excluded addresses.【F:pkg/controller/ippool/controller.go†L171-L224】【F:pkg/controller/ippool/controller.go†L200-L218】
63+
- **Controller Manager** – Management context wires Harvester, CNI, Core, and KubeVirt factories just as before while storing namespace state for deployment patches.【F:pkg/config/context.go†L74-L177】
64+
- **HTTP servers** – Both binaries expose cache-dump endpoints controlled by flags to aid debugging without altering default service wiring.【F:cmd/agent/run.go†L60-L89】【F:cmd/controller/run.go†L73-L99】
65+
66+
### Distribution & Configuration
67+
68+
- Helm values toggle the agent deployment and tune its scheduling, RBAC, and resources.【F:chart/values.yaml†L13-L43】
69+
- Controller expects `AGENT_DEPLOYMENT_NAME` and `AGENT_CONTAINER_NAME` supplied via chart-managed env vars and falls back to opinionated defaults otherwise.【F:chart/templates/deployment.yaml†L55-L58】【F:pkg/controller/ippool/controller.go†L229-L245】
70+
- Agent consumes Downward API namespace and new JSON env vars, defaulting to empty arrays if not patched yet.【F:chart/templates/agent-deployment.yaml†L21-L58】【F:cmd/agent/root.go†L51-L102】
71+
72+
### Diagrams
73+
74+
- **IPPool CRDs** capture IPv4 ranges, lifecycle conditions, and optional pause semantics that seed the reconcile loop.【F:pkg/apis/network.harvesterhci.io/v1alpha1/ippool.go†L9-L130】
75+
- **Controller reconcile loop** aggregates active pools, labels the referenced NADs, builds the `AGENT_NETWORK_CONFIGS`/`IPPOOL_REFS_JSON` payloads, and patches the shared agent Deployment template exposed by the Helm chart.【F:pkg/controller/ippool/controller.go†L143-L452】【F:chart/templates/deployment.yaml†L35-L58】
76+
- **Helm chart templates** supply the unified agent Deployment, service account, and configurable values that both the controller and runtime consume when rendering pods.【F:chart/templates/agent-deployment.yaml†L1-L59】【F:chart/values.yaml†L13-L43】
77+
- **Agent Deployment and runtime** deserialize the JSON configuration, configure Multus-backed interfaces, and launch per-pool watchers before exposing DHCP services keyed by IPPool reference.【F:pkg/agent/agent.go†L63-L317】【F:pkg/agent/ippool/controller.go†L19-L173】【F:pkg/dhcp/dhcp.go†L57-L409】
78+
- **Leader election** in both binaries ensures that only one controller or agent instance actively reconciles resources at a time, preventing conflicting updates.【F:cmd/agent/run.go†L22-L103】【F:cmd/controller/run.go†L26-L110】
79+
80+
---
81+
82+
## Security Considerations
83+
84+
Agent pods explicitly retain `NET_ADMIN` capability when user-provided security contexts omit it, ensuring interface programming succeeds while avoiding broader privilege escalation. The controller and agent discover namespaces via Downward API, so RBAC bindings remain scoped to their respective service accounts and cluster roles defined in the chart.【F:chart/templates/agent-deployment.yaml†L21-L35】【F:chart/templates/deployment.yaml†L30-L79】
85+
86+
---
87+
88+
## Upgrade & Migration Plan
89+
90+
1. Deploy the updated chart so controller pods begin emitting agent identifiers and the new agent deployment manifests with empty JSON defaults.【F:chart/templates/deployment.yaml†L35-L58】【F:chart/templates/agent-deployment.yaml†L43-L58】
91+
2. Create or update IPPools; reconciliation patches the agent deployment with Multus networks and configuration blobs derived from active pools.【F:pkg/controller/ippool/controller.go†L247-L393】
92+
3. Agent instances read the new env vars and spin up per-pool handlers; if no pools exist they remain idle, allowing gradual migration from legacy pods.【F:pkg/agent/agent.go†L71-L254】
93+
4. Roll back by removing the new agent deployment values and deleting IPPools; the controller will stop patching env vars and existing cleanup logic revokes allocations.【F:pkg/controller/ippool/controller.go†L435-L452】
94+
95+
---
96+
97+
## Alternatives Considered
98+
99+
Maintaining one agent pod per pool would have required keeping the legacy `prepareAgentPod` scaffolding and CLI flags, but that logic is now commented out and superseded by deployment reconciliation, avoiding pod churn for every IPPool change.【F:pkg/controller/ippool/common.go†L14-L146】【F:cmd/agent/root.go†L96-L100】
100+
101+
---
102+
103+
## Drawbacks / Risks
104+
105+
- A single agent now represents a broader blast radius; misconfiguration or crash disrupts all pools simultaneously, so leader election and readiness gating must be reliable.【F:cmd/agent/run.go†L67-L103】【F:pkg/agent/agent.go†L196-L317】
106+
- JSON parsing or invalid NAD references can leave interfaces unconfigured, so controller validation must prevent malformed specs.【F:pkg/controller/ippool/controller.go†L295-L343】
107+
- DHCP allocator still relies on interface-specific UDP sockets without aggregated error propagation, so unexpected failures may only surface via logs.【F:pkg/dhcp/dhcp.go†L326-L409】
108+
109+
---
110+
111+
## Testing Plan
112+
113+
- Unit coverage for controller reconciliation to assert Multus annotations, env var payloads, and legacy argument removal paths.【F:pkg/controller/ippool/controller.go†L375-L444】
114+
- Agent integration tests that mock IPPool handlers, ensuring DHCP services wait for `InitialSyncDone` before serving leases.【F:pkg/agent/ippool/controller.go†L95-L146】【F:pkg/agent/agent.go†L196-L254】
115+
- DHCP allocator tests that inject multiple configurations and confirm per-pool lease separation plus graceful shutdown signals.【F:pkg/dhcp/dhcp.go†L57-L484】
116+
- End-to-end validation verifying NAD labels, IPAM metrics, and DHCP offers against multiple concurrent IPPools.【F:pkg/controller/ippool/controller.go†L151-L224】
117+
118+
---
119+
120+
## Dependencies
121+
122+
- Requires Kubernetes with Multus support because the controller still annotates pods with `k8s.v1.cni.cncf.io/networks` entries derived from IPPool network names.【F:pkg/controller/ippool/controller.go†L318-L393】
123+
- Depends on insomniacslk DHCPv4 libraries for packet handling across interfaces.【F:pkg/dhcp/dhcp.go†L12-L15】
124+
- Uses Harvester, KubeVirt, and NAD generated clients wired through the shared management factories.【F:pkg/config/context.go†L74-L177】
125+
126+
---
127+
128+
## References
129+
130+
- Files touched across the most recent 49 commits, forming the baseline for this proposal.【f13097†L1-L25】
131+
132+
---
133+
134+
## Additional Notes
135+
136+
### Files Modified in the Last 49 Commits
137+
138+
- SCCcredentials
139+
- chart/templates/_helpers.tpl
140+
- chart/templates/agent-clusterrole.yaml
141+
- chart/templates/agent-clusterrolebinding.yaml
142+
- chart/templates/agent-deployment.yaml
143+
- chart/templates/agent-serviceaccount.yaml
144+
- chart/templates/deployment.yaml
145+
- chart/templates/rbac.yaml
146+
- chart/templates/serviceaccount.yaml
147+
- chart/values.yaml
148+
- cmd/agent/root.go
149+
- cmd/agent/run.go
150+
- cmd/controller/root.go
151+
- cmd/controller/run.go
152+
- pkg/agent/agent.go
153+
- pkg/agent/ippool/controller.go
154+
- pkg/agent/ippool/event.go
155+
- pkg/agent/ippool/ippool.go
156+
- pkg/apis/network.harvesterhci.io/v1alpha1/ippool.go
157+
- pkg/config/context.go
158+
- pkg/controller/ippool/common.go
159+
- pkg/controller/ippool/controller.go
160+
- pkg/dhcp/dhcp.go
161+
162+
(Extracted via `git log -n 49 --name-only --pretty=format: | sort -u`.)【f13097†L1-L25】
163+
164+
---

0 commit comments

Comments
 (0)