@bjoydeep

As we have been discussing, here is the initial proposal for the Management Cluster Reconciler.

```
Current: 85% capacity
Threshold: 90% capacity
Action: Wait until threshold is hit, then provision new cluster

"90% capacity" is ambiguous: it could mean "90% of the capacity is used" or "90% of the capacity remains". "Utilization" may be a better word than "capacity".

Author

Agreed. Utilization is better. Thanks.

Each management cluster has a finite capacity for hosted control planes, determined by both resource constraints (CPU, memory) and infrastructure limitations (VPC limits, network capacity, etc.). When we approach this capacity limit, we need to create a new management cluster. Creating a management cluster takes time, so the new cluster must be ready in advance, but only just in time, to avoid wasting resources.

## SLO Constraints
- **Hosted cluster provisioning SLO:** 10 minutes (hard requirement - cannot be broken)

How is this SLO defined? I am also curious where the 10 minutes comes from.

Demand Analysis: Monday spikes average 12-18 clusters
Sizing Decision: Provision larger clusters (25 capacity) vs standard (15 capacity)
Trade-off: Less frequent provisioning vs potential over-provisioning
```
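The two decisions quoted above (when to start provisioning, and which cluster size to pick) can be illustrated with a minimal sketch. It assumes a constant arrival rate of hosted clusters and a fixed provisioning lead time for a management cluster. The function names, parameters, and the arrival-rate and lead-time figures are hypothetical and not part of the proposal. Only the 85%/90% figures, the 12-18 cluster spike, and the 15 versus 25 capacities come from the examples.

```python
import math


def should_provision_now(current_hosted: int,
                         capacity: int,
                         threshold: float,
                         arrivals_per_hour: float,
                         lead_time_hours: float) -> bool:
    """Return True if provisioning should start now.

    Hypothetical rule: project utilization forward by the provisioning
    lead time and start as soon as the projection reaches the threshold,
    instead of waiting for current utilization to hit it.
    """
    projected = current_hosted + arrivals_per_hour * lead_time_hours
    return projected / capacity >= threshold


def clusters_needed(spike_size: int, cluster_capacity: int) -> int:
    """Number of empty management clusters needed to absorb a demand spike."""
    return math.ceil(spike_size / cluster_capacity)


if __name__ == "__main__":
    # 85 of 100 slots used, 90% threshold (figures from the example above).
    # Arrival rate and lead time are assumed values for illustration only.
    print(should_provision_now(85, 100, 0.90, arrivals_per_hour=3, lead_time_hours=2))
    # Upper end of the Monday spike (18) against standard and larger sizes.
    print(clusters_needed(18, 15), clusters_needed(18, 25))
```

Under these assumptions, a cluster at 85% utilization with three arrivals per hour and a two-hour lead time is projected to reach 91%, so provisioning would start before the 90% threshold is actually hit. An 18-cluster spike needs two standard (15-capacity) clusters but fits in one larger (25-capacity) cluster, which is the less-frequent-provisioning side of the trade-off.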

The current examples focus solely on proactive provisioning (Scale Up) to meet the SLO. To fully address the Secondary Objective (Cost Optimization) of minimizing unused capacity, would it be better to add a scenario demonstrating Scale Down / Resource Reclamation?
