[design] Cluster UX long term vision #22123
base: main
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,145 @@ | ||
# Cluster UX Long Term Vision | ||
|
||
- Associated: [Epic](https://github.com/MaterializeInc/materialize/issues/22120) | ||
|
||
<!-- | ||
The goal of a design document is to thoroughly discover problems and | ||
examine potential solutions before moving into the delivery phase of | ||
a project. In order to be ready to share, a design document must address | ||
the questions in each of the following sections. Any additional content | ||
is at the discretion of the author. | ||
|
||
Note: Feel free to add or remove sections as needed. However, most design | ||
docs should at least keep the suggested sections. | ||
--> | ||
|
||
## The Problem | ||
We need a documented long-term vision for the cluster UX that covers both | ||
the "end state" goal and the short- and medium-term states, in order to: | ||
* Make product prioritization decisions around cluster work | ||
* Communicate to customers what to expect around cluster management | ||
* Set expectations for other projects on how they should be interacting with clusters | ||
|
||
Epic: https://github.com/MaterializeInc/materialize/issues/22120 | ||
|
||
## Success Criteria | ||
Primarily, a merged design doc that is reviewed and approved by EPD leadership, | ||
and is socialized to GTM. | ||
|
||
Secondarily, a roadmap for cluster work for the next quarter. | ||
|
||
Qualitatively, positive feedback from EPD leadership and GTM folks that they | ||
have clarity [TODO(chaas) define this more explicitly]. | ||
|
||
## Out of Scope | ||
Designing the actual cluster API changes themselves, or proposing implementation details. | ||
|
||
## Solution Proposal | ||
The objectives we are striving for with the cluster UX: | ||
* Easy to use and manage | ||
* Maximize resource efficiency/minimize unused resource cost | ||
* Enable fault tolerance/use-case isolation | ||
|
||
### Declarative vs Imperative | ||
We should move toward a declarative API for managing clusters, where: | ||
|
||
Declarative is like `CREATE CLUSTER` with managed replicas and \ | ||
Imperative is like `CREATE/DROP CLUSTER REPLICA`. | ||
|
||
This means deprecating manual cluster replica management. \ | ||
We believe this is easier to use and manage. | ||
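
To make the distinction concrete, here is a minimal sketch of how a declared state (a size and a replication factor) could be translated into the imperative replica operations a user would otherwise issue by hand. The `Replica` type, the `plan_replica_changes` function, and the `r1`, `r2`, ... naming scheme are hypothetical and only illustrate the idea; this is not Materialize's implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Replica:
    name: str
    size: str


def plan_replica_changes(current, size, replication_factor):
    """Return (op, replica name, size) steps that reach the declared state."""
    # Replicas that already match the declared size can stay.
    keep = [r for r in current if r.size == size][:replication_factor]
    kept_names = {r.name for r in keep}
    drops = [("DROP", r.name, r.size) for r in current if r.name not in kept_names]

    # Pick unused names for the replicas that still need to be created.
    used = {r.name for r in current}
    creates = []
    i = 1
    while len(keep) + len(creates) < replication_factor:
        name = f"r{i}"
        i += 1
        if name in used:
            continue
        creates.append(("CREATE", name, size))

    # Creates come first so that a real system could delay the drops.
    return creates + drops


current = [Replica("r1", "small"), Replica("r2", "small")]
for step in plan_replica_changes(current, "medium", 2):
    print(step)
```

With a declarative API the user states only the target (`size`, `replication_factor`); the create and drop steps become an internal detail.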
|
||
The primary work item for this is **graceful rehydration**. At the moment, a change in size causes downtime until the new replicas are hydrated. As such, customers still want the flexibility to create their own replicas for graceful resizing. We can avoid this by leaving a subset of the original replicas around until the new replicas are hydrated. \ | ||
|
||
This requires us to 1) detect when hydration is complete and 2) trigger database object changes based on this event (either without, or based on, an earlier DDL statement). | ||
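
The two requirements can be sketched as a small state machine: a resize adds replicas at the new size, and a hydration-complete event (requirement 1) triggers retirement of the old replicas (requirement 2) only once every new replica is ready. The `Cluster` and `Replica` classes, the `pending-N` names, and the `on_hydrated` callback are assumptions made for illustration, not an existing API.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Replica:
    name: str
    size: str
    hydrated: bool = False


@dataclass
class Cluster:
    replicas: List[Replica]
    target_size: Optional[str] = None

    def begin_resize(self, new_size: str, count: int) -> None:
        """Add replicas at the new size; the old replicas keep serving."""
        self.target_size = new_size
        for i in range(count):
            self.replicas.append(Replica(f"pending-{i}", new_size))

    def on_hydrated(self, name: str) -> None:
        """Hydration event: retire old replicas once all new ones are hydrated."""
        for replica in self.replicas:
            if replica.name == name:
                replica.hydrated = True
        new = [r for r in self.replicas if r.size == self.target_size]
        if new and all(r.hydrated for r in new):
            self.replicas = new
            self.target_size = None

    def serving(self) -> List[str]:
        """Names of replicas that can currently answer queries."""
        return [r.name for r in self.replicas if r.hydrated]


cluster = Cluster([Replica("r1", "small", hydrated=True)])
cluster.begin_resize("medium", 2)
print(cluster.serving())  # the old replica still serves during hydration
cluster.on_hydrated("pending-0")
cluster.on_hydrated("pending-1")
print(cluster.serving())  # only the new replicas remain
```

In this sketch at least one hydrated replica is serving at every step, which is the property that would remove the need for manual replica management during a resize.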
|
||
Another consideration is internal use-cases, such as unbilled replicas. We may want to keep an imperative API around for internal (support) use only. | ||
|
||
To be determined: whether replica sets fit into this model, either externally exposed or internal-only. Perhaps they are a way we could recover clusters with heterogeneous replicas while retaining a declarative API. | ||
|
||
### Resource usage | ||
The very long-term goal is clusterless Materialize, where Materialize does automatic workload scheduling for the customer. | ||
> cc @frankmcsherry on this point in particular. We may want to try to clarify long-term (i.e, at least how many years away is it).

> Eh, I'm ok with it being infinity years away. :D At least,

> I'm personally fine with "infinity"! But @antiguru was excited about the prospect. I think we should align across Materialize on whether clusterless Materialize is something we want to pursue soon-ish, eventually, or never. That will inform how seriously we need to consider the possibility of its existence in today's designs.
>
> I think @antiguru had something more elaborate in mind, where dataflows would move between clusters as necessary.

> I'm fine with it staying as it currently is, i.e., we use clusters as a user-indicated boundary between resources. One problem that I'd eventually like to see vanish is how do users determine the right cut in their dependency graph such that they can use the least amount of resources while achieving their availability goals. From what I observed, this is a recurring problem which needs some explaining for users to get right. How we get there is a different question. One take could be that there's something that indicates a resource assignment, but I have no strong preference whether this would be part of a component within Materialize or something on top only giving recommendations. The latter seems more practical and potentially less dangerous, at least until we figure out how to write a controller for Materialize (which we currently don't know.) TL;DR, happy to delay this infinitely, but we should be aware of the challenge users face. |
||
|
||
An intermediary solution, which is also far off, is autoscaling of clusters, where Materialize automatically resizes clusters based on the observed workload. | ||
> I don't think this needs to be that far off! We could plausibly do this next year. Whereas I don't think clusterless Materialize is something we do in the next two years.

> The "auto" part here is the scary part. Just about everyone gets it wrong, and the whole control theory part of whether you should/shouldn't scale is something MZ humans need to understand first, and I think that's still a ways off. |
||
|
||
A more achievable offering in the short-term is automatic shutdown of clusters, where Materialize can spin down a cluster to 0 replicas based on certain criteria, such as a scheduled time or amount of idle time. \ | ||
This would reduce resource waste for development clusters. The triggering mechanism from graceful rehydration is also a requirement here. | ||
> 👍🏽

> This relates to |
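
The shutdown criteria named in the design (a scheduled time, or an amount of idle time) could be evaluated by a policy check along these lines. This is a rough sketch: the `should_shut_down` function, its parameters, and the thresholds in the usage example are hypothetical.

```python
from datetime import datetime, time, timedelta


def should_shut_down(now, last_activity, idle_limit, shutdown_after=None):
    """Return True if the cluster should be scaled down to 0 replicas."""
    # Idle-time criterion: no activity for at least `idle_limit`.
    if now - last_activity >= idle_limit:
        return True
    # Schedule criterion: past the configured time of day, if one is set.
    if shutdown_after is not None and now.time() >= shutdown_after:
        return True
    return False


now = datetime(2023, 10, 2, 12, 0)
print(should_shut_down(now, now - timedelta(hours=2), timedelta(hours=1)))
print(should_shut_down(now, now - timedelta(minutes=10), timedelta(hours=1)))
print(should_shut_down(now, now - timedelta(minutes=10), timedelta(hours=1), time(11, 0)))
```

A check like this decides only when to act; spinning the cluster down still depends on the same triggering mechanism as graceful rehydration.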
||
|
||
### Data model | ||
We should move toward prescriptive guidance on how users should configure their clusters with respect to databases and schemas, \ | ||
e.g. should clusters typically be scoped to a single schema. | ||
|
||
We should also be more prescriptive about what data should be colocated, \ | ||
e.g. when should the user create a new cluster for their new sources/MVs/indexes versus increase the size of their existing cluster. | ||
|
||
We believe this will make it clearer how to achieve appropriate fault tolerance and maximize resource efficiency. | ||
|
||
### Support & testing | ||
Support is able to create unbilled or partially billed cluster resources for resolving customer issues. This is soon to be possible via unbilled replicas [#20317](https://github.com/MaterializeInc/materialize/issues/20317). | ||
|
||
Engineering is also able to create additional unbilled shadow replicas for testing new features and query plan changes, which do not serve customers' production workflows. | ||
> I'm not yet fully convinced if shadow replicas are the mechanism we'd like to have to do A/B testing. An alternative are shadow environments where all parts are cloned and we don't risk taking down the environment through a misbehaving shadow replica. All I mean to say is that we might want to leave it outside of this design!

> Sounds good, I can leave this out then if we're not sure yet

> I can leave out the entire "Support & testing" section if the content there is too narrow of a view of how we can support customers in the long term

> No, I think it's good! TBH, I'd bring back the bit about shadow replicas and just add a caveat like "if they can be made safe." But I think it's absolutely right that we want some way to test new releases/candidate changes on real production workloads, if we can find a way to do so without putting those environments at risk. |
||
|
||
### Roadmap | ||
**Now** | ||
* @antiguru to complete `ALTER...SET CLUSTER` [#20841](https://github.com/MaterializeInc/materialize/issues/20841), without graceful rehydration. | ||
|
||
* @antiguru to continue in-flight work on multipurpose clusters [#17413](https://github.com/MaterializeInc/materialize/issues/17413) - TODO(@antiguru): fill in details. | ||
|
||
* @ggnall to do discovery on the prescriptive data model as part of Blue/Green deployments project [#19748](https://github.com/MaterializeInc/materialize/issues/19748) | ||
|
||
**Next** | ||
* Graceful rehydration, to support graceful manual execution of `ALTER...SET CLUSTER` and `ALTER...SET SIZE`. | ||
|
||
* Deprecate `CREATE/DROP CLUSTER REPLICA` for users. | ||
|
||
**Later** | ||
* Auto-shutdown of clusters. | ||
* Shadow replicas. | ||
|
||
**Much Later** | ||
* Autoscaling clusters / clusterless. | ||
> I think we can promote autoscaling clusters to "later"! |
||
|
||
## Minimal Viable Prototype | ||
|
||
<!-- | ||
Build and share the minimal viable version of your project to validate the | ||
design, value, and user experience. Depending on the project, your prototype | ||
might look like: | ||
|
||
- A Figma wireframe, or fuller prototype | ||
- SQL syntax that isn't actually attached to anything on the backend | ||
- A hacky but working live demo of a solution running on your laptop or in a | ||
staging environment | ||
|
||
The best prototypes will be validated by Materialize team members as well | ||
as prospects and customers. If you want help getting your prototype in front | ||
of external folks, reach out to the Product team in #product. | ||
|
||
This step is crucial for de-risking the design as early as possible and a | ||
prototype is required in most cases. In _some_ cases it can be beneficial to | ||
get eyes on the initial proposal without a prototype. If you think that | ||
there is a good reason for skipping or delaying the prototype, please | ||
explicitly mention it in this section and provide details on why you'd | ||
like to skip or delay it. | ||
--> | ||
|
||
## Alternatives | ||
|
||
<!-- | ||
What other solutions were considered, and why weren't they chosen? | ||
|
||
This is your chance to demonstrate that you've fully discovered the problem. | ||
Alternative solutions can come from many places, like: you or your Materialize | ||
team members, our customers, our prospects, academic research, prior art, or | ||
competitive research. One of our company values is to "do the reading" and | ||
to "write things down." This is your opportunity to demonstrate both! | ||
--> | ||
|
||
|
||
## Open questions | ||
|
||
<!-- | ||
What is left unaddressed by this design document that needs to be | ||
closed out? | ||
|
||
When a design document is authored and shared, there might still be | ||
open questions that need to be explored. Through the design document | ||
process, you are responsible for getting answers to these open | ||
questions. All open questions should be answered by the time a design | ||
document is merged. | ||
--> |
> Strong disagree here. There's maybe a false dichotomy at play, as there is a middle ground between "deprecate manual cluster management" and "default to manual cluster management". As long as MZ has downtime on a thing that could have been done manually, it's a real hard sell that we should forbid doing the manual thing (e.g. resizing).
>
> An alternative would be "teach people to type `ALTER CLUSTER REPLICAS` rather than `CREATE CLUSTER REPLICA` and `DROP CLUSTER REPLICA`", which is six of one half dozen of another to me. Still mostly imperative (a human types a command, just about the goal state rather than the transition) but with less cognitive overhead. But stops short of "no manual replica management".

> If nothing else, it would be helpful to unpack the intended "imperative" vs "declarative" distinction. SQL's command language, for example, is painfully imperative and not at all declarative. But it's hard for me to understand at this point what the distinction is other than removing a user's ability to control the assignment of their money (in the form of replicas) to their work.

> Maybe declarative vs imperative is the wrong framing. For me, the compelling reason to move away from `CREATE CLUSTER REPLICA` is about not having to immediately teach people about replicas. We've seen repeatedly replicas be a major source of confusion for those new to Materialize. Common questions:
>
> It is much easier to explain the new (what we've been calling "declarative") API:
>
> * `CREATE CLUSTER` provisions such hardware with resources proportional to your desired `SIZE`."
> * `CREATE CLUSTER ... REPLICATION FACTOR = 2`."
>
> This framing makes clear that `ALTER CLUSTER REPLICAS` would have the same issue as the current API: it requires that users think in terms of individual replicas, rather than a cluster with a replication factor.
>
> I think this is a fair take, as resizing a cluster is a "production workflow", and so we could make "Materialize supports graceful reconfiguration during resizing" a requirement for removing the manual cluster replica DDL statements.