Skip to content

[DO NOT MERGE] featureflag: prototype license-aware feature gates#171314

Draft
vishalv1994 wants to merge 7 commits into
cockroachdb:masterfrom
vishalv1994:license-feature-gates-prototype
Draft

[DO NOT MERGE] featureflag: prototype license-aware feature gates#171314
vishalv1994 wants to merge 7 commits into
cockroachdb:masterfrom
vishalv1994:license-feature-gates-prototype

Conversation

@vishalv1994

Copy link
Copy Markdown
Contributor

Summary

Interface-skeleton prototype of a license-aware feature-gate model: extend the
existing featureflag package so a single gate can combine a license-entitlement
check (read directly from the license proto — not denormalized settings) with the
existing optional operator cluster setting, plus experimental/cloud-only stubs.

  • licensepb: add a top-level Feature enum and repeated Feature features = 12
    (field 12 chosen to stay wire-compatible with the managed-service proto's 8-11).
  • featureflag: add FeatureGate with Register + functional options
    (WithSetting/WithName/WithExperimental/WithCloudOnly) and Enabled,
    which evaluates the license gate then the operator setting.
  • server/license: populate featureflag.GetLicenseHook from init() — the
    acyclic dependency direction (server/licensefeatureflag).
  • A compiling example test exercises both gate kinds (license-only and
    license+setting) plus the license-denial path.

This deliberately does NOT migrate the ~9 existing real call sites (changefeed,
backup, etc.); that migration is a follow-up. Early commits add the proto + gate
machinery; the later commits wire the hook and add tests.

Open questions (for review)

  1. Backward-compat default: when a license is present but features is empty
    (old-style license), should gates be permissive or strict? Prototype is
    permissive and flags the decision inline.
  2. Numeric constraints (e.g. replication-factor min/max) don't fit a
    repeated Feature enum — they need typed fields and a non-boolean gate
    method. Deferred.
  3. WithExperimental / WithCloudOnly semantics: what do they evaluate (build
    tag? version gate? license environment? injected setting?). Stubbed.
  4. managed-service proto mirror: the control plane must add features = 12 and
    resolve edition+add-ons into it. Out of scope for this skeleton; required
    counterpart.

Epic: none

Release note: None

@trunk-io

trunk-io Bot commented Jun 1, 2026

Copy link
Copy Markdown
Contributor

Merging to master in this repository is managed by Trunk.

  • To merge this pull request, check the box to the left or comment /trunk merge below.

After your PR is submitted to the merge queue, this comment will be automatically updated with its status. If the PR fails, failure details will also be posted here

@cockroach-teamcity

Copy link
Copy Markdown
Member

This change is Reviewable

@vishalv1994 vishalv1994 changed the title featureflag: prototype license-aware feature gates [DO NOT MERGE] featureflag: prototype license-aware feature gates Jun 1, 2026
@blathers-crl

blathers-crl Bot commented Jun 1, 2026

Copy link
Copy Markdown

Detected infrastructure failure (matched: self-hosted runner lost communication with the server). Automatically rerunning failed jobs. (run link)

@dt dt left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks more or less like what I as was picturing. I'd say it's OK to change the existing featureflag call sites even more if we wanted, e.g. to remove the explicit setting registration they're doing and have WithSetting subsume the setting registration too, but that can easily happen later too.

Comment thread pkg/featureflag/feature_gate.go Outdated
// gate does not register a new setting; the caller supplies one of the existing
// feature.*.enabled bools. When the setting is false, Enabled denies the
// feature even if the license permits it.
func WithSetting(s *settings.BoolSetting) Option {

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i think this could take the setting name and register it for them (since Register is an init-time call too) if we wanted to really slim down the boilerplate of defining a feature flag that has the option of operator control.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, updated with the latest push. WithSetting now takes just the name and handles registration internally.

// features lists the entitlements this license grants its holder. Field
// number 12 is chosen to stay wire-compatible with the managed-service
// proto, which already uses field numbers 8 through 11.
repeated Feature features = 12;

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Impl nit: I think we should switch this to a compact []byte packed a bitflag (using emum IDs for shifts) since 4 bytes or 8 bytes per bool is pretty bloated for something that gets moved around by humans as a string on their terminal or clipboard. wire-compat with manage-service doesn't seem worth a worse UX for our users (of having big, multi-kb license strings) so we could either just not worry about identical wire format or, better yet, managed-service could switch to the superior representation as well if/when we decide we want them to match.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense! The bitflag approach is definitely more efficient. My main question is whether we should make that change now or defer it.

With only 8 features today, the difference in license string length is roughly 10 characters, so it’s not really a UX concern yet.

If we ship the repeated Feature encoding first and decide to move to a bitmask later, we’ll either need to support decoding both formats or reissue licenses. Since there’s currently no migration cost, I’m leaning toward doing the bitflag encoding now.

Does that align with your thinking?

@dt dt Jun 3, 2026

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

only 8 features today

Alternative framing : we haven't even launched this yet or incorporated it into every DB team's day to day workflows and we're already at 8 -- and 15% increase in the user-visible/user-handled license strings. It seems likely that we'll want teams to put every feature/option/aspect of a feature that we might want to at some point unbundle/make an add-on/repackage behind its own discrete flag to ensure those decisions can be made externally at lic-gen time, so we should probably expect this list to only grow from here.

currently no migration cost, I’m leaning toward doing the bitflag encoding now.

Agreed: it's much easier to make the switch now before we externally release anything than do it later when it'll become a Migration™. There's ~no reason not to just do it now, it's already a win for the lic size and will only be moreso later.

Comment thread pkg/featureflag/feature_gate.go
// installed. It is populated by an init() in pkg/server/license to avoid an
// import cycle (server/license imports featureflag, not the reverse). When
// nil, the license gate is permissive.
var GetLicenseHook func(st *cluster.Settings) (*licensepb.License, error)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not a blocking concern right now but eventually I think this would be cleaner if we could have featureflag depend on the license setting; perhaps that means the setting is in the wrong place if pkg/server/license is depending on featureflag and we'd want to move the setting itself to a util pkg separate from the the higher level code in pkg/server that is probably what's pulling in featureflag.

but again, not blocking for now -- just something i'd maybe have in a TODO.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed, the hook is there purely to break the cycle.

Moving the license setting (or at least the GetLicense accessor) to a lower-level package that featureflag can import directly would be cleaner. I'll add a TODO for this. It'll make the code more straightforward once we sort out the package layering.

@dt

dt commented Jun 1, 2026

Copy link
Copy Markdown
Contributor

Numeric constraints (e.g. replication-factor min/max) don't fit a
repeated Feature enum — they need typed fields and a non-boolean gate
method. Deferred.

I think those are't feature flags; I see two answers there:

a) can we redefine them as feature flags? e.g. would product be able to define their segmentation as "replication factors > 3 are now defined as 'enhanced replication controls', which is a feature that has to be granted" rather than retaining control over specific numbers? that seems like it could work if there are one or two step-change/categorical values (e.g. >3, >5).

b) If not, I'd say calling code should just call GetLicense and read the specific constant, since this is so specific to that exact case that there isn't really a generalizable, reusable pattern for us to encode in common infra here.

@vishalv1994

Copy link
Copy Markdown
Contributor Author

Numeric constraints (e.g. replication-factor min/max) don't fit a
repeated Feature enum — they need typed fields and a non-boolean gate
method. Deferred.

I think those are't feature flags; I see two answers there:

a) can we redefine them as feature flags? e.g. would product be able to define their segmentation as "replication factors > 3 are now defined as 'enhanced replication controls', which is a feature that has to be granted" rather than retaining control over specific numbers? that seems like it could work if there are one or two step-change/categorical values (e.g. >3, >5).

b) If not, I'd say calling code should just call GetLicense and read the specific constant, since this is so specific to that exact case that there isn't really a generalizable, reusable pattern for us to encode in common infra here.

I agree with this approach. Looking at the edition and add-on features we’ve identified so far, representing them as boolean flags seems like a clean and straightforward solution.

Keeping any capabilities that don’t naturally fit into the FeatureGate model separate also makes sense, especially since I don’t expect there to be many of them.

vishalv1994 and others added 7 commits June 4, 2026 16:18
Add a top-level Feature enum and a repeated Feature features field
(number 12, wire-compatible with the managed-service proto's 8-11) so a
license can carry the set of entitlements it grants. This is the first
building block for license-aware feature gates.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Add a FeatureGate type that combines a license-entitlement check (read
directly from the license proto via a hook) with an optional operator
cluster setting, plus experimental/cloud-only stubs. This is the core of
the license-aware feature-gate prototype; existing call sites are not yet
migrated.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Populate featureflag.GetLicenseHook from an init() so feature gates can
read license entitlements. The dependency direction is server/license ->
featureflag, avoiding an import cycle.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Demonstrate the feature-gate API shape with a license-only gate
(multi-region) and a license+setting gate (changefeed), exercising
Register, WithSetting, and Enabled without touching real call sites.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Generated by ./dev generate bazel as a consequence of adding the
featureflag example test target.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…enial

Reword the WithSetting comment to describe current behavior rather than
narrate an alternative, and add a test that overrides GetLicenseHook to
exercise the license-entitlement deny path (a license granting BACKUP but
not CHANGEFEED), demonstrating the core gating mechanism.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Address review feedback: WithSetting now takes the setting name (string)
and calls settings.RegisterBoolSetting on the caller's behalf, removing
the need for each call site to separately register a feature.*.enabled
bool. A new Setting() accessor exposes the registered *BoolSetting for
direct reads or test overrides.

Co-Authored-By: roachdev-claude <roachdev-claude-bot@cockroachlabs.com>
@vishalv1994 vishalv1994 force-pushed the license-feature-gates-prototype branch from fedc50a to 6ddd2ba Compare June 4, 2026 10:48
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants