71 changes: 71 additions & 0 deletions gcp/modules/dex/gcs.tf
@@ -0,0 +1,71 @@
/**
* Copyright 2026 The Sigstore Authors
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/

locals {
workload_iam_member_id = format("principal://iam.googleapis.com/projects/%s/locations/global/workloadIdentityPools/%s.svc.id.goog/subject/ns/%s/sa/%s", var.project_number, var.project_id, var.cluster_namespace, var.cluster_service_account)
}

# Grant the K8s workload identity direct permission to push keys to the bucket
resource "google_storage_bucket_iam_member" "k8s_pusher_access" {
count = var.single_region ? 0 : 1

bucket = var.bucket_name
role = "roles/storage.objectUser"
member = local.workload_iam_member_id
}

data "archive_file" "function_source" {
type = "zip"
source_dir = "${path.module}/src/jwks-merger"
output_path = "${path.module}/jwks-merger.zip"
}

resource "google_storage_bucket_object" "function_zip" {
count = var.single_region ? 0 : 1

name = "source/jwks-merger-${data.archive_file.function_source.output_md5}.zip"
bucket = var.bucket_name
source = data.archive_file.function_source.output_path
}

resource "google_cloudfunctions2_function" "jwks_merger" {
count = var.single_region ? 0 : 1

project = var.project_id

name = "dex-jwks-merger"
location = var.region

build_config {
runtime = "go125"
entry_point = "MergeKeys"
source {
storage_source {
bucket = var.bucket_name
object = google_storage_bucket_object.function_zip[count.index].name
}
}
}

event_trigger {
trigger_region = "us"
Member:
does this retry automatically?

Can we end up in a stale state on dex key rotation if the cloud function triggered run fails?

I imagine we don't change those keys that often, but I'm not sure I understand the full failure case here.

Member:

Also could we have a race condition if both regions updated keys at the same time?

Contributor Author:

> does this retry automatically?

As-is this doesn't retry automatically, but it looks like that's easy to add, so I'll do that.

> Can we end up in a stale state on dex key rotation if the cloud function triggered run fails?

Yes, but there is an alert set up to prevent that from becoming a problem.

> I imagine we don't change those keys that often

We don't change the keys ourselves; Dex rotates them itself every 24 hours. So the failure case would be something like:

  1. The Dex pod pushes a bad update. The Cloud Function triggers, and fails.
  2. The Dex cronjob triggers every 1 minute, so depending on the failure it may self-resolve if the Cloud Function then triggers again and succeeds.
  3. If not, the last known keys are untouched and will remain valid for a while; even after Dex rotates its key, the old one will still be valid for 24 hours.
  4. The stale keys alert will trigger an hour after the bad push, so someone will check the problem and fix it before the keys expire.
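The automatic retry mentioned in this thread can be enabled on the Eventarc trigger itself. A hypothetical sketch of the `event_trigger` block from this diff with Eventarc's retry policy turned on (`RETRY_POLICY_RETRY` retries failed invocations with exponential backoff; the default is `RETRY_POLICY_DO_NOT_RETRY`):

```hcl
event_trigger {
  trigger_region = "us"
  event_type     = "google.cloud.storage.object.v1.finalized"
  # Retry failed invocations with exponential backoff instead of
  # dropping the event.
  retry_policy   = "RETRY_POLICY_RETRY"
  event_filters {
    attribute = "bucket"
    value     = var.bucket_name
  }
}
```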

Contributor Author (@cmurphy, May 5, 2026):

> Also could we have a race condition if both regions updated keys at the same time?

Each region pushes updates to its own keys/[region].json object, so there is no race condition on push. Both Cloud Functions will trigger the merge no matter which region updated its keys, so it is likely that both will run at about the same time, but this should be safe because of GCS's strong consistency and because the merge output is idempotent, so it doesn't matter which region finishes first or last.

Cloud Functions have to be regional because of the compute resources they use; there's no global version. We could have just one in one region to avoid the duplication and unnecessary dual writes, but that would somewhat defeat the purpose of the redundant infrastructure we're building.
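The one-merger-per-region layout described here could also be expressed from a single module with `for_each`. A hypothetical sketch assuming a `var.regions` list (this variable is not in the PR; the diff instead takes a single `var.region` per module instance). Function names are regional, so the same name can be reused in each location:

```hcl
# Hypothetical: stamp out one merger per region, e.g.
# var.regions = ["us-east1", "us-west1"]. The merged output is
# identical no matter which region's function writes it last.
resource "google_cloudfunctions2_function" "jwks_merger" {
  for_each = toset(var.regions)

  project  = var.project_id
  name     = "dex-jwks-merger"
  location = each.key

  build_config {
    runtime     = "go125"
    entry_point = "MergeKeys"
    source {
      storage_source {
        bucket = var.bucket_name
        object = google_storage_bucket_object.function_zip[0].name
      }
    }
  }
}
```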

Member:

> If not, the last known keys are untouched and will remain valid for a while; even after Dex rotates its key, the old one will still be valid for 24 hours.

What's the delay between key rotation and when the instance starts using the new key to sign?

Contributor Author:

I'm pretty sure it's instantaneous. So what I think you're pointing out is that there could be a gap between when the key starts being used to sign and when it's actually available in the bucket. So I'll need to figure something out to avoid having identity tokens fail to verify for one minute every 24 hours.

Member:

Hah yeah I guess I'm trying to understand what could go wrong?

event_type = "google.cloud.storage.object.v1.finalized"
event_filters {
attribute = "bucket"
value = var.bucket_name
}
}
}
56 changes: 56 additions & 0 deletions gcp/modules/dex/global/gcs.tf
@@ -0,0 +1,56 @@
/**
* Copyright 2026 The Sigstore Authors
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/

resource "google_storage_bucket" "auth_bucket" {
project = var.project_id

name = "${var.project_id}-dex-jwks-storage"
location = "US"

uniform_bucket_level_access = true
}

Check notice — Code scanning / defsec: Cloud Storage buckets should be encrypted with a customer-managed key. (Storage bucket encryption does not use a customer-managed key.)
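The defsec notice could be addressed with a customer-managed KMS key. A hypothetical sketch, not part of the PR (the key ring and key names are illustrative, and the GCS service agent needs encrypt/decrypt on the key before the bucket can reference it):

```hcl
resource "google_kms_crypto_key" "jwks_bucket" {
  name     = "jwks-bucket-key"           # illustrative name
  key_ring = google_kms_key_ring.dex.id  # assumed existing key ring
}

# The GCS service agent must be able to use the key before the bucket
# can reference it.
resource "google_kms_crypto_key_iam_member" "gcs_agent" {
  crypto_key_id = google_kms_crypto_key.jwks_bucket.id
  role          = "roles/cloudkms.cryptoKeyEncrypterDecrypter"
  member        = "serviceAccount:${data.google_storage_project_service_account.gcs_account.email_address}"
}

# ...then add to the auth_bucket resource:
#   encryption {
#     default_kms_key_name = google_kms_crypto_key.jwks_bucket.id
#   }
```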

# Grant the global internet permission to read the merged keys file via the CDN
Member:

Does the global internet need to read these? Dex has its own .well-known to access those values. And I think somewhat importantly, how are clients supposed to access this info? Clients are designed to use the .well-known to validate dex tokens before they send them off to fulcio.

Contributor Author:

> Does the global internet need to read these? Dex has its own .well-known to access those values.

The .well-known configuration provides the jwks_uri pointing to the keys endpoint, but it doesn't provide the keys themselves. The public keys are public information, and the current Dex instance already exposes them publicly (https://oauth2.sigstage.dev/auth/keys).

> And I think somewhat importantly, how are clients supposed to access this info?

They'll access the .well-known the same way, and the jwks_uri points to the same address as it always has. The URL map configuration makes this completely transparent.

> before they send them off to fulcio.

My understanding is that Fulcio also has to validate the tokens.

Member:

So I'm trying to understand this:

  1. the client requests a token from oauth2.sigstage.dev
  2. the client gets a token, then validates the token before sending it to fulcio (this is what sigstore-java does, at least).

The issue I'm concerned about is client behavior: can it correctly find the /.well-known/openid-configuration endpoint if it's hidden behind the LB proxy? It should be asking the specific dex instance for this info?

Is our config that both/all dex instances expose the exact same /.well-known/openid-configuration, with /.well-known/keys just pointing to the consolidated keys file?

Contributor Author:

The .well-known/openid-configuration isn't hidden; it is passed through to the Dex pod by https://github.com/sigstore/terraform-modules/pull/195/changes/BASE..c0f76e2c7dc6926b7b43c069c38d459d8380d6de#diff-8285fd7d43c899c705485c388abca03ab5bd1342b6c42cf2c43ce252f27c1429R241, which still serves that well-known endpoint. It's only if you make a request to /auth/keys that you'll be routed to this bucket. All the Dex instances will serve their own .well-known configs, and they will happen to be identical because all of the URLs in them will be the same.
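The routing described here (only /auth/keys served from the bucket, everything else reaching Dex) could look roughly like the following. This is a hypothetical sketch, not the PR's actual URL map: the resource names and the `google_compute_backend_service.dex` reference are illustrative assumptions.

```hcl
resource "google_compute_backend_bucket" "jwks" {
  project     = var.project_id
  name        = "dex-jwks-backend"  # illustrative name
  bucket_name = google_storage_bucket.auth_bucket.name
  enable_cdn  = true
}

resource "google_compute_url_map" "dex" {
  project         = var.project_id
  name            = "dex-url-map"  # illustrative name
  default_service = google_compute_backend_service.dex.id  # assumed existing Dex backend

  host_rule {
    hosts        = ["oauth2.sigstage.dev"]
    path_matcher = "dex"
  }

  path_matcher {
    name            = "dex"
    default_service = google_compute_backend_service.dex.id

    # Only the JWKS path is served from the bucket; /.well-known/* and
    # everything else still reaches the Dex pods.
    path_rule {
      paths   = ["/auth/keys"]
      service = google_compute_backend_bucket.jwks.id
    }
  }
}
```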

Member (@loosebazooka, May 5, 2026):

yeah, "hidden" was probably the wrong word. I wasn't fully grasping how a client wanting to verify the token would interact with the system. What's interesting to me (now that I've looked at it more) is the token and auth endpoints obtained for use in a web flow.

In the flow (at least how sigstore-java does it):

  1. go ask for /.well-known/openid-configuration and parse out authorization_endpoint and token_endpoint, which we need for the web flow.
  2. Use authEndpoint to initiate login and get an auth code.
  3. Use tokenEndpoint to trade the auth code for an id token.

If these endpoints are mixed up or not routed to the same oauth2 server, I think things will get wonky. What I'm trying to figure out here is whether all contact with the global dex endpoint will hit the same dex instance over the course of the auth flow.

Member:

If I am understanding this correctly, authEndpoint and tokenEndpoint should be instance specific.

Contributor Author:

Hmm this is a valid concern and a flaw in my design. I was only thinking about Fulcio verifying the token and not the token acquisition itself. I'll have to think through this and come up with an alternative.

resource "google_storage_bucket_iam_member" "public_read_access" {
bucket = google_storage_bucket.auth_bucket.name
role = "roles/storage.objectViewer"
member = "allUsers"
}

Check failure — Code scanning / defsec: Ensure that Cloud Storage bucket is not anonymously or publicly accessible. (Bucket allows public access.) cmurphy marked this conversation as resolved; alert dismissed.

# Grant the default compute service account (which runs the Cloud Function) access
resource "google_storage_bucket_iam_member" "function_bucket_access" {
bucket = google_storage_bucket.auth_bucket.name
role = "roles/storage.objectUser"
member = "serviceAccount:${var.project_number}[email protected]"
}

Check warning — Code scanning / defsec: Roles should not be assigned to default service accounts. (Role is assigned to a default service account at project level.)
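One hypothetical way to address that warning is to run the function as a dedicated service account instead of the project's default compute SA. A sketch, not part of the PR (resource and account names are illustrative):

```hcl
resource "google_service_account" "jwks_merger" {
  project    = var.project_id
  account_id = "dex-jwks-merger"  # illustrative name
}

# Scope the bucket grant to the dedicated account rather than the
# project-wide default compute service account.
resource "google_storage_bucket_iam_member" "merger_bucket_access" {
  bucket = google_storage_bucket.auth_bucket.name
  role   = "roles/storage.objectUser"
  member = "serviceAccount:${google_service_account.jwks_merger.email}"
}

# ...and point the function at it:
#   service_config {
#     service_account_email = google_service_account.jwks_merger.email
#   }
```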

# Grant the Eventarc Service Agent its required project-level role
resource "google_project_iam_member" "eventarc_service_agent" {
project = var.project_id
role = "roles/eventarc.serviceAgent"
member = "serviceAccount:service-${var.project_number}@gcp-sa-eventarc.iam.gserviceaccount.com"
}

data "google_storage_project_service_account" "gcs_account" {
project = var.project_id
}

# Grant the project's hidden Cloud Storage agent permission to publish to Pub/Sub
resource "google_project_iam_member" "gcs_pubsub_publishing" {
project = var.project_id
role = "roles/pubsub.publisher"
member = "serviceAccount:${data.google_storage_project_service_account.gcs_account.email_address}"
}